A Deep Dive into Kubernetes Storage: PV, PVC, and CSI
1. Storage Abstraction Layers
Kubernetes storage is organized into three layers of abstraction:
```text
┌──────────────────────────────────────────────────────────────┐
│ Application layer (Pod)                                      │
│   PersistentVolumeClaim (PVC)                                │
│   "I need 10GB of SSD storage"                               │
├──────────────────────────────────────────────────────────────┤
│ Abstraction layer                                            │
│   PersistentVolume (PV)                                      │
│   "Here is a 20GB SSD"                                       │
│   StorageClass (dynamic provisioning)                        │
├──────────────────────────────────────────────────────────────┤
│ Infrastructure layer                                         │
│   Actual storage: AWS EBS / GCE PD / NFS / Ceph / local disk │
└──────────────────────────────────────────────────────────────┘
```
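2. PersistentVolume (PV)
A PV is a cluster-scoped storage resource. An administrator either creates it in advance (static provisioning) or a StorageClass creates it on demand (dynamic provisioning):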
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-001
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-storage
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /exports/data
    server: nfs-server.example.com
```
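Access modes

| Mode | Abbreviation | Description |
|---|---|---|
| ReadWriteOnce | RWO | Read-write by a single node (most common; Pods on that node can share it) |
| ReadOnlyMany | ROX | Read-only by many nodes |
| ReadWriteMany | RWX | Read-write by many nodes (requires shared storage such as NFS or CephFS) |
| ReadWriteOncePod | RWOP | Read-write by a single Pod (Kubernetes 1.22+, CSI volumes only) |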
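When Kubernetes matches a PVC against a PV, every access mode the PVC requests must appear in the PV's `accessModes` list. Below is a minimal Python sketch of that subset check. `AccessMode` and `modes_satisfied` are hypothetical names for illustration, not Kubernetes APIs.

```python
from enum import Enum


class AccessMode(Enum):
    """Volume access modes, keyed by their common abbreviations."""
    RWO = "ReadWriteOnce"
    ROX = "ReadOnlyMany"
    RWX = "ReadWriteMany"
    RWOP = "ReadWriteOncePod"


def modes_satisfied(requested: set[AccessMode], offered: set[AccessMode]) -> bool:
    """Hypothetical helper: True if the PV offers every mode the PVC requests."""
    return requested <= offered


# pv-nfs-001 from section 2 offers only ReadWriteMany.
nfs_pv_modes = {AccessMode.RWX}
print(modes_satisfied({AccessMode.RWX}, nfs_pv_modes))  # True
print(modes_satisfied({AccessMode.RWO}, nfs_pv_modes))  # False: RWO is not listed
```

The check is plain set membership. A PV that lists only `ReadWriteMany` therefore does not satisfy a claim that asks for `ReadWriteOnce`.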
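Reclaim policies

| Policy | Description |
|---|---|
| Retain | Keeps the data after the PVC is deleted; it must be cleaned up manually |
| Delete | Automatically deletes the underlying storage asset (for example, a cloud disk) |
| Recycle | Deprecated; performs a basic scrub of the volume's data |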
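The three policies differ in what happens to a bound PV once its PVC is deleted. Here is a hypothetical Python sketch of each outcome. It models the documented behaviour rather than the actual controller code, and `fate_after_claim_deleted` is an illustrative name.

```python
def fate_after_claim_deleted(reclaim_policy: str) -> str:
    """Hypothetical sketch: what becomes of a bound PV after its PVC is deleted."""
    if reclaim_policy == "Retain":
        # The PV object stays in phase Released; the data is untouched and
        # must be cleaned up manually before the storage is reused.
        return "Released"
    if reclaim_policy == "Delete":
        # Both the PV object and the backing storage asset are removed.
        return "Deleted"
    if reclaim_policy == "Recycle":
        # Deprecated: the volume is scrubbed and offered again as Available.
        return "Available"
    raise ValueError(f"unknown reclaim policy: {reclaim_policy!r}")


for policy in ("Retain", "Delete", "Recycle"):
    print(policy, "->", fate_after_claim_deleted(policy))
```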
3. PersistentVolumeClaim (PVC)
A PVC is a user's request for storage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  storageClassName: fast-ssd
  # Only existing PVs carrying this label are considered: a PVC with a
  # non-empty selector cannot have a PV dynamically provisioned for it.
  selector:
    matchLabels:
      environment: production
```
PVC binding flow

```text
PVC created
    │
    ▼
PV controller looks for a matching PV
    │
    ├── Capacity is sufficient (PV >= PVC)
    ├── Access modes match
    ├── StorageClass matches
    └── Label selector matches
    │
    ▼
Bind (PV.spec.claimRef = PVC, PVC.spec.volumeName = PV)
    │
    ▼
PVC phase becomes Bound
```
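The matching rules in the diagram can be sketched in Python as follows. This is a simplified model built on our own assumptions:

- `to_bytes`, `PV`, `PVC`, `matches`, and `bind` are hypothetical names.
- Quantities support only binary suffixes.
- The selector handles `matchLabels` only.
- As we understand the real PV controller, the smallest PV that fits is preferred.

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified quantity parsing: binary suffixes or plain bytes only.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


@dataclass
class PV:
    name: str
    capacity: str
    access_modes: set[str]
    storage_class: str
    labels: dict[str, str] = field(default_factory=dict)
    claim_ref: Optional[str] = None  # stands in for spec.claimRef


@dataclass
class PVC:
    name: str
    request: str
    access_modes: set[str]
    storage_class: str
    selector: dict[str, str] = field(default_factory=dict)  # matchLabels only
    volume_name: Optional[str] = None  # stands in for spec.volumeName


def matches(pv: PV, pvc: PVC) -> bool:
    """The four checks from the diagram, plus 'PV is not already bound'."""
    return (
        pv.claim_ref is None
        and to_bytes(pv.capacity) >= to_bytes(pvc.request)
        and pvc.access_modes <= pv.access_modes
        and pv.storage_class == pvc.storage_class
        and all(pv.labels.get(k) == v for k, v in pvc.selector.items())
    )


def bind(pvc: PVC, pvs: list[PV]) -> Optional[PV]:
    """Bind the claim to the smallest matching PV; None means it stays Pending."""
    candidates = [pv for pv in pvs if matches(pv, pvc)]
    if not candidates:
        return None
    best = min(candidates, key=lambda pv: to_bytes(pv.capacity))
    best.claim_ref, pvc.volume_name = pvc.name, best.name  # two-way binding
    return best


prod = {"environment": "production"}
pvs = [
    PV("pv-5g", "5Gi", {"ReadWriteOnce"}, "fast-ssd", prod),
    PV("pv-20g", "20Gi", {"ReadWriteOnce"}, "fast-ssd", prod),
    PV("pv-100g", "100Gi", {"ReadWriteOnce"}, "fast-ssd", prod),
]
claim = PVC("my-pvc", "10Gi", {"ReadWriteOnce"}, "fast-ssd", prod)
print(bind(claim, pvs).name)  # pv-20g: the smallest PV with at least 10Gi
```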
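Using a PVC in a Pod

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app              # example name
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-pvc
  containers:
    - name: app
      image: nginx       # placeholder image; any image works
      volumeMounts:
        - name: data
          mountPath: /data
```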
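4. StorageClass (Dynamic Provisioning)
A StorageClass defines a "class" of storage and lets PVs be created dynamically, on demand, when a PVC requests that class: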
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:..."
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - debug
```
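The `fast-ssd` class above carries the `is-default-class` annotation, so PVCs that omit `storageClassName` are assigned to it. Here is a hedged Python sketch of that resolution rule. `effective_class` is a hypothetical helper, and the tie-break when several classes claim to be default is our own simplification; real Kubernetes versions have their own rules for that case.

```python
from typing import Optional

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def effective_class(requested: Optional[str],
                    classes: dict[str, dict[str, str]]) -> Optional[str]:
    """Hypothetical sketch: the StorageClass a new PVC ends up with.

    `requested` is the PVC's storageClassName (None = field omitted);
    `classes` maps class name -> annotations.
    """
    if requested is not None:
        # An explicit name, including "" ("no class, static PVs only"),
        # is kept as-is.
        return requested
    defaults = sorted(name for name, ann in classes.items()
                      if ann.get(DEFAULT_ANNOTATION) == "true")
    # Simplification: if several classes are marked default, take the
    # first by name.
    return defaults[0] if defaults else None


classes = {
    "fast-ssd": {DEFAULT_ANNOTATION: "true"},
    "local-storage": {},
}
print(effective_class(None, classes))             # fast-ssd
print(effective_class("local-storage", classes))  # local-storage
```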
volumeBindingMode
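```text
Immediate (bind immediately):
  The PV is bound (or provisioned) as soon as the PVC is created.
  Problem: the PV may land in an availability zone where the Pod cannot
  run (e.g., no node there meets its resource or affinity needs), so the
  Pod becomes unschedulable.

WaitForFirstConsumer (delayed binding, recommended):
  Binding waits until a Pod that uses the PVC is scheduled to a node;
  the PV is then created in that node's availability zone.
  This avoids the cross-zone problem.
```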
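The difference between the two modes comes down to when the zone is decided. The sketch below is a toy Python model with these assumptions:

- The zone names and the `volume_zone` helper are hypothetical.
- `Immediate` is modelled as picking an arbitrary zone, because at that point the provisioner does not yet know where the Pod will run.

```python
import random
from typing import Optional

ZONES = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names


def volume_zone(binding_mode: str, pod_node_zone: Optional[str],
                rng: random.Random) -> str:
    """Hypothetical model of where a dynamically provisioned volume is created."""
    if binding_mode == "Immediate":
        # Provisioned as soon as the PVC exists, before any Pod is scheduled,
        # so the zone is chosen without knowing where the Pod will run.
        return rng.choice(ZONES)
    if binding_mode == "WaitForFirstConsumer":
        if pod_node_zone is None:
            raise RuntimeError("PVC stays Pending until a Pod using it is scheduled")
        # Provisioning happens after scheduling, in the selected node's zone.
        return pod_node_zone
    raise ValueError(f"unknown volumeBindingMode: {binding_mode!r}")


rng = random.Random(42)
# Suppose the only node with enough free CPU for the Pod is in zone-b.
for mode in ("Immediate", "WaitForFirstConsumer"):
    zone = volume_zone(mode, "zone-b", rng)
    print(f"{mode}: volume in {zone}, usable by the Pod: {zone == 'zone-b'}")
```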
5. CSI (Container Storage Interface)
CSI is the standard interface between Kubernetes and storage plugins:
```text
kubelet / external-provisioner
        │
        │ gRPC
        ▼
    CSI Driver
    ├── Identity Service (driver name, version, capabilities)
    ├── Controller Service (create/delete/snapshot volumes)
    └── Node Service (mount/unmount volumes on the node)
```
CSI component architecture

```text
┌────────────────────────────────────────────────────────────┐
│ CSI sidecar controllers (watch the Kubernetes API)         │
│ external-provisioner  external-attacher  external-resizer  │
│          │                    │                  │         │
└──────────┼────────────────────┼──────────────────┼─────────┘
           │ gRPC               │ gRPC             │ gRPC
┌──────────┼────────────────────┼──────────────────┼─────────┐
│ CSI Driver (DaemonSet + Deployment)                        │
│  ┌──────────────────────────────────────────────────┐      │
│  │ CSI Controller Plugin (Deployment)               │      │
│  │  - CreateVolume / DeleteVolume                   │      │
│  │  - ControllerPublishVolume (Attach)              │      │
│  │  - CreateSnapshot / DeleteSnapshot               │      │
│  └──────────────────────────────────────────────────┘      │
│  ┌──────────────────────────────────────────────────┐      │
│  │ CSI Node Plugin (DaemonSet)                      │      │
│  │  - NodeStageVolume (format, mount to global dir) │      │
│  │  - NodePublishVolume (bind mount into Pod dir)   │      │
│  └──────────────────────────────────────────────────┘      │
└────────────────────────────────────────────────────────────┘
```

In practice the sidecars (external-provisioner, external-attacher, external-resizer) usually run as extra containers in the same Pod as the CSI Controller Plugin. They talk to it over gRPC on a local Unix socket. The kubelet calls the Node Plugin on each node directly.
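To tie the diagram together, here is a Python sketch of the order in which the CSI RPCs are invoked for a block volume such as EBS. As we understand it, the volume is provisioned, used by one Pod, and then deleted in this sequence. The RPC names come from the CSI spec. The `CALLER` mapping and `lifecycle` function are our own illustration. Drivers that do not advertise the attach or stage capabilities skip those calls.

```python
# RPC names from the CSI spec, grouped by the gRPC service that serves them.
CONTROLLER_RPCS = {"CreateVolume", "DeleteVolume",
                   "ControllerPublishVolume", "ControllerUnpublishVolume"}
NODE_RPCS = {"NodeStageVolume", "NodeUnstageVolume",
             "NodePublishVolume", "NodeUnpublishVolume"}

# Which Kubernetes component issues each call (simplified mapping).
CALLER = {
    "CreateVolume": "external-provisioner",
    "ControllerPublishVolume": "external-attacher",
    "NodeStageVolume": "kubelet",
    "NodePublishVolume": "kubelet",
    "NodeUnpublishVolume": "kubelet",
    "NodeUnstageVolume": "kubelet",
    "ControllerUnpublishVolume": "external-attacher",
    "DeleteVolume": "external-provisioner",
}


def lifecycle() -> list[str]:
    """CSI calls for one volume: provisioned, used by a Pod, then deleted.

    Teardown mirrors setup in reverse order.
    """
    setup = ["CreateVolume",             # PVC created -> disk created
             "ControllerPublishVolume",  # attach the disk to the chosen node
             "NodeStageVolume",          # format if needed, mount at a per-node global path
             "NodePublishVolume"]        # bind mount into the Pod's volume dir
    teardown = ["NodeUnpublishVolume", "NodeUnstageVolume",
                "ControllerUnpublishVolume",
                "DeleteVolume"]          # only once the PVC is deleted (reclaimPolicy: Delete)
    return setup + teardown


for rpc in lifecycle():
    service = "Controller" if rpc in CONTROLLER_RPCS else "Node"
    print(f"{CALLER[rpc]:<21} -> {service}.{rpc}")
```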
Common CSI drivers

| Storage | CSI driver |
|---|---|
| AWS EBS | ebs.csi.aws.com |
| AWS EFS | efs.csi.aws.com |
| GCE PD | pd.csi.storage.gke.io |
| Azure Disk | disk.csi.azure.com |
| Ceph RBD | rbd.csi.ceph.com |
| CephFS | cephfs.csi.ceph.com |
| NFS | nfs.csi.k8s.io |
| Local | kubernetes.io/no-provisioner (not a CSI driver; static provisioning only) |
6. Volume Snapshots (VolumeSnapshot)

```yaml
# Requires the VolumeSnapshot CRDs and the snapshot controller to be installed.
# 1) Snapshot class: which CSI driver takes the snapshot, and what happens on delete
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Delete
---
# 2) Take a snapshot of my-pvc
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: my-pvc
---
# 3) Restore: create a new PVC from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  storageClassName: fast-ssd   # must use the same CSI driver as the snapshot
  dataSource:
    name: my-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
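Restoring only works once the snapshot is ready. In addition, the new PVC must request at least the snapshot's `status.restoreSize`; larger requests are allowed if the driver supports them. Below is a hypothetical Python pre-check sketch of those two conditions. `can_restore` and the simplified `to_bytes` parser are illustrative names, and the sketch reflects our understanding of the provisioner's behaviour, not its code.

```python
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Simplified quantity parser: binary suffixes or plain bytes only."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def can_restore(ready_to_use: bool, restore_size: str, requested: str) -> bool:
    """Hypothetical pre-check before provisioning restored-pvc from my-snapshot.

    ready_to_use / restore_size mirror the VolumeSnapshot's status fields;
    requested is the new PVC's spec.resources.requests.storage.
    """
    if not ready_to_use:
        return False  # snapshot not finished yet
    # Requesting less than restoreSize is rejected.
    return to_bytes(requested) >= to_bytes(restore_size)


print(can_restore(True, "10Gi", "10Gi"))  # True
print(can_restore(True, "10Gi", "5Gi"))   # False
```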
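7. Volume Expansion

```bash
# Expand my-pvc to 20Gi (its StorageClass must set allowVolumeExpansion: true)
kubectl patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

# Watch the PVC until the new capacity is reflected
kubectl get pvc my-pvc -w
```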
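Before patching, it can help to check the two conditions that most commonly make a resize fail. Below is a hypothetical Python pre-flight sketch; `resize_patch` and the simplified `to_bytes` parser are illustrative names. It also builds the same JSON patch body used in the `kubectl patch` command above.

```python
import json

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Simplified quantity parser: binary suffixes or plain bytes only."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def resize_patch(current: str, requested: str, allow_volume_expansion: bool) -> str:
    """Hypothetical pre-flight check, then the patch body for `kubectl patch`."""
    if not allow_volume_expansion:
        raise ValueError("the PVC's StorageClass does not set allowVolumeExpansion: true")
    if to_bytes(requested) <= to_bytes(current):
        raise ValueError("new size must be larger than the current size (no shrinking)")
    return json.dumps({"spec": {"resources": {"requests": {"storage": requested}}}})


print(resize_patch("10Gi", "20Gi", allow_volume_expansion=True))
# {"spec": {"resources": {"requests": {"storage": "20Gi"}}}}
```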
8. Local Storage (Local Volume)
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node1
spec:
  capacity:
    storage: 500Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1
```
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
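A local PV is tied to one node, so the scheduler must place any Pod that uses it on a node satisfying the PV's `nodeAffinity`. That is also why the `local-storage` class uses `WaitForFirstConsumer`. Below is a simplified Python sketch of the matching rule. It handles only the `In` operator, and `node_satisfies` is a hypothetical helper.

```python
def node_satisfies(node_labels: dict[str, str], node_selector_terms: list[dict]) -> bool:
    """Simplified nodeSelectorTerms evaluation (operator In only).

    Terms are ORed; the matchExpressions inside one term are ANDed.
    An empty term matches nothing, as in Kubernetes.
    """
    def expression_ok(expr: dict) -> bool:
        if expr["operator"] != "In":
            raise NotImplementedError(f"operator {expr['operator']} not modelled")
        return node_labels.get(expr["key"]) in expr["values"]

    return any(
        bool(term.get("matchExpressions"))
        and all(expression_ok(e) for e in term["matchExpressions"])
        for term in node_selector_terms
    )


# spec.nodeAffinity.required.nodeSelectorTerms of local-pv-node1
terms = [{"matchExpressions": [
    {"key": "kubernetes.io/hostname", "operator": "In", "values": ["node1"]},
]}]
print(node_satisfies({"kubernetes.io/hostname": "node1"}, terms))  # True
print(node_satisfies({"kubernetes.io/hostname": "node2"}, terms))  # False
```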
9. Storage with StatefulSets in Practice
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql                  # must match spec.selector
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD   # the mysql image will not start without it
              valueFrom:
                secretKeyRef:
                  name: mysql-secret      # hypothetical Secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
```
The StatefulSet creates a separate PVC for each Pod, named `<template name>-<Pod name>`:

```text
mysql-0 → data-mysql-0 (PVC) → PV
mysql-1 → data-mysql-1 (PVC) → PV
mysql-2 → data-mysql-2 (PVC) → PV
```
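The naming rule above is simple enough to express directly. This is a minimal Python sketch; `pvc_names` is a hypothetical helper, and the format string follows the `<template name>-<StatefulSet name>-<ordinal>` pattern shown in the mapping.

```python
def pvc_names(template: str, statefulset: str, replicas: int) -> list[str]:
    """PVC names a StatefulSet creates: <template>-<statefulset>-<ordinal>."""
    return [f"{template}-{statefulset}-{i}" for i in range(replicas)]


print(pvc_names("data", "mysql", 3))
# ['data-mysql-0', 'data-mysql-1', 'data-mysql-2']
```

By default the StatefulSet keeps these PVCs when Pods are deleted or the set is scaled down. A recreated `mysql-1` therefore reattaches to `data-mysql-1` and finds its data intact.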
10. Storage Performance Tuning
10.1 Choosing the Right Storage Type
```text
High-IOPS workloads (databases):
  AWS: gp3/io2 EBS
  GCP: pd-ssd
  Local: NVMe SSD

High-throughput workloads (big data):
  AWS: st1 EBS / EFS
  Distributed: CephFS / GlusterFS

Shared storage:
  NFS / CephFS / EFS (ReadWriteMany)
```
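10.2 I/O Limits

```yaml
# StorageClass parameters for the AWS EBS CSI driver (gp3 volumes)
parameters:
  iops: "3000"        # provisioned IOPS
  throughput: "125"   # provisioned throughput, in MiB/s
```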
11. Troubleshooting Common Issues

```bash
# Inspect the PVC's status and events (why is it Pending?)
kubectl describe pvc my-pvc

# Check the CSI driver Pods and their logs
kubectl get pods -n kube-system | grep csi
kubectl logs -n kube-system <csi-controller-pod> --all-containers

# List PVs and their status
kubectl get pv

# Look for provisioning events
kubectl get events --field-selector reason=ProvisioningSucceeded
kubectl get events --field-selector reason=ProvisioningFailed
```
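When many PVCs are involved, it can be quicker to list only the ones that are not yet `Bound`. Below is a small Python sketch that reads the JSON printed by `kubectl get pvc -A -o json`. The `pending_pvcs` helper and the sample data are hypothetical; only the standard `items[].metadata` and `status.phase` fields are used.

```python
import json


def pending_pvcs(pvc_list_json: str) -> list[str]:
    """Return "namespace/name" for every PVC whose status.phase is not Bound."""
    items = json.loads(pvc_list_json).get("items", [])
    return [
        f'{pvc["metadata"]["namespace"]}/{pvc["metadata"]["name"]}'
        for pvc in items
        if pvc.get("status", {}).get("phase") != "Bound"
    ]


# Hypothetical sample shaped like `kubectl get pvc -A -o json` output.
sample = json.dumps({"items": [
    {"metadata": {"namespace": "default", "name": "my-pvc"},
     "status": {"phase": "Bound"}},
    {"metadata": {"namespace": "default", "name": "restored-pvc"},
     "status": {"phase": "Pending"}},
]})
print(pending_pvcs(sample))  # ['default/restored-pvc']
```

To use it on a live cluster, save the `kubectl get pvc -A -o json` output to a file and pass its contents to `pending_pvcs`. For each name returned, `kubectl describe pvc` usually shows the reason in its events.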
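12. Summary
The core design ideas of Kubernetes storage:
- Three layers of abstraction: PVC (the request) → PV (the resource) → actual storage
- StorageClass: dynamic provisioning that creates storage on demand
- CSI standard interface: decouples storage plugins from Kubernetes
- WaitForFirstConsumer: avoids cross-availability-zone scheduling problems
- VolumeSnapshot: data backup and restore
- StatefulSet + PVC templates: the standard storage pattern for stateful applications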