Kubernetes CSI
High-Level CSI Driver Architecture
Controller Plugin: Runs as a Deployment and manages volume provisioning and deletion.
Node Plugin: Runs as a DaemonSet and handles volume attachment, mounting, and unmounting.
Sidecars: Handle tasks such as external provisioning (csi-provisioner), attaching (csi-attacher), and registering the node plugin with the kubelet (csi-node-driver-registrar).
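To see this layout on a running cluster, one option is to list the containers of every pod in the CSI namespace. The following is a minimal sketch that assumes the driver is deployed into a single namespace; the function name `list_csi_containers` is hypothetical, and the container names reported depend on the driver.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pod name><TAB><container names>" for each pod
# in the given namespace. Sidecars such as csi-provisioner, csi-attacher, and
# csi-node-driver-registrar show up next to the driver container.
list_csi_containers() {
  local namespace="$1"
  kubectl get pods -n "$namespace" -o \
    jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'
}
```

Calling `list_csi_containers <CSI_NAMESPACE>` prints one line per pod, which also helps when choosing the container name to pass to `kubectl logs -c`.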
Finding CSI Driver Logs for a Specific PVC
- Identify the Node Where the PVC is Mounted
Get the pod name using the persistent volume claim
kubectl get pods -A -o \
  jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | \
  grep <PVC_NAME>
Find the node the pod is bound to
kubectl get pods -A -o \
  jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | \
  grep <PVC_NAME>
- Find the CSI driver pod on that node
Find the CSI driver pod
kubectl get pods -n <CSI_NAMESPACE> -o wide | grep <NODE_NAME>
- Get Logs from the node plugin
Get the CSI driver pod logs
kubectl logs -n <CSI_NAMESPACE> <CSI_NODE_POD> -c <DRIVER_CONTAINER>
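The lookup steps above can be chained in a small script. This is a sketch, not part of any official tooling: the function names `nodes_for_pvc` and `csi_pods_on_node` are hypothetical, and it assumes the namespace of the workload using the PVC is known. Unlike a plain `grep`, it matches the claim name exactly, so `data` does not also match `data-old`, and it skips pods that are not yet scheduled to a node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only standard kubectl flags are used.

# Print the node of every scheduled pod in NAMESPACE that uses exactly PVC_NAME.
nodes_for_pvc() {
  local namespace="$1" pvc="$2"
  kubectl get pods -n "$namespace" -o \
    jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' |
    awk -F'\t' -v pvc="$pvc" '
      $1 != "" {
        # Field 2 holds the space-separated claim names of one pod.
        n = split($2, claims, " ")
        for (i = 1; i <= n; i++) if (claims[i] == pvc) { print $1; break }
      }' |
    sort -u
}

# Print the pods in the CSI namespace that run on NODE_NAME.
csi_pods_on_node() {
  local csi_namespace="$1" node="$2"
  kubectl get pods -n "$csi_namespace" --field-selector "spec.nodeName=$node" -o name
}

# Example usage (placeholders as in the commands above):
#   node="$(nodes_for_pvc <NAMESPACE> <PVC_NAME>)"
#   csi_pods_on_node <CSI_NAMESPACE> "$node"
```

The field selector replaces the `-o wide | grep <NODE_NAME>` filter, so the result contains only pods scheduled on that node. The controller plugin pod also appears in it if it happens to run there.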
Troubleshooting NVMe-Related Errors
If the error is NVMe-related (e.g., volume attachment failure, device not found), follow these steps.
- Ensure that nvme-cli is installed
On RHEL-based distributions (dnf):
sudo dnf install -y nvme-cli
On Debian-based distributions (apt):
sudo apt install -y nvme-cli
- Verify that the nvme-tcp kernel module is loaded
Check that the NVMe/TCP kernel module is loaded (lsmod lists it as nvme_tcp)
lsmod | grep nvme_tcp
If the module is not loaded, it can be loaded for the current boot using the following command:
Load NVMe/TCP kernel module
sudo modprobe nvme-tcp
However, to ensure the module is loaded automatically at system startup, persist it as follows.
On RHEL-based distributions:
echo "nvme-tcp" | sudo tee -a /etc/modules-load.d/nvme-tcp.conf
On Debian-based distributions:
echo "nvme-tcp" | sudo tee -a /etc/modules
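Because `tee -a` appends, running the persistence command repeatedly adds a duplicate line each time. A minimal sketch of an idempotent variant follows; the function name `persist_module` is hypothetical, and the target file is passed in so the same helper works for either path shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append MODULE to FILE only if no line already equals it.
persist_module() {
  local module="$1" file="$2"
  # -x matches whole lines, -F treats the module name as a literal string.
  if [ -f "$file" ] && grep -qxF -- "$module" "$file"; then
    return 0
  fi
  printf '%s\n' "$module" | sudo tee -a "$file" > /dev/null
}

# Example usage:
#   persist_module nvme-tcp /etc/modules-load.d/nvme-tcp.conf
```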
- Check NVMe Connection Status
Check NVMe-oF connection
sudo nvme list-subsys
If the expected NVMe subsystem is missing, reconnect manually:
Manually reconnect the NVMe-oF device
sudo nvme connect -t tcp \
  -n <NVME_SUBSYS_NAME> \
  -a <TARGET_IP> \
  -s <TARGET_PORT> \
  -l <CTRL_LOSS_TIMEOUT> \
  -c <RECONNECT_DELAY> \
  -i <NR_IO_QUEUES>
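The check and the reconnect can be combined so that `nvme connect` runs only when the subsystem is actually missing. The sketch below makes several assumptions: the function names are hypothetical, the subsystem is identified by the `NQN=<name>` token that `nvme list-subsys` prints in its default text output, and the optional `-l`, `-c`, and `-i` parameters are left out so nvme-cli uses its defaults. Add them back if the values used for the original connection are known.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the two nvme-cli commands shown above.

# Return 0 if a subsystem with exactly this NQN is listed, 1 otherwise.
subsystem_connected() {
  local nqn="$1"
  sudo nvme list-subsys |
    awk -v want="NQN=$nqn" '
      { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
      END { exit !found }'
}

# Connect over TCP only if the subsystem is not already present.
ensure_connected() {
  local nqn="$1" addr="$2" port="$3"
  if subsystem_connected "$nqn"; then
    echo "already connected: $nqn"
  else
    sudo nvme connect -t tcp -n "$nqn" -a "$addr" -s "$port"
  fi
}

# Example usage:
#   ensure_connected <NVME_SUBSYS_NAME> <TARGET_IP> <TARGET_PORT>
```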
- If the issue persists, gather kernel logs and provide them to the simplyblock support team:
Collect logs for support
sudo dmesg | grep -i nvme
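To hand the logs over, it is usually easier to write them to a file than to copy them from the terminal. A minimal sketch, assuming a hypothetical helper name `collect_nvme_logs` and an arbitrary default file name; `dmesg -T` prints human-readable timestamps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save NVMe-related kernel messages to a file and print its path.
collect_nvme_logs() {
  local out="${1:-nvme-kernel-$(hostname)-$(date +%Y%m%d-%H%M%S).log}"
  # grep exits non-zero when nothing matches; an empty file is still a valid result.
  sudo dmesg -T | grep -i nvme > "$out" || true
  echo "$out"
}

# Example usage:
#   collect_nvme_logs            # writes to the default file name
#   collect_nvme_logs /tmp/nvme.log
```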