Maintenance
It is possible to retire, exchange, and add individual devices as well as entire storage nodes (servers) while a cluster is running. It is also possible to upgrade a server’s hardware, firmware, or operating system within a running cluster, or to upgrade the storage node software itself, without any downtime.
The following sections explain the administrative functionality used to support maintenance operations via the CLI. Equivalent functionality is available via the API.
Remove a Storage Device (SSD)
A storage device can be hot-removed for physical examination or to retire it. Before doing so, it is recommended to soft-remove the device via the API or CLI.
This step excludes the device from cluster operations before it is physically removed:
sbcli sn list-devices UUID
sbcli sn remove-device UUID
After this step, the device can be physically removed.
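As an illustration, a minimal soft-remove sequence might look as follows. The UUIDs are placeholders, and the assumption is that list-devices takes the UUID of the storage node, while remove-device takes the UUID of the individual device shown in its output:
sbcli sn list-devices <storage-node-uuid>
sbcli sn remove-device <device-uuid>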
Devices should not be removed from a cluster for an extended period of time without _failing_ them.
Total cluster performance is reduced while a device is missing, because data from the missing device has to be recovered. In a cluster with n devices, expect the total IOPS output to degrade by at least about 1/n; in a cluster with 20 devices, for example, a single missing device costs roughly 5% of total IOPS. Depending on the specific erasure coding schema used, the impact can be worse.
If a device is broken or in bad health, it should be failed instead of removed (see the next section).
If a removed device is re-inserted into the cluster, it is auto-recognized. Note that the device must be re-inserted into the same storage node. To re-activate a removed device after physical insertion, use:
sbcli sn list-devices UUID
sbcli sn restart-device UUID
Once a removed device has been re-inserted and restarted, a data migration task is submitted to each storage node, causing some data to be migrated back to the re-inserted device. This is necessary to rebalance the cluster. Migration tasks are executed strictly sequentially on a per-node basis.
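As a sketch, re-activating a re-inserted device and following the resulting rebalancing could look like this (again with placeholder UUIDs, assuming list-devices takes the storage node UUID and restart-device the device UUID):
sbcli sn list-devices <storage-node-uuid>
sbcli sn restart-device <device-uuid>
sbcli cluster list-tasks
The list-tasks command (described further below) can be used to follow the migration task created for the re-inserted device.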
Fail a Storage Device (SSD)
A device should be failed if it reaches its end-of-life or if its health otherwise deteriorates. SMART device health checks are not yet automated (planned for a future release), but they can be performed manually using nvme-cli on the node to which the device is attached:
nvme smart-log /dev/nvme0n1
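For a quick health check, the most relevant fields of the SMART log can be filtered from the output, for example (field names as printed by nvme-cli):
nvme smart-log /dev/nvme0n1 | grep -Ei 'critical_warning|percentage_used|media_errors|num_err_log_entries'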
A device that shows recurring IO errors on read and/or write should also be replaced. Per-device IO errors can be seen and filtered in the cluster log:
sbcli cluster get-logs UUID
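The exact log format is release dependent, but as a simple sketch the relevant entries can be filtered with standard shell tools, for example:
sbcli cluster get-logs <cluster-uuid> | grep -i 'error'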
To fail a device use the following command:
sbcli sn fail-device UUID
Once a device has transitioned into the failed state, a background migration task is started and the data stored on the device is redistributed across the remaining cluster, so that the cluster is fully rebalanced after the migration completes. It is critical that enough free capacity is available in the cluster to perform the migration successfully. The remaining free capacity must be larger than the utilized capacity of the failed device; otherwise the cluster will run full and all logical volumes will be switched into read-only mode! If needed, add a fresh device or node to the cluster before failing existing ones.
Depending on the size of the device, this can take some time. It is possible to list all current migration tasks of a cluster together with their completion (or failure) state; failed tasks are retried automatically. Migration tasks for multiple devices or of different types are always serialized per node: only one migration task is active at any point in time on each storage node.
sbcli cluster list-tasks
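To follow the progress of the rebalancing until all tasks have completed, the task list can simply be polled periodically, for example:
watch -n 60 sbcli cluster list-tasks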
A failed device will no longer be auto-recognized as an existing device with data on it. However, it can be recognized as a new (empty) device on node restart and can then be added back to the cluster (see below).
Adding a New Device to the Cluster
Once devices are inserted into an existing storage node that is part of the cluster, they are auto-recognized and placed into the NEW state. At this point they are not yet part of the cluster. To add them to the cluster, run:
sbcli sn list-devices UUID
sbcli sn add-device UUID
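A hypothetical add sequence, assuming that list-devices takes the storage node UUID and add-device the UUID of the new device shown in its output:
sbcli sn list-devices <storage-node-uuid>
sbcli sn add-device <new-device-uuid>
sbcli cluster list-tasks
As with a re-inserted device, a data migration task is expected to rebalance some data onto the new device; it can be followed with list-tasks.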
Auto-recognition happens on every node restart or whenever a device scan is run. To run a device scan, use:
TBD !!!
Adding a New Storage Node to the Cluster
It is also possible to add an entire storage node to an active cluster at once. In this case, the data migration service adds each device detected in the node to the cluster one by one (for each device, a data migration task is created, and tasks are executed serially per node). Adding a node during cluster operation uses the same command as the initial node deployment to an empty cluster:
sbcli sn add UUID --max-lvol MAXLVOL --max-snap MAXSNAP --max-prov MAXPROV
Specify the maximum number of logical volumes to be provisioned on that node, the maximum number of active snapshots allowed, and the maximum amount of storage to be provisioned in GB. The upper limits for these parameters depend on the RAM available in the system. If not enough RAM is available, adding the node fails immediately.
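A hypothetical invocation with illustrative limits (100 logical volumes, 1,000 snapshots, 8,000 GB of provisioned storage) might look as follows; the actual values must be sized against the RAM available on the node:
sbcli sn add <storage-node-uuid> --max-lvol 100 --max-snap 1000 --max-prov 8000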
Stopping a Storage Node for Maintenance Work
Storage nodes can be placed into maintenance mode to perform upgrades of hardware, firmware, the operating system, or the storage node software itself. To stop a storage node, use:
sbcli sn suspend UUID
sbcli sn shutdown UUID
If a particular storage node hangs during shutdown or restart, you may use the --force flag:
sbcli sn shutdown UUID --force
Shutting down a storage node degrades cluster redundancy and performance. Therefore, the planned outage (maintenance window) should be kept as short as possible.
If you are expecting a long outage, for example while waiting for hardware to arrive, it is better to remove the node from the cluster entirely (see below) and add it back as a new node once it is ready.
In a cluster with n nodes, expect the total IOPS output to degrade by about 1/n while one node is down. For example, in a cluster with 5 nodes and a total IOPS output of 1,000,000, performance may degrade by 20% to a maximum of 800,000 IOPS.
To restart a node after maintenance work, use:
sbcli sn restart UUID --max-lvol MAXLVOL --max-snap MAXSNAP --max-prov MAXPROV
Optionally, specify the maximum number of logical volumes to be provisioned on that node, the maximum number of active snapshots allowed, and the maximum amount of storage provisioned in GB, if these parameters have changed since the node was added, for example due to a hardware upgrade (RAM expansion).
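Putting the steps together, a complete maintenance cycle for a single node might look like this (the UUID and limits are placeholders, and the limit flags are only needed if the values have changed):
sbcli sn suspend <storage-node-uuid>
sbcli sn shutdown <storage-node-uuid>
# perform the hardware, firmware, or operating system maintenance work
sbcli sn restart <storage-node-uuid> --max-lvol 100 --max-snap 1000 --max-prov 8000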
Removing a Storage Node from a Cluster
It is possible to remove a node from a cluster. When a node is removed, all SSDs attached to the node are automatically failed and their data is redistributed into the remaining cluster one device at a time. For this purpose, migration tasks are created and executed serially on each of the remaining storage nodes in the cluster:
sbcli sn remove UUID
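A sketch of the removal workflow, again with a placeholder UUID; the resulting per-device migration tasks can be tracked with list-tasks:
sbcli sn remove <storage-node-uuid>
sbcli cluster list-tasks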