Storage Pools and Volumes

Working with Pools

Pools are logical constructs that group logical volumes (lvols) and limit them in terms of total and per-lvol provisionable capacity, as well as total IOPS and throughput (read and write speeds in MB/s).

ℹ️

To create logical volumes and to connect to storage clusters from Kubernetes environments, at least one pool is required. One pool is linked to each Kubernetes storage class.

Different storage classes can specify different pools or share the same pool, but each class can specify only one pool (see the sketch below).
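
A minimal sketch of such a storage class, assuming the simplyblock CSI driver is installed. The provisioner name and the pool_name parameter key are assumptions here and should be verified against the CSI driver documentation for your version:

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: simplyblock-pool01       # one storage class references exactly one pool
provisioner: csi.simplyblock.io  # assumed provisioner name
parameters:
  pool_name: pool01              # assumed parameter key linking the class to the pool
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF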

Adding a Pool

Create a pool with 1,000 GB of total provisioning capacity (the total provisioned size of all lvols in the pool may not exceed this capacity):

sbcli pool add --pool-max 1000G pool01

To list pools, use:

sbcli pool list

Create a pool with assigned quotas:

sbcli pool add --pool-max 2000G --lvol-max 200G --max-iops 250000 --max-r-mbytes 1000 --max-w-mbytes 1000 pool02
ℹ️
The quotas can only be applied at the time of provisioning. The sum of the provisioning quotas of all logical volumes in the pool may not exceed the pool's provisioning quotas.
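
For illustration: with pool02 above, each lvol may be provisioned with at most 200G, and the provisioned sizes of all lvols together may not exceed 2000G. At most ten lvols of the maximum size therefore fit into the pool; an eleventh 200G lvol, or any single lvol above 200G, would be rejected.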

Changing Pool Quotas

It is possible to change pool quotas later on. The change only impacts the provisioning of additional volumes, not already provisioned ones. It is not possible to lower the quotas below the quotas of the currently provisioned lvols.

sbcli pool set --pool-max 500G --lvol-max 100G --max-iops 100000 --max-r-mbytes 500 --max-w-mbytes 500 8ce9b324-d3dc-488b-ad90-e88ec7e05ca3

Disabling a Pool

It is not possible to provision logical volumes into disabled pools. Existing lvols in disabled pools continue functioning. Example:

sbcli pool disable 8ce9b324-d3dc-488b-ad90-e88ec7e05ca3

Deleting a Pool

To delete a provisioned pool, use pool delete, for example:

sbcli pool delete 8ce9b324-d3dc-488b-ad90-e88ec7e05ca3
ℹ️
It is only possible to delete empty pools (no lvols or snapshots).

Working with Volumes

Introduction

Logical volumes are the provisioning entities of block storage. They are exposed as NVMe-oF volumes to hosts. The lvol CLI is used to manually provision and manage host (server) storage. Storage provisioning and lifecycle management may alternatively be automated via the API. Simplyblock uses its own API to provide storage lifecycle automation for Kubernetes environments.

Volumes are provisioned within the provisioning capacity and QoS limits of a pool, unless the pool is set up without such limits.

Managing logical volumes can be performed either via the CLI (from within the management container) or via the API.

Depending on its ha-type, a logical volume is either accessible via a single storage node or supports multi-pathing (accessible via three storage nodes).

⚠️
The underlying data of each lvol is always equally distributed across all cluster nodes. If a node to which a single-mode lvol is attached goes offline, the volume can be moved instantly to another node (see sbcli lvol move below). This transition causes a short interruption of IO (disconnect and reconnect), but does not impact data.

Provisioning a Volume

Volumes are provisioned with a number of parameters and options:

  • name: Unique user-defined name for the logical volume (lvol)
  • pool: The pool in which the lvol will be created
  • size: The thin-provisioned capacity, e.g. 2500M or 250G. The volume does not use or reserve this capacity immediately, but is allowed to allocate this space over time.
⚠️
The following parameters can have a significant impact on the degree of data protection and on performance. They should be chosen based on application requirements (performance, data protection) and cluster size (number of devices, number of nodes). For more information, see Redundancy. An example combining these parameters follows this list.
  • distr-ndcs: Stripe size, i.e. the number of data chunks in the stripe. Usually 1, 2, 4, 8 or 16. The value chosen has an important impact on data storage efficiency (raw-to-effective data ratio), performance, and cluster sizing requirements (number of devices in the cluster, number of nodes).

  • distr-npcs: The number of parity chunks in the stripe. This can be set to 0, 1 or 2. A value of 2 protects the data of the lvol and its availability against two concurrent device or node failures. Full node failures can only be compensated for if there is a sufficient number of nodes in the system: to compensate for a single node failure, a minimum of 3 nodes is required; to compensate for a concurrent or overlapping two-node failure, at least 4 nodes are required.

  • distr-bs : Block size of the logical volume - can be 4096 or 512 (default: 4096)

  • distr-chunk-bs: Chunk size of the logical volume - can be 512, 4096, or a multiple thereof; it should be either a multiple or a divisor of the block size

  • ha-type: single or ha.

  • encrypt: Encryption of individual logical volumes with a separate private key (not shared between volumes). The key is cached in memory on storage nodes, but not stored on NVMe disks; rather, it is stored in encrypted form in the key-value store in the management containers. If this option is chosen, it is necessary to specify the key in two parts using --crypto-key1 and --crypto-key2.
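
A sketch combining the parameters above into a single provisioning call, creating an encrypted, highly available lvol with a 4+2 erasure-coding scheme. The key values are placeholders, and the exact flag spellings should be verified against your sbcli version:

sbcli lvol add --distr-ndcs 4 --distr-npcs 2 --distr-bs 4096 --distr-chunk-bs 4096 \
      --ha-type ha --encrypt --crypto-key1 <key-part-1> --crypto-key2 <key-part-2> \
      500G mylvol_ec pool01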

The following parameters are mandatory if they are set at the pool level. They limit the performance of the lvol to avoid negative effects on the cluster:

  • max-rw-iops: Maximum read/write IOPS for the lvol. The limit can be set with a granularity of 200 IOPS.
  • max-r-mbytes: Maximum read throughput in MB/s for the lvol.
  • max-w-mbytes: Maximum write throughput in MB/s for the lvol.
  • max-rw-mbytes: Maximum read and write throughput in MB/s for the lvol.

Please note that each primary and secondary lvol instance requires a certain amount of memory. This amount consists of a fixed and a variable part; the variable part depends on the utilized (not provisioned!) size of the lvol. When adding a node, users must specify the maximum number of lvols that can be provisioned to that node. At node startup, the system checks whether enough memory is available to create and fully utilize the specified maximum number of lvols on the node. If this is not the case, the startup will fail. However, as this check is based on some assumptions and memory could also be allocated outside of simplyblock, out-of-memory situations on adding or using lvols may occur if insufficient reserve is planned for. For calculations, please see Deployment Requirements.

Example:

sbcli lvol add --max-rw-iops 25000 --max-r-mbytes 3500 --max-w-mbytes 1500 \
      100G mylvol_01 ksi_pool_01 

To list existing volumes with their parameters, use:

sbcli lvol list 

To get all details of a volume, use:

sbcli lvol get UUID 

To get the capacity of a volume, use:

sbcli lvol get-capacity UUID 

To get the io statistics of a volume, use:

sbcli lvol get-io-stats 1e040dc6-188c-4864-b67a-19ac2b424bb7

Out of the box, a scheduler selects the storage node from which the logical volume will be served (and two additional nodes for fault tolerance in case of ha-type ha). It is possible to instantly relocate a lvol between storage nodes later on.

⚠️

Logical volumes will be automatically published to the NVMe/TCP fabric and can be immediately connected from any remote host if the NVMe over TCP port (4420 by default) is reachable.

Connected lvols are automatically placed into the online state. They are switched into the offline state if they are of type single and the node to which they are attached goes offline, or if a general IO error for the lvol is reported back (this is usually only the case if the node got disconnected from the storage network or the whole cluster became dysfunctional).

The following command creates an NVMe-oF connect string for Linux, which can be copy-pasted and used on the host to remotely connect the lvol:

sbcli lvol connect d38199f5-68f5-429b-abc6-6d56bdb09e2f
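
The generated string is a standard nvme-cli invocation. For illustration, it has roughly the following shape; the address and subsystem NQN below are placeholders and must be taken from the output of sbcli lvol connect:

sudo nvme connect --transport tcp --traddr <storage-node-ip> --trsvcid 4420 \
      --nqn <subsystem-nqn-from-connect-string>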

For connecting NVMe/TCP volumes to VMware, see the VMware documentation.

Logical volumes can be resized (but not below the currently allocated space):

sbcli lvol resize ec7701e9-4fe9-49a9-8d8c-80152bb98c8f 50G

Changes to QoS parameters can be applied (quotas can only be increased in accordance with the remaining free quotas on the pool):

sbcli lvol set-qos --max-rw-iops 15000 ec7701e9-4fe9-49a9-8d8c-80152bb98c8f

To migrate a volume from one storage node to another, use:

sbcli lvol move UUID target-node-UUID

Logical volumes can finally be deleted:

sbcli lvol delete ec7701e9-4fe9-49a9-8d8c-80152bb98c8f
⚠️
If you delete a lvol, all data will be gone without means of recovery. If the lvol has active snapshots, it will only be soft-deleted, though. The storage allocation of the volume is not released until all active snapshots have been deleted.
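
To fully release the allocation, the lvol's snapshots must be deleted as well. A sketch of the full cleanup, with placeholder UUIDs:

sbcli snapshot list ec7701e9-4fe9-49a9-8d8c-80152bb98c8f
sbcli snapshot delete 01235625-7172-4174-a8db-af570af67541
sbcli lvol delete ec7701e9-4fe9-49a9-8d8c-80152bb98c8f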

Working with Snapshots

Snapshots are read-only point-in-time copies of a lvol. They are taken instantly.

⚠️

Snapshots do not require additional disk space when taken, but each write to the base lvol after the snapshot was taken will result in additional demand for disk space. The total allocated size of a lvol and all of its snapshots must fit into the provisioned size of the lvol!

When a snapshot is created, it allocates the identical size as the current base lvol, while the allocated size of the base lvol is reset to 0. If snapshotted data is overwritten on the base lvol, its allocated size starts to grow again. The allocated size of the snapshot remains constant. The worked example below illustrates this accounting.
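
For illustration: assume a lvol provisioned at 100G with 40G of allocated data. Taking a snapshot moves the 40G allocation to the snapshot and resets the base lvol's allocation to 0G. Overwriting 10G of the snapshotted data grows the base lvol back to 10G while the snapshot stays at 40G; the combined 50G of allocation must still fit into the 100G provisioned size.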

Active snapshots also consume some main memory and CPU, so the number of active snapshots per lvol can be limited when a storage node is added or restarted. The system will not allow taking snapshots once this limit is reached for a particular lvol or when the system runs too low on main memory.

It is possible to list the snapshots of a logical volume, to delete them (only if there are no active clones and it is the last available snapshot that was taken), to restore them (set the logical volume back to the data state of the snapshot), and to take clones of them.

Create a snapshot:

sbcli lvol create-snapshot ec7701e9-4fe9-49a9-8d8c-80152bb98c8f snap-23-06-05.134455

List all snapshots:

sbcli snapshot list ec7701e9-4fe9-49a9-8d8c-80152bb98c8f 

Deleting a snapshot is almost instant for the latest snapshot only. Otherwise, this operation may require merging the predecessor of the snapshot with its successor, which can take time and disk resources:

sbcli snapshot delete 01235625-7172-4174-a8db-af570af67541

Working with Clones

Clones are logical volumes that were created as copy-on-write copies of existing snapshots.

At creation time, they have zero allocated size and grow only once new data is written to them (they start to deviate from the snapshot they were taken from).

They inherit all parameter options from their parent snapshot (indirectly from the volume the snapshot was taken of):

sbcli snapshot create-clone 01235625-7172-4174-a8db-af570af67541 clone_001_2306051433

Clones are otherwise treated as regular logical volumes. Commands to list logical volumes, to resize them, or to change QoS parameters include and work with clones. The provisioned size of a clone must fit into the remaining free provisioning capacity of the pool. Otherwise, the creation fails.

Deleting clones is only possible once all snapshots taken from that clone have been deleted.
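
As an illustrative sequence (UUIDs are placeholders), the clone's snapshots are deleted first, then the clone itself:

sbcli snapshot delete 01235625-7172-4174-a8db-af570af67541
sbcli lvol delete 1e040dc6-188c-4864-b67a-19ac2b424bb7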

It is also possible to inflate clones. This creates a copy of the logical volume, which is fully independent of the originating snapshot / volume. This also duplicates the actual space allocation:

sbcli lvol inflate-clone 1e040dc6-188c-4864-b67a-19ac2b424bb7