Deployment
A simplyblock deployment consists of three dependent parts to be deployed in the following order:
- Deploy the control plane (usually deployed once per organization or region; multiple storage clusters can connect to it) - the control plane can also be hosted by simplyblock
- Deploy disaggregated storage clusters (multiple k8s clusters and many hosts can be connected to one control plane)
- Deploy the CSI driver and (optionally) Caching Nodes and co-located storage nodes per k8s cluster
Simplyblock provides CLI functions to deploy the control plane and create one or more cluster objects. Once the control plane is deployed and a cluster object has been created, CLI and API functions can be used to deploy disaggregated storage nodes into the cluster and connect them to the control plane.
To deploy storage clusters directly into Kubernetes environments, a Helm chart is available. It interacts with the control plane API and deploys the co-located storage nodes, optional caching nodes, and the CSI driver.
In addition to the deployment functions, an auto-deployer is available. It can deploy entire environments from scratch (compute instances, virtual networking, control plane nodes, disaggregated storage cluster nodes, a k3s test cluster) using Terraform and is currently tested on AWS and GCP. The auto-deployer can deploy either the control plane, a storage cluster, or both: simplyblock auto-deployer.
Manual Control Plane Deployment
The control plane can be deployed on one or three nodes. Currently, these nodes must run RHEL 9 or the Rocky 9 equivalent. The following services run on those nodes:
Control Plane Services:
- fdb-server
- StorageNodeMonitor
- MgmtNodeMonitor
- CachingNodeMonitor
- LVolStatsCollector
- CachedLVolStatsCollector
- PortStatsCollector
- HAProxy
- CapacityAndStatsCollector
- CapacityMonitor
- HealthCheck
- DeviceMonitor
- LVolMonitor
- CleanupFDB
- TasksRunnerRestart
- TasksRunnerMigration
- TasksRunnerFailedMigration
- TasksRunnerNewDeviceMigration
- NewDeviceDiscovery
- WebAppAPI
- TasksNodeAddRunner
Monitoring Services:
- mongodb
- opensearch
- graylog
- promagent
- pushgateway
- grafana
- thanos
- node-exporter
Ensure the following network ports are open on the hosts and into or from the subnet (the subnet can only be shared if the control plane resides in the same AZ as the storage clusters):
For the load balancer:
Direction | Source or target network | Ports | Protocol |
---|---|---|---|
ingress | mgmt | 80 | tcp |
ingress | mgmt | 3000 | tcp |
ingress | mgmt | 9000 | tcp |
egress | all | all | all |
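As an illustration, these load-balancer rules could be applied in AWS with a security group (the security group ID and management CIDR below are placeholders; any equivalent firewall mechanism works the same way):

```bash
# Allow the management network (placeholder CIDR) to reach the load balancer
# on the API (80), Grafana (3000) and Graylog (9000) ports.
for PORT in 80 3000 9000; do
  aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port "$PORT" \
    --cidr 10.0.0.0/16
done
```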
For Management Nodes:
Service | Direction | Source or target network | Ports | Protocol |
---|---|---|---|---|
API (http/s) | ingress | load balancer | 80 | tcp |
SSH | ingress | mgmt | 22 | tcp |
Grafana | ingress | load balancer | 3000 | tcp |
Graylog | ingress | load balancer | 9000 | tcp |
EFS (AWS only) | ingress | internal subnet | 2049 | tcp |
Docker Swarm | ingress | storage clusters, internal subnet | 2375, 2377, 7946 | tcp |
Docker Swarm | ingress | storage clusters, internal subnet | 7946, 4789 | udp |
Graylog | ingress | storage clusters, internal subnet | 12201 | tcp |
Graylog | ingress | storage clusters, internal subnet | 12201 | udp |
fdb | ingress | storage clusters, internal subnet | 4800, 4500 | tcp |
All traffic | egress | 0.0.0.0/0 | all | all |
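On RHEL 9 / Rocky Linux 9 management nodes that use firewalld, the ingress rules above can be opened along the following lines (a sketch only; restrict sources to the management, internal, and storage-cluster subnets via zones or rich rules as appropriate):

```bash
# Management node ingress: API, SSH, Grafana, Graylog, EFS, Docker Swarm, Graylog GELF, fdb
sudo firewall-cmd --permanent --add-port={80,22,3000,9000,2049}/tcp
sudo firewall-cmd --permanent --add-port={2375,2377,7946}/tcp
sudo firewall-cmd --permanent --add-port={7946,4789}/udp
sudo firewall-cmd --permanent --add-port=12201/tcp --add-port=12201/udp
sudo firewall-cmd --permanent --add-port={4800,4500}/tcp
sudo firewall-cmd --reload
```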
Important: On AWS, the API, Grafana and Graylog are accessed via the load-balanced API gateway.
To deploy the control plane, install the first node:
sudo yum -y install python3-pip
pip install sbcli-release --upgrade
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sbcli cluster create
sbcli cluster list
The following important parameters may be used on cluster create:
Parameter | Value Range | Description |
---|---|---|
--distr-ndcs | 1,2,4,8 | Data chunks per stripe. The minimum number of devices in the cluster must be ndcs+2*npcs. A larger ndcs improves the raw/effective storage ratio (ratio: ndcs/(ndcs+npcs)), but reduces performance on small (4K) random writes. If ndcs>1, local node affinity does not work. |
--distr-npcs | 0,1,2 | npcs=0: only for ephemeral storage; data is lost if a single node or device is lost. npcs=1: tolerates one node failure at a time; to avoid data loss, the cluster stops writing if more than one node is down. npcs=2: tolerates up to two concurrent node failures; to avoid data loss, the cluster stops writing if more than two nodes are down. To achieve single HA, at least ndcs+2*npcs nodes are required in case of npcs=1, and at least ndcs/2+npcs nodes in case of npcs=2. To achieve dual HA in case of npcs=2, at least ndcs+2*npcs nodes are required. |
--enable-node-affinity | on/off | If enabled, the first data chunk of a stripe is always placed on the node to which the logical volume is local. This reduces network traffic on writes and avoids network traffic on reads. |
--log-del-interval | 1d-355d | Default: 7d. After the retention period, container logs are emptied (AWS: backed up to S3). If increased, the root disk size on the management nodes must be increased. |
--metrics_retention_period | 1d-355d | Default: 7d. After the retention period, hot statistics are removed from the internal database (affects API, CLI and Grafana dashboards). If increased, the root disk size on the management nodes must be increased. |
--cap-warn | percentage | Default: 92%. Total cluster storage utilization warning threshold; creates repeated entries in the cluster log and monitoring when reached. |
--cap-crit | percentage | Default: 98%. Total cluster storage utilization limit. When reached, the cluster stops writing; data must be deleted or the cluster expanded. |
--prov-cap-warn | percentage | Default: 150%. Creates repeated warnings in the cluster log when reached (provisioned capacity / total cluster size). |
--prov-cap-crit | percentage | Default: 200%. New lvol provisioning is stopped when reached. |
--ifname | Linux device name | Default: eth0. Default interface for data traffic on storage nodes. |
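For illustration, a cluster create call combining some of these parameters could look as follows (values are examples only; confirm the exact flag syntax, including how node affinity is switched on, with sbcli cluster create --help):

```bash
# Example only: 2 data chunks + 1 parity chunk per stripe, adjusted capacity
# thresholds, data interface eth0; see the parameter table above for ranges.
sbcli cluster create \
  --distr-ndcs 2 \
  --distr-npcs 1 \
  --cap-warn 90 \
  --cap-crit 95 \
  --ifname eth0
```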
To install the second and third control plane node (for an HA cluster):
sudo yum -y install python3-pip
pip install sbcli-release --upgrade
sbcli mgmt add <FIRST-MGMT-NODE-IP> <CLUSTER-UUID> eth0
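A concrete join sequence for the second node might look like this (the IP address and cluster UUID are placeholders; the mgmt list verification step assumes such a subcommand exists in the installed sbcli version):

```bash
# On the first management node: look up the cluster UUID
sbcli cluster list

# On the second (and third) node: install sbcli and join the existing control plane
sudo yum -y install python3-pip
pip install sbcli-release --upgrade
sbcli mgmt add 10.0.1.10 8ce9b324-d3dc-488b-ad90-e88ec7e05ca3 eth0

# On any management node: verify that all control plane nodes are listed (assumed subcommand)
sbcli mgmt list
```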
To verify the management domain is up and running, retrieve the cluster secret (sbcli cluster get-secret) and log into the following services using the external IP of one of the three nodes. To log in to Graylog, Grafana and Prometheus, the username is admin and the password is the cluster secret. For the API, the cluster UUID and the secret are required:
Service | Port | User | Secret |
---|---|---|---|
API (http/s) | 443 | uuid* | cluster secret |
Grafana | 3000 | admin | cluster secret |
Grafana | 3000 | uuid** | cluster secret |
Graylog | 9000 | admin | cluster secret |
Graylog | 9000 | uuid** | cluster secret |
HA-Proxy | 8404 | - | - |

*UUID of the cluster created first
**UUID of any additional cluster created - a separate login / access rights per cluster can be used for multi-tenancy (different tenants use the control plane for their clusters)
All services are reachable via all (external) IPs of the management nodes.
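A minimal reachability check from a host with network access to the management nodes (the angle-bracket placeholders follow the convention used above):

```bash
# Retrieve the cluster secret; it is the admin password for Grafana and Graylog
sbcli cluster get-secret <CLUSTER-UUID>

# Confirm that the web UIs respond on one of the management node IPs
curl -sk -o /dev/null -w "%{http_code}\n" http://<MGMT-NODE-IP>:3000   # Grafana
curl -sk -o /dev/null -w "%{http_code}\n" http://<MGMT-NODE-IP>:9000   # Graylog
```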
Storage Plane (Cluster) Deployment
Storage nodes can be installed once the control plane is running. It is necessary to differentiate between a co-located deployment (storage nodes running on k8s workers) and a disaggregated deployment. Combinations are possible (some nodes in the cluster run on workers, others are disaggregated).
The following ports must be opened within a storage cluster subnet (*for disaggregated nodes only, **for k8s co-located nodes only):
Service | Direction | Source or target network | Ports | Protocol |
---|---|---|---|---|
Lvol Connect | ingress | compute hosts, storage cluster | 4420 | tcp |
SNode API | ingress | Management Nodes | 5000 | tcp |
SNode API | ingress | Management Nodes, storage cluster | - | ICMP |
SPDK Proxy | ingress | Management Nodes | 8080 | tcp |
Docker API | ingress | Management Nodes, storage cluster | 2375 | tcp |
Docker Swarm* | ingress | Management Nodes, storage cluster | 2377, 7946 | tcp |
Docker Swarm* | ingress | Management Nodes, storage cluster | 7946, 4789 | udp |
k8s node communication** | ingress | storage cluster | 10250 | tcp |
DNS resolution from worker nodes** | ingress | storage cluster | 53 | tcp |
UDP traffic on ephemeral ports** | ingress | storage cluster | 1065-65535 | udp |
SSH | ingress | mgmt | 22 | tcp |
All traffic | egress | 0.0.0.0/0 | all | all |
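On RHEL 9 / Rocky Linux 9 storage nodes using firewalld, the corresponding ports can be opened along these lines (a sketch for a disaggregated node; co-located nodes additionally need the k8s, DNS, and ephemeral-port rules, and sources should be restricted to the subnets listed above):

```bash
# Storage node ingress: NVMe/TCP lvol connect, SNode API, SPDK proxy, Docker API/Swarm, SSH
sudo firewall-cmd --permanent --add-port={4420,5000,8080,2375,2377,7946,22}/tcp
sudo firewall-cmd --permanent --add-port={7946,4789}/udp
sudo firewall-cmd --reload
```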
Storage Nodes can be provisioned automatically in AWS against an existing control plane within the same account using the provided API Service.
Storage nodes currently require RHEL 9 or Rocky 9. The storage node installation consists of two parts. First, storage nodes are prepared for installation:
sudo yum -y install python3-pip
pip install sbcli-release --upgrade
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sbcli sn deploy
Note the IP address and port on which the storage node (sn) is listening. Then, from one of the management nodes, add the storage nodes to the cluster via the CLI:
sbcli sn add
Description of parameters:
Parameter | Description | Example |
---|---|---|
cluster_id | Cluster UUID | 8ce9b324-d3dc-488b-ad90-e88ec7e05ca3 |
node_ip | Storage node IP and mgmt port (SNodeAPI listener) | 172.168.0.5:5000 |
ifname | Network interface to use for mgmt and data | eth0 |
--number-of-distribs | Default: 4. Number of worker threads for the node, depends on node size (recommended: 4-6) | 4 |
--max-prov | Maximum amount of GB to be provisioned via that node | 5000 |
--max-lvol | Maximum number of lvols per node (affects memory demand) | 100 |
--max-snap | Default: 500. Maximum number of snapshots per node | 1000 |
--cpu-mask | Default: all cores except core 0. Core mask used to start the storage node | 0xFE |
--partitions | Default: 1. Partitioning of devices for performance reasons; change only for large disks | 2 |
--data-nics | If the storage network is on separate virtual or physical ports from the mgmt network, list them here | eth1, eth2 |
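Putting the parameters together, adding a prepared storage node could look roughly like this (the values reuse the examples from the table above; whether cluster_id, node_ip, and ifname are positional arguments or named flags should be confirmed with sbcli sn add --help):

```bash
# Run on a management node. Leading values: cluster_id, node_ip (SNodeAPI listener), ifname.
sbcli sn add 8ce9b324-d3dc-488b-ad90-e88ec7e05ca3 172.168.0.5:5000 eth0 \
  --number-of-distribs 4 \
  --max-prov 5000 \
  --max-lvol 100 \
  --cpu-mask 0xFE
```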
The huge page memory demand of a storage node depends on max-prov, max-lvol and max-snap and is calculated when the node is added. It is recommended to reserve a reasonable amount of system memory as huge page memory early after starting or rebooting an instance hosting a storage node, because at the time the storage node is added, it may not be possible to claim enough huge page memory even if enough system memory is available. This is due to system-internal memory fragmentation. Use the following command to reserve 6 GB of system memory for huge pages, which is the recommended minimum:
sudo sysctl -w vm.nr_hugepages=3072
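To make the reservation persistent across reboots, the value can also be written to a sysctl drop-in file (a standard Linux mechanism; the file name below is arbitrary):

```bash
# 3072 x 2 MiB huge pages = 6 GiB, applied now and on every boot
sudo sysctl -w vm.nr_hugepages=3072
echo "vm.nr_hugepages=3072" | sudo tee /etc/sysctl.d/90-hugepages.conf
# Verify the reservation
grep HugePages_Total /proc/meminfo
```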
See the memory requirements documentation for details.

CPU Mask
Currently, the following numbers of cores in the mask are supported per node: 4, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23. If more cores are assigned in the core mask, the next lower number from this list is used.
Remarks:
- Avoid assigning core 0 to simplyblock; it should remain with the operating system.
- If the host or VM is used exclusively as a storage node (disaggregated model), all cores but core 0 can be assigned to the simplyblock storage node.
- On x86, it is recommended to turn off hyper-threading (pair hyper-threads into physical cores on the Linux operating system) before starting a storage node.
Example of core mask (assign 5 cores, leave out core 0): 0x3E (or 111110)
If it is not feasible to turn off hyper-threading because simplyblock runs co-located with other pods on Kubernetes workers, it is recommended to define the core mask so that hyper-threads belonging to the same physical core are all included in the simplyblock core mask. For example, if the system provides 24 hyper-threads (12 physical cores) and 7 hyper-threads are to be assigned to simplyblock, a mask such as the following can be used:
0111 1000 0000 1110 0000 0000 (0x780E00)
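To check which hyper-threads share a physical core before building such a mask, the CPU topology can be inspected with standard Linux tooling (illustrative only, not a simplyblock command):

```bash
# Hyper-threads that share a CORE value are siblings and should be kept together in the mask
lscpu --extended=CPU,CORE,SOCKET,ONLINE

# Alternatively, show the sibling(s) of a specific hyper-thread, e.g. CPU 9
cat /sys/devices/system/cpu/cpu9/topology/thread_siblings_list
```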
Simplyblock on Linux
To connect storage volumes to Linux hosts, no specific drivers or admin software is required. NVMe over TCP is available on all current Linux distributions.
For exact pre-requisites and a compatibility matrix, see “Using simplyblock on Linux Hosts”.
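For illustration, a logical volume could also be attached manually with nvme-cli over the documented NVMe/TCP port 4420 (normally the CSI driver or simplyblock tooling handles this; the IP address and subsystem NQN below are placeholders):

```bash
# Load the NVMe/TCP initiator and install the CLI tooling if missing
sudo modprobe nvme_tcp
sudo yum -y install nvme-cli

# Discover subsystems exposed by a storage node (placeholder address)
sudo nvme discover --transport=tcp --traddr=192.168.10.15 --trsvcid=4420

# Connect to a discovered subsystem (placeholder NQN)
sudo nvme connect --transport=tcp --traddr=192.168.10.15 --trsvcid=4420 \
  --nqn=nqn.2023-02.io.simplyblock:example-lvol
```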
Simplyblock on Windows Server and VMware
NVMe over TCP will be natively supported on Windows Server 2025/vNext. For earlier versions of Windows Server, StarWind provides the necessary initiator implementation.
VMware supports NVMe over TCP in vSphere 8.0 or later. For more information, see the VMware documentation.
CSI Driver and Caching Node Deployment
The repository with the CSI driver and its documentation can be found at https://github.com/simplyblock-io/spdk-csi/tree/master/docs.
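As a sketch of what a Helm-based CSI driver installation could look like (the chart repository URL, chart name, and value keys below are assumptions for illustration; the authoritative steps and values are in the linked documentation):

```bash
# All names and values are hypothetical -- check the spdk-csi docs for the real ones
helm repo add spdk-csi https://install.simplyblock.io/helm
helm repo update

helm install simplyblock-csi spdk-csi/spdk-csi \
  --namespace simplyblock --create-namespace \
  --set csiConfig.simplybk.uuid=<CLUSTER-UUID> \
  --set csiConfig.simplybk.ip=<CONTROL-PLANE-API-ENDPOINT> \
  --set csiSecret.simplybk.secret=<CLUSTER-SECRET>
```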