Simplyblock Operations Manual

Introduction

Simplyblock is a high-performance, distributed storage orchestration layer built for cloud-native environments. It supplies NVMe over TCP (NVMe/TCP) block storage to hosts, and file and block storage to containers via the Simplyblock CSI driver.

Simplyblock runs on all hosts and platforms that support NVMe over TCP, including all major Linux distributions and VMware. No additional host-side drivers or other software need to be installed.

Simplyblock is also tightly integrated into the Kubernetes ecosystem: the Simplyblock CSI driver provides volume (PVC) life-cycle management with features such as provisioning, resizing, snapshotting and copy-on-write cloning, all of which are instant operations.

Distributed and Tiered Storage

The simplyblock storage interface provides access to storage via NVMe over TCP and file systems. The underlying data is sliced, protected and distributed across nodes: local instance storage (SSDs) is connected into storage clusters (hot tier), cloud block storage volumes form a warm tier (SSD), and reliable object storage or other capacity storage (high-density HDD pools) holds cold data, backups and asynchronous replicas.

Cloud Agnosticity

Simplyblock is designed and built as a cloud-agnostic product. It aims to run on the hyperscalers (the major clouds) as well as other cloud providers, including regional or specialized providers, bare-metal provisioners and private clouds, as well as virtualized and bare-metal Kubernetes environments. There are only a few cloud-specific integration points; they are implemented using a plug-in architecture and auto-detection of the specific cloud environment. Simplyblock has currently been tested on AWS with dedicated SLAs for performance (IOPS, bandwidth, access latency) and reliability (data protection, availability).

Containerization

Simplyblock is a fully containerized solution. It consists of

  • Storage nodes: A storage node is a container stack which provides distributed data services accessed via NVMe over TCP and connects to other storage nodes to form storage clusters.
  • Management nodes: A management node is a container stack which provides control and management services. A replicated set of management nodes is referred to as the control plane.

A storage cluster exports a highly performant, scalable (both in terms of capacity and performance) and reliable block storage pool. From this pool, thinly provisioned virtual block storage volumes are “carved out” and provisioned. They can be provisioned with different capacity and performance characteristics and are accessed from hosts, either locally or over the network, as NVMe over TCP volumes or namespaces.

Multiple storage clusters can be connected to a single control plane.

Platform Support

Storage and management nodes can be deployed on both VMs and bare metal. Storage nodes run on both x86 and ARM; the control plane requires x86 instances.

Deployment Options

Both storage nodes and management nodes can be installed on plain Linux instances, including RHEL 9, Rocky 9 and higher, and Amazon Linux 2 and higher. In this case, all prerequisites such as python3 and docker are auto-deployed as part of the automated node deployment.

Simplyblock offers a CLI, which can be bootstrapped using pip, to deploy individual storage and management nodes as well as entire clusters. Once the CLI is installed, a deploy function prepares the node (installs prerequisites and dependencies), while an add function adds and configures the respective container stack.

On top of this, a separate auto-deployer implements full-stack provisioning of simplyblock clusters based on Terraform and the management API.

Storage nodes can also be deployed onto Kubernetes workers using our helm chart. These workers have a minimum Linux kernel version requirement (5.1x), and the nvme-tcp kernel module must be loaded (modprobe nvme-tcp).

Storage node deployment to plain Linux is used for disaggregated storage, while deployment into Kubernetes is used to co-locate storage nodes with compute workloads.

Storage Cluster - Basics

A simplyblock storage cluster consists of at least three storage nodes and exports a performant, reliable, highly available and scalable shared data storage pool. Single- or dual-node outages and storage drive failures do not impact availability, and internal data migration ensures continuous re-balancing of the cluster in case of expansion, drive failures and node outages.

Individual virtual volumes can be (thinly) provisioned, de-provisioned, resized, snapshotted and cloned from this shared pool using the CLI or API. Resizing, snapshotting and cloning are instant operations. Virtual volumes can be organized into multiple virtual storage pools (tenants for hierarchical management) and receive quotas (IOPS, throughput, capacity).

QoS is a feature that classifies virtual volumes in terms of latency and IO priority.

Distributed Data Placement and Protection

Simplyblock is based on a statistical data placement algorithm, combined with a metadata journal, to distribute, locate, collect and re-distribute individual data segments.
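The core idea of statistical placement is that a segment's location can be computed from its identity, so any node can locate data without a central lookup table. The following is a minimal sketch of that idea; the node names and hash choice are illustrative assumptions, not simplyblock's actual algorithm:

```python
import hashlib

# Hypothetical node names for illustration only.
NODES = ["node-a", "node-b", "node-c"]

def place_segment(volume_id: str, segment_index: int, nodes: list) -> str:
    """Deterministically map a data segment to a node by hashing its identity.

    A sketch of statistical placement: the hash spreads segments
    statistically evenly across nodes, and the mapping is reproducible
    everywhere without coordination.
    """
    key = "{}:{}".format(volume_id, segment_index).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return nodes[digest % len(nodes)]

# Any node can recompute a segment's location without a central lookup:
assert place_segment("vol-1", 7, NODES) == place_segment("vol-1", 7, NODES)
```

In a real system the function would also account for placement policies, node weights and rebalancing after membership changes, which is where the metadata journal comes in.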

Simplyblock uses erasure coding to protect data. Erasure coding slices data segments into equally-sized chunks and calculates so-called parity chunks to create redundancy. Individual data and parity chunks are then distributed to different devices and nodes in the cluster using placement policies.

If data segments become unavailable due to a node outage or are lost because of a permanent drive failure, they can be rebuilt on the fly from the remaining data and parity chunks in the same stripe.

Simplyblock currently supports n+k erasure coding schemes with n = 1, 2, 4 or 8 and k = 0, 1 or 2, where n is the number of data chunks per stripe and k is the number of parity chunks. The RDP (row-diagonal parity) algorithm provides an efficient implementation of the k=2 case.
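For the k=1 case, the parity chunk is simply the XOR of the n data chunks, which makes single-failure rebuild easy to see. The sketch below illustrates encoding and on-the-fly rebuild for n=4, k=1; the k=2 (RDP) case adds a second, diagonal parity and is omitted here:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally-sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Compute the single XOR parity chunk over n equally-sized data chunks (k=1)."""
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_bytes(parity, c)
    return parity

def rebuild(surviving, parity):
    """Rebuild one lost data chunk from the surviving chunks and the parity."""
    lost = parity
    for c in surviving:
        lost = xor_bytes(lost, c)
    return lost

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]       # n = 4 data chunks
p = encode(data)                                   # k = 1 parity chunk
# Simulate losing chunk 2 (e.g. a drive failure) and rebuilding it:
recovered = rebuild([data[0], data[1], data[3]], p)
assert recovered == b"CCCC"
```

With a 4+1 scheme, the capacity overhead is 25% while any single chunk in a stripe can be lost without data loss.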

Performance Density and Scalability

Simplyblock combines very high performance density of storage (low access latency, high IOPS per TiB) with very high scalability.

Simplyblock can achieve access latency close to that of a local NVMe device (less than 100 µs), while a single CPU core can deliver up to about 200,000 IOPS.

At the same time, it is possible to add a nearly arbitrary number of storage nodes and NVMe devices while achieving nearly linear scalability of capacity and performance. This means that both access latency and IOPS/TiB remain nearly constant as a cluster grows.

In this sense, Simplyblock combines the high performance density of a high-end SAN system with the ultra-high scalability of software-defined storage systems such as Ceph (block, file, object) or Lustre (distributed file system).

Management Services

Simplyblock Management nodes provide both a CLI and an API to create and manage storage clusters, storage nodes, devices, logical volumes, snapshots and pools. All service functions are available via both CLI and API.

Kubernetes controllers monitor node activity and take action, e.g. restarting storage nodes. In addition, a number of pre-defined Kubernetes jobs wrap API service calls. The management nodes also provide storage node and cluster health monitoring and repair services, log management (Graylog), IO statistics, event management and alerting (Prometheus), and Grafana dashboards.

Multi-Attach

Multi-Attach is a feature which allows the same NVMe-oF volume to be attached to multiple instances. This can be very important for implementing high availability of database or application services efficiently. Instead of replicating data between a pair of HA instances, the same volume is attached to both of them. The secondary instance is only allowed to read from the volume while the primary instance is active. If the secondary instance takes over (leadership acquisition), it can start to write to the volume as well.
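The access rule described above (both instances read, only the current leader writes) can be sketched as follows. This is an illustrative model of the semantics, not simplyblock's implementation, and the instance names are hypothetical:

```python
class MultiAttachVolume:
    """Model of a multi-attached volume: any attached instance may read,
    but only the current leader (active instance) may write."""

    def __init__(self, leader: str):
        self.leader = leader
        self.blocks = {}          # lba -> data

    def read(self, instance: str, lba: int) -> bytes:
        # Reads are allowed from any attached instance.
        return self.blocks.get(lba, b"\x00")

    def write(self, instance: str, lba: int, data: bytes) -> None:
        # Writes from the non-active instance are rejected.
        if instance != self.leader:
            raise PermissionError(instance + " is not the active instance")
        self.blocks[lba] = data

    def acquire_leadership(self, instance: str) -> None:
        """Failover: the secondary takes over and may now write."""
        self.leader = instance
```

Because both instances see the same volume, failover requires only leadership acquisition, not a data copy.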

QoS

QoS is a feature which allows the prioritization of IO within an entire cluster. At the moment, QoS supports two priority classes to which a logical volume can be assigned: fast and standard. IO in the fast class is preferred, and as long as the amount of fast-class IO does not reach the performance saturation of the cluster, its access latency is close to the theoretical optimum.
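The strict two-class preference can be sketched as a scheduler that always drains the fast queue before the standard queue. This is a simplified illustration of the behavior, not simplyblock's actual IO scheduler:

```python
from collections import deque

class QosScheduler:
    """Two-class IO scheduler sketch: 'fast' IO is always dispatched
    before 'standard' IO."""

    def __init__(self):
        self.queues = {"fast": deque(), "standard": deque()}

    def submit(self, io, prio="standard"):
        self.queues[prio].append(io)

    def dispatch(self):
        # The fast class is preferred whenever it has pending IO.
        for prio in ("fast", "standard"):
            if self.queues[prio]:
                return self.queues[prio].popleft()
        return None
```

In practice, strict priority like this keeps fast-class latency near optimal only while fast-class load stays below cluster saturation, which matches the caveat above.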

Roadmap Features of Simplyblock Storage (October 2024)

While the current release of simplyblock, covered within this documentation, focuses entirely on clusters consisting of nodes with local NVMe instance storage (hot tier), simplyblock is designed to do significantly more.

Asynchronous Replication to cold storage

Object storage (S3 and so on) and HDD capacity storage are about 4 to 5 times cheaper per TB than hot storage.

Moreover, they are available with, or can be deployed using, advanced data protection. This includes very high data durability with cross-zone protection to survive zone-level disasters, as well as full delete and write protection over a defined period, a critical feature to protect data from cybersecurity attacks.

Simplyblock provides a crash-consistent, cluster-level asynchronous replication feature, which replicates the entire write stream or parts of it to a cold storage tier while leveraging a two-level cache (in-memory and NVMe) and massive parallelism to write at high sustained throughput. Stale data blocks are marked for removal and data is compacted at regular intervals to prevent infinite growth.

This feature allows for crash-consistent, point-in-time recovery of data after a loss or a cybersecurity attack (encryption event).
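The mark-stale-then-compact mechanism mentioned above can be sketched as an append-only write log: overwriting a block marks the older log entry stale, and periodic compaction drops stale entries so the replicated stream does not grow without bound. This is an illustrative model under those assumptions, not simplyblock's implementation:

```python
class ReplicationLog:
    """Sketch of an asynchronously replicated write stream with compaction.

    Each entry is (lba, data, stale). New writes are appended; an older
    entry for the same LBA is marked stale and removed at compaction time.
    """

    def __init__(self):
        self.log = []

    def append_write(self, lba: int, data: bytes) -> None:
        # Mark any previous live entry for this block as stale.
        for i, (l, d, stale) in enumerate(self.log):
            if l == lba and not stale:
                self.log[i] = (l, d, True)
        self.log.append((lba, data, False))

    def compact(self) -> None:
        """Drop stale entries at regular intervals to prevent infinite growth."""
        self.log = [e for e in self.log if not e[2]]
```

A real implementation would keep enough history to support point-in-time recovery before discarding stale blocks; this sketch only shows the growth-control mechanism.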

Storage Tiering

Storage tiering moves data segments between the hot tier and the cold tier based on data access frequency, using smart eviction to minimize the amount of hot storage required. This way, the demand for hot storage can often be reduced by up to 90%. In addition, data can also be written straight through to the cold tier based on retention and tiering policies.
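Frequency-based tiering can be sketched as a bounded hot tier that evicts its least-accessed segment to the cold tier when full. This is a deliberately simplified model of the eviction idea, not simplyblock's actual policy:

```python
class TieringCache:
    """Sketch of access-frequency tiering: the hot tier holds the most
    frequently accessed segments; the least-used segment is evicted to
    the cold tier when capacity is reached."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = {}        # segment id -> access count
        self.cold = set()    # segment ids currently in the cold tier

    def access(self, segment: str) -> str:
        if segment in self.hot:
            self.hot[segment] += 1
            return "hot-hit"
        # Segment is cold (or new): promote it to the hot tier.
        self.cold.discard(segment)
        if len(self.hot) >= self.hot_capacity:
            coldest = min(self.hot, key=self.hot.get)   # least-accessed segment
            del self.hot[coldest]
            self.cold.add(coldest)                      # demote to cold tier
        self.hot[segment] = 1
        return "promoted"
```

With skewed access patterns (a small working set receiving most IO), such a policy keeps the hot tier small relative to total capacity, which is what enables the large hot-storage savings described above.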

Cloud Block Storage Volume Pooling

Instead of using local instance storage, it will also be possible to pool cloud block storage volumes attached to different nodes in a storage cluster. On AWS EC2, these are typically gp2 and gp3 volumes.

This way, users who do not want to use local instance storage but prefer the scale-from-zero flexibility of cloud block storage will also benefit from:

  • thin provisioning of capacity, IOPS and throughput
  • instant resize, snapshots and volume cloning
  • volume multi-attach
  • increased scalability of the number of volumes per instance as well as the total capacity and performance of a single volume
  • increased volume reliability
  • storage tiering and asynchronous replication across zones