Skip to content

25.6

Simplyblock is happy to release the general availability release of Simplyblock 25.6.

New Features

Simplyblock strives to provide a strong product. Following is a list of the enhancements and features that made it into this release.

  • General: Renamed sbcli to sbctl. The old sbcli command is deprecated but still available as a fallback for scripts.
  • Storage Plane: Increased maximum of available logical volumes per storage node to 1,000.
  • Storage Plane: Added the option to start multiple storage nodes in parallel on the same host. This is useful for machines with multiple NUMA nodes and many CPU cores to increase scalability.
  • Storage Plane: Added NVMe multipathing independent high-availability between storage nodes to harden resilience against network issues and improve failover.
  • Storage Plane: Removed separate secondary storage nodes for failover. From now on, all storage nodes act as primary and secondary storage nodes.
  • Storage Plane: Added I/O redirection in case of failover to secondary to improve cluster stability and failover times.
  • Storage Plane: Added support for CPU Core Isolation to improve performance consistency. Core masks and core isolation are auto-applied on disaggregated setups.
  • Storage Plane: Added sbctl storage-node configure command to automate the configuration of new storage nodes. See Configure a Storage Node for more information.
  • Storage Plane: Added optimized algorithms for the 4+1 and 4+2 erasure coding configurations.
  • Storage Plane: Reimplemented the Quality of Service (QoS) subsystem with significant less overhead than the old one.
  • Storage Plane: Added support for namespaced logical volumes (experimental).
  • Storage Plane: Reimplemented the initialization of a new page to significantly improve the performance of first write to page issues.
  • Storage Plane: Added support for optional labels when using strict anti-affinity.
  • Storage Plane: Added support for node affinity in case of a device failure to try to recover onto another device on the host.
  • Proxmox: Added support for native Proxmox node migration.
  • Talos: Added support to deploy on Talos-based OS-images.
  • AWS: Added Bottlerocket support.
  • AWS: Added multipathing support for Amazon Linux 2, Amazon Linux 2023, Bottlerocket.
  • GCP: Added support for Google Compute Engine.

Fixes

  • Storage Plane: Fixed a critical issue on cluster expansion during rebalancing.
  • Storage Plane: Optimized internal journal compression of meta-data to use fewer CPU resources.
  • Storage Plane: Significantly improved the fail-back time in failover situations.
  • Storage Plane: Fixed a CRC checksum error that occurred in rare situations after a node outage.
  • Storage Plane: Fixed a conflict resolution issue with could lead to data corruption in failover scenarios.
  • Storage Plane: Fixed a segmentation fault after resizing multiple logical volumes in a fast sequence.
  • Storage Plane: Fixed data placement issues which could lead to unexpected I/O interruptions after a sequence of outages.
  • Storage Plane: Reduced huge pages consumption by about 1.5x. Huge pages are automatically recalculated on node restart.
  • Storage Plane: Fixed an RPC issue on clusters with eight or more storage nodes.
  • Storage Plane: Fixed a race condition on storage node restarts or cluster reactivations.
  • Storage Plane: Hardened a health-check issue which affected multiple services.
  • Storage Plane: Improved shared buffers calculation on large storage nodes.
  • Storage Plane: Fixed an issue in the metadata journal which could lead to a temporary conflict and I/O interruption on the fail-back of large logical volumes.
  • Storage Plane: Fixed an issue where a storage node would stay unhealthy after a cluster upgrade.
  • Storage Plane: Fixed an issue where the internal journal devices would fail to automatically reconnect after a cluster outage.
  • Storage Plane: Fixed an issue where a restart of one storage node could lead to a crash on another storage node.
  • Storage Plane: Fixed reattaching Amazon EBS volumes when migrating a storage node to a new host.
  • Control Plane: Improved error handling on the internal controller code base.
  • Control Plane: Fixed a range of false-positive detections that lead to unexpected storage node restarts or, in rare cases, cluster suspension.
  • Control Plane: Cleaned up the API from unnecessary calls and fixed smaller response content issues.
  • Control Plane: Improved handling of Greylog in case of primary management node failover.
  • Control Plane: Fixed multiple issues regarding spill-over and outages in case of a management node disk running full.
  • Control Plane: Fixed an issue where the generated logical volume connection string is missing the logical volume id in the NQN.
  • Control Plane: Fixed multiple build issues with ARM64 CPUs.
  • Control Plane: Fixed multiple issues when deleting logical volumes and snapshots which could lead to dangling garbage and inconsistencies.
  • Control Plane: Fixed a primary storage node restart bug.
  • Control Plane: Improved NVMe device detection by switching from serial number to PCIe address.
  • Control Plane: Fixed an issue, related to logical volume operations, where FoundationDB's memory consumption would continue to increase over time.
  • Control Plane: Fixed an issue where migration tasks would stale with status "mig error: 8, retrying".
  • Control Plane: Improved observability by polling thread information from SPDK and store it in Prometheus.
  • Control Plane: Improved the performance of parallel logical volume creations.
  • Control Plane: Fixed data unit of read_speed in the Grafana cluster dashboard.
  • Control Plane: Fixed an issue where the PromAgent image wouldn't be upgraded on a cluster upgrade.
  • Kubernetes: Fixed an issue where the CSI driver would hang if it tries to delete a snapshot in error state.
  • Kubernetes: Fixed hanging NVMe/TCP connections in the CSI driver on storage node restarts or failovers.
  • Kubernetes: Fixed an issue with failing volume snapshots.
  • Kubernetes: Improved the version pinning of required services.
  • Kubernetes: Fixed an issue where NVMe/TCP connections in multipathing setups would be disconnected in the wrong order.
  • Proxmox: Improved automatic reconnecting of volumes after storage node restarts and failovers.

Important Changes

  • Architecture: Separate secondary nodes have been removed as a concept. Instead, in a high-availability cluster, every deployed primary storage node also acts as a secondary storage node to another primary.
  • Storage Plane: NVMe devices are now identified by their serial number to enable PCIe renumbering in case of changes to the system configuration.
  • Firewall rules adjustment: An existing port range TCP/8080-8890 was changed to TCP/8080-8180. The firewall configuration and AWS Security Groups need to be adjusted accordingly.
  • Firewall rules adjustment: An existing port range TCP/9090-9900 was changed to TCP/9100-9200. The firewall configuration and AWS Security Groups need to be adjusted accordingly.
  • Firewall rules adjustment: A new port range TCP/9030-9059 was added. The firewall configuration and AWS Security Groups need to be adjusted accordingly.
  • Firewall rules adjustment: A new port range TCP/9060-9099 was added. The firewall configuration and AWS Security Groups need to be adjusted accordingly.
  • Firewall rules adjustment: An existing port TCP/4420 has been removed. The firewall configuration and AWS Security Groups need to be adjusted accordingly.
  • Image registry: The image registry moved from Amazon ECR to DockerHub.

Known Issues

Simplyblock always seeks to provide a stable and strong release. However, smaller known issues happen. Following up is a list of known issues for the current simplyblock release.

  • GCP: On GCP, multiple Local SSDs are connected as NVMe Namespace devices. There is a bug that prevents more than one Local SSD from being added to a storage node. For the time being, use one Local SSD per storage node. The storage node must be sized accordingly.