Asynchronous Replication
Simplyblock provides a snapshot-based asynchronous replication mechanism that replicates volumes at regular intervals. For each interval, simplyblock takes a copy-on-write snapshot on the source and replicates it to a volume in a storage pool on the target storage cluster.
For the architecture background, see Replication Concepts.
Scope and Prerequisites
- Asynchronous replication with automatic failover and controlled failback is a Kubernetes-only feature.
- It is managed by the Simplyblock Manager (Kubernetes operator) using the
SnapshotReplicationCRD. - Source and target storage clusters must be attached to the same simplyblock control plane.
- The two simplyblock clusters (source and target) must have network interconnectivity.
- Both clusters must be activated and have storage nodes online.
Note
For multi-site setups (for example, DR / offsite failover), using a distributed control plane is highly recommended. A typical setup is 2 management nodes on the main site and 3 management nodes on the failover site, so quorum / consensus can be maintained during a site failure.
Enabling Replication on Volumes
Replication participation is controlled on volumes by setting replicate: true.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: encrypted-volumes
provisioner: csi.simplyblock.io
parameters:
replicate: true
... other parameters
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Configuring Source and Target
Replication also requires a source/target cluster and target storage pool configuration via SnapshotReplication.
apiVersion: storage.simplyblock.io/v1alpha1
kind: SnapshotReplication
metadata:
name: simplyblock-snapshot-replication
namespace: simplyblock
spec:
sourceCluster: <SOURCE_SIMPLYBLOCK_CLUSTER>
targetCluster: <TARGET_SIMPLYBLOCK_CLUSTER>
targetPool: <TARGET_CLUSTER2_STORAGE_POOL>
interval: <REPLICATION_INTERVAL>
SnapshotReplication Spec Fields
| Field | Type | Description |
|---|---|---|
sourceCluster |
string | Source simplyblock cluster name. Required. |
targetCluster |
string | Target simplyblock cluster name. Required. |
targetPool |
string | Target storage pool for replicated volumes. Required. |
interval |
int | Interval for creating new replication snapshots. Required. |
timeout |
int | Per-task replication timeout. Optional. Defaults to 60 seconds (control plane default). |
action |
string | Lifecycle action. Use failback to trigger failback after source recovery. |
sourcePool |
string | Source storage pool, required for failback workflows. |
includeVolumeIDs |
[]string | Optional list of volumes to include in replication/failback. |
excludeVolumeIDs |
[]string | Optional list of volumes to exclude from replication/failback. |
Replication Cycle and Queue Behavior
At each configured interval:
- A copy-on-write snapshot is taken for the logical volume.
- A replication task is created and added to the replication queue.
- Queue tasks are processed one-by-one.
SnapshotReplication.spec.interval defines how often new snapshots are scheduled.
SnapshotReplication.spec.timeout limits the maximum runtime of a replication task. By default, the control plane
uses 60 seconds if not explicitly configured.
To avoid queue exhaustion from stacked tasks (for example when replication is slower than snapshot creation, or the
target cluster is unreachable), set timeout lower than or up to a maximum of approximately 1.5x the interval.
Monitoring Replication Status
The operator tracks replication progress per volume in the SnapshotReplication status.
kubectl get snapshotreplication \
simplyblock-snapshot-replication \
-n simplyblock -o yaml
Status includes per-volume replication details such as the last replicated snapshot, counters, and timestamps.
Failover
Failover to the target cluster happens automatically when the source cluster becomes unhealthy or unavailable.
- The source cluster status is
suspended, or - All storage nodes in the source cluster are
unreachable.
When these conditions are met, replicated volumes are switched to the target cluster and begin serving I/O automatically.
Warning
Data written on the source after the last successful replication snapshot is not available on the target. The data gap is at least the configured replication interval, plus any replication lag.
Failback
Failback is exclusively user-initiated. It is an active/manual process that must be triggered by the user through a
SnapshotReplication CRD with action: failback.
Important
Failback requires a minimal downtime window during the final synchronization/cutback phase. For production workloads, plan and schedule a maintenance window whenever needed.
The failback process:
- Create a snapshot on the target and replicate it back to the source.
- Wait for replication completion.
- Suspend target I/O to freeze the final delta.
- Replicate the remaining delta to the source.
- Remove the failover copy on the target.
- Resume serving I/O from the source.
apiVersion: storage.simplyblock.io/v1alpha1
kind: SnapshotReplication
metadata:
name: simplyblock-snap-replication-failback
namespace: simplyblock
spec:
sourceCluster: simplyblock-cluster
targetCluster: simplyblock-cluster2
targetPool: simplyblock-pool2
sourcePool: simplyblock-pool
action: failback
Note
The two-phase failback replication minimizes the I/O freeze window by transferring most data before the final short delta synchronization step.