NUMA Considerations
Modern multi-socket servers use a memory architecture called NUMA (Non-Uniform Memory Access). In a NUMA system, each CPU socket has its own local memory and I/O paths. Accessing local resources is faster than reaching across sockets to remote memory or devices. Simplyblock is fully NUMA-aware.
On a host with more than one socket, by default one or two storage nodes are deployed per socket.
Two storage nodes per socket are deployed if:

- more than 32 vCPUs (cores) are dedicated to simplyblock per NUMA socket
- more than 10 NVMe devices are connected to the NUMA socket
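The vCPU count per NUMA node can be verified as shown in the next section; the number of NVMe devices per NUMA socket can be read from sysfs. The following is a minimal sketch, assuming each NVMe controller exposes the standard numa_node attribute under /sys/class/nvme:

# Count NVMe controllers per NUMA node (-1 means no affinity was reported).
for ctrl in /sys/class/nvme/nvme*; do
    [ -e "$ctrl/device/numa_node" ] && cat "$ctrl/device/numa_node"
done | sort | uniq -c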
Users can change this behavior, either by setting the appropriate Helm chart parameters (in the case of a Kubernetes-based storage node deployment) or by manually modifying the initially created configuration file on the storage node (after running sbctl sn configure).
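As an illustration only, such an override might be passed at install time roughly as follows. The parameter name used here is a hypothetical placeholder; the authoritative names are defined in the simplyblock Helm chart documentation.

# Hypothetical example: 'storageNode.nodesPerSocket' is a placeholder, not
# a confirmed chart parameter -- consult the simplyblock Helm chart docs.
helm install simplyblock <chart> --set storageNode.nodesPerSocket=1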
It is critical for performance that all NVMe devices of a storage node are directly connected to the NUMA socket on which the storage node is deployed.
If a socket has no NVMe devices connected, it will not qualify to run a simplyblock storage node.
It is also important that the NIC(s) used by simplyblock for storage traffic are connected to the same NUMA socket. However, simplyblock does not auto-assign a NIC; users have to take care of this manually.
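To see which NUMA node a NIC is attached to, its numa_node attribute in sysfs can be read. A minimal sketch (virtual interfaces without a PCI device are skipped automatically):

for nic in /sys/class/net/*; do
    [ -e "$nic/device/numa_node" ] && echo "$(basename "$nic"): $(cat "$nic/device/numa_node")"
done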
Checking NUMA Configuration
Before configuring simplyblock, the system configuration should be checked for multiple NUMA nodes. This can be done using the lscpu tool.
lscpu | grep -i numa
root@demo:~# lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
In the example above, the system has two NUMA nodes.
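Alternatively, if the numactl package is installed, numactl --hardware prints the full topology, including the memory size and inter-node distances per NUMA node.

yum install numactl
numactl --hardware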
Recommendation
If the system consists of multiple NUMA nodes, it is recommended to configure simplyblock with multiple storage nodes per storage host. The number of storage nodes should match the number of NUMA nodes.
Ensuring NUMA-Aware Devices
For optimal performance, there should be a similar number of NVMe devices per NUMA node. Additionally, it is recommended to provide one Ethernet NIC per NUMA node.
To check the NUMA assignment of PCI-e devices, the lspci tool and a small script can be used.
yum install pciutils
#!/bin/bash
# List every PCIe device in /sys/class together with the NUMA node it is
# attached to, sorted by NUMA node.
for i in /sys/class/*/*/device; do
    # Resolve the sysfs symlink to obtain the PCI address of the device.
    pci=$(basename "$(readlink "$i")")
    if [ -e "$i/numa_node" ]; then
        echo "NUMA Node: $(cat "$i/numa_node") ($i): $(lspci -s "$pci")"
    fi
done | sort
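A numa_node value of -1 means the firmware did not report a NUMA affinity for the device. This is normal on single-socket systems; on multi-socket systems it usually points to missing BIOS/ACPI information and should be investigated before assigning devices to storage nodes.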