Benchmarking the AMD EPYC speed boost coming to Linux 5.18, thanks to a scheduler/NUMA improvement
Earlier this month I noted a Linux scheduler change queued in sched/core ahead of the Linux 5.18 cycle that should help AMD EPYC processors, and other select Zen processors, in various workloads. The change has been in the works for several months and adjusts the allowed NUMA imbalance when a NUMA domain spans multiple LLCs (last-level caches). I’ve now done some of my own performance testing on EPYC hardware and, indeed, the change further improves the performance of the Linux kernel.
The patch talks up nice performance gains… Testing is underway on my side.
This change, queued in “sched/core” ahead of next month’s Linux 5.18 merge window, is not AMD-specific, but it benefits Zen CPUs because of their cache layout. Mel Gorman, the author of the patch, explained: “[A kernel scheduler change from 2020] allowed an imbalance between NUMA nodes so that communicating tasks were not pulled apart by the load balancer. This works well when there is a 1:1 relationship between the LLC and the node, but may be suboptimal for multiple LLCs if independent tasks prematurely use CPUs sharing a cache. Zen* has multiple LLCs per node with local memory channels and, due to the allowed imbalance, it is much more difficult to tune some workloads to perform optimally than on hardware that has 1 LLC per node. This patch allows an imbalance to exist up to the point where LLCs should be balanced between nodes.”
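To make the quoted explanation concrete, here is a minimal toy model of the idea, not the kernel's code. The function name and parameters are hypothetical, and the 25% cutoff for the single-LLC case is an assumption standing in for the earlier "allowed imbalance" behavior. It only illustrates how the number of tasks a node may hold before they are spread to another node shrinks once a node contains several LLCs.

```python
def allowed_numa_imbalance(cpus_per_node: int, cpus_per_llc: int) -> int:
    """Toy model of the allowed NUMA imbalance (hypothetical helper).

    Returns how many tasks may pile up on one node before the scheduler
    would prefer to place further tasks on another node.
    """
    if cpus_per_node <= 0 or cpus_per_llc <= 0:
        raise ValueError("CPU counts must be positive")
    if cpus_per_node % cpus_per_llc != 0:
        raise ValueError("a node must contain a whole number of LLCs")

    llcs_per_node = cpus_per_node // cpus_per_llc

    if llcs_per_node == 1:
        # One LLC per node: assume an imbalance of up to roughly 25%
        # of the node's CPUs is tolerated (assumption for illustration).
        return max(1, cpus_per_node // 4)

    # Several LLCs per node: tolerate an imbalance only until a further
    # task would have to share an LLC while LLCs on another node are idle.
    return llcs_per_node


# Hypothetical node with 64 logical CPUs:
print(allowed_numa_imbalance(64, 64))  # single LLC spanning the node
print(allowed_numa_imbalance(64, 8))   # eight LLCs of 8 CPUs each
```

In this model, a node with one large LLC tolerates a much bigger imbalance than a node split into eight smaller LLCs, which matches the intent described in the patch: independent tasks are spread out before they start sharing a cache unnecessarily.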
Can confirm excellent performance improvements on the EPYC 7003 series across a variety of workloads with the latest TIP sched/core changes.
Mel’s own benchmarks while working on this patch showed up to a 272% improvement in the Stream memory benchmark, along with other big gains: Coremark was about 10% faster on average and up to 17% faster at best, SPECjbb Java performance increased by up to 18%, and NPB’s embarrassingly parallel benchmark was about 17% faster, among other notable results. Given those very promising numbers, I’ve done some of my own testing locally, and the results I’m seeing are equally exciting, especially for a Linux kernel scheduler change that amounts to only about 50 lines of new code!
First up are some benchmarks run on a dual-socket (2P) AMD EPYC 75F3 server built around an ASRockRack ROME2D16-2T motherboard and running Ubuntu 21.10. Performance was compared between Linux 5.17 Git and a later sched/core Git tree containing the patch in question, “sched/fair: Adjust NUMA imbalance allowed when SD_NUMA spans multiple LLCs”. The same kernel configuration (Kconfig) was used for both kernels.