Intel Sapphire Rapids HBM ‘Xeon Scalable’ processors with 64GB of HBM2e memory deliver up to 3x more performance than Ice Lake Xeons

Intel has once again showcased its upcoming Sapphire Rapids HBM Xeon Scalable processors, demonstrating the chips with up to 64GB of HBM2e memory across a range of workloads.

Intel promises 3x the performance with its line of next-gen Sapphire Rapids HBM “Xeon Scalable” processors

According to Intel, Sapphire Rapids-SP will be available in two package variants: a standard configuration and an HBM configuration. The standard variant will feature a chiplet design consisting of four XCC dies, each measuring around 400mm2, for roughly 1600mm2 of compute silicon on the top Sapphire Rapids-SP Xeon chip. The dies will be interconnected via EMIB, which has a 55u pitch size and a 100u core pitch.


The Intel Xeon processor codenamed Sapphire Rapids with High Bandwidth Memory (HBM) is a prime example of how we leverage advanced packaging technologies and silicon innovations to deliver substantial improvements in performance, bandwidth, and power savings for HPC. With up to 64 gigabytes of high-bandwidth HBM2e memory in the package and in-processor accelerators, we are able to unleash memory bandwidth-bound workloads while delivering significant performance improvements in key HPC use cases.

Comparing 3rd Gen Intel Xeon Scalable processors to the upcoming Sapphire Rapids HBM processors, we see a two- to three-fold performance increase in weather, energy, manufacturing, and physics research workloads. During the keynote, Ansys CTO Prith Banerjee also showed that Sapphire Rapids HBM delivers up to 2x better performance on real Ansys Fluent and ParSeNet workloads.

The standard Sapphire Rapids-SP Xeon chip will feature 10 EMIB interconnects, and the entire package will measure in at a mighty 4446mm2. Moving to the HBM variant, the number of interconnects rises to 14, as they are needed to wire the HBM2E memory to the cores.

All four HBM2E memory packages will feature 8-Hi stacks, so Intel is opting for at least 16GB of HBM2E memory per stack for a total of 64GB on the Sapphire Rapids-SP package. Speaking of the package, the HBM variant will come in at 5700mm2, or 28% larger than the standard variant. Compared to recently leaked EPYC Genoa figures, the HBM2E package for Sapphire Rapids-SP would end up being 5% larger, while Genoa would in turn be about 22% larger than the standard package.

  • Intel Sapphire Rapids-SP Xeon (standard package) – 4446mm2
  • Intel Sapphire Rapids-SP Xeon (HBM2E package) – 5700mm2
  • AMD EPYC Genoa (12 CCD package) – 5428mm2
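
For readers who want to check the math, here is a minimal Python sketch that reproduces the figures quoted above: the total compute silicon, the 64GB HBM2e capacity, and the relative package sizes. The 2GB-per-die split inside each 8-Hi stack is an assumption consistent with the 16GB-per-stack figure, not something Intel has confirmed.

```python
# Minimal sketch reproducing the package figures quoted in this article.
# Areas in mm^2; the per-die HBM capacity is an assumption (see above).

XCC_DIE_MM2 = 400        # approximate area of one Sapphire Rapids XCC die
NUM_XCC_DIES = 4

STD_PACKAGE_MM2 = 4446   # standard Sapphire Rapids-SP package
HBM_PACKAGE_MM2 = 5700   # HBM2E variant
GENOA_PACKAGE_MM2 = 5428 # leaked AMD EPYC Genoa (12-CCD) figure

HBM_STACKS = 4           # four HBM2E packages on the HBM variant
DIES_PER_STACK = 8       # 8-Hi stacks
GB_PER_DIE = 2           # assumed 16Gb DRAM dies -> 16GB per stack

print(f"Compute silicon: {NUM_XCC_DIES * XCC_DIE_MM2} mm^2")
print(f"HBM2e capacity:  {HBM_STACKS * DIES_PER_STACK * GB_PER_DIE} GB")

# Relative package sizes; note which package is the comparison base.
print(f"HBM vs standard:   {HBM_PACKAGE_MM2 / STD_PACKAGE_MM2 - 1:+.0%}")
print(f"HBM vs Genoa:      {HBM_PACKAGE_MM2 / GENOA_PACKAGE_MM2 - 1:+.0%}")
print(f"Genoa vs standard: {GENOA_PACKAGE_MM2 / STD_PACKAGE_MM2 - 1:+.0%}")
```

Running this prints +28%, +5%, and +22%, matching the comparisons in the text.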


Intel also states that the EMIB link provides a 2x improvement in bandwidth density and a 4x improvement in power efficiency compared to standard package designs. Interestingly, Intel calls the latest Xeon lineup "logically monolithic": the interconnect offers the same functionality as a single die, even though the package technically comprises four chiplets wired together. You can read all the details about the standard 56-core, 112-thread Sapphire Rapids-SP Xeon processors here.

Intel Xeon SP Families (Preview):

| Family branding | Skylake-SP | Cascade Lake-SP/AP | Cooper Lake-SP | Ice Lake-SP | Sapphire Rapids | Emerald Rapids | Granite Rapids | Diamond Rapids |
|---|---|---|---|---|---|---|---|---|
| Process node | 14nm+ | 14nm++ | 14nm++ | 10nm+ | Intel 7 | Intel 7 | Intel 3 | Intel 3? |
| Platform name | Intel Purley | Intel Purley | Intel Cedar Island | Intel Whitley | Intel Eagle Stream | Intel Eagle Stream | Intel Mountain Stream / Intel Birch Stream | Intel Mountain Stream / Intel Birch Stream |
| Core architecture | Skylake | Cascade Lake | Cascade Lake | Sunny Cove | Golden Cove | Raptor Cove | Redwood Cove? | Lion Cove? |
| IPC gain (vs. prior gen) | 10% | 0% | 0% | 20% | 19% | 8%? | 35%? | 39%? |
| MCP (multi-chip package) | No | Yes | No | No | Yes | Yes | TBD (possibly yes) | TBD (possibly yes) |
| Socket | LGA 3647 | LGA 3647 | LGA 4189 | LGA 4189 | LGA 4677 | LGA 4677 | TBD | TBD |
| Max core count | Up to 28 | Up to 28 | Up to 28 | Up to 40 | Up to 56 | Up to 64? | Up to 120? | Up to 144? |
| Max thread count | Up to 56 | Up to 56 | Up to 56 | Up to 80 | Up to 112 | Up to 128? | Up to 240? | Up to 288? |
| Max L3 cache | 38.5 MB | 38.5 MB | 38.5 MB | 60 MB | 105 MB | 120 MB? | 240 MB? | 288 MB? |
| Vector engines | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 | AVX-1024/FMA3? | AVX-1024/FMA3? |
| Memory support | 6-channel DDR4-2666 | 6-channel DDR4-2933 | Up to 6-channel DDR4-3200 | Up to 8-channel DDR4-3200 | Up to 8-channel DDR5-4800 | Up to 8-channel DDR5-5600? | Up to 12-channel DDR5-6400? | Up to 12-channel DDR6-7200? |
| PCIe support | PCIe 3.0 (48 lanes) | PCIe 3.0 (48 lanes) | PCIe 3.0 (48 lanes) | PCIe 4.0 (64 lanes) | PCIe 5.0 (80 lanes) | PCIe 5.0 (80 lanes) | PCIe 6.0 (128 lanes)? | PCIe 6.0 (128 lanes)? |
| TDP range (PL1) | 140W-205W | 165W-205W | 150W-250W | 105W-270W | Up to 350W | Up to 375W? | Up to 400W? | Up to 425W? |
| 3D XPoint Optane DIMM | N/A | Apache Pass | Barlow Pass | Barlow Pass | Crow Pass | Crow Pass? | Donahue Pass? | Donahue Pass? |
| Competition | AMD EPYC Naples (14nm) | AMD EPYC Rome (7nm) | AMD EPYC Rome (7nm) | AMD EPYC Milan (7nm+) | AMD EPYC Genoa (~5nm) | AMD next-gen EPYC (post-Genoa) | AMD next-gen EPYC (post-Genoa) | AMD next-gen EPYC (post-Genoa) |
| Launch | 2017 | 2018 | 2020 | 2021 | 2022 | 2023? | 2024? | 2025? |
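
The memory rows above translate directly into peak theoretical bandwidth, which is where the HBM variant earns its gains. The sketch below is a rough comparison only; it assumes a 64-bit (8-byte) data path per DDR channel and a 1024-bit (128-byte) interface per HBM2e stack at 3200 MT/s, which are standard widths for these memory types but are not spelled out in this article.

```python
# Rough peak-bandwidth comparison for the memory configurations above.
# Assumes 8 bytes per DDR channel transfer and 128 bytes per HBM2e
# stack transfer (1024-bit interface) -- typical widths, not Intel-quoted.

def peak_bw_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s: channels * transfers/s * bytes/transfer."""
    return channels * mts * 1e6 * bytes_per_transfer / 1e9

configs = {
    "Ice Lake-SP, 8ch DDR4-3200":     peak_bw_gbs(8, 3200),
    "Sapphire Rapids, 8ch DDR5-4800": peak_bw_gbs(8, 4800),
    "Sapphire Rapids HBM, 4x HBM2e":  peak_bw_gbs(4, 3200, bytes_per_transfer=128),
}

for name, bw in configs.items():
    print(f"{name:34s} {bw:7.1f} GB/s")
```

That works out to roughly 205, 307, and 1638 GB/s per socket, an ~8x peak advantage for the HBM part over Ice Lake-SP, which comfortably covers the 2x-3x gains Intel reports for bandwidth-bound HPC workloads.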

As for the performance footnotes for the Intel Sapphire Rapids HBM ‘Xeon Scalable’ processor, you can see them below:

CloverLeaf

  • Tested by Intel on 2022-04-26. 1 node, 2 Intel® Xeon® Platinum 8360Y processors, 72 cores, HT enabled, Turbo enabled, total memory 256 GB (16 x 16 GB DDR4 3200 MT/s), BIOS version SE5C6200.86B.0021.D40.2101090208, Ubuntu 20.04, kernel 5.10, ucode revision=0xd0002a0, ifort 2021.5, Intel MPI 2021.5.1, Build knobs: -xCORE-AVX512 -qopt-zmm-usage=high
  • Tested by Intel on 4/19/22. 1 node, 2 pre-production Intel® Xeon® Scalable processors codenamed Sapphire Rapids with HBM, >40 cores, HT ON, Turbo ON, total memory 128 GB (HBM2e at 3200 MHz), BIOS version EGSDCRB1.86B.0077.D11.2203281354, ucode revision=0x83000200, CentOS Stream 8, Linux version 5.16, ifort 2021.5, Intel MPI 2021.5.1, Build knobs: -xCORE-AVX512 -qopt-zmm-usage=high

OpenFOAM

  • Tested by Intel as of 01/26/2022. 1 node, 2 Intel® Xeon® Platinum 8380 processors, 80 cores, HT enabled, Turbo enabled, total memory 256 GB (16 x 16 GB 3200 MT/s, dual rank), BIOS version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.5, Linux version 4.18, OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX512
  • Tested by Intel as of 01/26/2022. 1 node, 2 pre-production Intel® Xeon® Scalable processors codenamed Sapphire Rapids with HBM, >40 cores, HT disabled, Turbo disabled, total memory 128 GB (HBM2e at 3200 MHz), pre-production platform and BIOS, CentOS 8, Linux version 5.12, OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX512

WRF

  • Tested by Intel as of 05/03/2022. 1 node, 2 Intel® Xeon® Platinum 8380 processors, 80 cores, HT enabled, Turbo enabled, total memory 256 GB (16 x 16 GB 3200 MT/s, dual rank), BIOS version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.5, Linux version 4.18, WRF v4.2.2
  • Tested by Intel as of 05/03/2022. 1 node, 2 pre-production Intel® Xeon® Scalable processors codenamed Sapphire Rapids with HBM, >40 cores, HT ON, Turbo ON, total memory 128 GB (HBM2e at 3200 MHz), BIOS version EGSDCRB1.86B.0077.D11.2203281354, ucode revision=0x83000200, CentOS Stream 8, Linux version 5.16, WRF v4.2.2

YASK

  • Tested by Intel as of 05/09/2022. 1 node, 2 Intel® Xeon® Platinum 8360Y processors, 72 cores, HT enabled, Turbo enabled, total memory 256 GB (16 x 16 GB DDR4 3200 MT/s), BIOS version SE5C6200.86B.0021.D40.2101090208, Rocky Linux 8.5, kernel 4.18.0, ucode revision=0xd000270, Build knobs: make -j YK_CXX='mpiicpc -cxx=icpx' arch=avx2 stencil=iso3dfd radius=8
  • Tested by Intel on 3/5/22. 1 node, 2 pre-production Intel® Xeon® Scalable processors codenamed Sapphire Rapids with HBM, >40 cores, HT ON, Turbo ON, total memory 128 GB (HBM2e at 3200 MHz), BIOS version EGSDCRB1.86B.0077.D11.2203281354, ucode revision=0x83000200, CentOS Stream 8, Linux version 5.16, Build knobs: make -j YK_CXX='mpiicpc -cxx=icpx' arch=avx2 stencil=iso3dfd radius=8

Ansys Fluent

  • Tested by Intel as of 2/2022. 1 node, 2 Intel® Xeon® Platinum 8380 processors, 80 cores, HT enabled, Turbo enabled, total memory 256 GB (16 x 16 GB 3200 MT/s, dual rank), BIOS version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.5, Linux version 4.18, Ansys Fluent 2021 R2 Aircraft_wing_14m; Build notes: Retail version using Intel Compiler 19.3 and Intel MPI 2019u8
  • Tested by Intel as of 2/2022. 1 node, 2 pre-production Intel® Xeon® Scalable processors codenamed Sapphire Rapids with HBM, >40 cores, HT Off, Turbo Off, total memory 128 GB (HBM2e at 3200 MHz), pre-production platform and BIOS, CentOS 8, Linux version 5.12, Ansys Fluent 2021 R2 Aircraft_wing_14m; Build notes: Retail version using Intel Compiler 19.3 and Intel MPI 2019u8

Ansys ParSeNet

  • Tested by Intel on 05/24/2022. 1 node, 2 Intel® Xeon® Platinum 8380 processors, 80 cores, HT enabled, Turbo enabled, total memory 256 GB (16 x 16 GB DDR4 3200 MT/s), BIOS version SE5C6200.86B.0021.D40.2101090208, Ubuntu 20.04.1 LTS, kernel 5.10, ParSeNet (SplineNet), PyTorch 1.11.0, Torch-CCL 1.2.0, IPEX 1.10.0, MKL (2021.4, Product Build 20210904), oneDNN (v2.5.0)
  • Tested by Intel as of 04/18/2022. 1 node, 2 pre-production Intel® Xeon® Scalable processors codenamed Sapphire Rapids with HBM, 112 cores, HT On, Turbo On, total memory 128 GB (HBM2e 3200 MT/s), BIOS version EGSDCRB1.86B.0077.D11.2203281354, CentOS Stream 8, kernel 5.16, ParSeNet (SplineNet), PyTorch 1.11.0, Torch-CCL 1.2.0, IPEX 1.10.0, MKL (2021.4, Product Build 20210904), oneDNN (v2.5.0)


