# Cori (retired)
Cori retired on May 31, 2023 at noon
Please refer to the Migrating from Cori to Perlmutter page for the detailed Cori retirement plan and information about migrating your applications to Perlmutter.
Cori was a Cray XC40 with a peak performance of about 30 petaflops. The system was named in honor of American biochemist Gerty Cori, the first American woman to win a Nobel Prize in science and the first woman to be awarded the Nobel Prize in Physiology or Medicine. Cori comprised 2,388 Intel Xeon "Haswell" processor nodes and 9,688 Intel Xeon Phi "Knights Landing" (KNL) nodes. The system also had a large Lustre scratch file system and a first-of-its-kind NVRAM "burst buffer" storage device.
## Cori Retirement
Cori served its first users in 2015, and as NERSC's longest-running system it was a valuable resource for thousands of users and projects. With the complete Perlmutter system operational for the 2023 allocation year, NERSC decommissioned Cori on May 31, 2023 at noon.
## System Overview
| System Partition | # of Cabinets | # of Nodes | Aggregate Theoretical Peak | Aggregate Memory |
|---|---|---|---|---|
| Login | - | 20 | - | - |
| Haswell | 14 | 2,388 | 2.81 PFlops | 298.5 TB |
| KNL | 54 | 9,688 | 29.5 PFlops | 1.09 PB |
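
The aggregate columns are simple products of the node counts and the per-node values listed under Node Specifications below. A short illustrative sketch of the arithmetic (using only numbers that appear on this page):

```python
# Aggregate peak and memory, derived from per-node figures on this page.
haswell_nodes, knl_nodes = 2_388, 9_688

# Theoretical peak: ~1.1776 TFlops per Haswell node, ~3.0464 TFlops per KNL node
# (see the per-core derivations in the Node Specifications section below).
print(haswell_nodes * 1.1776 / 1000)   # ~2.81 PFlops
print(knl_nodes * 3.0464 / 1000)       # ~29.5 PFlops

# Memory: 128 GB per Haswell node; 96 GB DDR4 + 16 GB MCDRAM = 112 GB per KNL node.
print(haswell_nodes * 128 / 1024)      # 298.5 -- matches the table value if "TB" is read as binary TiB
print(knl_nodes * 112 / 1e6)           # ~1.09 -- decimal PB
```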
## System Specification
| System Partition | Processor | Clock Rate | Physical Cores Per Node | Threads/Core | Sockets Per Node | Memory Per Node |
|---|---|---|---|---|---|---|
| Login | Intel Xeon Processor E5-2698 v3 | 2.3 GHz | 32 | 2 | 2 | 515 GB |
| Haswell | Intel Xeon Processor E5-2698 v3 | 2.3 GHz | 32 | 2 | 2 | 128 GB |
| KNL | Intel Xeon Phi Processor 7250 | 1.4 GHz | 68 | 4 | 1 | 96 GB (DDR4), 16 GB (MCDRAM) |
Each XC40 cabinet housing Haswell and KNL nodes contained 3 chassis; each chassis had 16 compute blades with 4 nodes per blade. Login nodes were in separate cabinets.
## Interconnect
Cray Aries with Dragonfly topology and >45 TB/s global peak bisection bandwidth.
See the interconnect page for more details.
## Node Specifications
Note that the memory amounts reported on this page for each node type represent the physical memory installed; the memory available to users may be around 5-10 GB less due to OS processes, file system caches, etc.
### Login Nodes
- Cori had 12 login nodes (`cori[01-12]`) open to all users.
- 2 large-memory login nodes (`cori[22,23]`) for submitting to the `bigmem` QOS. These nodes had 750 GB of memory.
- 4 Jupyter nodes (`cori[13,14,16,19]`) accessed via Jupyter.
- 2 workflow nodes (`cori[20,21]`); access required prior approval.
- 1 compile node (`cori17`); access required prior approval.
- Each node had two sockets; each socket was populated with a 2.3 GHz, 16-core Haswell processor.
### Haswell Compute Nodes
- Each node had two sockets; each socket was populated with a 2.3 GHz, 16-core Haswell processor (Intel Xeon Processor E5-2698 v3)
- Each core supported 2 hyper-threads, and had two 256-bit-wide vector units
- 36.8 GFlops/core (theoretical peak; see the derivation sketched after this list)
- 1.2 TFlops/node (theoretical peak)
- 2.81 PFlops total (theoretical peak)
- Each node had 128 GB DDR4 2133 MHz memory (four 16 GB DIMMs per socket)
- 298.5 TB total aggregate memory
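
The Haswell peak figures above follow from the clock rate and the double-precision flops each core can retire per cycle with its two 256-bit vector units, assuming both units can issue a fused multiply-add every cycle (consistent with the numbers on this page). A minimal sketch of the arithmetic:

```python
# Haswell theoretical peak: 2 vector units x 4 doubles per 256-bit register x 2 flops per FMA
clock_ghz = 2.3
flops_per_cycle = 2 * 4 * 2                        # 16 double-precision flops per cycle per core
per_core_gflops = clock_ghz * flops_per_cycle      # 36.8 GFlops/core
per_node_gflops = per_core_gflops * 32             # 1177.6 GFlops ~= 1.2 TFlops/node
total_pflops = per_node_gflops * 2_388 / 1e6       # ~2.81 PFlops over 2,388 nodes
print(per_core_gflops, per_node_gflops, total_pflops)
```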
### KNL Compute Nodes
- Each node had a single-socket Intel Xeon Phi Processor 7250 ("Knights Landing") with 68 cores @ 1.4 GHz
- Each core had two 512-bit-wide vector processing units
- Each core had 4 hardware threads (272 threads total)
- AVX-512 vector pipelines with a hardware vector length of 512 bits (eight double-precision elements).
- 44.8 GFlops/core (theoretical peak; see the derivation sketched after this list)
- 3 TFlops/node (theoretical peak)
- 29.5 PFlops total (theoretical peak)
- Each node had 96 GB DDR4 2400 MHz memory, six 16 GB DIMMs (102 GiB/s peak bandwidth)
- Total aggregate memory (combined with MCDRAM) was 1.09 PB
- Each node had 16 GB MCDRAM (multi-channel DRAM), > 460 GB/s peak bandwidth
- Each core had its own L1 caches: 32 KB instruction and 32 KB data (64 KB total)
- Each tile (two cores) shared a cache-coherent 1 MB L2 cache
- Processor tiles were connected in a 2D mesh network
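
As with Haswell, the KNL peak figures above come from the clock rate and the per-cycle flop count of the two 512-bit vector processing units, assuming both can issue a fused multiply-add every cycle (consistent with the numbers on this page). A minimal sketch of the arithmetic:

```python
# KNL theoretical peak: 2 VPUs x 8 doubles per 512-bit register x 2 flops per FMA
clock_ghz = 1.4
flops_per_cycle = 2 * 8 * 2                        # 32 double-precision flops per cycle per core
per_core_gflops = clock_ghz * flops_per_cycle      # 44.8 GFlops/core
per_node_gflops = per_core_gflops * 68             # 3046.4 GFlops ~= 3 TFlops/node
total_pflops = per_node_gflops * 9_688 / 1e6       # ~29.5 PFlops over 9,688 nodes
print(per_core_gflops, per_node_gflops, total_pflops)
```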