Introduction
Computer server architectures continually evolve to support faster analytics on ever larger data sets. The computing capabilities of Central Processing Units (CPUs) and Graphics Processing Units (GPUs) keep increasing to enable modern applications such as Artificial Intelligence (AI) and Machine Learning (ML). As processor core counts increase, more data is processed, which requires more memory capacity and bandwidth. Because applications are so diverse, their CPU core and memory requirements vary widely. System vendors and managers are challenged to optimize system resources so that there is enough memory to support all applications while avoiding memory overprovisioning and underutilization.
CXL® (Compute Express Link®) is a memory semantic protocol that leverages the highest performance PCIe® electrical interface specifications to deliver a configurable, scalable, sharable interface for a new generation of platforms. The introduction of the CXL interface is a major step toward enabling expansion and resource sharing by increasing the flexibility of memory and processing allocation to eliminate isolated resources and allow re-provisioning to suit application requirements.
Processors of all types (CPUs, GPUs, and accelerator devices) typically implement the latest generation of DDR memory interfaces, which deliver the highest performance for DRAM. CXL enables the connection of an external memory controller device to expand the memory resources available to the processing engines, effectively attaching an extra layer of memory. Data is passed over CXL to an external memory controller that manages the read/write operations for the memory media (usually DRAM). The CXL interface is designed for very low latency, so although its performance is lower than that of the native DDR interface, it remains comparable.
CXL-connected DDR is a primary application that allows memory resources to be added or removed as required. A CXL 2.0 switch enables the interconnection and re-provisioning (re-allocation) of multiple processing types and memory devices in response to changing application needs. Larger memory resources can therefore be deployed without the risk that they sit underutilized when application requirements change.
Advantages of External CXL Memory Controllers
While latency needs can be met by newer memory technologies such as DDR4 and DDR5, which support 3200 MT/s and 4800 MT/s respectively, achieving the per-core performance needed for high performance computing (HPC) applications requires increasing the number of direct-attached DRAM interfaces. This presents several system design challenges. Each DRAM interface requires a large number of device pins for high-speed signals (288), and the limited electrical reach (distance between devices) forces the memory close to the processors, which raises layout and real-estate concerns and can exacerbate localized power and cooling requirements. And while directly connected memory does ensure the highest performance, it is inflexible and does not allow unused memory capacity to be shared with other processors in the system.
CXL-based Memory Expansion
The CXL interface is optimized for 16 lanes, each at 32 GT/s, which substantially reduces the pin count overhead of memory expansion while easily supporting the data bandwidth needed for DDR expansion. There are options to support x8 and x4 link widths at reduced overall bandwidth. The CXL electrical interface is the same as that used for PCIe Gen 5 and Gen 6, which ensures reliability and robust channel reach for system designs.
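The link widths above translate into bandwidth with simple arithmetic. The following is a minimal sketch, assuming the 128b/130b line encoding of PCIe Gen 5 and ignoring flit and protocol overhead, so the figures are upper bounds rather than measured throughput. The function name is illustrative only.

```python
# Approximate one-direction bandwidth of a CXL link running on the
# PCIe Gen 5 electrical layer at 32 GT/s per lane (rate from the text).
GT_PER_S = 32.0          # giga-transfers per second per lane, one bit per transfer
ENCODING = 128 / 130     # assumption: PCIe Gen 5 128b/130b line encoding


def link_bandwidth_gbytes(lanes: int, gt_per_s: float = GT_PER_S) -> float:
    """Return the approximate bandwidth in GB/s for one direction of the link."""
    return lanes * gt_per_s * ENCODING / 8  # divide by 8 to convert bits to bytes


for lanes in (16, 8, 4):
    print(f"x{lanes}: {link_bandwidth_gbytes(lanes):.1f} GB/s per direction")
```

Because the link is full duplex, the same bandwidth is available for reads and writes simultaneously, which is not the case for a shared DDR data bus.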
CXL has three protocols that are targeted for specific types of devices. CXL.mem and CXL.cache are load/store memory semantic protocols that ensure the lowest latency for memory access performance. CXL.io is used for device management, error, and status reporting, and is based on PCIe transactions. CXL defines three types of devices with different characteristics:
- Type 1: Caching devices, such as accelerators without host-managed local memory
- Type 2: Devices that have processing, caching, and memory sharing functions
- Type 3: Memory devices
DDR memory controllers are most commonly Type 3 devices, providing three major features: high performance, capacity expansion, and scalability of deployment.
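The relationship between device types and protocols can be captured in a small lookup table. This is a sketch only: the per-type protocol assignment follows the device classes of the CXL specification as commonly described, which the text above does not spell out, and the names `Protocol`, `DEVICE_PROTOCOLS`, and `protocols_for` are hypothetical.

```python
from enum import Flag, auto


class Protocol(Flag):
    IO = auto()      # CXL.io: management, error and status reporting
    CACHE = auto()   # CXL.cache: device caching of host memory
    MEM = auto()     # CXL.mem: host access to device-attached memory


# Assumption: protocol usage per device type as defined by the CXL specification.
DEVICE_PROTOCOLS = {
    1: Protocol.IO | Protocol.CACHE,
    2: Protocol.IO | Protocol.CACHE | Protocol.MEM,
    3: Protocol.IO | Protocol.MEM,
}


def protocols_for(device_type: int) -> list:
    """Return the protocol names used by a given CXL device type."""
    flags = DEVICE_PROTOCOLS[device_type]
    ordered = (Protocol.IO, Protocol.CACHE, Protocol.MEM)
    return [f"CXL.{p.name.lower()}" for p in ordered if p in flags]


for device_type in sorted(DEVICE_PROTOCOLS):
    print(f"Type {device_type}: {', '.join(protocols_for(device_type))}")
```

A Type 3 memory controller therefore needs only CXL.io and CXL.mem, which is part of why it is the simplest class of device to deploy for memory expansion.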
A CXL-based memory controller, which connects to the CPU through the CXL interface and to DRAM through the DDR interface, meets bandwidth requirements by achieving up to about 64 GB/s in each direction through a 16-lane CXL interface at 32 GT/s.
Because fewer pins are needed on the processor side, CXL reduces the signal integrity challenges and real-estate footprint of direct-attached memory at DDR5 rates. It also supports more DIMMs per channel (DPC) while still meeting the latency needs of HPC applications.
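The pin-count argument can be made concrete with a rough bandwidth-per-pin comparison. The sketch below rests on several assumptions: a 64-bit data bus per DDR channel, the 288-pin figure quoted earlier for a DRAM interface (which includes power and ground), four signal pins per CXL lane (one differential pair per direction), and 128b/130b encoding. It is an order-of-magnitude illustration, not a like-for-like measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    pins: int
    bandwidth_gbytes: float  # peak GB/s (per direction for CXL)

    @property
    def gbytes_per_pin(self) -> float:
        return self.bandwidth_gbytes / self.pins


def ddr_channel_bandwidth(mt_per_s: int, bus_bits: int = 64) -> float:
    """Peak GB/s of one DDR channel; assumes a 64-bit data bus."""
    return mt_per_s * bus_bits / 8 / 1000


def cxl_link_bandwidth(lanes: int, gt_per_s: float = 32.0) -> float:
    """Approximate GB/s per direction; assumes 128b/130b encoding."""
    return lanes * gt_per_s * (128 / 130) / 8


interfaces = [
    Interface("DDR4-3200 channel", 288, ddr_channel_bandwidth(3200)),
    Interface("DDR5-4800 channel", 288, ddr_channel_bandwidth(4800)),
    # Assumption: 4 signal pins per lane (TX pair + RX pair), sideband pins ignored.
    Interface("CXL x16 at 32 GT/s", 16 * 4, cxl_link_bandwidth(16)),
]

for itf in interfaces:
    print(f"{itf.name:20s} {itf.pins:4d} pins  "
          f"{itf.bandwidth_gbytes:5.1f} GB/s  {itf.gbytes_per_pin:.3f} GB/s per pin")
```

Even allowing for the rough pin accounting, the serial link delivers several times more bandwidth per processor pin than a parallel DDR channel.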
Cost Efficiencies of CXL Implementations
CXL enables heterogeneous processing systems with memory expansion and sharing that helps to optimize memory resource utilization. Expensive and complex multi-socket CPU and memory solutions are simplified through CXL memory expansion while overall cost is reduced and application performance is improved.
A CXL-based memory controller solution provides the flexibility of supporting different memory technologies to manage costs and reuse older technologies. For example, DDR4 memories are still predominant in the market and the cost per bit is lower than DDR5. A system could be configured to use high performance DDR5 for direct attached processor memory and DDR4 for the expansion CXL attached memory, with the option to upgrade later.
CXL memory allows for greater physical distance between the processor and the memory devices, which can help optimize power utilization and permit less expensive cooling solutions. Larger scale memory expansion, or disaggregation, enables pools of memory resources to be shared and reallocated on demand.
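The idea of a shared pool that is reallocated on demand can be illustrated with a toy model. This is a minimal sketch of the bookkeeping only, assuming capacity is tracked in whole gigabytes per host. The `MemoryPool` class and host names are hypothetical and do not represent any CXL switch or fabric manager API.

```python
class MemoryPool:
    """Toy model of a pool of CXL-attached memory shared by several hosts."""

    def __init__(self, capacity_gb: int) -> None:
        self.capacity_gb = capacity_gb
        self.allocations = {}  # host name -> GB currently assigned

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host: str, size_gb: int) -> bool:
        """Assign capacity to a host; return False if the pool cannot satisfy it."""
        if size_gb <= 0 or size_gb > self.free_gb:
            return False
        self.allocations[host] = self.allocations.get(host, 0) + size_gb
        return True

    def release(self, host: str, size_gb: int) -> None:
        """Return capacity from a host to the pool."""
        held = self.allocations.get(host, 0)
        if size_gb <= 0 or size_gb > held:
            raise ValueError(f"{host} holds {held} GB, cannot release {size_gb} GB")
        if held == size_gb:
            del self.allocations[host]
        else:
            self.allocations[host] = held - size_gb


pool = MemoryPool(capacity_gb=1024)
first = pool.allocate("host-a", 512)    # succeeds, 512 GB remain free
second = pool.allocate("host-b", 768)   # fails, only 512 GB free
pool.release("host-a", 256)             # host-a shrinks, 768 GB now free
third = pool.allocate("host-b", 768)    # succeeds after reallocation
print(first, second, third, pool.free_gb)
```

With direct-attached memory, the capacity released by the first host would remain stranded on that host. In the pooled model it becomes available to any other host.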
For memory applications, CXL will likely utilize DIMM (Dual In-Line Memory Module) and EDSFF (Enterprise and Datacenter Standard Form Factor) form factors, as these provide the most efficient density and compatibility. Common form factor adoption ensures interoperability between vendors and allows systems to be easily upgraded or reconfigured.
Conclusion
CXL-based memory solutions provide the best technology for the needs of processing-intensive applications such as cloud computing, AI/ML, cluster networks, and HPC because they:
- Meet the increasing memory bandwidth, capacity, and latency needs of multi-core processors
- Allow memory to be shared and reallocated to reduce underutilized resources
- Enable heterogeneous systems to serve innovative AI, ML, and neural architectures
- Provide a cost-effective solution through ease of adoption and a reduction in CPU core count
- Increase the manageability of thermal and power demands
Consider the implementation of CXL based memory solutions today to enable cost-efficient memory expansion in your future system designs.