Introduction
The memory system industry is preparing to productize Type 3 CXL® memory devices, targeting mass production within 2024 to align with the planned release of CXL 2.0 host systems. Use cases such as AI, machine learning, in-memory databases, and real-time analytics demand more memory bandwidth and capacity than current memory systems can provide.
With the additional memory channels that CXL makes available over the PCIe® interface, significant improvements in memory bandwidth and capacity per CPU are possible over DDR-only implementations. For example, with 8 to 12 DDR5 memory channels available in the newest CPUs, an additional 4 CXL memory ports can provide improvements of 50% or more in both bandwidth and capacity.
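As a rough illustration of the arithmetic behind that estimate, the sketch below compares raw peak bandwidth only. The per-link figures are assumptions, not values from this article: each DDR5 channel is taken as DDR5-4800 (4800 MT/s x 8 bytes = 38.4 GB/s) and each CXL port as a x16 PCIe 5.0 link (32 GT/s x 16 lanes / 8 = 64 GB/s per direction), ignoring encoding and protocol overhead. The function name is hypothetical, and deliverable bandwidth on real systems will be lower.

```python
# Assumed raw peak figures (not from the article):
DDR5_4800_GBPS = 4800e6 * 8 / 1e9      # 38.4 GB/s per DDR5-4800 channel
CXL_X16_GEN5_GBPS = 32e9 * 16 / 8 / 1e9  # 64.0 GB/s per direction, x16 PCIe 5.0


def cxl_bandwidth_gain(ddr_channels, cxl_ports,
                       ddr_gbps=DDR5_4800_GBPS,
                       cxl_gbps=CXL_X16_GEN5_GBPS):
    """Return added CXL peak bandwidth as a fraction of DDR peak bandwidth."""
    return (cxl_ports * cxl_gbps) / (ddr_channels * ddr_gbps)


for channels in (8, 12):
    gain = cxl_bandwidth_gain(channels, 4)
    print(f"{channels} DDR5 channels + 4 CXL ports: +{gain:.0%} peak bandwidth")
```

Under these assumptions the gain is roughly 83% for an 8-channel CPU and 56% for a 12-channel CPU. Narrower links or faster DDR5 speed grades would reduce the figure, so the parameters are left adjustable.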
Enabling CXL Memory Expansion Module with DRAM Features
As the memory media used for CXL memory expansion modules is mainly DRAM-based, enabling these modules requires careful evaluation and validation of DRAM-related functions and features. CXL is memory-agnostic by nature and does not need to be coupled with a specific memory technology, whereas a current CPU system must be coupled with a specific DRAM technology, such as DDR5. The CXL 1.1 and CXL 2.0 specifications therefore provide opportunities for different kinds of memory technologies to be supported under CXL.
However, the industry came to realize that functions and features specific to current DRAM technology had to be added to the CXL specification to improve clarity and performance. For this purpose, the CXL DRAM sub-group was launched in 2021 to generate and define features for DRAM-based CXL memory solutions.
The scope of work for the DRAM sub-group includes evaluating the gaps between the current CXL specification and use case requirements, developing a prioritized list of work items, and creating a comprehensive architecture and Engineering Change Notices (ECNs) for these items. Since the CXL specification is mainly a protocol specification, the DRAM sub-group decided not to include package, pinout, mechanical, electrical, or timing requirements and definitions in the scope of work.
Standardizing DRAM Features in the CXL 3.1 Specification
The DRAM sub-group received numerous proposals, discussions, and contributions from more than 10 Contributing Member companies. The following DRAM features were standardized under the CXL 3.1 specification released in November 2023:
- Memory Maintenance with hard Post-Package Repair (hPPR), soft PPR (sPPR), and memory sparing: Defines how to discover device feature capability for the various maintenance operations required for DRAM-based CXL memory modules, similar to current DDR4 or DDR5 memory systems. It also defines how to configure and initiate memory maintenance operations, enabling the host to request that the CXL device perform a repair operation on its media, or replace a portion of memory with a portion of functional memory at the same device physical address.
- Memory Testing: Defines the interface to configure and initiate specified memory tests on CXL memory devices from the host interface, and how to receive the test results. In other words, this interface allows the host to request a CXL memory module to execute device-built-in test operations at the level of the CXL controller.
- Memory Scrub Control with patrol scrub and Error Check Scrub (ECS): Defines the interface to configure patrol scrub for the CXL controller, and event reporting from patrol scrub and ECS logs (ECS is specifically referenced from the JEDEC DDR5 DRAM specification). The device patrol scrub proactively locates and corrects correctable errors during runtime, and ECS allows the DRAM to internally read, correct single-bit errors, and write back to the DRAM array, while reporting error count and address information for error counts that exceed a certain threshold.
- Advanced Programmable Corrected Error Threshold ECN: Defines the interface to set a threshold for correctable error reporting. The granularity of the error threshold, configuration flag, error counter expiration time, event record flags, and event thresholds for each error severity are defined as attributes of this feature.
- Component ID ECN: Component ID was added into the memory module and DRAM event records, defining use in PLDM and RDE semantics by leveraging DMTF PLDM standard-based FRU/sub-FRU identification.
- Event Record Update ECN: Updates general media, DRAM, and memory module event records to enhance the descriptions of DRAM, media link, PMIC, and other memory module-related errors.
- Memory Capacity Reduction: Defines a mechanism to report memory capacity reduction events to the host. When a system experiences an unrecoverable failure, mapping out a portion of the memory region (channel, rank, or finer granularity) lets the CXL memory module reduce its capacity to avoid catastrophic failure, enhancing the handling of unrecoverable errors.
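To make the corrected error threshold item in the list above more concrete, the following is a minimal host-side model of threshold-based reporting with a counter expiration time. It is only a sketch: the class, field, and region names are hypothetical and are not taken from the CXL specification, which defines these attributes through the device's feature interface rather than through any Python API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CorrectedErrorMonitor:
    """Hypothetical model: raise an event once corrected errors in a
    region reach `threshold` within `expiration_s` seconds."""
    threshold: int        # corrected errors needed to trigger an event
    expiration_s: float   # counter expiration time, in seconds
    # region name -> (current count, start time of the counting window)
    _counters: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def record(self, region: str, now: float) -> bool:
        """Count one corrected error; return True if an event should be reported."""
        count, start = self._counters.get(region, (0, now))
        if now - start >= self.expiration_s:
            # The counter expired, so counting restarts from this error.
            count, start = 0, now
        count += 1
        if count >= self.threshold:
            # Threshold reached: report and reset the counter for this region.
            self._counters[region] = (0, now)
            return True
        self._counters[region] = (count, start)
        return False


monitor = CorrectedErrorMonitor(threshold=3, expiration_s=10.0)
events = [monitor.record("rank0", t) for t in (0.0, 1.0, 2.0)]
print(events)
```

Here the region key stands in for the threshold granularity (for example, a rank), and the third error inside the 10-second window is the one that triggers a report. Errors spaced further apart than the expiration time never accumulate to the threshold.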
With these features defined in the CXL 3.1 specification, memory validation for CXL memory expansion modules using DRAM is now more robust. The specification provides a better understanding of how to verify and validate the memory features used in current memory systems (mainly DDR4 and DDR5), and enhances reporting mechanisms inside the CXL infrastructure.
Harmonization among Standardization Bodies
To further improve the process of enabling CXL memory expansion modules, more work is being done to harmonize CXL memory-related activities across standardization bodies. For mechanical, package, and pinout definitions that are outside the scope of the CXL specification, the CXL Consortium is collaborating closely with JEDEC to publish a CXL memory module base standard and a memory module label for CXL. Additionally, the CXL Consortium and JEDEC are working on further specifications to enable more features and to support validation of CXL memory expansion modules for mass production. Furthermore, the CXL Consortium is working with SNIA and DMTF to clarify form factor and system management definitions.
Conclusion
CXL is showing tremendous promise in enhancing memory system value through memory bandwidth and capacity expansion. Since most CXL memory devices are based on DRAM, stronger efforts were made in the CXL specification (specifically, the CXL 3.1 specification) to define DRAM-related features in more detail for validation and productization, so that CXL memory expansion modules are as complete as possible. Although enabling memory devices on PCIe infrastructure was initially unfamiliar to the industry, the strong efforts of the CXL Consortium and other industry organizations are enabling their members to successfully launch CXL memory products worldwide.