“Introducing the CXL 4.0 Specification” Webinar Q&A Recap

In November 2025, the CXL Consortium announced the release of the CXL 4.0 specification to meet the increasing demands of emerging workloads in today’s data centers. The specification doubles the data rate from 64 GT/s to 128 GT/s, adds support for bundled ports, and enhances memory RAS features.

Recently, Debendra Das Sharma, CXL Consortium Board Chair, and Mahesh Natu, CXL Consortium Software & Systems Working Group Co-Chair, presented a webinar introducing the new features of the 4.0 specification.

Watch On-Demand and Download the Presentation Slides

If you could not attend the live webinar, the recording is available via our YouTube channel and the webinar presentation slides are available for download on the CXL Consortium website.

Webinar Q&A

We received great questions from the audience but could not address all of them during the live webinar. Below are answers to the questions we didn’t get to.

Q1: When moving from CXL 3.0 to CXL 4.0, can you share the process for maintaining signal integrity and supporting flawless optical connectivity, especially over long channels with retimers?

[Response] CXL 4.0 is based on PCIe® 7.0 (just as CXL 3.0 is based on PCIe 6.0). The maximum data rate supported in CXL 4.0 is 128 GT/s with PAM-4 signaling at -36 dB channel loss; the maximum data rate in CXL 3.0 is 64 GT/s with PAM-4 signaling at -36 dB channel loss (which is also supported in CXL 4.0). Both data rates have the same FBER (First Bit Error Rate) target of less than 10⁻⁶, so there is no change there. We have a well-defined specification along with a compliance program, and the expectation is that devices will meet the relevant parts of the PCIe 7.0 specification.
Optical connectivity is obtained through retimers, which preserve the same FBER characteristics end to end. Fundamentally, each retimer extends the channel by retiming the bits, so each side of the retimer behaves like an independent channel. Again, this part uses the PCIe 7.0 specification as the baseline.
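
As a rough back-of-the-envelope illustration (our own arithmetic, not part of the webinar material), here is what 128 GT/s PAM-4 on an x16 link and an FBER budget of 10⁻⁶ translate to in raw throughput and pre-FEC error arrivals:

/* Back-of-the-envelope arithmetic only (not from the specification):
 * raw per-direction bandwidth of an x16 link at 128 GT/s and the
 * pre-FEC bit-error arrival rate implied by an FBER of 1e-6. */
#include <stdio.h>

int main(void)
{
    /* 128 GT/s per lane with PAM-4 = 64 GBaud x 2 bits/symbol = 128 Gb/s per lane */
    const double bits_per_s_per_lane = 128e9;
    const int    lanes               = 16;     /* x16 link */
    const double fber                = 1e-6;   /* first bit error rate budget */

    double raw_bits_per_s   = bits_per_s_per_lane * lanes;
    double raw_gbytes_per_s = raw_bits_per_s / 8.0 / 1e9;

    /* errors arriving across the whole link before FEC/CRC/replay clean them up */
    double errors_per_s = raw_bits_per_s * fber;

    printf("raw x16 bandwidth : ~%.0f GB/s per direction\n", raw_gbytes_per_s);
    printf("pre-FEC errors    : ~%.1e per second at FBER %.0e\n", errors_per_s, fber);
    return 0;
}

The second number is why the forward error correction, CRC, and replay machinery that arrived with PAM-4 signaling in PCIe 6.0 remains essential at 128 GT/s.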

Q2: How do we envision interoperability, firmware management, and technical hurdles like signal integrity for CXL device topology, particularly for CXL 4.0, where we are handling PCIe 7.0 speeds of 128 GT/s?

[Response] Please refer to the answer to question 1.

Q3: What are the Software and Firmware Stack evolution challenges to maintain dynamic resource allocation without compromising performance, especially for handling complicated AI-based workloads?

[Response] CXL 3.0 introduced the Dynamic Capacity Device construct, which allows almost instantaneous addition/removal of memory capacity to a compute node. This unlocks new use cases for complex AI-based workloads whose memory requirements continue to evolve. One thing to keep in mind is that the memory capacity that gets added from the memory pool device may have different performance characteristics than the locally attached memory. To take full advantage of this, the software must be NUMA optimized, which takes time and effort.
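
To make “NUMA optimized” a little more concrete, here is a minimal sketch using Linux libnuma (link with -lnuma). It assumes the CXL-attached capacity is exposed as its own NUMA node, and the node number used here is a placeholder, not something defined by the specification:

/* Minimal sketch (assumes Linux + libnuma): place a hot buffer on local
 * DRAM and a colder, capacity-hungry buffer on a CXL-backed NUMA node.
 * The node number below is hypothetical; real code would discover it. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    size_t sz = 1UL << 30;                        /* 1 GiB per buffer */
    int cxl_node = 1;                             /* hypothetical CXL memory node */

    void *hot  = numa_alloc_local(sz);            /* latency-sensitive data */
    void *cold = numa_alloc_onnode(sz, cxl_node); /* capacity-hungry data */

    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... run the workload, keeping hot structures in local DRAM ... */

    numa_free(hot, sz);
    numa_free(cold, sz);
    return 0;
}

Real software would discover the CXL node and its performance characteristics at runtime (for example from the latency/bandwidth attributes the OS derives from platform tables) rather than hard-coding a node number.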

Q4: Is there any available performance analysis on topology involving CXL devices?

[Response] The specification provides target latency numbers for some transactions, such as memory read, snoops, etc. The exact performance depends on the design, configuration, and traffic.

Q5: Is the FIT rate of 2.6×10⁻² vs. 5×10⁻⁸ for latency-optimized Flit accurate? That seems like a huge reduction.

[Response] FIT stands for Failures In Time: the number of failures in a billion hours of operation of the link. These numbers are for an x16 link. In general, any number less than 1 is acceptable, since silicon FIT numbers tend to be in the three- to four-digit range. So, 0.0026 has a minimal impact on the system FIT.
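
For readers who want to see what those figures mean in time terms, a quick conversion (our own arithmetic, using the numbers quoted above) from FIT to mean time between failures:

/* FIT = failures per 1e9 hours of operation, so MTBF = 1e9 / FIT hours.
 * The values below are the ones quoted in the question and answer above. */
#include <stdio.h>

static void report(const char *label, double fit)
{
    double mtbf_hours = 1e9 / fit;
    double mtbf_years = mtbf_hours / (24.0 * 365.0);
    printf("%-28s -> MTBF ~%.2e hours (~%.1e years)\n", label, mtbf_hours, mtbf_years);
}

int main(void)
{
    report("FIT = 1 (reference)",       1.0);
    report("FIT = 0.0026 (answer)",     0.0026);
    report("FIT = 5e-8 (question)",     5e-8);
    return 0;
}

Even a FIT of 1, the “acceptable” threshold mentioned above, already corresponds to roughly one link failure per 114,000 years, which is why a value of 0.0026 disappears into the overall silicon FIT budget.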

Q6: Do CXL bundled ports mean that, instead of seeing two different CXL ports, software detects a single device? Are different traffic types sent on each link in the bundled port?

[Response] There are two different views. The bus enumeration software will see multiple CXL endpoints (SLD-B* instances), but all SLD-B instances are managed as a single accelerator entity by the device driver. To the applications, it would appear like a single accelerator. The Bundled Port Device (BPD) will spray traffic across all the ports, thus getting a boost in effective bandwidth. The BPD driver may provide hints to the BPD as to how to interleave traffic.
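
As a purely conceptual sketch (not spec pseudocode), the “spraying” can be pictured as the BPD or its driver distributing independent requests across the member links, for example round-robin:

/* Conceptual sketch only (not from the specification): a bundled-port
 * device distributing independent requests round-robin across its member
 * links so aggregate bandwidth scales with the number of ports.  Each
 * request still travels, whole, over exactly one port. */
#include <stdio.h>

#define NUM_PORTS 4   /* hypothetical bundle width */

struct request { unsigned long addr; unsigned len; };

static void submit_on_port(int port, const struct request *r)
{
    /* stand-in for issuing the transaction on one CXL link */
    printf("port %d <- addr 0x%lx len %u\n", port, r->addr, r->len);
}

int main(void)
{
    struct request reqs[] = {
        {0x1000, 64}, {0x2000, 64}, {0x3000, 64},
        {0x4000, 64}, {0x5000, 64}, {0x6000, 64},
    };
    int n = sizeof reqs / sizeof reqs[0];

    for (int i = 0; i < n; i++)
        submit_on_port(i % NUM_PORTS, &reqs[i]);   /* simple round-robin spray */

    return 0;
}

Any hints the driver gives the BPD would only change how requests are assigned to ports; each request still travels whole over a single port (see Q10 below).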

Q7: Are there any plans to extend bundling/streamlined ports to MH-SLDs or MH-MLDs?

[Response] Not at this point, to ensure that investment by vendors in MH-SLD devices is protected.

Q8: Is Switch support necessary for MLD?

[Response] Yes. The MLD device uses a single physical CXL link to communicate with multiple hosts. The switch is responsible for mapping the traffic from individual LDs to the physical CXL link connected to the host.
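
Conceptually (this is an illustration, not a structure defined by the specification), the switch’s job can be pictured as a per-LD lookup that binds each Logical Device to the upstream port of the host that owns it:

/* Conceptual sketch only: an MLD presents several Logical Devices (LDs)
 * over one physical link, and the switch forwards each LD's traffic to
 * the upstream port (host) that was assigned that LD.  The table below is
 * a made-up example, not a spec-defined structure. */
#include <stdio.h>

#define NUM_LDS 4

/* ld_to_host_port[ld_id] = switch upstream port bound to that LD */
static const int ld_to_host_port[NUM_LDS] = { 0, 0, 1, 2 };

static void forward_to_host(int ld_id)
{
    printf("LD %d -> upstream port %d\n", ld_id, ld_to_host_port[ld_id]);
}

int main(void)
{
    for (int ld = 0; ld < NUM_LDS; ld++)
        forward_to_host(ld);
    return 0;
}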

Q9: Can we bundle multiple Type 3 SLD devices, in case someone does not have an MH-SLD device?

[Response] The primary reason for bundling is to be able to interleave traffic across multiple CXL links. Since Type 3 SLD memory expansion devices are often interleaved, you could argue that they already get equivalent benefit.
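
For illustration, standard CXL-style interleaving already spreads a host address range across several Type 3 expanders by selecting the target device from address bits above the interleave granularity. The ways and granularity below are example values, not taken from any particular platform:

/* Illustrative sketch of memory interleaving across Type 3 expanders:
 * the target device is selected by a few address bits above the
 * interleave granularity.  Example values only. */
#include <stdio.h>

int main(void)
{
    const unsigned ways       = 4;    /* 4 devices interleaved */
    const unsigned gran_bytes = 256;  /* 256B interleave granularity */

    unsigned long addrs[] = {0x0000, 0x0100, 0x0200, 0x0300, 0x0400};
    for (unsigned i = 0; i < sizeof addrs / sizeof addrs[0]; i++) {
        unsigned dev = (unsigned)((addrs[i] / gran_bytes) % ways);
        printf("host address 0x%04lx -> device %u\n", addrs[i], dev);
    }
    return 0;
}

This is why plain SLD memory expanders can already see bandwidth scale across multiple links without needing the bundled-port construct.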

Q10: For bundled ports, how is a single Flit (256B) sent (“spread”) across multiple ports?

[Response] Each port is an independent link, so a Flit is not spread across multiple ports; each port has its own independent set of Flits.

 

* SLD-B: an SLD (Single Logical Device) exposed by each port of a BPD (Bundled Port Device).
