

# Compute Express Link<sup>TM</sup> (CXL<sup>TM</sup>): A Coherent Interface for Ultra-High-Speed Transfers

Presenter Name/Date

# Agenda



- Industry Landscape
- Compute Express Link<sup>™</sup> Overview
- Introducing CXL<sup>™</sup> Consortium
- CXL<sup>™</sup> Consortium Membership
- Summary



### Why is a New Class of Interconnect Needed?



- Industry mega-trends are driving demand for faster data processing and next-generation data center performance:
  - Proliferation of Cloud Computing
  - Growth of Artificial Intelligence and Analytics
  - Cloudification of the Network and Edge
- Need a new class of interconnect for heterogeneous computing and disaggregation usages:
  - Efficient resource sharing
  - Shared memory pools with efficient access mechanisms
  - Enhanced movement of operands and results between accelerators and target devices
  - Significant latency reduction to enable disaggregated memory
- The industry needs open standards that can comprehensively address next-gen interconnect challenges



#### Today's Environment



**CXL Enabled Environment** 

### **CXL** Overview



- New breakthrough high-speed CPU-to-Device interconnect
  - Enables a high-speed, efficient interconnect between the CPU and platform enhancements and workload accelerators
  - Builds upon PCI Express® infrastructure, leveraging the PCIe® 5.0 physical and electrical interface
  - Maintains memory coherency between the CPU memory space and memory on attached devices
    - Allows resource sharing for higher performance
    - Reduced complexity and lower overall system cost
    - Permits users to focus on target workloads as opposed to redundant memory management
- Delivered as an open industry standard
  - CXL Specification 2.0 is available now
  - Future CXL Specification generations will continue to innovate to meet industry needs

# Introducing CXL



- Processor Interconnect:
  - Open industry standard
  - High-bandwidth, low-latency
  - Coherent interface
  - Leverages PCI Express®
  - Targets high-performance computational workloads
    - Artificial Intelligence
    - Machine Learning
    - HPC
    - Comms



A new class of interconnect for device connectivity

### What is CXL?



- Alternate protocol that runs across the standard PCIe physical layer
- Uses a flexible processor port that can auto-negotiate to either the standard PCIe transaction protocol or the alternate CXL transaction protocols
- First generation CXL aligns to 32 GT/s PCIe 5.0
- CXL usages expected to be key driver for an aggressive timeline to PCIe 6.0
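
As a rough worked example of what aligning to PCIe 5.0 implies, the sketch below estimates raw per-direction link bandwidth. It assumes 32 GT/s per lane and the 128b/130b line encoding used by PCIe 5.0, and it ignores flit and protocol overhead. The function name and the link widths shown are illustrative choices, not taken from the slides.

```python
def raw_link_bandwidth_gbytes(lanes: int, gt_per_s: float = 32.0) -> float:
    """Approximate per-direction bandwidth in GB/s, before protocol overhead."""
    encoding_efficiency = 128 / 130  # 128b/130b line encoding (assumed, per PCIe 5.0)
    gbits_per_s = lanes * gt_per_s * encoding_efficiency
    return gbits_per_s / 8  # convert gigabits to gigabytes


if __name__ == "__main__":
    for width in (4, 8, 16):
        bandwidth = raw_link_bandwidth_gbytes(width)
        print(f"x{width}: {bandwidth:.1f} GB/s per direction")
```

Under these assumptions a x16 link works out to roughly 63 GB/s in each direction; delivered throughput would be lower once framing and protocol overhead are counted.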



### CXL Protocols



The CXL transaction layer is composed of three dynamically multiplexed sub-protocols on a single link:

- **CXL.io:** Discovery, configuration, register access, interrupts, etc.
- **CXL.cache:** Device access to processor memory
- **CXL.memory:** Processor access to device-attached memory
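
The division of labor among the three sub-protocols can be sketched as a small routing table. This is only a toy model: the transaction kind names (`"discovery"`, `"device_to_host_memory"`, and so on) and the `route` and `multiplex` helpers are hypothetical, chosen to mirror the roles listed above, and say nothing about the actual flit-level multiplexing.

```python
from enum import Enum


class SubProtocol(Enum):
    IO = "CXL.io"
    CACHE = "CXL.cache"
    MEMORY = "CXL.memory"


# Hypothetical transaction kinds, named after the roles on the slide.
ROUTES = {
    "discovery": SubProtocol.IO,
    "configuration": SubProtocol.IO,
    "register_access": SubProtocol.IO,
    "interrupt": SubProtocol.IO,
    "device_to_host_memory": SubProtocol.CACHE,   # device access to processor memory
    "host_to_device_memory": SubProtocol.MEMORY,  # processor access to device memory
}


def route(kind: str) -> SubProtocol:
    """Return the sub-protocol that would carry a transaction of this kind."""
    try:
        return ROUTES[kind]
    except KeyError:
        raise ValueError(f"unknown transaction kind: {kind}") from None


def multiplex(kinds):
    """Tag each transaction with its sub-protocol, keeping arrival order on one link."""
    return [(route(kind).value, kind) for kind in kinds]


if __name__ == "__main__":
    for tagged in multiplex(["discovery", "device_to_host_memory", "host_to_device_memory"]):
        print(tagged)
```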







#### **CXL** Stack

#### Designed for Low Latency



- All 3 representative usages have latency critical elements:
  - CXL.cache
  - CXL.memory
  - CXL.io
- CXL cache and memory stack is optimized for latency:
  - Separate transaction and link layer from IO
  - Fixed message framing
- CXL.io flows pass through a stack that is largely identical to a standard PCIe stack:
  - Dynamic framing
  - Transaction Layer Packet (TLP)/Data Link Layer Packet (DLLP) encapsulated in CXL flits

#### CXL Stack – Low latency Cache and Mem Transactions



### Alternate Stack – for contrast






# Asymmetric Complexity



#### CCI\* Model – Symmetric CCI Protocol



\*Cache Coherent Interface

#### CXL Model - Asymmetric Protocol



#### CXL Key Advantages:

- Avoid protocol interoperability hurdles/roadblocks
- Enable devices across multiple segments (e.g. client / server)
- Enable Memory buffer with no coherency burden
- Simpler, processor independent device development

### CXL's Coherence Bias









The critical access class for accelerators is "device engine to device memory".

"Coherence Bias" allows a device engine to access its memory coherently without visiting the processor.

### Two driver managed modes or "Biases"

HOST BIAS: pages being used by the host or shared between host and device

DEVICE BIAS: pages being used exclusively by the device

### Both biases guaranteed correct/coherent

Guarantee applies even when software bugs or speculative accesses unexpectedly access device memory in the "Device Bias" state.
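
The two biases can be pictured with a toy per-page model. The sketch below is an illustration under stated assumptions, not the mechanism defined by the specification: the `BiasTable` class, the default of host bias for untracked pages, and the path labels are all hypothetical.

```python
from enum import Enum


class Bias(Enum):
    HOST = "host"
    DEVICE = "device"


class BiasTable:
    """Toy per-page bias tracker. Assumption: untracked pages are in host bias."""

    def __init__(self):
        self._bias = {}

    def get(self, page: int) -> Bias:
        return self._bias.get(page, Bias.HOST)

    def set(self, page: int, bias: Bias) -> None:
        # Stands in for the driver-managed mode switch described on the slide.
        self._bias[page] = bias


def device_access_path(table: BiasTable, page: int) -> list:
    """Hops taken when the device engine touches a page of its own memory."""
    if table.get(page) is Bias.DEVICE:
        # Device bias: the engine reaches its memory without visiting the processor.
        return ["device engine", "device memory"]
    # Host bias: the access is resolved through the host to stay coherent.
    return ["device engine", "host coherence check", "device memory"]


if __name__ == "__main__":
    table = BiasTable()
    print(device_access_path(table, 7))
    table.set(7, Bias.DEVICE)
    print(device_access_path(table, 7))
```

The point of the model is the shorter path in device bias, which is why pages used exclusively by the device are worth switching.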





### CXL 2.0 Switching Benefits – Expansion







### CXL 2.0 Switching Benefits – Pooling



# Memory/Accelerator Pooling with Single Logical Devices



# Memory Pooling with Multiple Logical Devices



### CXL 2.0 Benefits and Persistent Memory



Moves Persistent Memory from Controller to CXL

Enables Standardized Management of the Memory and Interface

Supports a Wide Variety of Industry Form Factors



### CXL 2.0 Benefits – Security



CXL 2.0 provides Integrity and Data Encryption of traffic across all entities (Root Complex, Switch, Device)







# Representative CXL Usages

### Caching Devices / Accelerators (Type 1)

- **Protocols:** CXL.io, CXL.cache
- **Usages:** PGAS NIC, NIC atomics





# Heterogeneous Computing Revisited



- CXL enables a more fluid and flexible memory model
- Single, common, memory address space across processors and devices



# CXL Summary



 CXL has the right features and architecture to enable a broad, open ecosystem for heterogeneous computing and server disaggregation:

#### Coherent Interface:

Leverages PCIe® with 3 mix-and-match protocols

#### Low Latency:

CXL.cache and CXL.memory targeted at near-CPU cache-coherent latency

#### **Asymmetric Complexity:**

Eases burdens of cache coherent interface designs

#### Open Industry Standard:

With growing broad industry support

- CXL 2.0 introduces new features & usage models
  - Switching, pooling, persistent memory support, security
  - Fully backward compatible with CXL 1.1 and 1.0
  - Built-in Compliance & Interop program


























CXL Board of Directors



Industry Open Standard for **High Speed Communications** 

150+ Member Companies



# Introducing CXL<sup>TM</sup> Consortium



- Alibaba, Cisco, Dell EMC, Facebook, Google, Hewlett Packard Enterprise, Huawei, Intel Corporation and Microsoft announced their intent to incorporate in March 2019
- This core group announced incorporation of the Compute Express Link (CXL) Consortium on September 17, 2019 and unveiled the names of its newly elected, expanded Board of Directors:
  - Jim Pappas, Intel (Chairman)
  - Barry McAuliffe, HPE (President)
  - Kurtis Bowman, Dell (Secretary)
  - Christian Petersen, Facebook (Treasurer)
  - Di Xu, Alibaba
  - Nathan Kalyanasundharam, AMD
  - Dong Wei, Arm
  - Sagar Borikar, Cisco
  - Robert Sprinkle, Google
  - Alex Umansky, Huawei
  - Steve Fields, IBM
  - Larrie Carr, Microchip Technology
  - Leendert van Doorn, Microsoft
  - Gaurav Singh, Xilinx
- CXL Consortium Work Groups:
  - 5 Technical (Protocol, PHY, Software & Systems, Memory, Compliance) and Marketing
# Industry Liaisons









#### **DMTF**

As part of DMTF's Alliance Partner program, the organization and the Compute Express Link™ (CXL™) Consortium agreed to a new work register, which outlines areas of technical collaboration between the two organizations. DMTF (formerly known as the Distributed Management Task Force) creates open manageability standards spanning diverse emerging and traditional IT infrastructures including cloud, virtualization, network, servers and storage. Member companies and alliance partners worldwide collaborate on standards to improve the interoperable management of information technologies. Internationally recognized by ANSI and ISO, DMTF standards enable a more integrated and cost-effective approach to management through interoperable solutions. Simultaneous development of Open Source and Open Standards is made possible by DMTF, which has the support, tools, and infrastructure for efficient development and collaboration.

#### Gen-Z Consortium

The Compute Express Link (CXL) Consortium and Gen-Z Consortium executed a Memorandum of Understanding (MOU) describing a mutual plan for collaboration between the two organizations. The agreement shows the commitment each organization is making to promote interoperability between the technologies, while leveraging and further developing complementary capabilities of each technology. Founded in 2017, the Gen-Z Consortium is an industry organization developing an open systems fabric-based architecture designed to provide high-speed, low-latency, secure access to data and devices. The Gen-Z Consortium recently showcased live demonstrations of remote memory attached across a Gen-Z link. The Consortium is nearly 70 members strong and has released 11 specifications to the public, including the Core Specification 1.1.

#### Storage Networking Industry Association (SNIA)

The Storage Networking Industry Association (SNIA) and the CXL™ Consortium have formed a strategic alliance to educate interested individuals, including developers, implementors, and end users, about their technologies and to support subsequent adoption. Together the CXL Consortium and SNIA will focus on joint marketing efforts, specifically between the CXL Marketing Work Group and the SNIA Compute, Memory, and Storage Initiative (CMSI) Marketing Committee.

### Call to Action



- To join the CXL Consortium, visit www.computeexpresslink.org/join
- Download an evaluation copy of the CXL 2.0 specification
- Engage with us on social media









# Thank You



