Introduction
The Linux compute accelerators subsystem is designed to expose compute accelerators to user-space in a common way and to provide a common set of functionality.
These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU. Although these devices are typically designed to accelerate Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer is not limited to handling these types of accelerators.
Typically, a compute accelerator will belong to one of the following categories:
- Edge AI - doing inference on an edge device. It can be an embedded ASIC/FPGA, or an IP block inside an SoC (e.g. a laptop web camera). These devices are typically configured using registers and can work with or without DMA.
- Inference data-center - single/multi-user devices in a large server. This type of device can be stand-alone or an IP block inside an SoC or a GPU. It will have on-board DRAM (to hold the DL topology), DMA engines and command submission queues (either kernel or user-space queues). It might also have an MMU to manage multiple users and might also enable virtualization (SR-IOV) to support multiple VMs on the same device. In addition, these devices will usually ship with tools such as a profiler and a debugger.
- Training data-center - similar to inference data-center cards, but typically with more computational power and memory bandwidth (e.g. HBM), and likely with a method of scaling up/out, i.e. connecting to other training cards inside the same server or in other servers, respectively.
All these devices typically have different runtime user-space software stacks that are tailor-made for their hardware. In addition, they will probably also include a compiler to generate programs for their custom-made computational engines. Typically, the common layer in user-space will be the DL frameworks, such as PyTorch and TensorFlow.
Differentiation from GPUs
Because we want to prevent the extensive user-space graphics software stack from trying to use an accelerator as a GPU, compute accelerators are differentiated from GPUs by using a new major number and new device char files.
Furthermore, the drivers will be located in a separate place in the kernel tree - drivers/accel/.
The accelerator devices are exposed to user-space with the dedicated major number 261 and follow this convention:
- device char files - /dev/accel/accel* 
- sysfs - /sys/class/accel/accel*/ 
- debugfs - /sys/kernel/debug/accel/*/ 
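For illustration, here is a minimal user-space sketch that opens the first accelerator's char file. The index 0 is an assumption; real code would discover the right node, for instance by scanning /sys/class/accel/::

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* Assumes the first accelerator was enumerated as accel0. */
            int fd = open("/dev/accel/accel0", O_RDWR);

            if (fd < 0) {
                    perror("open /dev/accel/accel0");
                    return 1;
            }

            /* Driver-specific ioctls would be issued here. */
            close(fd);
            return 0;
    }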
Getting Started
First, read the DRM documentation at GPU Driver Developer's Guide. Not only will it explain how to write a new DRM driver, but it also contains all the information on how to contribute, the Code of Conduct, and the coding style and documentation guidelines. All of that applies to the accel subsystem as well.
Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
To expose your device as an accelerator, two changes need to be made in your driver (compared to a standard DRM driver):
- Add the DRIVER_COMPUTE_ACCEL feature flag to your drm_driver's driver_features field. It is important to note that this driver feature is mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want to expose both graphics and compute device char files should be handled by two drivers that are connected using the auxiliary bus framework.
- Change the open callback in your driver fops structure to accel_open(). Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily set up the correct file operations structure (see the sketch after this list).
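As a minimal sketch, the accel-specific parts of such a driver could look like the following. The foo_accel names are hypothetical and probe/registration code is elided, but DEFINE_DRM_ACCEL_FOPS and DRIVER_COMPUTE_ACCEL come from drm/drm_accel.h and drm/drm_drv.h::

    #include <drm/drm_accel.h>
    #include <drm/drm_drv.h>

    /* Expands to a file_operations structure wired up with accel_open()
     * and the standard DRM file callbacks. */
    DEFINE_DRM_ACCEL_FOPS(foo_accel_fops);

    static const struct drm_driver foo_accel_driver = {
            /* Mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. */
            .driver_features = DRIVER_COMPUTE_ACCEL,
            .fops            = &foo_accel_fops,
            .name            = "foo_accel",
            .desc            = "Hypothetical compute accelerator",
    };

    /* The device is then allocated and registered as for any DRM driver,
     * e.g. with devm_drm_dev_alloc() and drm_dev_register() from the PCI
     * probe callback; this requires CONFIG_DRM_ACCEL to be enabled. */

With this in place, the device is exposed as /dev/accel/accel* rather than as a DRM render node.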
External References
email threads
- Initial discussion on the New subsystem for acceleration devices - Oded Gabbay (2022) 
- patch-set to add the new subsystem - Oded Gabbay (2022) 
Conference talks
- LPC 2022 Accelerators BOF outcomes summary - Dave Airlie (2022)