
Linux DMA function and interface analysis, do you understand?

1. Introduction

Intuitively, DMA is not a complicated thing, and what it has to do is simple and straightforward, so the Linux kernel's abstraction and implementation ought to be equally concise and easy to understand. The reality (a personal feeling) is less rosy: the implementation of the Linux kernel dmaengine framework really is a bit obscure. Why is that?

If a software module is complicated and obscure, either the designer's skill was insufficient, or the requirements demanded it. Of course, we dare not harbor the slightest suspicion of or disrespect toward the great gods of the Linux kernel, so we can only dig into the requirements: does the way drivers in the Linux kernel use DMA go beyond our everyday understanding?

These questions are not hard to answer: it is enough to sort out the functions and APIs that the dmaengine framework provides to consumers, which is exactly the purpose of this article. Along the way, you can also deepen your understanding of DMA, so that you will be more at ease when writing drivers that need DMA transfers.

2. Slave-DMA API and Async TX API

In terms of direction, DMA transfers fall into four categories: memory to memory, memory to device, device to memory, and device to device. From the perspective of the Linux kernel (acting as the CPU's agent), all peripherals are slaves, so the transfers that involve a device (MEM2DEV, DEV2MEM, DEV2DEV) are called Slave-DMA transfers, while the remaining memory-to-memory kind is called Async TX.

Why emphasize this distinction? Because, to make DMA-based memcpy, memset and similar operations convenient, Linux wraps a more concise API on top of the dma engine (as shown in Figure 1 below). This is the Async TX API (its functions start with async_, e.g. async_memcpy, async_memset, async_xor, etc.).

Figure 1 Schematic diagram of DMA Engine API

Finally, because memory-to-memory DMA transfers have this more concise API and thus rarely need to use the dma engine's own API directly, the API provided by the dma engine is referred to specifically as the Slave-DMA API (i.e. excluding mem2mem).

This article mainly introduces the functions and APIs that the dma engine provides to consumers, so the Async TX API is not covered further (for details, please refer to subsequent articles on this site).

Note 1: The "slave" in Slave-DMA refers to the device participating in the DMA transfer; correspondingly, the "master" is the DMA controller itself. Be sure to grasp this notion of "slave" in order to follow the related terms and logic in the kernel dma engine.

3. Steps of using dma engine

Note 2: Most of the content of this article is translated from the kernel documentation [1]; readers who prefer to read the English original can refer to it directly.

For device driver writers, to perform DMA transfer based on the Slave-DMA API provided by dma engine, the following steps are required:

1) Apply for a DMA channel.

2) According to the characteristics of the device (slave), configure the parameters of the DMA channel.

3) When a DMA transfer is to be performed, obtain a descriptor that identifies this transfer (transaction).

4) Submit the current transfer (transaction) to the dma engine and start the transfer.

5) Wait for the transfer (transaction) to end.

Then repeat steps 3 to 5.

Except for step 3, which is a little harder to grasp, the above five steps are fairly intuitive and easy to understand. The following sections introduce each of them in detail.

3.1 Apply for DMA channel

Before starting any DMA transfer, a consumer (the documentation [1] calls it a client; it may also be called a slave driver, the meanings are similar and this article does not distinguish between them) must first request a DMA channel (for the concept of a DMA channel, see the introduction in [2]).

The DMA channel (represented in the kernel by the "struct dma_chan" data structure) is provided by the provider (the DMA controller) and used by the consumer (client). The consumer does not need to care about the contents of this data structure (we will cover it in detail when introducing the dmaengine provider).

Consumers can apply for DMA channels through the following APIs:

struct dma_chan *dma_request_chan(struct device *dev, const char *name);

This interface returns the dma channel named name that is bound to the specified device (dev). The provider and consumer sides of the dma engine can establish this binding through device tree, ACPI, or a match table of type struct dma_slave_map. For details, please refer to the introduction in chapter XXXX.

Finally, when the requested dma channel is no longer needed, it can be released through the following API:

void dma_release_channel(struct dma_chan *chan);
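As a quick illustration, here is a minimal, hypothetical sketch of requesting and releasing a channel from a platform driver (the pdev device pointer and the channel name "tx" are assumptions, not from the original text):

/* Minimal sketch: request the channel that DT/ACPI/dma_slave_map
 * binds to this device under the (hypothetical) name "tx". */
struct dma_chan *chan;

chan = dma_request_chan(&pdev->dev, "tx");
if (IS_ERR(chan))
        return PTR_ERR(chan); /* may legitimately be -EPROBE_DEFER */

/* ... configure and use the channel (sections 3.2 to 3.5) ... */

dma_release_channel(chan);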

3.2 Configure the parameters of the DMA channel

After the driver has requested a DMA channel for itself, it needs to configure the channel according to its own situation and the capabilities of the DMA controller. The configurable content is represented by the struct dma_slave_config data structure (see section 4.1 for details). After filling a struct dma_slave_config variable, the driver can call the following API to hand this information to the DMA controller:

int dmaengine_slave_config(struct dma_chan *chan, struct dma_slave_config *config);
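For instance, a sketch of configuring a channel for memory-to-device transfers; the FIFO address, bus width and burst size below are made-up values for illustration only:

/* Hypothetical mem-to-dev setup; all values are examples. */
struct dma_slave_config cfg = { };
int ret;

cfg.direction      = DMA_MEM_TO_DEV;
cfg.dst_addr       = 0x40011004;                 /* made-up device FIFO address */
cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES; /* device takes 32-bit writes */
cfg.dst_maxburst   = 8;                          /* at most 8 words per burst */

ret = dmaengine_slave_config(chan, &cfg);
if (ret)
        return ret;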

3.3 Obtain the transfer descriptor (tx descriptor)

DMA transfers are asynchronous. Before a transfer starts, the slave driver needs to hand the transfer's information (src/dst buffers, transfer direction, etc.) to the dma engine (in essence, the dma controller driver); once the dma engine accepts it, it returns a descriptor (abstracted as struct dma_async_tx_descriptor). From then on, the slave driver uses this descriptor as the unit for controlling and tracking the transfer.

For the struct dma_async_tx_descriptor data structure, see the introduction in section 4.2. Depending on the transfer type, the slave driver can obtain a transfer descriptor with one of the following three APIs (for details, see the description in Documentation/dmaengine/client.txt [1]):

struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
        struct dma_chan *chan, struct scatterlist *sgl,
        unsigned int sg_len, enum dma_data_direction direction,
        unsigned long flags);

struct dma_async_tx_descriptor *dmaengine_prep_dma_cyclic(
        struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
        size_t period_len, enum dma_data_direction direction);

struct dma_async_tx_descriptor *dmaengine_prep_interleaved_dma(
        struct dma_chan *chan, struct dma_interleaved_template *xt,
        unsigned long flags);

dmaengine_prep_slave_sg is used for DMA transfers between a "scatter gather buffers" list and a bus device. Its parameters are as follows (a usage sketch follows the parameter list):

Note 3: We have touched on scatterlist in [2] and [3]; a dedicated follow-up article will introduce it, so it is not described here for now.

chan, the dma channel used for this transfer;

sgl, the address of the "scatter gather buffers" array to transfer;

sg_len, the length of that array;

direction, the direction of the data transfer; for details, see the definition of enum dma_data_direction (include/linux/dma-direction.h);

flags, used to pass additional information to the dma controller driver, including (for details, see the DMA_PREP_-prefixed definitions in enum dma_ctrl_flags):

DMA_PREP_INTERRUPT, tells the DMA controller driver to raise an interrupt after the transfer completes and invoke the callback function provided by the client (the client can provide the callback by filling in the relevant fields of the struct dma_async_tx_descriptor returned by these functions; see section 4.2 for details);

DMA_PREP_FENCE, tells the DMA controller driver that a subsequent transfer depends on the result of this one (so the controller driver will carefully order the transfers among multiple dma transfers);

DMA_PREP_PQ_DISABLE_P, DMA_PREP_PQ_DISABLE_Q, DMA_PREP_CONTINUE, PQ-related operations, TODO.
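Here is a sketch of obtaining a descriptor for a mem-to-dev scatter-gather transfer. It assumes sgl/sg_len were already mapped with dma_map_sg(); note that current kernel headers declare the direction parameter as enum dma_transfer_direction, whose DMA_MEM_TO_DEV value is used below; my_dma_done and my_data are hypothetical names:

/* Sketch: sgl/sg_len are assumed to be DMA-mapped already. */
struct dma_async_tx_descriptor *desc;

desc = dmaengine_prep_slave_sg(chan, sgl, sg_len,
                               DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
if (!desc)
        return -ENOMEM;

/* Optional completion callback (see section 4.2). */
desc->callback       = my_dma_done; /* hypothetical client callback */
desc->callback_param = my_data;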

dmaengine_prep_dma_cyclic is often used in audio and similar scenarios: during a dma transfer over a buffer of a given length (buf_addr & buf_len), every time a certain number of bytes (period_len) has been transferred, the transfer-complete callback is invoked. Its parameters include (see the sketch after this list):

chan, the dma channel used for this transfer;

buf_addr, buf_len, the buffer address and length of the transfer;

period_len, how often (in bytes) the callback function is invoked; note that buf_len should be an integer multiple of period_len;

direction, the direction of the data transfer.
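A sketch of the audio-style cyclic case follows; buf is assumed to be the DMA address of a ring buffer holding four periods, PERIOD_BYTES is a made-up constant, and note that recent kernels add a trailing flags argument to dmaengine_prep_dma_cyclic:

/* Sketch: cyclic transfer over a 4-period ring buffer (made-up sizes). */
struct dma_async_tx_descriptor *desc;

desc = dmaengine_prep_dma_cyclic(chan, buf, 4 * PERIOD_BYTES,
                                 PERIOD_BYTES, DMA_MEM_TO_DEV,
                                 DMA_PREP_INTERRUPT);
if (!desc)
        return -ENOMEM;

desc->callback       = my_period_done; /* hypothetical: runs once per period */
desc->callback_param = my_data;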

dmaengine_prep_interleaved_dma can perform non-contiguous, interleaved DMA transfers and is usually used in image processing, display, and similar scenarios. For details, refer to the definition and comments of the struct dma_interleaved_template structure (not introduced in detail here; it can be studied when actually needed).

3.4 Initiate transfer

After obtaining a transfer descriptor through the APIs described in section 3.3, the client driver can put the descriptor on the transfer queue with the dmaengine_submit interface and then call the dma_async_issue_pending interface to start the transfer (see the sketch at the end of this section).

The prototype of dmaengine_submit is as follows:

dma_cookie_t dmaengine_submit(struct dma_async_tx_descriptor *desc);

Its parameter is the transfer descriptor pointer; it returns a cookie that uniquely identifies the descriptor and can be used for subsequent tracking and monitoring.

The prototype of dma_async_issue_pending is as follows:

void dma_async_issue_pending(struct dma_chan *chan);

The parameter is dma channel, and there is no return value.

Note 4: Judging from the characteristics of these two APIs, the kernel dma engine encourages client drivers to submit several transfers at a time and then let the kernel (the dma controller driver) complete them in one batch.
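Putting the two calls together, a sketch (desc obtained as in section 3.3):

dma_cookie_t cookie;

/* Queue the descriptor; the transfer has not started yet. */
cookie = dmaengine_submit(desc);
if (dma_submit_error(cookie))
        return -EINVAL;

/* Flush the pending queue: only now does the hardware start moving data. */
dma_async_issue_pending(chan);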

3.5 Waiting for the end of the transfer

After the transfer request has been submitted, the client driver can learn of its completion through the callback function, or use APIs such as dma_async_is_tx_complete to poll whether the transfer has finished. No further details here.

Finally, if you cannot afford to keep waiting, you can use APIs such as dmaengine_pause, dmaengine_resume, and dmaengine_terminate_xxx to pause or terminate the transfer; for details, see the kernel documentation [1] and the source code.
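To illustrate both approaches, here is a sketch that waits on a completion signalled from the descriptor callback (falling back to terminating the channel on timeout) and, alternatively, polls the cookie. The names are hypothetical; done is a struct completion initialized with init_completion() and passed as callback_param, and dmaengine_terminate_sync() is one concrete spelling of dmaengine_terminate_xxx:

/* Hypothetical callback, set on the descriptor before dmaengine_submit(). */
static void my_dma_done(void *param)
{
        complete(param); /* param points at a struct completion */
}

/* Waiting side, after dma_async_issue_pending(chan): */
if (!wait_for_completion_timeout(&done, msecs_to_jiffies(1000))) {
        dmaengine_terminate_sync(chan); /* give up and stop the channel */
        return -ETIMEDOUT;
}

/* Alternatively, poll the cookie returned by dmaengine_submit(): */
while (dma_async_is_tx_complete(chan, cookie, NULL, NULL) != DMA_COMPLETE)
        cpu_relax();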

4. Important data structure description

4.1 struct dma_slave_config

This structure contains all the parameters that may be needed to complete a DMA transfer. It is defined as follows (a filled-in example appears at the end of this section):

/* include/linux/dmaengine.h */
struct dma_slave_config {
        enum dma_transfer_direction direction;
        phys_addr_t src_addr;
        phys_addr_t dst_addr;
        enum dma_slave_buswidth src_addr_width;
        enum dma_slave_buswidth dst_addr_width;
        u32 src_maxburst;
        u32 dst_maxburst;
        bool device_fc;
        unsigned int slave_id;
};

direction, the direction of the transfer, one of (for details, see the definition and comments of enum dma_transfer_direction):

DMA_MEM_TO_MEM, memory to memory transfer;

DMA_MEM_TO_DEV, memory to device transfer;

DMA_DEV_TO_MEM, device to memory transfer;

DMA_DEV_TO_DEV, device-to-device transfer.

Note 5: A controller does not necessarily support all DMA transfer directions; it depends on the provider's implementation.

Note 6: As explained in chapter 2, MEM-to-MEM transfers generally do not use the dma engine API directly.

src_addr, when the transfer direction is dev2mem or dev2dev, the location from which data is read (usually a fixed FIFO address). For channels of type mem2dev, this parameter does not need to be configured (the address is specified per transfer);

dst_addr, when the transfer direction is mem2dev or dev2dev, the location to which data is written (usually a fixed FIFO address). For channels of type dev2mem, this parameter does not need to be configured (the address is specified per transfer);

src_addr_width, dst_addr_width, the width of src/dst address, including 1, 2, 3, 4, 8, 16, 32, 64 (bytes), etc. (for details, please refer to the definition of enum dma_slave_buswidth).

src_maxburst, dst_maxburst, the maximum burst size that src/dst can transfer (see the introduction to burst size in [2]); the unit is src_addr_width/dst_addr_width (note: not bytes).

device_fc, set to true when the peripheral is the flow controller. In the hardware design connecting the DMA controller to a peripheral, the module that decides when a DMA transfer ends is called the flow controller; either the DMA controller or the peripheral can play this role, depending on how they are designed. The principles and signal connections are not described in detail here (interested readers can refer to the introduction in [4]).

slave_id, the identifier with which the peripheral tells the dma controller who it is (generally corresponding to a request line). Many dma controllers do not distinguish between slaves: given src, dst, and len they can transfer, so slave_id can be ignored. Other controllers must know exactly which peripheral a transfer targets and require a slave_id (how it is provided depends on the dma controller's hardware and driver and must be handled per scenario).
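For contrast with the mem-to-dev example in section 3.2, here is a sketch of the device-to-memory direction (e.g. a hypothetical UART RX channel); all values are made up:

/* Hypothetical dev-to-mem (RX) configuration; values are examples. */
struct dma_slave_config cfg = { };

cfg.direction      = DMA_DEV_TO_MEM;
cfg.src_addr       = 0x40011000;                /* made-up RX FIFO address */
cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE; /* 8-bit peripheral register */
cfg.src_maxburst   = 1;                         /* one byte per request */
cfg.device_fc      = false;                     /* DMA controller acts as flow controller */

return dmaengine_slave_config(chan, &cfg);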

4.2 struct dma_async_tx_descriptor

A transfer descriptor is used to describe a single DMA transfer (similar in spirit to a file handle). After the client driver submits its transfer request through the APIs introduced in 3.3, the controller driver returns such a descriptor to it.

Having obtained the descriptor, the client driver performs subsequent operations (starting the transfer, waiting for it to finish, etc.) on a per-descriptor basis, and can also hand its own callback function to the controller driver through the descriptor.

The definition of the transfer descriptor is as follows:

struct dma_async_tx_descriptor {
        dma_cookie_t cookie;
        enum dma_ctrl_flags flags; /* not a 'long' to pack with cookie */
        dma_addr_t phys;
        struct dma_chan *chan;
        dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *tx);
        int (*desc_free)(struct dma_async_tx_descriptor *tx);
        dma_async_tx_callback callback;
        void *callback_param;
        struct dmaengine_unmap_data *unmap;
#ifdef CONFIG_ASYNC_TX_ENABLE_CHANNEL_SWITCH
        struct dma_async_tx_descriptor *next;
        struct dma_async_tx_descriptor *parent;
        spinlock_t lock;
#endif
};

cookie, an integer used to track this transfer. Normally, the dma controller driver maintains an incrementing number internally; whenever the client obtains a transfer descriptor (see the introduction in 3.3), the number is assigned to the cookie and then incremented by one.

Note 7: We will introduce the usage scenarios of cookies in detail in subsequent articles.

flags, the DMA_CTRL_-prefixed flags, including:

DMA_CTRL_REUSE, indicating that this descriptor can be reused until it is cleared or released;

DMA_CTRL_ACK, if this flag is 0, the descriptor cannot be reused for the time being.

phys, the physical address of the descriptor?? I don’t understand!

chan, the corresponding dma channel.

tx_submit, a callback function provided by the controller driver that submits the prepared descriptor to the to-be-transferred list. It is usually called by the dma engine; client drivers do not deal with this interface directly.

desc_free, a callback function for releasing the descriptor, provided by the controller driver and called by the dma engine; client drivers do not deal with this interface directly either.

callback, callback_param, the callback function (and its parameters) when the transmission is completed, provided by the client driver.

The client driver does not need to be concerned with the remaining fields, so they are not described for now.

5. Reference Documentation

[1] Documentation/dmaengine/client.txt

[2] Linux DMA Engine framework(1)_Overview

[3] Linux MMC framework(2)_host controller driver

[4] https://forums.xilinx.com/xlnx/attachments/xlnx/ELINUX/10658/1/drivers-session4-dma-4public.pdf

That's all for today's introduction. Do you understand it now?
