A Step-by-Step Guide to Configuring DMA [Shared]

If you have not yet read 《嵌入式内功.葵花宝典》, it is worth doing so before reading this recommended article; you will come away with new insights.

Direct Memory Access

  • DMA ... que?         /* what exactly is DMA */
  • DMA registers        /* the DMA registers */
  • Some DMA routines    /* DMA driver routines */
  • DMA demo             /* a DMA example */

1. DMA … que?

Direct Memory Access (DMA) is a fast way of copying data from one place to another. Or, rather, a way of transferring data fast, since it can be used not only for copying data but also for filling memory. When you activate DMA the so-called DMA controller basically takes over the hardware (the CPU is actually halted), does the desired transfer and hands control back to the CPU before you even knew it was missing.

There are 4 DMA channels. Channel 0 has the highest priority. It is used for time-critical operations and can only be used with internal RAM. Channels 1 and 2 are used to transfer sound data to the right sound buffers for playback. The lowest priority channel, channel 3, is for general purpose copies. One of the primary uses for this channel is loading in new bitmap or tile data.

2. DMA registers

Every kind of transfer routine needs 3 things: a source, a destination and the amount of data to copy. The whence, whither and how much. For DMA, the source address is put into REG_DMAxSAD and the destination address into REG_DMAxDAD. A third register, REG_DMAxCNT, not only indicates the amount to transfer, but also controls other features possible for DMA, like when it should start the transfer, the chunk size, and how the source and destination addresses should be updated after each individual chunk of data. All the DMA registers are 32 bits in length, though they can be divided into two 16-bit registers if so desired. Those of channel 0 start at 0400:00B0h; each subsequent channel starts 12 bytes (0Ch) after the previous one (see table 1).

Table 1: DMA register addresses
reg           function      address
REG_DMAxSAD   source        0400:00B0h + 0Ch·x
REG_DMAxDAD   destination   0400:00B4h + 0Ch·x
REG_DMAxCNT   control       0400:00B8h + 0Ch·x
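
The address rule in table 1 is easy to get wrong by hand, so here is a minimal sketch that computes it. The helper names (dma_sad_addr and friends) are made up for illustration and are not part of any library; only the base address and the 0Ch stride come from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Base address of REG_DMA0SAD and the per-channel stride, from table 1. */
#define DMA_BASE   0x040000B0u
#define DMA_STRIDE 0x0Cu

/* Hypothetical helpers: register address for channel ch (0..3). */
static inline uint32_t dma_sad_addr(unsigned ch)
{
    assert(ch < 4);                       /* there are only 4 channels */
    return DMA_BASE + DMA_STRIDE * ch;    /* source register */
}

static inline uint32_t dma_dad_addr(unsigned ch)
{
    return dma_sad_addr(ch) + 4u;         /* destination register */
}

static inline uint32_t dma_cnt_addr(unsigned ch)
{
    return dma_sad_addr(ch) + 8u;         /* control register */
}
```

For channel 3 this gives 0400:00D4h, 0400:00D8h and 0400:00DCh for source, destination and control.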

2.1. DMA controls

The use of the source and destination registers should be obvious. The control register needs some explaining. Although the REG_DMAxCNT registers themselves are 32bits, they are often split into two separate registers: one for the count, and one for the actual control bits.

REG_DMAxCNT @ 0400:00B8h + 0Ch·x
bits (hex)   1F   1E   1D-1C   1B   1A   19   18-17   16-15   14-10   0F-00
field        En   I    TM      -    CS   R    SA      DA      -       N

bits (hex) name define description
00-0F N  Number of transfers.
15-16 DA _DMA_DST_INC, DMA_DST_DEC, DMA_DST_FIXED, DMA_DST_RESET Destination adjustment.
  • 00: increment after each transfer (default)
  • 01: decrement after each transfer
  • 10: none; address is fixed
  • 11: haven't used it yet, but apparently this will increment the destination during the transfer, and reset it to the original value when it's done.
17-18 SA _DMA_SRC_INC, DMA_SRC_DEC, DMA_SRC_FIXED Source adjustment. Works just like the two bits for the destination. Note that there is no DMA_SRC_RESET; code 3 for source is forbidden.
19 R DMA_REPEAT Repeats the copy at each VBlank or HBlank if the DMA timing has been set to those modes.
1A CS _DMA_16, DMA_32 Chunk Size. Sets DMA to copy by halfword (if clear) or word (if set).
1C-1D TM _DMA_NOW, DMA_AT_VBLANK, DMA_AT_HBLANK, DMA_AT_REFRESH Timing Mode. Specifies when the transfer should start.
  • 00: start immediately.
  • 01: start at VBlank.
  • 10: start at HBlank.
  • 11: Never used it so far, but here's how I gather it works. For DMA1 and DMA2 it'll refill the FIFO when it has been emptied. Count and size are forced to 1 and 32bit, respectively. For DMA3 it will start the copy at the start of each rendering line, but with a 2 scanline delay.
1E I DMA_IRQ Interrupt request. Raise an interrupt when finished.
1F En DMA_ON Enable the DMA transfer for this channel.
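
As a rough sketch, the table above translates into 32-bit defines like the ones below. The values follow directly from the bit positions listed (which are in hex), but the exact spelling of the names is an assumption: the zero-valued defaults are written here without the leading underscore, because identifiers of that form are reserved in C, and dma_control() is a made-up helper, not a library function.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Destination adjustment, bits 15h-16h. */
#define DMA_DST_INC     ((u32)0 << 0x15)
#define DMA_DST_DEC     ((u32)1 << 0x15)
#define DMA_DST_FIXED   ((u32)2 << 0x15)
#define DMA_DST_RESET   ((u32)3 << 0x15)
/* Source adjustment, bits 17h-18h (code 3 is forbidden). */
#define DMA_SRC_INC     ((u32)0 << 0x17)
#define DMA_SRC_DEC     ((u32)1 << 0x17)
#define DMA_SRC_FIXED   ((u32)2 << 0x17)
/* Repeat (bit 19h) and chunk size (bit 1Ah). */
#define DMA_REPEAT      ((u32)1 << 0x19)
#define DMA_16          ((u32)0 << 0x1A)
#define DMA_32          ((u32)1 << 0x1A)
/* Timing mode, bits 1Ch-1Dh. */
#define DMA_NOW         ((u32)0 << 0x1C)
#define DMA_AT_VBLANK   ((u32)1 << 0x1C)
#define DMA_AT_HBLANK   ((u32)2 << 0x1C)
#define DMA_AT_REFRESH  ((u32)3 << 0x1C)
/* Interrupt request (bit 1Eh) and enable (bit 1Fh). */
#define DMA_IRQ         ((u32)1 << 0x1E)
#define DMA_ON          ((u32)1 << 0x1F)

/* Hypothetical helper: combine a transfer count with control flags. */
static inline u32 dma_control(u32 count, u32 flags)
{
    assert(count <= 0xFFFFu);   /* N occupies bits 00h-0Fh only */
    return count | flags;
}
```

For example, dma_control(16, DMA_ON | DMA_AT_HBLANK | DMA_REPEAT) yields A2000010h.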

2.2. Source and destination addresses

The registers for source and destination addresses work just as you'd expect: just put in the proper addresses. Oh, I should tell you that the sizes for the source and destination addresses are 28 and 27 bits wide, respectively, and not the full 32. This is nothing to worry about though: you can't access addresses above 1000:0000h anyway. For destination addresses you can't use the section above 0800:0000h. But then, being able to copy to ROM would be kind of strange, wouldn't it?


2.3. DMA flags

The REG_DMAxCNT registers can be split in two parts: one with actual flags, and one for the number of copies to do. Either way will work but you must be careful how the flags are defined: using 32-bit #defines for 16-bit registers or vice versa is not a good idea.

There are options to control what will be the next source and destination addresses when one chunk has been transferred. By default, both will increment so that it works as a copier. But you could also keep the source constant so that it'd work more as a memory fill.

What goes into the lower half of REG_DMAxCNT is the number of transfers. This is the number of chunks, not bytes! Best be careful when using sizeof() or something similar here, missing a factor 2 or 4 is very easy. A chunk can be either 16 or 32 bit, depending on bit 26.
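
To make the chunks-versus-bytes point concrete, here is a small sketch of a conversion helper. The name dma_chunk_count is hypothetical; the only facts it relies on are that a chunk is 2 or 4 bytes and that the count field is 16 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of DMA chunks for a buffer of `bytes` bytes.
   chunk_bytes is 2 for 16-bit transfers and 4 for 32-bit transfers. */
static inline uint16_t dma_chunk_count(size_t bytes, size_t chunk_bytes)
{
    assert(chunk_bytes == 2 || chunk_bytes == 4);
    assert(bytes % chunk_bytes == 0);        /* no partial chunk */
    assert(bytes / chunk_bytes <= 0xFFFFu);  /* must fit the 16-bit count field */
    return (uint16_t)(bytes / chunk_bytes);
}
```

A 160-entry array of 16-bit colors is 320 bytes, so it needs a count of 160 with 16-bit chunks but only 80 with 32-bit chunks. Passing sizeof() directly as the count would overshoot by a factor of 2 or 4.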


2.4. More on DMA timing

What the immediate DMA does is easy to imagine: it works as soon as you enable the DMA. Well, actually it takes 2 cycles before it sets in, but basically it works immediately. The other timing settings aren't much more difficult conceptually, but there is one point of confusion.

Consider the following situation: you want to do something cool to your otherwise standard backgrounds; specifically, you want to do something that requires the background registers to be updated every scanline. I just said that you can copy data at every HBlank (via the DMA_AT_HBLANK timing flag), which seems perfect for the job. If you think about it for a minute, however, you may ask yourself the following question:

When you set the timing to, say, DMA_AT_HBLANK, does it do all the N copies at the next HBlank, or one copy at each HBlank until the list is done?

There is a crucial difference between the two. The first option seems pointless because all copies would be done at once; if your destination is fixed (as it is for background registers) all copies except the last would be lost. In the case of the second one, how would you do more than one copy per HBlank? Clearly, something's amiss here. There is, on two counts.

For the record, I'm not 100% sure about what I'm going to say here, but I think it's pretty close to what's actually going on. The main thing to realize is that as long as the channel is not enabled (REG_DMAxCNT{1f} is cleared), that channel won't do squat; only after REG_DMAxCNT{1f} has been set will the DMA process be initiated. At the appropriate time (determined by the timing bits) DMA will do all N copies and then shut itself off again.

Unless, that is, the repeat-bit (REG_DMAxCNT{19}) is set. In that case it will keep doing the copies at the right time until you disable the channel yourself.
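
The behaviour described above can be written down as a toy state machine. This is only a model of the description in the text (which itself is hedged), not of the real hardware; DmaModel and both function names are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one channel's state; not real hardware. */
typedef struct {
    bool     enabled;      /* REG_DMAxCNT{1f} */
    bool     repeat;       /* REG_DMAxCNT{19} */
    unsigned count;        /* N, the number of copies per trigger */
    unsigned copies_done;  /* total copies performed so far */
} DmaModel;

/* Called whenever the selected trigger (for example an HBlank) occurs. */
static inline void dma_model_trigger(DmaModel *d)
{
    assert(d != NULL);
    if (!d->enabled)
        return;                  /* a disabled channel does nothing */
    d->copies_done += d->count;  /* all N copies happen at this trigger */
    if (!d->repeat)
        d->enabled = false;      /* one-shot: the channel shuts itself off */
}

/* Total copies after `triggers` trigger events on a freshly enabled channel. */
static inline unsigned dma_model_copies_after(bool repeat, unsigned n,
                                              unsigned triggers)
{
    DmaModel d = { true, repeat, n, 0 };
    for (unsigned i = 0; i < triggers; i++)
        dma_model_trigger(&d);
    return d.copies_done;
}
```

With N = 2 and three HBlanks, the one-shot channel does 2 copies in total, while the repeating channel does 6 and stays enabled until you switch it off.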

3. Some DMA routines

While it's not that much trouble to set the three registers manually, it is preferable to hide the direct interaction in subroutines. For example, you could make something like this:

// Plz don't do this
void dma_copy(int ch, void* src, void* dest, u32 count, u32 mode)
{
    switch(ch)
    {
    case 0:
        // set DMA 0
        break;
    case 1:
        // set DMA 1
        break;
    ... // etc
    }
}

You might find this in older code, but it is a bad way of coding. I mean, a function with 5 arguments and a switch block to fill 3 registers while the whole point of DMA is about speed? This isn't just bad, it's downright perverse. There are several simple and not so simple ways to improve such a function in every aspect.

The first order of business: get rid of the switch-block. As a basic rule: if your cases differ only by a single number (or variable), you're doing something wrong. There are a number of ways of fixing this, but the easiest is by mapping a struct array over the DMA registers, similar to what I did for tile memory. After that, you can just select the channel with the channel variable and simply fill in the addresses and flags.

typedef struct tagDMAREC
{
    const void *src;
    void *dst;
    u32 cnt;
} DMAREC;

#define dma_mem ((volatile DMAREC*)0x040000b0)
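
On the GBA pointers are 32 bits wide, so DMAREC is 12 bytes and element x of dma_mem lines up with channel x. That can be checked on a desktop machine with the sketch below, which swaps the pointer fields for u32 (an assumption made purely so the layout is the same on a 64-bit host) and uses an ordinary array in place of the register block. DMAREC32, fake_regs and the two helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Host-side stand-in for DMAREC: u32 fields instead of pointers, so the
   struct is 12 bytes even where pointers are 64 bits wide. */
typedef struct {
    u32 src;
    u32 dst;
    u32 cnt;
} DMAREC32;

/* Fake register block: 4 channels x 3 words, standing in for 0400:00B0h. */
static u32 fake_regs[12];
#define fake_dma_mem ((volatile DMAREC32 *)fake_regs)

/* Byte offset of channel ch's control word from the start of the block. */
static inline size_t dma_cnt_offset(unsigned ch)
{
    assert(ch < 4);
    return ch * sizeof(DMAREC32) + offsetof(DMAREC32, cnt);
}

/* Fill one channel through the struct view, then read the raw control word. */
static inline u32 dma_fake_setup(unsigned ch, u32 src, u32 dst, u32 cnt)
{
    assert(ch < 4);
    fake_dma_mem[ch].cnt = 0;      /* shut off any previous transfer */
    fake_dma_mem[ch].src = src;
    fake_dma_mem[ch].dst = dst;
    fake_dma_mem[ch].cnt = cnt;
    return fake_regs[ch * 3 + 2];  /* third word of channel ch */
}
```

Channel 3's control word lands at offset 2Ch, which is 0400:00DCh once the base address is added, so indexing by channel replaces the switch block entirely.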

The following are three of my DMA routines. First the DMA_TRANSFER() macro, which is the overall macro that can be used for anything. Then two routines for general memory copies and fills using 32-bit transfers with DMA 3.

// in core.h
#define DMA_TRANSFER(_dst, _src, _count, _ch, _mode)    \
do {                                            \
    dma_mem[_ch].cnt= 0;                        \
    dma_mem[_ch].src= (const void*)(_src);      \
    dma_mem[_ch].dst= (void*)(_dst);            \
    dma_mem[_ch].cnt= (_count) | (_mode);       \
} while(0)

// in core.c
void dma_memcpy(void *dst, const void *src, u16 count)
{
    dma_mem[3].cnt = 0; // shut off any previous transfer
    dma_mem[3].src = src;
    dma_mem[3].dst = dst;
    dma_mem[3].cnt = count | DMA_32NOW;
}

void dma_memset(void *dst, volatile u32 src, u16 count)
{
    dma_mem[3].cnt = 0; // shut off any previous transfer
    dma_mem[3].src = (const void*)&src;
    dma_mem[3].dst = dst;
    dma_mem[3].cnt = count | DMA_32NOW | DMA_SRC_FIX;
}

In all cases, I disable any previously operating transfers first. After that it's simply a matter of filling the registers. Now, it so happens that there is a 2 cycle delay before any transfer really begins. This means that you could lose a transfer if you ask for transfers in immediate succession. That's why I'm using functions here rather than macros or inlines: returning from the function takes enough time to allow the transfer to begin. I somewhat doubt that this whole thing would ever become a problem, but by adding the delay we can be absolutely sure of it.


I used to have the following macro for my transfers. It uses one of the more exotic capabilities of the preprocessor: the merging operator ‘##’, which allows you to create symbol names at compile time. It's scary, totally unsafe and generally unruly, but it does work. The other macro I gave is better, but I still like this thing too.

#define DMA_TRANSFER(_dst, _src, _count, _ch, _mode)  \
    REG_DMA##_ch##SAD = (u32)(_src),                  \
    REG_DMA##_ch##DAD = (u32)(_dst),                  \
    REG_DMA##_ch##CNT = (_count) | (_mode)

As long as you are using a literal number for _ch it'll form the correct register names. And yes, those comma operators between the statements actually work. They keep the statements separate, and also guard against wrongful nesting just like the do{} while(0) construct does.
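
If you want to see the pasting at work without touching real registers, the following sketch can be compiled on a desktop machine. The three REG_DMA3xxx variables are plain globals standing in for the hardware registers (an assumption for testing only), the macro is renamed DMA_TRANSFER_PASTE to avoid clashing with the other one, and paste_demo() is a made-up check.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the real registers so the macro can be tried on a host.
   uintptr_t is used for the address registers because host pointers may
   be wider than 32 bits. */
static uintptr_t REG_DMA3SAD, REG_DMA3DAD;
static uint32_t  REG_DMA3CNT;

/* The ## operator glues the literal channel number into the register name.
   The outer parentheses keep the comma expression together as one unit. */
#define DMA_TRANSFER_PASTE(_dst, _src, _count, _ch, _mode)  \
    (REG_DMA##_ch##SAD = (uintptr_t)(_src),                 \
     REG_DMA##_ch##DAD = (uintptr_t)(_dst),                 \
     REG_DMA##_ch##CNT = (uint32_t)(_count) | (_mode))

/* Hypothetical check: run the macro for channel 3, return the control word. */
static inline uint32_t paste_demo(void)
{
    static uint16_t src[4], dst[4];
    DMA_TRANSFER_PASTE(dst, src, 4, 3, 0x80000000u);
    assert(REG_DMA3SAD == (uintptr_t)src);
    assert(REG_DMA3DAD == (uintptr_t)dst);
    return REG_DMA3CNT;
}
```

Passing a variable instead of the literal 3 would paste the variable's name rather than its value, producing a symbol that does not exist and therefore a compile error.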

3.1. On DMA fills

DMA can be used to fill memory, but there are two problems that you need to be aware of before you try it. The first can be caught by simply paying attention. DMA fills don't work quite in the same way as memset() does. What you put into REG_DMAxSAD isn't the value you want to fill with, but its address!

“Very well, I'll put the value in a variable and use its address.” Yes, and that brings us to our second problem, a bug which is almost impossible to find. If you try this, you'll find that it doesn't work. Well, it fills with something, but usually not what you wanted to fill with. The full explanation is somewhat technical, but basically, because you're probably only using the variable's address and not its value, the optimizer never initializes it. There is a simple solution, one that we've seen before: make it volatile. Or you can use an (inline) function like dma_memset(), which has its source argument set as volatile so you can just insert a number just as you'd expect. Note that if you remove the volatile keyword there, it'll fail again.


In short: DMA fills need addresses, not direct values. Globals will always work, but if you use local variables or arguments you'll need to make them volatile. Note that the same thing holds true for the BIOS call CpuFastSet().
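
The "address, not value" point can be sketched with a software stand-in for a fixed-source transfer. Both functions below are hypothetical and run on the CPU, so the compiler can see every read; the optimizer hazard described above only bites when real hardware reads the variable behind the compiler's back. The sketch shows the calling pattern, not the bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Software model of a fixed-source 32-bit transfer: the source is an
   address that is re-read for every chunk, never a literal value. */
static inline void model_fill32(u32 *dst, const volatile u32 *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = *src;   /* source stays fixed, destination increments */
}

/* Wrapper in the style of dma_memset(): the value arrives in a volatile
   parameter, so taking its address refers to storage that really holds it. */
static inline u32 fill_demo(volatile u32 value)
{
    u32 buf[8] = {0};
    model_fill32(buf, &value, 8);
    return buf[7];       /* last word written */
}
```

Calling fill_demo(0x12345678) fills all eight words and returns 12345678h.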


3.2. DMA; don't wear it out

DMA is fast, there's no question about that. It can be up to ten times as fast as array copies. However, think twice about using it for every copy. First of all, while it is fast, it doesn't quite blow every other transfer routine out of the water. CpuFastSet() comes within 10% of it for copies and is actually 10% faster for fills. The speed gain isn't that big a deal. Another problem is that it stops the CPU, which can screw up interrupts, causing seemingly random bugs. It does have its specific uses, usually in conjunction with timers or interrupts, but for general copies, you might consider other things as well. CpuFastSet() is a good routine, but tonclib also comes with memcpy16()/32() and memset16()/32() routines that are safer than that and have fewer restrictions. They are assembly routines, though, so you'll need to know how to assemble them, or use libraries.

4. DMA demo

Fig 1: palette for dma_demo.

Actually, I've been using dma_memcpy in most of the demos so far, so check out any of them to see how to use it. On second thought, never mind that; I will do a demo because there may be some confusion about how DMA timing and repeat work. In this demo I'm going to update the transparent background color (pal_bg_mem[0]) at every HBlank. Unfortunately, I should point out that the demo is slightly ruined by the fact that I screwed up the defines when I first made it (I had a DMA_SRC_RESET that had the settings belonging to DMA_DST_RESET), but the results are still interesting.

First, we make a palette for every scanline (shown in fig 1). This is the DMA source. The primary destination is pal_bg_mem[0]. We do 16-bit copies at an HBlank and repeat for each HBlank. The basic flag is

#define DMA_DEMO (DMA_ON | DMA_REPEAT | DMA_AT_HBLANK | _DMA_SRC_INC | DMA_16)

I'm going to do something special with the count and destination stuff, so I left those out. The code for the main loop is:

while(1)
{
    key_poll();

    // A: 1 copy, dst fixed
    if(KEY_DOWN_NOW(KEY_A))
        DMA_TRANSFER(pal_bg_mem, pal, 1, 3, DMA_DEMO | DMA_DST_FIX);
    // B: 1 copy, dst reset
    else if(KEY_DOWN_NOW(KEY_B))
        DMA_TRANSFER(pal_bg_mem, pal, 1, 3, DMA_DEMO | DMA_DST_RESET);
    // L: double copy on HBlank, dst fixed
    else if(KEY_DOWN_NOW(KEY_L))
        DMA_TRANSFER(pal_bg_mem, pal, 2, 3, DMA_DEMO | DMA_DST_FIX);
    // R: double copy on HBlank; dst reset
    else if(KEY_DOWN_NOW(KEY_R))
        DMA_TRANSFER(pal_bg_mem, pal, 2, 3, DMA_DEMO | DMA_DST_RESET);
    else
        pal_bg_mem[0]= CLR_BLACK;

    vid_vsync();
    // disable DMA after all lines are done
    DMA_DISABLE(3);
}
Fig 2: results of the buttons of dma_demo.

This code shouldn't be too hard to understand. The buttons A, B, L, R invoke different kinds of HBlank DMA. When nothing's pressed the screen is black. To make sure the source pointer doesn't run off the deep end, I disable the DMA at the VBlank. If you don't do this the source address will continue to increase forever. Think about what each button should do, then look at fig 2 to see if you can understand the results. Both A and B give the correct results (except that it is offset by 1 scanline, because HBlank n follows HDraw n). Because there's only one copy, there is no difference between fixed and reset destinations. For L and R the results are quite different. Because we do two copies each HBlank now, only every other palette entry is used. With a fixed destination both copies go into pal_bg_mem[0], and you end up with the even ones. DMA_DST_RESET copies into entries 0 and 1 then resets, so you'll see the odd entries.

Oh, in case you're wondering what the funky lines at the bottom half of the L and R results are, the palette I've used is only 160 entries long so you run out of entries half-way through the screen. The rest is just the part of IWRAM that follows it.
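
If the even/odd business is hard to picture, the bookkeeping can be replayed in a few lines of C. This is a toy model built from the description above, not a hardware-accurate emulation, and demo_entry_in_slot0() is an invented name. It assumes the source index keeps incrementing across HBlanks while the destination offset starts from 0 again at each HBlank. Indices are 0-based here, so the "even" entries of the text (counting from 1) show up as odd indices and vice versa.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not real hardware: returns the 0-based index of the source
   entry left in pal_bg_mem[0] after HBlank trigger number t (t = 0 is the
   first HBlank after the channel was enabled). n is the count per HBlank;
   dst_reset selects the reset destination mode instead of the fixed one. */
static inline unsigned demo_entry_in_slot0(unsigned t, unsigned n, bool dst_reset)
{
    assert(n >= 1);
    unsigned src = 0;    /* source index: increments, never reloaded */
    unsigned slot0 = 0;  /* entry currently sitting in pal_bg_mem[0] */
    for (unsigned trig = 0; trig <= t; trig++) {
        unsigned dst = 0;            /* destination starts over each HBlank */
        for (unsigned i = 0; i < n; i++) {
            if (dst == 0)
                slot0 = src;         /* this copy landed in pal_bg_mem[0] */
            src++;
            if (dst_reset)
                dst++;               /* reset mode increments during the transfer */
        }
    }
    return slot0;
}
```

With one copy per HBlank both modes leave entry t in place. With two copies, the fixed destination ends up holding index 2t+1 (the second entry of each pair), while the reset destination holds index 2t (the first entry of each pair).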
