Zero-copy architecture in the RDK of the TI 8168 EVM
This work is based on the EVM: 8168 SoC + 5158 video capture + 1GB DDR2.
The RDK is an open source SDK that lets you modify and rebuild U-Boot, the kernel, the VPSS package, and the DSP/M3 programs. Its core concepts are links and the McFW (multichannel framework). A link is a software entity running on a core, with an input queue, an output queue, and a unique ID that lets other links and the system find it. Each link has its own thread, which processes data and handles control commands. You can consider a link a basic component of the software architecture: links can be arranged in a different order or topology to change the system behavior. Links are connected by specifying a link's previous link and the related queue number. For example, link A is connected to link B by setting link B's previous link ID to A and its queue number to 0. McFW is built on top of links and uses predefined link topologies, which makes configuring the whole system much easier.
[Graphic 1 box labels: SW Mosaic (SC5 422); On-Chip HDMI; IPC OUT (M3); IPC IN (M3); Encode; IPC BITS OUT (M3); IPC BITS IN (A8)]
graphic 1. links example
Graphic 1 shows a system made of several links that together implement capture -> display and encode. Rearranging the links or changing their order produces a custom use case.
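The connection rule (each link names its previous link and a queue number) can be sketched in a few lines of Python. This is only a model of the idea, not the RDK API: the `Link` class, the numeric IDs, the link names, and the chain itself are hypothetical, loosely inspired by the labels in graphic 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Link:
    link_id: int                         # unique ID other links use to find this one
    name: str
    core: str                            # core the link runs on, e.g. "M3" or "A8"
    prev_link_id: Optional[int] = None   # None marks a source link
    prev_queue: int = 0                  # output queue of the previous link to read


def chain_order(links: Dict[int, Link], last_id: int) -> List[str]:
    """Follow prev_link_id back from the last link; return names in data-flow order."""
    order = []
    current = links.get(last_id)
    while current is not None:
        order.append(current.name)
        if current.prev_link_id is None:
            break
        current = links.get(current.prev_link_id)
    return list(reversed(order))


def build_example() -> Dict[int, Link]:
    # Hypothetical encode path; IDs and names are made up for illustration.
    links = [
        Link(1, "capture", "M3"),
        Link(2, "ipc_out", "M3", prev_link_id=1, prev_queue=0),
        Link(3, "ipc_in", "M3", prev_link_id=2, prev_queue=0),
        Link(4, "encode", "M3", prev_link_id=3, prev_queue=0),
        Link(5, "ipc_bits_out", "M3", prev_link_id=4, prev_queue=0),
        Link(6, "ipc_bits_in", "A8", prev_link_id=5, prev_queue=0),
    ]
    return {link.link_id: link for link in links}
```

Changing a single `prev_link_id` is enough to rewire the chain, which is the sense in which reordering links yields a custom use case.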
2 How a link works
Once a link is set up correctly and started, it automatically pulls data from the previous link and pushes the processed data to the next link. Because every core runs links, there must be a way to transfer messages between cores. This is IPC (inter-processor communication), which lets the ARM work with the DSP and M3 cores.
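The pull-process-push behavior can be modeled with one thread per link and a queue between neighbors. The sketch below is an assumption-laden illustration: `LinkThread`, `STOP`, and `run_pipeline` are hypothetical names, everything runs in one process, and cross-core IPC and control commands are not modeled.

```python
import queue
import threading

STOP = object()  # sentinel telling a link thread to exit


class LinkThread:
    """Hypothetical link: owns a thread, reads one queue, writes another."""

    def __init__(self, name, process, in_queue):
        self.name = name
        self.process = process
        self.in_queue = in_queue          # output queue of the previous link
        self.out_queue = queue.Queue()    # read by the next link
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            item = self.in_queue.get()    # pull from the previous link
            if item is STOP:
                self.out_queue.put(STOP)  # forward the shutdown request
                return
            self.out_queue.put(self.process(item))  # push to the next link


def run_pipeline(frames):
    source = queue.Queue()
    scale = LinkThread("scale", lambda f: f * 2, source)
    tag = LinkThread("tag", lambda f: f"frame-{f}", scale.out_queue)
    for link in (scale, tag):
        link.thread.start()
    for frame in frames:
        source.put(frame)
    source.put(STOP)

    results = []
    while True:
        item = tag.out_queue.get()
        if item is STOP:
            break
        results.append(item)
    for link in (scale, tag):
        link.thread.join()
    return results
```

Once started, neither link needs to be driven from outside: data placed in the source queue flows through both stages on its own.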
graphic 2. link overview
As graphic 2 shows, links are built on SysLink, a traditional software component used on the TI DaVinci series. Links reside on different cores, so they need SysLink/IPC to communicate across different operating systems and cores. SysLink uses shared memory that the different cores can access to transfer messages. Some uncached shared memory forms the message queue infrastructure; some is used as a heap from which large frame buffers are allocated.
3 Memory layout that makes links work
The physical memory on the EVM board is divided into several sections. By default, the 1GB EVM is divided as follows:
(Unrelated sections are not listed here. SR = Shared Region.)
# 0x8000 0000, 256MB, Linux on ARM
# 0x9000 0000, 87MB, SR1, used for transferring bitstreams between the ARM and the M3 video subsystem (encoder and decoder).
# 0x9570 0000, 1MB, SR2, used for IPC ListMP between M3 cores. (Cached)
# 0xA800 0000, 361MB, SR3, used for frame buffers.
# 0xBE90 0000, 16MB, SR0, used for message queues and IPC ListMP. (Uncached)
(Note: the DSP and ARM cores each have their own MMU, but for DDR they see the same physical addresses.)
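The addresses and sizes in the list can be checked with a little arithmetic. The sketch below uses only the values listed above; the helper names (`end_address`, `region_of`, `overlaps`) are hypothetical.

```python
MB = 1024 * 1024

# (name, start address, size in MB), copied from the list above
REGIONS = [
    ("Linux", 0x80000000, 256),
    ("SR1",   0x90000000, 87),
    ("SR2",   0x95700000, 1),
    ("SR3",   0xA8000000, 361),
    ("SR0",   0xBE900000, 16),
]


def end_address(start, size_mb):
    """First address past the end of a region."""
    return start + size_mb * MB


def region_of(addr):
    """Name of the listed region containing addr, or None if it is unlisted."""
    for name, start, size_mb in REGIONS:
        if start <= addr < end_address(start, size_mb):
            return name
    return None


def overlaps():
    """Pairs of neighboring regions whose address ranges overlap."""
    ordered = sorted(REGIONS, key=lambda r: r[1])
    return [
        (a[0], b[0])
        for a, b in zip(ordered, ordered[1:])
        if end_address(a[1], a[2]) > b[1]
    ]
```

The numbers are consistent: Linux ends where SR1 starts, SR1 ends where SR2 starts, and SR3 ends where SR0 starts. The range between the end of SR2 and the start of SR3 belongs to sections not listed here.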
Communication is designed around messages carried in message queues, which can also be called mailboxes. If the ARM wants to send a command to the M3 or DSP, it encapsulates the information in a message and copies it to SR0. (Note: mailboxes and locks are actually implemented with hardware support. The chip has 12 "mailboxes" with related interrupts and registers, and 64 hardware "spinlocks" with related registers.) Frame buffers are organized into lists that reside in SR0 and SR2, while the frame buffers themselves reside in SR3 and the bitstreams from the codec reside in SR1. SR2 is cached, so it is used only between cores of the same type; on the 8168 these are the M3 cores.
Dividing memory into sections does not mean that hardware or firmware prevents a core from accessing sections that do not belong to it, but doing so risks breaking the system. I have not tried to change the memory layout, but it should be possible by modifying U-Boot, the kernel, and the VPSS address mapping. On the ARM you can access the shared regions that belong to the DSP and M3, but a plain mmap() would give you a wrong virtual address. Another consequence: the message queue uses SR0, which is uncached and only 16MB, so do not put large data in a message. To transfer a large set of data, allocate it in another region and put a pointer to the allocated buffer into the message. The following demonstrates how a link (not a codec link) tells a link on a different core that it has new data.
graphic 3. example of transferring frames between links
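The "small message, big buffer elsewhere" pattern can be sketched as follows. This is a single-process model under stated assumptions: a 1MB `bytearray` stands in for SR3, and the 16-byte message layout (`MSG_FORMAT`) and all function names are hypothetical, not the actual SysLink message format.

```python
import struct

SR3_BASE = 0xA8000000           # frame buffer region base from the layout above
sr3 = bytearray(1024 * 1024)    # small stand-in for the 361MB region

# Hypothetical message layout: source link, destination link, address, size
MSG_FORMAT = "<IIII"


def alloc_frame(offset, payload):
    """Write payload into the stand-in region; return its 'physical' address."""
    sr3[offset:offset + len(payload)] = payload
    return SR3_BASE + offset


def pack_new_data_msg(src_link, dst_link, addr, size):
    """Build the small message that would travel through the message queue."""
    return struct.pack(MSG_FORMAT, src_link, dst_link, addr, size)


def read_frame(msg):
    """Receiver side: decode the message and view the frame in place."""
    _src, _dst, addr, size = struct.unpack(MSG_FORMAT, msg)
    offset = addr - SR3_BASE
    return memoryview(sr3)[offset:offset + size]   # a view, not a copy
```

The message stays 16 bytes no matter how large the frame is, so the limited uncached region carries only bookkeeping while the payload stays where it was allocated.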
Three pairs of code and data sections are reserved for the M3 VPSS, M3 Video, and DSP; each pair is private to its core. Note that the code section is 2MB and the data section is 10MB/12MB, so these sections suit algorithms rather than applications that scan large amounts of memory.
4 Zero copy for frames
The framework allocates all frame buffers in SR3 and chains pointers to them into lists. So only lists and messages change and get transferred; the frame buffers themselves are never copied, except by the "duplication link". A frame is held and accessed by only one link at a time.
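A minimal sketch of this ownership hand-off, assuming a simple list per link: buffers are allocated once, and only references move. `FrameLink`, `make_pool`, and `duplicate` are hypothetical names, and `duplicate` copies the buffer only to mirror the exception described above.

```python
from collections import deque


def make_pool(count, size):
    """Allocate every frame buffer once, up front (stand-in for SR3)."""
    return [bytearray(size) for _ in range(count)]


class FrameLink:
    """Hypothetical link that holds at most one frame at a time."""

    def __init__(self, name):
        self.name = name
        self.full = deque()   # frames delivered to this link, not yet taken
        self.held = None      # the single frame this link currently owns

    def take(self):
        self.held = self.full.popleft()
        return self.held

    def pass_to(self, next_link):
        frame, self.held = self.held, None   # give up ownership
        next_link.full.append(frame)         # only the reference moves


def duplicate(frame):
    """The one place in this sketch where frame data is actually copied."""
    return bytearray(frame)
```

After `pass_to`, the sender no longer holds the frame, and the receiver gets the very same buffer object rather than a copy of its contents.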