zero copy architecture in RDK of TI 8168 EVM

Original post, 2012-03-20 16:59:41
1 RDK overview

    This work is based on the EVM: 8168 SoC + 5158 video capture + 1GB DDR2.

    RDK is an open source SDK that lets you modify and rebuild U-Boot, the kernel, the VPSS package, and the DSP/M3 programs. Its core concepts are links and McFW (multichannel framework). A link is a software entity running on one core. It has an input queue and an output queue, plus a unique ID that lets other links and the system find it. You can consider a link the basic component of the software architecture: links can be placed in a different order or topology to change the system behavior. Links are connected to each other by specifying a link's previous link and the related queue number. For example, link A is connected to link B by setting link B's previous link ID to A and the queue number to 0. McFW is built on top of links and uses predefined link topologies, which makes configuring the whole system much easier. Each link has its own thread, which processes data and handles control commands.

                 Capture
                    |
                   DEI
                    |
                   NF
                    |
                   DUP
                   | |
        -------------------------
        |                       |
    SW Mosaic             IPC OUT (M3)
    (SC5 422)                   |
        |                 IPC IN  (M3)
        |                       |
   On-Chip HDMI              Encode
     1080p60                    |
                        IPC BITS OUT (M3)
                                |
                        IPC BITS IN  (A8)

           Graphic 1. Links example

    Graphic 1 shows a system built from several links that together implement capture -> display and capture -> encode. Reordering the links produces a custom use case.


2 How a link works

    Once a link is set up correctly and started, it automatically pulls data from the previous link and pushes the processed data to the next link. Because links run on every core, there has to be a way to transfer messages between cores. This is called IPC (inter-processor communication), and it lets the ARM work with the DSP and M3 cores.


      Graphic 2. Link overview

    As graphic 2 shows, links are built on SysLink, a software component traditionally used on the TI DaVinci series. Links reside on different cores, so they need SysLink/IPC to communicate across different operating systems and cores. SysLink transfers messages through shared memory that the different cores can all access. Some uncached shared memory is used to build the message queue infrastructure; the rest is used as a heap from which large frame buffers can be allocated.



3 Memory layout that makes links work
     The physical memory on the EVM board is divided into several sections. By default the 1GB EVM is divided as follows:

(Unrelated sections are not listed here. SR = Shared Region.)
#  0x8000 0000, 256MB, Linux on ARM
#  0x9000 0000, 87MB, SR1, used to transfer bit streams between the ARM and the M3 video subsystem (encoder and decoder).
#  0x9570 0000, 1MB, SR2, used for the IPC M3 ListMP. (Cached)
#  0xA800 0000, 361MB, SR3, used for frame buffers.
#  0xBE90 0000, 16MB, SR0, used for the message queue and the IPC ListMP. (Uncached)

(Note: the DSP and ARM cores each have their own MMU, but for DDR they see the same physical addresses.)


    Communication is designed around messages carried in a message queue, which can also be called a mailbox. If the ARM wants to send a command to the M3 or the DSP, it has to encapsulate the information in a message and copy it to SR0. (Note: mailboxes and locks are actually implemented with hardware support. The chip has 12 hardware "mailboxes" with related interrupts and registers, and 64 hardware "spinlocks" with related registers.) Frame buffers are organized into lists that reside in SR0 and SR2, while the frame buffers themselves reside in SR3 and the bit streams from the codec reside in SR1. SR2 is cached, so it is used only between cores of the same type; on the 8168 these are the M3 cores.
     Dividing memory this way does not mean that the hardware or firmware prevents a core from accessing sections that do not belong to it, but doing so risks breaking the system. I have not tried to change the memory layout, but it should be possible by modifying U-Boot, the kernel, and the VPSS address mapping. On the ARM you can access the shared regions that belong to the DSP and M3, but a plain mmap() would give you the wrong virtual address. Another consequence: the message queue uses SR0, which is uncached and only 16MB, so do not put large data in a message. If you want to transfer a large set of data, allocate it in another region and put a pointer to the allocated buffer into the message. The following graphic demonstrates how a link (not a codec link) tells a link on a different core that it has new data.


                              Graphic 3. Example of transferring frames between links

    There are 3 pairs of data and code sections reserved for the M3 VPSS, the M3 Video and the DSP; they are private to each core. What a user needs to keep in mind is that the code section is 2MB and the data section is 10MB/12MB, so these sections suit algorithms rather than applications that scan large amounts of memory.



4 Zero copy for frames
    The framework allocates all frame buffers in SR3 and chains pointers to them into a list. Only the lists and messages change and get transferred; the frame buffers themselves are never copied, except by the "duplication link". A frame is held and accessed by only one link at a time.
