A general overview of the architecture of TI's DaVinci 8168 SoC

    The DaVinci 8168 is a very interesting ARM Cortex-A8 SoC that contains many co-processor units, such as a DSP core and M3 cores, which brings a new level of integration to embedded devices. The biggest difference compared to traditional solutions, such as socket communication between separate boards or board-level buses, is that all the functional units live in one chip and share memory, which saves significant communication cost. With a traditional solution it would be hard to combine so many computing units into one system; the hardware and workflow design would make your head spin, while with this chip the hardware is almost ready to use. What makes things less than perfect is that the software complexity rises, because so many units must be managed by the OS and made to work together with high performance. The following is the system as I understand it.


1 Hardware overview:

   

                 Graph 1.     Hardware block overview of the TI8168

      From the diagram above, we can see there are roughly four parts with computation capability: (1) the ARM, (2) the DSP, (3) the media processors, and (4) the graphics accelerator. From the user's or software's point of view, (1) and (2) offer large programming and computation potential, while (3) and (4) offer limited programmability in most cases. In other words, they are intended to do certain fixed things such as H.264 encoding. The whole system has an L3 NoC (network-on-chip) interconnect, which uses a packet-based protocol to transfer data between the different units. But what the OS sees are still physical addresses, so this layer is transparent. The memory layout at the hardware level can be found in the 8168's datasheet; to change the memory mapping you need to take care of both U-Boot and the Linux kernel.
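      Because the interconnect is transparent, a userspace program on the ARM can reach the units through plain physical addresses. As a minimal sketch (the address below is only a placeholder; look up real register offsets in the 8168 datasheet), mapping /dev/mem is enough to peek at a device register:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REG_PHYS_ADDR 0x48000000u  /* placeholder: pick a real peripheral base from the datasheet */
    #define MAP_SIZE      4096u

    int main(void)
    {
        int fd = open("/dev/mem", O_RDONLY | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        /* Map one page of physical address space into this process. */
        volatile uint32_t *reg = mmap(NULL, MAP_SIZE, PROT_READ, MAP_SHARED,
                                      fd, REG_PHYS_ADDR);
        if (reg == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        printf("reg[0] at phys 0x%08x = 0x%08x\n",
               (unsigned)REG_PHYS_ADDR, (unsigned)reg[0]);

        munmap((void *)reg, MAP_SIZE);
        close(fd);
        return 0;
    }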

     Note: in fact, the video system contains two blocks, VPSS and VIDEO. VPSS is the subsystem that does video capture, deinterlacing, scaling, noise filtering and so on; VIDEO is the subsystem that does encoding and decoding. In reality they are software: a tiny real-time OS plus applications running on several M3 co-processors, controlling further hardware accelerators such as the hardware encoder, HDVICP2. The SDK does not intend to expose their details to the application programmer, because they are very complex and hardware-specific. But if you want, you can find the source code that runs as this firmware in the RDK.

    Summary: this SoC provides a programmable ARM core and DSP core, plus configurable hardware video, media and graphics subsystems. There are about five cores inside: 1 ARM + 1 DSP + 2 or 3 M3 + 1 graphics. All are programmable and run at high clock speeds (> 500 MHz), but in most cases you only need to build programs for the ARM and the DSP.


2 Software overview:

    From the above we can see the key problem is managing the subsystems and keeping them synchronized. The system uses Linux 2.6 on the Cortex-A8 as the host OS, which takes the role of controller of all hardware (directly or indirectly), and the whole boot process is: U-Boot -> Linux kernel -> rootfs -> optionally boot the co-processors. Linux plays the role of mastering everything in the system, while the DSP actually runs its own small OS.

   How do you develop an application on this? Generally speaking there are four methods:

(1) c6run:

    TI provides a compiler in an open-source project which accepts parameters very much like gcc. I have tried it; the very nice thing is that it is so similar to gcc that I could write a Makefile that compiles one copy of the source code into three outputs on the development workstation: x86, ARM, and ARM+DSP, each of which runs directly on the corresponding Linux system. This is excellent: it means you can deploy your algorithm on the DSP very quickly and check whether the performance is satisfactory.
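    As a sketch of what this looks like in practice (the build commands in the comment are from my memory of the C6Run tools and may differ in your SDK version), the same plain C source can be fed to all three toolchains unchanged:

    /* dot.c -- an ordinary, portable C function with no DSP-specific code.
     *
     * Hypothetical builds of this same file (exact tool names depend on your setup):
     *   gcc -c dot.c                          # x86, runs on the workstation
     *   arm-none-linux-gnueabi-gcc -c dot.c   # ARM-only
     *   c6runlib-cc -c dot.c                  # C6Run: the body executes on the DSP
     */
    float dot(const float *a, const float *b, int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }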

    How does it work? Basically the DSP compiler, I mean the C6Run compiler, compiles all the code and archives it into a static .lib file that the ARM gcc toolchain can link against. Other code on the ARM side can just include the header files as if they declared normal ARM functions, and at the link stage those functions are resolved to the .lib file, which contains the communication stubs and the DSP binary code that runs on the DSP core. In other words, the communication and synchronization details are hidden by the C6Run compiler; the code that runs on the DSP appears to be just a library. There is also another mode that makes the DSP code run standalone instead of as a library of the ARM-side code, but I have not tried that.
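    So from the ARM program's point of view the call site is completely ordinary. A minimal sketch, assuming dot() from the earlier example was built into a C6Run static library with a plain header (dot.h is my made-up name):

    /* main_arm.c -- ARM-side caller, linked against the C6Run-produced .lib. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "dot.h"   /* hypothetical header: float dot(const float *, const float *, int); */

    int main(void)
    {
        int i, n = 4;
        /* Heap buffers: C6Run redirects malloc() to a shared-memory heap
         * so that the DSP can see the data behind these pointers. */
        float *a = malloc(n * sizeof *a);
        float *b = malloc(n * sizeof *b);
        if (!a || !b) return 1;

        for (i = 0; i < n; i++) { a[i] = i; b[i] = i * 2.0f; }

        /* Looks like a normal call; the C6Run stub marshals the arguments,
         * signals the DSP over SYSLINK, and blocks until the result returns. */
        printf("dot = %f\n", dot(a, b, n));

        free(a);
        free(b);
        return 0;
    }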

    But so far I have found the following limitations of this approach: it cannot start the other co-processors, and it is hard to debug the code on the DSP (you can't use CCS or an emulator to debug it as in traditional DSP development). I am not sure whether these will improve in the future. Below is the calling process in the C6RunLib style.

    Note: your program can ignore the existence of the DSP and the SYSLINK framework completely; the compiler hides them and wraps the ARM-side function call into a SYSLINK message, which invokes the DSP binary code that the C6Run compiler also builds automatically. The communication is done through a memory zone in DDR shared between the DSP and the ARM.


               ARM                                                DSP
                |
    normal ARM app process
    calling algorithm function A():
      A() is wrapped into the internal function A_syslink()
      send message to DSP via SYSLINK  ------------------->  |
                                                             receive the message, look up the
                                                               function and its parameters
                                                             execute the DSP version of the function
                                                             send the result back to the ARM
      get the result from SYSLINK,  <------------------------  as a SYSLINK message
      return it to the caller
    continue the code on ARM
                |

                    Graph 2. Execution flow of C6RunLib (C6EZRun)


(2) c6accel

    It is very similar to C6RunLib in that it also appears as a library to the ARM program, but it uses syslink.ko rather than just cmem.ko, and it requires the DSP program to conform to the XDAIS standard, which means you cannot deploy your algorithm as quickly. On the other hand it is good for DSP program development: you can write and debug the DSP code in CCS and then take the compiled output library. More importantly, with this framework other existing hardware or software units, such as the hardware-based video processing, can be brought up. Link: http://processors.wiki.ti.com/index.php/C6EZAccel
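    To give a feel for what "conform to XDAIS" means, here is a minimal sketch of the IALG function table every XDAIS algorithm must export. It assumes TI's ti/xdais/ialg.h from the XDAIS package; a real algorithm would fill in proper memory requests and state:

    #include <ti/xdais/ialg.h>   /* IALG_Fxns, IALG_MemRec, ... from TI's XDAIS package */

    /* Algorithm instance object: the IALG_Obj base must come first. */
    typedef struct MyAlg_Obj {
        IALG_Obj alg;            /* required base: points back to the function table */
        int      someState;     /* algorithm-private state goes here */
    } MyAlg_Obj;

    static Int MyAlg_numAlloc(Void)
    {
        return 1;                /* we only ask for one memory block */
    }

    static Int MyAlg_alloc(const IALG_Params *params, IALG_Fxns **pFxns,
                           IALG_MemRec memTab[])
    {
        /* Describe the memory the framework must provide for the instance. */
        memTab[0].size      = sizeof(MyAlg_Obj);
        memTab[0].alignment = 4;
        memTab[0].space     = IALG_EXTERNAL;
        memTab[0].attrs     = IALG_PERSIST;
        return 1;
    }

    static Int MyAlg_init(IALG_Handle handle, const IALG_MemRec memTab[],
                          IALG_Handle parent, const IALG_Params *params)
    {
        MyAlg_Obj *obj = (MyAlg_Obj *)handle;
        obj->someState = 0;
        return IALG_EOK;
    }

    static Int MyAlg_free(IALG_Handle handle, IALG_MemRec memTab[])
    {
        /* Report back the same memory we requested in alloc(). */
        return MyAlg_alloc(NULL, NULL, memTab);
    }

    /* The v-table the framework (Codec Engine / C6Accel) uses to drive us. */
    IALG_Fxns MYALG_IALG = {
        &MYALG_IALG,            /* implementationId */
        NULL,                   /* algActivate   (optional) */
        MyAlg_alloc,
        NULL,                   /* algControl    (optional) */
        NULL,                   /* algDeactivate (optional) */
        MyAlg_free,
        MyAlg_init,
        NULL,                   /* algMoved      (optional) */
        MyAlg_numAlloc
    };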


(3) OpenMAX

    Basically OpenMAX is an encapsulation at the same level as Android's component organization. But right now it is a good way to start development, because some components are ready to use in SDK 5.0.3, so you can skip some of the co-processor setup. Generally speaking, OpenMAX is a software standard that lets different components communicate easily: it uses the concepts of "components" and "ports" to form a data link. The link can be set up underneath, I mean between hardware co-processors, or between a co-processor and the ARM Cortex-A8 core.

    In older versions of the SDK it worked in tunneled form, but now I see there is a new way to call the components in non-tunneled mode, which looks very similar to simple Linux API and ioctl calls. But it still has some limitations on the hardware you are using, I mean the peripheral devices, especially the video-capture decoder IC.
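    For reference, this is roughly what setting up a tunneled data link looks like with the standard OpenMAX IL core calls (the component name strings below are made up; the real names depend on TI's OMX distribution):

    #include <OMX_Core.h>   /* standard OpenMAX IL core API */

    /* Error handling omitted for brevity; every call returns OMX_ERRORTYPE. */
    static OMX_CALLBACKTYPE callbacks;   /* would be filled with event handlers */

    void build_capture_to_encode_link(void)
    {
        OMX_HANDLETYPE hCapture, hEncoder;

        OMX_Init();

        /* Component names below are placeholders, not TI's real ones. */
        OMX_GetHandle(&hCapture, (OMX_STRING)"OMX.vendor.CAPTURE", NULL, &callbacks);
        OMX_GetHandle(&hEncoder, (OMX_STRING)"OMX.vendor.VIDENC",  NULL, &callbacks);

        /* Tunnel: capture output port 1 feeds encoder input port 0 directly,
         * so buffers flow between the components (possibly between cores)
         * without passing through the application. */
        OMX_SetupTunnel(hCapture, 1, hEncoder, 0);

        /* Move both components to idle, then executing, to start the flow. */
        OMX_SendCommand(hCapture, OMX_CommandStateSet, OMX_StateIdle, NULL);
        OMX_SendCommand(hEncoder, OMX_CommandStateSet, OMX_StateIdle, NULL);
        OMX_SendCommand(hCapture, OMX_CommandStateSet, OMX_StateExecuting, NULL);
        OMX_SendCommand(hEncoder, OMX_CommandStateSet, OMX_StateExecuting, NULL);
    }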

    Note: from SDK 5.0.1 on, it requires the 1080p I/O daughter board, otherwise video input is a problem. When I tried to migrate the old driver, the capture driver seemed to live at #include <linux/vps_capture.h>, but encoding and the other stages go to the M3 core via SYSLINK, and in SDK 5.0.3 it is bound to 3 channels. So I guess making the whole thing work would require digging into the M3 code, which is too much work and hard without assistance from TI; maybe in the future another architecture will be released that unbinds capture and encoding in the Linux kernel, so I have decided to give up on that for now.


(4) RDK

    It is built for multi-channel video applications, especially video recorders. It is quite similar to the channel-style usage of OpenMAX, and it uses a framework called MCFW (Multi-Channel Framework). It ships with all the source code and tools that run on the M3 cores, so it is easier if you want to modify the hardware and build a multi-channel D1 application. It is based on the "link" objects; that is what I am using.
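    Conceptually an RDK application is just a chain of link objects (capture -> processing -> encode -> storage), each running on some core, created and then connected and started through a link API. The sketch below only illustrates that shape with stub functions; the names (link_create, link_connect, link_start) are invented stand-ins, not the RDK's real API:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t LinkId;
    static LinkId nextId = 0;

    /* Stub implementations so the sketch compiles; in the RDK these would
     * create link objects on the target cores via SYSLINK. */
    static LinkId link_create(const char *type)
    {
        printf("create link %u (%s)\n", (unsigned)nextId, type);
        return nextId++;
    }
    static void link_connect(LinkId producer, LinkId consumer)
    {
        printf("connect %u -> %u\n", (unsigned)producer, (unsigned)consumer);
    }
    static void link_start(LinkId id)
    {
        printf("start %u\n", (unsigned)id);
    }

    int main(void)
    {
        /* A typical DVR-style chain: capture -> encode -> write to disk.
         * In the real system, capture runs on the VPSS M3, encoding on the
         * VIDEO M3, and the writer on the Cortex-A8. */
        LinkId cap = link_create("capture");
        LinkId enc = link_create("encode");
        LinkId rec = link_create("writer");

        link_connect(cap, enc);
        link_connect(enc, rec);

        /* Start downstream links first so queues are ready when data arrives. */
        link_start(rec);
        link_start(enc);
        link_start(cap);
        return 0;
    }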

