A deeper look at the hardware and software of the TI 8168 EVM
This SoC is very complex, and I'd like to point out a few special features.
1) Video subsystem: it is actually a combination of two M3 cores and two or three HDVICP2 co-processors. HDVICP2 plays a role similar to the SGX530 3D unit, except that it accelerates video streams rather than 3D graphics.
2) ARM core: the Cortex-A8 has hardware floating-point support (the VFP unit plus the NEON SIMD engine), so it can run binaries compiled for hardware floating point. The ARM acts as the administrator of the whole system, including the M3 and DSP cores.
3) DSP core:
Graph 1. DSP core in the 8168
From the figure above we can see several things. First, the DSP has a direct bus connection to the video subsystem, so transferring data between the M3 and the DSP is faster than between the ARM and the DSP, and the data paths are different. Second, the DSP has a DMA unit and can access peripherals, which software can make use of. Third, there are two independent register files, A and B, so software needs to be optimized for parallel computation to make full use of the DSP core. Finally, keep the cache size in mind when you write programs for the DSP. There is also a small buffer called SPLOOP inside the DSP core that helps execute loop iterations in parallel, so arrange your loops in C wisely.
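To make the loop advice concrete, here is a minimal sketch of a DSP-friendly inner loop in C. It assumes the TI C6000 compiler, where `restrict` tells the compiler the arrays do not alias and `#pragma MUST_ITERATE(min, max, multiple)` supplies the trip-count information it needs to software-pipeline the loop (the kind of loop SPLOOP executes). The function name `dot_i16` and the multiple-of-8 precondition are my own illustrative choices; the two separate accumulators give the compiler independent work it can spread across the A and B sides. On any other compiler the pragma is skipped and the code is plain C99.

```c
#include <assert.h>
#include <stdint.h>

/* Dot product of two int16_t vectors, written so that a C6000 compiler can
 * software-pipeline the loop (possibly using the SPLOOP buffer).
 * Illustrative precondition: n is a positive multiple of 8. */
int32_t dot_i16(const int16_t *restrict a, const int16_t *restrict b, int n)
{
    int32_t sum0 = 0; /* accumulates even-indexed products */
    int32_t sum1 = 0; /* accumulates odd-indexed products, independent of sum0 */
    int i;

    assert(n > 0 && n % 8 == 0);

#if defined(__TI_COMPILER_VERSION__)
    /* The loop body runs n / 2 times: at least 4, always a multiple of 4. */
#pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < n; i += 2) {
        sum0 += (int32_t)a[i] * b[i];
        sum1 += (int32_t)a[i + 1] * b[i + 1];
    }
    return sum0 + sum1;
}
```

The trip-count hint only helps if it is true, so the `assert` documents the same precondition the pragma promises to the compiler.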
4) M3: two Cortex-M3 RISC cores in the video subsystem control the video acceleration hardware. They are programmable, just like the DSP and the ARM.
5) Communication between cores: the ARM manages the M3 and DSP cores and downloads binaries to the M3 and DSP subsystems. Two hardware components are involved in communication: the mailbox and the spinlock. As shown below, the mailbox module contains 12 mailboxes with 4 interrupt outputs to the processor cores, and the spinlock module provides 64 hardware semaphores.
Graph 2. Mailbox hardware module
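The mailbox and spinlock protocols can be sketched in C. This is a hypothetical, single-threaded software model, not the DM8168 register interface: the 4-entry FIFO depth per mailbox is an assumption, and the names `mbox_send`, `mbox_recv`, `spinlock_trylock` and `spinlock_unlock` are made up for illustration. The spinlock part follows the convention assumed here for the OMAP-family spinlock module: reading a lock returns 0 if the caller just acquired it, and writing 0 releases it. On real hardware that read-and-set is atomic in the spinlock module, which is what makes it safe across cores; this model is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the IPC hardware; NOT the DM8168 registers. */
#define NUM_MAILBOXES   12u
#define MBOX_FIFO_DEPTH 4u /* assumed FIFO depth per mailbox */
#define NUM_SPINLOCKS   64u

typedef struct {
    uint32_t msg[MBOX_FIFO_DEPTH]; /* ring buffer of 32-bit messages */
    unsigned head;                 /* index of the oldest message */
    unsigned count;                /* messages currently queued */
} mailbox_t;

static mailbox_t mailboxes[NUM_MAILBOXES];
static uint32_t spinlocks[NUM_SPINLOCKS]; /* 0 = free, 1 = taken */

/* Sender: queue a message; returns false if the FIFO is full. */
bool mbox_send(unsigned id, uint32_t msg)
{
    mailbox_t *mb;
    assert(id < NUM_MAILBOXES);
    mb = &mailboxes[id];
    if (mb->count == MBOX_FIFO_DEPTH)
        return false;
    mb->msg[(mb->head + mb->count) % MBOX_FIFO_DEPTH] = msg;
    mb->count++;
    /* Real hardware would now raise the receiving core's mailbox interrupt. */
    return true;
}

/* Receiver (e.g. in its interrupt handler): returns false if empty. */
bool mbox_recv(unsigned id, uint32_t *msg)
{
    mailbox_t *mb;
    assert(id < NUM_MAILBOXES && msg != NULL);
    mb = &mailboxes[id];
    if (mb->count == 0)
        return false;
    *msg = mb->msg[mb->head];
    mb->head = (mb->head + 1) % MBOX_FIFO_DEPTH;
    mb->count--;
    return true;
}

/* Acquire attempt: reading returns the old value and marks the lock taken.
 * The hardware does this read-and-set atomically; this model does not. */
bool spinlock_trylock(unsigned id)
{
    uint32_t old;
    assert(id < NUM_SPINLOCKS);
    old = spinlocks[id];
    spinlocks[id] = 1;
    return old == 0; /* 0 means the caller now owns the lock */
}

/* Release: writing 0 frees the lock. */
void spinlock_unlock(unsigned id)
{
    assert(id < NUM_SPINLOCKS);
    spinlocks[id] = 0;
}
```

The typical pattern is to keep the mailbox message small (a command or a buffer index) and put the actual data in shared memory, guarded by a spinlock when more than one core may touch it.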
6) Memory layout: the 8168 uses a unified memory map, referred to here as the L3/L4 address space, which the system uses to access devices and ports.
Graph 3. How the cores access resources outside the L2 cache
The ARM has its own MMU, used by the Linux OS to provide virtual addressing; the resulting physical address is then translated to an L3 address by a subsystem in the ARM that the system cannot change. The DSP is similar: it has its own MMU, called DEMMU, and the DSP's physical address is likewise translated to an L3 address by a subsystem in the DSP core. That mapping differs from the ARM's, but luckily the DDR sections are mapped to the same physical addresses on both the ARM and the DSP. The physical-to-L3 mappings cannot be changed on either the ARM or the DSP; only the MMUs can be configured by the OS on each core. The M3 cores have no MMU, so they access L3 addresses directly. Not every L3/L4 address is reachable by every core; that depends on the hardware interconnect, and the reachable addresses can be found in a table.
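The fixed second stage described above (physical address to L3 address) can be modeled as a lookup over address windows. The sketch below covers only that fixed stage for the DSP; the window table and every address in it are made up, except that it mirrors the point above that DDR has the same address on the ARM and the DSP. The type `addr_window_t` and the function `dsp_phys_to_l3` are hypothetical, and an address that falls in no window is treated as unreachable from that core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One fixed window of a (hypothetical) physical-to-L3 mapping. */
typedef struct {
    uint32_t phys_base; /* start address as seen by the core */
    uint32_t l3_base;   /* corresponding start address on L3 */
    uint32_t size;      /* window length in bytes */
} addr_window_t;

/* Illustrative DSP table; all values are made up. DDR is identity-mapped,
 * matching the ARM, while the second window is remapped. */
static const addr_window_t dsp_windows[] = {
    { 0x80000000u, 0x80000000u, 0x40000000u }, /* DDR */
    { 0x01000000u, 0x48000000u, 0x01000000u }, /* example peripheral window */
};

/* Translate a DSP physical address to L3. Returns false when no window
 * covers it, i.e. the address is not reachable from this core. */
bool dsp_phys_to_l3(uint32_t phys, uint32_t *l3)
{
    size_t i;
    assert(l3 != NULL);
    for (i = 0; i < sizeof dsp_windows / sizeof dsp_windows[0]; i++) {
        const addr_window_t *w = &dsp_windows[i];
        /* Unsigned subtraction wraps when phys < phys_base, so one compare
         * checks both ends of the window. */
        if (phys - w->phys_base < w->size) {
            *l3 = w->l3_base + (phys - w->phys_base);
            return true;
        }
    }
    return false;
}
```

Because the DDR window is identity-mapped in this table, a buffer address taken on one side can be handed to the other side unchanged, which is the convenience the paragraph above describes.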
1) For each device, you must first find out which core "owns" it. For example, before RDK 1.09, I2C was owned by the M3 cores, so you will find some drivers missing from the Linux OS because they run on the M3.
2) Many performance-critical video operations are done on the M3 cores so that they can use the video-related hardware; the ARM only configures and starts the components on the M3.
3) The SDK is used to develop applications for the 1080p HD standard.
Graph 4. SDK components
As shown in graph 4: (1) is the OpenMAX framework, which tries to encapsulate all video details in either a channel style or a non-channel style, so that the user works only through its APIs; it also suggests that users wrap their own components within the constraints of OpenMAX, much as the Android framework does. If you are doing HD video development, you need to use this framework. (2) is the set of components used to build 2D/3D UIs. (3) is two special frameworks, c6run and c6accel (more precisely, each is something like a compiler plus basic software components), for quickly building applications that run on the DSP while letting the ARM program ignore the differences between the DSP and the ARM. So far I have not confirmed whether the video functionality can cooperate with c6run/c6accel, but for testing or validating an algorithm they work well. (4) is the set of core components that use the hardware mailbox and spinlock for inter-processor communication.
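In OpenMAX IL, "starting" a component means commanding it through a state machine. The sketch below checks commanded state transitions using the state names from the OpenMAX IL 1.1 specification (Loaded, WaitForResources, Idle, Executing, Pause; the error state Invalid is left out). The enum `il_state_t` and the functions `il_transition_allowed` and `il_sequence_ok` are hypothetical and are not the `OMX_STATETYPE` definitions from the real headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names mirroring the OpenMAX IL 1.1 component states
 * (the error state "Invalid" is left out). */
typedef enum {
    ST_LOADED,
    ST_WAIT_FOR_RESOURCES,
    ST_IDLE,
    ST_EXECUTING,
    ST_PAUSE
} il_state_t;

/* Is a commanded transition from -> to allowed? */
bool il_transition_allowed(il_state_t from, il_state_t to)
{
    switch (from) {
    case ST_LOADED:
        return to == ST_IDLE || to == ST_WAIT_FOR_RESOURCES;
    case ST_WAIT_FOR_RESOURCES:
        return to == ST_LOADED || to == ST_IDLE;
    case ST_IDLE:
        return to == ST_LOADED || to == ST_EXECUTING || to == ST_PAUSE;
    case ST_EXECUTING:
        return to == ST_IDLE || to == ST_PAUSE;
    case ST_PAUSE:
        return to == ST_IDLE || to == ST_EXECUTING;
    }
    return false; /* unknown state value */
}

/* Check a whole sequence of states, e.g. the start-up path. */
bool il_sequence_ok(const il_state_t *seq, size_t n)
{
    size_t i;
    assert(seq != NULL && n > 0);
    for (i = 0; i + 1 < n; i++)
        if (!il_transition_allowed(seq[i], seq[i + 1]))
            return false;
    return true;
}
```

Starting a component therefore means walking Loaded, Idle, Executing in that order; jumping straight from Loaded to Executing is rejected.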
4) The RDK is used for video surveillance and is best suited to SD video.
Graph 5. An example of mcfw in the RDK
The RDK uses the mcfw framework, which is very similar to the OpenMAX channel style but focuses more on flexibility for multichannel video streams. Up to 16 D1 video streams can be processed at the same time. mcfw is based on links; from the figure above we can see that the mcfw framework contains four major components, VCAP, VDIS, VENC and VDEC, each of which contains several links. They are expected to be configured by the application programmer.
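The link idea can be illustrated with a small chain model. This is a hypothetical sketch, not the RDK mcfw API: it treats each component (here VCAP and VENC) as a single link, whereas in the real framework each component contains several links, and the names `link_t`, `link_connect`, `link_push_frame` and `link_describe` are invented. It shows only the shape of the design: links are connected in order, and a frame entering the head of the chain flows to every downstream link.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mcfw-style link; NOT the RDK API. */
typedef struct link {
    const char *name;   /* e.g. "VCAP" or "VENC" */
    struct link *next;  /* downstream link, NULL at the end of the chain */
    unsigned frames_in; /* frames this link has received */
} link_t;

/* Connect links in array order: chain[0] -> chain[1] -> ... */
void link_connect(link_t *chain, size_t n)
{
    size_t i;
    assert(chain != NULL && n > 0);
    for (i = 0; i + 1 < n; i++)
        chain[i].next = &chain[i + 1];
    chain[n - 1].next = NULL;
}

/* Feed one frame in at the head; every downstream link receives it. */
void link_push_frame(link_t *head)
{
    link_t *l;
    for (l = head; l != NULL; l = l->next)
        l->frames_in++;
}

/* Write the chain as "A->B->..." into buf (truncated to len - 1 chars). */
char *link_describe(const link_t *head, char *buf, size_t len)
{
    const link_t *l;
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (l = head; l != NULL; l = l->next) {
        if (l != head)
            strncat(buf, "->", len - strlen(buf) - 1);
        strncat(buf, l->name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Keeping each stage as an independent link is what makes the chain easy to reconfigure: adding a stage is one more entry in the array rather than a change to the existing stages.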