H.264 video streaming system on Embedded platform




The adoption of technological products like digital television and video conferencing has made video streaming an active research area.

This report presents the integration of a video streamer module into a baseline H.264/AVC encoder running on a TMSDM6446EVM embedded platform. The main objective of this project is to achieve real-time streaming of baseline H.264/AVC video over a local area network (LAN) as part of a video surveillance system.

The encoding of baseline H.264/AVC and the hardware components of the platform are first discussed. Various streaming protocols are then studied in order to implement the video streamer on the DM6446 board. A multi-threaded encoder application is used to encode raw video frames into H.264/AVC format and write them to a file. For video streaming, the open source Live555 MediaServer is used to stream video data to a remote VLC client over the LAN.

Initially, file streaming was implemented from PC to PC. Upon successful implementation on the PC, the video streamer was ported to the board. The steps involved in porting the Live555 application are also described in this report. Both unicast and multicast file streaming were implemented in the video streamer.

Due to the limitations of file streaming, the live streaming approach was adopted. Several methodologies for integrating the video streamer with the encoder program are discussed. Modifications were made to both the encoder program and the Live555 application to achieve live streaming of H.264/AVC video. Results of both file and live streaming are shown in this report. The implemented video streamer module will be used as a base module of the video surveillance system.

Chapter 1: Introduction
1.1. Background

Significant breakthroughs have been made over the last few years in the area of digital video compression technologies. As a result, applications that make use of these technologies have become prevalent and continue to be active research topics today. For example, digital television and video conferencing are now commonly encountered in daily life. One application of interest here is a video camera surveillance system, which can enhance the security of consumers' business and home environments.

In typical surveillance systems, the captured video is sent over a cable network to be monitored and stored at remote stations. As the captured raw video contains a large amount of data, it is advantageous to compress the data before it is transferred over the network. One compression technique that is suitable for this type of application is the H.264 coding standard.

H.264 coding is better suited to video streaming than other coding techniques because it is more robust to data loss and offers higher coding efficiency, both of which are important when streaming over a shared Local Area Network. With the increasing acceptance of H.264 coding and the availability of embedded systems with high computing power, a digital video surveillance system based on H.264 on an embedded platform is feasible and potentially more cost-effective.

Implementing an H.264 video streaming system on an embedded platform is a logical extension of video surveillance systems, which are still typically implemented on stations with high computing power (e.g. a PC). In an embedded version, a Digital Signal Processor (DSP) forms the core of the embedded system and executes the intensive signal processing algorithms. Current embedded systems typically also include network features that enable the implementation of data streaming applications. To facilitate data streaming, a number of network protocol standards have also been defined and are currently used for digital video applications.

1.2. Objective and Scope

The objective of this final year project is to implement a video surveillance system based on the H.264 coding standard running on an embedded platform. Such a system covers a wide range of functionality and would require an extensive amount of development time if implemented from scratch. Hence, this project focuses on the data streaming aspect of a video surveillance system.

After some initial investigation and experimentation, it was decided to confine the main scope of the project to developing a live streaming H.264-based video system running on a DM6446 EVM development platform. The work, to be performed progressively, is broken down as follows:

1. Familiarization of open source live555 streaming media server

Due to the complexity of implementing the various standard protocols needed for multimedia streaming, the live555 media server program is used as a base to implement the streaming of the H.264-based video data.

2. Streaming of stored H.264 file over the network

The live555 program is then modified to support streaming of a raw encoded H.264 file from the DM6446 EVM board over the network. Knowledge of the H.264 coding standard is necessary in order to parse the file stream before streaming it over the network.

3. Modifying a demo version of an encoder program and integrating it together with live555 to achieve live streaming

The demo encoder was modified to send encoded video data to the Live555 program which would do the necessary packetization to be streamed over the network. Since data is passed from one process to another, various inter-process communication techniques were studied and used in this project.

1.3. Resources

The resources used for this project are as follows:

1. DM6446 (DaVinci™) Evaluation Module

2. SWANN C500 Professional CCTV Camera Solution 400 TV Lines CCD Color Camera

3. LCD Display

4. IR Remote Control

5. TI Davinci demo version of MontaVista Linux Pro v4.0

6. A Personal Workstation with Centos v5.0

7. VLC player v.0.9.8a as client

8. Open source live555 program (downloaded from www.live555.com)

The system setup of this project is shown below:

1.4. Report Organization

This report consists of 8 chapters.

Chapter 1 introduces the motivation behind embedded video streaming system and defines the scope of the project.

Chapter 2 presents the literature review of the H.264/AVC video coding technique and the various streaming protocols which are implemented in the project.

Chapter 3 presents the literature review of the hardware platform used in the project. The architecture, memory management, inter-process communication and the software tools are also discussed in this chapter.

Chapter 4 explains the execution of the encoder program on the DM6446EVM board. The interaction of the various threads in this multi-threaded application is also discussed in order to fully understand the encoder program.

Chapter 5 gives an overview of the Live555 MediaServer, which is used as a base to implement the video streamer module on the board. Adding support for unicast and multicast streaming, porting live555 to the board and receiving the video stream on a remote VLC client are explained in this chapter.

Chapter 6 explains the limitations of file streaming and the move towards a live streaming system. Various integration methodologies and the modifications to both the encoder program and the live555 program are shown as well.

Chapter 7 summarizes the implementation results of file and live streaming and analyses the performance of these results.

Chapter 8 gives the conclusion, stating the current limitations and problems and the scope for future implementation.

Chapter 2: Video Literature Review
2.1. H.264/AVC Video Codec Overview

H.264/AVC is one of the most advanced video coding techniques at the time of writing. Although there are many video coding schemes, such as the H.26x and MPEG families, H.264/AVC introduces many improvements and tools for coding efficiency and error resiliency. This chapter will briefly discuss the network aspects of the video coding technique. It will also cover the error resiliency needed for transmission of video data over the network. For a more detailed explanation of H.264/AVC, refer to Appendix A.

2.1.1. Network Abstraction Layer (NAL)

The aim of the NAL is to ensure that the data coming from the Video Coding Layer (VCL) is “network worthy” so that the data can be used in numerous systems. The NAL facilitates the mapping of H.264/AVC VCL data to different transport layers such as:

* RTP/IP real-time streaming over wired and wireless mediums

* Different storage file formats such as MP4, MMS and AVI

The concepts of the NAL and the error robustness techniques of H.264/AVC will be discussed in the following parts of the report.

NAL Units

The encoded data from the VCL are packed into NAL units. A NAL unit is a packet made up of a certain number of bytes. The first byte of the NAL unit is the header byte, which indicates the type of data carried in the NAL unit. The remaining bytes make up the payload data of the NAL unit.

The NAL unit structure caters for two kinds of transport systems, namely packet-oriented and bit stream-oriented. For bit stream-oriented transport systems such as MPEG-2, the NAL units are organized into a byte stream format. Each unit is prefixed by a three-byte start code prefix, 0x000001. The start code prefix indicates the start of each NAL unit and hence defines the boundaries of the units.

For packet-oriented transport systems, the encoded video data are transported in packets defined by the transport protocols. Hence, the boundaries of the NAL units are known without having to include the start code prefix. The details of the packetization of NAL units will be discussed in later sections of the report.
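The role of the start code prefix in the byte stream format can be illustrated with a minimal Python sketch. It is not the parser used in the project; the function name `split_nal_units` is hypothetical, and the sketch assumes the whole stream fits in memory and does not remove emulation prevention bytes from the payload.

```python
def split_nal_units(stream: bytes) -> list:
    """Split a byte stream into NAL units at each 0x000001 start code prefix."""
    start_code = b"\x00\x00\x01"
    units = []
    pos = stream.find(start_code)
    while pos != -1:
        begin = pos + len(start_code)
        nxt = stream.find(start_code, begin)
        end = len(stream) if nxt == -1 else nxt
        # A four-byte start code (0x00000001) leaves a zero byte at the end of
        # the previous unit; a NAL unit never ends in 0x00, so strip such bytes.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


# Made-up example stream: three tiny units, the first with a 4-byte start code.
example = (b"\x00\x00\x00\x01\x67\xaa"
           b"\x00\x00\x01\x68\xbb"
           b"\x00\x00\x01\x65\xcc\xdd")
for unit in split_nal_units(example):
    print(unit.hex())
```

Each returned unit starts with its header byte, so a packet-oriented sender can place the unit in a packet without the start code.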

NAL units are further categorized into two types:

* VCL unit: comprises encoded video data

* Non-VCL unit: comprises additional information such as parameter sets, which carry important header information, and supplemental enhancement information (SEI), which carries timing information and other data that increase the usability of the decoded video signal.
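As a rough illustration of how the header byte distinguishes the two categories, the sketch below decodes the three fields of the one-byte NAL header defined by the H.264 standard (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type). The function name and the small name table are assumptions made for this example and cover only a few common types.

```python
# A few nal_unit_type values from the H.264 standard (not a complete table).
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}


def parse_nal_header(header_byte: int) -> dict:
    """Decode the first byte of a NAL unit into its three fields."""
    nal_unit_type = header_byte & 0x1F
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,
        "nal_ref_idc": (header_byte >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        # Types 1 to 5 carry coded slice data, i.e. they are VCL units.
        "is_vcl": 1 <= nal_unit_type <= 5,
        "name": NAL_TYPE_NAMES.get(nal_unit_type, "other"),
    }


for byte in (0x67, 0x68, 0x65, 0x06):
    print(hex(byte), parse_nal_header(byte))
```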

Access units

A group of NAL units which adheres to a certain form is called an access unit. Decoding one access unit yields one decoded picture. Table 1 below explains the functions of the NAL units that make up an access unit.

Data/Error robustness techniques

H.264/AVC has several techniques to mitigate errors and data loss, which is essential for streaming applications. The techniques are as follows:

· Parameter sets: contain information that applies to a large number of VCL NAL units. There are two kinds of parameter sets:

- Sequence Parameter Set (SPS): Information pertaining to a sequence of encoded pictures

- Picture Parameter Set (PPS): Information pertaining to one or more individual pictures

The above-mentioned parameters rarely change and hence need not be transmitted repeatedly, which saves overhead. The parameter sets can be sent “in-band”, that is, carried in the same channel as the VCL NAL units. They can also be sent “out-of-band” using a reliable transport protocol, which enhances the resiliency towards data loss and errors.

· Flexible Macroblock Ordering (FMO)

FMO maps the macroblocks to different slice groups. In the event of the loss of a slice group, the missing data is concealed by interpolating from the other slice groups.

· Redundant Slices (RS)

A redundant representation of the picture can be stored in redundant slices. If the original slice is lost, the decoder can make use of the redundant slices to recover it.

These techniques make the H.264/AVC codec more robust and resilient towards data loss and errors.

2.1.2. Profiles and Levels

A profile of a codec is defined as the set of features identified to meet the specifications of the intended applications. For the H.264/AVC codec, it is defined as a set of features identified to generate a conforming bit stream. A level imposes restrictions on some key parameters of the bit stream.

In H.264/AVC, there are three profiles, namely Baseline, Main and Extended. Figure 5 shows the relationship between these profiles. The Baseline profile is the most likely to be used by network cameras and encoders as it requires limited computing resources. It is well suited to supporting real-time streaming applications on an embedded platform.

2.2. Overview of Video Streaming

In earlier systems, video data was accessed across the network using the ‘download and play’ approach. In this approach, the client had to wait until the whole video was downloaded to the media player before play-out could begin. To combat the long initial play-out delay, the concept of streaming was introduced.

Streaming allows the client to play out the earlier part of the video data while the remaining part is still being transferred. The major advantage of streaming is that, unlike the traditional ‘download and play’ approach, the video data need not be stored on the client's computer. It also reduces the long initial play-out delay experienced by the client.

Streaming adopts the traditional client/server model. The client connects to the listening server and requests video data. The server then sends the video data to the client for play-out.

2.2.1. Types of Streaming

There are three different types of streaming video data. They are pre-recorded/file streaming, live/real-time streaming and interactive streaming.

* Pre-recorded/file streaming: The encoded video is stored in a file and the system streams the file over the network. A major drawback is the long initial play-out delay (10-15s) experienced by the client.

* Live/real-time streaming: The encoded video is streamed over the network directly without being stored in a file. The initial play-out delay is reduced. Care must be taken to ensure that the play-out rate does not exceed the sending rate, which may result in a jerky picture. On the other hand, if the sending rate is too high, the packets arriving at the client may be dropped, causing the picture to freeze. The timing requirement for the end-to-end delay is more stringent in this scenario.

* Interactive streaming: Like live streaming, the video is streamed directly over the network. The system responds to user control inputs such as rewind, pause, stop, play and forward for the particular video stream.

In this project, both pre-recorded and live streaming are implemented. Some interactive streaming controls, such as stop and play, are also part of the system.

2.2.2. Video Streaming System modules

Video Source

The purpose of the video source is to capture the raw video sequence. A CCTV camera is used as the video source in this project. Most cameras provide analogue outputs, which are connected to the encoding station via video connections. This project makes use of only one video source due to the limited number of video connections on the encoding station. The raw video sequence is then passed on to the encoding station.

Encoding Station

The aim of the encoding station is to digitize and encode the raw video sequence into the desired format. In the actual system, the encoding into the H.264/AVC format is done by the DM6446 board. Since the hardware encoding is computationally intensive, it forms the bottleneck of the whole streaming system. The H.264 video is passed on to the video streaming server module of the system.

Video Streaming and WebServer

The role of the video streaming server is to packetize the H.264/AVC video to be streamed over the network. It serves the requests from individual clients and needs to support the total bandwidth requirements of the video streams requested by the clients. The WebServer offers a URL link which connects to the video streaming server. For this project, the video streaming server module is embedded inside the DM6446 board and it serves every individual client's requests.

Video Player

The video player acts as a client, connecting to and requesting video data from the video streaming server. Once the video data is received, the video player buffers the data for a while and then begins play-out. The video player used for this project is the VideoLAN (VLC) Player. It has the relevant H.264/AVC codec so that it can decode and play the H.264/AVC video data.

2.2.3. Unicast VS Multicast

There are two key delivery techniques employed in streaming media distribution.

Unicast transmission is the sending of data to one particular network destination host over a packet-switched network. It establishes a two-way point-to-point connection between client and server. The client communicates directly with the server via this connection. The drawback is that every connection receives a separate video stream, which uses up network bandwidth rapidly.

Multicast transmission is the sending of only one copy of the data over the network so that many clients can receive it simultaneously. In video streaming, it is more cost-effective to send a single copy of the video data over the network so as to conserve network bandwidth. Since multicast is not connection-oriented, the clients cannot control the streams that they receive.

In this project, unicast transmission is used to stream encoded video over the network. The client connects directly to the DM6446 board where it gets the encoded video data. The project can easily be extended to multicast transmission.
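The bandwidth trade-off between the two delivery techniques can be made concrete with a short calculation. The figures below (a 1500 kbps stream and 10 clients) are made-up illustration values, not measurements from this project, and the function name is hypothetical.

```python
def server_bandwidth_kbps(stream_kbps: float, clients: int, multicast: bool = False) -> float:
    """Estimate the outgoing server bandwidth needed to serve all clients."""
    if clients <= 0:
        return 0.0
    # Unicast sends one copy per client; multicast sends a single shared copy.
    copies = 1 if multicast else clients
    return stream_kbps * copies


unicast = server_bandwidth_kbps(1500, 10)
multicast = server_bandwidth_kbps(1500, 10, multicast=True)
print(f"unicast: {unicast} kbps, multicast: {multicast} kbps")
```

With unicast the load grows linearly with the number of clients, which matters on a board with a 10/100 Ethernet interface.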

2.3. Streaming Protocols

When streaming video content over a network, a number of network protocols are used. These protocols are defined by the Internet Engineering Task Force (IETF) and the Internet Society (ISOC) and documented in Request for Comments (RFC) documents. These standards are adopted by many developers today.

In this project, the same standards are employed in order to successfully stream H.264/AVC content over a simple Local Area Network (LAN). The following sections discuss the various protocols that were studied in the course of this project.

2.3.1. Real-Time Streaming Protocol (RTSP)

RTSP is a commonly used application layer protocol for controlling streaming media. RTSP acts as a control protocol for media streaming servers. It establishes a connection between two end points of the system and controls media sessions. Clients issue VCR-like commands such as play and pause to control the real-time playback of media streams from the servers. However, this protocol is not involved in the transport of the media stream over the network. For this project, RTSP version 1.0 is used.

RTSP States

Like the Hypertext Transfer Protocol (HTTP), RTSP defines several methods: OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, RECORD and TEARDOWN. These commands are sent using the RTSP URL. The default port number used by this protocol is 554. An example of such a URL is:

<method name > rtsp://

· OPTIONS: An OPTIONS request returns the types of request that the server will accept. An example of the request is:

OPTIONS rtsp:// RTSP/1.0

CSeq: 1\r\n

User-Agent: VLC media Player

The CSeq parameter keeps track of the number of requests sent to the server and is incremented every time a new request is issued. The User-Agent header identifies the client making the request.

* DESCRIBE: This method gets the presentation or the media object identified in the request URL from the server. An example of such a request:

DESCRIBE rtsp:// RTSP/1.0

CSeq: 2\r\n

Accept: application/sdp\r\n

User-Agent: VLC media Player

The Accept header is used to describe the formats understood by the client. The response to the DESCRIBE method must contain all the media initialization information for the resource that it describes.

· SETUP: This method specifies the transport mechanism to be used for the media stream. A typical example is:

SETUP rtsp:// RTSP/1.0

CSeq: 3\r\n

Transport: RTP/AVP;unicast;client_port=1200-1201

User-Agent: VLC media Player

The Transport header specifies the transport mechanism to be used. In this case, the Real-time Transport Protocol is used in a unicast manner. The client port numbers on which the client expects to receive the RTP and RTCP data are also given; these are chosen by the client, and the server returns its own port numbers in the response. Since RTSP is a stateful protocol, a session is created upon successful acknowledgement of this method.

· PLAY: This method requests the server to start sending the data via the transport mechanism stated in the SETUP method. The URL is the same as in the other methods, and the following headers are added:

Session: 6

Range: npt=0.000-\r\n

The Session header specifies the unique session id. This is important as the server may establish several sessions and the id keeps track of them. The Range header positions the play time at the beginning and plays until the end of the range.

* PAUSE: This method informs the server to pause sending of the media stream. Once the PAUSE request is sent, the range header will capture the position at which the media stream is paused. When a PLAY request is sent again, the client will resume playing from the current position of the media stream as specified in the range header.
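The request examples above can be summarized in a small sketch that builds RTSP/1.0 request messages and increments CSeq for each one. This is an illustration only, not code from the Live555 library or from VLC; the class name `RtspRequestBuilder` is hypothetical, and the URL is a placeholder because the actual stream URL is not given in this report.

```python
class RtspRequestBuilder:
    """Builds RTSP/1.0 request messages and numbers them with CSeq."""

    def __init__(self, url: str, user_agent: str = "VLC media player"):
        self.url = url
        self.user_agent = user_agent
        self.cseq = 0

    def build(self, method: str, headers: dict = None) -> str:
        self.cseq += 1  # incremented for every new request
        lines = [f"{method} {self.url} RTSP/1.0", f"CSeq: {self.cseq}"]
        for name, value in (headers or {}).items():
            lines.append(f"{name}: {value}")
        lines.append(f"User-Agent: {self.user_agent}")
        # Header lines end with CRLF; an empty line terminates the request.
        return "\r\n".join(lines) + "\r\n\r\n"


# Placeholder URL (documentation address range), not the project's real URL.
builder = RtspRequestBuilder("rtsp://192.0.2.10/test.264")
options = builder.build("OPTIONS")
describe = builder.build("DESCRIBE", {"Accept": "application/sdp"})
setup = builder.build("SETUP", {"Transport": "RTP/AVP;unicast;client_port=1200-1201"})
play = builder.build("PLAY", {"Session": "6", "Range": "npt=0.000-"})
print(setup)
```

In a real client the Session value would be taken from the server's SETUP response rather than hard-coded.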

RTSP Status Codes

Whenever the client sends a request message to the server, the server forms a corresponding response message to be sent to the client. The status codes are similar to those of HTTP, and both protocols use ASCII text messages. They are as follows:

200: OK

301: Moved Permanently

405: Method Not Allowed

451: Parameter Not Understood

454: Session Not Found

457: Invalid Range

461: Unsupported Transport

462: Destination Unreachable

These are some of the RTSP status codes. There are many others but the codes mentioned above are of importance in the context of this project.
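Because the responses are plain ASCII text, a client can read the status code from the first line of the response. The following sketch shows one possible way to do this; the function name is hypothetical and the sample responses are made up for illustration rather than captured from the project.

```python
def parse_rtsp_response(text: str):
    """Return (status_code, reason, headers, body) from an RTSP response."""
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line format: "RTSP/1.0 <code> <reason phrase>"
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


ok = parse_rtsp_response("RTSP/1.0 200 OK\r\nCSeq: 3\r\nSession: 6\r\n\r\n")
err = parse_rtsp_response("RTSP/1.0 454 Session Not Found\r\nCSeq: 4\r\n\r\n")
for code, reason, headers, _body in (ok, err):
    # As in HTTP, the first digit gives the class of the response.
    outcome = "success" if 200 <= code < 300 else "failure or redirect"
    print(code, reason, headers.get("cseq"), outcome)
```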

2.3.2. Real-time Transport Protocol (RTP)

RTP defines a packet structure used for transporting media streams over the network. It is a transport layer protocol, but developers often treat it as part of the application layer protocol stack. The protocol facilitates jitter compensation and detection of out-of-sequence arrival of data, both of which are common in transmission over IP networks. For the transmission of media data over the network, it is important that packets arrive in a timely manner, as media playback is loss tolerant but not delay tolerant. Due to the high latency of the Transmission Control Protocol (TCP) in establishing connections, RTP is often built on top of the User Datagram Protocol (UDP). RTP also supports multicast transmission of data.

RTP is also a stateful protocol, as a session is established before data can be packed into RTP packets and sent over the network. The session contains the IP address of the destination and the RTP port number, which is usually an even number. The following section explains the packet structure of RTP used for transmission.

RTP Packet Structure

The figure below shows an RTP packet header, which is prepended to the media data.

The minimum size of the RTP header is 12 bytes. Optional extension information may be present after the header information. The fields of the header are:

· V (Version): (2 bits) indicates the version number of the protocol. The version used in this project is 2.

· P (Padding): (1 bit) indicates whether there is padding at the end of the packet, which may be needed by some encryption algorithms.

· X (Extension): (1 bit) indicates whether there is extension information between the header and the payload data.

· CC (CSRC Count): (4 bits) indicates the number of CSRC identifiers.

· M (Marker): (1 bit) used by the application to indicate that the data has specific relevance from the perspective of the application. In this project, the M bit is set to mark the end of the video data.

· PT (Payload Type): (7 bits) indicates the type of payload data carried by the packet. H.264 is the payload used in this project.

· Sequence number: (16 bits) incremented by one for every RTP packet. It is used to detect packet loss and out-of-sequence packet arrival. Based on this information, the application can take appropriate corrective action.

· Time Stamp: (32 bits) receivers use this information to play samples at the correct intervals of time. Each stream has independent time stamps.

· SSRC: (32 bits) uniquely identifies the source of the stream.

· CSRC: (32 bits each) identifies the contributing sources when a stream is combined from several different sources.
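The field layout listed above can be sketched with Python's `struct` module, which packs the 12-byte fixed header in network byte order. This is a minimal illustration and not the Live555 implementation; the function names are hypothetical, and payload type 96 is an assumed value from the dynamic payload type range, not a value stated in this report.

```python
import struct


def pack_rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96,
                    marker: bool = False, csrc_count: int = 0) -> bytes:
    """Pack the 12-byte fixed RTP header (version 2, no padding, no extension)."""
    version, padding, extension = 2, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | (csrc_count & 0x0F)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the fixed RTP header from the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = pack_rtp_header(seq=1, timestamp=90000, ssrc=0x12345678, marker=True)
print(len(header), parse_rtp_header(header))
```

A receiver would compare successive `sequence_number` values to detect lost or reordered packets.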

This project does not involve the use of Extension field in the packet header and hence will not be explained in this report. Once this header information is appended to the payload data, the packet is sent over the network to the client to be played. The table below summarizes the payload types of RTP and highlighted region is of interest in this project.

Table 2: Payload Types of RTP Packets

2.3.3. RTP Control Protocol (RTCP)

RTCP is a sister protocol which is used in conjunction with the RTP. It provides out-of-band statistical and control information to the RTP session. This provides certain Quality of Service (QoS) for transmission of video data over the network.

The primary functions of the RTCP are:

* To gather statistical information about the quality of the media stream during an RTP session. This data is sent to the session media source and its participants. The source can exploit this information for adaptive media encoding and to detect transmission errors.

* To provide canonical end point identifiers (CNAME) to all session participants. These allow unique identification of end points across different application instances and support third-party monitoring.

* To send RTCP reports to all session participants. The resulting traffic increases in proportion to the number of participants. In order to avoid congestion, RTCP has bandwidth management techniques that limit its traffic to 5% of the total session bandwidth.

RTCP statistical data is sent on odd-numbered ports. For instance, if the RTP port number is 196, then RTCP will use 197 as its port number. There is no default port number assigned to RTCP.
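The even/odd port convention can be expressed in a few lines. The sketch below follows the common convention that RTP takes an even port and RTCP the next higher odd port, rounding an odd request down to the even port below it; the function name is hypothetical.

```python
def rtp_rtcp_port_pair(requested_port: int) -> tuple:
    """Return (rtp_port, rtcp_port) for a requested base port."""
    # RTP uses an even port; an odd request is moved to the even port below it.
    rtp_port = requested_port if requested_port % 2 == 0 else requested_port - 1
    return rtp_port, rtp_port + 1


print(rtp_rtcp_port_pair(196))
print(rtp_rtcp_port_pair(1201))
```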

RTCP Message Types

RTCP sends several types of packets different from RTP packets. They are sender report, receiver report, source description and bye.

· Sender Report (SR): Sent periodically by senders to report the transmission and reception statistics of RTP packets sent in a period of time. It also includes the sender's SSRC and the sender's packet count. The timestamp of the RTP packet is also sent to allow the receiver to synchronize the RTP packets. The bandwidth allocated to SR is 25% of the RTCP bandwidth.

· Receiver Report (RR): Reports the QoS to other receivers and senders. Information such as the highest sequence number received, the inter-arrival jitter of RTP packets and the fraction of packets lost describes the QoS of the transmitted media streams. The bandwidth allocated to RR is 75% of the RTCP bandwidth.

· Source Description (SDES): Sends the CNAME to its session participants. Additional information like name, address of the owner of the source can also be sent.

· End of Participation (BYE): The source sends a BYE message to indicate that it is shutting down the stream. It serves as an announcement that a particular end point is leaving the conference.
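The bandwidth shares quoted in this section (5% of the session bandwidth for RTCP, split 25% to sender reports and 75% to receiver reports) can be worked through numerically. The 2000 kbps session bandwidth below is a made-up figure for illustration, and the function name is hypothetical.

```python
def rtcp_bandwidth_budget(session_kbps: float, rtcp_percent: float = 5,
                          sender_percent: float = 25) -> dict:
    """Split the session bandwidth into RTCP sender and receiver shares (kbps)."""
    rtcp = session_kbps * rtcp_percent / 100
    senders = rtcp * sender_percent / 100
    return {"rtcp": rtcp, "senders": senders, "receivers": rtcp - senders}


budget = rtcp_bandwidth_budget(2000)
print(budget)
```

For a 2000 kbps session this leaves roughly 100 kbps for all RTCP traffic, which is why the reporting interval has to grow as the number of participants increases.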

Further RTCP Consideration

This protocol is important to ensure that QoS standards are achieved. The acceptable interval between these reports is less than one minute. In large-scale applications, the interval between reports may grow because of the RTCP bandwidth control mechanism, in which case the statistical reporting on the quality of the media stream becomes inaccurate.

Since no long delays are introduced between the reports in this project, RTCP is adopted to provide a certain level of QoS when streaming H.264/AVC video from the embedded platform.

2.3.4. Session Description Protocol (SDP)

The Session Description Protocol is a standard for describing streaming media initialization parameters. These parameters describe sessions for the purposes of session announcement, session invitation and parameter negotiation. This protocol can be used together with RTSP. As seen earlier in this chapter, SDP is used with the DESCRIBE method of RTSP to obtain the session's media initialization parameters. SDP is extensible to include different media types and formats.

SDP Syntax

The session is described by attribute/value pairs. The syntax of SDP is summarized below.

In this project, the use of SDP is important because the client is VLC Media Player. If the streaming is done via RTSP, then VLC expects an SDP description from the server in order to set up the session and facilitate the playback of the streaming media.
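To show what such a description looks like to a client, the sketch below parses a small SDP text into its `<type>=<value>` lines, separating the session-level lines from each media section. The sample description is made up for illustration (placeholder address, assumed dynamic payload type 96) and is not the description generated by the project's server; the function name is hypothetical.

```python
SAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=H.264 Video
t=0 0
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:track1
"""


def parse_sdp(text: str):
    """Return (session_lines, media_sections) as lists of (type, value) pairs."""
    session, media = [], []
    current = session
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:]
        if key == "m":
            # Every "m=" line starts a new media section.
            current = []
            media.append(current)
        current.append((key, value))
    return session, media


session, media = parse_sdp(SAMPLE_SDP)
print(session)
print(media)
```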

Chapter 3: Hardware Literature Review
3.1. Introduction to Texas Instruments DM6446EVM DaVinci™

The development of this project is based on the DM6446EVM board. It is necessary to understand the hardware and software aspects of this board. The DM6446 board has an ARM processor operating at a clock speed of up to 300MHz and a C64x Digital Signal Processor operating at a clock speed of up to 600MHz.

3.1.1. Key Features of DM6446

The key features shown above are:

* 1 video port which supports composite or S-Video

* 4 video DAC outputs: component, RGB, composite

* 256 MB of DDR2 DRAM

* UART, Media Card interface (SD, xD, SM, MS, MMC cards)

* 16 MB of non-volatile Flash Memory, 64 MB NAND Flash, 4 MB SRAM

* USB2 interface

* 10/100 Mbps Ethernet interface

* Configurable boot load options

* IR Remote Interface, real time clock via MSP430

3.1.2. DM6446EVM Architecture

The architecture of the DM6446 board is organized into several subsystems. By knowing the architecture of the DM6446, the developer can design and build an application module on the board's underlying architecture.

The figure shows that the DM6446 has three subsystems which are connected to the underlying hardware peripherals. This provides a decoupled architecture which allows developers to implement their applications on a particular subsystem without having to modify the other subsystems. Some of the subsystems are discussed in the next sections.

ARM Subsystem

The ARM subsystem is responsible for the master control of the DM6446 board. It handles the system-level initialization, configuration, user interface, connectivity functions and control of the DSP subsystem. The ARM has a larger program memory space and better context switching capabilities, and hence it is better suited to handling complex tasks and multitasking in the system.

DSP Subsystem

The DSP subsystem is mainly responsible for encoding the raw captured video frames into the desired format. It performs the number-crunching operations needed by the compression technique. It works together with the Video Imaging Coprocessor to compress the video frames.

Video Imaging Coprocessor (VICP)

The VICP is a signal processing library which contains various software algorithms that execute on the VICP hardware accelerator. It helps the DSP by taking over the computation of various intensive tasks. Since a hardware implementation of number-crunching operations has a faster execution time, the DSP's performance is significantly enhanced. Some of the algorithms supported by the VICP are:

* Matrix and Array operations, e.g. Matrix multiplication/transpose, Array Multiplication, Look-up table

* Digital Signal Processing Operations: 1D, 2D FIR Filtering, Convolution and Correlation

* Digital Image and Video Processing Functions: Alpha Blending, Colour space Conversion, Median Filtering

Video Processing Subsystem

This subsystem does the processing of the video frames. The Resizer module crops the video frame to the appropriate resolution. It also has an On-Screen Display (OSD) to output either the encoded or the to-be-encoded video frames to the LCD display. Four DAC channels are connected to this subsystem to condition the incoming video signals.

Switched Central Resources (SCR)

The SCR acts as an interface between the various subsystems and the underlying hardware peripherals. It manages the hardware resources and decides which subsystem can gain control of the hardware resources in an efficient manner. The SCR has several techniques to ensure that the allocation of hardware to contending subsystems does not result in deadlock.

3.2. Memory Management of DM6446EVM

Understanding the memory management of the DM6446EVM is important, as it is for any embedded system. It ensures that the developed system does not exceed the memory capabilities of the embedded board.

The figure shows the breakdown of the different memory types present in the DM6446EVM board, which has a large byte-addressable memory space. Since the DSP component is treated as a 'black box', the memory mapping is shown with respect to the ARM processor. The ARM instruction and data RAM occupy about 2 MB. The Flash/NAND memory is used to store the contents of the developed program to be loaded into the file system.

3.2.1. Current Memory map of system

For this project, the memory map is shown in the figure above. It is divided into different sections, each handling a different function. The sections are explained as follows:

· LINUX Section: manages all the resources required by the applications. Whenever an application requests a resource, Linux grants it depending on availability and the UNIX permissions. The memory partition is segmented into 4 KB pages.

· DDRALGHEAP Section: contains the heap memory which the codec uses to allocate dynamic memory. This section is large because the video codec consumes a lot of memory.

· DDR Section: contains the DSP-side code and the static data for the codec, DSP/BIOS and the Codec Engine.

· DSPLINKMEM Section: memory allocated for DSPLINK inter-process communication. This module communicates between the ARM and the DSP. It also loads the DSP code and controls DSP execution.

· RESET_VECTOR Section: contains the DSP reset vector.

3.2.2. Contiguous Memory Allocator (CMEM)

The ARM and DSP work on different regions of the memory on the DM6446EVM. The ARM views the memory of the DSP as virtual. The DSP requires the allocation of contiguous memory space. If contiguous memory is not allocated, the DSP can corrupt the memory space of the ARM, causing the system to crash. CMEM is an API created to share buffers between ARM Linux processes and the DSP. CMEM takes a physical memory region and carves it into pools of contiguous memory space. This is done at module insertion time, which occurs before any applications run on the DM6446EVM board. The advantage of CMEM is that it is configurable by the user. The command input by the user at the target is:

This command initializes the start and end physical addresses of the contiguous memory space. The memory is partitioned into 4 pools of various sizes. CMEM is an important module in the memory management of the DM6446EVM. It helps developers configure the memory pools needed for their applications running on the DM6446EVM.
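The idea of carving one physical region into pools of fixed-size contiguous buffers can be sketched as follows. This is only an illustration in Python, not the CMEM kernel module or its API: the function names, the addresses and the pool sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    buf_size: int                               # size of every buffer in this pool
    free: list = field(default_factory=list)    # start addresses of the free buffers


def carve_pools(phys_start, phys_end, pool_specs):
    """Carve [phys_start, phys_end) into pools; pool_specs is [(count, buf_size), ...]."""
    pools, addr = [], phys_start
    for count, buf_size in pool_specs:
        pool = Pool(buf_size)
        for _ in range(count):
            if addr + buf_size > phys_end:
                raise MemoryError("pools do not fit in the reserved region")
            pool.free.append(addr)              # each buffer is one contiguous block
            addr += buf_size
        pools.append(pool)
    return pools


def alloc(pools, size):
    """Return the address of a free buffer from the smallest pool that fits."""
    for pool in sorted(pools, key=lambda p: p.buf_size):
        if pool.buf_size >= size and pool.free:
            return pool.free.pop(0)
    raise MemoryError("no free buffer large enough")
```

Because every pool is laid out once, before any application runs, a later allocation never needs to search for a contiguous run of pages; it only takes a pre-carved buffer from a list.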

3.3. Inter-Process Communication (IPC) of DM6446 EVM

Since the DM6446EVM consists of both an ARM processor and a DSP, there must be IPC between the two processors in order to exchange data. DSP/BIOS LINK is a software framework that allows communication between the ARM and the DSP.

3.3.1. Software Architecture of DSP/BIOS™ LINK

The figure above shows the software architecture of DSP LINK. The GPP component refers to the General Purpose Processor, which here is the ARM processor. The components of the architecture are:

On the GPP side

A supported OS must be running on the GPP. In this project, MontaVista Linux runs on the ARM processor.

* OS ADAPTATION LAYER: a wrapper which encapsulates the generic OS services needed by the other components of DSP LINK. The other components use the API exported by this layer instead of making direct OS calls. This makes DSP LINK portable across platforms.

* LINK DRIVER: encapsulates the low-level control on the physical link between ARM and DSP.

* PROCESSOR MANAGER: logs information for all components. It also allows various boot loaders to be integrated into the system.

* DSP/BIOS™ LINK API: interfaces for all clients on the ARM side.

On the DSP side

· LINK DRIVER: is part of the DSP/BIOS drivers. It communicates with the ARM over the physical link.

3.3.2. Types of Communication in DSP/BIOS™ LINK

There are four types of IPC component in DSP/BIOS™ LINK which allow communication between the ARM processor and the DSP. They are PROC, CHNL, MSGQ and POOL.


PROC

This component represents the DSP processor from the application's perspective. It allows the DSP to be callable from the ARM processor. Currently, only one DSP is supported. The use of processorId allows the number of DSPs to be scaled.


CHNL

CHNL refers to the logical data channel in application space. It is mainly responsible for data transfer between the ARM processor and the DSP. Multiplexing of channels on a single physical link is also supported. Information about the source or destination is not contained in the data, so it must be established explicitly. The figure shows a simple CHNL example.


MSGQ

MSGQ refers to IPC via message queuing. This component exchanges short messages of varied length between ARM and DSP clients. The writer writes messages to the queue and the reader retrieves messages from it. MSGQ supports one reader and multiple writers. The figure below shows a MSGQ example.
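The one-reader, multiple-writer pattern can be illustrated with ordinary threads and a queue. This is a sketch of the pattern only, written in Python with the standard library; it does not use the DSP/BIOS LINK MSGQ API, and the names run_demo and writer are hypothetical.

```python
import queue
import threading


def writer(q, name, count):
    # Each writer posts short messages tagged with its own name.
    for i in range(count):
        q.put((name, i))


def run_demo(num_writers=3, msgs_per_writer=4):
    q = queue.Queue()
    writers = [
        threading.Thread(target=writer, args=(q, f"w{n}", msgs_per_writer))
        for n in range(num_writers)
    ]
    for t in writers:
        t.start()
    # The single reader drains the queue; get() blocks until a message arrives.
    received = [q.get() for _ in range(num_writers * msgs_per_writer)]
    for t in writers:
        t.join()
    return received
```

Messages from different writers may interleave in any order, but the messages of each individual writer are received in the order they were written.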


POOL

POOL is an API which opens and closes the memory pools used by the CHNL and MSGQ components. These pools provide the buffers needed to transfer data between the ARM processor and the DSP.

3.4. Software Framework and Tools of DM6446EVM

The DM6446EVM is equipped with software frameworks and tools which allow developers to reduce their system's development time. These include the Codec Engine, the eXpress DSP Algorithm Interoperability Standard (xDAIS) tools and the eXpress DSP Components (XDC) toolset. The following sections of the report elaborate on these features.

3.4.1. Codec Engine

Codec Engine is a collection of APIs with which the developer can instantiate and execute xDAIS algorithms. It has a Video, Image, Speech and Audio (VISA) interface to communicate with the xDAIS algorithms. One set of APIs is defined per codec class, so an MPEG-4 codec can be replaced by an H.264 codec by changing the configuration. This allows software reusability. The Codec Engine supports real-time execution of codecs. APIs are also defined to access memory and to log CPU utilization statistics and execution trace information.

The advantages of this software framework are:

· Easy to use: developers just need to specify the codec to run

· Scalable and configurable: supports addition of new algorithm through the use of standard tools

· Portable: APIs are target, platform and codec independent.

The figure below shows the architecture of an application that exploits the Codec Engine.

The application calls the Core Engine and VISA APIs. The VISA APIs use stubs to call the core engine's System Programming Interfaces (SPIs) and the skeletons. The VISA SPIs access the algorithms. For a board with both an ARM and a DSP, the application, media middleware and video encoder stubs run on the ARM processor. The video encoder skeleton and codecs run on the DSP.

3.4.2. eXpress DSP Algorithm Interoperability Standard (xDAIS)

This standard was developed by Texas Instruments for the TMS320 DSP family. It eases the integration of various DSP algorithms into a system. The xDAIS standard handles issues pertaining to resource allocation and utilization of DSP CPU cycles. It defines a set of guidelines that all compliant DSP algorithms follow.

The major advantages of using this standard are:

* Reduces integration time of algorithms

* Allows comparisons of different algorithms from different sources

* Has broad range of compliant algorithms from third parties and reduces the need to custom develop new algorithms

* Works well with Codec Engine Framework

3.4.3. eXpress DSP Components (XDC)

The eXpress DSP Components toolset creates reusable software components. These components are optimized for use in real-time embedded platforms. The reusable components are called packages. The main advantage of XDC is that the delivered content is standardized, which makes it easier to integrate into applications. XDC is used by two groups of developers, namely consumers and producers. The consumers integrate target content into their own applications and the producers develop the packages used by consumers. The figure below shows the relationship between consumers and producers.

Chapter 4: Execution of DM6446 Programs

This chapter illustrates how the multi-threaded DM6446 programs work. The program of interest in this project is the encode program. Understanding this program is essential during the later stages of implementation of the system. The interaction between the various threads in the program is also explained. The setting up of the environment and the compiling of the program are included in Appendix B for the user's reference.

4.1. Understanding the encode program of DM6446

In this project, the encode program of the DM6446 is used as a base for encoding the raw captured video frames into the desired baseline H.264/AVC format. The resulting output bit stream is written to a file on the NFS. It is therefore important to understand the workings of the encode program in order to modify it. This section of the report discusses the workflow of the encode program.

4.1.1. Overview of encode program

The encode program's objective is to capture raw video frames from the camera source and encode them into baseline H.264/AVC format to be written to an output file. The program is a multi-threaded application. The threads make use of mutual exclusion and condition synchronization to ensure the correct execution of the application. The figure below shows the various threads involved in this program.

The program makes use of 6 POSIX threads: control, video, display, capture, speech and writer. The main thread becomes the control thread of the application once it has created the others; all threads except the control thread are created from the main thread. These threads are configured to be pre-emptive and scheduled by priority. Initialization and cleanup of threads are coordinated by the Rendezvous module, which uses condition synchronization to synchronize the threads. Each thread first performs its initialization and then signals the Rendezvous object. Once all the threads are initialized, they are unblocked and execute their main loop routines. The same synchronization at cleanup ensures that shared buffers are not freed while other threads are still using them.
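The rendezvous idea, where no thread enters its main loop until every thread has finished initializing, can be sketched with a barrier. This is a Python illustration of the synchronization pattern only; it is not the Rendezvous module of the encode program, and run_threads is a hypothetical name.

```python
import threading


def run_threads(names):
    rendezvous = threading.Barrier(len(names))   # one slot per participating thread
    log = []
    lock = threading.Lock()                      # mutual exclusion for the shared log

    def worker(name):
        with lock:
            log.append(("init", name))           # thread-specific initialization
        rendezvous.wait()                        # block until all threads reach this point
        with lock:
            log.append(("loop", name))           # main loop starts only after the rendezvous

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Whatever order the scheduler picks, every "init" entry is logged before the first "loop" entry, because the barrier releases the threads only after the last one has arrived.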

4.1.2. Functions of threads in encode program

Each thread handles a certain function in the whole application. The functions of speech thread will not be discussed as it is not in the scope of this project.

Main Thread

This thread handles all the initializations and also checks the arguments given by the user. Based on these arguments, it creates the necessary threads to start the encoding application. The main thread then invokes the control thread. The figure below shows the workflow of the main thread.

Control Thread

The control thread handles the user's interaction with the application. It constantly polls the IR interface to check whether the user has pressed any button on the IR remote. If the keyboard is enabled, it also checks whether the user has pressed any key on the keyboard. The thread also draws text and graphics on the LCD display console, using the simplewidget utility to do so. Both the ARM and DSP CPU loads are calculated and displayed on the LCD console. Parameters such as frame rate, bit rate and time elapsed are also displayed.

Video Thread

The video thread is in charge of encoding the video frames into the H.264/AVC format. The buffer from the capture thread is passed to the video thread and is encoded by the H.264 algorithm running on the DSP side. The video thread allocates a contiguous memory buffer for the writer thread to write the output to the NFS. It then passes the buffer to the writer thread. The figure below shows the workflow of the video thread.

Display Thread

This thread allows the user to see a preview of the video frame while the encoding is taking place. It makes use of the Video Processing Sub System (VPSS) to copy the frames so that they can be displayed on the LCD console.

Capture Thread

The capture thread removes the interlacing artifacts in the raw captured video frames. This is done by using the VPSS resizer module, which consists of the Smooth and the Rszcopy modules. The Smooth module removes the interlacing artifacts and the Rszcopy module copies the raw buffer without any modification. The removal of interlacing artifacts can also be disabled by the user.

Writer Thread

Finally, the writer thread writes the encoded video frames to an output file on the NFS, which is specified by the user. DSP processing and writing to file are done in parallel so as to conserve CPU cycles.

4.1.3. Interaction of Threads in the encode program

After exploring the individual functions of the various threads, it is essential to know how the threads interact with each other. Since this is a multi-threaded application, it is important that the threads execute in a certain sequence so that safety and liveness properties are not violated. The figure below shows the interaction between the various threads in the application.

After all the threads have been initialized, a raw buffer is dequeued from the capture device by the capture thread. The capture thread sends the raw buffer to the display thread so that the raw video frame can be shown on the LCD screen. It then fetches an empty raw buffer from the video thread, uses the Smooth module to remove interlacing artifacts, and places the result in that buffer. The video thread receives this buffer and encodes the video frame.

The video thread fetches an I/O buffer from the writer thread, where it will place the encoded data. The display thread copies the raw buffer to the display device frame buffer using the Video Processing Subsystem (VPSS) resizer. At the same time, the video thread is encoding the same buffer on the DSP. Since both the VPSS and the DSP access the capture buffer only for reading, there is no contention for the data. After the display thread finishes copying the buffer, it creates a new frame buffer.

When the video encoder on the DSP has finished encoding, the video thread sends the I/O buffer to the writer thread to be written to the Linux network file system. The capture buffer is then handed back to the capture thread. The writer thread writes the encoded frame while the capture thread is waiting for the next buffer of the capture device to be ready for dequeuing. This cycle of execution continues until the user interrupts or stops the program.
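The hand-off of buffers from capture to video to writer is a producer-consumer pipeline. The sketch below shows that structure in Python with two queues. It is an illustration under simplifying assumptions: the display thread and the return of empty buffers are left out, the "ENC:" prefix is a placeholder for the real DSP encode call, and run_pipeline is a hypothetical name.

```python
import queue
import threading


def run_pipeline(frames):
    to_video, to_writer = queue.Queue(), queue.Queue()
    written = []

    def capture():
        for frame in frames:
            to_video.put(frame)              # hand each raw frame to the video thread
        to_video.put(None)                   # sentinel: no more frames

    def video():
        while (frame := to_video.get()) is not None:
            to_writer.put(b"ENC:" + frame)   # placeholder for encoding on the DSP
        to_writer.put(None)

    def writer():
        while (encoded := to_writer.get()) is not None:
            written.append(encoded)          # placeholder for writing to the output file

    threads = [threading.Thread(target=f) for f in (capture, video, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

Because each stage blocks on its input queue, the writer can be storing frame n while the video stage is already working on frame n+1, which mirrors the overlap of DSP processing and file writing described above.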

Chapter 5: Open-Source Live555 MediaServer

5.1. Introduction to Live555 MediaServer

The Live555 Media Server is a complete, well-defined open-source RTSP server application. It makes use of RTSP, RTP, RTCP and SDP for streaming media. Due to the complexity of the various network protocols needed for streaming, live555 was used as a base for the development of the streaming module of this system. This chapter gives a brief overview of the live555 open source application. It describes how support was added for streaming stored H.264 video files over the network in both unicast and multicast modes. It also highlights the steps to cross compile and execute the live555 application on the DM6446EVM board.

5.1.1. Overview of Live555 MediaServer

The Live555 MediaServer can stream different types of media files over the network. These media files include:

* MPEG Transport Stream file (“.ts” file)

* MPEG 1 or 2 Program Stream file (“.mpg” file)

* MPEG 4 Video Elementary Stream file (“.m4e” file)

* MPEG 1 or 2 audio file (“.mp3” file)

* WAV (PCM) audio file (“.wav” file)

* AMR audio file (“.amr” file)

* AAC (ADTS format) audio file (“.aac” file)

Although live555 did not support the H.264/AVC codec standard, support was added to the live555 media server by implementing classes which encapsulate the streaming of H.264/AVC media files.

Modifying the live555 to stream H.264/AVC

The live555 server application is implemented in C++ using an event-driven model. The task was to implement classes to encapsulate H.264/AVC file streaming over the network. Five classes were implemented to achieve this: H264VideoFileSink, H264VideoFileServerMediaSubsession, H264VideoRTPSink, H264VideoStreamFramer and H264VideoStreamParser. These classes are explained further in the following sections of this report. The figure below shows the execution of the implemented classes in order to achieve file streaming over the network.


H264VideoFileSink

The objective of this class is to open and write the output file. A file is created when the user enters the media file to be streamed. When the class reads the first frame of the media file, it adds the 4-byte start code (0x00000001) to the file and continues to write the rest of the data of the media file into this output file. This file is passed on to other classes for further processing of the data.


H264VideoFileServerMediaSubsession

This class creates a dynamic session for streaming the data over the network. It inherits the connection type (unicast or multicast) from another class. The session must be created because RTSP is a stateful protocol. Bandwidth is allocated for this session and other auxiliary parameters are validated. After the session is created, the RTP sink and the video framer are instantiated.


H264VideoRTPSink

The RTP sink is the underlying transport mechanism for the H.264/AVC data. It handles the packetization of NAL units to be sent over the network. Firstly, it validates the dynamic SDP parameters such as payload type, sprop parameter sets, profile ID and packetization mode of the media data. It considers three cases when sending NAL units. The cases are as follows:

· Case 1: NAL unit data is present in the buffer and it is small enough to send to the RTP sink

· Case 2: NAL unit data is present in the buffer but it is too large to send to the RTP sink. The first fragment of the data is sent as an FU-A packet with one extra preceding header byte

· Case 3: NAL unit data is in the buffer and some fragments have already been sent to the RTP sink. The next fragment of NAL unit data is sent as an FU-A packet with two extra preceding header bytes.

The last NAL unit of data is marked by setting the 'M' bit of the RTP packet. Appropriate delays are applied between fragments so that playout of the media file is smooth at the client side.
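The three cases above amount to the FU-A fragmentation scheme of the H.264 RTP payload format. The following Python sketch shows the byte layout only and is not the live555 implementation: packetize_nal and max_payload are hypothetical names, the NAL unit is assumed to be passed without its start code, and max_payload is assumed to be larger than 2.

```python
def packetize_nal(nal: bytes, max_payload: int):
    """Return the list of RTP payloads for one NAL unit (start code already removed)."""
    if len(nal) <= max_payload:
        return [nal]                          # Case 1: small enough, send as a single packet

    header, body = nal[0], nal[1:]
    fu_indicator = (header & 0xE0) | 28       # keep the F and NRI bits, type 28 means FU-A
    nal_type = header & 0x1F                  # original NAL type goes into the FU header
    chunk = max_payload - 2                   # leave room for FU indicator + FU header
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]

    packets = []
    for i, piece in enumerate(pieces):
        start = 0x80 if i == 0 else 0                 # S bit on the first fragment (Case 2)
        end = 0x40 if i == len(pieces) - 1 else 0     # E bit on the last fragment
        packets.append(bytes([fu_indicator, start | end | nal_type]) + piece)
    return packets
```

The original NAL header byte is not transmitted in fragmented mode; the receiver rebuilds it from the F and NRI bits of the FU indicator and the type field of the FU header.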


H264VideoStreamFramer

The aim of this class is to divide the input video data into frames. This is done by continuously reading the input file and identifying the data which is contained in each frame. The frame size is computed, and the frame rate of the video is set appropriately by setting the presentation time of each frame.


H264VideoStreamParser

This class is invoked by the H264VideoStreamFramer in order to parse the data correctly into frames. It checks for the 4-byte start code before parsing the frames. In the absence of start codes, the frames are not parsed. The parser returns the frame size so that it can be passed to the stream framer for further processing.
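Splitting a byte stream on start codes can be sketched in a few lines. The Python function below is an illustration of the idea, not the H264VideoStreamParser code; split_nal_units is a hypothetical name, and the sketch recognises only the 4-byte start code 0x00 0x00 0x00 0x01 mentioned above.

```python
def split_nal_units(stream: bytes):
    """Split a byte stream into the chunks that follow each 4-byte start code."""
    start_code = b"\x00\x00\x00\x01"
    units = []
    pos = stream.find(start_code)
    while pos != -1:
        nxt = stream.find(start_code, pos + 4)        # where the next unit begins, if any
        end = len(stream) if nxt == -1 else nxt
        units.append(stream[pos + 4:end])             # payload without the start code
        pos = nxt
    return units
```

If the input contains no start code at all the function returns an empty list, which matches the behaviour described above where data without start codes is not parsed. The length of each returned chunk is the frame size handed back to the framer.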

5.1.2. Adding support for unicast and multicast streaming

The live555 MediaServer provides support for both unicast and multicast streaming. The unicast connection is straightforward, as the system allows only one user to connect to the streamer module to receive the video stream. To model this type of connection, the H264VideoFileServerMediaSubsession is used to dynamically create a media session for a single user to receive the video stream. This is modeled after the OnDemandServerMediaSubsession class. The server checks the filename entered by the user, compares its extension, and returns the appropriate media subsession for the user.

The multicast connection is slightly more complex. Because this is a video surveillance system, it must allow multiple users to connect and receive the video stream from the streamer. The multicast connection is implemented using the PassiveServerMediaSubsession class, which streams to a multicast group address so that multiple clients can receive the same stream. The address is generated at random within a fixed range. As long as the clients are connected to the same network, this address remains valid.

With a multicast connection, different users may connect at different moments in time. The system is implemented such that even when users connect at different times, they all receive the same video stream. For example, suppose user A connects at time = 1s and user B connects at time = 5s. At time = 6s, user A continues receiving the video stream as normal, and user B receives the same video stream as user A instead of starting from the beginning of the stream. To achieve this, the reuseFirstSource parameter is used. When this parameter is set, the server reuses the source created for the first client, so later clients receive the same stream rather than a new one.

5.2. Porting Live555 MediaServer to DM6446 board

The next stage of development involved porting the live555 MediaServer onto the DM6446 board, since the system has to run on the embedded board. This section of the report explains the porting process and the running of the live555 application.

5.2.1. Modifying the make files of Live555 and DM6446 encode program

The live555 application uses make files in order to compile and execute. The make files describe how the various classes and objects are to be compiled and linked. The original application is built with the GNU C++ compiler. In order to port the application to the board, it must be cross compiled with the board's tool chain. The board has a MontaVista tool chain which makes use of the ARM C and C++ cross compilers. If the cross compiler can compile the application, it can run on the board.

Firstly, the make files of live555 had to be modified so that the classes are compiled with the board's cross compiler. The figure below shows the top portion of the original make file.

In the make file, the various compiler and suffixes are defined. The classes make use of these parameters to compile the application.

From the two figures shown above, the C_COMPILER and CPLUSPLUS_COMPILER variables are changed to the MontaVista tool chain compilers. Once this modification is done, live555 can be cross compiled for the DM6446 board.

The make files of the encode program are also changed so that live555 is compiled along with the encode program and the resulting executable is stored directly in the appropriate directory. Firstly, the make file in the dvevm_1_10/demos directory is changed as:

The live directory is added to the SUBDIRS variable. Once this is done, the live555 will be compiled along with the encode programs of the board and the resulting executable will be stored into /home/ansary/workdir/filesys/opt/dvevm directory.

5.2.2. Cross compiling live555 and running it on the board

Firstly, open a terminal on the Linux host. Change to the dvevm_1_10 directory. Type 'make' to compile and 'make install' to install into the appropriate directory. Then, boot the board using minicom and change to the opt/dvevm directory. Lastly, type './live555MediaServer' to run the application on the board.

5.3. Receiving video stream on VLC player

When the live555 MediaServer is running on the DM6446 board, the VLC player must connect to the board to receive the video stream for playback to the client. This section of the report explains the steps for connecting to the DM6446 board to receive the video stream.

5.3.1. Playing video stream via VLC

For this system, the VLC client is running on a Windows host machine. The steps in receiving the video stream via VLC are as follows:

1. Launch VLC player

2. Under the Media tab, click on Open Network option.

3. A dialog box should appear

4. Under the Protocol drop box, select RTSP. In the Address text field, type rtsp://

5. Click the Play button.

The VLC player connects to the board's IP address and makes use of the RTSP protocol to start receiving the video stream. Lastly, the video stream is played back for the client.

Chapter 6: From File streaming to Live streaming
6.1. Reasons for Live Streaming and investigation of approaches

The current system encodes the video data and writes it to the Linux file system. The file is then passed as an input to the live555 media server program to be streamed over the network. This is a two-step streaming approach, and some issues arise from this implementation.

· PROBLEM 1: The file is accessed by both the encode program and live555 media server program. Both the programs are contesting for the use of the file resource which could lead to data contention. This also leads to the next problem.

· PROBLEM 2: The end-to-end initial delay for playback of the H.264/AVC video on a remote station is about 10 - 15 seconds. If the remote VLC media player establishes an RTSP connection with the board while the file is in use by the encode program for writing data, the connection times out and is torn down. Hence, the user has to re-establish the connection.

· PROBLEM 3: The live555 program reads from the beginning of the encoded video file and streams over to the remote VLC media player. Hence, the VLC would play the delayed version of the encoded video stream.

Due to the limitations of stored file streaming, the concept of live streaming was explored. Live streaming sends the encoded video frames directly over the network. This approach prevents the programs from contending for the file resource and so avoids data contention. Since the encoded video frames are streamed over the network directly, the end-to-end initial delay of playing back the video stream is significantly reduced. Hence, the user can experience the live version of the encoded video rather than a delayed version.

6.1.1. Possible Integration methodologies

The live555 media server program and the encode program have to be integrated in order to achieve live streaming. Several integration methodologies were explored. The live555 media server is written in C++ whereas the encode program is written in C. The encode program is a multi-threaded program whereas the live555 media server is an event-driven program. Two of the methodologies that were considered were:

§ Having a single integrated program which encapsulates both the encoding program and the live555 media server.

§ Having two separate programs communicating with each other via inter-process communication mechanisms.

Single Program VS Multi Program System

Firstly, a single program approach was considered. The live555 media server must be compiled as a library to be integrated with the encode program. The live555 C++ functions and classes must be callable from the encode C program. The C++ classes and functions accessed by the C program must be exposed through declarations marked extern "C". The encode program uses threads, so the live555 library must be instantiated as a separate thread inside the encode program for the integration. The table below summarizes the pros and cons of having a single program.

Table 3: Pros and Cons of Single Program approach

PROS of Single Program

· Only need to run a single program on the target

CONS of Single Program

· Need to modify a significant portion of the code to ensure it is callable by the encode program

· Understanding the relationship of threads and analysing a multi-threaded program is a time-consuming process

· Need to ensure the live555 thread does not violate the thread safety and liveness aspects of the program execution

· Program size increases

For the single program system, the disadvantages outweigh the advantages. Hence, the multi program approach was explored.

The multi program approach is having the two programs running separately on the target with the means of communicating with each other via inter-process communication. The two programs communicate through sockets. The program flow of the multi program approach is:

1. The encode program captures raw video frame and encodes the video frame into H.264/AVC format.

2. The encode program opens a socket and writes the encoded video frame to the socket.

3. The live555 media server, which is listening on the socket, receives the video frame and streams it over the network.

The pros and cons of this approach are summarized in the table below.

Table 4: Pros and Cons of Multi program approach

PROS of Multi program system

· Fewer modifications to the individual programs, which reduces integration time

· Use of sockets facilitates communication between the programs

· Program sizes are smaller than in the single program approach

CONS of Multi program system

· Need to ensure reliable data transfer between programs via sockets

· OS must be efficient in allocating resources to execute both programs simultaneously

After weighing the pros and cons of the two approaches, the multi program system approach was selected to be implemented in order to achieve live streaming of H.264/AVC on the DM6446EVM.

6.1.2. Writing a script to run both encode and live555 MediaServer

The DM6446 is booted up via the minicom application. Since only one instance of the minicom application can communicate with the board, it is only possible to run one program at a time. Therefore, a shell script is written in order to execute both programs at the same time via the minicom command line. The steps in writing a script are as follows:

1. Open an empty text file on the Linux host.

2. Type the following statements:

./live555MediaServer &

./encode -v test.264 -r 352x288

3. Save this file as <filename>.sh. This .sh indicates that this file is a shell script. Copy the file into the appropriate directory.

4. Boot up the board and change into the appropriate directory. Type ./<filename>.sh to execute both the programs at the same time.

6.2. Using Socket Programming for encode Program

After the multi program system approach was selected, inter-process communication between the programs was explored. Socket programming was selected as it was the most straightforward approach to passing data from one program to another. This section of the report explains socket programming and the modification of the encode program to write each encoded video frame to a socket instead of to a file.

6.2.1. Introduction to Socket Programming

Socket programming is an inter-process communication framework which allows messages or data to be passed from one process to another. There are basically two kinds of socket, namely TCP sockets and UDP sockets. For this system, UDP sockets are used because video data is loss tolerant but delay intolerant. The basic outline of writing to and reading from a socket, from the perspective of a client, is as follows:

1. Create a UDP socket.

2. Specify the server's address and port number. In this system, the encode program acts as the server and live555 as the client. The server's address is the board's IP address and the port number is 9734.

3. Bind to the server's address.

4. Request for data from server by writing to socket using the sendto method.

5. Read data from the server by using the recvfrom method.

6. Close the socket.

The sendto and recvfrom methods go together when using UDP sockets. These methods are blocking in nature: execution waits on them until data has been sent or received on the socket.
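The sendto/recvfrom pairing can be tried out on a single machine over the loopback interface. The Python sketch below is an illustration, not the code added to the encode program: udp_roundtrip is a hypothetical helper, it uses 127.0.0.1 instead of the board's address, and by default it lets the operating system pick a free port rather than using port 9734.

```python
import socket


def udp_roundtrip(payload: bytes, port: int = 0):
    """Send one datagram over loopback and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # UDP socket
    receiver.bind(("127.0.0.1", port))        # port 0 lets the OS choose a free port
    receiver.settimeout(2.0)                  # avoid blocking forever if nothing arrives
    addr = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, addr)              # one call sends one whole datagram
        data, _peer = receiver.recvfrom(65535)    # blocks until a datagram is available
        return data
    finally:
        sender.close()
        receiver.close()
```

For example, udp_roundtrip(b"frame-0") returns the same bytes that were sent. Unlike a TCP stream, each recvfrom call returns exactly one datagram, so message boundaries are preserved.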

6.2.2. Modifying encode program to write to socket

In this stage of the development, the encode program of the board is modified such that it will write the encoded video frames into the socket to be passed onto the live555 MediaServer. As the encode program is multi-threaded application, the socket codes are added to the existing writer thread. The writing to a file is disabled and the socket codes are added. The below shows the setting up of UDP socket and the relevant client address and port number.

In the main loop of the writer thread, the encoded video frame data is written to the socket. This ensures that encoded video frames are continuously written to the socket and transmitted to live555. This block of code is shown in the figure below.
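The original listings are not reproduced in this text, so the following is only a hedged sketch of what the writer-thread socket code might look like. The function names are hypothetical, and the sketch assumes that each encoded frame fits into a single UDP datagram (at most 65507 bytes of payload over IPv4) and that the reader announces itself with one small request datagram, as in the outline of section 6.2.1.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_UDP_PAYLOAD 65507 /* 65535 minus 20-byte IP and 8-byte UDP headers */

/* Create a UDP socket bound to a local port (9734 in the report).
 * Returns the descriptor or -1 on failure. */
int open_writer_socket(uint16_t port)
{
    struct sockaddr_in local;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the reader sends its request and remember where it came from. */
int wait_for_reader(int fd, struct sockaddr_in *reader)
{
    char request[16];
    socklen_t len = sizeof(*reader);
    assert(reader != NULL);
    return recvfrom(fd, request, sizeof(request), 0,
                    (struct sockaddr *)reader, &len) < 0 ? -1 : 0;
}

/* Called from the writer thread's main loop: send one encoded frame
 * as one datagram. Returns the number of bytes sent or -1. */
ssize_t send_frame(int fd, const struct sockaddr_in *reader,
                   const uint8_t *frame, size_t len)
{
    if (len == 0 || len > MAX_UDP_PAYLOAD)
        return -1;
    return sendto(fd, frame, len, 0,
                  (const struct sockaddr *)reader, sizeof(*reader));
}
```

Sending one datagram per frame keeps frame boundaries intact on the receiving side, at the cost of IP fragmentation for frames larger than the network MTU.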

6.2.3. Problems due to data streaming via sockets

Once the encode program was modified, the data streaming was verified. Initially, the data was not streamed properly: only a fraction of the data was being written to the socket and transmitted to the live555 server. To debug this problem, printf statements were used to check the frame size of the video data sent by the encoder and received on the live555 side. It was discovered that only 1/3 of the data was transmitted via the sockets. Another debugging tool, Wireshark, was used to capture the packets arriving at the live555 side. The figure below shows a screenshot of the captured data.

Figure 39 shows that only about 1500 bytes were transmitted each time. It was also noted that the protocol used to transmit the data was TCP instead of UDP. Hence, the socket code was revised to make sure that UDP was used. This was done by changing the second parameter of the socket method from SOCK_STREAM to SOCK_DGRAM. Once this was changed, all the data was transmitted.
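A mistake of this kind can also be caught in code rather than in a packet capture. The sketch below, with a hypothetical helper name, asks the kernel for the type of an existing socket through the standard `SO_TYPE` socket option, so the encode program could check at start-up that it really opened a datagram socket.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd is a datagram (UDP-style) socket, 0 if it is some
 * other socket type such as SOCK_STREAM, and -1 on error. */
int is_datagram_socket(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type == SOCK_DGRAM ? 1 : 0;
}
```

For example, a socket created with `socket(AF_INET, SOCK_STREAM, 0)` makes the helper return 0, while one created with `socket(AF_INET, SOCK_DGRAM, 0)` makes it return 1.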

A small test program was also written to verify that the received data was correct. This program simply read from the socket and wrote the data to a file. File writing was also enabled in the encode program. After executing both programs, the file written by the encode program was compared with the file written by the test program. Both files were of the same size and contained the same data. Hence, data was properly streamed from the encode program to the live555 MediaServer.
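The comparison of the two output files can be automated with a few lines of C. The following is an illustrative sketch only; the function name is hypothetical and the report does not say how the comparison was actually performed.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two files byte by byte.
 * Returns 1 if the contents are identical, 0 if they differ
 * (including different lengths), and -1 if a file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) { /* also catches one file ending early */
                result = 0;
                break;
            }
        } while (ca != EOF);
    }
    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```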
6.2.4. Modifying Live555 MediaServer

In the last stage of development, live555 was modified to read directly from the socket. It then passes the video data to the sink, from where it is streamed out to the remote VLC client. The live555 application is event-driven, so the reading of data from the socket is also modeled as an event. The socket is read in the BasicUdpSource class and the data is placed into a temporary buffer called fTo. This buffer then passes the video data to the H264VideoStreamFramer class, which parses it. The parsed data is passed to the H264VideoRTPSink, where it is packetized into RTP packets to be transmitted to the remote VLC client. The figure below shows the interaction between the modified live555 MediaServer's events and the encode program.

Chapter 7: Results of Implementation and Analysis

In this chapter, the results of the implementation are discussed together with their implications. The results cover only the file streaming and live streaming aspects of this project.

7.1. File Streaming Results

File streaming was first implemented on the workstation PC using Live555 MediaServer. The encoded H.264/AVC video is first written to an output file. This file is then streamed to a remote VLC client over the network. The resolution of the video is Common Intermediate Format (CIF). The raw video is encoded and played back at 25 FPS. The table below summarizes the initial play back delay and the video quality for PC to PC streaming.

Table 5: File streaming results for PC to PC

| Type of Connection | Initial play back delay | Video quality |
|---|---|---|
| | 2 - 4 seconds | Good, smooth video at 25 FPS |
| | 2 - 5 seconds | Good, smooth video at 25 FPS. Requires time to synchronize with original video stream |

The live555 application was then ported onto the board. When the application was run on the board, the initial play back delay and the video quality were about the same as when it ran on the PC. This is because the board's Ethernet interface supports data rates of up to 100 Mbps. Using the Wireshark network analyzer, it was observed that there were no packet losses, which accounts for the smooth video quality at the receiver's side. The RTCP receiver reports also confirmed that there was no packet loss in the network.

The Live555 MediaServer was also run together with the encoder program. It used a unicast connection to stream video data to the VLC player. In that setup, the VLC player can receive the video stream from the encoder indefinitely. However, there were some problems with the performance of the streamed video. These problems were already discussed in Chapter 6 and are summarized below:

* Problem 1: Both programs contend for the same file resource, one writing to the file while the other reads from it.

* Problem 2: Due to the long initial play back delay, the VLC connection times out and the user has to re-establish the connection with the board.

* Problem 3: VLC plays a delayed version of the video stream as it seeks from the beginning of the output file.

7.2. Live Streaming Results

To resolve the issues with file streaming, live streaming was adopted. Live streaming was not fully implemented in the system, so this section discusses the status of the live streaming implementation and some of the problems faced.

Currently, the encoder successfully writes the encoded video data to a UDP socket and sends out the data frame by frame. The Live555 MediaServer is also able to receive the video data through its own UDP socket. Using Wireshark, it was verified that the data sent out by the encoder and the data received by the live555 application match. It also showed that there is no packet loss in the network and hence all the data is received.

However, the VLC player does not play back any of the video data sent by the encoder program. Since the live555 application is event-driven, the arriving data is read from the socket by a reader event of the source and stored in a temporary buffer. This buffer is passed to another event which packetizes the data into RTP packets. The packets are passed downstream to a sink to be sent out to the VLC player.

In the current implementation, the reader event blocks until data from the encoder program arrives at the socket. This in turn blocks the execution of the other events, and hence no packets are sent to the VLC player. A possible way to resolve the problem is to set a flag indicating whether data has arrived at the socket. If there is no data in the socket, the application should exit from the event and call the other events.

Another problem could be that the incoming data rate is higher than the outgoing data rate. The encoder program writes to the socket as soon as it encodes a video frame. Live555 introduces some delay by setting timestamp information in the RTP packets to ensure that the data is sent out at the correct frame rate. Hence, before the data is sent out to the VLC client, the buffer may be overwritten with new data, which will affect the video quality of the play back.

A possible way of resolving the above-mentioned problem could be to have the encoder write the data to the socket only after encoding 50 - 100 frames. This would ensure that the sender does not overwhelm the receiver. However, the size of the data sent over the socket would increase. There must be a tradeoff between the data size sent and the sending rate of the video data. These issues must be addressed to achieve the desired real-time performance of the system.
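The overwrite problem can be made concrete with a small bounded queue of frame slots. This is a different technique from the single buffer described above and is offered only as a hedged sketch: the slot count, slot size and all names are arbitrary assumptions. The point it shows is that a full queue rejects a new frame instead of silently overwriting one that has not been sent yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_SLOTS 4    /* assumed capacity, chosen only for illustration */
#define SLOT_BYTES  2048 /* assumed maximum frame size per slot */

typedef struct {
    uint8_t data[QUEUE_SLOTS][SLOT_BYTES];
    size_t  len[QUEUE_SLOTS];
    size_t  head;  /* index of the oldest stored frame */
    size_t  count; /* number of occupied slots */
} frame_queue;

void fq_init(frame_queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Store a frame. Returns 0 on success, or -1 when the queue is full or
 * the frame is too large; stored frames are never overwritten. */
int fq_push(frame_queue *q, const uint8_t *frame, size_t len)
{
    size_t tail;
    if (q->count == QUEUE_SLOTS || len > SLOT_BYTES)
        return -1;
    tail = (q->head + q->count) % QUEUE_SLOTS;
    memcpy(q->data[tail], frame, len);
    q->len[tail] = len;
    q->count++;
    return 0;
}

/* Remove the oldest frame into out. Returns its length, or 0 when the
 * queue is empty or out is too small. */
size_t fq_pop(frame_queue *q, uint8_t *out, size_t cap)
{
    size_t n;
    if (q->count == 0 || q->len[q->head] > cap)
        return 0;
    n = q->len[q->head];
    memcpy(out, q->data[q->head], n);
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return n;
}
```

When `fq_push` returns -1, the producer can decide to drop the frame or wait, which makes the tradeoff between data size and sending rate explicit.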

Chapter 8: Conclusion

This chapter summarizes the current limitations of the project and recommends future work. A final conclusion comparing the outcome of the project with the objectives stated at the beginning of the report is also presented.

8.1. Current limitation and future works

Currently, the video camera surveillance system can successfully stream an encoded H.264/AVC video file over the network using unicast and multicast connections. The video received by the VLC player is of good quality. The live streaming aspect is not fully implemented in the system for the following reasons:

1. The reader event of live555 blocks the execution of other events in the application, so no video data is streamed to the VLC client.

2. The rate at which the encoder sends data may exceed the rate at which RTP packets are sent to the VLC client.

3. The use of a linear buffer may cause video data to be overwritten before it is packetized and sent via the network.

These limitations prevent live streaming from being successfully integrated into the video streamer module of the system.

After live streaming is successfully implemented, audio streaming can be explored. The DM6446EVM board comes with a G.711 speech codec which allows recording of speech. The system can be extended to support both audio and video streaming. Issues such as synchronizing the video and audio streams can be investigated.

Currently, the encoder of the surveillance system encodes every frame. Image processing to identify motion or facial features can be added to the system. The encoder can then encode only those frames and stream them to the client. This can save bandwidth as it reduces the amount of data streamed.

8.2. Conclusion

After the integration of the video streamer module, the system is now closer to realizing an intelligent embedded camera surveillance system based on the H.264/AVC coding standard. The author managed to successfully stream an encoded H.264/AVC file from the hardware encoder over the network. The development process was carried out in a logical and justifiable manner. The integration between C and C++ was difficult, as the two applications were implemented using different software architectures, namely multi-threading and event-driven. As only a text editor, printf statements and command line compilation were used, debugging the software was time consuming and tedious.

The author gained an understanding of the basis of the H.264 video coding standard and is familiar with the various network protocols used in streaming multimedia data. A significant amount of knowledge in embedded systems, software architecture implementation and network analysis was also acquired in the course of this project.




H.264/AVC is the latest video codec standard developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG). This codec is based on a block-oriented, motion-compensated technique. It is also known as International Standard ISO/IEC 14496-10 - MPEG-4 Part 10, Advanced Video Coding.

The objective of this codec is to produce good quality video at lower bit rates. It also enhances the compression efficiency so that data can be streamed over the network easily. H.264's robustness to data errors and losses minimizes the impact of losses when streaming multimedia over the network. Due to these advantages, the bandwidth required for streaming is relatively lower than that of other video coding schemes.

The video coding standard specified by ITU-T and ISO/IEC defines only the syntax of the H.264/AVC bit stream and the decoding process of the syntax elements. This allows developers maximal freedom to implement and optimize their encoders for their specific requirements, provided that the encoded bit stream conforms to the H.264/AVC syntax. The scope of standardization is illustrated in Figure 2 below.

The H.264/AVC standard comprises two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL handles the signal processing of the video content to create the relevant bit stream. The NAL adds appropriate header information to the VCL's bit stream to facilitate the transmission of data through various network protocols. The NAL therefore acts as an intermediate layer that relays data to the transport and higher layers. Figure 3 shows the structure of the H.264/AVC encoder and the relationship between the VCL and the NAL.

Video Coding Layer (VCL)

The H.264/AVC VCL is implemented using a block-based hybrid video coding model. Each coded picture is represented in block units consisting of luma and chroma samples. These block units are called macroblocks. The VCL contains two main algorithms: inter-prediction and intra-prediction. Figures 4 and 5 below show the implementation of the H.264/AVC encoder and decoder used in this project.

The 'forward' path depicts the encoding process at macroblock level. Every macroblock is encoded in either intra mode or inter mode. In both cases, reconstructed picture samples are used as a reference to form a prediction P. In intra mode, the prediction is formed from samples in the current slice that have previously been encoded, decoded and reconstructed. In inter mode, the prediction is formed by motion compensation from one or two previously encoded pictures.

A residual block is formed from the difference between the current block and the prediction block. The residual block is transformed into a block of transform coefficients. These coefficients then pass through quantization, reordering and entropy coding. Finally, the entropy-coded coefficients, together with the additional encoding information, are packed into a NAL unit. The NAL unit is then transmitted or stored.
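The first two steps of this path, forming the residual and transforming it, can be illustrated for a single 4x4 block. The sketch assumes the 4x4 core integer transform matrix defined by the H.264/AVC standard; the scaling that the standard folds into quantization, and the reordering and entropy coding stages, are deliberately left out. The function names are hypothetical and this is not the encoder used in the project.

```c
#include <assert.h>

/* Core 4x4 forward transform matrix of H.264/AVC. */
static const int CF[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* Residual = current block minus prediction block, sample by sample. */
void residual_4x4(unsigned char cur[4][4], unsigned char pred[4][4],
                  int res[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            res[i][j] = (int)cur[i][j] - (int)pred[i][j];
}

/* Unscaled transform coefficients: coef = CF * res * CF^T. */
void forward_transform_4x4(int res[4][4], int coef[4][4])
{
    int tmp[4][4];

    for (int i = 0; i < 4; i++)          /* tmp = CF * res */
        for (int j = 0; j < 4; j++) {
            tmp[i][j] = 0;
            for (int k = 0; k < 4; k++)
                tmp[i][j] += CF[i][k] * res[k][j];
        }
    for (int i = 0; i < 4; i++)          /* coef = tmp * CF^T */
        for (int j = 0; j < 4; j++) {
            coef[i][j] = 0;
            for (int k = 0; k < 4; k++)
                coef[i][j] += tmp[i][k] * CF[j][k];
        }
}
```

For a residual that is the same value everywhere, all the energy ends up in the single coefficient `coef[0][0]`, which is why smooth regions compress well.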

The current block is reconstructed from the quantized coefficients. This process is called the 'reconstruction' path. The quantized coefficients X are rescaled and inverse transformed, and the result is added to the prediction to form an unfiltered reconstructed block uF'n. A deblocking filter is then applied to reduce the blocking distortion.

Fundamentals of Frames

H.264/AVC has different profiles. Different profiles make use of different frame types, namely I-frames, P-frames and B-frames. Figure 2.5 shows the sequence of frames used in encoding a video.

· I-Frame (intra-frame): self-contained frame which is decoded without the use of other reference frames. Commonly, the first frame of the video sequence is an I-frame. An I-frame is transmitted for new viewers or to resynchronize a damaged bit stream. The drawback is that it requires more bits than the other frame types.

· P-Frame (inter-frame): predictive inter-frame which depends on the previous I-frame or P-frame to code the frame. Unlike I-frames, it requires fewer bits, but it is prone to transmission errors due to its dependency on previous frames.

· B-Frame (inter-frame): bi-predictive inter-frame which takes reference from previous and future frames. It offers a large number of prediction modes for each macroblock, which enhances the compression efficiency. This leads to lower bit rates with improved prediction accuracy.

Intra-Prediction

Intra prediction exploits the spatial redundancy between adjacent macroblocks in a particular frame. There are three modes of intra prediction, namely Intra_4x4, Intra_16x16 and I_PCM. The mode determines the size of the prediction region and the algorithm used.

* Intra_4x4: mode to predict 4x4 luma blocks. It is very apt for regions with significant details or fast coding sequence.

* Intra_16x16: mode to predict 16x16 luma blocks. It is suitable for coding smooth areas of pictures.

* I_PCM: bypasses prediction and transform coding and sends the sample values directly.

The Intra_4x4 and Intra_16x16 modes each have several prediction modes: Intra_4x4 has 9 and Intra_16x16 has 4. The prediction modes are summarized in the tables below.

Intra prediction of macroblocks is based on extrapolation, which is defined as constructing new data points beyond the known discrete domain. The concepts are further explained in the figures below.
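As an illustration of the extrapolation idea, the sketch below implements three of the nine Intra_4x4 prediction modes: vertical (mode 0), horizontal (mode 1) and DC (mode 2). It assumes that both the row above and the column to the left of the block are available; the six directional modes and the handling of unavailable neighbours are left out, and the function names are hypothetical.

```c
#include <assert.h>

/* Mode 0 (vertical): each column repeats the reconstructed sample above it. */
void intra4x4_vertical(const unsigned char above[4], unsigned char pred[4][4])
{
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            pred[row][col] = above[col];
}

/* Mode 1 (horizontal): each row repeats the reconstructed sample to its left. */
void intra4x4_horizontal(const unsigned char left[4], unsigned char pred[4][4])
{
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            pred[row][col] = left[row];
}

/* Mode 2 (DC): every sample is the rounded mean of the eight neighbours. */
void intra4x4_dc(const unsigned char above[4], const unsigned char left[4],
                 unsigned char pred[4][4])
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += above[i] + left[i];
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            pred[row][col] = (unsigned char)((sum + 4) >> 3);
}
```

An encoder would typically build several candidate predictions like these and keep the one that leaves the smallest residual.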


Inter-Prediction

Inter prediction exploits the temporal redundancy between consecutive frames for compression. A prediction model is created from one or more previously encoded frames using variable block sizes.

Macroblocks are divided into luma partitions of 16x16, 16x8, 8x16 and 8x8 samples. The 8x8 partitions can be further divided into 8x4, 4x8 and 4x4 luma samples. Each partition is coded with its own motion vector, which gives the translational displacement of the samples with respect to the reference frame. This provides greater flexibility in motion compensation. The figure shows the macroblock partitioning.

Motion Vectors

Each partition or sub-partition of a macroblock is predicted from an area of the same size in a reference picture. The offset between the two areas (the motion vector) has quarter-sample resolution for the luma component and one-eighth-sample resolution for the chroma components. However, luma and chroma samples at sub-sample positions do not exist in the reference picture. Hence, it is necessary to generate these extra samples through interpolation from surrounding samples.

Interpolation samples

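The prediction values at half-sample positions are generated using a 6-tap FIR filter. The values of b and h are found by first calculating the intermediate values b1 and h1 using:

$$b_1 = E - 5F + 20G + 20H - 5I + J, \qquad h_1 = A - 5C + 20G + 20M - 5R + T$$

$$b = (b_1 + 16) \gg 5, \qquad h = (h_1 + 16) \gg 5$$

where the upper-case letters denote the neighbouring integer-position samples, and the results are clipped to the range 0 to 255.

Next, the value of j is obtained by:

$$j_1 = cc - 5\,dd + 20\,h_1 + 20\,m_1 - 5\,ee + ff, \qquad j = (j_1 + 512) \gg 10$$

where cc, dd, ee, ff and m1 are intermediate values obtained in the same way as h1.

The samples at the quarter-sample positions a, c, d, n, f, i, k and q are calculated by averaging, with upward rounding, the two nearest samples at integer and half-sample positions, for example:

$$a = (G + b + 1) \gg 1$$

The figure below illustrates the generation of interpolation samples.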
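The interpolation described in this section can be sketched in C for one row of samples. The filter taps (1, -5, 20, 20, -5, 1) and the rounding offsets are those of the H.264/AVC standard; the function names and the simplified one-dimensional interface are assumptions made for illustration, with the six inputs ordered so that the half-sample lies between the third and fourth of them.

```c
#include <assert.h>

static int clip255(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Unrounded 6-tap filter output (b1 or h1 in the text) for the
 * half-sample position between p[2] and p[3]. */
int half_sample_intermediate(const unsigned char p[6])
{
    return p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
}

/* Final half-sample value: add 16, divide by 32, clip to 0..255. */
int half_sample(const unsigned char p[6])
{
    int v = half_sample_intermediate(p) + 16;
    return v < 0 ? 0 : clip255(v >> 5);
}

/* Centre position j: the same filter applied to six intermediate values,
 * so the rounding offset is 512 and the divisor is 1024. */
int centre_half_sample(const int m[6])
{
    int v = m[0] - 5 * m[1] + 20 * m[2] + 20 * m[3] - 5 * m[4] + m[5] + 512;
    return v < 0 ? 0 : clip255(v >> 10);
}

/* Quarter-sample value: average of the two nearest samples, rounded up. */
int quarter_sample(int s0, int s1)
{
    return (s0 + s1 + 1) >> 1;
}
```

On a flat region the filter returns the input value unchanged, while on a sharp edge its negative taps can push the result outside 0 to 255, which is why the clipping step is needed.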

Entropy Coding

H.264/AVC supports two types of entropy coding schemes: Context-Adaptive Variable Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). Both coding schemes map syntax elements to a codeword table to improve performance.

Context Adaptive Variable Length Coding (CAVLC)

This coding scheme encodes residual blocks of transform coefficients in zigzag order. The scheme switches between different VLC tables for the various syntax elements, depending on the elements already transmitted. This improves the overall entropy coding performance.

Context Adaptive Binary Arithmetic Coding (CABAC)

This coding scheme uses the probability analysis at the encoder and decoder to evaluate the transform coefficients. Due to dynamic statistics of the video frame, the probability analysis is suitable for this scenario. This scheme also can reduce the bit rate which improves the overall entropy coding.

Deblocking Filter

In a block-based coding scheme, the accidental production of visible blocking artifacts is common. The intent of the in-loop deblocking filter is to remove the blocking artifacts by controlling the strength of filtering through the values of the syntax elements. The filtering of the samples is determined by a quantization parameter. This parameter sets the threshold for filtering to occur.

If the absolute difference between the samples close to a block edge is relatively large but still below the threshold, it is deemed to be a blocking artifact, and the edge region can be filtered to smooth it out. If the coarseness of the quantization cannot explain the large difference between the samples, the difference is regarded as the actual content of the picture. Hence no filtering is applied.
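The threshold test can be written down compactly. In the sketch below, `p1` and `p0` are the two samples on one side of the block edge and `q0` and `q1` the two on the other side. The thresholds `alpha` and `beta` are passed in as parameters because the standard derives them from the quantization parameter through tables that are not reproduced here. The function name is hypothetical, and the boundary-strength decision that precedes this test in the standard is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 when the step across the edge is small enough to be a
 * blocking artifact (so the edge should be smoothed), and 0 when it is
 * large enough to be treated as real picture content. */
int edge_needs_filtering(int p1, int p0, int q0, int q1, int alpha, int beta)
{
    return abs(p0 - q0) < alpha &&
           abs(p1 - p0) < beta &&
           abs(q1 - q0) < beta;
}
```

For example, with `alpha` = 20 and `beta` = 6, the samples 60, 62 | 70, 71 are filtered (a step of 8 across the edge), whereas 60, 62 | 150, 151 are left alone because a step of 88 is taken to be a genuine edge in the picture.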

Video Content Organization

An H.264/AVC bit stream consists of a sequence of encoded pictures, each of which can be an entire frame or a single field. A frame has two interleaved fields, namely the top and bottom fields. Even-numbered rows belong to the top field and odd-numbered rows belong to the bottom field. If the two fields are captured at different time instances, the frame is called an interlaced frame; otherwise it is called a progressive frame.

A picture is divided into macroblocks of fixed size. The sequence of macroblocks is organized into slices or slice groups. With Flexible Macroblock Ordering (FMO), each macroblock is mapped to a slice group by a unique identification number. Macroblocks belonging to the same slice group are processed in raster scan order. The figure illustrates the macroblock organization.
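The relationship between a frame and its two fields can be shown with a short routine that separates the rows of one sample plane. This is a minimal sketch under the assumption of a tightly packed plane of `width` by `height` samples with an even `height`; the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Copy even-numbered rows (0, 2, 4, ...) of a frame into the top field
 * and odd-numbered rows (1, 3, 5, ...) into the bottom field.
 * Each field buffer must hold width * height / 2 samples. */
void split_fields(const unsigned char *frame, int width, int height,
                  unsigned char *top, unsigned char *bottom)
{
    for (int row = 0; row < height; row++) {
        unsigned char *field = (row % 2 == 0) ? top : bottom;
        memcpy(field + (size_t)(row / 2) * (size_t)width,
               frame + (size_t)row * (size_t)width,
               (size_t)width);
    }
}
```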


Setting of DM6446 board system

This chapter illustrates the initial setting up and execution of various demo programs that came along with the DM6446 board. This involves the booting of kernel via minicom, setting up of Network File System (NFS) and execution of these demo programs. These are further explained in the following sections of this chapter to provide a deeper understanding of the implemented system of this project.

System Setup via Minicom

The DM6446 comes with a MontaVista Linux kernel. The kernel has to boot up before programs can be run on the DM6446. Since no Integrated Development Environment is used for this project, it is essential to flash the program onto the board in order to execute the programs that are developed. Hence, a HyperTerminal-like application is needed to boot up the kernel of the board. Minicom is the Linux equivalent of the HyperTerminal application. The figure shows the setup of minicom, and the steps are summarized below:

1. Open the minicom application by typing minicom -s in the terminal. This will launch the screen shown in the figure below.

2. Configure the above parameters A, E, F and G in this way:

* A: /dev/ttyS0

* E: 115200 8N1

* F and G: No

3. Save the changes and exit.

Once the changes have been made, reopen the application. Wait for 5 seconds and then turn on the power. This should automatically boot up the MontaVista Linux of the DM6446 board. Then type “root” to log in to the DM6446 board.

Setup of Network File System (NFS)

The Network File System (NFS) is a protocol that allows clients to access files over a network in the same way as they are accessed in local storage. For ease of development, the NFS of the Linux host is mounted on the DM6446 board. The board (client) can then access the NFS and run the executables stored on it. In order to achieve that, the target file system must be exported. The following steps explain how to set up NFS:

1. On the Linux host, log in as a user. Then create a location for the MontaVista file system.

2. Now, change user to root by typing

3. Copy the MontaVista file system and set appropriate permissions on the shared area. Replace <useracct> with your username.

4. Locate /etc/exports file on your Linux host. Edit the file by adding the following line:

5. Type these commands to export the file system and to restart the NFS server. Note that these must be done as root.

6. Now open the minicom application. Turn on the power of the board. Interrupt the automatic boot sequence by pressing any key. The appropriate environment variables have to be set in order for the board to mount the NFS correctly. Type the following commands:

7. Finally, save the settings by typing:

Now, the board is booted via NFS.

Assigning static IP address to board

When the board boots the kernel via the NFS server, a dynamic IP address is assigned to the board. This is due to the bootargs parameter, which specifies that the IP address is assigned by DHCP. It is impractical for a client connecting from a remote station to know which IP address the board is using. Therefore, a static IP address is assigned to the board. The following steps illustrate how this is done:

1. Open /home/ansary/workdir/filesys/etc/network/interface file.

2. In the file, add the following statements

auto eth0

iface eth0 inet static






These settings set the appropriate static IP address.

3. Boot up the board using minicom and log in. Type this command to check that the appropriate IP address is set:

$ /sbin/ifconfig

The IP address shown should be the same as the one specified in the file, which confirms that the static IP address has been set successfully. The figure below shows a screenshot of checking the board's static IP address.

Compiling and running DM6446 programs

This section briefly discusses how to compile and execute the programs. As there is no IDE, most of the compiling and running is done via a terminal in Linux. Setting up the build environment is also discussed.

Setup of Build Environment

The project is developed for the ARM processor. In order to build executables that the ARM processor can run, the programs have to be cross-compiled with the MontaVista tool chain. The PATH must be set to include the MontaVista tool chain using these commands:

$ nano ~/.bashrc

Under the user-specified aliases and functions, add this statement:

Once the path is set, the programs can be cross-compiled to be executed on the DM6446 board.

Compiling and running of programs

Before compiling the programs, edit the Rules.make file in the dvevm_1_10 directory by modifying these lines:

The EXEC_DIR will store all the executables from the compilation into the specified directory. Then in the dvevm_1_10, type these commands:

Once the programs are compiled, they are stored in the NFS EXEC_DIR path. To run the programs, boot up the board kernel using minicom. After logging in, type these commands:

$ cd /opt/dvevm

The first command will change into the directories that the executables are stored into. The second command will initialize the memory pools needed for the various programs by using the CMEM module.

$ ./encode -v test.264 -r 352x288

The last command executes the encode program. The -v parameter specifies the name of the file in which the encoded bit stream is stored and -r specifies the resolution. For this project, CIF resolution (352x288) is used.

Other parameters are also supported by this encode program namely:

-s: name of speech file to store encoded speech

-b: specifies the bit rate of the encoding of video

-t: specifies the duration of the execution of the program

-h: prints help message on console
