Objectives for Lecture
Related learning outcomes
3) Produce a Multimedia application and deliver it through a browser. L7,8 A2
7) Describe MPEG audio standards for the WWW. L5 A2
Producing and delivering Multimedia applets in browsers (translation and bandwidth issues). L4-9
This lecture looks specifically at the streaming of audio-visual material over the web as it stands now, and in the near future. Streaming is used for audio and video playback of recorded and live content. It is also used for video conferencing as well as Voice-over-IP (VOIP), which is currently used for cheap telephone calls, but which will be the conduit for almost all voice telephone calls within the next five years.
Downloading files for later playback is a separate issue. However, as files shrink and speeds increase, the line between streaming and download-and-play blurs. Near Video on Demand (broadcasting a movie on adjacent channels with start times staggered by 15 minutes) is one example. The MP3 and AAC file formats illustrate the underlying technology. There is a great deal more to examine in these formats and similar ones. Future technologies, e.g. HTTP-NG, the next-generation web protocol, also need separate consideration. This lecture can only serve as a jumping-off point for further research.
ISO OSI 7 layer model
(Dotted line = virtual flow, solid line = actual flow. Each layer adds its own addressing data to the basic data packet.)
See Chapter One, Buchanan "Mastering the Internet". This is a way of looking at data communication that allows you to take the underlying layers for granted. This process of abstraction is like driving a car without fully knowing how the engine or brakes work.
For example, in TCP/IP, TCP covers the Transport layer and IP the Network layer. How the physical and data link layers beneath them work is irrelevant.
The model is not universally admired - some protocols require the re-implementation of the lower levels.
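The idea that each layer adds its own addressing data to the basic packet can be sketched in a few lines of Python. This is a toy illustration only: the bracketed text "headers" and the function names are invented for the example, and real protocol headers are binary structures. The physical layer is left out because it transmits bits rather than adding a header.

```python
# Toy model of layered encapsulation. The bracketed headers are invented
# for illustration; real headers are binary and carry addresses, lengths
# and checksums.
HEADER_LAYERS = (
    "application", "presentation", "session",
    "transport", "network", "data link",
)


def encapsulate(payload: bytes, layers=HEADER_LAYERS) -> bytes:
    """Pass the payload down the stack: each layer prepends its own header."""
    frame = payload
    for name in layers:
        frame = f"[{name}]".encode("ascii") + frame
    return frame


def decapsulate(frame: bytes, layers=HEADER_LAYERS) -> bytes:
    """Pass the frame up the stack: each layer strips only its own header."""
    for name in reversed(layers):
        header = f"[{name}]".encode("ascii")
        if not frame.startswith(header):
            raise ValueError(f"expected {name} header")
        frame = frame[len(header):]
    return frame


wire = encapsulate(b"hello")
print(wire)               # outermost header belongs to the lowest layer
print(decapsulate(wire))  # the receiving application sees only the payload
```

Each function only ever looks at its own header, which is the sense in which a layer can ignore how the layers beneath it work.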
MPEG and JPEG compression standards
See lecture 6, MMT 1 and accompanying material in the teach directory. See also http://www.mpeg.org/MPEG/audio.html#audio-overview for an overview of all MPEG audio standards and http://www.tnt.uni-hannover.de/project/mpeg/audio/faq/ for the most up-to-date FAQ.
Basic Copyright law for audio-visual content
Intellectual Property Rights (IPR) are covered in Multimedia Development Methods; see also the PRS/MCPS Masterclass handout, or www.mcps.co.uk.
As a result of Napier University's work with Memory Corporation (http://www.mp3-go.com/), there has been much coverage in the national press of this issue. See http://www.scotsman.com/, select Tech and view the article "Listen without prejudice 16/3/1999" for a typical example
MP3 is the second most popular search word on the web. Many hundreds of thousands of songs are out there for free downloading, most of them illegal copies. Every download is technically an illegal act, but then, so is every cassette or MD copy of a CD track.
Each file is compressed to 5-10% of the original file size. A three-minute pop song would normally be about 30MB and take around two hours to download on a 33.6 kbit/s modem. Compressed, it takes roughly 6 to 12 minutes. Not long, but not immediate.
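A back-of-envelope estimate of these download times can be reproduced with the sketch below. It assumes CD-quality PCM (44.1 kHz, 16-bit, stereo) for the uncompressed song, and that the modem sustains its full nominal 33.6 kbit/s with no protocol overhead, so real downloads would be somewhat slower. The function and variable names are chosen for the example.

```python
def download_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time: total bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_second


# Assumption: CD-quality PCM, 44,100 samples/s x 2 bytes x 2 channels.
CD_BYTES_PER_SECOND = 44_100 * 2 * 2
SONG_BYTES = CD_BYTES_PER_SECOND * 180   # a three-minute track
MODEM_BPS = 33_600                       # nominal 33.6 kbit/s modem

print(f"uncompressed size: {SONG_BYTES / 1e6:.1f} MB")
print(f"uncompressed download: {download_seconds(SONG_BYTES, MODEM_BPS) / 60:.0f} min")

for ratio in (0.05, 0.10):               # compressed to 5% and 10% of the original
    minutes = download_seconds(SONG_BYTES * ratio, MODEM_BPS) / 60
    print(f"compressed to {ratio:.0%}: {minutes:.1f} min")
```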
Each file is compressed to an international standard, but the algorithms for doing this vary. Source code is available publicly, but most of it is copyrighted. This has led to the enjoyable sight of small companies, who produce software designed to infringe music creators' IPR, suing each other for usage of proprietary software.
[PR Newswire, 15MAR] LOS ANGELES -- PlayMedia Systems, Inc., a leading MP3 Internet music technology firm, has filed a federal copyright lawsuit in U.S. District Court in Los Angeles, seeking over $20 million in damages and permanent injunction against Nullsoft, Inc., maker of the popular "WinAmp" MP3 player; PlayMedia Systems, Inc., et al. v. Nullsoft, Inc., et al., U.S.D.C. Case No. 99-02494 AHM (Mcx).
The Secure Digital Music Initiative (http://www.sdmi.org/) is an attempt to establish standards to protect writers, musicians and the companies that market them from theft of their work. Or it's an attempt by chemical-abusing fat-cats to shut the stable door after the horse has bolted, according to your taste!
More recently a paper (http://www.mp3.com/news/196.html) discussing the inability of Linux to implement SDMI provoked a typical response in the mp3.com message board.
See http://bboard.mp3.com/mp3/ubb/Forum4/HTML/000193.html. Discuss the ethical issues informally in groups after this lecture.
From the FAQ for MPEG AAC (Advanced Audio Coding)
- very high audio quality at a rate of 64 kb/s/channel for multichannel operation.
- up to 48 main audio channels,
- 16 low frequency effects channels,
- 16 overdub/multilingual channels,
- 16 data streams.
- 16 programs can be described, each consisting of any number of the audio and data elements.
There are three profiles for the AAC standard, called Main Profile, Low Complexity Profile, and Scalable Sampling Rate Profile.
The Main profile is intended for use when processing, and especially memory, are not at a premium.
The Low Complexity profile is intended for use when cycles and memory use are constrained, and the SSR profile when a scalable decoder is required. The Main and LC profiles have been tested at 320 kb/s for 5-channel audio programmes, and both have demonstrated better quality than competing audio coding algorithms running at 640 kb/s for the 5-channel program.
When MPEG-2 was in development, the need for improved audio capability to support cinema-style 5.1 speaker set-ups, and for encryption and copy protection, was known. However, the standard itself does not support encryption. Vendors have added encryption in linked schemes based on AAC, such as http://www.a2bmusic.com/. (See policymaker.doc and musicipp.doc from this site, which are cached in the teach directory.) However, commercial take-up has been slim, and hackers claim to be able to crack it. Sometimes this is done simply by re-digitising the analogue output, and sometimes by low-level code intercepting the digital audio bitstream on its way to the soundcard.
LiquidMusic, who have worked with PRS/MCPS, have a commercially working solution, but it depends on credit card details being embedded in the digital license for each downloaded track - not a solution for distribution of Spice Girls tracks!
The following is taken from "Music on the Internet and the Intellectual Property Protection Problem" Lacy, Snyder & Maher (musicipp.doc)
We need to think very carefully about the way in which we make available digitally stored music, both compressed and original. There are three fundamental requirements:
- We need to prevent access to uncompressed cleartext originals.
- We need to associate licensing information with compressed music files, and prevent access to the cleartext, compressed music.
- We need to ensure that mechanisms that can play back the compressed music are carefully controlled.
With the focus on available audio content, most web users lose sight of the fact that the interactive transmission of audio and video is the area that has most benefits to business and individuals. Video telephone calls, business video-conferencing, white-board and application-sharing are reality for many (though still not enough!) today.
See http://www.dstc.edu.au/RDU/staff/jane-hunter/video-streaming.html for a good overview of the enabling technology, in particular the H.261, H.263, MJPEG, MPEG1, MPEG2 and MPEG4 standards that apply in this field. In order to get a handle on this, you need to gain a good understanding of the OSI seven-layer model. H.261 is the standard that supports almost all video-conferencing products.
In order to stream, several aspects of TCP/IP delivery need to be modified. The first is what to do about missing packets. TCP retransmits lost packets, which stalls playback; since a/v material is generally better delivered with drop-outs than with pauses when things go wrong, current streaming technology relies on UDP (User Datagram Protocol) rather than TCP. A further refinement is the multivendor-developed RTP (Real-time Transport Protocol), which adds timing reconstruction, loss detection, security and content identification, alongside multicast/unicast and quality-of-service support.
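To make the timing-reconstruction and loss-detection fields concrete, here is a minimal Python sketch of the 12-byte fixed RTP header (version, marker, payload type, sequence number, timestamp, source identifier) as laid out in the RTP specification. The helper names are invented for this example, the sketch ignores padding, header extensions and contributing-source lists, and the loss check assumes packets arrive in order.

```python
import struct

# Fixed RTP header: 2 flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(seq, timestamp, ssrc, payload_type, payload, marker=False):
    """Build a minimal RTP packet (version 2, no padding/extension/CSRCs)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet):
    """Split a packet into its header fields and payload."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,               # used for loss detection
        "timestamp": timestamp,   # used for timing reconstruction
        "ssrc": ssrc,             # identifies the sending source
        "payload": packet[RTP_HEADER.size:],
    }


def missing_sequence_numbers(received_seqs):
    """List sequence numbers skipped between consecutive arrivals.

    Handles 16-bit wraparound; assumes in-order arrival (an assumption
    of this sketch, since UDP may also reorder packets).
    """
    missing = []
    for prev, cur in zip(received_seqs, received_seqs[1:]):
        gap = (cur - prev) % 65536
        missing.extend((prev + k) % 65536 for k in range(1, gap))
    return missing


packet = pack_rtp(seq=7, timestamp=160, ssrc=0x1234, payload_type=0, payload=b"abc")
print(unpack_rtp(packet))
print(missing_sequence_numbers([65534, 65535, 1, 2]))
```

Because the receiver can see which sequence numbers never arrived, it can play a drop-out or conceal the gap instead of waiting for a retransmission, which is the behaviour the paragraph above describes.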
In Oct 1996, Progressive Networks and Netscape developed RTSP (Real Time Streaming Protocol), which defines the connection between streaming media client and server software. This underpins the RealVideo and RealAudio products available today.
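RTSP messages are plain text and look much like HTTP requests. The sketch below formats a typical client sequence (DESCRIBE, SETUP, PLAY, TEARDOWN) following the message layout standardised as RTSP 1.0 in RFC 2326. The server URL, client ports and session identifier are hypothetical, the helper name is invented for the example, and nothing is sent over the network.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical server and stream names, for illustration only.
URL = "rtsp://media.example.com/lecture"
SESSION = "12345678"   # in practice issued by the server in its SETUP reply

conversation = [
    rtsp_request("DESCRIBE", URL, 1, {"Accept": "application/sdp"}),
    rtsp_request("SETUP", URL + "/audio", 2,
                 {"Transport": "RTP/AVP;unicast;client_port=5000-5001"}),
    rtsp_request("PLAY", URL, 3, {"Session": SESSION}),
    rtsp_request("TEARDOWN", URL, 4, {"Session": SESSION}),
]

for message in conversation:
    print(message)
```

Note the division of labour: RTSP only controls the session (set up, play, stop), while the media itself travels separately, typically as RTP over UDP.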
RealAudio and RealVideo
See http://www.real.com/devzone/library/whitepapers/index.html for all the theory behind the products.
See http://service.real.com/help/library/guides/production/realpgd.htm for an interactive guide to production.
The following excerpt from their marketing literature describes the product range:
|RealSystem G2||Includes improved versions of RealAudio and RealVideo, and adds two important new data types, RealPix and RealText. In addition, RealSystem G2 now supports many new streaming third-party data types and standard media types.|
|RealPix||Allows existing image formats like JPEG to be easily added to presentations, offering powerful transition effects and overlay capabilities. RealPix allows content creators to be only a scanned image away from compelling streaming multimedia.|
|RealText||Allows both static and live XML-compliant text to be added to presentations with powerful effects like smooth scrolling, selectable fonts, and selectable colors.|
|RealVideo||Available with smoother video playback using post-filtering which scales to client CPU capability, automatic bitrate scalability across all bandwidths, and improved live performance delivering higher frame rates.|
|RealAudio||Offering 80% greater frequency response for 28.8Kbps modem connections, and dramatically increased packet loss tolerance using sophisticated interleaving and loss interpolation techniques.|
|RealFlash||Combines the compelling animation technology from Macromedia with the leading streaming media technology from RealNetworks to deliver high quality animation synchronized with RealAudio.|
|Third party datatypes||Extensive 3rd party support delivers new datatypes such as VRML, MIDI, MPEG, and more from leading developers such as Iterated Systems, LivePicture, LiveUpdate, Macromedia, OZ, and P7.|
|Standard datatypes||A wide range of standard media types include AVI, WAV, ASF, VIVO, MPEG, JPEG, AU, AIFF|
From the paper described at the start of this section (Jane Hunter, Varuni Witana, Mark Antoniades)
Progressive Networks has recently launched RealVideo, the streaming video version of their well-known RealAudio product. Both server and client versions have been released. In addition Progressive Networks have released a range of video-oriented content development tools, some their own, others developed by third parties. Users need to install the RealServer 4.0 and the RealPlayer Plus 4.0. It uses the RTSP protocol on top of UDP. Users apparently have a choice of either fixed or optimized frame rate encoding in the new RealVideo encoder.
Users choose between a number of pre-defined encoding templates which correspond to the most appropriate audio and video formats for a given bandwidth. "Stream thinning" detects poor or congested Internet connections and will dynamically adjust the video frame rate in real-time. This is presumably frame dropping. "Smart networking" automatically delivers audio and video streams via the most efficient network protocol. This is presumably choosing between TCP, UDP or UDP multicast. The choice of TCP would be to deal with firewall restrictions blocking UDP.
Progressive Networks have recently licensed ClearVideo, a fractal-based video compression technology from Iterated Systems (see http://www.iterated.com), to complement their internally-developed compression methods. RealVideo 1.0 provides two codecs: RealVideo Standard (developed by Progressive Networks) and RealVideo Fractal (using ClearVideo technology from Iterated Systems, Inc.).
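If stream thinning is indeed frame dropping, the arithmetic might look like the sketch below. This is a hypothetical illustration, not RealNetworks' algorithm: it assumes the audio is always protected, that every video frame costs roughly the same number of bits, and that the server knows the measured bandwidth. All names and parameters are invented for the example.

```python
def thinned_frame_rate(full_fps, stream_kbps, audio_kbps, measured_kbps):
    """Scale the video frame rate so that audio plus video fits the link.

    Hypothetical model: video bits are spread evenly across frames, so
    dropping a fraction of frames saves the same fraction of video bits.
    """
    video_kbps = stream_kbps - audio_kbps       # bits needed at full frame rate
    available_kbps = measured_kbps - audio_kbps  # what is left after audio
    if available_kbps >= video_kbps:
        return full_fps                          # link is healthy: send every frame
    if available_kbps <= 0:
        return 0.0                               # audio only
    return full_fps * available_kbps / video_kbps


# A 28 kbit/s stream (8 audio + 20 video) at 15 frames per second:
for measured in (28.0, 18.0, 8.0):
    fps = thinned_frame_rate(15.0, 28.0, 8.0, measured)
    print(f"{measured} kbit/s available -> {fps} fps")
```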
Microsoft ASF (Advanced Streaming Format)
Push v Pull Technology
Why broadcast every byte of every clip to every user? Why swamp the internet (or at least a web-site) with multiple downloads of the same material? Wasn't this the reason Caxton brought the printing press to England?
This was the rationale behind developing Push technology - streamed content that you "tune" into. By registering as a recipient for a data stream, you tap into the data-packets that are being sent anyway to other users. This is particularly significant when we consider the overhead in delivering video training material in an intranet - where bandwidth is most definitely finite. The following table, produced by Napier MSc student Murray McPherson, shows the overall data requirement for a number of scenarios of users, networks and "channels" of video content
|20||1||Local Ring Pull||20 x 64k||20 x 64k|
|20||1||Local Ring Push||1 x 64k||1 x 64k (+ push overhead)|
|20||1||Two Sites Star Pull||10 x 64k||20 x 64k|
|20||1||Two Sites Star Push||1 x 64k||1 x 64k (+ push overhead)|
|20||1||Many Sites Pull||2 x 64k||20 x 64k|
|20||1||Many Sites Push||1 x 64k||1 x 64k (+ push overhead)|
Now consider how this table would change if 5 or 20 separate programmes were broadcast.
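One way to explore that question is the sketch below. It assumes a simple model read off the table rows: with pull, the server sends one 64k stream per user, and with push, it sends one 64k stream per channel regardless of audience size. It also assumes each user watches exactly one programme and ignores the push overhead the table mentions. The function names are invented for the example.

```python
STREAM_KBPS = 64   # per-stream rate used in the table


def pull_load_kbps(users, stream_kbps=STREAM_KBPS):
    """Pull: every user gets a private copy of the stream they request."""
    return users * stream_kbps


def push_load_kbps(channels, stream_kbps=STREAM_KBPS):
    """Push: each channel is sent once and users tune in to it."""
    return channels * stream_kbps


USERS = 20
for channels in (1, 5, 20):
    pull = pull_load_kbps(USERS)
    push = push_load_kbps(channels)
    print(f"{USERS} users, {channels:2d} channels: "
          f"pull {pull}k, push {push}k (+ push overhead)")
```

Under these assumptions the advantage of push shrinks as the number of programmes approaches the number of users, and disappears once every user is watching something different.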
MPEG-4 is about the combination of existing compression approaches with structured synthesis of sound. MIDI is a primitive example of the latter, and speech synthesis a more complex one. New technologies such as 3D sound can be included as well. By combining these, audio communication can be meaningful even at very small bitrates. To this end, different types of compression are available to match the type of audio signal:
- a parametric codec for the lower bit rates in the range,
- a Code Excited Linear Predictive (CELP) codec for the medium bit rates in the range,
- Time to Frequency (TF) codecs, including MPEG-2 AAC and Vector-Quantiser based, for the higher bit rates in the range.
For more information on MPEG-4, see http://www.tnt.uni-hannover.de/project/mpeg/audio/faq/
Speed control, pitch change, error resilience
- shrinking the bitstream size in the decoder or in transmission
- eliminating parts of the bandwidth - removing less relevant frequencies
- varying complexity of the encoding
- varying complexity of the decoding
- avoiding, or concealing, problems due to transmission error
By taking realistic objects (the voice of a speaker, or an instrument) and, where appropriate, aggregating them into a group, e.g. a choir, an orchestra or the crowd at a (virtual) football match, economies of compression can be achieved, as well as greater control. More than a single audio channel can be associated with an object, as can movement, position and commands such as the conductor's baton, or stimuli such as a goal.
The following are in the mpeg-iso folder of lecture 6 in MMT1 teach directory
- MP3 Source code (mp3src.zip)
- AAC Source Code (aac.zip)
- International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio (W2006.zip)
Murray McPherson MSc Dissertation Napier University 1999
David Oxley MSc Dissertation Napier University 1998
Comparison of different a/v file formats: http://www.dstc.edu.au/RDU/staff/jane-hunter/video-streaming.html
VideoConferencing Protocols http://www.ietf.org/html.charters/mmusic-charter.html
HTTP-NG William C Janssen JR Xerox Palo Alto, IEEE Internet Computing Jan/Feb 1999
D Larner HTTP-NG Web Interfaces http://info.internet.isi.edu/in-drafts/files/draft-larner-nginterfaces-00.txt
Distributed Desktop Training in the Corporate Environment: http://www.xingtech.com/.