Image and Video Compression Techniques

Digital image and video compression is now essential. Internet teleconferencing, High Definition Television (HDTV), satellite communications and digital storage of movies would not be feasible without a high degree of compression. As it is, such applications are far from realizing their full potential, largely because of the limitations of common image compression techniques. Those limitations are inherent in the information theory on which they are based, published by Claude Shannon in 1948. The Shannon theory has led modern communications into a theoretical trap from which it is difficult to escape.

The Shannon theory defines “information” merely as binary digits (bits); data content is irrelevant. The bit rate in television is therefore determined entirely by the system's hardware parameters, such as image size, resolution and scanning rates. The images shown on the screen do not matter: a random noise image requires the same bit rate as a blank image.

Improving television with larger screens and better resolution requires a huge increase in transmission bit rates. The bit rates are, however, limited by the available broadcast spectrum or network connection. The only recourse is lossy image compression, most commonly JPEG, MPEG-2, Wavelets or Fractals. “Lossy” by name and lossy by nature: the more the image is compressed by such methods, the worse the image quality.

Autosophy information theory may offer an escape from the trap. Autosophy re-defines “information” as depending only on image content and motion. Hardware parameters such as image size, resolution and scanning rates become virtually irrelevant. Static images produce very low bit rates, while fast action sequences require higher bit rates.

Autosophy transmission is ideally suited to the new packet switching networks, such as Internet TCP/IP, ATM and the future Information Superhighway. Autosophy’s built-in encryption capabilities can also ensure the security of communications even via public networks. Self-learning multimedia databases and robot vision systems are areas of further potential.

 


Bit rates in digital television: Conventional vs. Autosophy

Bit rates and communication protocols in conventional digital television are determined entirely by the system hardware, such as image size, resolution and scanning rates. Images are formed by “pixels” in ordered rows and columns, where every pixel must be constantly re-scanned and re-transmitted. According to the CCIR-601 industry standard, digital television comparable to analog NTSC television contains 720 columns by 486 lines. Each pixel is represented by 2 bytes (5 bits per color = 32 brightness shades) and scanned at 29.97 frames per second. That requires a bit rate of about 168 Mb/s, or about 21 megabytes per second. A normal CD-ROM can store only about 30 seconds of such television. The bit rate is unaffected by whatever images are shown on the screen. Because the bit rate is constant, transmission is best suited to fixed bandwidth channels, such as the 6.75 MHz analog channel in commercial NTSC television. Transmitting the images via packet switching networks faces severe difficulties, including a need for huge compression ratios. But the more the images are compressed, the worse the image quality.
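As a quick check of these figures, the raw bit rate follows directly from the hardware parameters. The short Python sketch below simply multiplies out the CCIR-601 numbers quoted above; nothing else is assumed.

# Raw CCIR-601 bit rate, computed from the hardware parameters quoted above.
columns, lines = 720, 486            # pixels per frame
bits_per_pixel = 16                  # 2 bytes per pixel
frames_per_second = 29.97

bits_per_second = columns * lines * bits_per_pixel * frames_per_second
print(round(bits_per_second / 1e6), "Mb/s")        # ~168 Mb/s
print(round(bits_per_second / 8 / 1e6), "MB/s")    # ~21 megabytes per second
print(round(650 / (bits_per_second / 8 / 1e6)), "seconds of video on a 650 MB CD-ROM")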

Every increase in screen size, resolution or frame rates makes the problem worse and requires ever-greater compression ratios. Hence the rather poor image quality of so-called High Definition Television (HDTV) and especially of Internet teleconferencing and streaming video.

Required compression ratios for packet television via commercial channels

Channel          Bit rate         NTSC TV        HDTV           Film quality
                                  (168 Mb/s)     (933 Mb/s)     (2,300 Mb/s)
PC local LAN     30 kb/s          5,600:1        31,000:1       76,000:1
Modems           56 kb/s          3,000:1        17,000:1       41,000:1
ISDN             64 - 144 kb/s    1,166:1        6,400:1        16,000:1
T-1, DSL         1.5 Mb/s         112:1          622:1          1,500:1
Ethernet         10 Mb/s          17:1           93:1           230:1
T-3              42 Mb/s          4:1            22:1           54:1
Fiber optic      200 Mb/s         1:1            5:1            11:1

The table above shows some discouraging facts about the transmission of digital images via commercial networks. With the exception of costly fiber optic cables, each channel requires enormous compression ratios which cannot be achieved with acceptable image quality using conventional compression methods. Internet video is currently of very poor quality with extremely jerky motion. Even the HDTV standard approved by the Federal Communications Commission (FCC) produces rather blurred images with a jumpy flickering motion that is almost dizzying.

In Autosophy television, in contrast, the bit rate depends only on motion within the images. Screen size, resolution and scanning rates are virtually irrelevant. Motion is defined in increments of 1024 changed pixels per second (kp/s), which in normal television is approximately one square inch of changed screen per second. Change may be distributed over any part of the screen image. Motion is usually generated by large moving objects, which produce change at both their leading and trailing edges. High motion values are also generated by rapid panning of the camera. The human eye can perceive very fine color resolution, but only within static images; rapid motion reduces the perception of fine detail. The eye can perceive either fine color resolution or rapid movement, but not both at the same time. The “true information bandwidth” of the human eye can thus be defined by the Autosophy information theory.

Autosophy television channels for various average motion rates within the images

Average motion                    very slow     slow         normal       fast
                                  2 kp/s        4 kp/s       8 kp/s       16 kp/s
Channel          Bit rate         12 kb/s       24 kb/s      48 kb/s      96 kb/s
PC local LAN     30 kb/s          2.5           1            -            -
Modems           56 kb/s          4             2            1            -
ISDN             64 - 144 kb/s    12            6            3            1
T-1, DSL         1.5 Mb/s         125           62           31           15
Ethernet         10 Mb/s          833           416          208          104
T-3              42 Mb/s          3,500         1,750        875          437
Fiber optic      200 Mb/s         16,666        8,333        4,166        2,083

(Table entries give the approximate number of simultaneous television channels each connection could carry.)

Assuming a very large television screen with 2048 by 2048 (2k by 2k) pixels and 7 bit resolution per color, each kp/s (1024 changed pixels per second) generates a bit rate of about 6 kb/s (6000 bits per second). Motion within the images is usually not continuous; periods of slow motion are interspersed with periods of rapid motion. The figures above are for an average motion integrated over time. They would allow a PC-to-PC teleconferencing session via a normal PC local LAN, but only with slow motion within the images. T-1, Ethernet and fiber optic connections could carry hundreds or thousands of simultaneous teleconferencing sessions. The bit rate for each television transmission may be expressly limited by motion feedback, as explained later. The same methods apply to memory storage, allowing a full-length motion picture to be stored on a credit-card-sized CAROM module.
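The channel counts in the table can be reproduced from the 6 kb/s-per-kp/s figure just given. The sketch below (plain Python; the channel capacities are copied from the table) divides each channel's capacity by the bit rate each average motion level requires.

# Simultaneous Autosophy television channels per connection, assuming
# roughly 6 kb/s of transmission per kp/s of average motion (see text above).
KBPS_PER_KPS = 6
channel_capacity_kbps = {"PC local LAN": 30, "Modems": 56, "ISDN": 144,
                         "T-1, DSL": 1500, "Ethernet": 10000,
                         "T-3": 42000, "Fiber optic": 200000}
motion_kps = {"very slow": 2, "slow": 4, "normal": 8, "fast": 16}

for name, capacity in channel_capacity_kbps.items():
    counts = {m: capacity // (kps * KBPS_PER_KPS) for m, kps in motion_kps.items()}
    print(name, counts)   # e.g. T-1, DSL: {'very slow': 125, 'slow': 62, 'normal': 31, 'fast': 15}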

In addition to orders-of-magnitude image compression, Autosophy methods have other important advantages.

The transmission of conventional analog television is best accomplished using fixed bandwidth channels, such as the 6.75 MHz NTSC channels. Transmitting such television via the new packet switching networks (such as ATM or Internet TCP/IP) is very difficult and requires a rigidly defined Quality of Service (QoS). Autosophy television, in contrast, is ideal for the new packet switching networks because slowly moving images produce a slow packet rate, while rapidly moving images increase the packet rate. The network can then be shared by many users, each producing packet bursts only when motion occurs in their images. Autosophy television is also much less sensitive to transmission errors or packets being dropped in a congested network.

Because the bit rates in conventional television are determined by the hardware, each advancement in technology towards larger screens and better cameras requires a new transmission standard which may not be compatible with previous standards. For Autosophy television, in contrast, a hardware independent communication protocol could be developed. This would allow evolution towards larger and better screens without any change to communication protocols. Television cameras and monitors could have different screen sizes, resolutions and scanning rates; yet communicate in a universal protocol which would always remain backwards compatible.

Autosophy television's built-in encryption option allows secure teleconferencing via the Internet or satellite without any possibility of unauthorized interception. It would largely solve the security problems associated with the Internet today.

 


Cosine transform compression in the JPEG and MPEG-2 standards

Cosine transforms are used in JPEG compression for still images, MPEG-2 compression for moving video, and the FCC standard for HDTV. All use variations of the basic method explained below.

The basic idea was conceived by the French mathematician Fourier, after whom the Fourier transform is named. Fourier discovered that any repeating signal, such as a vibration or sound wave, can be converted from time samples into a set of frequency values, where each higher frequency is a whole multiple of the base frequency. The Fourier transform is implemented, for example, in test instruments and oscilloscopes to analyze vibrations and noisy transmission signals.

The cosine transform uses a similar method to convert an image pixel pattern into a set of spatial frequency values. Instead of describing changes of brightness over time, spatial transforms describe changes of brightness across an image area. The frequency values can be imagined as image brightness waves that change from light to dark in sine wave fashion. Low frequency values change brightness slowly, while high frequency values change more rapidly. Low frequency values describe flat, slowly changing image backgrounds; higher frequency values add sharp edges and crispness to the images. In short, the theory predicts that any pattern of pixel brightness samples in a television image can be converted into a pattern of spatial frequency values. The frequency values can later be used in a “reverse transform” to reproduce the original pixel brightness samples.

The input image is first cut into 8 by 8 pixel tiles where each color (red-green-blue) is represented by a separate tile. Each tile is then sequentially processed by a computer using the cosine transform algorithm.

Using a computationally demanding algorithm, each 64 pixel tile is converted into 64 frequency values. This transformation requires enormous computing power; converting HDTV images in real time is very difficult with today's hardware. A DC value represents the overall background brightness of the tile. Because each tile is processed separately from its neighbors, even slight errors in computing the DC value can produce a checkerboard pattern of distortion in the compressed images. The transform is theoretically lossless and as such should not distort the images; in real electronic systems, however, the computation is imperfect and produces only approximate values. Starting from the DC value, the other 63 frequency values are scanned out in a zigzag pattern, proceeding from the lowest frequency values to higher and higher frequencies.

Up to this point there is no image compression. The 64 frequency values, in fact, require more bits than the original 64 pixel samples, and distortions so far arise only from flaws in the computation process. But the image compression process now selectively removes the information deemed least important to the human eye. A quantization threshold is applied, chosen according to the desired compression ratio, and all values smaller than the threshold are cleared to zero. The higher the threshold, the higher the compression, but also the lower the image quality. Erasing the smaller frequency values removes detail resolution and introduces image artifacts, such as light or dark lines resembling cracked paint in old paintings. The result is an image tile that is only an approximation of the tile originally seen by the camera. Such distortions and artifacts are obviously not acceptable for scientific or medical imaging.
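The sketch below illustrates the two steps just described for a single 8 by 8 tile: a cosine transform followed by threshold quantization. It uses NumPy and a textbook DCT-II formula rather than any particular codec's optimized routine, and the threshold value is arbitrary.

import numpy as np

def dct_8x8(tile):
    # Orthonormal type-II discrete cosine transform of an 8x8 tile.
    n = tile.shape[0]
    k = np.arange(n)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis @ tile @ basis.T

tile = np.random.randint(0, 256, (8, 8)).astype(float)   # one 8x8 pixel tile from the camera
coeffs = dct_8x8(tile - 128.0)                            # 64 spatial-frequency values

threshold = 10.0                                          # arbitrary quantization threshold
quantized = np.where(np.abs(coeffs) < threshold, 0.0, coeffs)
print(int((quantized == 0).sum()), "of 64 frequency values cleared to zero")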

The 64 processed frequency values are then encoded for transmission. A run-length encoding scheme counts the number of zero values in a string of zeroes and represents the string as a single number. Huffman coding (covered in more detail in the data compression tutorial) compresses the data further by assigning codes with fewer bits to the most frequently encountered output patterns. The final output is a code for each tile containing a variable number of bits.
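A minimal sketch of the run-length step (Huffman coding itself is left to the data compression tutorial): runs of zero-valued coefficients are collapsed into (zero count, next value) pairs.

def run_length_encode(values):
    # Collapse each run of zeros into a (number_of_zeros, following_value) pair.
    pairs, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    if zeros:
        pairs.append((zeros, None))   # trailing zeros act as an end-of-block marker
    return pairs

print(run_length_encode([57, 0, 0, 0, -3, 0, 2, 0, 0, 0, 0, 0]))
# [(0, 57), (3, -3), (1, 2), (5, None)]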

Because of run-length and Huffman coding such transmissions are highly sensitive to error propagation. Even a single bit error in the transmission can cause the image to break up into random noise until an error recovery code is detected. This produces very disturbing visual effects in noisy transmissions.

The receiver reconstructs the images in reverse of the above process.

Image compression for live television requires higher compression ratios than cosine transforms alone can achieve. Far fewer frames than the 30 per second of conventional television can actually be transmitted, so the missing frames are simulated by “motion compensation” in the receiver, which requires enormous computation power. The actual cosine transform compressed images (I) are used to predict intermediate frames (P), which are in turn used to construct bi-directional frames (B) by interpolation. The images appearing on the output monitor are therefore mostly approximations or simulations created by a computer rather than the actual images seen by the camera. This produces jagged and blurry motion which can be so disturbing as to cause dizziness. It is especially noticeable in HDTV broadcasts because the flickering tends to be synchronized with the 10 Hz alpha waves in the brain.
 
Despite the billions of research and development dollars invested in this method, the results are far from ideal. Demonstrations typically show fuzzy objects, such as fog or hairy animals, that render image distortions harder to detect. Or they show very slow or very fast moving scenes that best hide the effects of motion compensation. Such fudging is undesirable even for entertainment purposes and quite unacceptable for scientific or medical imaging. It is time to re-evaluate the basic technology.
 


Wavelet image compression

Wavelet compression uses bandpass filters to separate an image into images with low or high spatial frequencies. Low frequency images are those in which brightness change is gradual, for example, flat or rounded background areas. Such images appear soft and blurry. Higher frequency band images are crisp and sharp edged. Adding the frequency band images back together should reconstruct the original input image; perfectly if the processing is perfect.

A pixel data stream from an input image is divided into several sub-bands by a tree of bandpass filters. Each filter allows only a specific band of frequencies to pass. The filters may be analog or digital, but since neither kind is perfect some image distortion can be expected even at this stage.

The process takes several steps backward before taking a step forward. We began with a single input image and now have several images, each of which requires a full measure of bits. However, since the low frequency images change brightness more slowly, they can be sampled at a slower rate. The sampling rates are adjusted so that the highest frequency image takes half of all samples, while each lower frequency band is sampled at a progressively halved rate, down to the lowest frequency image, which is sampled most slowly. In the end, the total number of samples from all the frequency bands is exactly the same as for a single sampling of the original input image. No image compression has yet been realized, yet some distortion will already have crept in through imperfections in the sampling process.

Lossy image compression is applied using a quantization threshold. Samples below the threshold are cleared to zero. The higher the threshold, the more samples cleared and the higher the compression ratio. Equally, though, the more samples cleared, the greater the image distortion and the lower the image quality. Output images are therefore only approximations of the images seen by the camera.
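The following sketch shows the idea on a single row of pixels, using a one-level Haar split (a low-pass average band and a high-pass difference band) in place of a full filter tree; the threshold is arbitrary. Reconstruction from the quantized bands gives only an approximation of the input, exactly as described above.

import numpy as np

def haar_split(row):
    # One filtering level: low-pass averages and high-pass differences,
    # each band holding half as many samples as the input row.
    pairs = row.reshape(-1, 2)
    low = (pairs[:, 0] + pairs[:, 1]) / 2.0    # smooth, slowly changing content
    high = (pairs[:, 0] - pairs[:, 1]) / 2.0   # edges and fine detail
    return low, high

row = np.array([100, 102, 101, 99, 180, 182, 40, 41], dtype=float)
low, high = haar_split(row)

threshold = 2.0                                # arbitrary quantization threshold
high_q = np.where(np.abs(high) < threshold, 0.0, high)

rebuilt = np.empty_like(row)                   # reconstruct from the quantized bands
rebuilt[0::2] = low + high_q
rebuilt[1::2] = low - high_q
print(rebuilt)                                 # an approximation of the original row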

Output samples are further processed using Run Length coding (replacing a string of zeroes with a single number) and Huffman coding (assigning shorter bit codes to more frequent patterns). The output codes are then combined in the output data stream. Such transmissions are subject to error propagation. Even a single bit error can cause the image to break up into random noise.

Wavelet compression is lossy. It will always compromise image quality to some extent. The more images are compressed, the worse the image quality. Commercially useful compression ratios can only be achieved with significant distortion. The pattern of distortion will, of course, differ from the “checkerboard” pattern arising from the cosine transforms. But whether the overall image quality is better or worse depends on the application and individual judgment. Certainly both methods require enormous computing resources and can generally only achieve low levels of compression with acceptable image quality. Higher levels of compression come with progressively greater distortion.

 


Fractal image compression and forging techniques

Forging techniques, such as fractal compression, generate images that look approximately like the originals; in some cases the human eye can be fooled into disregarding the differences. Typical demonstrations use flat images or chaotic ones such as bird plumage.

An example can be found in graphics software packages, such as MS PowerPoint. These generate large geometric shapes from simple vector equations. The geometric objects are then filled with a pattern or color. Images generated in this way can be reduced or enlarged without changing shapes, filling patterns, or color densities. Such images are said to be resolution and size invariant in that a large image contains the same information as a small image.

A closer simulation of reality is attempted with Mandelbrot fractal equations. Simple shapes are combined to form larger and larger shapes. The larger shapes are identical to the smaller ones. Higher magnification will reveal only smaller and smaller shapes that are identical to the original shape. A good example is a mountain landscape of peaks and valleys. Higher magnification reveals smaller and smaller peaks and valleys which look like the original landscape.

For fractal image compression a “reverse Mandelbrot” procedure is used. It matches an image tile to a Mandelbrot equation that approximately simulates its pattern. The equation can then be transmitted and will produce an output tile that looks approximately like the input tile. Higher and higher magnification would reveal smaller and smaller shapes identical to the larger ones.

Such pattern substitution requires enormous computation and is very difficult to achieve in real time. The bottom line, in any case, is that the output images are mere approximations of the input images. Such forgeries can fool some of the people some of the time and may be suitable for games and other entertainment purposes, but they have no place in scientific or medical imaging.

 


Conclusion

Hardware-defined television systems require excessive bit rates which cannot be accommodated by packet switching networks. Mathematical procedures, no matter how complex, cannot circumvent the basic truth that any attempt at image compression must be paid for with a deterioration of image quality. The Federal Communications Commission (FCC) tried to repeal that basic law when it imposed its High Definition Television standard. It remains to be seen whether the resulting quality and price will be acceptable to consumers, whose standards are rather higher than those of bureaucrats.

 


Lossless Autosophy still image compression

Autosophy image compression is different. It uses an approach based on Autosophy information theory in which bit rate is determined not by hardware factors but by image content. Essentially, simple images are highly compressible, complex images less compressible and random noise images not compressible at all. Being radically based on image content, Autosophy compression is entirely lossless. Images are not distorted.

The degree of compression is also influenced by the quality of the camera. Cheap cameras and noisy images are not as suitable as higher quality cameras with low-noise output. Noise level may be reduced by filters as long as care is taken not to remove any essential information.

Such a system is suitable for transmitting high resolution still images of any size or format via the Internet. The resolution is 7 bits per color, or 128 shades per color, which is better than 1% accuracy in reproduction. That is the limit of human perception and the maximum resolution of commercial color monitors and printers. The output consists of common 8 or 16 bit codes, easy to implement in storage and transmission. Compression ratios depend on the complexity of and noise in the images. Even random images would not produce any data expansion, while average lossless compression ratios would be about 5:1. Transmission errors may cause error propagation, but that should not be a problem in modern Internet communications because data packets contain error checking codes; packets with errors are automatically re-transmitted until valid data is received.

First the image is divided into 5 by 5 pixel tiles, with a center pixel address computed for each tile according to the image format. Each 5 by 5 pixel tile is converted into a 25 pixel string by spiral scanning outward from the tile's center pixel address. A hyperspace library contains up to 30k nodes of 22 bits each, consisting of a 7 bit pixel brightness value (GATE) and a 15 bit POINTER. The library can hold many thousands of the most common image patterns stored in a saturating hyperspace mode. (The serial tree library is further explained in the data compression tutorial.) Each of the first 128 library locations contains a GATE equal to its 7 least significant address bits and a POINTER of all zero bits. The last 2k addresses are reserved for special communications codes (such as error checking or image format codes) embedded in the output data stream.

STILL IMAGE ENCODING ALGORITHM

MATRIX = [ GATE | POINTER ]
Tile:                 Compute the center address of the next tile (if any).
                        Set PIXEL COUNTER = 0; Set POINTER = 0.
                        Move the brightness of the center pixel to the GATE.
Loop:               Search the library for a matching MATRIX.
                        If found then move the ADDRESS where it was found to the POINTER;
                                Increment the PIXEL COUNTER.
                                If the PIXEL COUNTER = 25 then:
                                        Output the POINTER (16 bit code); Goto Tile.
                                Else compute the next pixel location using the PIXEL COUNTER.
                                Move the next pixel brightness to the GATE; Goto Loop.
                        Else if not found then output the POINTER (8 or 16 bit code);
                                Set POINTER = 0; Goto Loop.
 
The image encoding routine converts each tile into one or more 8 or 16 bit codes. If the entire 25 pixel tile pattern is found in the library, it is represented by a single 16 bit code. If the tile is not found, it is divided into fragments, each of which is represented by an 8 or 16 bit code, until the entire 25 pixel tile is encoded. Simple image tiles are more likely to match tiles or fragments in the library and are therefore highly compressed; noisy or complex tiles are compressed to a lesser degree. But even totally random noise images will not require any more bytes than the original image. Achievable compression ratios therefore range from 1:1 for very noisy images to 12.5:1 for simple graphic images.
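The encoding loop can be rendered in Python as follows. This is only a toy sketch: the spiral scan, the growth of the library in saturating hyperspace mode and the packing into 8 or 16 bit codes are omitted, and the library below holds only the 128 seed nodes, so any real compression would require a pre-grown library with thousands of learned patterns.

def seed_library():
    # address -> (GATE, POINTER); the first 128 addresses hold single 7-bit pixels.
    nodes = {addr: (addr, 0) for addr in range(128)}
    index = {(gate, pointer): addr for addr, (gate, pointer) in nodes.items()}
    return nodes, index

def encode_tile(pixels, index):
    # Encode one 25-pixel tile string as a list of library addresses (codes).
    assert len(pixels) == 25
    codes, pointer = [], 0
    for value in pixels:
        gate = value & 0x7F                 # 7-bit brightness (GATE)
        match = index.get((gate, pointer))
        if match is None:                   # fragment not in library:
            codes.append(pointer)           #   output the POINTER matched so far
            pointer = index[(gate, 0)]      #   restart the search with this pixel alone
        else:
            pointer = match                 # keep extending the matched fragment
    codes.append(pointer)                   # output the code for the final fragment
    return codes

With a well-stocked library a whole tile collapses to a single code; with only the seed nodes each pixel produces its own code, which is the 1:1 worst case mentioned above.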
 
STILL IMAGE RETRIEVAL ALGORITHM
MATRIX = [ GATE | POINTER ]
Start:                 Set the PIXEL COUNTER = 0.
Next:                 Move the next input code to the POINTER.
                          If L=0 then: Push the input pixel into a BUFFER;
                                    Increment the PIXEL COUNTER;
                                    If the PIXEL COUNTER = 25 then: Goto Output.
                                    Else Goto Next.
Loop:                 Else use the POINTER as a library ADDRESS to fetch a new MATRIX.
                          Push the new GATE into a (First-In-Last-Out) FILO stack;
                          If the new POINTER = 0 then: Goto Pull.
                          Else Goto Loop.
Pull:                    If the FILO stack is empty then: Goto Next.
                          Else pull a brightness value from the FILO stack
                                    and push it into the BUFFER;
                           Increment the PIXEL COUNTER:
                           If the PIXEL COUNTER = 25 then: Goto Output.
                           Else Goto Pull.
Output:                Retrieve the 25 brightness values from the BUFFER to restore the tile;
                           Goto Start.
 
The retrieval algorithm restores the original tile from the transmitted 8 or 16 bit codes. Since the brightness values in each 16 bit code are retrieved in reverse order, a First-In-Last-Out (FILO) stack is required.
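A matching toy decoder, using the same node table as the encoder sketch above (the nodes dictionary standing in for the hyperspace library in receiver ROM):

def decode_tile(codes, nodes):
    # Rebuild the 25-pixel tile string from its codes. Each code is the address
    # of the last node in a chain; following the POINTERs yields the fragment's
    # pixels in reverse order, hence the stack (the FILO of the algorithm above).
    pixels = []
    for code in codes:
        stack, addr = [], code
        while True:
            gate, pointer = nodes[addr]
            stack.append(gate)
            if pointer == 0:
                break
            addr = pointer
        while stack:
            pixels.append(stack.pop())      # pull the fragment back into forward order
    return pixels

# Round trip with the toy seed library from the encoder sketch:
nodes, index = seed_library()
tile = [17, 20, 20, 19, 18] * 5
assert decode_tile(encode_tile(tile, index), nodes) == tile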
 
The image encoding and retrieval process may be achieved using software only. However, image encoding requires linear library searching which could require several seconds for each image. For real time image encoding a Content Addressable Memory (such as the CAROM) may be used. Near real time image retrieval is possible using a typical PC RAM. Images could therefore be compressed at relatively slow speeds for storage in a multimedia database or web server. Fast image retrieval by an internet user would require only a software plug-in. Hardware chipsets could also be developed for both real time image encoding and retrieval.
 
Autosophy image compression requires a hyperspace knowledge library generated prior to transmission. Commercial CD-ROM images may be used as input. A software package could generate the hyperspace library in about 30 minutes, even without any special hardware. This could be a one-time operation: for example, a universal hyperspace library could be grown as part of an Internet communications standard, and standard communication software would then enable compressed transmission of any type of image via the Internet. For specialized transmissions (such as X-ray or weather map images) a special library may be grown. Special libraries may also be grown for encryption purposes; if the library is distributed only to authorized users, transmissions are secure against everyone who does not share that library.
 


Autosophy live video compression

Bit rates are dramatically reduced in Autosophy television because they depend not on the system hardware but on the motion and complexity of the images shown on the screen. Autosophy video compression is especially suited to packet switching networks such as Internet TCP/IP or ATM.

According to Autosophy information theory, a communication need only transmit that which is not already known to the receiver. Everything already known is redundant and need not be constantly re-transmitted. Autosophy television therefore requires an Image Buffer in both the transmitter and receiver, which contains the entire current image. Images scanned from the television camera are compared with the current image in the Image Buffer to locate pixels that have changed brightness. The screen addresses of the changed pixels are accumulated in a Change Buffer. The new pixel brightness from the camera replaces the previous pixel brightness in the Image Buffer. Every pixel that has not changed is ignored. The changed pixel addresses in the Change Buffer are then combined, using a hyperspace library, into “superpixel” or cluster codes for transmission. The superpixel codes are used by the receiver to selectively update small clusters of pixels in its own Image Buffer. The image in the Image Buffer is periodically scanned to the output monitor.
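A minimal sketch of the buffer comparison just described, with NumPy arrays standing in for the Image Buffer and a list of addresses for the Change Buffer (the grouping of changed addresses into superpixel codes through the hyperspace library is not shown):

import numpy as np

def update_buffers(camera_frame, image_buffer, threshold=0):
    # Compare the new camera frame against the Image Buffer, collect the screen
    # addresses of changed pixels, and overwrite the buffer at those addresses.
    changed = np.abs(camera_frame.astype(int) - image_buffer.astype(int)) > threshold
    change_buffer = np.argwhere(changed)            # (row, column) addresses of changed pixels
    image_buffer[changed] = camera_frame[changed]   # unchanged pixels are simply ignored
    return change_buffer

image_buffer = np.zeros((480, 640), dtype=np.uint8)   # current image, held by sender and receiver
frame = image_buffer.copy()
frame[100:105, 200:205] = 50                           # a small object appears
print(len(update_buffers(frame, image_buffer)))        # 25 changed pixel addresses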

Superpixel or cluster codes are transmitted only when change or motion occurs in the input images. If the images change slowly then only a few superpixel codes are transmitted. Fast-moving action sequences generate many more superpixel transmissions. Random noise images generate excessive transmissions unless motion feedback (as explained later) is used.

Assuming an HDTV-like image with up to 2k by 2k pixels and 7 bits of resolution per color, each superpixel or cluster code would contain 70 bits. Each superpixel code may describe a change of between 2 and 25 pixels in each of the three colors (red-green-blue). Image scanning rates are irrelevant.

Autosophy information theory shows that the human eye has a limited “true information bandwidth.” It can perceive very fine color resolution only in static images; rapid movement reduces color sensitivity. In other words, it can perceive fine color resolution or rapid motion, but not both at the same time. This can be exploited in a motion feedback circuit. The Change Buffer holds the addresses of the pixels that changed brightness in previous frames, so the number of entries is a measure of motion. That number is fed back to the pixel brightness comparator, which applies a discrimination threshold: any pixel brightness change below the threshold is ignored.

The more motion in the images, the higher the discrimination threshold. For slowly moving images the threshold is very low, filtering out only random noise from the camera. More rapid motion dynamically increases the threshold, so that even totally random images are cut to an acceptable bit rate. The packet rate is thereby limited even for very high motion or random noise images, without causing any distortion visible to the human eye. Only the most rapidly moving objects in the images will have temporarily reduced color and motion resolution. Static portions of the images are not affected, and once the extremely rapid motion subsides, normal color and motion resolution is fully restored.
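The feedback rule can be sketched as a function that raises the comparator threshold in proportion to how far the previous frame's change count exceeded a target rate; all of the constants here are illustrative, not taken from the article.

def next_threshold(changed_pixels_last_frame, target=8192,
                   base_threshold=2, max_threshold=32):
    # Raise the brightness-discrimination threshold when the Change Buffer filled
    # beyond the target rate in the previous frame; otherwise keep the low,
    # noise-filtering minimum. Constants are illustrative only.
    if changed_pixels_last_frame <= target:
        return base_threshold
    scale = changed_pixels_last_frame / target
    return min(max_threshold, int(base_threshold * scale))

print(next_threshold(1000))     # 2  - slow motion, threshold stays at the noise floor
print(next_threshold(40000))    # 9  - heavy motion raises the threshold, throttling packets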

BIT RATE ESTIMATES

Average number of pixels per cluster code                          12 (estimate)
Number of cluster codes per square inch of change per second       85
Number of bits in each cluster code (see above)                    70
Bit rate for each 1024 pixel change per second                     5,950 bits/s
 
The above estimate shows that about 6000 bits per second are required to transmit a motion of 1024 changed pixels per second. A 56k bit/s modem could therefore transmit approximately 10 square inches of change per second to an NTSC television monitor. Real time teleconferencing is therefore possible using normal modems and monitors, with no image or motion distortion other than the removal of random noise from the camera images.
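The arithmetic behind the estimate, spelled out (numbers copied from the table above):

pixels_per_second = 1024        # one kp/s of motion
pixels_per_cluster = 12         # estimated average
bits_per_cluster = 70

cluster_codes = pixels_per_second / pixels_per_cluster   # about 85 cluster codes per second
bit_rate = cluster_codes * bits_per_cluster              # about 5,950 bits per second
print(round(cluster_codes), round(bit_rate))
print(f"{56000 / bit_rate:.1f} square inches of change per second over a 56k modem")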
 

An Autosophy television system can be built with ordinary hardware. Any large memory chips will do for the Image Buffers and Change Buffers. For real time encoding, however, the transmitter requires a CAM (Content Addressable Memory). Commercially available CAMs are acceptable; even better would be the Autosophy-native CAROM. For real-time playback the receiver requires only a normal Read Only Memory, which can be mass produced as a chip containing the hyperspace library. The rest of the hardware consists of run-of-the-mill integrated logic circuits. Note that there is no need for a microprocessor or program storage. The output packet codes can be sent directly to the receiver or stored on CD-ROM or DVD for later playback. A complete television encoder/receiver may eventually be contained in integrated chipsets.

 

Comparing the features of conventional and Autosophy television

Image standards and compatibility problems

In conventional television, bandwidth and transmission protocols are determined by the system hardware, so every new development in camera or monitor technology requires a new standard. Old recordings are difficult to convert to a new standard and may be effectively lost. In Autosophy television, in contrast, transmission is determined by image content, which can be rendered entirely independent of the system hardware. This would address standards and compatibility issues once and for all. Free evolution would be possible from today's small-screen, low-resolution images to future giant-screen, high-resolution images, with a communications standard that always remains backwards compatible. Even old recordings could thus be viewed on future televisions. Each camera or monitor can have its own image size, resolution and scanning rates and yet communicate in a standard protocol; there is no need for the transmitter and receiver to have identical numbers of rows, columns, brightness resolution, colors, or scanning rates. Images may be stored in high resolution and displayed at lower resolution.
 
Image quality after compression
Lossless image compression is theoretically impossible in conventional television, so one must resort to lossy compression, as in JPEG, MPEG-2, Wavelets and Fractals. Image quality is then determined by the available bandwidth, which dictates the required compression ratio: the more the images are compressed, the worse the image distortion. In contrast, Autosophy television provides essentially “lossless” compression. Every frame is complete and can be frozen or printed out, while video can be displayed in fast forward or retrieved via indexed image searches. Slow motion will not reveal any artifacts or distortions. Indeed, very high image quality is recommended for Autosophy television. Because random noise in the images is interpreted as “movement,” any increase in noise increases the transmissions. The cleaner and less noisy the images, the lower the average transmission bandwidth. It is therefore cheaper to transmit high quality images than noisy low quality ones.
 
Transmission bandwidth in packet switching networks
Conventional television was originally designed for fixed bandwidth analog channels, such as the 6.75 MHz NTSC channels in commercial television. The bandwidth is determined by the hardware parameters, such as image size and scanning rates. Transmitting video via packet switching networks is very difficult and requires expensive high priority channels. But in Autosophy television the transmission rate is determined solely by the movement within the images. Slow moving images require few transmissions, while fast action sequences generate more rapid transmissions. Maximum transmission rates are determined by human perception, which allows for only limited complexity and rapidity of movement. Because Autosophy transmissions occur in bursts, packet switching networks are the ideal transmission medium.
 
Effects of latency and transmission errors
Errors are almost inevitable in any communication. Satellite transmission errors are introduced, for example, by thunderstorms or solar flares. Internet communication packets are sometimes delayed or dropped due to network congestion. Virtual network packets may be transmitted via different routes and arrive at the receiver with unpredictable delays and in unpredictable order. But conventional television data must arrive at the receiver with predictable delays and in predictable order. That requires expensive high priority channels with a defined Quality of Service (QoS). If data packets are dropped from the network or arrive with errors then special hardware must be used to re-synchronize the transmissions or risk highly disturbing visual effects. Re-transmitting defective packets is usually not possible. The effect of transmission errors is especially severe in compressed video such as MPEG-2. Any disruption or error in the data stream causes the image to break up into random noise.
 
In Autosophy television, in contrast, only change or movement within the images is selected for transmission. The order in which the superpixel codes arrive at the receiver is irrelevant. Updating a cluster of changing pixels at any location on the screen can be done in any order and need not be from left to right or top to bottom. Any packet of superpixels arriving during a frame interval (usually 1/30 of a second) will be included in the next frame scanned from the image buffer to the monitor. There are three options for handling defective packets. If they are re-transmitted during teleconferencing, a small delay in updating a changing spot on the screen may not be visible to the human observer. If defective packets are simply discarded, then a freezing of motion will occur in small spots on the screen. If no error control is used at all, then strange patterns may appear in random spots on the screen. The error effects can be further limited by periodically refreshing the entire image whether or not any change occurs. Packets dropped from the network will likewise cause only a freezing of motion in tiny spots on the screen. In most cases transmission errors will not cause distortions visible to the human eye. If a transmitter is destroyed or switched off then the last image seen by the camera will remain frozen on the screen.
 
Transmission security, encryption, and privacy
Shannon-type television transmissions via satellite or public networks can be secured only with separate encryption hardware or keywords. In Autosophy television the hyperspace pattern library itself provides a virtually unbreakable encryption option. If the library is kept secret and provided only to authorized receivers, the transmitted superpixel codes represent a virtually unbreakable code. Only receivers in possession of the correct library can decode the transmissions. The library is generated by a software package using normal images from a camera or CD-ROM and may be kept on a floppy disc or credit card sized PCMCIA module. Libraries can be changed regularly to protect against theft. Generic libraries for open communications may, of course, be included in a software package or embedded in hardware Read Only Memories (ROM). For positive identification of a particular sender, each transmitter could have a different library.
 
Encoding speed, power, and system cost
Because cosine transforms and motion compensation in MPEG-2 require enormous computation speed, only small images can be encoded at low frame rates. Encoding and compressing HDTV images for real time broadcast requires computer speeds beyond the present state of technology. Autosophy television, by contrast, requires no conventional computing, and the encoding of images of any size at up to 1000 frames per second should be possible. Real-time Autosophy television requires cheap Content Addressable Memories (CAM), which are now widely available. Eventually the entire system could be mounted on a single chip requiring very low power.
 
Motion sensing
Autosophy television includes an automatic motion detection capability. Since it transmits only the moving portions of video images, it can be used to sense and emphasize motion or change in an image stream. For surveillance applications, especially, that which has changed or moved is more important than that which has stayed the same.