Audio Coding

dg0418

Original link: http://www.cnblogs.com/gaozehua/archive/2012/04/03/2431449.html

http://www.ece.umassd.edu/Faculty/acosta/ICASSP/ICASSP_1996/html/ic96s212.htm

Audio Coding

Chair: Marina Bosi, Dolby Labs

A bi-dimensional coding scheme applied to audio bitrate reduction

Authors:

Laurent Mainard, CCETT (France)
Michel Lever, CCETT (France)

Volume 2, Page 1017

Abstract:

In this paper we present an audio bidimensional encoding scheme. Taking advantage of a new complex filterbank, and of a regular lattice associated with a new hexagonal projection kernel, this scheme provides each step of the encoder and of the decoder with fast algorithms, which keeps the overall complexity low. Moreover, variable- or fixed-length encodings are available without a look-up table. Results show very good quality at 80 kbit/s for monophonic signals, and a significant improvement with respect to normalized algorithms of similar complexity.

Acrobat PDF file of scanned paper: ic961017.pdf

Acrobat PDF file of original paper: ic961017.pdf

Audio Coding with a Dynamic Wavelet Packet Decomposition Based on Frequency-Varying Modulated Lapped Transforms

Authors:

Marcus Purat, Technical University of Berlin (Germany)
Peter Noll, Technical University of Berlin (Germany)

Volume 2, Page 1021

Abstract:

Optimum time-frequency decompositions are very useful in audio coding applications, because the signal energy can be maximally concentrated even for the wide variety of audio signal characteristics. Moreover, this signal representation is particularly well suited for a perceptual weighting of the quantization noise. The well-known tree structure of cascaded 2-channel filterbanks allows a very flexible optimization, leading to a signal-adaptive, dynamic wavelet packet decomposition. A major drawback of this technique is its strong spectral side lobes, which produce clearly audible aliasing in perceptual coders. In this paper we present a new dynamic wavelet packet decomposition, based on modulated lapped transforms, which allows the same flexibility while avoiding the disadvantage mentioned above. We propose a scheme for low bit rate audio coding that efficiently exploits the high energy concentration. This new codec yields excellent audio quality at about 55 kb/s for monophonic signals.
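
The signal-adaptive tree of cascaded 2-channel filterbanks described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the Haar filter pair rather than the paper's modulated lapped transforms, and the l1-norm cost and the names haar_split, cost and best_basis are hypothetical choices, not taken from the paper.

```python
import math

def haar_split(x):
    """One orthonormal 2-channel Haar analysis step: returns (lowpass, highpass)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high

def cost(band):
    """Assumed sparsity cost: l1 norm (smaller means more concentrated energy)."""
    return sum(abs(v) for v in band)

def best_basis(x, max_depth):
    """Return (leaf_bands, total_cost); a node is split only if that lowers the cost."""
    if max_depth == 0 or len(x) < 2 or len(x) % 2:
        return [x], cost(x)
    low, high = haar_split(x)
    low_leaves, low_cost = best_basis(low, max_depth - 1)
    high_leaves, high_cost = best_basis(high, max_depth - 1)
    if low_cost + high_cost < cost(x):
        return low_leaves + high_leaves, low_cost + high_cost
    return [x], cost(x)

# A constant (tonal-like) signal is split down to a single non-zero coefficient,
# while an isolated impulse is left unsplit in the time domain.
steady_leaves, steady_cost = best_basis([1.0] * 8, 3)
impulse_leaves, impulse_cost = best_basis([1.0] + [0.0] * 7, 3)
```

Because each Haar step is orthonormal, the leaves always carry the same total energy as the input; only how that energy is distributed over the coefficients changes with the chosen tree.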

Acrobat PDF file of scanned paper: ic961021.pdf

Acrobat PDF file of original paper: ic961021.pdf

Sound files associated with this paper.

0479_a.wav Piano signal prior to encoding-decoding
0479_c.wav Male speech signal prior to encoding-decoding
0479_e.wav Triangle signal prior to encoding-decoding
0479_b.wav Piano signal following encoding-decoding (54 kb/s)
0479_d.wav Male speech signal following encoding-decoding (64 kb/s)
0479_f.wav Triangle signal following encoding-decoding (64 kb/s)

A Test of MPEG Using Time-inverted Spoken Audio

Authors:

Thomas McLaughlin, Library of Congress (U.S.A.)
John Cookson, Library of Congress (U.S.A.)
Lloyd Rasmussen, Library of Congress (U.S.A.)

Volume 2, Page 1025

Abstract:

We excerpted a 20-second sample from a DAT-mastered talking book segment and coded it at 32 and 48 kbit/sec using MPEG I, layer 3. We also coded the same segment at 80 kbit/sec using MPEG I, layer 2. We then coded a time-inverted version of the material in the same way. After decoding, we put the inverted segments back into normal sequence and compared them with the corresponding segments coded in normal temporal order. We did the comparison by means of an ABX test with volunteer listeners. Naive listeners were unable to reliably distinguish between material coded in normal temporal order and the same material coded in inverted order. Trained listeners could reliably make the distinction in layer 3 at 32 and 48 kbit/sec but not in layer 2 at 80 kbit/sec.
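
The procedure above has two mechanical parts that are easy to sketch: coding a time-inverted signal and restoring its order, and judging whether an ABX score is better than guessing. The following is a minimal sketch under assumptions: codec stands for any encode-then-decode callable (none is provided here), the one-sided binomial test is a common way to score ABX results but is not stated in the abstract, and the 12-of-16 score is a made-up example, not a result from the paper.

```python
from math import comb

def code_time_inverted(samples, codec):
    """Reverse the signal, run it through an encode+decode callable, reverse it back."""
    return codec(samples[::-1])[::-1]

def abx_p_value(correct, trials):
    """One-sided probability of at least `correct` hits in `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical listener score: 12 correct answers in 16 ABX trials.
p = abx_p_value(12, 16)
reliable = p < 0.05  # assumed significance threshold
```

With a lossless (identity) codec, code_time_inverted returns the input unchanged, so any audible difference between the two versions comes from how the codec treats the reversed temporal structure.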

Acrobat PDF file of scanned paper: ic961025.pdf

Extension and Complexity Reduction of TwinVQ Audio Coder

Authors:

Takehiro Moriya, NTT Human Interface Laboratories (Japan)
Naoki Iwakami, NTT Human Interface Laboratories (Japan)
Kazunaga Ikeda, NTT Human Interface Laboratories (Japan)
Satoshi Miki, NTT Human Interface Laboratories (Japan)

Volume 2, Page 1029

Abstract:

This paper proposes two novel techniques for the TwinVQ (Transform domain Weighted Interleave VQ) high-quality audio coding scheme at rates lower than 64 kbit/s. One is an extension of the weighted interleave technique to the time and input channel domains as well as the frequency domain. The other is an efficient representation scheme of the spectral envelope by means of an interpolated square root LPC (Linear Predictive Coding) spectrum.
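
The general idea of weighted interleave vector quantization can be illustrated as below. This is a rough sketch only: the stride-based interleaving rule, the weighted squared-error distance, and the names interleave, deinterleave and weighted_nearest are assumptions made for illustration, and the abstract does not give the coder's actual interleaving pattern, weights or codebooks.

```python
def interleave(coeffs, num_subvectors):
    """Spread coefficients over sub-vectors: sub-vector j gets indices j, j+M, j+2M, ..."""
    return [coeffs[j::num_subvectors] for j in range(num_subvectors)]

def deinterleave(subvectors):
    """Inverse of interleave(): put every element back at its original index."""
    m = len(subvectors)
    out = [None] * sum(len(s) for s in subvectors)
    for j, sub in enumerate(subvectors):
        out[j::m] = sub
    return out

def weighted_nearest(vector, weights, codebook):
    """Index of the codeword with the smallest weighted squared error."""
    def dist(code):
        return sum(w * (v - c) ** 2 for v, c, w in zip(vector, code, weights))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

subs = interleave(list(range(8)), 3)       # [[0, 3, 6], [1, 4, 7], [2, 5]]
toy_codebook = [[0.0, 0.0], [1.0, 0.1]]    # hypothetical two-entry codebook
idx = weighted_nearest([1.0, 0.0], [1.0, 1.0], toy_codebook)
```

Interleaving gives each sub-vector samples from across the whole spectrum, and the weights let perceptually important components dominate the codeword search.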

Acrobat PDF file of scanned paper: ic961029.pdf

Acrobat PDF file of original paper: ic961029.pdf

Minimising the Effects of Subband Quantisation of the Time Domain Aliasing Cancellation Filter Bank

Authors:

Conrad Jakob, Royal Melbourne Institute of Technology (Australia)
Alan Bradley, Royal Melbourne Institute of Technology (Australia)

Volume 2, Page 1033

Abstract:

The effect of the quantisation of filter bank subbands has been analysed by incorporating quantisation noise models into the Time Domain Aliasing Cancellation (TDAC) filter bank. We have found expressions for the reconstruction error of the quantised TDAC system in terms of several signal correlated components, and an uncorrelated component. These expressions allow easy identification of subjectively annoying errors, and provide the framework for a subjective optimisation of the quantisation process. Research has been carried out on alternative quantiser models and methods of quantiser-compensation.
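
To make the setting concrete, here is a small sketch of a TDAC (MDCT) analysis/synthesis chain with an optional uniform quantiser on the subband coefficients. It is an illustration under assumptions, not the authors' analysis: the sine window, the block size, the direct O(N^2) transform and the uniform quantiser with step size step are all choices made for this example.

```python
import math

def sine_window(N):
    """Window of length 2N satisfying w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, w):
    """Windowed forward MDCT: 2N time samples -> N coefficients."""
    N = len(block) // 2
    return [sum(w[n] * block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coeffs, w):
    """Windowed inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    return [w[n] * (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                                   for k in range(N)) for n in range(2 * N)]

def tdac_roundtrip(x, N, step=0.0):
    """Analyse and resynthesise x (length a multiple of N) with hop size N.

    If step > 0, each coefficient is rounded to a multiple of step first.
    """
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N  # every sample is covered by two blocks
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        coeffs = mdct(padded[start:start + 2 * N], w)
        if step > 0:
            coeffs = [step * round(c / step) for c in coeffs]
        for n, v in enumerate(imdct(coeffs, w)):
            out[start + n] += v  # overlap-add cancels the time-domain aliasing
    return out[N:N + len(x)]

signal = [math.sin(0.3 * n) for n in range(32)]
clean_error = max(abs(a - b) for a, b in zip(signal, tdac_roundtrip(signal, 8)))
quant_error = max(abs(a - b) for a, b in zip(signal, tdac_roundtrip(signal, 8, step=0.5)))
```

Without quantisation the overlap-add reconstructs the input up to rounding error; once the coefficients are quantised, the aliasing terms of adjacent blocks no longer cancel exactly, which is one source of the reconstruction error the abstract analyses.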

Acrobat PDF file of scanned paper: ic961033.pdf

Speech Analysis and Coding Using a Multi-Resolution Sinusoidal Transform

Authors:

David V. Anderson, Georgia Institute of Technology (U.S.A.)

Volume 2, Page 1037

Abstract:

The sinusoidal transform, as developed by Quatieri and McAulay, provides a sparse representation for speech signals by taking advantage of psychoacoustic masking. The currently reported work takes the sinusoidal transform one step further by considering the frequency resolution abilities of the human auditory system in more detail. The new transform is based on the wavelet principle of variable resolution in time/frequency analysis. Specifically, a sinusoidal transform is developed which uses quadrature mirror filter (QMF) banks to obtain better time resolution at high frequencies and better frequency resolution at low frequencies. This naturally provides a perceptually improved allocation of the sinusoids. The new transform matches the human auditory system better than its predecessor and it also matches speech signals well, both fricative sounds and voiced speech. The QMF-based sinusoidal transform (ST) is then shown to be equivalent to a more efficient FFT-based implementation.

Acrobat PDF file of scanned paper: ic961037.pdf

Acrobat PDF file of original paper: ic961037.pdf

Sound files associated with this paper.

0809_a.wav Unprocessed speech
0809_b.wav Processed speech with 60 msec window, 4 bands, limit of 8 peaks per band
0809_c.wav Processed speech with 40 msec window, 4 bands, limit of 12 peaks per band
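
The per-band limit on spectral peaks mentioned in the sound file descriptions can be illustrated with a simple peak picker. This is a hypothetical sketch: the function name pick_peaks, the local-maximum rule and the toy magnitude values are assumptions for illustration, not the paper's analysis procedure.

```python
def pick_peaks(magnitudes, max_peaks):
    """Indices of up to max_peaks largest local maxima, returned in ascending order."""
    peaks = [i for i in range(1, len(magnitudes) - 1)
             if magnitudes[i - 1] < magnitudes[i] >= magnitudes[i + 1]]
    peaks.sort(key=lambda i: magnitudes[i], reverse=True)
    return sorted(peaks[:max_peaks])

# Toy magnitude spectrum of one band, limited to its two strongest peaks.
band = [0.0, 1.0, 0.0, 3.0, 0.0, 2.0, 0.0]
kept = pick_peaks(band, 2)
```

Applying such a limit separately in each band is one way to spread the retained sinusoids across the spectrum instead of letting the strongest region take them all.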

Audio coding using the wavelet packet transform and a combined scalar-vector quantization

Authors:

Simon Boland, Queensland University of Technology (Australia)
Mohamed Deriche, Queensland University of Technology (Australia)

Volume 2, Page 1041

Abstract:

This paper investigates a hybrid scalar-vector quantization scheme for coding high quality audio signals. A Wavelet Packet Transform (WPT) is used to decompose the audio signal into frequency bands slightly finer than the critical band divisions. A masking model computation is then used as input to the hybrid quantization scheme, where scalar quantization is used for coding the subbands from 0-5.5 kHz, and vector quantization is used for coding the subbands from 5.5-22 kHz. The performance of the proposed coder is assessed from Segmental Signal-to-Noise Ratios (SNR) and the perceived quality for a number of signals. The perceived quality is determined from informal comparisons between the uncoded signals at the original bitrate of 705 kb/s, and the same signals coded with (1) the proposed coder at 80 kb/s, (2) a coder using only scalar quantization at both 128 kb/s and 96 kb/s, and (3) the MPEG layer III coder at 64 kb/s. The comparisons indicate that very good coder quality is possible with the proposed coder at bitrates of approximately 80 kb/s. This represents a saving of about 16 kb/s over full scalar quantization with a similar quality. Further bitrate reduction with the proposed coder is possible by entropy coding of the scalar quantized transform coefficients and the VQ indices.
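
The bitrates quoted above can be checked with a few lines of arithmetic. The sketch assumes monophonic 16-bit PCM sampled at 44.1 kHz, which is consistent with the 705 kb/s figure and the 22 kHz upper band edge but is not stated explicitly in the abstract; the function name pcm_bitrate_kbps is hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM audio in kb/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

original = pcm_bitrate_kbps(44100, 16)      # assumed source format: 44.1 kHz, 16 bit, mono
compression_ratio = original / 80.0         # proposed coder at 80 kb/s, roughly 8.8 to 1
bits_per_sample = 80000 / 44100             # roughly 1.81 bits per sample at 80 kb/s
saving_kbps = 96 - 80                       # versus scalar-only quantization at 96 kb/s
```

Under these assumptions the uncompressed rate comes out at 705.6 kb/s, and the 16 kb/s saving corresponds to the gap between the scalar-only coder at 96 kb/s and the hybrid coder at 80 kb/s.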

Acrobat PDF file of scanned paper: ic961041.pdf

Low Bit Rate High Quality Audio Coding with Combined Harmonic and Wavelet Representations

Authors:

Khaled N. Hamdy, University of Minnesota (U.S.A.)
Murtaza Ali, University of Minnesota (U.S.A.)
Ahmed H. Tewfik, University of Minnesota (U.S.A.)

Volume 2, Page 1045

Abstract:

In this paper, we describe a novel high quality audio coding method using adaptive signal representation, based on sinusoidal and wavelet analysis of signals. First, we perform a harmonic analysis of the signal to remove strong periodic structures or tones from the signal. Then we carry out a wavelet analysis, which is useful in tracking the transients of the signal. These transients are then removed from the wavelet coefficients. The remaining coefficients have a broadband noise-like structure. Since this method separates out tones (sinusoids), transients, and broadband noise, we may use tonal, noise, and temporal masking information to individually encode the tones and the wavelet coefficients. Our experiments suggest that this method yields a nominal bit rate of 1 bit/sample for high quality audio compression.

Acrobat PDF file of scanned paper: ic961045.pdf

Acrobat PDF file of original paper: ic961045.pdf

A High Performance Software Implementation Of MPEG Audio Encoder

Authors:

Manoj Kumar, IBM T.J. Watson Research Center (U.S.A.)
Mohammad Zubair, IBM T.J. Watson Research Center (U.S.A.)

Volume 2, Page 1049

Abstract:

MPEG/Audio is a standard for both transmitting and recording compressed audio. The MPEG algorithm achieves compression by exploiting the perceptual limitations of the human ear. The standard defines the decoding process and also the syntax of the coded bitstream. However, there is room for different implementations to generate the compressed bitstream. In this paper we propose a high performance software implementation of the MPEG/Audio encoder. We obtained more than a factor of five improvement over a straightforward implementation on the IBM PowerPC, Model 250.

Acrobat PDF file of scanned paper: ic961049.pdf

Acrobat PDF file of original paper: ic961049.pdf

Audio Compression At Low Bit Rates Using A Signal Adaptive Switched Filterbank

Authors:

Deepen Sinha, AT&T Bell Laboratories (U.S.A.)
James D. Johnston, AT&T Bell Laboratories (U.S.A.)

Volume 2, Page 1053

Abstract:

A perceptual audio coder typically consists of a filterbank which breaks the signal into its frequency components. These components are then quantized using a perceptual masking model. Previous efforts have indicated that a high resolution filterbank, e.g., the modified discrete cosine transform (MDCT) with 1024 subbands, is able to minimize the bit rate requirements for most of the music samples. The high resolution MDCT, however, is not suitable for the encoding of non-stationary segments of music. A long/short resolution or "window" switching scheme has been employed to overcome this problem, but it has certain inherent disadvantages which become prominent at lower bit rates (< 64 kbps for stereo). We propose a novel switched filterbank scheme which switches between an MDCT and a wavelet filterbank based on signal characteristics. A tree structured wavelet filterbank with properly designed filters offers natural advantages for the representation of non-stationary segments such as attacks. Furthermore, it allows for the optimum exploitation of perceptual irrelevancies.
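
A switching decision of the kind described above might look like the following. This is a hypothetical sketch: the abstract does not state the switching criterion, so the short-time energy-ratio attack detector, its threshold, and the name choose_filterbank are assumptions made purely for illustration.

```python
import math

def choose_filterbank(frame, parts=8, threshold=8.0):
    """Pick "wavelet" if energy jumps sharply between consecutive sub-blocks, else "mdct"."""
    size = len(frame) // parts
    energies = [sum(v * v for v in frame[i * size:(i + 1) * size]) + 1e-12
                for i in range(parts)]
    for previous, current in zip(energies, energies[1:]):
        if current / previous > threshold:
            return "wavelet"  # attack-like, non-stationary segment
    return "mdct"             # stationary segment, high frequency resolution

stationary = [math.sin(0.2 * n) for n in range(1024)]
attack = [0.0] * 512 + [1.0] * 512
choices = (choose_filterbank(stationary), choose_filterbank(attack))
```

The small constant added to each energy only avoids division by zero for silent sub-blocks; a real coder would more likely base the decision on its perceptual model than on a fixed ratio.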

Acrobat PDF file of scanned paper: ic961053.pdf

Acrobat PDF file of original paper: ic961053.pdf
