Audio Coding

 

http://www.ece.umassd.edu/Faculty/acosta/ICASSP/ICASSP_1996/html/ic96s212.htm


Chair: Marina Bosi, Dolby Labs



A bi-dimensional coding scheme applied to audio bitrate reduction

Authors:

Laurent Mainard, CCETT (France)
Michel Lever, CCETT (France)

Volume 2, Page 1017
Abstract:

In this paper we present a bidimensional audio encoding scheme. Taking advantage of a new complex filterbank and of a regular lattice associated with a new hexagonal projection kernel, this scheme provides fast algorithms for each step of the encoder and of the decoder, which keeps the overall complexity low. Moreover, variable- or fixed-length encodings are available without a look-up table. Results show very good quality at 80 kbit/s for monophonic signals, and a significant improvement over standardized algorithms of similar complexity.

Acrobat PDF file of scanned paper:  ic961017.pdf
Acrobat PDF file of original paper:  ic961017.pdf


Audio Coding with a Dynamic Wavelet Packet Decomposition Based on Frequency-Varying Modulated Lapped Transforms

Authors:

Marcus Purat, Technical University of Berlin (Germany)
Peter Noll, Technical University of Berlin (Germany)

Volume 2, Page 1021
Abstract:

Optimum time-frequency decompositions are very useful in audio coding applications, because the signal energy can be maximally concentrated even for the wide variety of audio signal characteristics. Moreover, this signal representation is particularly well suited for a perceptual weighting of the quantization noise. The well-known tree structure of cascaded 2-channel filterbanks allows a very flexible optimization, leading to a signal-adaptive, dynamic wavelet packet decomposition. A major drawback of this technique is its strong spectral side lobes, which produce clearly audible aliasing in perceptual coders. In this paper we present a new dynamic wavelet packet decomposition, based on modulated lapped transforms, which allows the same flexibility while avoiding this drawback. We propose a scheme for low bit rate audio coding that efficiently exploits the high energy concentration. This new codec yields excellent audio quality at about 55 kb/s for monophonic signals.
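
The conventional tree of cascaded 2-channel filterbanks mentioned in this abstract can be sketched as follows. This is only an illustration of the baseline structure, not the authors' method: it assumes the orthonormal Haar filter pair and a fixed, fully split tree, whereas the paper describes a signal-adaptive decomposition built on modulated lapped transforms. All function names are hypothetical.

```python
import math

def haar_split(x):
    """Two-channel orthonormal Haar analysis: (lowpass, highpass), each half the length."""
    s = math.sqrt(0.5)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Two-channel Haar synthesis: exact inverse of haar_split."""
    s = math.sqrt(0.5)
    x = []
    for l, h in zip(low, high):
        x.append(s * (l + h))
        x.append(s * (l - h))
    return x

def packet_decompose(x, depth):
    """Full wavelet packet tree: every band is split again at every level."""
    bands = [list(x)]
    for _ in range(depth):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands

def packet_reconstruct(bands):
    """Merge sibling bands pairwise until a single signal remains."""
    bands = [list(b) for b in bands]
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]

# Hypothetical 16-sample test signal; the length must be divisible by 2**depth.
signal = [math.sin(2 * math.pi * 3 * n / 16) for n in range(16)]
bands = packet_decompose(signal, depth=2)
rebuilt = packet_reconstruct(bands)

max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for band in bands for v in band)
print(len(bands), max_error, energy_in, energy_out)
```

Because the Haar pair is orthonormal, the tree reconstructs the input exactly and preserves energy. The short Haar filters have poor frequency selectivity, which gives a rough feel for the side-lobe problem the abstract attributes to cascaded structures.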

Acrobat PDF file of scanned paper:  ic961021.pdf
Acrobat PDF file of original paper:  ic961021.pdf
Sound files associated with this paper.
  •  0479_a.wav Piano signal prior to encoding-decoding
  •  0479_c.wav Male speech signal prior to encoding-decoding
  •  0479_e.wav Triangle signal prior to encoding-decoding
  •  0479_b.wav Piano signal following encoding-decoding (54 kb/s)
  •  0479_d.wav Male speech signal following encoding-decoding (64 kb/s)
  •  0479_f.wav Triangle signal following encoding-decoding (64 kb/s)


A Test of MPEG Using Time-inverted Spoken Audio

Authors:

Thomas McLaughlin, Library of Congress (U.S.A.)
John Cookson, Library of Congress (U.S.A.)
Lloyd Rasmussen, Library of Congress (U.S.A.)

Volume 2, Page 1025
Abstract:

We excerpted a 20-second sample from a DAT-mastered talking book segment and coded it at 32 and 48 kbit/sec using MPEG I, layer 3. We also coded the same segment at 80 kbit/sec using MPEG I, layer 2. We then coded a time-inverted version of the material in the same way. After decoding, we put the inverted segments back into normal sequence and compared them with the corresponding segments coded in normal temporal order. We did the comparison by means of an ABX test with volunteer listeners. Naive listeners were unable to reliably distinguish between material coded in normal temporal order and the same material coded in inverted order. Trained listeners could reliably make the distinction in layer 3 at 32 and 48 kbit/sec but not in layer 2 at 80 kbit/sec.
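
Whether a listener "reliably" distinguishes two versions in an ABX test is usually judged against chance. The sketch below computes a one-sided binomial p-value under the guessing hypothesis (probability 0.5 per trial). The trial counts are hypothetical, since the abstract does not report them, and this is not necessarily the statistic the authors used.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by pure guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical scores for a 16-trial session.
p_guess = abx_p_value(8, 16)     # a listener who is right half the time
p_skilled = abx_p_value(14, 16)  # a listener who is right 14 times out of 16

print(f"8/16 correct:  p = {p_guess:.4f}")
print(f"14/16 correct: p = {p_skilled:.4f}")
```

A small p-value means the score is unlikely to come from guessing alone, so a listener scoring 14 of 16 would normally be counted as hearing a difference, while a listener scoring 8 of 16 would not.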

Acrobat PDF file of scanned paper:  ic961025.pdf


Extension and Complexity Reduction of TwinVQ Audio Coder

Authors:

Takehiro Moriya, NTT Human Interface Laboratories (Japan)
Naoki Iwakami, NTT Human Interface Laboratories (Japan)
Kazunaga Ikeda, NTT Human Interface Laboratories (Japan)
Satoshi Miki, NTT Human Interface Laboratories (Japan)

Volume 2, Page 1029
Abstract:

This paper proposes two novel techniques for the TwinVQ (Transform domain Weighted Interleave VQ) high-quality audio coding scheme at rates below 64 kbit/s. One is an extension of the weighted interleave technique to the time and input channel domains as well as the frequency domain. The other is an efficient representation of the spectral envelope by means of an interpolated square root LPC (Linear Predictive Coding) spectrum.
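
The interleaving idea behind the name can be illustrated with a minimal sketch: transform coefficients are dealt out to several subvectors so that each subvector spans the whole frequency range. This shows only plain interleaving along frequency; the perceptual weighting, the vector quantizer itself, and the paper's extension to the time and channel domains are not modelled, and the function names are hypothetical.

```python
def interleave_split(coeffs, num_subvectors):
    """Subvector j receives coefficients j, j + M, j + 2M, ... (M = num_subvectors)."""
    return [coeffs[j::num_subvectors] for j in range(num_subvectors)]

def interleave_merge(subvectors):
    """Inverse of interleave_split: put every coefficient back in its original slot."""
    m = len(subvectors)
    out = [None] * sum(len(s) for s in subvectors)
    for j, sub in enumerate(subvectors):
        out[j::m] = sub
    return out

# Stand-in "coefficients": the index itself, so the dealing pattern is visible.
coeffs = list(range(12))
subs = interleave_split(coeffs, 3)
restored = interleave_merge(subs)

print(subs)
print(restored == coeffs)
```

Each subvector would then be quantized separately, so every codebook search sees samples from low, middle, and high frequencies rather than one narrow band.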

Acrobat PDF file of scanned paper:  ic961029.pdf
Acrobat PDF file of original paper:  ic961029.pdf


Minimising the Effects of Subband Quantisation of the Time Domain Aliasing Cancellation Filter Bank

Authors:

Conrad Jakob, Royal Melbourne Institute of Technology (Australia)
Alan Bradley, Royal Melbourne Institute of Technology (Australia)

Volume 2, Page 1033
Abstract:

The effect of the quantisation of filter bank subbands has been analysed by incorporating quantisation noise models into the Time Domain Aliasing Cancellation (TDAC) filter bank. We have found expressions for the reconstruction error of the quantised TDAC system in terms of several signal correlated components, and an uncorrelated component. These expressions allow easy identification of subjectively annoying errors, and provide the framework for a subjective optimisation of the quantisation process. Research has been carried out on alternative quantiser models and methods of quantiser-compensation.
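
To make the setting concrete, here is a minimal sketch of a TDAC filter bank (an MDCT with 50% overlap and a sine window) with an optional uniform quantiser on the subband samples. It does not reproduce the authors' noise models or error expressions. The block size, test signal, and quantiser step are assumptions chosen for illustration, and the direct O(N²) transform is written for clarity rather than speed.

```python
import math

def sine_window(n_half):
    """Sine window of length 2N; satisfies w[n]^2 + w[n + N]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]

def mdct(frame, window):
    """Windowed forward MDCT: 2N input samples -> N subband samples."""
    n_half = len(frame) // 2
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs, window):
    """Windowed inverse MDCT: N subband samples -> 2N aliased output samples."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half) * window[n]
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]

def tdac_roundtrip(signal, n_half, step=None):
    """Analysis, optional uniform quantisation, synthesis with overlap-add.

    len(signal) must be a multiple of n_half.
    """
    window = sine_window(n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        coeffs = mdct(padded[start:start + 2 * n_half], window)
        if step is not None:
            coeffs = [step * round(c / step) for c in coeffs]
        for n, value in enumerate(imdct(coeffs, window)):
            out[start + n] += value
    return out[n_half:-n_half]

# Hypothetical test signal of 32 samples, block size N = 8.
signal = [math.sin(2 * math.pi * n / 10) + 0.3 * math.cos(2 * math.pi * n / 3)
          for n in range(32)]
exact = tdac_roundtrip(signal, n_half=8)
quantised = tdac_roundtrip(signal, n_half=8, step=0.25)

err_exact = max(abs(a - b) for a, b in zip(signal, exact))
err_quant = max(abs(a - b) for a, b in zip(signal, quantised))
print(err_exact, err_quant)
```

Without quantisation the time-domain aliasing of adjacent blocks cancels and the reconstruction error is at rounding level. With quantisation the cancellation is no longer exact, which is the effect the paper analyses.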

Acrobat PDF file of scanned paper:  ic961033.pdf


Speech Analysis and Coding Using a Multi-Resolution Sinusoidal Transform

Authors:

David V. Anderson, Georgia Institute of Technology (U.S.A.)

Volume 2, Page 1037
Abstract:

The sinusoidal transform, as developed by Quatieri and McAulay, provides a sparse representation for speech signals by taking advantage of psychoacoustic masking. The currently reported work takes the sinusoidal transform one step further by considering the frequency resolution abilities of the human auditory system in more detail. The new transform is based on the wavelet principle of variable resolution in time/frequency analysis. Specifically, a sinusoidal transform is developed which uses quadrature mirror filter (QMF) banks to obtain better time resolution at high frequencies and better frequency resolution at low frequencies. This naturally provides a perceptually improved allocation of the sinusoids. The new transform matches the human auditory system better than its predecessor, and it also matches speech signals well, both fricative sounds and voiced speech. The QMF-based sinusoidal transform is then shown to be equivalent to a more efficient FFT-based implementation.

Acrobat PDF file of scanned paper:  ic961037.pdf
Acrobat PDF file of original paper:  ic961037.pdf
Sound files associated with this paper.
  •  0809_a.wav Unprocessed speech
  •  0809_b.wav Processed speech with 60 msec window, 4 bands, limit of 8 peaks per band
  •  0809_c.wav Processed speech with 40 msec window, 4 bands, limit of 12 peaks per band


Audio coding using the wavelet packet transform and a combined scalar-vector quantization

Authors:

Simon Boland, Queensland University of Technology (Australia)
Mohamed Deriche, Queensland University of Technology (Australia)

Volume 2, Page 1041
Abstract:

This paper investigates a hybrid scalar-vector quantization scheme for coding high quality audio signals. A Wavelet Packet Transform (WPT) is used to decompose the audio signal into frequency bands slightly finer than the critical band divisions. A masking model computation is then used as input to the hybrid quantization scheme, where scalar quantization is used for coding the subbands from 0-5.5 kHz, and vector quantization is used for coding the subbands from 5.5-22 kHz. The performance of the proposed coder is assessed from Segmental Signal-to-Noise Ratios (SNR) and the perceived quality for a number of signals. The perceived quality is determined from informal comparisons between the uncoded signals at the original bitrate of 705 kb/s, and the same signals coded with (1) the proposed coder at 80 kb/s, (2) a coder using only scalar quantization at both 128 kb/s and 96 kb/s, and (3) the MPEG layer III coder at 64 kb/s. The comparisons indicate that very good coder quality is possible with the proposed coder at bitrates of approximately 80 kb/s. This represents a saving of about 16 kb/s over full scalar quantization with a similar quality. Further bitrate reduction with the proposed coder is possible by entropy coding of the scalar quantized transform coefficients and the VQ indices.
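
The bitrates quoted in this abstract can be related with a little arithmetic. The sketch assumes the uncoded signal is 16-bit mono PCM sampled at 44.1 kHz; the abstract does not state this, but it is consistent with the 705 kb/s figure and the 22 kHz upper band edge.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: CD-rate sampling
BITS_PER_SAMPLE = 16     # assumption: 16-bit linear PCM, one channel

pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

def compression_ratio(coded_kbps):
    """How many times smaller the coded stream is than the PCM original."""
    return pcm_kbps / coded_kbps

def bits_per_sample(coded_kbps):
    """Average number of coded bits spent on each input sample."""
    return coded_kbps * 1000 / SAMPLE_RATE_HZ

ratio_80 = compression_ratio(80)
saving = 96 - 80  # kb/s saved relative to the all-scalar coder at 96 kb/s

print(f"PCM rate: {pcm_kbps} kb/s")
print(f"80 kb/s: ratio {ratio_80:.2f}, {bits_per_sample(80):.2f} bits/sample")
print(f"Saving over scalar-only coder: {saving} kb/s")
```

Under these assumptions the proposed coder at 80 kb/s corresponds to a compression ratio of roughly 8.8 to 1, or a little under 2 bits per sample.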

Acrobat PDF file of scanned paper:  ic961041.pdf


Low Bit Rate High Quality Audio Coding with Combined Harmonic and Wavelet Representations

Authors:

Khaled N. Hamdy, University of Minnesota (U.S.A.)
Murtaza Ali, University of Minnesota (U.S.A.)
Ahmed H. Tewfik, University of Minnesota (U.S.A.)

Volume 2, Page 1045
Abstract:

In this paper, we describe a novel high quality audio coding method using adaptive signal representation, based on sinusoidal and wavelet analysis of signals. First, we perform a harmonic analysis of the signal to remove strong periodic structures or tones from the signal. Then we carry out a wavelet analysis, which is useful in tracking the transients of the signal. These transients are then removed from the wavelet coefficients. The remaining coefficients have a broadband noise-like structure. Since this method separates out tones (sinusoids), transients, and broadband noise, we may use tonal, noise, and temporal masking information to individually encode the tones and the wavelet coefficients. Our experiments suggest that this method yields a nominal bit rate of 1 bit/sample for high quality audio compression.

Acrobat PDF file of scanned paper:  ic961045.pdf
Acrobat PDF file of original paper:  ic961045.pdf


A High Performance Software Implementation Of MPEG Audio Encoder

Authors:

Manoj Kumar, IBM T.J. Watson Research Center (U.S.A.)
Mohammad Zubair, IBM T.J. Watson Research Center (U.S.A.)

Volume 2, Page 1049
Abstract:

MPEG/Audio is a standard for both transmitting and recording compressed audio. The MPEG algorithm achieves compression by exploiting the perceptual limitations of the human ear. The standard defines the decoding process and the syntax of the coded bitstream. However, it leaves room for different implementations of the encoder that generates the compressed bitstream. In this paper we propose a high performance software implementation of the MPEG/Audio encoder. We obtained more than a factor of five improvement over a straightforward implementation on the IBM PowerPC, Model 250.

Acrobat PDF file of scanned paper:  ic961049.pdf
Acrobat PDF file of original paper:  ic961049.pdf


Audio Compression At Low Bit Rates Using A Signal Adaptive Switched Filterbank

Authors:

Deepen Sinha, AT&T Bell Laboratories (U.S.A.)
James D. Johnston, AT&T Bell Laboratories (U.S.A.)

Volume 2, Page 1053
Abstract:

A perceptual audio coder typically consists of a filterbank which breaks the signal into its frequency components. These components are then quantized using a perceptual masking model. Previous efforts have indicated that a high resolution filterbank, e.g., the modified discrete cosine transform (MDCT) with 1024 subbands, is able to minimize the bit rate requirements for most music samples. The high resolution MDCT, however, is not suitable for the encoding of non-stationary segments of music. A long/short resolution or "window" switching scheme has been employed to overcome this problem, but it has certain inherent disadvantages which become prominent at lower bit rates (< 64 kbps for stereo). We propose a novel switched filterbank scheme which switches between an MDCT and a wavelet filterbank based on signal characteristics. A tree-structured wavelet filterbank with properly designed filters offers natural advantages for the representation of non-stationary segments such as attacks. Furthermore, it allows for the optimum exploitation of perceptual irrelevancies.

Acrobat PDF file of scanned paper:  ic961053.pdf
Acrobat PDF file of original paper:  ic961053.pdf

Reposted from: https://www.cnblogs.com/gaozehua/archive/2012/04/03/2431449.html
