Audio mix algorithm

https://www.dsprelated.com/showthread/comp.dsp/27372-1.php

Dear All,

   ****************************************************
   Any light shed on this will help me out
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  

  I am working on a module in which I have to mix two (audio/speech) files.
  It looks simple: add each pair of samples from the two files and then
  write the result into the mixed file.

  But here comes the problem: if I simply add the samples of the two files,
  the result may overflow the sample range. So I decided to divide each
  sample by two before adding, and then write the data into the file.

  What I observed is that the resultant mixed WAV file has low volume. This
  is obvious: since I am dividing each sample value by two, I am halving the
  amplitude.
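The divide-by-two approach can be sketched as follows (a minimal Python sketch, assuming 16-bit signed samples already loaded into plain lists; WAV file I/O is omitted). Averaging the sum, rather than halving each input first, keeps one extra bit of precision:

```python
def mix_halved(a, b):
    """Mix two equal-length 16-bit sample streams by averaging.

    (x + y) // 2 cannot overflow the 16-bit range, and rounds once
    instead of truncating each input separately.
    """
    return [(x + y) // 2 for x, y in zip(a, b)]
```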


  So I took another way to mix the audio files.


  Let the two signals be A and B respectively, each in the range 0 to 255.

   Y = A  +  B - A * B / 255

  where Y is the resultant signal, which contains both A and B. Merging two
  audio streams into a single stream by this method solves the problem of
  overflow, and limits information loss to an extent.
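For unsigned 8-bit samples, the formula above can be sketched like this (a Python sketch using integer division; the original post does not specify rounding, so truncation here is one possible reading):

```python
def blend_unsigned(a, b):
    """Y = A + B - A*B/255 for samples in 0..255; result stays in 0..255."""
    return [x + y - (x * y) // 255 for x, y in zip(a, b)]
```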


  If signed 8-bit sampling is used, with samples in the range -128 to 127:

  If both A and B are negative:   Y = A + B - (A * B / (-127))
  Else:                           Y = A + B - (A * B / 128)


  For an n-bit sampled audio signal:

  If both A and B are negative:   Y = A + B - (A * B / -(2^(n-1) - 1))
  Else:                           Y = A + B - (A * B / 2^(n-1))
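A sketch of the signed n-bit version in Python (assuming the intended divisor in the both-negative branch is -(2^(n-1) - 1), matching the -127 used in the 8-bit case; the rounding behaviour is my choice, not specified in the post):

```python
def blend_signed(a, b, n=8):
    """Y = A + B - A*B/d, with d = -(2**(n-1) - 1) when both samples
    are negative and d = 2**(n-1) otherwise."""
    full = 1 << (n - 1)  # 2^(n-1), e.g. 128 for 8-bit
    out = []
    for x, y in zip(a, b):
        if x < 0 and y < 0:
            # -(A*B / -(full-1)) == +A*B/(full-1); A*B > 0 here
            out.append(x + y + (x * y) // (full - 1))
        else:
            out.append(x + y - (x * y) // full)
    return out
```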


  Now, applying the above approach, I am getting good sound quality when
  mixing two audio signals. But as I increase the number of files to be
  mixed, I hear some sort of disturbance (noise) in the result: the more
  files are mixed, the more the disturbance in the mixed audio file
  increases.

  What is the reason behind this? Is there some underlying hardware
  problem, or does the quality of the sound depend on the recording
  device?

  I would like to have your views on this.


  Personally, I think it may be due to the following factors:

  1: Digital computation error
     http://www.filter-solutions.com/quant.html

  2: Aggressive growth of the amplitude of the mixed file as we keep
     increasing the number of audio files. I.e., the higher the number of
     files, the larger the resultant sample values of the mixed file
     become, tending towards the upper limit (32767 for positive samples)
     and towards -32768 when the samples being added are negative. (Here I
     am talking about 16-bit audio data recorded at an 8 kHz sample rate.)


  So is there any other approach by which I can convince myself that the
  mixed audio data is noise-free? (At most I have to mix 10 audio files.)


   One more query: what is the reason behind the distortion when a
   low-level recording is made and the same file is played back? Is there
   distortion in it? In my perception there is distortion between the
   recorded and the played-back version of the same audio file, for which I
   am stating my views below. (Correct me wherever I am wrong.)


Explanation 1-->


    Even with a good A/D-D/A converter for recording and playing back audio
    files, distortion enters the picture. We know that digital recording is
    extremely accurate due to its high signal-to-noise ratio (S/N). At low
    levels, digital is actually better than analog, due to its roughly
    90 dB dynamic range. The best we can get from a phonograph record is
    around 60 dB; more typically it is around 40 dB.


    We can hear a range of 120-plus dB. This is why recordings use a lot of
    compression (a compressor is an electronic device that quickly turns
    the volume up when the music/speech is soft and quickly turns it down
    when it is loud).


    Here the compressor comes into the picture: it compresses "quickly",
    which means some loss of digital data at both ends (high and low).
    Low-level ambient detail is completely stripped by digitizing when we
    record at a low level.

    So digitizing a low-level signal loses relevant information, which
    results in distortion.


Note:

    Sound cards use A/D and D/A converters, which involve a sampling
    frequency. It is not certain that the exact sampling frequency is the
    same across different sound cards; it may vary slightly. This can also
    cause distortion at low levels.


Explanation 2-->


    Now suppose we record audio on one system (recording device) with the
    volume control set to a low level, so the sound is recorded at 100% of
    that low level. When this recorded audio file is played back on another
    system at the same 100% low-level volume-control setting, and we don't
    vary the setting, it will play the same, without distortion.

    If there is a difference between the volume-control setting at which
    the file was recorded and the one at which it is played back, the
    result will be some sort of distortion.


Note:

    If there is a variance between the volume-control settings of the
    recorded and the played-back audio files, there will also be
    distortion. So for low-level recording and listening, some distortion
    will be heard if we play this low-level recorded file on another system
    at a very high level.



Explanation 3-->


     Some software and hardware use normalisation in their algorithms. Some
     normalisers are basically "volume expanders", and some are "limiters".
     They stretch the dynamic range of the material: the low sounds in the
     original remain low, at their original level, while the loudest sounds
     are raised to the peak level permitted by the recording process, and
     whatever lies in between is raised in level proportionately (an
     adaptive increase). This also distorts the original recorded sound.
     Hence, to hear the lower-volume (soft) parts of the audio file we have
     to turn the volume up, and then the boosted signal is also played
     back, causing distortion.


Note:

      Sound recorded at a low level under normalisation can also exhibit
      distortion. Very loud music and speech are recorded with
      compressor/expander algorithms, which use normalisation.



  One more thing: what are the lower and upper limits for recording 16-bit
  data at an 8 kHz sampling frequency so that we don't get noise between
  the recorded and the played-back audio file?


  Any light shed on this will help me out

  Thanks In Advance


Regards

Ranjeet



Posted by Shawn Steenhagen ●December 14, 2004

"ranjeet" <ranjeet.gupta@gmail.com> wrote in message

news:77c88a3b.0412141310.1397d0ff@posting.google.com...

>

[Snip]

>   Let the two signal be A and B respectively, the range is between 0 and 255.

>

>    Y = A  +  B - A * B / 255

>

> [Snip]


I do not understand what you are trying to do here; I have not seen this
approach before.  But I can tell you that multiplication in time is
equivalent to convolution in frequency, so the spectrum of signal Y(z)
contains the spectra of A(z) and B(z) plus the convolution A(z)(*)B(z),
which will add noise to the final result.  The more of these signals you
mix in this manner, the more noise you are going to add.


Scaling by 1/2 to avoid overflow will guarantee that no y(k) result will
overflow, but at the cost of overall (on average) smaller signals.  In
making scaling decisions to prevent overflow, one approach is to think of
the signals as random and look at the PDFs.  What is the probability that
A + B will be greater than 255?  Then make a tradeoff between nice large
robust signals and the probability that every once in a while a signal may
be clipped, and choose a scale factor somewhere between 1 (highest
probability of overflow) and 1/2 (no probability of overflow).


Also, it helps to saturate on overflow (rather than wrap around) so that
the overflow only appears as a slight distortion (not a terribly wrong
answer with the wrong sign).
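Saturating addition can be sketched as follows (Python; a 16-bit sample range is assumed):

```python
def sat_add16(x, y):
    """Add two 16-bit samples, clamping to the range instead of wrapping."""
    return max(-32768, min(32767, x + y))
```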


-Shawn Steenhagen



Posted by Jon Harris ●December 14, 2004

"Shawn Steenhagen" <shawn.NSsteenhagen@NSappliedsignalprocessing.com> wrote in

message news:SmJvd.731$qQ4.531@fe03.lga...

>
> > [Snip]
> >    Y = A  +  B - A * B / 255
> > [Snip]
>
> [Snip]
>
> Scaling by 1/2 to avoid overflow will guarantee that no y(k) result will
> overflow, but at the cost of overall (on average) smaller signals.
>
> [Snip]


Regarding the scaling, since you are working with files, you may be able to

analyze the results after mixing and then apply an appropriate scaling factor to

maximize peak value but avoid clipping.  This is usually called normalization.

One simple approach would be to use a conservative scaling factor to guarantee

overflow will not occur and, as you are mixing the files, keep a running tab on

the maximum value you ever encounter.  When finished, find the scaling factor G

= full_scale/max_value, where full_scale is the maximum number your wave format

can handle (probably 2^15 - 1 for 16-bit signed).  Then multiply the mixed

result file by G.  The result should be a file whose maximum output level is as

large as possible without clipping.
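Jon's two-pass idea might look like this in Python (a sketch: mix with a conservative 1/N scale while tracking the peak, then rescale by G = full_scale/max_value; the function and variable names are mine):

```python
def normalize_mix(streams, full_scale=32767):
    """Mix equal-length streams, then scale the result to full scale."""
    n = len(streams)
    mixed = [sum(col) / n for col in zip(*streams)]  # conservative 1/N scale
    peak = max(abs(v) for v in mixed)                # running max, done here in one pass
    if peak == 0:
        return [0] * len(mixed)
    g = full_scale / peak                            # G = full_scale / max_value
    return [int(round(v * g)) for v in mixed]
```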


Posted by Fred Marshall ●December 15, 2004

"ranjeet" <ranjeet.gupta@gmail.com> wrote in message 

news:77c88a3b.0412141310.1397d0ff@posting.google.com...

> Dear All !!
>
> [Snip]
>
> But here comes the problem: if I simply add the samples of the two files,
> the result may overflow the sample range. So I decided to divide each
> sample by two before adding, and then write the data into the file.
>
> What I observed is that the resultant mixed WAV file has low volume. This
> is obvious: since I am dividing each sample value by two, I am halving
> the amplitude.


The objective is to add the two files together.  So far, so good.


You didn't say how the files themselves were scaled in the first place - but 

it appears that their volume is adequate.  Is that right?


If you add two uncorrelated files together for mixing purposes, it may
well be similar to adding two noise records together.  The resulting
amplitude increase is sqrt(2), not 2.  So perhaps you'd do better to
divide each by sqrt(2).  Some amount of clipping is likely but may be
acceptable.  Obviously, what you do depends on your implementation and the
tools that are available.
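Fred's suggestion could be sketched as follows (Python; the clipping bounds and rounding are my choices, not his):

```python
import math

def mix_sqrt2(a, b, lo=-32768, hi=32767):
    """Mix two streams scaled by 1/sqrt(2), clipping the rare overflow."""
    s = 1.0 / math.sqrt(2.0)
    return [max(lo, min(hi, int(round((x + y) * s)))) for x, y in zip(a, b)]
```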


Fred 



Posted by Stephan M. Bernsee ●December 15, 2004

Hi Ranjeet,


if you distort your signal you get distortion. It's as simple as that.


I'm not quite sure how I should read your formulas. For example, when
you write "Y = A + B — (A * B  /  (-2 pow(n-1) — 1))", what is the '*'
supposed to mean? Convolution? It can't be multiplication, because you
write "-2 pow(n-1)", which contains an explicit multiplication that you
don't write using '*'. And what is the last '1' standing for?


In general, if you use a nonlinear process for mixing your signals 

(which is how I *think* I can interpret your description) you are 

distorting the shape of their waveforms which will add distortion 

noise. The more signals you mix in this manner (and the more 

non-linearly you scale them) the more noise will be introduced.


As others have already said you need to scale the N signals by 1/N in 

the worst case, and if you start out with 8 bit signals you're losing a 

lot of information in the process. I would recommend you convert your 

signals to floating point first and do the mixing there.

You can then scale the sum later as you see fit, or better yet, 

normalize so your output signal fits into the target wordlength.
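Bernsee's float-domain pipeline might be sketched as follows (Python floats standing in for floating point; the attenuate-only normalization and all names are my choices):

```python
def mix_float(streams, full_scale=32767):
    """Sum streams in floating point, then fit the result to the target
    wordlength: attenuate only if the peak would otherwise clip."""
    mixed = [float(sum(col)) for col in zip(*streams)]
    peak = max(abs(v) for v in mixed)
    g = min(1.0, full_scale / peak) if peak else 1.0
    return [int(round(v * g)) for v in mixed]
```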

-- 

Stephan M. Bernsee

http://www.dspdimension.com


Posted by Stephan M. Bernsee ●December 15, 2004

On 2004-12-15 07:47:21 +0100, Stephan M. Bernsee <spam@dspdimension.com> said:


> I'm not quite sure how I should read your formulas. [...] And what is 

> the last '1' standing for?


Ah, looks like my news reader is ballsing up the formula. When I look
at it through Google Groups I see that there's a minus before the '1'.

In my news reader there isn't, because you didn't use a minus but an em
dash...!

Never mind.

-- 

Stephan M. Bernsee

http://www.dspdimension.com


Posted by glen herrmannsfeldt ●December 15, 2004

ranjeet wrote:


>   I am working on the module in which i have to mix the two (audio/speech) files

>   Its look simple to add the each samples of the two diffrent audio file and 

>   then write into the Mixed file.


>   But here comes the problem That if i simply add the two diffrent audio files

>   (Each samples) then there may be over flow of the range, so I decided to  

>   divide the each sample by two and then add the data and write into the file.


You should add them together with one extra bit available, and then

divide by two.  The difference is in rounding.


>   what I observed that the resultant mixed wav file whcih I got has the low  

>   volume, and this is obvious that as i am dividing the value of each sample by 

>   two. So it is decreasing the amplitude level. 

>   So I took another Way to mixed the audio files.

>   Let the two signal be A and B respectively, the range is between 0 and 255.

>   

>    Y = A  +  B - A * B / 255


(snip)


Don't do that.  A*B is the equivalent of a modulator (with the right sign
convention, a balanced modulator), but not what you want when adding
signals.  This is the term that creates intermodulation distortion in
audio signals.


-- glen

