Audio mix algorithm

https://www.dsprelated.com/showthread/comp.dsp/27372-1.php

Dear All,

   ****************************************************
   Any light shed on this will help me out
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  

  I am working on a module in which I have to mix two (audio/speech) files.
  It looks simple: add each pair of samples from the two files and then
  write the result into the mixed file.

  But here comes the problem: if I simply add the samples of the two files,
  the result may overflow the sample range. So I decided to divide each
  sample by two before adding, and then write the data into the file.

  What I observed is that the resultant mixed WAV file has low volume. This
  is obvious: since I am dividing each sample value by two, I am halving the
  amplitude.
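The divide-by-two approach can be sketched as follows (a minimal Python sketch, assuming 16-bit signed samples already loaded into plain lists; WAV file I/O is omitted). Averaging the sum, rather than halving each input first, keeps one extra bit of precision:

```python
def mix_halved(a, b):
    """Mix two equal-length 16-bit sample streams by averaging.

    (x + y) // 2 cannot overflow the 16-bit range, and rounds once
    instead of truncating each input separately.
    """
    return [(x + y) // 2 for x, y in zip(a, b)]
```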


  So I took another way to mix the audio files.


  Let the two signals be A and B respectively, each in the range 0 to 255.

   Y = A  +  B - A * B / 255

  where Y is the resultant signal, which contains both A and B. Merging two
  audio streams into a single stream by this method solves the problem of
  overflow, and limits information loss to an extent.
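For unsigned 8-bit samples, the formula above can be sketched like this (a Python sketch using integer division; the original post does not specify rounding, so truncation here is one possible reading):

```python
def blend_unsigned(a, b):
    """Y = A + B - A*B/255 for samples in 0..255; result stays in 0..255."""
    return [x + y - (x * y) // 255 for x, y in zip(a, b)]
```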


  If signed 8-bit sampling is used, with samples in the range -128 to 127:

  If both A and B are negative:   Y = A + B - (A * B / (-127))
  Else:                           Y = A + B - (A * B / 128)


  For an n-bit sampled audio signal:

  If both A and B are negative:   Y = A + B - (A * B / -(2^(n-1) - 1))
  Else:                           Y = A + B - (A * B / 2^(n-1))
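A sketch of the signed n-bit version in Python (assuming the intended divisor in the both-negative branch is -(2^(n-1) - 1), matching the -127 used in the 8-bit case; the rounding behaviour is my choice, not specified in the post):

```python
def blend_signed(a, b, n=8):
    """Y = A + B - A*B/d, with d = -(2**(n-1) - 1) when both samples
    are negative and d = 2**(n-1) otherwise."""
    full = 1 << (n - 1)  # 2^(n-1), e.g. 128 for 8-bit
    out = []
    for x, y in zip(a, b):
        if x < 0 and y < 0:
            # -(A*B / -(full-1)) == +A*B/(full-1); A*B > 0 here
            out.append(x + y + (x * y) // (full - 1))
        else:
            out.append(x + y - (x * y) // full)
    return out
```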


  Now, applying the above approach, I am getting good sound quality when
  mixing two audio signals. But as I increase the number of files to be
  mixed, I hear some sort of disturbance (noise) in the result: the more
  files are mixed, the more the disturbance in the mixed audio file
  increases.

  What is the reason behind this? Is there some underlying hardware
  problem, or does the quality of the sound depend on the recording
  device?

  I would like to have your views on this.


  Personally, I think it may be due to the following factors:

  1: Digital computation error
     http://www.filter-solutions.com/quant.html

  2: Aggressive growth of the amplitude of the mixed file as we keep
     increasing the number of audio files. I.e., the higher the number of
     files, the larger the resultant sample values of the mixed file
     become, tending towards the upper limit (32767 for positive samples)
     and towards -32768 when the samples being added are negative. (Here I
     am talking about 16-bit audio data recorded at an 8 kHz sample rate.)


  So is there any other approach by which I can convince myself that the
  mixed audio data is noise-free? (At most I have to mix 10 audio files.)


   One more query: what is the reason behind the distortion when a
   low-level recording is made and the same file is played back? Is there
   distortion in it? In my perception there is distortion between the
   recorded and the played-back version of the same audio file, for which I
   am stating my views below. (Correct me wherever I am wrong.)


Explanation 1-->


    Even with a good A/D-D/A converter for recording and playing back audio
    files, distortion enters the picture. We know that digital recording is
    extremely accurate due to its high signal-to-noise ratio (S/N). At low
    levels, digital is actually better than analog, due to its roughly
    90 dB dynamic range. The best we can get from a phonograph record is
    around 60 dB; more typically it is around 40 dB.


    We can hear a range of 120-plus dB. This is why recordings use a lot of
    compression (a compressor is an electronic device that quickly turns
    the volume up when the music/speech is soft and quickly turns it down
    when it is loud).


    Here the compressor comes into the picture: it compresses "quickly",
    which means some loss of digital data at both ends (high and low).
    Low-level ambient detail is completely stripped by digitizing when we
    record at a low level.

    So digitizing a low-level signal loses relevant information, which
    results in distortion.


Note:

    Sound cards use A/D and D/A converters, which involve a sampling
    frequency. It is not certain that the exact sampling frequency is the
    same across different sound cards; it may vary slightly. This can also
    cause distortion at low levels.


Explanation 2-->


    Now suppose we record audio on one system (recording device) with the
    volume control set to a low level, so the sound is recorded at 100% of
    that low level. When this recorded audio file is played back on another
    system at the same 100% low-level volume-control setting, and we don't
    vary the setting, it will play the same, without distortion.

    If there is a difference between the volume-control setting at which
    the file was recorded and the one at which it is played back, the
    result will be some sort of distortion.


Note:

    If there is a variance between the volume-control settings of the
    recorded and the played-back audio files, there will also be
    distortion. So for low-level recording and listening, some distortion
    will be heard if we play this low-level recorded file on another system
    at a very high level.



Explanation 3-->


     Some software and hardware use normalisation in their algorithms. Some
     normalisers are basically "volume expanders", and some are "limiters".
     They stretch the dynamic range of the material: the low sounds in the
     original remain low, at their original level, while the loudest sounds
     are raised to the peak level permitted by the recording process, and
     whatever lies in between is raised in level proportionately (an
     adaptive increase). This also distorts the original recorded sound.
     Hence, to hear the lower-volume (soft) parts of the audio file we have
     to turn the volume up, and then the boosted signal is also played
     back, causing distortion.


Note:

      Sound recorded at a low level under normalisation can also exhibit
      distortion. Very loud music and speech are recorded with
      compressor/expander algorithms, which use normalisation.



  One more thing: what are the lower and upper limits for recording 16-bit
  data at an 8 kHz sampling frequency so that we don't get noise between
  the recorded and the played-back audio file?


  Any light shed on this will help me out

  Thanks In Advance


Regards

Ranjeet



Posted by Shawn Steenhagen ●December 14, 2004

"ranjeet" <ranjeet.gupta@gmail.com> wrote in message

news:77c88a3b.0412141310.1397d0ff@posting.google.com...

>

[Snip]

>   Let the two signal be A and B respectively, the range is between 0 and 255.

>

>    Y = A  +  B - A * B / 255

>

> [Snip]


I do not understand what you are trying to do here; I have not seen this
approach before.  But I can tell you that multiplication in time is
equivalent to convolution in frequency, so the spectrum of signal Y(z)
contains the spectra of A(z) and B(z) plus the convolution A(z)(*)B(z),
which will add noise to the final result.  The more of these signals you
mix in this manner, the more noise you are going to add.


Scaling by 1/2 to avoid overflow will guarantee that no y(k) result will
overflow, but at the cost of overall (on average) smaller signals.  In
making scaling decisions to prevent overflow, one approach is to think of
the signals as random and look at the PDFs.  What is the probability that
A + B will be greater than 255?  Then make a tradeoff between nice large
robust signals and the probability that every once in a while a signal may
be clipped, and choose a scale factor somewhere between 1 (highest
probability of overflow) and 1/2 (no probability of overflow).


Also, it helps to saturate on overflow (rather than wrap around) so that
the overflow only appears as a slight distortion (not a terribly wrong
answer with the wrong sign).
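Saturating addition can be sketched as follows (Python; a 16-bit sample range is assumed):

```python
def sat_add16(x, y):
    """Add two 16-bit samples, clamping to the range instead of wrapping."""
    return max(-32768, min(32767, x + y))
```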


-Shawn Steenhagen



Posted by Jon Harris ●December 14, 2004

"Shawn Steenhagen" <shawn.NSsteenhagen@NSappliedsignalprocessing.com> wrote in

message news:SmJvd.731$qQ4.531@fe03.lga...

>
> > [Snip]
> >    Y = A  +  B - A * B / 255
> > [Snip]
>
> [Snip]
>
> Scaling by 1/2 to avoid overflow will guarantee that no y(k) result will
> overflow, but at the cost of overall (on average) smaller signals.
>
> [Snip]


Regarding the scaling, since you are working with files, you may be able to

analyze the results after mixing and then apply an appropriate scaling factor to

maximize peak value but avoid clipping.  This is usually called normalization.

One simple approach would be to use a conservative scaling factor to guarantee

overflow will not occur and, as you are mixing the files, keep a running tab on

the maximum value you ever encounter.  When finished, find the scaling factor G

= full_scale/max_value, where full_scale is the maximum number your wave format

can handle (probably 2^15 - 1 for 16-bit signed).  Then multiply the mixed

result file by G.  The result should be a file whose maximum output level is as

large as possible without clipping.
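Jon's two-pass idea might look like this in Python (a sketch: mix with a conservative 1/N scale while tracking the peak, then rescale by G = full_scale/max_value; the function and variable names are mine):

```python
def normalize_mix(streams, full_scale=32767):
    """Mix equal-length streams, then scale the result to full scale."""
    n = len(streams)
    mixed = [sum(col) / n for col in zip(*streams)]  # conservative 1/N scale
    peak = max(abs(v) for v in mixed)                # running max, done here in one pass
    if peak == 0:
        return [0] * len(mixed)
    g = full_scale / peak                            # G = full_scale / max_value
    return [int(round(v * g)) for v in mixed]
```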


Posted by Fred Marshall ●December 15, 2004

"ranjeet" <ranjeet.gupta@gmail.com> wrote in message 

news:77c88a3b.0412141310.1397d0ff@posting.google.com...

> Dear All !!
>
> [Snip]
>
> But here comes the problem: if I simply add the samples of the two files,
> the result may overflow the sample range. So I decided to divide each
> sample by two before adding, and then write the data into the file.
>
> What I observed is that the resultant mixed WAV file has low volume. This
> is obvious: since I am dividing each sample value by two, I am halving
> the amplitude.


The objective is to add the two files together.  So far, so good.


You didn't say how the files themselves were scaled in the first place - but 

it appears that their volume is adequate.  Is that right?


If you add two uncorrelated files together for mixing purposes, it may
well be similar to adding two noise records together.  The resulting
amplitude increase is sqrt(2), not 2.  So perhaps you'd do better to
divide each by sqrt(2).  Some amount of clipping is likely but may be
acceptable.  Obviously, what you do depends on your implementation and the
tools that are available.
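Fred's suggestion could be sketched as follows (Python; the clipping bounds and rounding are my choices, not his):

```python
import math

def mix_sqrt2(a, b, lo=-32768, hi=32767):
    """Mix two streams scaled by 1/sqrt(2), clipping the rare overflow."""
    s = 1.0 / math.sqrt(2.0)
    return [max(lo, min(hi, int(round((x + y) * s)))) for x, y in zip(a, b)]
```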


Fred 



Posted by Stephan M. Bernsee ●December 15, 2004

Hi Ranjeet,


if you distort your signal you get distortion. It's as simple as that.


I'm not quite sure how I should read your formulas. For example, when
you write "Y = A + B — (A * B  /  (-2 pow(n-1) — 1))", what is the '*'
supposed to mean? Convolution? It can't be multiplication, because you
write "-2 pow(n-1)", which contains an explicit multiplication that you
don't write using '*'. And what is the last '1' standing for?


In general, if you use a nonlinear process for mixing your signals 

(which is how I *think* I can interpret your description) you are 

distorting the shape of their waveforms which will add distortion 

noise. The more signals you mix in this manner (and the more 

non-linearly you scale them) the more noise will be introduced.


As others have already said you need to scale the N signals by 1/N in 

the worst case, and if you start out with 8 bit signals you're losing a 

lot of information in the process. I would recommend you convert your 

signals to floating point first and do the mixing there.

You can then scale the sum later as you see fit, or better yet, 

normalize so your output signal fits into the target wordlength.
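Bernsee's float-domain pipeline might be sketched as follows (Python floats standing in for floating point; the attenuate-only normalization and all names are my choices):

```python
def mix_float(streams, full_scale=32767):
    """Sum streams in floating point, then fit the result to the target
    wordlength: attenuate only if the peak would otherwise clip."""
    mixed = [float(sum(col)) for col in zip(*streams)]
    peak = max(abs(v) for v in mixed)
    g = min(1.0, full_scale / peak) if peak else 1.0
    return [int(round(v * g)) for v in mixed]
```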

-- 

Stephan M. Bernsee

http://www.dspdimension.com


Posted by Stephan M. Bernsee ●December 15, 2004

On 2004-12-15 07:47:21 +0100, Stephan M. Bernsee <spam@dspdimension.com> said:


> I'm not quite sure how I should read your formulas. [...] And what is 

> the last '1' standing for?


Ah, looks like my news reader is ballsing up the formula. When I look
at it through Google Groups I see that there's a minus before the '1'.

In my news reader there isn't, because you didn't use a minus but an em
dash...!

Never mind.

-- 

Stephan M. Bernsee

http://www.dspdimension.com


Posted by glen herrmannsfeldt ●December 15, 2004

ranjeet wrote:


>   I am working on the module in which i have to mix the two (audio/speech) files

>   Its look simple to add the each samples of the two diffrent audio file and 

>   then write into the Mixed file.


>   But here comes the problem That if i simply add the two diffrent audio files

>   (Each samples) then there may be over flow of the range, so I decided to  

>   divide the each sample by two and then add the data and write into the file.


You should add them together with one extra bit available, and then

divide by two.  The difference is in rounding.


>   what I observed that the resultant mixed wav file whcih I got has the low  

>   volume, and this is obvious that as i am dividing the value of each sample by 

>   two. So it is decreasing the amplitude level. 

>   So I took another Way to mixed the audio files.

>   Let the two signal be A and B respectively, the range is between 0 and 255.

>   

>    Y = A  +  B - A * B / 255


(snip)


Don't do that.  A*B is the equivalent of a modulator (with the right sign
convention, a balanced modulator), but not what you want when adding
signals.  This is the term that creates intermodulation distortion in
audio signals.


-- glen

