Introduction
In this article, we'll look at how sound waves work and interact with each other, as well as how to represent waveforms in PCM WAVE format (*.wav). Then we'll build on that understanding to create a class that generates musical tones, which will allow us to create whole songs from scratch. Example code is included in the source download, which will generate clips from two of the most beautiful pieces of music ever written: Bach's "Minuet in G" and Salt-N-Pepa's "Push It."
Sound waves: the basics
When physical things interact, the interaction causes a vibration. That vibration travels through the air as a wave. If that wave has a size and shape that the ear can detect, then we call it "sound." For a more complete and scientific definition of sound, you should probably look at an encyclopedia or something, but for our purposes, we'll define sound as any physical vibration that you can hear.
Different types of sound waves
Arguably the simplest type of sound wave is the sine wave. It generates a pure tone, which sounds kind of like a tuning fork or an old Atari video game. Different wave shapes produce different tones, even if they're the same pitch. For example, you can hear the difference between a xylophone and a clarinet, even if they're both playing middle C. Here are some common simple wave shapes and their names:
[Figures: Sine Wave, Square Wave, Sawtooth Wave, Dead Fish Wave]
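To make the shapes a little more concrete, here is a minimal sketch of how the first three could be computed. The class and method names are hypothetical and are not part of the library that accompanies this article. The sketch assumes that `phase` measures position within a cycle (0 is the start, 1 is the start of the next cycle) and that each function returns a value between -1 and 1.

```csharp
using System;

// Hypothetical helper for illustration; not part of the article's library.
public static class WaveShapes
{
    // Keep only the fractional part so any phase maps into [0, 1).
    private static double Wrap(double phase) => phase - Math.Floor(phase);

    // Smooth curve that produces a pure tone.
    public static double Sine(double phase) => Math.Sin(2 * Math.PI * Wrap(phase));

    // High for the first half of the cycle, low for the second half.
    public static double Square(double phase) => Wrap(phase) < 0.5 ? 1.0 : -1.0;

    // Ramps in a straight line from -1 up toward 1, then drops back to -1.
    public static double Sawtooth(double phase) => 2.0 * Wrap(phase) - 1.0;
}
```

All three repeat once per cycle, so they would have the same pitch at the same frequency. Only the shape, and therefore the tone, differs.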
How a speaker works (more or less)
In order to understand how we're going to generate sound, you first need to understand roughly how a speaker works. I'll give a very simple overview here, but if you want a little more depth, you should check out this article at HowStuffWorks.com.
A speaker is made up of a movable cone called a diaphragm (shown in black), mounted in a stationary container called a basket (shown in gray). The diaphragm is fitted with an electromagnet at its base, and the basket is fitted with a permanent magnet. By running an electrical current through the electromagnet, we can make it repel the permanent magnet. Conversely, by reversing the current, we can make it attract the permanent magnet. In short, we can make the diaphragm move forward and backward. But how does this create sound? Let's look at another example.
In this example, we're feeding a sound wave to the speaker. The hardware converts the sound wave to an electrical signal so that at the peak of a sound wave, the speaker diaphragm is pushed out. Similarly, at the bottom of a wave, the diaphragm is pulled in. When the motion of the diaphragm follows the pattern of the recorded sound wave, it creates a vibration in the air which very closely matches the recorded wave. That vibration ends up being the sound that we hear. This is, more or less, how a speaker works.
Wave file internals
Okay, now that we have a rough understanding of how sound works, let's try making our own. We'll be creating PCM wave files for this purpose because it's a very easy-to-understand format. A wave file is made up of two segments: the header segment and the data segment. The header contains the file's sample rate, bits per sample, and number of channels (mono or stereo), and its layout is well documented. My code handles the formatting of the wave header for you, so we'll be focusing mostly on the data section in this article.
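For the curious, here is a rough sketch of what writing a header can look like. It follows the commonly documented 44-byte layout for uncompressed PCM wave files. It is not the code from the source download, and the class and method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch of the common 44-byte PCM header; not the library's code.
public static class WaveHeaderSketch
{
    public static void WriteHeader(Stream output, int sampleRate,
        short channels, short bitsPerSample, int dataBytes)
    {
        // Bytes needed for one sample across all channels.
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        // leaveOpen: true so the caller can write sample data afterwards.
        using (var writer = new BinaryWriter(output, Encoding.ASCII, true))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataBytes);               // size of everything after this field
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                           // size of the format block for PCM
            writer.Write((short)1);                     // 1 = uncompressed PCM
            writer.Write(channels);                     // 1 = mono, 2 = stereo
            writer.Write(sampleRate);
            writer.Write(byteRate);
            writer.Write(blockAlign);
            writer.Write(bitsPerSample);

            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataBytes);                    // size of the sample data that follows
        }
    }
}
```

Notice that the header records how many bytes of sample data follow it, so the final values can't be filled in until the length of the data is known.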
Let's look at how a wave file stores its data. We'll start with an actual sound wave:
In order to save this as a wave file, we slice the wave up into a bunch of little segments. We then take a sample of the wave position in each of the segments, like this:
We save each of the positions in order. Then when we read them out again, we can build a very close approximation of the original sound wave. It will look something like this:
Each sample position is stored as a number. For CD-quality audio, samples are stored as 16-bit numbers. This means that a sample at the very top of the graph will be 32,767, a sample at the bottom of the graph will be -32,768, and a sample in the middle will be 0. In addition, the smaller the slices we create, the more samples we'll end up with, and so the better the reproduction of the original wave will be. CD-quality audio stores 44,100 samples per second. So if we were to create a 1-minute CD-quality wave file, it would be made up of the wave header followed by 2,646,000 16-bit numbers.
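The slicing and sampling described above can be sketched in a few lines. This is only an illustration with hypothetical names. It assumes the wave is given as a function of time in seconds that returns a position between -1.0 (bottom of the graph) and 1.0 (top of the graph).

```csharp
using System;

// Hypothetical sketch of sampling a wave into 16-bit numbers.
public static class SamplingSketch
{
    // Convert a wave position in [-1.0, 1.0] to a 16-bit sample.
    // Scaling by 32,767 keeps the result symmetric, so -1.0 maps to
    // -32,767 and the value -32,768 is simply never produced.
    public static short ToSample16(double position)
    {
        double clamped = Math.Max(-1.0, Math.Min(1.0, position));
        return (short)Math.Round(clamped * short.MaxValue);
    }

    // Take one sample per slice, in order, for the given duration.
    public static short[] SampleWave(Func<double, double> wave,
        int sampleRate, double seconds)
    {
        int count = (int)(sampleRate * seconds);
        var samples = new short[count];
        for (int i = 0; i < count; i++)
        {
            double time = (double)i / sampleRate; // start of slice i, in seconds
            samples[i] = ToSample16(wave(time));
        }
        return samples;
    }
}
```

Calling `SamplingSketch.SampleWave(wave, 44100, 60)` returns an array of 2,646,000 samples, which matches the one-minute figure above.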
A note about stereo
All of the numbers listed so far have been for a single channel, which is to say, "mono." Wave files can also contain stereo sound by saving two separate channels, one for the left and one for the right. Of course, storing twice as many channels requires twice as much space, so one minute of full CD-quality audio really requires 5,292,000 16-bit numbers: 2,646,000 for the left channel and 2,646,000 for the right. At 2 bytes per number, this works out to about 10 megabytes per minute. Wave files are big.
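If you'd like to check the arithmetic, or try other settings, here is a small sketch. The class and method names are hypothetical, and the result covers only the sample data, not the header.

```csharp
using System;

// Hypothetical helper for estimating the size of uncompressed sample data.
public static class WaveSizeSketch
{
    public static long DataBytes(int sampleRate, int channels,
        int bitsPerSample, int seconds)
    {
        // Total numbers stored: one per channel per slice.
        long sampleCount = (long)sampleRate * channels * seconds;
        // Each number takes bitsPerSample / 8 bytes.
        return sampleCount * (bitsPerSample / 8);
    }
}
```

For one minute of CD-quality stereo, `WaveSizeSketch.DataBytes(44100, 2, 16, 60)` gives 10,584,000 bytes.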
Using the code
The pitch of a sound wave is determined by its frequency, which is the number of times per second that it completes an up-and-down cycle. For example, middle C has a frequency of about 261.626 Hz, or cycles per second. As we already know, CD-quality audio uses 44,100 samples per second. So, if we want to generate a CD-quality middle C, we have to create a waveform that repeats itself roughly every 169 samples: 44,100 samples per second / 261.626 cycles per second = 168.56 samples per cycle. The following code shows how to generate a 5-second middle C using the WaveWriter16Bit class:
using (WaveWriter16Bit writer = new WaveWriter16Bit(
    new FileStream("c://test.wav", FileMode.Create), 44100, false))
{
    double frequency = 261.626; // Middle C
    double samplesPerCycle = writer.SampleRate / frequency;
    int totalSamples = 5 * writer.SampleRate; // 5 seconds
    Sample16Bit sample = new Sample16Bit(false);
    for (int currentSample = 0; currentSample < totalSamples; currentSample++)
    {
        // Set the maximum height of the wave
        short amplitude = 32000;
        // Generate a point on the sine wave
        double sampleValue =
            Math.Sin(currentSample / samplesPerCycle * 2 * Math.PI) * amplitude;
        // Cast the sample to a 16-bit integer
        sample.LeftChannel = (short)sampleValue;
        // Save the sample
        writer.Write(sample);
    }
}
That may seem like a lot of work just to generate a single tone. Luckily, all that functionality is wrapped up in the SongWriter class. Using the SongWriter class, we can easily generate entire songs. Here's how to generate the first part of "Twinkle, Twinkle, Little Star":
using (SongWriter writer = new SongWriter(
    new FileStream("c://test.wav", FileMode.Create), 90))
{
    writer.AddNote(Tones.C4, 1); // Twin-
    writer.AddNote(Tones.C4, 1); // kle,
    writer.AddNote(Tones.G4, 1); // twin-
    writer.AddNote(Tones.G4, 1); // kle,
    writer.AddNote(Tones.A4, 1); // Lit-
    writer.AddNote(Tones.A4, 1); // tle
    writer.AddNote(Tones.G4, 2); // Star,
    writer.AddNote(Tones.F4, 1); // How
    writer.AddNote(Tones.F4, 1); // I
    writer.AddNote(Tones.E4, 1); // won-
    writer.AddNote(Tones.E4, 1); // der
    writer.AddNote(Tones.D4, 1); // what
    writer.AddNote(Tones.D4, 1); // you
    writer.AddNote(Tones.C4, 2); // are.
}
Now before we continue, think about what we've just done here. Essentially, we've taken fine-grained control over our computer's speakers and we've manually positioned them (44,100 times per second) to create a physical vibration that, when it reaches our ears, sounds like "Twinkle, Twinkle, Little Star." We've just taken some elementary physics and trigonometry, and we've used them to create a children's song out of nothing at all. That's pretty cool.
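Two conversions are hiding behind each AddNote call: a note name has to become a frequency, and a number of beats has to become a number of samples. I won't walk through the SongWriter internals here, but the sketch below shows one way these conversions could be done. It assumes standard equal-temperament tuning with A4 at 440 Hz, and the names are hypothetical rather than taken from the library.

```csharp
using System;

// Hypothetical sketch; the library's own Tones values may be defined differently.
public static class NoteMathSketch
{
    // Frequency of a note a given number of semitones above (positive)
    // or below (negative) A4. Each semitone multiplies the frequency
    // by the twelfth root of 2. Middle C (C4) is 9 semitones below A4.
    public static double FrequencyFromA4(int semitonesFromA4)
        => 440.0 * Math.Pow(2.0, semitonesFromA4 / 12.0);

    // Number of samples needed to hold a note of the given length.
    public static int SamplesForBeats(double beats, double beatsPerMinute,
        int sampleRate)
        => (int)Math.Round(beats * 60.0 / beatsPerMinute * sampleRate);
}
```

With these assumptions, `FrequencyFromA4(-9)` comes out to about 261.626 Hz, the middle C figure used earlier. At a tempo of 90 beats per minute, a one-beat note lasts two thirds of a second, which is 29,400 samples at 44,100 samples per second.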
But what if I want to play chords?
We've seen how to generate a sound wave that produces a tone, but what if we want to produce multiple tones at the same time? How can we generate multiple overlapping sound waves in a single wave file? Well, here's an interesting piece of acoustics trivia. Any two individual sound waves can be combined into a single sound wave that encompasses both of them. Let's have a look at what I mean. Let's say that you want to play a C note and a G note together. The two sound waves will look something like this:
Now, let's slice them up and sample their positions:
Notice that each slice has a sample for the first wave and one for the second wave. Now, for each slice, we just add the two samples together. It'll create an entirely new wave that sounds just like the two original waves combined. The resulting wave comes out looking like this:
It's weird-looking, but when you play it, it'll sound just like the two original sounds played together. Once again, the SongWriter class does the heavy lifting for you as far as combining the notes, so you don't have to sweat the details. If you wanted to generate the melody of the Three Stooges' "Hello" song, it'd look like this:
using (SongWriter writer = new SongWriter(
    new FileStream("c://stooges.wav", FileMode.Create), 180))
{
    writer.DefaultVolume = 5000;
    // First voice
    writer.AddNote(Tones.C4, 1); // Hel-
    writer.AddNote(Tones.C4, 15); // lo...
    // Jump back to where the second voice comes in
    writer.CurrentBeat = 4;
    // Second voice
    writer.AddNote(Tones.E4, 1); // Hel-
    writer.AddNote(Tones.E4, 11); // lo...
    // Jump back to where the third voice comes in
    writer.CurrentBeat = 8;
    // Third voice
    writer.AddNote(Tones.G4, 1); // Hel-
    writer.AddNote(Tones.G4, 7); // lo!!
}
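If you're curious what the slice-by-slice addition described earlier looks like without the library, here is a minimal sketch. The names are hypothetical, and this is not how SongWriter is necessarily implemented. The samples are kept as doubles so that the sums can't overflow before they are converted to 16-bit numbers.

```csharp
using System;

// Hypothetical sketch of combining two waves by adding their samples.
public static class MixSketch
{
    // Generate a sine wave as an array of sample positions.
    public static double[] Sine(double frequency, int sampleRate,
        int count, double amplitude)
    {
        var samples = new double[count];
        for (int i = 0; i < count; i++)
        {
            samples[i] = amplitude * Math.Sin(2 * Math.PI * frequency * i / sampleRate);
        }
        return samples;
    }

    // Add two waves together, one slice at a time. If one wave is
    // shorter, it is treated as silence (0) once it runs out.
    public static double[] Mix(double[] first, double[] second)
    {
        int length = Math.Max(first.Length, second.Length);
        var mixed = new double[length];
        for (int i = 0; i < length; i++)
        {
            double a = i < first.Length ? first[i] : 0.0;
            double b = i < second.Length ? second[i] : 0.0;
            mixed[i] = a + b;
        }
        return mixed;
    }
}
```

For example, mixing `Sine(261.626, 44100, 44100, 5000)` with `Sine(392.0, 44100, 44100, 5000)` gives one second of a C and a G (roughly 392 Hz) sounding together.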
A word of caution
Every time you add another voice to the wave, the maximum height of the wave increases. Remember that for a 16-bit wave, the highest valid point is 32,767 and the lowest valid point is -32,768. So, if adding a wave will result in a wave which has a point higher than the maximum or lower than the minimum, an overflow condition will occur. The resulting wave will sound terrible. So, choose your volume carefully to make sure that your wave always stays within the 16-bit boundaries.
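Here is a small sketch of what overflow can look like with plain 16-bit arithmetic, along with one common way to soften it. The names are hypothetical, and I'm not claiming the library behaves exactly like either method.

```csharp
using System;

// Hypothetical sketch of what happens when two samples add up out of range.
public static class ClipSketch
{
    // A plain unchecked cast keeps only the low 16 bits, so a total that
    // is too large wraps around and comes out as a large negative number.
    public static short AddWrapping(short a, short b)
        => unchecked((short)(a + b));

    // Clamping pins out-of-range totals to the nearest valid value.
    // The peaks of the wave get flattened, which still distorts the
    // sound, but the wave no longer jumps from top to bottom.
    public static short AddClamped(short a, short b)
    {
        int sum = a + b; // short + short is computed as int, so no overflow here
        if (sum > short.MaxValue) return short.MaxValue;
        if (sum < short.MinValue) return short.MinValue;
        return (short)sum;
    }
}
```

Adding two samples of 20,000 gives -25,536 with `AddWrapping` and 32,767 with `AddClamped`. Neither is the 40,000 we wanted, which is why keeping the volume down is the better fix. If DefaultVolume is the peak height of each voice, which the example above suggests but doesn't state, then three voices at 5,000 can never add up to more than 15,000.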
Questions
What practical value does any of this have?
Apart from impressing the ladies, it probably has no practical value at all. Also, in the interest of full disclosure, I'll admit that it doesn't even impress the ladies very much. It's slow, processor-intensive, and the resulting music sounds much more like a stylized test of the Emergency Broadcast System than it does a piano. However, it's a very interesting application of the science of acoustics. It's also an opportunity to understand a little more about sound waves and sound files, and above all, it's just kind of cool. That right there is justification enough, as far as I'm concerned.
Can I write a wave file to a memory stream and then play it in-memory?
Yes. You have to be careful how you go about it, though. A wave file won't play until its header gets written. The WaveWriter class doesn't write the header information until you call Close() or Dispose() on it. By default, closing a WaveWriter object also closes its underlying stream. In the case of a MemoryStream, this renders it unusable. So, what we need is a way to dispose of the WaveWriter or SongWriter object without disposing of the stream that it has been writing to. In the latest version of the library, I've added constructor overloads to the WaveWriter and SongWriter classes. These allow you to specify that the underlying stream should not be closed. Here's a simple example of how to play middle C from a MemoryStream:
using (MemoryStream stream = new MemoryStream())
{
    using (SongWriter writer = new SongWriter(
        stream, // the MemoryStream to write to
        60,     // the tempo (60 bpm)
        false)) // don't close the underlying stream when this
                // object is disposed
    {
        writer.AddNote(Tones.C4, 4); // Write out middle C for 4 beats
    }
    // Jump back to the beginning of the stream
    stream.Position = 0;
    // Play the wave data in the MemoryStream
    using (System.Media.SoundPlayer player =
        new System.Media.SoundPlayer(stream))
    {
        player.PlaySync();
    }
}
What's with the "Dead Fish Wave"?
Okay, so the Dead Fish Wave is not generally accepted as one of the common simple waves. It's actually something that my mom made up and drew freehand when I called her and told her about what I was working on. I haven't yet converted it to an actual sound file, but it's on my list of things to do. It's worth mentioning, though, that I have no reason to believe that the resulting sound will at all resemble the sound of an actual dead fish.
Where should I go for more information?
Of course, Google and Wikipedia are both good starting points. However, if you're looking to jump in head-first, I suggest that you read "The Physics and Psychophysics of Music: An Introduction" by Juan G. Roederer. Unless your local bookstore is much cooler than mine, you'll have to special order it. It's well worth it, though, if you're looking to sink your teeth in.
Do you have the samples from "Minuet in G" and "Push It" available for download?
Sure. If you don't want to download and build the code yourself, you can download the clips in MP3 format. Here's Minuet in G and here's Push It. Enjoy.