DRAWING WAVEFORMS



I get asked about drawing waveforms from time to time. Over the years, I came to realize that this is a black art of sorts, and it requires a combination of some audio and drawing know-how on the Mac to get it right.

But first, a little story.

Once upon a time I used to write audio software for BeOS while I was in university. As almost every audio software author eventually does, I came to a point where I needed to render audio waveforms to the screen. I hacked up a straightforward drawing algorithm, and it worked well.

When I started working on a follow-on project, I decided to re-use the algorithm I wrote for the first application, but it didn’t work so well. The trouble was that when I originally wrote the algorithm, the audio clips in question were all very tiny—less than two seconds long. Now I was dealing with much longer clips (up to a few minutes, in practice), and the algorithm didn’t scale well at all.

Around this time, I interviewed with Sonic Foundry, with the hopes of joining the Vegas team. During my interview, I asked, “How do you guys draw waveforms on-screen for large audio clips, and so quickly!?”

“That’s proprietary information, sorry.”

At the time, I figured they were just avoiding a long, drawn-out response. After all, I had coded this up myself (minus the speed), so it couldn’t be that difficult, right? Unfortunately, I got similar responses from other people I asked afterwards.

Whether you’re new to audio or you’ve been doing it for a while, you’re probably aware that there aren’t too many books on the topic. Furthermore, you probably aren’t going to find much in the way of detailed algorithms, or even pseudocode, to help you out.

I’m starting to realize that the reason is two-fold.

First off, there really aren’t a lot of people out there who need to draw audio waveforms (or large data sets, for that matter) to screen. Second, it’s really not all that hard once you think about it for a while.

Overview

Drawing waveforms boils down to a few major stages: acquisition, reduction, storage, and drawing.

For each of the stages, you have many implementation options, and you’ll choose the simplest one that’ll serve your application. I don’t know what your application is, so I’ll use Capo as the main example for this post, and throw around some hypothetical situations where necessary.

Early on, you have to set some priorities: Speed, Accuracy, and Display Quality. The order of those priorities will help you decide how to build your drawing algorithm, down to the individual stages.

In Capo, I wanted to make Display Quality the top priority, followed by Speed, and then Accuracy. Because Capo would never be used to do sample-precise edits, I could throw away a whole lot of data, and then make the waveform look as good as possible in a short time frame.

If I were writing an audio editor, my priorities might be Accuracy, followed by Speed, and then Display Quality. For a sequencer (like GarageBand), I’d choose Speed, Display Quality, then Accuracy, because you’re only viewing the audio at a high level, and it’s part of a larger group of parts. Make sense?

Once you have an idea of what you need, you will have a clear picture of how to proceed.

Acquisition

This is almost worth a post of its own. I like using the ExtAudioFile{Open,Seek,Read,Close} API set from AudioToolbox.framework to open various audio file formats, but you may choose a combo of AudioFile+AudioConverter (ExtAudioFile wraps these for you), or QuickTime’s APIs, or whatever else floats your boat.

Which API you use to get the source data is entirely up to your application. You can’t extract movie audio with the (Ext)AudioFile APIs, for instance, so they might not help much when writing a video editing UI. Alternatively, you may have your own proprietary format, or record short samples into memory, etc.

Given the above, I’m going to assume you’re working with a buffer of floating-point values representing the audio, because that’ll be helpful later on. Using ExtAudioFile or an AudioConverter, make sure the client data format is set to floats, and you should be good.
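
As a concrete (if simplified) sketch of that setup, here's roughly how you'd open a file with ExtAudioFile, ask for 32-bit floats, and pull the whole thing into memory. This is my own illustration rather than Capo's code, it skips most error handling, and in practice you'd read in chunks rather than all at once:

#include <AudioToolbox/AudioToolbox.h>
#include <stdlib.h>

// Read an entire audio file as packed, interleaved 32-bit floats.
// Returns a malloc'd buffer (caller frees), or NULL on failure.
static float *ReadFloatSamples( CFURLRef fileURL, UInt32 *outFrames, UInt32 *outChannels )
{
    ExtAudioFileRef audioFile = NULL;
    if ( ExtAudioFileOpenURL( fileURL, &audioFile ) != noErr )
        return NULL;

    // Find out the file's native format (we only need its rate/channels).
    AudioStreamBasicDescription fileFormat = {0};
    UInt32 propSize = sizeof(fileFormat);
    ExtAudioFileGetProperty( audioFile, kExtAudioFileProperty_FileDataFormat,
                             &propSize, &fileFormat );

    // Ask the converter to hand us packed, interleaved 32-bit floats.
    AudioStreamBasicDescription clientFormat = {0};
    clientFormat.mSampleRate       = fileFormat.mSampleRate;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    clientFormat.mChannelsPerFrame = fileFormat.mChannelsPerFrame;
    clientFormat.mBitsPerChannel   = 32;
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerFrame    = clientFormat.mChannelsPerFrame * sizeof(float);
    clientFormat.mBytesPerPacket   = clientFormat.mBytesPerFrame;
    ExtAudioFileSetProperty( audioFile, kExtAudioFileProperty_ClientDataFormat,
                             sizeof(clientFormat), &clientFormat );

    // Total length of the file, in frames.
    SInt64 fileFrames = 0;
    propSize = sizeof(fileFrames);
    ExtAudioFileGetProperty( audioFile, kExtAudioFileProperty_FileLengthFrames,
                             &propSize, &fileFrames );

    float *samples = malloc( (size_t)fileFrames * clientFormat.mBytesPerFrame );

    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = clientFormat.mChannelsPerFrame;
    bufferList.mBuffers[0].mDataByteSize   = (UInt32)( fileFrames * clientFormat.mBytesPerFrame );
    bufferList.mBuffers[0].mData           = samples;

    UInt32 framesRead = (UInt32)fileFrames;
    ExtAudioFileRead( audioFile, &framesRead, &bufferList );
    ExtAudioFileDispose( audioFile );

    *outFrames   = framesRead;
    *outChannels = clientFormat.mChannelsPerFrame;
    return samples;
}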

When you’re pulling data from a file, keep in mind that it’s not going to be very quick, even on an SSD drive, thanks to format conversions. I’d advise doing all this work in an auxiliary thread, no matter how you get your audio, because it’ll keep your application responsive.

In Capo’s case, there is a separate thread that walks the entire audio file, doing the acquisition, reduction, and storage steps all at once. Because Display Quality and Speed were high on the priority list, the drawing step is done only when needed.
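
Structurally, that ends up looking something like the following GCD-based sketch (this is not Capo's actual code; -buildOverviewFromFileAtURL: and waveformView are hypothetical names):

dispatch_async( dispatch_get_global_queue( DISPATCH_QUEUE_PRIORITY_DEFAULT, 0 ), ^{
    // Acquisition + reduction + storage all happen off the main thread...
    [waveform buildOverviewFromFileAtURL:fileURL];

    dispatch_async( dispatch_get_main_queue(), ^{
        // ...and we only touch AppKit on the main thread, to request a redraw.
        [waveformView setNeedsDisplay:YES];
    });
});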

Reduction

Audio contains tons of delicious data. Unfortunately, when accuracy isn’t the top priority, it’s far too much data to be shown on the screen. At 44,100 samples per second, a single second of audio would span roughly seventeen 30″ Cinema Displays (44,100 samples ÷ 2,560 horizontal pixels ≈ 17) if you displayed one sample value per horizontal pixel.

If accuracy is your top priority, you’re still going to be throwing lots of data away most of the time, except when your user wants to maintain a 1:1 sample:pixel ratio (or, in some cases, I’ve seen a sample take up more than 1 pixel, for very fine editing). If you’re writing an editor, or some other application that needs high-detail access to the source data, you will have to re-run the reduction step as the user changes the zoom level. When the user wants to see 1:1 samples:pixels, you won’t throw anything away. When the user wishes to see 200:1 samples:pixels, you’ll throw away 199 samples for every pixel you’re displaying.

In Capo’s case, I chose to create an overview data set for the ‘maximum zoom’ level and keep it on the heap. I settled on a maximum resolution of 50 samples per pixel, and built the data set from that (a 5-minute song works out to roughly 1 MB of RAM: 300 s × 44,100 samples/s ÷ 50 × 4 bytes ≈ 1 MB). As the user zooms out, I sample that overview data set to get lower-resolution versions of the data. Accuracy isn’t great, but it’s pretty fast.

Now, when I talk about “throwing away” or “sampling” the data set, I don’t mean simply discarding data at random. In some cases, randomly choosing samples to include in the final output will work just fine, but you may encounter some pretty annoying artifacts (missing transients, jumping peaks, etc.) when you change zoom levels or resize the display. If Display Quality is low on your list—who cares?

If you do care, you have a few options. Within each “bin” of the original audio, you can take a min/max pair, just the maximum magnitude, or an average. I have found the maximum magnitude to work well for the majority of cases. Here’s an example of what I do in Capo (in pseudocode, of sorts):

// source_audio contains the raw sample data, and source_audio_length
// is its length in samples
// overview_waveform will be filled with the 'sampled' waveform data
// N is the 'bin size' (samples per pixel) determined by the current zoom level
// take_max_value_of() returns the largest absolute sample value in the bin
for ( i = 0; i < source_audio_length; i += N ) {
    overview_waveform[i/N] = take_max_value_of( &(source_audio[i]), N )
}
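
Fleshed out a bit, a plain-C version of that loop might look like the following (my own sketch, assuming mono float samples; the names are arbitrary):

#include <math.h>
#include <stddef.h>

// Reduce 'source' (sourceLength mono float samples) down to one value per
// bin of N samples, where each output value is the maximum magnitude
// (largest absolute sample value) found within its bin.
static void reduce_max_magnitude( const float *source, size_t sourceLength,
                                  float *overview, size_t N )
{
    for ( size_t i = 0; i < sourceLength; i += N ) {
        size_t end = ( i + N < sourceLength ) ? ( i + N ) : sourceLength; // last bin may be short
        float maxMag = 0.0f;
        for ( size_t j = i; j < end; j++ ) {
            float mag = fabsf( source[j] );
            if ( mag > maxMag )
                maxMag = mag;
        }
        overview[i / N] = maxMag;
    }
}

As a side note, if you're already linking against Accelerate.framework, vDSP_maxmgv will compute the maximum magnitude of a bin in a single call, which makes the inner loop disappear.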

Once you have your reduced data set, then you can put it on the screen.

Display

Here's where you have the most leeway in your implementation. I use the Quartz API to do my drawing, and I prefer the family of C-based CoreGraphics (CG*) calls: they're portable to CoreAnimation/iPhone coding, they're the most feature-rich, and they're generally quicker than their Cocoa equivalents. I won't get into any alternatives here (e.g. OpenGL), to keep it simple.

If we stick with the Capo example, then we've chosen to use the maximum magnitude data to draw our waveform. By doing so, we can exploit the fact that the waveform will be symmetric about the X axis, create only one half of the final waveform path, and mirror it with some CGAffineTransform magic.

In the past, developers would create waveforms in pixel buffers using a series of vertical lines to represent the magnitudes of the samples. I like to call this the "traditional waveform drawing". It's still used quite a bit today, and in some cases it works great (especially when showing very small waveforms where pixels are scarce, as in a multitrack audio editor).

Traditional Waveform

I personally prefer to use Quartz paths so that I get nice anti-aliasing along the waveform edge. Because Capo features the waveform so prominently in the display, I wanted to ensure I got top-notch output. Quartz paths gave me that guarantee.

To build the half-path, we'll also be exploiting the fact that both CoreAudio and Quartz represent points using floating-point values. Sadly, this code is slightly less awesome in 64-bit mode, since CGFloats become doubles, and you have to convert the single-precision audio floats over to double-precision pixels. Luckily there are quick routines for that conversion in Accelerate.framework (A whole 'nother blog post, I know...).
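
For what it's worth, that float-to-double conversion can be done in one call with vDSP. This is just a sketch (audioFloats, pathPoints, and pointCount are hypothetical names), and it assumes the CGPoint buffer already has its x values filled in with sample indices:

#include <Accelerate/Accelerate.h>

// In 64-bit, CGFloat (and thus CGPoint.y) is a double, so convert the
// single-precision audio values straight into the y fields of the point
// buffer. The output stride of 2 (in doubles) skips over each x field.
vDSP_vspdp( audioFloats, 1,          // input: floats, unit stride
            &pathPoints[0].y, 2,     // output: every other double (the y's)
            pointCount );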

- (CGPathRef)giveMeAPath
{
    // Assume mAudioPoints is a float* with your audio points
    // (stored as {sampleIndex,value} pairs), and mAudioPointCount
    // contains the # of points in the buffer.

    CGMutablePathRef path = CGPathCreateMutable();

    // In 32-bit mode, CGFloat is a float, so the buffer can be treated
    // directly as an array of CGPoints. (In 64-bit mode, convert the
    // audio floats to doubles first, as described above.)
    CGPathAddLines( path, NULL, (const CGPoint *)mAudioPoints, mAudioPointCount ); // magic!

    return path;
}

Because magnitudes are represented in the range [0,1], and we're using Quartz, we can build a transform that'll scale the waveform path to fit inside half the height of the view, and then append another transform that'll translate/scale the path so it's flipped upside-down and appears below the X axis line (which corresponds to a sample value of 0.0). Here's a zoomed-in example of what I'm talking about.

Flipped Waveform

And here's some code to give you an idea of what's going on to create the whole path:

// Get the overview waveform data (taking into account the level of detail to
// create the reduced data set)
CGPathRef halfPath = [waveform giveMeAPath];

// Build the destination path
CGMutablePathRef path = CGPathCreateMutable();

// Transform to fit the waveform ([0,1] range) into the vertical space 
// ([halfHeight,height] range)
double halfHeight = floor( NSHeight( self.bounds ) / 2.0 );
CGAffineTransform xf = CGAffineTransformIdentity;
xf = CGAffineTransformTranslate( xf, 0.0, halfHeight );
xf = CGAffineTransformScale( xf, 1.0, halfHeight );

// Add the transformed path to the destination path
CGPathAddPath( path, &xf, halfPath );

// Transform to fit the waveform ([0,1] range) into the vertical space
// ([0,halfHeight] range), flipping the Y axis
xf = CGAffineTransformIdentity;
xf = CGAffineTransformTranslate( xf, 0.0, halfHeight );
xf = CGAffineTransformScale( xf, 1.0, -halfHeight );

// Add the transformed path to the destination path
CGPathAddPath( path, &xf, halfPath );

CGPathRelease( halfPath ); // clean up!

// Now, path contains the full waveform path.

Once you have this path, you have a bunch of options for drawing it. For instance, you could fill the path with a solid color, turn the path into a mask and draw a gradient (that's how Capo does it), etc.
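
As a rough sketch of the mask-and-gradient approach (not Capo's actual drawing code; the colors are placeholders), inside a custom NSView's -drawRect: you might do something like:

CGContextRef ctx = (CGContextRef)[[NSGraphicsContext currentContext] graphicsPort];
CGContextSaveGState( ctx );

// Clip to the waveform path...
CGContextAddPath( ctx, path );
CGContextClip( ctx );

// ...then draw a vertical gradient through the clipped area.
CGColorSpaceRef colorSpace = CGColorSpaceCreateWithName( kCGColorSpaceGenericRGB );
CGFloat components[8] = { 0.3, 0.7, 1.0, 1.0,     // top color (RGBA)
                          0.0, 0.25, 0.6, 1.0 };  // bottom color (RGBA)
CGGradientRef gradient = CGGradientCreateWithColorComponents( colorSpace, components, NULL, 2 );
CGContextDrawLinearGradient( ctx, gradient,
                             CGPointMake( 0.0, NSMaxY( self.bounds ) ),
                             CGPointMake( 0.0, NSMinY( self.bounds ) ),
                             0 );

CGGradientRelease( gradient );
CGColorSpaceRelease( colorSpace );
CGContextRestoreGState( ctx );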

Keep in mind, though, that a complex path with lots of points can be slow to draw. Be certain that you don't include more data points in your path than there are horizontal pixels on the screen—they won't be visible, anyway. If necessary, draw in a separate thread to an image, or use CoreAnimation to ensure your drawing happens asynchronously.
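
One option for off-main-thread drawing is to render the path into an offscreen bitmap context on a background thread, and hand the finished image back to the view or layer. A sketch of that idea (width, height, and path are placeholders):

CGColorSpaceRef colorSpace = CGColorSpaceCreateWithName( kCGColorSpaceGenericRGB );
CGContextRef bitmap = CGBitmapContextCreate( NULL, width, height, 8, 0, colorSpace,
                                             kCGImageAlphaPremultipliedLast );
CGColorSpaceRelease( colorSpace );

// Fill the waveform path into the offscreen bitmap.
CGContextAddPath( bitmap, path );
CGContextSetRGBFillColor( bitmap, 0.2, 0.6, 1.0, 1.0 );
CGContextFillPath( bitmap );

CGImageRef waveformImage = CGBitmapContextCreateImage( bitmap );
CGContextRelease( bitmap );
// ...hand waveformImage off to the view/layer on the main thread, and
// CGImageRelease() it once it's no longer needed.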

Use Shark/Instruments to help you decide whether this needs to be done—it's complicated work, and tough code to get working correctly with very few drawing artifacts. You don't even want to know the crazy code I had to write to get chunks of the waveform paged onto the screen in TapeDeck. (Well, you might, but that's proprietary information, sorry. ;))

In Conclusion

People have suggested to me in the past that Apple should step up and hand us an API that would give waveform-drawing facilities (and graphs, too!). I disagree, and if Apple were to ever do this, I'd probably never use it. There are simply far too many application-specific design decisions that go into creating a waveform display engine, and whatever Apple would offer would probably only cover a small handful of use cases.

Hopefully the above information can help you build a waveform algorithm that suits your application well. I think that by breaking the problem up into separate sub-problems, you can build a solution that'll work best for your needs.
