DRAWING WAVEFORMS



I get asked about drawing waveforms from time to time. Over the years, I came to realize that this is a black art of sorts, and it requires a combination of some audio and drawing know-how on the Mac to get it right.

But first, a little story.

Once upon a time I used to write audio software for BeOS while I was in university. As almost every audio software author eventually does, I came to a point where I needed to render audio waveforms to the screen. I hacked up a straightforward drawing algorithm, and it worked well.

When I started working on a follow-on project, I decided to re-use the algorithm I wrote for the first application, but it didn’t work so well. The trouble is, when I originally wrote that algorithm, the audio clips in question were all very tiny—less than 2s. Now I was dealing with much longer clips (up to a few minutes, in practice), and the algorithm didn’t scale well at all.

Around this time, I interviewed with Sonic Foundry, with the hopes of joining the Vegas team. During my interview, I asked, “How do you guys draw waveforms on-screen for large audio clips, and so quickly!?”

“That’s proprietary information, sorry.”

At the time, I just figured the guys were just avoiding a long, drawn-out response. I coded this up myself, except for the fact that it wasn’t so fast—so it can’t be that difficult, right? Unfortunately, I got similar responses from other people I had asked afterwards.

Regardless of whether you’re new to audio, or you’ve been doing it for a while, you are aware that there aren’t too many books on the topic. Furthermore, you probably aren’t going to find too much in the way of detailed algorithms, or even pseudocode, to help you out.

I’m starting to realize that the reason is two-fold.

First off, there really aren’t a lot of people out there who need to draw audio waveforms (or large data sets, for that matter) to screen. Second, it’s really not all that hard once you think about it for a while.

Overview

Drawing waveforms boils down to a few major stages: acquisition, reduction, storage, and drawing.

For each of the stages, you have many implementation options, and you’ll choose the simplest one that’ll serve your application. I don’t know what your application is, so I’ll use Capo as the main example for this post, and throw around some hypothetical situations where necessary.

Early on, you have to set some priorities: Speed, Accuracy, and Display Quality. The order of those priorities will help you decide how to build your drawing algorithm, down to the individual stages.

In Capo, I wanted to make Display Quality the top priority, followed by Speed, and then Accuracy. Because Capo would never be used to do sample-precise edits, I could throw away a whole lot of data, and then make the waveform look as good as possible in a short time frame.

If I were writing an audio editor, my priorities might be Accuracy, followed by Speed, and then Display Quality. For a sequencer (like Garage Band), I’d choose Speed, Display Quality, then Accuracy, because you’re only viewing the audio at a high level, and it’s part of a larger group of parts. Make sense?

Once you have an idea of what you need, you will have a clear picture of how to proceed.

Acquisition

This is almost worth a post of its own. I like using the ExtAudioFile{Open,Seek,Read,Close} API set from AudioToolbox.framework to open various audio file formats, but you may choose a combo of AudioFile+AudioConverter (ExtAudioFile wraps these for you), or QuickTime’s APIs, or whatever else floats your boat.

Your decision of API to get the source data is entirely up to your application. You can’t extract movie audio with (Ext)AudioFile APIs, for instance, so they might not help much when writing a video editing UI. Alternatively, you may have your own proprietary format, or record short samples into memory, etc.

Given the above, I’m going to assume you’re working with a list of floating-point values representing the audio, because that’ll be helpful later on. Using ExtAudioFile, or an AudioConverter, make sure that your host format is set for floats, and you should be good.

When you’re pulling data from a file, keep in mind that it’s not going to be very quick, even on an SSD drive, thanks to format conversions. I’d advise doing all this work in an auxiliary thread, no matter how you get your audio, because it’ll keep your application responsive.

In Capo’s case, there is a separate thread that walks the entire audio file, doing the acquisition, reduction, and storage steps all at once. Because Display Quality and Performance were high on the priority list, the drawing step is done only when needed.

Reduction

Audio contains tons of delicious data. Unfortunately, when accuracy isn’t the top priority, it’s far too much data to be shown on the screen. With 44,100 samples/second, a second of audio would span ~17 30″ Cinema Displays if you displayed one sample value per each horizontal pixel.

If accuracy is your top priority, you’re still going to be throwing lots of data away most of the time, except when your user wants to maintain a 1:1 sample:pixel ratio (or, in some cases, I’ve seen a sample take up more than 1 pixel, for very fine editing). If you’re writing an editor, or some other application that needs high-detail access to the source data, you will have to re-run the reduction step as the user changes the zoom level. When the user wants to see 1:1 samples:pixels, you won’t throw anything away. When the user wishes to see 200:1 samples:pixels, you’ll throw away 199 samples for every pixel you’re displaying.

In the case of Capo, I chose to create an overview data set for the ‘maximum zoom’ level, and keep that on the heap (a 5 minute song should take ~1MB RAM). In my case, I chose a maximum resolution of 50 samples per pixel, and created a data set from that. As the user zooms out, I then sample the overview data set to get the lower-resolution versions of the data. Accuracy isn’t great, but it’s pretty fast.

Now, when I talk about “throwing away”, or “sampling” the data set, I’m not simply discarding data. In some cases, randomly choosing samples to include in the final output will work just fine. However, you may encounter some pretty annoying artifacts (missing transients, jumping peaks, etc) when you change zoom levels or resize the display. If Display Quality is low on your list—who cares?

If you do care, you have a few options. Within each “bin” of the original audio, you can take a min/max pair, just the maximum magnitude, or an average. I have found the maximum magnitude to work well for the majority of cases. Here’s an example of what I do in Capo (in pseudocode, of sorts):

// source_audio contains the raw sample data
// overview_waveform will be filled with the 'sampled' waveform data
// N is the 'bin size' determined by the current zoom level
for ( i = 0; i < sizeof(source_audio); i += N ) {
    overview_waveform[i/N] = take_max_value_of( &(source_audio[i]), N )
}

Once you have your reduced data set, then you can put it on the screen.

Display

Here's where you have the most leeway in your implementation. I use the Quartz API to do my drawing. I prefer the family of C CoreGraphics CG* calls, because they're portable to CoreAnimation/iPhone coding, the most feature-rich, and generally quicker than their Cocoa equivalents. I won't get into any alternatives here (e.g. OpenGL), to keep it simple.

If we stick with the Capo example, then we've chosen to use the maximum magnitude data to draw our waveform. By doing so, we can exploit the fact that the waveform is going to be symmetric along the X axis, and only create one half of the final waveform path using some CGAffineTransform magic.

In the past, developers would create waveforms in pixel buffers using a series of vertical lines to represent the magnitudes of the samples. I like to call this the "traditional waveform drawing". It's still used quite a bit today, and in some cases it works great (especially when showing very small waveforms, and pixels are scarce like in a multitrack audio editor).

Traditional Waveform

I personally prefer to utilize Quartz paths so that I get some nice anti-aliasing to the waveform edge. Because Capo features the waveform so prominently in the display, I wanted to ensure I got top-notch output. Quartz paths gave me that guarantee.

To build the half-path, we'll also be exploiting the fact that both CoreAudio and Quartz represent points using floating-point values. Sadly, this code is slightly less awesome in 64-bit mode, since CGFloats become doubles, and you have to convert the single-precision audio floats over to double-precision pixels. Luckily there are quick routines for that conversion in Accelerate.framework (A whole 'nother blog post, I know...).

<

p>

- (CGPathRef)giveMeAPath
{
    // Assume mAudioPoints is a float* with your audio points 
    // (with {sampleIndex,value} pairs), and mAudioPointCount 
    // contains the # of points in the buffer.

CGMutablePathRef path = CGPathCreateMutable();
CGPathAddLines( path, NULL, mAudioPoints, mAudioPointCount ); // magic!
return path;

}

<

p>

Because magnitudes are represented in the range [0,1], and we're using Quartz, we can build a transform that'll scale the waveform path to fit inside half the height of the view, and then append another transform that'll translate/scale the path so it's flipped upside-down, and appears below the X axis line (which corresponds to a sample value of 0.0). Here's a zoomed in example of what I'm talking about.

Flipped Waveform

And here's some code to give you an idea of what's going on to create the whole path:

// Get the overview waveform data (taking into account the level of detail to
// create the reduced data set)
CGPathRef halfPath = [waveform giveMeAPath];

// Build the destination path
CGMutablePathRef path = CGPathCreateMutable();

// Transform to fit the waveform ([0,1] range) into the vertical space 
// ([halfHeight,height] range)
double halfHeight = floor( NSHeight( self.bounds ) / 2.0 );
CGAffineTransform xf = CGAffineTransformIdentity;
xf = CGAffineTransformTranslate( xf, 0.0, halfHeight );
xf = CGAffineTransformScale( xf, 1.0, halfHeight );

// Add the transformed path to the destination path
CGPathAddPath( path, &xf, halfPath );

// Transform to fit the waveform ([0,1] range) into the vertical space
// ([0,halfHeight] range), flipping the Y axis
xf = CGAffineTransformIdentity;
xf = CGAffineTransformTranslate( xf, 0.0, halfHeight );
xf = CGAffineTransformScale( xf, 1.0, -halfHeight );

// Add the transformed path to the destination path
CGPathAddPath( path, &xf, halfPath );

CGPathRelease( halfPath ); // clean up!

// Now, path contains the full waveform path.

Once you have this path, you have a bunch of options for drawing it. For instance, you could fill the path with a solid color, turn the path into a mask and draw a gradient (that's how Capo does it), etc.

Keep in mind, though, that a complex path with lots of points can be slow to draw. Be certain that you don't include more data points in your path than there are horizontal pixels on the screen—they won't be visible, anyway. If necessary, draw in a separate thread to an image, or use CoreAnimation to ensure your drawing happens asynchronously.

Use Shark/Instruments to help you decide whether this needs to be done—it's complicated work, and tough code to get working correctly with very few drawing artefacts. You don't even want to know the crazy code I had to get working in TapeDeck to have chunks of the waveform paged onto the screen. (Well, you might, but that's proprietary information, sorry. ;))

In Conclusion

People have suggested to me in the past that Apple should step up and hand us an API that would give waveform-drawing facilities (and graphs, too!). I disagree, and if Apple were to ever do this, I'd probably never use it. There are simply far too many application-specific design decisions that go into creating a waveform display engine, and whatever Apple would offer would probably only cover a small handful of use cases.

Hopefully the above information can help you build a waveform algorithm that suits your application well. I think that by breaking the problem up into separate sub-problems, you can build a solution that'll work best for your needs.

内容概要:本文档详细介绍了一个利用Matlab实现Transformer-Adaboost结合的时间序列预测项目实例。项目涵盖Transformer架构的时间序列特征提取与建模,Adaboost集成方法用于增强预测性能,以及详细的模型设计思路、训练、评估过程和最终的GUI可视化。整个项目强调数据预处理、窗口化操作、模型训练及其优化(包括正则化、早停等手段)、模型融合策略和技术部署,如GPU加速等,并展示了通过多个评估指标衡量预测效果。此外,还提出了未来的改进建议和发展方向,涵盖了多层次集成学习、智能决策支持、自动化超参数调整等多个方面。最后部分阐述了在金融预测、销售数据预测等领域中的广泛应用可能性。 适合人群:具有一定编程经验的研发人员,尤其对时间序列预测感兴趣的研究者和技术从业者。 使用场景及目标:该项目适用于需要进行高质量时间序列预测的企业或机构,比如金融机构、能源供应商和服务商、电子商务公司。目标包括但不限于金融市场的波动性预测、电力负荷预估和库存管理。该系统可以部署到各类平台,如Linux服务器集群或云计算环境,为用户提供实时准确的预测服务,并支持扩展以满足更高频率的数据吞吐量需求。 其他说明:此文档不仅包含了丰富的理论分析,还有大量实用的操作指南,从项目构思到具体的代码片段都有详细记录,使用户能够轻松复制并改进这一时间序列预测方案。文中提供的完整代码和详细的注释有助于加速学习进程,并激发更多创新想法。
waveforms是一种用于下载和传输各种数据的格式。它常用于数字信号处理、音频和视频等领域。waveforms下载通常指用户从互联网上获取waveforms格式的文件。这可以通过各种方式实现。 首先,用户可以通过浏览器下载waveforms文件。用户只需在浏览器中输入相关搜索关键字,找到包含所需waveforms文件的网站或在线资源。然后,他们可以点击下载按钮或链接,将waveforms文件保存到本地计算机。这种下载方式简单方便,适用于大多数用户。 第二种下载方式是通过专门的平台或软件进行。某些专业领域可能有特定的平台或软件,用户可以通过它们下载waveforms文件。这些平台或软件可能需要用户进行注册或付费,但提供更精确、易于搜索和专业的waveforms下载功能。 此外,个别网站和论坛上可能会有用户分享自己制作的waveforms文件。用户可以在这些网站或论坛上浏览和搜索自己感兴趣的waveforms文件,并进行下载。这种方式可以促进用户之间的交流和分享,拓宽了用户获取waveforms资源的渠道。 总的来说,waveforms下载是指用户从互联网上获取waveforms格式文件的过程。用户可以通过浏览器、专门平台或软件,以及在线资源分享等方式实现下载。这种下载方式可以帮助用户获取所需的waveforms数据,支持他们进行数字信号处理、音频和视频等领域的研究和应用。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值