From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously

最新推荐文章于 2020-09-16 10:36:47 发布

directdraw

最新推荐文章于 2020-09-16 10:36:47 发布

阅读量3.8k

点赞数

文章标签： library dictionary file application output audio

http://www.subfurther.com/blog/2010/12/13/from-ipod-library-to-pcm-samples-in-far-fewer-steps-than-were-previously-necessary/

In a July blog entry, I showed a gruesome technique for getting raw PCM samples of audio from your iPod library, by means of an easily-overlooked metadata attribute in the Media Library framework, along with the export functionality of AV Foundation. The AV Foundation stuff was the gruesome part — with no direct means for sample-level access to the song “asset”, it required an intermedia export to.m4a, which was a lossy re-encode if the source was of a different format (like MP3), and then a subsequent conversion to PCM with Core Audio.

Please feel free to forget all about that approach… except for the Core Media timescale stuff, which you’ll surely see again before too long.

iOS 4.1 added a number of new classes to AV Foundation (indeed, these were among the most significant 4.1 API diffs) to provide an API for sample-level access to media. The essential classes areAVAssetReader and AVAssetWriter. Using these, we can dramatically simplify and improve the iPod converter.

I have an example project, VTM_AViPodReader.zip (70 KB) that was originally meant to be part of my session at the Voices That MatteriPhone conference in Philadelphia, but didn’t come together in time. I’m going to skip the UI stuff in this blog, and leave you to a screenshot and a simple description: tap “choose song”, pick something from your iPod library, tap “done”, and tap “Convert”.

To do the conversion, we’ll use an AVAssetReader to read from the original song file, and an AVAssetWriter to perform the conversion and write to a new file in our application’s Documents directory.

Start, as in the previous example, by using thevalueForProperty:MPMediaItemPropertyAssetURL attribute to get an NSURL representing the song in a format compatible with AV Foundation.



-(IBAction) convertTapped: (id) sender {
	// set up an AVAssetReader to read from the iPod Library
	NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
	AVURLAsset *songAsset =
		[AVURLAsset URLAssetWithURL:assetURL options:nil];

	NSError *assetError = nil;
	AVAssetReader *assetReader =
		[[AVAssetReader assetReaderWithAsset:songAsset
			   error:&assetError]
		  retain];
	if (assetError) {
		NSLog (@"error: %@", assetError);
		return;
	}

Sorry about the dangling retains. I’ll explain those in a little bit (and yes, you could use the alloc/init equivalents… I’m making a point here…). Anyways, it’s simple enough to take an AVAssetand make an AVAssetReader from it.

But what do you do with that? Contrary to what you might think, you don’t just read from it directly. Instead, you create another object, an AVAssetReaderOutput, which is able to produce samples from an AVAssetReader.


AVAssetReaderOutput *assetReaderOutput =
	[[AVAssetReaderAudioMixOutput
	  assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
				audioSettings: nil]
	retain];
if (! [assetReader canAddOutput: assetReaderOutput]) {
	NSLog (@"can't add reader output... die!");
	return;
}
[assetReader addOutput: assetReaderOutput];

AVAssetReaderOutput is abstract. Since we’re only interested in the audio from this asset, a AVAssetReaderAudioMixOutput will suit us fine. For reading samples from an audio/video file, like a QuickTime movie, we’d want AVAssetReaderVideoCompositionOutput instead. An important point here is that we set audioSettings to nil to get a generic PCM output. The alternative is to provide anNSDictionary specifying the format you want to receive; I ended up doing that later in the output step, so the default PCM here will be fine.

That’s all we need to worry about for now for reading from the song file. Now let’s start dealing with writing the converted file. We start by setting up an output file… the only important thing to know here is that AV Foundation won’t overwrite a file for you, so you should delete the exported.caf if it already exists.


NSArray *dirs = NSSearchPathForDirectoriesInDomains
				(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectoryPath = [dirs objectAtIndex:0];
NSString *exportPath = [[documentsDirectoryPath
				 stringByAppendingPathComponent:EXPORT_NAME]
				retain];
if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath]) {
	[[NSFileManager defaultManager] removeItemAtPath:exportPath
		error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];

Yeah, there’s another spurious retain here. I’ll explain later. For now, let’s take exportURL and create the AVAssetWriter:


AVAssetWriter *assetWriter =
	[[AVAssetWriter assetWriterWithURL:exportURL
		  fileType:AVFileTypeCoreAudioFormat
			 error:&assetError]
	  retain];
if (assetError) {
	NSLog (@"error: %@", assetError);
	return;
}

OK, no sweat there, but the AVAssetWriter isn’t really the important part. Just as the reader is paired with “reader output” objects, so too is the writer connected to “writer input” objects, which is what we’ll be providing samples to, in order to write them to the filesystem.

To create the AVAssetWriterInput, we provide an NSDictionarydescribing the format and contents we want to create… this is analogous to a step we skipped earlier to specify the format we receive from the AVAssetReaderOutput. The dictionary keys are defined in AVAudioSettings.h and AVVideoSettings.h. You may find you need to look in these header files to look for the value types to provide for these keys, and in some cases, they’ll point you to the Core Audio header files. Trial and error led me to ultimately specify all of the fields that would be encountered in aAudioStreamBasicDescription, along with anAudioChannelLayout structure, which needs to be wrapped in anNSData in order to be added to an NSDictionary



AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings =
[NSDictionary dictionaryWithObjectsAndKeys:
	[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
	[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
	[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
	[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)],
		AVChannelLayoutKey,
	[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
	[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
	nil];

With this dictionary describing 44.1 KHz, stereo, 16-bit, non-interleaved, little-endian integer PCM, we can create anAVAssetWriterInput to encode and write samples in this format.


AVAssetWriterInput *assetWriterInput =
	[[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
				outputSettings:outputSettings]
	retain];
if ([assetWriter canAddInput:assetWriterInput]) {
	[assetWriter addInput:assetWriterInput];
} else {
	NSLog (@"can't add asset writer input... die!");
	return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;

Notice that we’ve set the propertyassetWriterInput.expectsMediaDataInRealTime to NO. This will allow our transcode to run as fast as possible; of course, you’d set this to YES if you were capturing or generating samples in real-time.

Now that our reader and writer are ready, we signal that we’re ready to start moving samples around:


[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];

These calls will allow us to start reading from the reader and writing to the writer… but just how do we do that? The key is theAVAssetReaderOutput method copyNextSampleBuffer. This call produces a Core Media CMSampleBufferRef, which is what we need to provide to the AVAssetWriterInput‘s appendSampleBuffermethod.

But this is where it starts getting tricky. We can’t just drop into awhile loop and start copying buffers over. We have to be explicitly signaled that the writer is able to accept input. We do this by providing a block to the asset writer’srequestMediaDataWhenReadyOnQueue:usingBlock. Once we do this, our code will continue on, while the block will be called asynchronously by Grand Central Dispatch periodically. This explains the earlier retains… autoreleased variables created here in convertTapped: will soon be released, while we need them to still be around when the block is executed. So we need to take care that stuff we need is available inside the block: objects need to not be released, and local primitives need the __block modifier to get into the block.


__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue =
	dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
										usingBlock: ^
 {

The block will be called repeatedly by GCD, but we still need to make sure that the writer input is able to accept new samples.


while (assetWriterInput.readyForMoreMediaData) {
	CMSampleBufferRef nextBuffer =
		[assetReaderOutput copyNextSampleBuffer];
	if (nextBuffer) {
		// append buffer
		[assetWriterInput appendSampleBuffer: nextBuffer];
		// update ui
		convertedByteCount +=
			CMSampleBufferGetTotalSampleSize (nextBuffer);
		NSNumber *convertedByteCountNumber =
			[NSNumber numberWithLong:convertedByteCount];
		[self performSelectorOnMainThread:@selector(updateSizeLabel:)
			withObject:convertedByteCountNumber
		waitUntilDone:NO];

What’s happening here is that while the writer input can accept more samples, we try to get a sample from the reader output. If we get one, appending it to the writer output is a one-line call. Updating the UI is another matter: since GCD has us running on an arbitrary thread, we have to use performSelectorOnMainThread for any updates to the UI, such as updating a label with the current total byte-count. We would also have to do call out to the main thread to update the progress bar, currently unimplemented because I don’t have a good way to do it yet.

If the writer is ever unable to accept new samples, we fall out of thewhile and the block, though GCD will continue to re-run the block until we explicitly stop the writer.

How do we know when to do that? When we don’t get a sample fromcopyNextSampleBuffer, which means we’ve read all the data from the reader.


} else {
	// done!
	[assetWriterInput markAsFinished];
	[assetWriter finishWriting];
	[assetReader cancelReading];
	NSDictionary *outputFileAttributes =
		[[NSFileManager defaultManager]
			  attributesOfItemAtPath:exportPath
			  error:nil];
	NSLog (@"done. file size is %ld",
		    [outputFileAttributes fileSize]);
	NSNumber *doneFileSize = [NSNumber numberWithLong:
			[outputFileAttributes fileSize]];
	[self performSelectorOnMainThread:@selector(updateCompletedSizeLabel:)
			withObject:doneFileSize
			waitUntilDone:NO];
	// release a lot of stuff
	[assetReader release];
	[assetReaderOutput release];
	[assetWriter release];
	[assetWriterInput release];
	[exportPath release];
	break;
}

Reaching the finish state requires us to tell the writer to finish up the file by sending finish messages to both the writer input and the writer itself. After we update the UI (again, with the song-and-dance required to do so on the main thread), we release all the objects we had to retain in order that they would be available to the block.

Finally, for those of you copy-and-pasting at home, I think I owe you some close braces:


		}
	 }];
	NSLog (@"bottom of convertTapped:");
}

Once you’ve run this code on the device (it won’t work in the Simulator, which doesn’t have an iPod Library) and performed a conversion, you’ll have converted PCM in an exported.caf file in your app’s Documents directory. In theory, your app could do something interesting with this file, like representing it as a waveform, or running it through a Core Audio AUGraph to apply some interesting effects. Just to prove that we actually have performed the desired conversion, use the Xcode Organizer to open up the “iPod Reader” application and drag its “Application Data” to your Mac:

The exported folder will have a Documents, in which you should find exported.caf. Drag it over to QuickTime Player or any other application that can show you the format of the file you’ve produced:

Hopefully this is going to work for you. It worked for most Amazon and iTunes albums I threw at it, but found I had an iTunes Plus album, Ashtray Rock by the Joel Plaskett Emergency, whose songs throw an inexplicable error when opened, so I can’t presume to fully understand this API just yet:


2010-12-12 15:28:18.939 VTM_AViPodReader[7666:307] *** Terminating app
 due to uncaught exception 'NSInvalidArgumentException', reason:
 '*** -[AVAssetReader initWithAsset:error:] invalid parameter not
 satisfying: asset != ((void *)0)'

Still, the arrival of AVAssetReader and AVAssetWriter open up a lot of new possibilities for audio and video apps on iOS. With the reader, you can inspect media samples, either in their original format or with a conversion to a form that suits your code. With the writer, you can supply samples that you receive by transcoding (as I’ve done here), by capture, or even samples you generate programmatically (such as a screen recorder class that just grabs the screen as often as possible and writes it to a movie file).

Date: Monday, December 13th, 2010 | Filed under avfoundation, coreaudio,idevblogaday, iphone, ipod

2 Comments

1. nick_king_equate replies at 25th January 2011 um 6:26 am :

Hi,

first off, thanks for this useful bit of code. Im still a bit fuzzy about blocks, but understand the rest surprisingly enough.

I am working on an app at present that uses the ipod media library. I have lifted a bit of your code to do this. Unfortunately when I run instruments over it it starts generating leaks despite what seems to be good releasing of the memory involved. This problem compounds for every successive access to the library until it gets an out of memory message.

I thought this was bound to be something stupid that I did, but after much banging of head against the screen I ran your original code through instruments and saw that it was having the same issues.

Is this something you are aware of and possibly a pitfall of using the library in this manner. Any direction you could give at this point would make my life a little better

Cheers

N.
2. cocell replies at 4th February 2011 um 2:19 pm :

// release a lot of stuff
[assetReader release];
[assetReaderOutput release];
[assetWriter release];
[assetWriterInput release];
[exportPath release];

Yeah, I notice that these aren’t being released at all and is causing major leaks and then crashes. Anyone know a fix to this? I’m trying right now, I thought it was my AU.