Extracting an Excerpt from a WAV File

Although PHP is best known for building web pages and applications, it can do more than that. I recently needed to extract a piece of audio from a WAV file on the fly and let the user download it through the browser. I looked for a library that fit my needs but couldn’t find one, so I wrote the code myself. It was a good opportunity to study in depth how a WAV file is structured. In this article I’ll give you a brief overview of the WAV file format and explain the library I developed, Audero Wav Extractor.

Overview of the WAV Format

The Waveform Audio File Format, also known as WAVE or WAV, is a Microsoft file format standard for storing digital audio data. A WAV file is composed of a set of chunks of different types, each representing a different section of the audio file. You can picture the format as an HTML page: the first chunks are like the <head> section of a web page and hold information about the file itself, while the chunk holding the audio data is like the <body> section. Here, the word “chunk” refers to the data sections contained in the file.

The format’s most important chunks are “RIFF”, which records the size of the file in bytes; “Fmt”, which holds vital information such as the sample rate and the number of channels; and “Data”, which holds the actual audio stream. (On disk, the last two are identified by the four-character ids “fmt ”, with a trailing space, and “data”.) Every chunk has at least two fields, the id and the size. In addition, every valid WAV file must have at least two chunks: Fmt and Data. The Fmt chunk is usually at the beginning of the file, right after the RIFF header.
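
The id/size layout described above can be walked with a short loop. The following is a minimal sketch, not code taken from Audero Wav Extractor: the function name listChunks() and the in-memory sample are made up for illustration. It assumes the standard RIFF layout, where each chunk starts with a 4-byte ASCII id followed by a 4-byte little-endian size, and chunk bodies are padded to an even length.

```php
<?php
// Hypothetical helper (not part of Audero Wav Extractor): list the chunks
// found in the raw bytes of a WAV file as ['id' => ..., 'size' => ...] entries.
function listChunks(string $bytes): array
{
    $chunks = [];
    $offset = 12; // skip "RIFF", the RIFF size, and "WAVE"
    $length = strlen($bytes);

    while ($offset + 8 <= $length) {
        // a4 reads 4 raw bytes; V reads an unsigned 32-bit little-endian integer
        $header = unpack('a4id/Vsize', $bytes, $offset);
        $chunks[] = $header;

        // Move past the 8-byte chunk header, the body, and the optional pad byte
        $offset += 8 + $header['size'] + ($header['size'] % 2);
    }

    return $chunks;
}

// Build a tiny WAV in memory: a 16-byte fmt chunk and 4 bytes of audio data
$fmt  = pack('vvVVvv', 1, 1, 22050, 44100, 2, 16);
$data = "\x00\x00\x01\x00";
$body = 'WAVE' . 'fmt ' . pack('V', strlen($fmt)) . $fmt
      . 'data' . pack('V', strlen($data)) . $data;
$wav  = 'RIFF' . pack('V', strlen($body)) . $body;

print_r(listChunks($wav));
```

Storing the chunks as a list rather than keying them by id keeps files that repeat a chunk type from overwriting earlier entries.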

Each chunk has its own format and fields, and each field is a subsection of the chunk. The WAV format was underspecified in the past, and this led to files whose headers don’t follow the rules strictly. So, while working with an audio file, you may find one in which one or more fields, even the most important ones, are set to zero or to a wrong value.

To give you an idea of what’s inside a chunk, consider the first one of every WAV file, RIFF. Its first 4 bytes contain the string “RIFF”, and the next 4 contain the file’s size minus the 8 bytes used for these two pieces of data. The final 4 bytes of the RIFF chunk contain the string “WAVE”. You can use these fields to check whether the file you’re parsing is actually a WAV file, as I did in the setFilePath() method of the Wav class of my library.
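
As an illustration of that check, here is a minimal sketch of validating the 12-byte RIFF header with unpack(). It is not the library’s setFilePath() implementation; the function name looksLikeWav() is an assumption introduced for this example.

```php
<?php
// Hypothetical helper: returns true when the first 12 bytes look like a WAV header.
function looksLikeWav(string $header): bool
{
    if (strlen($header) < 12) {
        return false;
    }

    // a4 = 4 raw bytes, V = unsigned 32-bit little-endian integer
    $fields = unpack('a4id/Vsize/a4format', $header);

    return $fields['id'] === 'RIFF' && $fields['format'] === 'WAVE';
}

// With a real file: $header = file_get_contents('sample1.wav', false, null, 0, 12);
$header = 'RIFF' . pack('V', 36) . 'WAVE';

var_dump(looksLikeWav($header));          // bool(true)
var_dump(looksLikeWav('not a wav file')); // bool(false)
```

Given the malformed headers mentioned earlier, this sketch deliberately ignores the size field and only compares the two identifying strings.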

Another interesting thing to explain is how the duration of a WAV file is calculated. All the information you need can be retrieved from the two must-have chunks mentioned earlier: the Data chunk size, the sample rate, the number of channels, and the bits per sample. The formula to calculate the duration in seconds is the following:

time = dataChunkSize / (sampleRate * channelsNumber * bitsPerSample / 8)

Say we have:

dataChunkSize = 4498170
sampleRate = 22050
channelsNumber = 1
bitsPerSample = 16

Applying these values to the formula, we have:

time = 4498170 / (22050 * 1 * 16 / 8)

The result is 102 seconds (rounded).
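
The same calculation can be written as a small PHP function. This is only a sketch of the formula above; the function name wavDuration() is hypothetical, while the parameter names follow the formula.

```php
<?php
// Duration in seconds, from the Data chunk size and three Fmt chunk fields.
function wavDuration(int $dataChunkSize, int $sampleRate, int $channelsNumber, int $bitsPerSample): float
{
    // Bytes consumed by one second of audio
    $bytesPerSecond = $sampleRate * $channelsNumber * $bitsPerSample / 8;

    return $dataChunkSize / $bytesPerSecond;
}

// Values from the example: 22050 Hz, mono, 16 bits per sample
$seconds = wavDuration(4498170, 22050, 1, 16);
echo round($seconds), "\n"; // 102
```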

Explaining in depth how a WAV file is structured is outside the scope of this article. If you want to study it further, read these pages I came across when I worked on this:

What’s Audero Wav Extractor

Audero Wav Extractor is a PHP library that allows you to extract an excerpt from a WAV file. You can save the extracted excerpt to the local hard disk, send it to the user’s browser as a download, or return it as a string for later processing. The library’s only special requirement is PHP 5.3 or higher, because it uses namespaces.

All the classes of the library are inside the WavExtractor directory, but you’ll notice there is an additional directory Loader where you can find the library’s autoloader. The entry point for the developers is the AuderoWavExtractor class that has the three main methods of the project:

downloadChunk(): To download the excerpt
saveChunk(): To save it on the hard disk
getChunk(): To retrieve the excerpt as a string

All of these methods share the same first two parameters, $start and $end, which represent the start and end time, in milliseconds, of the portion to extract. In addition, both downloadChunk() and saveChunk() accept an optional third argument that sets the name of the extracted snippet. If no name is provided, the method generates one in the format “InputFilename-Start-End.wav”.

Inside the WavExtractor directory there are two sub-folders: Utility, containing the Converter class that has some utility methods, and Wav. The latter contains the Wav, Chunk, and ChunkField classes. The first, as you might expect, represents the WAV file and is composed of one or more chunks (of Chunk type). This class allows you to retrieve the WAV headers, the duration of the audio, and some other useful information. Its most pertinent method is getWavChunk(), which retrieves the specified audio portion by reading the bytes from the file.
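
To read an excerpt by time, the millisecond range first has to be turned into a byte range inside the audio data. The sketch below shows one way to do that conversion; it is an assumption about the general technique, not a description of how getWavChunk() is actually implemented, and the function name timeRangeToBytes() is hypothetical.

```php
<?php
// Hypothetical helper: map a time range in milliseconds to [offset, length]
// in bytes, relative to the start of the audio data.
function timeRangeToBytes(int $start, int $end, int $sampleRate, int $channelsNumber, int $bitsPerSample): array
{
    // One sample frame holds one sample for every channel
    $blockAlign = intdiv($channelsNumber * $bitsPerSample, 8);

    // Convert each time to a whole number of frames, then to bytes,
    // so the cut never lands in the middle of a frame
    $startByte = intdiv($start * $sampleRate, 1000) * $blockAlign;
    $endByte   = intdiv($end * $sampleRate, 1000) * $blockAlign;

    return [$startByte, $endByte - $startByte];
}

// The first 2 seconds of a 22050 Hz, mono, 16-bit file
[$offset, $length] = timeRangeToBytes(0, 2000, 22050, 1, 16);
echo "$offset $length\n"; // 0 88200
```

Rounding down to whole frames matters for multi-channel or 16-bit audio, where a cut in the middle of a frame would misalign every following sample.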

The Chunk class represents a chunk of the WAV file and it’s extended by specialized classes contained in the Chunk folder. The latter doesn’t support all of the existing chunk types, just the most important ones. Unrecognized sections are managed by the generic class and simply ignored in the overall process.

The last class to describe is ChunkField. As I pointed out, each chunk has its own type and fields, and each field has a different length (in bytes) and format. This is very important to know, because you need to pass the right parameters to PHP’s pack() and unpack() functions to parse the bytes properly, or you’ll receive an error. To help manage the data, I decided to wrap each field in a class that stores its format, size, and value.
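
To make the role of the format codes concrete, here is a minimal sketch that parses the body of a 16-byte PCM Fmt chunk with unpack(). It does not use the ChunkField class; the field names in the format string are chosen for this example, and the sample bytes are built with pack() rather than read from a file.

```php
<?php
// Field layout of a 16-byte PCM Fmt chunk body:
// v = unsigned 16-bit little-endian, V = unsigned 32-bit little-endian
$format = 'vaudioFormat/vchannelsNumber/VsampleRate/VbyteRate/vblockAlign/vbitsPerSample';

// Sample body built with pack(): PCM (1), mono, 22050 Hz,
// 44100 bytes per second, 2 bytes per frame, 16 bits per sample
$body = pack('vvVVvv', 1, 1, 22050, 44100, 2, 16);

// unpack() returns an associative array keyed by the names in $format
$fields = unpack($format, $body);
print_r($fields);
```

Using a code of the wrong width (for example V where the field is only 16 bits wide) shifts every field after it, which is why keeping the format next to each field is useful.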

How to Use Audero Wav Extractor

You can obtain “Audero Wav Extractor” via Composer, adding the following lines to your composer.json file and running its install command.

"require": {
    "audero/audero-wav-extractor": "2.1.*"
}

Composer will download and place the library in the project’s vendor/audero directory.

Alternatively, you can download the library directly from its repository.

To extract an excerpt and force the download to the user’s browser, you’ll write code that resembles the following:

<?php
//  include the Composer autoloader
require_once "vendor/autoload.php";

$inputFile = "sample1.wav";
$outputFile = "excerpt.wav";
$start = 0 * 1000; // from 0 seconds
$end = 2 * 1000;  // to 2 seconds

try {
    $extractor = new \Audero\WavExtractor\AuderoWavExtractor($inputFile);
    $extractor->downloadChunk($start, $end, $outputFile);
    echo "Chunk extraction completed. ";
}
catch (Exception $e) {
    echo "An error has occurred: " . $e->getMessage();
}

In the first lines I included the Composer autoloader and then set the values I’ll be working with. As you can see, I provided the source file, the output path including the filename and the time range I want to extract. Then I created an instance of AuderoWavExtractor, giving the source file as a parameter, and then called the downloadChunk() method. Please note that because the output path is passed by reference, you always need to set it into a variable.

Let’s look at another example. I’ll show you how to select a time range and save the file into the local hard disk. Moreover, I’ll use the autoloader included in the project.

<?php
// set include path
set_include_path(get_include_path() . PATH_SEPARATOR . __DIR__ . "/../src/");

// include the library autoloader
require_once "Audero/Loader/AutoLoader.php";

// Set the classes' loader method
spl_autoload_register("Audero\\Loader\\AutoLoader::autoload");

$inputFile = "sample2.wav";
$start = 0 * 1000; // from 0 seconds
$end = 2 * 1000;  // to 2 seconds

try {
    $extractor = new \Audero\WavExtractor\AuderoWavExtractor($inputFile);
    $extractor->saveChunk($start, $end);
    echo "Chunk extraction completed.";
}
catch (Exception $e) {
    echo "An error has occurred: " . $e->getMessage();
}

Apart from the loader configuration, the snippet is very similar to the previous one. In fact, I made only two changes: the first is the method called, saveChunk() instead of downloadChunk(), and the second is that I haven’t set the output filename (so the default format explained earlier is used).

Conclusion

In this article I showed you Audero Wav Extractor and how you can use it to easily extract one or more snippets from a given WAV file. I wrote the library for a work project that only had to deal with a very narrow set of files, so if a WAV file or its headers are heavily corrupted the library will probably fail, although I wrote the code to try to recover from errors when possible. Feel free to play with the demo and the files included in the repository; I’ve released the library under the CC BY-NC 3.0 license.