PHP Master | Extract an Excerpt from a WAV File

Although PHP is well known for building web pages and applications, it can do more than that. I recently needed to extract a piece of audio from a WAV file on the fly and let users download it through their browser. I tried to find a library that fit my needs but wasn’t successful, so I had to write the code myself. It was a good opportunity to study in depth how a WAV file is made. In this article I’ll give you a brief overview of the WAV file format and explain the library I developed, Audero Wav Extractor.

Overview of the WAV Format

The Waveform Audio File Format, also known as WAVE or WAV, is a Microsoft file format standard for storing digital audio data. A WAV file is composed of a set of chunks of different types, each representing a different section of the audio file. You can envision the format as an HTML page: the first chunks are like the <head> section of a web page and hold several pieces of information about the file itself, while the chunk containing the audio data corresponds to the <body> section. In this context, the word “chunk” refers to the data sections contained in the file.

The format’s most important chunks are “RIFF”, which contains the number of bytes of the file, “Fmt”, which has vital information such as the sample rate and the number of channels, and “Data”, which holds the actual audio stream. Each chunk must have at least two fields, the id and the size. In addition, every valid WAV must have at least two chunks: Fmt and Data. The first is usually at the beginning of the file, right after the RIFF. Each chunk has its own format and fields, and a field constitutes a sub-section of the chunk.

The WAV format was underspecified in the past, and this led to files with headers that don’t follow the rules strictly. So, while working with audio files, you may find one with one or more fields, even the most important ones, set to zero or to a wrong value.

To give you an idea of what’s inside a chunk, the first one of each WAV file is RIFF. Its first 4 bytes contain the string “RIFF”, and the next 4 contain the file’s size minus the 8 bytes used for these two pieces of data. The final 4 bytes of the RIFF chunk contain the string “WAVE”. You might guess the aim of this data: you can use it to check whether the file you’re parsing is actually a WAV file, as I did in the setFilePath() method of the Wav class of my library.

Another interesting thing to explain is how the duration of a WAV file is calculated. All the information you need can be retrieved from the two must-have chunks cited before: the Data chunk size, the sample rate, the number of channels, and the bits per sample. The formula to calculate the duration in seconds is the following:

time = dataChunkSize / (sampleRate * channelsNumber * bitsPerSample / 8)
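As a minimal sketch, the formula can be written as a small PHP function. The function name is hypothetical and not part of Audero Wav Extractor, and the code assumes uncompressed PCM audio and PHP 7 or later (for the scalar type hints).

```php
<?php
// Hypothetical helper, not part of Audero Wav Extractor: duration in seconds
// computed from the four header values used in the formula above.
function wavDurationSeconds(int $dataChunkSize, int $sampleRate, int $channelsNumber, int $bitsPerSample): float
{
    // Number of bytes consumed by one second of audio.
    $bytesPerSecond = $sampleRate * $channelsNumber * $bitsPerSample / 8;
    if ($bytesPerSecond <= 0) {
        // Damaged headers may carry zeroed fields, so guard the division.
        throw new InvalidArgumentException("Header values must be positive.");
    }
    return $dataChunkSize / $bytesPerSecond;
}

// CD-quality stereo: 44100 Hz, 2 channels, 16 bits per sample.
echo wavDurationSeconds(1764000, 44100, 2, 16); // 10
```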

Say we have:

dataChunkSize = 4498170
sampleRate = 22050
channelsNumber = 1
bitsPerSample = 16

Applying these values to the formula, we have:

time = 4498170 / (22050 * 1 * 16 / 8)

And the result is 102 seconds (rounded). Explaining in depth how a WAV file is structured is outside the scope of this article. If you want to study it further, read these pages I came across when I worked on this:

What Is Audero Wav Extractor

Audero Wav Extractor is a PHP library that allows you to extract an excerpt from a WAV file. You can save the extracted excerpt to the local hard disk, download it through the user’s browser, or return it as a string for later processing. The only special requirement the library has is PHP 5.3 or higher because it uses namespaces. All the classes of the library are inside the WavExtractor directory, but you’ll notice there is an additional directory, Loader, where you can find the library’s autoloader. The entry point for developers is the AuderoWavExtractor class, which has the three main methods of the project:

downloadChunk(): To download the excerpt
saveChunk(): To save it on the hard disk
getChunk(): To retrieve the excerpt as a string

All of these methods have the same first two parameters, $start and $end, which represent the start and the end time, in milliseconds, of the portion to extract. Moreover, both downloadChunk() and saveChunk() accept an optional third argument to set the name of the extracted snippet. If no name is provided, the method generates one on its own in the format “InputFilename-Start-End.wav”.

Inside the WavExtractor directory there are two sub-folders: Utility, containing the Converter class that has some utility methods, and Wav. The latter contains the Wav, Chunk, and ChunkField classes. The first, as you might expect, represents the WAV file and is composed of one or more chunks (of Chunk type). This class allows you to retrieve the WAV headers, the duration of the audio, and some other useful information. Its most pertinent method is getWavChunk(), which retrieves the specified audio portion by reading the bytes from the file.

The Chunk class represents a chunk of the WAV file and is extended by specialized classes contained in the Chunk folder. This folder doesn’t cover all of the existing chunk types, just the most important ones. Unrecognized sections are managed by the generic class and simply ignored in the overall process.

The last class is ChunkField. As I pointed out, each chunk has its own type and fields, and each field has a different length (in bytes) and format. This is very important to know because you need to pass the right parameters to PHP’s pack() and unpack() functions to parse the bytes properly, or you’ll receive an error. To help manage the data, I decided to wrap each field in a class that stores its format, size, and value.
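To illustrate why the size and format of each field matter, here is a minimal sketch that parses the 12-byte RIFF header described in the overview with unpack(). The function name and array keys are hypothetical; this is not the library’s code, only an illustration of how format codes map to fields. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, not the library's code: parse and validate the RIFF header.
function parseRiffHeader(string $bytes): array
{
    if (strlen($bytes) < 12) {
        throw new InvalidArgumentException("A RIFF header needs at least 12 bytes.");
    }
    // a4 = 4-byte string, V = unsigned 32-bit little-endian integer.
    $header = unpack("a4id/Vsize/a4format", $bytes);
    if ($header["id"] !== "RIFF" || $header["format"] !== "WAVE") {
        throw new UnexpectedValueException("Not a WAV file.");
    }
    return $header;
}

// Build a fake header with pack() for a 44-byte file (size field = 44 - 8).
$fake = pack("a4Va4", "RIFF", 36, "WAVE");
$header = parseRiffHeader($fake);
echo $header["size"] + 8; // 44
```

Using the wrong code, for example a big-endian N instead of V for the size, would not raise an error here but would silently produce a wrong value, which is one more reason to keep each field’s format next to its data.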

How to use Audero Wav Extractor

You can obtain “Audero Wav Extractor” via Composer by adding the following lines to your composer.json file and running its install command.

"require": {
    "audero/audero-wav-extractor": "2.1.*"
}

Composer will download and place the library in the project’s vendor/audero directory. Alternatively, you can download the library directly from its repository. To extract an excerpt and force the download to the user’s browser, you’ll write code that resembles the following:

<?php
// include the Composer autoloader
require_once "vendor/autoload.php";

$inputFile = "sample1.wav";
$outputFile = "excerpt.wav";
$start = 0 * 1000; // from 0 seconds
$end = 2 * 1000; // to 2 seconds

try {
    // the class lives in the Audero\WavExtractor namespace
    $extractor = new \Audero\WavExtractor\AuderoWavExtractor($inputFile);
    $extractor->downloadChunk($start, $end, $outputFile);
    echo "Chunk extraction completed. ";
}
catch (Exception $e) {
    echo "An error has occurred: " . $e->getMessage();
}

In the first lines I included the Composer autoloader and then set the values I’ll be working with. As you can see, I provided the source file, the output path including the filename, and the time range I want to extract. Then I created an instance of AuderoWavExtractor, giving the source file as a parameter, and called the downloadChunk() method. Please note that because the output path is passed by reference, you always need to store it in a variable. Let’s look at another example. I’ll show you how to select a time range and save the file to the local hard disk. Moreover, I’ll use the autoloader included in the project.

<?php
// set include path
set_include_path(get_include_path() . PATH_SEPARATOR . __DIR__ . "/../src/");

// include the library autoloader
require_once "Audero/Loader/AutoLoader.php";

// Set the classes' loader method
spl_autoload_register("Audero\\Loader\\AutoLoader::autoload");

$inputFile = "sample2.wav";
$start = 0 * 1000; // from 0 seconds
$end = 2 * 1000; // to 2 seconds

try {
    $extractor = new \Audero\WavExtractor\AuderoWavExtractor($inputFile);
    $extractor->saveChunk($start, $end);
    echo "Chunk extraction completed.";
}
catch (Exception $e) {
    echo "An error has occurred: " . $e->getMessage();
}

Apart from the loader configuration, the snippet is very similar to the previous one. In fact, I only made two changes: the first is the method called, saveChunk() instead of downloadChunk(), and the second is that I haven’t set the output filename, so the default format explained earlier is used.
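To give a rough idea of what extracting a time range involves, the sketch below maps a start and end time in milliseconds to a byte range inside the Data chunk. It is not the library’s implementation: the function name and array keys are hypothetical, and it assumes uncompressed PCM audio and PHP 7 or later.

```php
<?php
// Hypothetical helper, not the library's implementation: byte range, relative to
// the start of the audio data, that corresponds to a time range in milliseconds.
function timeRangeToByteRange(int $startMs, int $endMs, int $sampleRate, int $channelsNumber, int $bitsPerSample): array
{
    if ($startMs < 0 || $endMs <= $startMs) {
        throw new InvalidArgumentException("Invalid time range.");
    }
    // One block holds one sample for every channel.
    $blockSize = intdiv($channelsNumber * $bitsPerSample, 8);
    // Work in whole blocks so that a cut never splits a sample.
    $startBlock = intdiv($startMs * $sampleRate, 1000);
    $endBlock = intdiv($endMs * $sampleRate, 1000);
    return [
        "offset" => $startBlock * $blockSize,
        "length" => ($endBlock - $startBlock) * $blockSize,
    ];
}

// 0 to 2 seconds of 22050 Hz, mono, 16-bit audio.
$range = timeRangeToByteRange(0, 2000, 22050, 1, 16);
echo $range["offset"] . " " . $range["length"]; // 0 88200
```

A real extractor would also have to clamp the range to the Data chunk size and write new headers with the updated sizes in front of the extracted bytes.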

Conclusion

In this article I showed you “Audero Wav Extractor” and how you can use it to easily extract one or more snippets from a given WAV file. I wrote the library for a work project that only had to deal with a very narrow set of files, so if a WAV or its headers are heavily corrupted the library will probably fail, although I wrote the code to try to recover from errors when possible. Feel free to play with the demo and the files included in the repository, as I’ve released it under the CC BY-NC 3.0 license.

Frequently Asked Questions (FAQs) about Extracting Excerpts from a WAV File

How can I extract a specific part of a WAV file?

To extract a specific part of a WAV file without writing code, you can use audio editing software like Audacity. Open the WAV file in Audacity, select the part you want to extract using the selection tool, and then export only the selection from the File menu (the exact menu entry depends on the Audacity version). You can then save the selected part as a new WAV file.

Can I extract data from a WAV file using a programming language?

Yes, you can extract data from a WAV file using a programming language like Python. Libraries such as scipy.io.wavfile and wave can be used to read WAV files and extract data. You can then manipulate this data as per your requirements.

How can I extract a secret message from an audio file?

Secret messages are embedded in audio files through steganography, the practice of hiding information within non-secret data. Extracting such a message means reversing the embedding method that was used, and various tools are available that can help you recover hidden messages from audio files.

Can I extract one voice or person speaking inside a WAV file?

Extracting one voice from a WAV file is a complex task that involves audio source separation or voice separation. This can be achieved using advanced signal processing techniques and machine learning algorithms. Software like Audacity can help to some extent, but for more complex tasks, you might need to use more advanced tools or services.

What function in R extracts the dB values from a WAV file?

In R, you can use the tuneR package to read WAV files and extract data. The readWave() function reads a WAV file and returns an object that holds the sample amplitudes. These are not dB values, so you need to convert the amplitudes to dB yourself, typically as 20 * log10 of the amplitude relative to a reference level such as full scale.
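The conversion itself is the same in any language. As a sketch in PHP, the language used in this article, the following function converts one signed 16-bit sample to decibels relative to full scale (dBFS). The function name is hypothetical, and the code assumes 16-bit samples and PHP 7 or later.

```php
<?php
// Hypothetical helper: level of one signed 16-bit sample in dBFS.
function sampleToDbfs(int $sample): float
{
    $fullScale = 32768; // largest magnitude of a signed 16-bit sample
    if ($sample === 0) {
        return -INF; // silence has no finite level
    }
    return 20 * log10(abs($sample) / $fullScale);
}

// A sample at half of full scale is about 6 dB below the maximum.
echo round(sampleToDbfs(16384), 2); // -6.02
```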

How can I extract the frequency information from a WAV file?

Extracting frequency information from a WAV file involves performing a Fourier Transform on the sample data. This can be done using tools like the numpy.fft module in Python or the fft() function in R. The result of the Fourier Transform gives you the frequency components of the audio signal.

Can I extract metadata from a WAV file?

Yes, you can extract metadata from a WAV file. This can include information like the sample rate, bit depth, number of channels, and duration. This can be done using audio processing libraries in various programming languages.
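As an illustration in PHP, the sketch below walks the chunks of a WAV file held in a string and reads the sample rate, number of channels, and bit depth from the Fmt chunk (stored with the id "fmt " in the file). The function name and array keys are hypothetical, the code does not use Audero Wav Extractor, and it assumes PHP 7.1 or later for the offset argument of unpack().

```php
<?php
// Hypothetical helper: read the basic audio properties from the "fmt " chunk.
function readFmtChunk(string $wav): array
{
    $offset = 12; // skip "RIFF", the size field and "WAVE"
    $length = strlen($wav);
    while ($offset + 8 <= $length) {
        // Every chunk starts with a 4-byte id and a 4-byte little-endian size.
        $chunk = unpack("a4id/Vsize", $wav, $offset);
        if ($chunk["id"] === "fmt ") {
            if ($offset + 8 + 16 > $length) {
                throw new UnexpectedValueException("Truncated fmt chunk.");
            }
            // v = unsigned 16-bit, V = unsigned 32-bit, both little-endian.
            return unpack(
                "vaudioFormat/vchannels/VsampleRate/VbyteRate/vblockAlign/vbitsPerSample",
                $wav,
                $offset + 8
            );
        }
        // Skip this chunk; chunks are padded to an even number of bytes.
        $offset += 8 + $chunk["size"] + ($chunk["size"] % 2);
    }
    throw new UnexpectedValueException("No fmt chunk found.");
}

// Build a tiny in-memory WAV header: PCM, mono, 22050 Hz, 16 bits per sample.
$fmt = pack("vvVVvv", 1, 1, 22050, 44100, 2, 16);
$wav = pack("a4Va4", "RIFF", 4 + 8 + strlen($fmt), "WAVE")
     . pack("a4V", "fmt ", strlen($fmt)) . $fmt;
$info = readFmtChunk($wav);
echo $info["sampleRate"] . " Hz, " . $info["channels"] . " channel(s), " . $info["bitsPerSample"] . " bits";
// 22050 Hz, 1 channel(s), 16 bits
```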

How can I extract multiple parts from a WAV file?

To extract multiple parts from a WAV file, you can use audio editing software like Audacity. You can select each part you want to extract and export it as a new file. This process can be repeated for each part you want to extract.

Can I extract audio from a video file and save it as a WAV file?

Yes, you can extract audio from a video file and save it as a WAV file. This can be done using video editing software or conversion tools. The process involves opening the video file, extracting the audio track, and saving it as a WAV file.

How can I convert a WAV file to another audio format?

To convert a WAV file to another audio format, you can use audio conversion software or tools. These tools allow you to open a WAV file and save it in another format like MP3, FLAC, or AAC. The conversion process usually involves choosing the output format and setting the desired quality or bitrate.