Icebreaker
My obsession with using found sounds in my recordings often gives me a sudden itch to bounce a few seconds of audio output from some random application on my computer. Say, a web application or a video player.
And then I really feel how cumbersome and inflated it is to fire up a general-purpose DAW for that and crawl through the awkward ritual of gently rubbing its multichannel mixer and transport controls. It is like summoning a 12-foot-tall, four-horned purple demon to open a can of beans for you.
There are, of course, lightweight open source wave editors like Audacity.
Sadly, previous versions of it had some problems with recording “virtual” audio outputs such as the BlackHole 16ch I’m currently using.
It seems to work correctly from version 3.2.5 onwards, though. Until that version was published, I had plenty of free time to develop my own crappy solution for yet another purposely invented problem. But this is how you mine proper tech-blog material, isn’t it? So, let’s go!
The Ingredients
- SoX - a “swiss army knife” for doing DSP from the terminal, as its developers describe it. And, oh boy, indeed it is.
- ffplay - part of the FFmpeg suite: a very simple and portable media player built on the FFmpeg libraries
The Magic
UNIX-like terminal command piping using the | operator. A pipe operator placed between two commands simply tells the shell that the first command’s output becomes the second command’s input. This, combined with another UNIX paradigm called IO redirection, is what really puts that “shell magic” within reach of your fingertips (unless half of them have forever grown into that horrible bio-prosthetic named after a small rodent species). So the commands for bouncing live output to a file are the following:
sox -V6 -t coreaudio "BlackHole 16ch" -t wav - | ffplay -
sox -V6 -t coreaudio "BlackHole 16ch" -t wav output.wav
As you can see, the two commands are very similar: both read from the same source, which is my virtual input device.
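If you are not sure what name CoreAudio uses for your own virtual or physical device, macOS can list every audio device it knows about; the names shown there should match what sox’s coreaudio driver expects:

# list all audio devices registered with CoreAudio
system_profiler SPAudioDataType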
The first command just pipes the output of my virtual device into ffplay. This solves the recording monitoring problem.
The second one actually records the audio to disk.
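As a side note, this is also where the IO redirect mentioned earlier can come into play: you can let sox write the WAV stream to stdout and have the shell create the file instead. One caveat, as far as I understand it: because sox cannot seek back on a pipe, the length field in the WAV header ends up as a placeholder value, though most players cope with that just fine.

sox -V6 -t coreaudio "BlackHole 16ch" -t wav - > output.wav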
The monitoring pipeline sometimes fails while the recording one stays stable, so I would recommend running them in separate terminal sessions.
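If you would rather keep everything in one terminal anyway, a rough sketch like the one below should do: it launches the monitoring pipe in the background, records in the foreground, and tears the monitor down afterwards. Treat it as an untested convenience rather than the recommended setup.

#!/bin/bash
# monitor in the background; -nodisp keeps ffplay from opening a window
sox -V6 -t coreaudio "BlackHole 16ch" -t wav - | ffplay -nodisp - &
monitor_pid=$!

# record to disk in the foreground; stop with Ctrl+C
sox -V6 -t coreaudio "BlackHole 16ch" -t wav output.wav

# stop the background monitor once the recording ends
kill "$monitor_pid" 2>/dev/null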
But wait, there is more!
Although sox does not look all that “user-friendly”, it can be much smarter than a trivial live-record bounce, especially when it comes to trimming silence.
Let’s consider the following command:
sox -V6 -t coreaudio "BlackHole 16ch" -t wav output.wav silence 1 0.1 1% 1 0.5 1%
This command will record audio from my virtual device and trim silence from the beginning and the end of the recording. Technically, this applies the “silence” effect as a part of the internal FX chain of sox.
Let’s break down the parameters a bit:
Parameter | Description
---|---
1 0.1 1% | Trims silence from the beginning: audio is considered to have started once at least 0.1 seconds of signal above the 1% volume threshold is detected.
1 0.5 1% | Trims the recording once at least 0.5 seconds of signal falls below the 1% volume threshold, effectively cutting off the trailing silence.
You can read more about the sox silence effect here.
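If you want to get a feel for these parameters without recording anything live, you can synthesise a test file with silence padded around a tone and run the same effect on it offline (file names here are arbitrary):

# a 2-second 440 Hz tone with 3 seconds of silence on each side
sox -n padded.wav synth 2 sine 440 pad 3 3
# apply the same silence settings and compare durations
sox padded.wav trimmed.wav silence 1 0.1 1% 1 0.5 1%
soxi -D padded.wav trimmed.wav

The trimmed file should come out at roughly the length of the tone itself rather than the full padded eight seconds.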
When bouncing live audio, you would probably want some safeguard against overwriting existing files. sox does not have a built-in solution for that, but you can easily write a small shell script to do it for you. Here is a small example:
#!/bin/bash
# Takes a file name pattern with a "*" placeholder (e.g. "output*.wav")
# and prints the first numbered name that does not exist yet.
wildcard="$1"
counter=1
while true; do
    # substitute the "*" in the pattern with the current counter value
    filename="${wildcard/\*/$counter}"
    if [ ! -e "$filename" ]; then
        echo "$filename"
        break
    fi
    ((counter++))
done
Then you can use it like this:
output_filename=$(./find_next_file.sh "output*.wav")
sox -V6 -t coreaudio "BlackHole 16ch" -t wav "$output_filename"
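And nothing stops you from combining the numbered output names with the silence trimming from above in a single invocation:

output_filename=$(./find_next_file.sh "output*.wav")
sox -V6 -t coreaudio "BlackHole 16ch" -t wav "$output_filename" silence 1 0.1 1% 1 0.5 1%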
Additional Notes
sox will record all the available input channels of your device into a single file. In my case that is 16 channels. How often do you deal with 16-channel WAV files? I was personally quite surprised that standard WAVs support more than 2 channels. Indeed, it was a relatively recent addition to the standard.
The WAVE format began supporting more than two channels with the introduction of the WAVE Format Extensible (WAVEFORMATEXTENSIBLE) specification.
This specification was introduced as part of the DirectX Media Objects (DMO) architecture, which was included with DirectX 8.0, released by Microsoft in 2000. It is worth noting that this specification was most likely designed to support spatial audio, not multi-track recording.
However, the spatialisation mapping is not part of the WAVE specification and is defined by the application. There are several different spatialisation schemes, and they are not compatible with each other. So the exact spatialisation you get will depend on the software.
I was wondering why this problem was not addressed by the BWF convention, which adds broadcast-specific metadata to the WAVE format, such as timecodes and project information. Probably because BWF was introduced in 1997, three years before the WAVEFORMATEXTENSIBLE specification.
It is a bit of a shame, because it would be a perfect place to store the channel mapping information. However, there is a proposal and also a recommendation to extend the BWF specification to support multi-channel audio. It is not part of the official BWF specification yet, but it is already supported by some software, like Sound Devices Wave Agent.
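In the meantime, when I only need a couple of specific channels out of such a 16-channel bounce, sox itself can pull them out with its remix effect; the channel numbers below are just an example:

# extract channels 1 and 2 from the multichannel recording into a stereo file
sox output.wav front_pair.wav remix 1 2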
Max MSP is a very powerful tool for audio processing, and it can be used for live audio recording as well. However, it is not a lightweight solution by any means, even though it does not have the overhead of a conventional linear multi-track audio editor. For bouncing live audio in Max there is a very convenient sfrecord~ object.
It seems like this tech blog does a much better job of breaking down the capabilities of sox than I do.