This article shows how easy it is to read or write audio files in a few lines Python, by calling the external software FFMPEG through pipes. If you want a battle-tested and more sophisticated version, check out my module MoviePy. Check also that other article for the same with video files.
Before we start, you must have FFMPEG installed on your computer and you must know the name (or path) of the FFMPEG binary on your computer. It should be one of the following:
1 2 |
|
Reading
To read the audio file “mySong.mp3” we first ask FFMPEG to open this file and to direct its output to Python:
1 2 3 4 5 6 7 8 9 10 |
|
In the code above -i mySong.mp3
indicates the input file, while s16le/pcm_s16le
asks for a raw 16-bit sound output. The -
at the end tells FFMPEG that it is being used with a pipe by another program. In sp.Popen
, the bufsize
parameter must be bigger than the biggest chunk of data that you will want to read (see below). It can be omitted most of the time in Python 2 but not in Python 3 where its default value is pretty small.
Now you just have to read the output of FFMPEG. In our case we have two channels (stereo sound) so one frame of out output will be represented by a pair of integers, each coded on 16 bits (2 bytes). Therefore one frame will be 4-bytes long. To read 88200 audio frames (2 seconds of sound in our case) we will write:
1 2 3 4 5 6 7 |
|
You can now play this sound using for instance Pygame’s sound mixer:
1 2 3 4 5 |
|
Finally, you can get informations on a file (audio format, frequency, etc.) by calling
1 2 3 4 5 |
|
Now infos
contains a text describing the file, that you would need to parse to obtain the relevant informations. See section Going Further below for a link to an implementation.
Writing
To write an audio file we open FFMPEG and specify that the input will be piped and that it will consist in raw audio data:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
The codec can be any valid FFMPEG audio codec. For some codecs providing the output bitrate is optional. Now you just have to write raw audio data into the file. For instance, if your sound is represented have a Nx2 Numpy array of integers, you will just write
1
|
|
Going further
I tried to keep the code as simple as possible here. With a few more lines you can make useful classes to manipulate video files, like FFMPEG_AudioReader and FFMPEG_AudioWriter that I wrote for my video editing software. In these files in particular how to parse the information on the video, how to save/load pictures using FFMPEG, etc.