__del__( self )

Eaten by the Python.

Read and Write Audio Files in Python Using FFMPEG

This article shows how easy it is to read or write audio files in a few lines of Python, by calling the external software FFMPEG through pipes. If you want a battle-tested and more sophisticated version, check out my module MoviePy. See also this other article for the same with video files.

Before we start, you must have FFMPEG installed on your computer and you must know the name (or path) of the FFMPEG binary on your computer. It should be one of the following:

FFMPEG_BIN = "ffmpeg" # on Linux
FFMPEG_BIN = "ffmpeg.exe" # on Windows
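
If you are not sure which of the two names applies, one option is to let Python look for the binary on the system PATH. The following is only a minimal sketch using the standard library's shutil.which; the helper name find_ffmpeg is made up for this illustration, and it falls back to plain "ffmpeg" when nothing is found.

```python
import shutil

def find_ffmpeg(candidates=("ffmpeg", "ffmpeg.exe")):
    """Return the path of the first candidate found on the PATH, or None.

    `find_ffmpeg` is a hypothetical helper, not part of FFMPEG or MoviePy.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

# Fall back to the plain name if the lookup finds nothing.
FFMPEG_BIN = find_ffmpeg() or "ffmpeg"
```

If FFMPEG is installed somewhere outside the PATH, you still have to set FFMPEG_BIN to the full path by hand.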

Reading

To read the audio file “mySong.mp3” we first ask FFMPEG to open this file and to direct its output to Python:

import subprocess as sp

command = [ FFMPEG_BIN,
        '-i', 'mySong.mp3',
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', '44100', # output will have 44100 Hz
        '-ac', '2', # stereo (set to '1' for mono)
        '-']
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

In the code above, -i mySong.mp3 indicates the input file, while s16le/pcm_s16le asks for raw 16-bit sound output. The - at the end tells FFMPEG to send its output to standard output, where another program (here, Python) reads it through a pipe. In sp.Popen, the bufsize parameter must be bigger than the biggest chunk of data that you will want to read (see below). It can be omitted most of the time in Python 2 but not in Python 3, where its default value is pretty small.
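
If you need to open several files, or switch between mono and stereo, the command list can be built by a small function. This is just a sketch: the name build_read_command and its parameters are invented for this example, and only the FFMPEG options already shown above are used.

```python
def build_read_command(filename, sample_rate=44100, channels=2, ffmpeg_bin="ffmpeg"):
    """Build the FFMPEG argument list that decodes `filename` to raw 16-bit PCM.

    Hypothetical helper: it only assembles the list, it does not run FFMPEG.
    """
    return [ffmpeg_bin,
            "-i", filename,
            "-f", "s16le",
            "-acodec", "pcm_s16le",
            "-ar", str(sample_rate),   # output sample rate in Hz
            "-ac", str(channels),      # 2 for stereo, 1 for mono
            "-"]                       # write the result to standard output

command = build_read_command("mySong.mp3", channels=1)
print(" ".join(command))
```

The resulting list can be passed to sp.Popen exactly like the hand-written one.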

Now you just have to read the output of FFMPEG. In our case we have two channels (stereo sound), so one frame of our output will be represented by a pair of integers, each coded on 16 bits (2 bytes). Therefore one frame will be 4 bytes long. To read 88200 audio frames (2 seconds of sound in our case) we will write:

raw_audio = pipe.stdout.read(88200*4)

# Reorganize raw_audio as a Numpy array with two columns (1 per channel)
import numpy

# frombuffer returns a read-only view of the bytes, so we take a copy
audio_array = numpy.frombuffer(raw_audio, dtype="int16").copy()
audio_array = audio_array.reshape((len(audio_array)//2, 2))
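
The byte arithmetic and the reshaping can be checked without running FFMPEG. The sketch below uses hypothetical helper names (chunk_size_in_bytes, bytes_to_frames) and a string of zero bytes standing in for the pipe output; it assumes little-endian 16-bit samples, as requested by s16le.

```python
import numpy

def chunk_size_in_bytes(duration, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Number of bytes to read from the pipe for `duration` seconds of sound."""
    n_frames = int(round(duration * sample_rate))
    return n_frames * channels * bytes_per_sample

def bytes_to_frames(raw_audio, channels=2):
    """Turn raw s16le bytes into an array with one row per frame."""
    samples = numpy.frombuffer(raw_audio, dtype="<i2")
    n_frames = len(samples) // channels
    # Ignore trailing samples that do not form a complete frame.
    return samples[:n_frames * channels].reshape((n_frames, channels))

n_bytes = chunk_size_in_bytes(2.0)   # 2 seconds of 44100 Hz stereo
fake_raw_audio = bytes(n_bytes)      # silence, standing in for pipe.stdout.read(n_bytes)
frames = bytes_to_frames(fake_raw_audio)
print(n_bytes, frames.shape)
```

Keep in mind that near the end of a file the read can return fewer bytes than requested, so the number of rows should be taken from the data actually received rather than assumed.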

You can now play this sound using for instance Pygame’s sound mixer:

import pygame
pygame.init()
pygame.mixer.init(44100, -16, 2) # 44100 Hz, 16bit, 2 channels
sound = pygame.sndarray.make_sound( audio_array )
sound.play()

Finally, you can get information on a file (audio format, frequency, etc.) by calling:

pipe = sp.Popen([FFMPEG_BIN,"-i", 'mySong.mp3', "-"],
                stdin=sp.PIPE, stdout=sp.PIPE,  stderr=sp.PIPE)
pipe.stdout.readline()
pipe.terminate()
infos = pipe.stderr.read()

Now infos contains a text describing the file, which you need to parse to obtain the relevant information. See the section Going further below for a link to an implementation.
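
To give an idea of what this parsing can look like, here is a minimal sketch based on a regular expression. The function name parse_audio_info is invented for this example, and the sample line is only an illustration of the kind of text FFMPEG prints: the exact format varies between versions and files, so the pattern is an assumption that may need adjusting.

```python
import re

def parse_audio_info(infos):
    """Extract the sample rate and channel layout from FFMPEG's stderr text.

    Hypothetical helper. Returns a dict, or None if no audio line is recognised.
    """
    if isinstance(infos, bytes):
        infos = infos.decode("utf8", errors="ignore")
    # Assumed shape of the line: "... Audio: <codec>, <rate> Hz, <mono|stereo>, ..."
    match = re.search(r"Audio: .*?(\d+) Hz, (mono|stereo)", infos)
    if match is None:
        return None
    return {"sample_rate": int(match.group(1)), "layout": match.group(2)}

# Illustrative sample, not the captured output of a real FFMPEG run.
sample = b"    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s\n"
print(parse_audio_info(sample))
```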

Writing

To write an audio file we open FFMPEG and specify that the input will be piped and that it will consist of raw audio data:

pipe = sp.Popen([ FFMPEG_BIN,
       '-y', # (optional) means overwrite the output file if it already exists.
       "-f", 's16le', # means 16bit input
       "-acodec", "pcm_s16le", # means raw 16bit input
       '-ar', "44100", # the input will have 44100 Hz
       '-ac','2', # the input will have 2 channels (stereo)
       '-i', '-', # means that the input will arrive from the pipe
       '-vn', # means "don't expect any video input"
       '-acodec', "libfdk_aac", # output audio codec
       '-b:a', "3000k", # output audio bitrate (=quality). Here, 3000kb/second
       'my_awesome_output_audio_file.mp3'],
        stdin=sp.PIPE,stdout=sp.PIPE, stderr=sp.PIPE)

The codec can be any valid FFMPEG audio codec. For some codecs, providing the output bitrate is optional. Now you just have to write raw audio data into the pipe. For instance, if your sound is represented as an Nx2 Numpy array of integers, you will just write

pipe.stdin.write(audio_array.astype("int16").tobytes())
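
If your sound is stored as floats between -1 and 1 rather than as integers, it has to be scaled before being written. The sketch below shows one way to do it on a synthetic 440 Hz tone; the helper name to_s16le_bytes is made up for this example, and the scaling factor 32767 is one common convention, not the only possible one.

```python
import numpy

def to_s16le_bytes(audio, channels=2):
    """Convert floats in [-1, 1] (shape N x channels) to raw s16le bytes.

    Hypothetical helper; values outside [-1, 1] are clipped.
    """
    audio = numpy.asarray(audio, dtype="float64").reshape((-1, channels))
    clipped = numpy.clip(audio, -1.0, 1.0)
    return (clipped * 32767).astype("<i2").tobytes()

sample_rate = 44100
t = numpy.arange(sample_rate) / sample_rate       # one second of time stamps
tone = 0.5 * numpy.sin(2 * numpy.pi * 440 * t)    # 440 Hz sine wave
stereo = numpy.column_stack([tone, tone])         # same signal on both channels
raw = to_s16le_bytes(stereo)
print(len(raw))                                   # 44100 frames * 4 bytes

# With the writing pipe opened above, the data would then be sent with:
#     pipe.stdin.write(raw)
#     pipe.stdin.close()   # tells FFMPEG that the input is finished
#     pipe.wait()          # waits until the output file is fully written
```

Closing the pipe's stdin and waiting for the process is what lets FFMPEG finish writing the output file.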

Going further

I tried to keep the code as simple as possible here. With a few more lines you can make useful classes to manipulate audio files, like FFMPEG_AudioReader and FFMPEG_AudioWriter that I wrote for my video editing software. These files show in particular how to parse the information on the file, how to save/load pictures using FFMPEG, etc.
