In this post I use Fourier transforms to revive a forgotten Gershwin piano piece.
Piano rolls are these rolls of perforated paper that you feed to the saloon’s mechanical piano. They have been very popular until the 1950s, and the piano roll repertory counts thousands of arrangements (some by greatest names of jazz) which have never been published in any other form.
Here is Limehouse Nights, played circa 1918 by a 20-year-old George Gershwin:
It is cool, it is public domain music, and I want to play it. But like for so many other rolls, there is no published sheet music.
Fortunately, someone else filmed the same performance with a focus on the roll:
In this post I show how to turn that video into playable sheet music with the help of a few lines of Python. At the end I provide the sheet music, a human rendition, and a Python package that implements the method (and can also be used to transcribe from MIDI files).
Downloading the video
You can download the video from Youtube using youtube-dl in a terminal:
1
|
|
Step 1: Segmentation of the roll
In each frame of the video we will focus on a well-located line of pixels:
By extracting this line from each video frame and stacking the obtained lines on one another we can reconstitute an approximate scan of the piano roll:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
We can see that the holes are placed along columns. Each of these columns corresponds to one key of the piano. A possible way to find the x-coordinates of these columns in the picture is to look at the minimal luminosity of each column of pixels:
1 2 3 4 5 6 |
|
Holes are low-luminosity zones in the picture, therefore the x-coordinates with lower luminosity in the curve above indicate hole-columns. They are not equally spaced because some piano keys are not used in this piece, but there is clearly a dominant period, which we will find by looking at the frequency spectrum of the curve.
We compute that spectrum using a continuous Fourier transform. The peaks in the spectrum below mean that a periodic pattern is present in the curve:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
The higher peak of the spectrum indicates a period of x=5.46 pixels, and this is indeed the distance in pixels between two hole-columns. This, plus the phase of the spectrum in this point, gives us the coordinates of the centers of the hole-columns (vertical lines below).
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
We can now reduce our image of the piano roll to keep only one pixel per hole-column. In the resulting picture, one column gives the time profile of one key in the piano: when it is pressed, and when it is released.
1 2 3 4 5 |
|
To reconstitute the sheet music the most important is to know when a key is pressed, not really when it is released. So we will look for the beginning of the holes, i.e. pixels that present a hole, while the pixel just above them doesn’t.
1 2 3 4 5 6 7 |
|
This worked quite well: in the picture above red dots indicate key strikes and blue dots indicate key releases. Let us gather all the key strikes in a list.
1 2 3 4 5 |
|
Step 2: Finding the pitch
We know that the columns correspond to piano keys. They are sorted left to right from the lowest to the highest note. But which column corresponds to the C4 (the middle C)?
I cheated a little and I looked at the first video (the one where you can see the piano keyboard) to see which notes were pressed in the first chords. I concluded that C4 is represented by column 34.
From now on I would like the musical notes C4, C#4, D4… to be coded by their respective numbers in the MIDI norm: 60, 61, 62… So I will transpose my list of key strikes by adding 26 to each note.
1 2 3 |
|
Step 3: Quantization of the notes
We have a list of notes with the time (or frame) at which they are played. We will now determine which notes are quarters, which are eights, etc. This operation is equivalent to finding the tempo of the piece. Let us first have a look at the times at which the the piano keys are striken:
1 2 3 |
|
We observe regularly-spaced peaks corresponding to chords (several notes striken together). In this kind of music, chords are mainly played on the beat. Therefore, computing the main period in the graph above will give us the duration of a beat (or quarter). Let us have a look at the spectrum.
1 2 3 4 5 6 7 8 9 |
|
The higher peak indicates that a quarter has a duration corresponding to 7.1 frames of the video. Just for info, we can estimate the tempo of the piece with
1
|
|
We will now separate the hands. Let us keep things simple and say that the left hand takes all the notes below the middle C.
1 2 3 |
|
Then we quantize the notes of each hand with the following algorithm: compute the time duration $d$ between a note and the previous note, and compare $d$ to the duration $Q$ of the quarter:
- If $d < Q/4$, consider that the two notes belong to the same chord.
- Else, if $Q/4 \leq d < 3Q/4$ , consider that the previous note was an eighth.
- Else, if $ 3Q/4 \leq d < 5Q/4 $, consider that the previous note was a quarter
- etc.
And we treat the notes one after another:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
The final data looks like this:
>>> right_hand_q[:4]
#> [{'duration': 1.0, 'notes': [70, 72, 76, 80], 't_strike': 20},
#> {'duration': 1.0, 'notes': [68, 74, 78, 82], 't_strike': 28},
#> {'duration': 1.0, 'notes': [66, 76, 80, 84], 't_strike': 35},
#> {'duration': 1.0, 'notes': [68, 74, 78, 82], 't_strike': 43}]
Step 4: Export to sheet music with Lilypond
Our script’s last task is to convert these lists of quantized notes to a music notation language called Lilypond, which wan be compiled into high-quality sheet music. Some packages like music21 can do that, but it is also fairly easy to program your own converter:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
Then we write this lilyfied sheet music in a file and render the sheet music by calling lilypond as an external program:
1 2 3 4 5 6 7 8 9 |
|
The resulting PDF file starts like this (we only asked for the right-hand part):
The script has made a pretty good work, all the notes are there with the right pitch and the right duration. If we transcribe the whole piece we will see some mistakes (mostly notes attributed to the wrong hand, and more rarely notes with a wrong duration, wrong pitch, etc.), which have to be corrected, but still it is pretty cool to have these 1500 notes crunched in just a few seconds.
Final result
After 3 hours of editing (with the Lylipond editor Frescobaldi, which I recommend) we come to this playable sheet music (PDF) and I can tease the keyboard like I’m George Gershwin !
Ok, it’s just the first bars - I am still unhappy with my rendition of the rest, it’s a pretty demanding piece.
Since the piece is in the public domain I also put my transcription in the public domain, and placed its lilypond source here on Github (feel free to share/correct/modify it !).
I also wrapped this code into a python package called Unroll which can transcribe from a video of from a midi file (it uses the package music21 for lilypond conversion, and also provides a convenient LilyPond piano template).
1 2 3 4 5 6 7 |
|
Oh, and that video of me playing was also made with Python (and my library MoviePy). Here is the script that generated it.
A final word on piano rolls transcription
I have been transcribing rolls as an occasional hobby for years, and I am not the only one: here is another transcriber, and another and yet another. Even Limehouse Nights has apparently been recorded in 1992 but the pianist didn’t publish his transcription.
Most of us transcribe from MIDI files which are made from piano rolls scans (starting from MIDI files is equivalent to starting directly to Step 3, quantization and hands separation). Thousands of MIDI files from rolls scans are available on the internet (like here or here) but not all mechanical piano owners have an appropriate scanner, so there must be thousands of other rolls in private collections which have never been scanned and pushed on the Internet. With this post I wanted to show that just filming piano rolls in action is enough for transcriptions purposes.