Looping GIFs are a very popular form of art on the Web, with two dedicated forums on Reddit (r/perfectloops and r/cinemagraphs) and countless Tumblr pages.
Finding and extracting well-looping segments from a movie requires much attention and patience, and will likely leave you like this in front of your computer:
To make things easier I wrote a Python script which automates the task. This post explains the math behind the algorithm and provides a few examples of use.
We will say that a video segment loops well when its first and last video frames are very similar. A video frame can be represented by a sequence of integers $r_1, g_1, b_1, r_2, g_2, b_2, \dots$ whose values indicate the colors of the image's pixels: $r_1$, $g_1$, $b_1$ give the red, green, blue values of the first pixel, $r_2$, $g_2$, $b_2$ define the color of the second pixel, etc.
Given two frames $F$ and $F'$ of the same video, with color values $(r_1, g_1, b_1, \dots)$ and $(r'_1, g'_1, b'_1, \dots)$, we define the difference between these frames as the distance between their color values:

$$d(F, F') = \sqrt{(r_1 - r'_1)^2 + (g_1 - g'_1)^2 + (b_1 - b'_1)^2 + \cdots}$$
We will consider that the two frames are similar when $d(F, F')$ is under some arbitrary threshold $D$.
For what follows, it is important to note that $d$ defines a distance between the frames, and can be seen as a generalization of the geometrical distance between two points $A = (x_A, y_A)$ and $B = (x_B, y_B)$ in a plane: $d(A, B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2}$.
As a consequence, $d$ has nice mathematical properties which we will use in the next section to speed up computations.
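In NumPy terms this distance takes a couple of lines (a sketch, not the script's actual internals):

```python
import numpy as np

def frame_distance(frame1, frame2):
    """Euclidean distance between two frames, given as arrays of RGB values."""
    diff = frame1.astype(float) - frame2.astype(float)
    return np.sqrt((diff ** 2).sum())

# Two tiny 2x2 "frames" with RGB values in 0-255
f1 = np.zeros((2, 2, 3), dtype=np.uint8)
f2 = np.zeros((2, 2, 3), dtype=np.uint8)
f2[0, 0] = (3, 4, 0)  # the frames differ only in the first pixel
print(frame_distance(f1, f2))  # -> 5.0
```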
In this section we want to find the times (start and end) of all the well-looping video segments of duration 3 seconds or less in a given video. A simple way to do this is to compare each frame of the movie with all the frames in the previous three seconds. When we find two similar frames (that is, whose distance is under some pre-defined threshold $D$), we add their corresponding times to our list.
The problem is that this method requires a huge number of frame comparisons (around ten million in a standard video), which takes hours. So let us see a few tricks to make computations faster.
Trick 1: use reduced versions of the frames. HD video frames can have millions of pixels, so computing the distance between them requires millions of operations. When reduced to small (150-pixel-wide) thumbnails these frames are still detailed enough for our purpose, and their distance can be computed much faster (they also take up less space in RAM).
Trick 2: use triangular inequalities. With this very efficient trick we will be able to deduce whether two frames match without having to compute their distance. Since $d$ defines a mathematical distance between two frames, many results from classical geometry apply, and in particular the following inequalities on the sides of a triangle: $d(A,C) \leq d(A,B) + d(B,C)$ and $d(A,C) \geq |d(A,B) - d(B,C)|$.
The first inequality tells us that if A is very close to B, which in turn is very close to C, then A is also close to C.
In practice we will use it as follows: if we already know that a frame $A$ is very similar to a frame $B$, and that $B$ is very similar to another frame $C$, then we do not need to compute $d(A,C)$ to know that $A$ and $C$ are also very similar.
The second inequality tells us that if a point A is very near B, and B is far from C, then A is also far from C. Or in terms of frames:
if $A$ is very similar to $B$, and $B$ is very different from $C$, then we do not need to compute $d(A,C)$ to know that $A$ and $C$ are also very different.
Now it gets a little more complicated: we will apply these triangular inequalities to maintain upper and lower bounds on the distances between frames, which will be updated every time we compute a distance between two frames. For instance, after computing the distance $d(A,B)$, the upper and lower bounds of $d(A,C)$, denoted $\overline{d}(A,C)$ and $\underline{d}(A,C)$, can be updated as follows for every frame $C$ whose distance to $B$ is already known: $\overline{d}(A,C) \leftarrow \min\big(\overline{d}(A,C),\; d(A,B) + d(B,C)\big)$ and $\underline{d}(A,C) \leftarrow \max\big(\underline{d}(A,C),\; |d(A,B) - d(B,C)|\big)$.
If after the update we have $\overline{d}(A,C) < D$, we conclude that $A$ and $C$ are a good match. And if at some point $\underline{d}(A,C) > D$, we know that $A$ and $C$ don't match. If we cannot decide whether $A$ and $C$ match using this technique, we will eventually need to compute $d(A,C)$, but then knowing $d(A,C)$ will in turn enable us to update the bounds on other distances, and so on.
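These bound updates can be sketched as follows (variable names are mine, not the script's):

```python
def update_bounds(bounds, d_ab, known):
    """After computing d(A, B), tighten the bounds on d(A, C) for every frame C
    whose distance to B is already known, using the triangular inequalities."""
    for c, d_bc in known.items():
        upper, lower = bounds.get(c, (float("inf"), 0.0))
        upper = min(upper, d_ab + d_bc)       # d(A,C) <= d(A,B) + d(B,C)
        lower = max(lower, abs(d_ab - d_bc))  # d(A,C) >= |d(A,B) - d(B,C)|
        bounds[c] = (upper, lower)
    return bounds

# B is at distance 2 from A and at distance 10 from C:
bounds = update_bounds({}, d_ab=2.0, known={"C": 10.0})
print(bounds["C"])  # -> (12.0, 8.0): if the threshold D is below 8, A and C cannot match
```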
As an illustration, suppose that a video has the following frames in this order:
When the algorithm arrives at the last frame of this sequence, it first computes the distance between this frame and the oldest frame of the window, and finds that they don't match. At this point the algorithm has already found that this oldest frame is quite similar to the frames that follow it, so it deduces that none of these frames match the last one either (and, certainly, neither do the dozen frames before). In practice, this method avoids computing 80% to 90% of the distances between frames.
Trick 3: use an efficient formula for the distance. When we compute the distance between two frames using the formula from the last section, we need approximately $3N$ operations (where $N$ is the number of color values per frame): $N$ subtractions, $N$ products, and $N-1$ additions to obtain the final sum. But the formula for $d(A,B)^2$ can also be rewritten in this form, known as the law of cosines: $d(A,B)^2 = \|A\|^2 + \|B\|^2 - 2\, A \cdot B$,
where we used the following notations: $\|A\|^2 = \sum_i a_i^2$ is the squared norm of the frame's color values, and $A \cdot B = \sum_i a_i b_i$ is their dot product.
The interesting thing with this expression of $d(A,B)$ is that if we first compute the norm of each frame once, we can obtain the distance between any pair $A$ and $B$ simply by computing the dot product $A \cdot B$, which requires only about $2N$ operations and is therefore 50% faster.
Another advantage of computing $\|A\|$ for each frame is that for two frames $A$ and $B$ we have $\big|\, \|A\| - \|B\| \,\big| \leq d(A,B) \leq \|A\| + \|B\|$, which provides initial values for the upper and lower bounds on the frame distances used in Trick 2.
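Here is the idea in NumPy form (a sketch): flatten each frame to a vector, pre-compute its norm once, and get any pairwise distance from a single dot product.

```python
import numpy as np

def precompute(frame):
    """Return the flattened frame and its squared norm |F|^2."""
    v = frame.astype(float).ravel()
    return v, (v ** 2).sum()

def fast_distance(va, norm2_a, vb, norm2_b):
    """d(A,B)^2 = |A|^2 + |B|^2 - 2 A.B  (law of cosines)."""
    d2 = norm2_a + norm2_b - 2 * va.dot(vb)
    return np.sqrt(max(d2, 0.0))  # clip tiny negative rounding errors
```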
Final algorithm in pseudo-code. Putting everything together, we obtain the following algorithm:
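A brute-force version of the algorithm, without the bound-tracking machinery, can be sketched as follows (a simplified illustration; the real script adds Tricks 2 and 3, and the parameter values are only examples):

```python
import numpy as np

def find_matches(thumbnails, fps, max_duration=3.0, threshold=10.0, min_duration=0.3):
    """Return (t1, t2) pairs of times whose (reduced) frames are similar.

    `thumbnails` is a list of small frames (Trick 1), one per video frame."""
    flat = [t.astype(float).ravel() for t in thumbnails]
    window = int(round(max_duration * fps))        # compare at most 3 seconds back
    gap = max(1, int(round(min_duration * fps)))   # skip near-consecutive frames
    matches = []
    for i, fi in enumerate(flat):
        for j in range(max(0, i - window), i - gap):
            d = np.sqrt(((fi - flat[j]) ** 2).sum())
            if d < threshold:
                matches.append((j / fps, i / fps))
    return matches
```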
Here is the implementation in Python. The computation time may depend on the quality of the video file, but most movies I tried were processed in about 20 minutes. Impressive, right, Eugene?
The algorithm described in the previous section finds all pairs of matching frames, including consecutive frames (which often look very much alike) and frames from still segments (typically, black screens). So we typically end up with a hundred thousand video segments, only a few of which are really interesting, and we must find a way to filter out the segments we don't want before extracting GIFs. This filtering operation takes just a few seconds, but its success depends greatly on the filtering criteria you use: requiring a minimal duration (to eliminate pairs of consecutive frames) and a minimal amount of motion inside the segment (to eliminate still segments) works well.
I try not to be too restrictive (to avoid filtering out good segments by accident), so I generally end up with about 200 GIFs, many of them only mildly interesting (blinking eyes and such). The last step is a manual filtering which looks like this:
I implemented this algorithm as a plugin of my Python video library MoviePy. Here is a detailed example script:
Here is what we obtain when we try it on Disney's Snow White:
Some of these GIFs could be cut better, some are not really interesting (too short), and a few looping segments have been missed. I think the culprits are the parameters in the last filtering step, which could have been tuned better.
As another example, someone recently posted a Youtube video on r/perfectloops and asked for it to be transformed into a looping GIF. The following script does just that: it downloads the video from Youtube, finds the best times (t1, t2) at which to cut a looping sequence, and generates a GIF:
With MoviePy you can also post-process your GIFs to add text:
And since you have read this far, here is a more advanced trick for you:
The algorithm I presented here is not perfect. It works poorly with low-luminosity clips, and sometimes a slight camera movement or a moving object in the background can prevent a segment from looping. While these segments could be easily corrected by a human, they are more difficult to spot and process with an algorithm.
So my script didn't completely kill the game, and making looping GIFs is still an art. If you have any ideas or remarks on the algorithm, or if you tried it and found some interesting loops in a movie, I'll be happy to hear about it! Until then, cheers, and happy GIFing!
MoviePy lets you define custom animations with a function make_frame(t), which returns the video frame corresponding to time t (in seconds):
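A minimal, self-contained illustration: the frame itself is built with plain NumPy, and the MoviePy calls are shown as comments because rendering requires MoviePy and FFMPEG to be installed.

```python
import numpy as np

W, H = 128, 64

def make_frame(t):
    """Return the RGB frame (an H x W x 3 uint8 array) at time t, in seconds:
    here, a white bar sweeping across a black background over 2 seconds."""
    frame = np.zeros((H, W, 3), dtype=np.uint8)
    x = int((t / 2.0) * W) % W
    frame[:, max(0, x - 4):x + 4] = 255
    return frame

# With MoviePy installed, the animation would be rendered with:
#   from moviepy.editor import VideoClip
#   VideoClip(make_frame, duration=2).write_gif("bar.gif", fps=15)
```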
In previous posts I used this method to animate vector graphics (with the library Gizeh), and ray-traced 3D scenes (generated by POV-Ray). This post covers the scientific libraries Mayavi, Vispy, Matplotlib, Numpy, and Scikit-image.
Mayavi is a Python module for interactive 3D data visualization with a simple interface. In this first example we animate a surface whose elevation depends on the time t:
Another example with a wireframe mesh whose coordinates and view angle depend on the time t:
As Mayavi relies on the powerful VTK visualization engine, it can also process complex datasets. Here is an animation derived from a Mayavi example:
Vispy is another interactive 3D data visualization library, based on OpenGL. As with Mayavi, we first create a figure and a mesh, which we animate with MoviePy.
Here are more advanced examples (derived from the Vispy gallery) where C code snippets are embedded in the Python code to fine-tune the 3D shaders:
The 2D/3D plotting library Matplotlib already has an animation module, but I found that MoviePy produces lighter, better quality videos, while being up to two times faster (not sure why, see here for more details). Here is how you animate Matplotlib with MoviePy:
Matplotlib has many beautiful themes and works well with numerical modules like Pandas or Scikit-Learn. Let us watch an SVM classifier getting a better understanding of the map as the number of training points increases.
Put simply, the background colors tell us where the classifier thinks the black points and white points belong. At the beginning it has no real clue, but as more points appear, it progressively understands that they are distributed along moon-shaped regions.
If you are working with Numpy arrays (Numpy is the central numerical library in Python), you don't need any external plotting library; you can feed the arrays directly to MoviePy.
This is well illustrated by this simulation of a zombie outbreak in France (inspired by this blog post by Max Berggren). France is modelled as a grid (a Numpy array) on which all the computations for dispersion and infection are done. At regular intervals, a few Numpy operations transform the grid into a valid RGB image and send it to MoviePy.
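A sketch of the idea (the dispersion model here is a toy version, not the actual equations of the simulation): the state is a 2-D NumPy array of infection levels, and a few array operations turn it into an RGB frame.

```python
import numpy as np

def dispersion_step(grid, rate=0.2):
    """One toy dispersion step: each cell sends a fraction of its infection
    to its four neighbors (np.roll makes the world wrap around the edges)."""
    neighbors = sum(np.roll(grid, shift, axis) for shift, axis in
                    [(1, 0), (-1, 0), (1, 1), (-1, 1)])
    return (1 - rate) * grid + (rate / 4) * neighbors

def to_rgb(grid):
    """Map infection levels (0..1) to an RGB image: infected areas in red."""
    frame = np.zeros(grid.shape + (3,), dtype=np.uint8)
    frame[..., 0] = (255 * np.clip(grid, 0, 1)).astype(np.uint8)
    return frame
```

At each time step of the animation, MoviePy would simply receive to_rgb(grid) as the current frame.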
What is better than an animation? Two animations! You can take advantage of MoviePy's video composition capabilities to mix animations from different libraries:
Or for something more artistic:
It may be a tad too flashy, but sometimes you must give your audience something they can tweet.
You can also annotate the animations, which is useful when comparing different filters or algorithms. Let’s display four image transformations from the library Scikit-image:
If we replace CompositeVideoClip and clips_array by concatenate_videoclips we get a title-effect type animation:
Finally, MoviePy is particularly practical when dealing with video data, since that is its primary job. For our last example we estimate the size of a growing bacterial population by thresholding the video frames and counting the white pixels. The third panel shows that the population size grows exponentially with time.
I hope to have given you enough recipes to impress your colleagues at your next presentation. Any other library could be animated with MoviePy, as long as its output can be converted to a Numpy array.
Some libraries have their own animation modules, but these are usually a pain to fix and maintain. Thanks to the many users who have tested it in very different contexts, MoviePy seems to have become stable (or people stopped reporting bugs), and can be adapted to many situations. There is still a lot to do, but it would be nice if authors started relying on it for video and GIF rendering, like Pandas and Scikit-Learn rely on Matplotlib for plotting.
For completeness, and because it may better fit your needs, I must mention ImageIO, another Python library with video writing capabilities which focuses on providing a very simple interface to read or write any kind of image, video or volumetric data. For instance you use imwrite() to write any image, mimwrite() for any video/GIF, volwrite() for volumetric data, or simply write() for streamed data.
Cheers, and happy GIFing!
POV-Ray is a popular 3D rendering program which produces photo-realistic scenes like this one:
It may not be as good as Cinema4D or Pixar's RenderMan, but POV-Ray is free, open-source, and cross-platform. Rendering is launched from the terminal with povray myscene.pov, where myscene.pov contains the description of a 3D scene:
While POV-Ray has a very nice and sophisticated scene description language, I wanted to use it together with libraries from the Python world, so I wrote Vapory, a library to render POV-Ray scenes directly from Python, like this:
This script simply generates a scene.pov file (hat tip to this script by Simon Burton) and then sends the file to POV-Ray for rendering. Vapory can also pipe the resulting image back to Python, and has a few additional features to make it easy to use in an IPython Notebook.
We first create a scene where the positions of the objects depend on the time t:
Then we animate this scene with MoviePy:
Note that one can also make basic animations directly with POV-Ray. But since we use Python, we can use its image processing libraries for post-processing. As an example, let us use Scikit-image's Sobel filter to obtain a nice geometric animation:
The contours look pretty nice because POV-Ray uses exact formulas to render geometrical objects (contrary to libraries like VTK or OpenGL, which rely on triangular meshes). With a few more lines we can mix the two animations to create a cel-shading effect:
Since we are playing around with MoviePy, let’s embed an actual movie in a 3D scene:
We start with a basic scene:
To this scene we will add a flat box (our theater screen), and for each frame of the movie we will make a PNG image file that will be used by POV-Ray as the texture of our flat box.
This 25-second clip takes 150 minutes to generate (!!!), which may be due to the high resolution settings, the numerous light reflections in the balls and the ground, and the complex texture of the screen.
In this example we write “VAPORY” using 240 bricks:
First, we generate an image of the white-on-black text “VAPORY”. Many libraries can do that, here we use ImageMagick through MoviePy:
Here is the result:
We then get the coordinates of the non-black pixels in this image, and use them to place the bricks in the 3D scene, with small random variations along the depth axis:
Python has many nice scientific and engineering libraries that could benefit from a photorealistic rendering engine. Here I simulated the cube trajectories with PyODE (a Python binding of the physics engine ODE), and fed the results to Vapory and MoviePy for rendering and animation, all in a hundred lines.
In a previous post I talked about how piano rolls can be scanned and turned into MIDI files (which are a sort of electronic sheet music). Here is a 1997 student project where such a MIDI file was used to animate a 3D piano programmatically:
Python now has all the libraries for such a project: we can parse the MIDI file with the package mido, and render the piano keyboard with Vapory. We can convert the MIDI file to an MP3 audio file by calling FluidSynth externally, and finally use MoviePy to animate everything and incorporate the audio.
Here is Let’s Fall in Love, from a 1933 piano roll arranged by J. Lawrence Cook, and animated with just ~100 lines of code:
I hope to have shown that Python and POV-Ray can do nice things together, all easy-peasy with Vapory. In the longer term, it would be nice if more recent software like Blender (which has a huge user community and modern features like GPU acceleration) had proper Python bindings. But apparently this will never happen.
I am a big fan of Dave Whyte’s vector animations, like this one:
It was generated using a special animation language called Processing (here is Dave’s code). While it seems powerful, Processing is not very elegant in my opinion; this post shows how to do similar animations using two Python libraries, Gizeh (for the graphics) and MoviePy (for the animations).
Gizeh is a Python library I wrote on top of cairocffi (a binding of the popular Cairo library) to make it more intuitive. To make a picture with Gizeh you create a surface, draw on it, and export it:
We obtain this magnificent Japanese flag:
To make an animation with MoviePy, you write a function make_frame which, given some time t, returns the video frame at time t:
We start with an easy one. In make_frame we just draw a red circle whose radius depends on the time t:
Now there are more circles, and we start to see the interest of making animations programmatically using for loops. The useful function polar2cart transforms polar coordinates (radius, angle) into Cartesian coordinates (x, y).
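polar2cart itself is a two-line helper, sketched here with NumPy:

```python
import numpy as np

def polar2cart(radius, angle):
    """Convert polar coordinates (radius, angle in radians) to Cartesian (x, y)."""
    return radius * np.cos(angle), radius * np.sin(angle)

# Place 5 points evenly on a circle of radius 2:
points = [polar2cart(2, 2 * np.pi * k / 5) for k in range(5)]
```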
Here we fill the circles with a slightly off-center radial gradient to give an impression of volume. The colors, initial positions, and centers of rotation of the circles are chosen randomly at the beginning.
The shadow is done using a circle with a radially fading black gradient whose intensity diminishes when the ball is higher, for more realism (?). The shadow is then squeezed vertically using scale(r, r/2), so that its width is twice its height.
This is a derivative of the Dave Whyte animation shown in the introduction. It is made of stacked circles moving towards the picture’s border, with carefully chosen sizes, starting times, and colors (I say carefully chosen because it took me a few dozen random tries). The black around the picture is simply a big circle with no fill and a very, very thick black border.
You can draw more than circles! And you can group different elements so that they move together (here, a letter and a pentagon).
We start with just a triangle. By rotating this triangle three times we obtain four triangles which fit nicely into a square. Then we copy this square following a checkerboard pattern. Finally we do the same with another color to fill the missing tiles. Now, if the original triangle is rotated, all the triangles in the picture will also be rotated.
A nice thing to do with vector graphics is fractals. We first build a yin-yang, then we use this yin-yang as the dots of a bigger yin-yang, and we use the bigger yin-yang as the dots of an even bigger yin-yang, etc. In the end we go one level down into the nested yin-yangs and start zooming.
That one is inspired by this Dave Whyte animation. We draw white-filled circles, each of them almost completely transparent so that it only adds 1 to the value of the pixels it covers. Pixels with an even value, which are the pixels covered by an even number of circles, are then painted white, while the others will be black. To add complexity and obtain a nicely looping animation, we draw two circles in each direction, one being a time-shifted version of the other.
A pentagon made of rotating squares! Interestingly, making the squares rotate in the other direction creates a very different-looking animation. The squares are placed according to this polar equation.
The difficulty in this animation is that the last square drawn will necessarily be on top of all the others, and not, as it should be, below the first square! The solution is to draw each frame twice. The first time, we draw the squares starting from the right, so that the faulty square will also be on the right, and we only keep the left part of that picture. The second time we start drawing the squares from the left, so that the faulty square is on the left, and we keep the right part. By assembling the two valid parts we reconstitute a valid picture.
A nice advantage of combining Gizeh with MoviePy is that you can read actual video files (or gifs) and use the frames to fill shapes drawn with Gizeh.
We will use this video from the Blender Foundation (it’s under a Creative Commons license). Since you have read this far, I’ll show you a little unrelated trick: at 4:32 the rabbit is jumping rope, so there is potential for a well-looping GIF. We open the video around 4:32 and let MoviePy automatically decide where to cut to get the best-looping GIF possible:
Now we can feed the frames of this GIF to Gizeh, using MoviePy’s clip.fl(some_filter), which means “I want a new clip made by transforming the frames of the current clip with some_filter”.
Finally, this function adds a zoom on some part of the video.
I hope I have convinced you that Python is a nice language for making vector animations. If you give it a try, let me know of any difficulty you encounter installing or using MoviePy and Gizeh. Any feedback, improvement ideas, commits, etc. are also very appreciated.
Python modules to interact with Twitter, like tweepy, python-twitter, twitter, or twython, all depend on the Twitter API, which makes them a little complicated to use: you must open a Twitter account, register at dev.twitter.com, open a new application there, and perform an OAuth dance at each connection.
If you just want to read the latest tweets of some Twitter user, instead of using these libraries you can simply parse the HTML of that user’s Twitter page:
Let us try it on John D. Cook:
As an application, here is a script that watches my (useless) Twitter page every 20 seconds, and each time I tweet something like cmd: my_command it executes my_command in a terminal:
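The heart of such a watcher is simply recognizing the cmd: pattern in a tweet's text. A sketch (the surrounding polling loop, which fetches the page and runs the command with subprocess, is omitted):

```python
import re

def extract_command(tweet_text):
    """Return the command in a tweet of the form 'cmd: my_command', else None."""
    match = re.match(r"\s*cmd:\s*(.+)", tweet_text)
    return match.group(1).strip() if match else None

print(extract_command("cmd: echo 'Hello'"))   # -> echo 'Hello'
print(extract_command("just a normal tweet"))  # -> None
```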
I can now tweet-control, from my smartphone, any computer that is running this script. If I tweet cmd: firefox the computer will open Firefox; if I tweet cmd: echo "Hello" it will print Hello in the terminal, etc.
If you want more, I wrote Twittcher, a small Python module which doesn’t depend on the Twitter API, to make bots that watch search results or user pages and react to the tweets they find.
For instance, this script checks the search results for “chocolate milk” every 20 seconds, and sends all the new tweets (with date, username, and link) to my mailbox.
Just run that script all day on your computer (or rather on your Raspberry Pi) and you will be updated every time someone drinks chocolate milk and feels the urge to tweet about it (which is very often).
In this post we will make a video summary of this soccer game, using the fact that supporters (and commentators) tend to be louder when something interesting happens.
The next lines open the video file with Python and compute the audio volume of each second of the match:
If we plot the obtained volumes we see that each goal is followed by a few seconds of loudness:
It is much clearer if we compute the average volumes over periods of 10 seconds:
The five highest peaks in the above graph give us the times of the five goals of the game, but other peaks may also indicate interesting events. In the next lines, we select the times of the 10% highest peaks:
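Assuming the per-second volumes are already in a list volumes (the names and window values here are illustrative), the 10-second averaging and the top-10% selection might look like:

```python
import numpy as np

def smooth(volumes, window=10):
    """Average the volumes over sliding periods of `window` seconds."""
    return np.array([np.mean(volumes[i:i + window])
                     for i in range(len(volumes) - window)])

def peak_times(averaged, proportion=0.1):
    """Times (in seconds) of the `proportion` highest averaged volumes."""
    threshold = np.percentile(averaged, 100 * (1 - proportion))
    return [t for t, v in enumerate(averaged) if v >= threshold]
```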
As a refinement, we regroup the times that are less than one minute apart, as they certainly correspond to the same event:
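The regrouping step is a simple scan over the sorted times (a sketch):

```python
def regroup(times, min_gap=60):
    """Merge times that are less than `min_gap` seconds apart,
    keeping one representative (the first) per group."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] < min_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return [g[0] for g in groups]

print(regroup([10, 30, 300, 310, 1000]))  # -> [10, 300, 1000]
```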
Now final_times contains the times (in seconds) of 21 events, from which we can cut our video. For each event we will start five seconds before its time and stop five seconds after:
We obtain the following 3:30 video summary (sorry for the external links, these videos can’t be embedded).
Nicely enough, the same 25 lines of code can be used to cut this other summary of this other match. The limitations of the method appear in yet another summary, which only captured 8 out of the 9 goals of the match, one or two being badly cut. The algorithm can be confused by broadcasters that show lots of replays or lower the sound of the crowd after goals, and it may miscut some goals scored on penalties, because the crowd starts whistling long before the shot. So large-scale applications would require a less naive model.
If you want to try it at home, here is the whole script. It would be interesting to see how the method works on other sports, or how it could be generalized to other uses, like spotting action scenes in movies.
This week Sam Lavigne wrote a very entertaining blog post introducing Videogrep, a Python script that searches through dialog in videos (using the associated subtitles file), selects scenes (for instance all scenes containing a given word), and cuts together a new video.
The script on Github implements many tweaks and goodies (such as working on multiple files, identifying complex patterns, etc.). In this post I present the code for a minimal videogrepper in Python, and attempt to refine the cuts to get scenes containing whole sentences or single words.
A good place to find public domain videos with subtitles is the White House channel on Youtube. In what follows I will be working on the 2012 State Of The Union Address:
To get both the video and the subtitles you can use youtube-dl in a terminal:
This downloads a video file state.mp4 and a text file state.en.srt indicating the subtitles as follows:
This file can be easily parsed in Python to get a list of elements of the form ([t_start, t_end], text_block):
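A minimal parser for this format might look like the following (a sketch; the real script may differ in details):

```python
import re

def time_to_seconds(s):
    """'00:03:20,100' -> seconds as a float."""
    h, m, rest = s.split(":")
    sec, ms = rest.split(",")
    return 3600 * int(h) + 60 * int(m) + int(sec) + int(ms) / 1000.0

def parse_srt(text):
    """Parse .srt subtitles into a list of ([t_start, t_end], text_block)."""
    result = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        times = re.findall(r"(\d+:\d+:\d+,\d+)", lines[1])
        result.append(([time_to_seconds(t) for t in times],
                       " ".join(lines[2:])))
    return result

sample = """1
00:00:01,000 --> 00:00:03,500
Mr. Speaker, Mr. Vice President,

2
00:00:03,500 --> 00:00:06,000
members of Congress,"""
parsed = parse_srt(sample)
```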
Let us have a look at the most common words in the speech:
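Counting the most common words is essentially a one-liner with collections.Counter (common stop words would normally be filtered out first; the short list below is only illustrative):

```python
import re
from collections import Counter

def most_common_words(subtitles, n=10):
    """`subtitles` is a list of ([t_start, t_end], text_block) elements."""
    stop_words = {"the", "and", "to", "of", "a", "in", "that", "we", "our"}
    words = []
    for times, text in subtitles:
        words += [w for w in re.findall(r"[a-z']+", text.lower())
                  if w not in stop_words]
    return Counter(words).most_common(n)
```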
Seems like the word “should” has been pronounced a lot. Let us find the times of all the subtitle blocks in which it appears:
Now we cut and put together all these scenes using MoviePy:
Here is the result:
It is promising, but in some scenes we don’t get to know exactly what should be done, which is frustrating. In the next section we add a little content-awareness to get more relevant cuts.
We now want to cut together all the sentences containing the word “should”. We first explore the whole text looking for sentences containing that word, then we find the subtitle blocks corresponding to the start and end of each sentence, and we cut the video file accordingly.
It’s much better:
Note that with just a little more code you could achieve much more. In Videogrep the author uses the Python package pattern to look for advanced phrase constructions, such as all phrases of the form gerund-determiner-adjective-noun.
Let us take a step in the other direction and see if it is possible to automatically cut a scene with exactly one word or expression, and as little as possible of the words around it. Consider the following subtitle block:
We can roughly evaluate that the word “We” will be pronounced in the first quarter of the time span (from 3:20.1 to 3:20.85), “can” in the second quarter (from 3:20.85 to 3:21.6), etc. Following this reasoning, here is a function that estimates the times of a word using the relative position of its characters in the subtitle block:
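The interpolation can be written as follows (a sketch based on character positions, as described above; the function name is mine):

```python
def find_word_times(block_times, text, word):
    """Estimate (t_start, t_end) for `word` inside a subtitle block,
    assuming characters are pronounced at a constant rate."""
    t1, t2 = block_times
    i = text.index(word)
    start = t1 + (t2 - t1) * i / len(text)
    end = t1 + (t2 - t1) * (i + len(word)) / len(text)
    return start, end

print(find_word_times([0.0, 4.0], "We can do this", "can"))
```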
Let us try it on “Americans”:
At least some of the cuts worked properly. If we use frequently pronounced words, we may find at least one correct cut for each of them, and we can build a whole sentence:
Wow! That seemed so real, and it almost made sense. From there the cuts could be refined by hand, but the script did most of the work and surely deserved, if not a Nobel Peace Prize, fourteen minutes of applause:
I just finished this mix of 60 covers of the Cup Song, entirely edited using this Python script!
The code makes extensive use of MoviePy, a video editing library I wrote to automate simple tasks such as title insertions, concatenations, transitions, etc. With this video I hope to show that MoviePy is becoming mature, and that it can be more than just an FFMPEG wrapper or a GIF editor.
For some time now I have been designing labyrinths based on traffic lights, like this one:
I call these Viennese mazes (long story) and since I couldn’t find anything similar on the Web, I assume that this is something new. Here are some more with other shapes, and their solutions.
These mazes are very difficult to design by hand, and this post is about how to ask your computer to do the work for you. We will see what a good Viennese maze is made of, and how to generate one using a simple evolutionary algorithm.
My first intention with Viennese mazes was to make dynamic mazes, with moving walls. But under each Viennese maze there is actually a standard, old-school labyrinth.
To see this we must think in terms of states. A state describes where you are in the maze, and determines where you can go from there. In the maze above, state (c,1,a) means “I am in (c), I have passed 1 traffic light so far, and just before that I was in (a)”. From this state you cannot reach (d), as the light in this street has turned red, and you cannot reach (a), because you just came from there. But you can move to (b) or (g), that is, to state (b,2,c) or state (g,2,c). Note that states such as (c,1,a), (c,4,a), and (c,7,a) are actually the same state, because after three moves all traffic lights come back to their original configuration. So there will always be a finite number of states in a Viennese maze.
If we draw a map of all (reachable) states and their connections, we obtain the following states graph:
The green node marks the starting point, while the blue node merges all the states corresponding to the goal (m). The nodes on the $i$-th line from the top can be reached in $i$ moves but no less; thick lines go downwards and thin lines go upwards.
This graph looks like a classical labyrinth, with crossroads, dead ends, loops… at a glance it gives an idea of the complexity and interestingness of the original Viennese maze. Therefore, we will consider that a good Viennese maze is a maze whose states graph makes a good labyrinth.
Here is an illustration of a few criteria which make a labyrinth interesting:
For the computer to be able to compare mazes and identify the most interesting ones, we define scores which quantify how well each of the criteria 1, 2, and 3 is fulfilled by a given maze.
The final score of a Viennese maze is given by the product
where the exponents reflect the relative importance that we decide to attach to each criterion.
Evaluating this score on the states graph of a Viennese maze is easy: the existence and uniqueness of a solution can be checked using a simple path-finding algorithm. Dead ends are simply the nodes of the states graph with no descendants, and the loops of the maze correspond to the thin edges. The states graph itself and its different lines of nodes can be computed easily with Dijkstra’s algorithm, which efficiently finds minimal paths between the start and the different states. The current Python implementation, relying on the Networkx package, evaluates on the order of 1000 mazes per second (depending on their complexity).
Now that we have defined how to score a Viennese maze, we will provide the computer with an uncolored canvas, and ask for a coloring (an initial color for each traffic light) of this canvas that produces the best score possible:
There are $3^{24}$ (almost three hundred billion) ways of coloring the 24 streets on this canvas, and considering all of them would take far too long. But a great many of these colorings make interesting mazes, so we can just look semi-randomly for some of them.
An effective way to do so is to first colorize the canvas in a completely random way, then improve the coloring by repeating the following steps: mutate the coloring (randomly change the colors of a few streets), score the resulting maze, and keep the mutated coloring only if it scores better than the current one.
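This mutation/selection loop can be sketched as follows (a self-contained sketch; the function name, color names and parameters are illustrative, not vmfactory's actual API):

```python
import random

COLORS = ['green', 'orange', 'red']  # illustrative traffic-light colors

def optimize(coloring, score, n_iterations=1000, n_mutations=2):
    """Improve a canvas coloring by random mutation + selection.
    `coloring` is a list with one color per street; `score` rates a maze."""
    best, best_score = list(coloring), score(coloring)
    for _ in range(n_iterations):
        candidate = list(best)
        # mutate the colors of a few randomly-chosen streets
        for i in random.sample(range(len(candidate)), n_mutations):
            candidate[i] = random.choice(COLORS)
        # keep the mutant only if it improves the score
        if score(candidate) > best_score:
            best, best_score = candidate, score(candidate)
    return best
```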
Here is a maze being optimized following this mutation/selection procedure (over 24000 mazes were generated, only the successive improvements are shown):
This algorithm can be refined using annealing (in which you first evaluate many different mazes before refining the search around the best one), or any fancier search strategy such as genetic algorithms, ant colonies… What works best is still an open question.
If you want to try and make your own Viennese mazes (using for instance your district as a canvas), I wrote a Python package called vmfactory which implements all the steps discussed above. It can generate two variants of Viennese mazes: one where passing through the same light twice in a row is forbidden, and one where it isn’t (algorithmically, the only difference is the way the states graph is computed).
In the following example, we generate a square canvas, initialize a maze with random colors, optimize it, and generate a report (maze/graph/solution):
The package is based on Networkx, Numpy and Matplotlib. The code is rather short (most of it serves to draw fancy graphs!) and modular: you can easily change the rules, the way the score is computed, the optimization procedure, or the way the reports are drawn.
Thank you for reading this far, and happy mazing!
A sound can be encoded as an array (or list) of values, like this:
To make this sound play twice faster, we remove every second value in the array:
By doing so we not only halved the sound’s duration, we also doubled its frequency, making it higher-pitched than the original.
If on the contrary we repeat each value of the array twice, we produce a sound that is slower, with a longer period, and therefore lower-pitched:
Here is a simple Python function that can change the speed of a sound by any factor:
What is more difficult to do is to change the duration of a sound while preserving its pitch (sound stretching), or change the pitch of a sound while preserving its duration (pitch shifting).
Sound stretching can be done using the classical phase vocoder method. You first break the sound into overlapping bits, and you rearrange these bits so that they will overlap even more (if you want to shorten the sound) or less (if you want to stretch the sound), like in this figure:
The difficulty is that the rearranged bits can interfere badly with one another, and some phase transformation is necessary to prevent this. Here is the Python code, freely rewritten from an existing implementation:
Pitch-shifting is easy once you have sound stretching. If you want a higher pitch, you first stretch the sound while conserving the pitch, then you speed up the result, so that the final sound has the same duration as the initial one, but a higher pitch due to the speed change.
Doubling the frequency of a sound raises the pitch by one octave, which is 12 musical semitones. Therefore to increase the pitch by $n$ semitones we must multiply the frequency by a factor $2^{\frac{n}{12}}$:
Let us play around with our new pitch-shifter. We first strike a bowl:
Then we create 50 pitch-shifted derivatives of that sound, ranging from the very low to the very high:
We will assign each sound to a key of the computer keyboard, following the order in this file, which organizes the keyboard like this:
We simply tell the computer to play the corresponding sound when a key is pressed, and stop the sound when the key is released:
And we have turned our computer into a piano! Now, to thank you for reading this far, let me play a little Turkish song for you:
Here are all the files you need if you want to try this at home. Since not everyone uses Python, I also coded a pianoputer in Javascript/HTML5 (here) but it is very far from good. It would be really great if an experienced HTML5/JS/elm developer improved it, or rewrote it from scratch.
On a more general note, I find that computers have been under-used for performance music. I get that it is easier to use a piano keyboard or to record from an instrument directly, but look at what you can do with just a bowl and 60 lines of Python !
Even a cheap computer has so many controls that would make it a proper music station: you can sing to the microphone, make gestures to the webcam, modulate stuff using the mouse, and control the rest from your keyboard. So many ways to express yourself, and there is a Python package for each of them… Any artistic hacker wanting to take steps in that direction ?
Piano rolls are these rolls of perforated paper that you feed to the saloon’s mechanical piano. They were very popular until the 1950s, and the piano roll repertory counts thousands of arrangements (some by the greatest names in jazz) which have never been published in any other form.
Here is Limehouse Nights, played circa 1918 by a 20-year-old George Gershwin:
It is cool, it is public domain music, and I want to play it. But like for so many other rolls, there is no published sheet music.
Fortunately, someone else filmed the same performance with a focus on the roll:
In this post I show how to turn that video into playable sheet music with the help of a few lines of Python. At the end I provide the sheet music, a human rendition, and a Python package that implements the method (and can also be used to transcribe from MIDI files).
You can download the video from Youtube using youtube-dl in a terminal:
In each frame of the video we will focus on a well-located line of pixels:
By extracting this line from each video frame and stacking the obtained lines on one another we can reconstitute an approximate scan of the piano roll:
We can see that the holes are placed along columns. Each of these columns corresponds to one key of the piano. A possible way to find the x-coordinates of these columns in the picture is to look at the minimal luminosity of each column of pixels:
Holes are low-luminosity zones in the picture, therefore the x-coordinates with lower luminosity in the curve above indicate hole-columns. They are not equally spaced because some piano keys are not used in this piece, but there is clearly a dominant period, which we will find by looking at the frequency spectrum of the curve.
We compute that spectrum using a continuous Fourier transform. The peaks in the spectrum below mean that a periodic pattern is present in the curve:
The highest peak of the spectrum indicates a period of x=5.46 pixels, and this is indeed the distance in pixels between two hole-columns. This, plus the phase of the spectrum at this point, gives us the coordinates of the centers of the hole-columns (vertical lines below).
We can now reduce our image of the piano roll to keep only one pixel per hole-column. In the resulting picture, one column gives the time profile of one key in the piano: when it is pressed, and when it is released.
To reconstitute the sheet music, what matters most is knowing when a key is pressed, not really when it is released. So we will look for the beginnings of the holes, i.e. pixels that present a hole while the pixel just above them doesn’t.
This worked quite well: in the picture above red dots indicate key strikes and blue dots indicate key releases. Let us gather all the key strikes in a list.
We know that the columns correspond to piano keys. They are sorted left to right from the lowest to the highest note. But which column corresponds to the C4 (the middle C)?
I cheated a little and I looked at the first video (the one where you can see the piano keyboard) to see which notes were pressed in the first chords. I concluded that C4 is represented by column 34.
From now on I would like the musical notes C4, C#4, D4… to be coded by their respective numbers in the MIDI norm: 60, 61, 62… So I will transpose my list of key strikes by adding 26 to each note.
We have a list of notes with the time (or frame) at which they are played. We will now determine which notes are quarters, which are eighths, etc. This operation is equivalent to finding the tempo of the piece. Let us first have a look at the times at which the piano keys are struck:
We observe regularly-spaced peaks corresponding to chords (several notes struck together). In this kind of music, chords are mainly played on the beat. Therefore, computing the main period in the graph above will give us the duration of a beat (or quarter note). Let us have a look at the spectrum.
The highest peak indicates that a quarter note has a duration corresponding to 7.1 frames of the video. Just for information, we can estimate the tempo of the piece with
We will now separate the hands. Let us keep things simple and say that the left hand takes all the notes below the middle C.
Then we quantize the notes of each hand with the following algorithm: compute the time duration $d$ between a note and the previous note, and compare $d$ to the duration $Q$ of the quarter:
And we treat the notes one after another:
The final data looks like this:
>>> right_hand_q[:4]
#> [{'duration': 1.0, 'notes': [70, 72, 76, 80], 't_strike': 20},
#> {'duration': 1.0, 'notes': [68, 74, 78, 82], 't_strike': 28},
#> {'duration': 1.0, 'notes': [66, 76, 80, 84], 't_strike': 35},
#> {'duration': 1.0, 'notes': [68, 74, 78, 82], 't_strike': 43}]
Our script’s last task is to convert these lists of quantized notes to a music notation language called Lilypond, which can be compiled into high-quality sheet music. Some packages like music21 can do that, but it is also fairly easy to program your own converter:
Then we write this lilyfied music to a file and render the sheet music by calling Lilypond as an external program:
The resulting PDF file starts like this (we only asked for the right-hand part):
The script did a pretty good job: all the notes are there with the right pitch and the right duration. If we transcribe the whole piece we will see some mistakes (mostly notes attributed to the wrong hand, and more rarely notes with a wrong duration, wrong pitch, etc.) which have to be corrected, but it is still pretty cool to have these 1500 notes crunched in just a few seconds.
After 3 hours of editing (with the Lilypond editor Frescobaldi, which I recommend) we come to this playable sheet music (PDF) and I can tease the keyboard like I’m George Gershwin!
OK, those are just the first bars - I am still unhappy with my rendition of the rest, it’s a pretty demanding piece.
Since the piece is in the public domain I also put my transcription in the public domain, and placed its lilypond source here on Github (feel free to share/correct/modify it !).
I also wrapped this code into a Python package called Unroll which can transcribe from a video or from a MIDI file (it uses the package music21 for the Lilypond conversion, and also provides a convenient LilyPond piano template).
Oh, and that video of me playing was also made with Python (and my library MoviePy). Here is the script that generated it.
I have been transcribing rolls as an occasional hobby for years, and I am not the only one: here is another transcriber, and another, and yet another. Even Limehouse Nights was apparently recorded in 1992, but the pianist didn’t publish his transcription.
Most of us transcribe from MIDI files which are made from piano roll scans (starting from MIDI files is equivalent to starting directly at Step 3, quantization and hand separation). Thousands of MIDI files from roll scans are available on the internet (like here or here), but not all mechanical piano owners have an appropriate scanner, so there must be thousands of other rolls in private collections which have never been scanned and put on the Internet. With this post I wanted to show that just filming piano rolls in action is enough for transcription purposes.
For this demo we will make a few GIFs out of this trailer:
You can download it with this command if you have Youtube-dl installed:
In what follows we import MoviePy, open the video file, select the part between 1’22.65 (1 minute 22.65 seconds) and 1’23.2, reduce its size (to 30% of the original) and save it as a GIF:
For my next GIF I will only keep the center of the screen. If you intend to use MoviePy, note that you can preview a clip with clip.preview()
. During the preview clicking on a pixel will print its position, which is convenient for cropping with precision.
Many GIF makers like to freeze some parts of the GIF to reduce the file size and/or focus the attention on one part of the animation.
In the next GIF we freeze the left part of the clip. To do so we take a snapshot of the clip at t=0.2 seconds, we crop this snapshot to only keep the left half, then we make a composite clip which superimposes the cropped snapshot on the original clip:
This time we will apply a custom mask to the snapshot to specify where it will be transparent (and let the animated part appear).
Surely you have noticed that in the previous GIFs, the end did not always look like the beginning. As a consequence, you could see a disruption every time the animation was restarted. A way to avoid this is to time-symmetrize the clip, i.e. to make the clip play once forwards, then once backwards. This way the end of the clip really is the beginning of the clip. This creates a GIF that can loop fluidly, without a real beginning or end.
OK, this might be a bad example of time symmetrization: it makes the snowflakes go upwards in the second half of the animation.
In the next GIF there will be a text clip superimposed on the video clip.
The following GIF features a lot of falling snow, so it cannot be made loopable using time-symmetrization (or you would see snow floating upwards!). Instead, we will make this animation loopable by having the beginning of the animation appear progressively (fade in) just before the end of the clip. The montage here is a little complicated; I cannot explain it better than with this picture:
The next clip (from the movie Charade) was almost loopable: you can see Cary Grant smiling, then making a funny face, then coming back to normal. The problem is that at the end of the excerpt Cary is not exactly in the same position, and he is not smiling as he was at the beginning. To correct this, we take a snapshot of the first frame and make it appear progressively at the end. This seems to do the trick.
Let’s dive further into the scripting madness: we consider this video around 2’16 (edit: this is not the video I originally used, which was removed by its Youtube user, so I had to find another link):
And we will remove the background to make this gif (with transparent background):
The main difficulty was to find what the background of the scene is. To do so, the script gathers a few images in which the little pigs are at different positions (so that every part of the background is visible in at least several, actually most, of the frames), then it takes the pixel-per-pixel median of these pictures, which gives the background.
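The median step itself is a one-liner with Numpy (a simplified sketch of that part of the script):

```python
import numpy as np

def estimate_background(frames):
    """Pixel-per-pixel median of a collection of frames: parts of the
    scene visible in most frames (the background) win the vote."""
    return np.median(np.array(frames), axis=0).astype(np.uint8)
```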
Alice just spotted a white rabbit rushing to its rabbit hole! Given the coordinates of the positions A, B, H of Alice, the rabbit and the hole, as well as the respective speeds $S_A$ and $S_B$ of Alice and the rabbit, say whether Alice can catch the rabbit before it disappears, and give the time and place of the fastest possible interception.
I guess I am not the first one to solve this, but I couldn’t find any simple solution on the internet. The one I give here relies on trigonometry, but interestingly it doesn’t require computing any trigonometric function!
If sines give you fever, don’t wait for the first sines of fever (uh uh uh), just skip this part, I summarize everything in the next section.
We call C and $t_C$ the location and the time of the catch. It is straightforward that, since we are looking for the fastest catch, Alice’s trajectory towards C must be a straight line. Here is a sketch of the problem:
Note that the lengths AC and BC denote the distance run by Alice and the Rabbit until the catch, therefore they verify
So finding the length BC would answer the problem, as it would tell us whether Alice can catch the rabbit before it reaches the rabbit hole (case $BC < BH$), and would immediately lead to both the location and time of the catch:
To express BC using the coordinates of the points, let us apply the famous Law of Sines to the triangle ABC:
Which leads to
Now all we have to do is express $\sin \alpha$ and $\sin \gamma$ as functions of the given data. To do so we first compute $\sin \beta$, then we express $\sin \alpha$ with $\sin \beta$, and finally we express $\sin \gamma$ as a function of $\sin \alpha$ and $\sin \beta$.
The value of $\sin \beta$ can be computed from the points coordinates as follows:
Then we use the Law of Sines again, to compute $\sin \alpha$:
This only makes sense, of course, if
If this is not the case we conclude that Alice will never catch the rabbit, which solves the problem.
Finally we use the fact that the angles of a triangle sum to $\pi$ to compute $\sin \gamma$:
We reformulate using the already-computed $\sin \alpha$ and $\sin \beta$:
And… we are done, we have everything we need to compute BC and answer the problem.
So here is the short answer to the problem:
Below is a script implementing this technique using Python’s pylab module:
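A condensed implementation of this recipe (a sketch in plain Numpy; names follow the post's notations, and degenerate configurations such as A lying on line BH are not handled):

```python
import numpy as np

def intercept(A, B, H, Sa, Sb):
    """Fastest interception of the rabbit (at B, speed Sb, running to H)
    by Alice (at A, speed Sa). Returns (catchable, C, t)."""
    A, B, H = np.asarray(A, float), np.asarray(B, float), np.asarray(H, float)
    BH = np.linalg.norm(H - B)
    u = (H - B) / BH                             # rabbit's direction
    v = A - B
    AB = np.linalg.norm(v)
    sin_b = abs(u[0] * v[1] - u[1] * v[0]) / AB  # sin(beta), cross product
    cos_b = np.dot(u, v) / AB
    sin_a = (Sb / Sa) * sin_b                    # law of sines
    if sin_a >= 1:
        return False, None, None                 # Alice can never catch up
    cos_a = np.sqrt(1 - sin_a ** 2)
    sin_c = sin_a * cos_b + cos_a * sin_b        # sin(gamma) = sin(alpha+beta)
    BC = AB * sin_a / sin_c                      # law of sines again
    if BC > BH:
        return False, None, None                 # the hole is reached first
    return True, B + BC * u, BC / Sb
```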
And here it is in action:
Imagine that you have N employees who work side by side in a row. For more conviviality you decide to arrange them in a different order every day, so that after some time each employee has worked beside each of the others at least once. How to do so in a minimal number of days?
This problem appeared a few weeks ago in the Reddit/Python forum, when someone posted this:
“I think it is good to shuffle the team around. (…) Here is the function that we use to randomize our team making sure that you do not sit next to someone you are already sitting next to [supposing that all are sitting in a row].”.
Stated like this, it is a very simple problem which doesn’t require a complicated algorithm: you just shuffle the previous day’s order as follows, and it will do the trick:
This shuffle can be written in one line of Python:
The problem with this shuffle, as someone on Reddit pointed out, is that even if you repeat it a great number of times there is no guarantee that everyone will have worked beside everyone else in the end. For instance in the shuffling shown above, employees 1 and 8 will never be neighbours. And no matter how complicated a shuffle you imagine, there will always be some numbers of people for which it fails to create all possible pairs of neighbours!
This leads us to our problem: how to ensure that all possible pairs of neighbours are created, and in a minimal amount of time? We will see that there is an optimal strategy. It does NOT use a shuffle, but rather a 120-year-old mathematical construction.
If you have N employees, then they can form N(N-1)/2 pairs. Each day you create at most (N-1) new pairs of neighbours by placing the employees in a row. Therefore you will need at least N/2 days to create all possible pairs. This means that you cannot solve the problem in less than N/2 days if N is even, and (N+1)/2 days if N is odd. What we will show is that it is actually possible to solve the problem in N/2 days (for even N) or (N+1)/2 days (for odd N).
In fact we only need to solve the problem for even N, and the solutions for odd N will follow very simply. To see this, suppose that you have an odd number N of employees. If you add one imaginary employee, you come to an even number (N+1) of employees. Suppose that you have found a solution for these (N+1) employees, which means that you have found a series of (N+1)/2 arrangements which form all pairs of neighbours. Then remove the imaginary employee from each of these arrangements. What you obtain is a series of (N+1)/2 arrangements, in which all pairs of the employees 1 to N are formed. In other words, you have solved the problem for N.
This problem can be very well represented using a graph whose nodes are the employees. Each day we add an edge in the graph between each pair of employees which have been neighbours, our goal being to cover all the possible edges of the graph:
Notice how each day you actually trace a path in the graph.
Now our problem has become: given a graph of size N (even), find N/2 paths, each going through each node exactly once, such that they cover all the possible edges of the graph.
And here is a sketch of a solution that will always work:
The first path is a simple pattern 1, N, 2, N-1, etc. and the others are just rotations of the first path. The nice thing with the graph representation is that I can use a simple geometric argument to prove that these paths will cover all the edges: if we place the N nodes of the graph cyclically like in the figures above, the path number K will have edges that make an angle $2K\pi/N$ or $(2K+1)\pi/N$ with the horizontal line. So the different paths have edges of completely different angles. For this reason an edge cannot belong to more than one path. Since there are N/2 paths and each path covers N-1 different edges, the paths cover N(N-1)/2 edges in total, which is all the edges.
This construction of paths may seem simple to some of you, but I couldn’t figure it out on my own, and it is an application of a 19th century mathematical trick called the Walecki construction, which I found after some googling, as I explain in the last section.
If N is even, arrange the employees in this order the first day: 1, N, 2, (N-1), 3, (N-2), etc. From day 2 to day N/2, place the employees by taking their arrangement of the day before and replacing employee 1 by 2, 2 by 3, 3 by 4… and N by 1.
If N is odd, add an imaginary (N+1)-th employee, solve the problem for the N+1 employees using the method above, then remove the imaginary employee from each of the arrangements obtained.
Here is the Python implementation of this solution:
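A possible implementation of these two rules (a sketch; it reproduces the output shown below):

```python
def place(N):
    """Return the successive seating arrangements of employees 1..N
    such that, in the end, everyone has been everyone's neighbour."""
    odd = (N % 2 == 1)
    if odd:
        N += 1  # add an imaginary employee
    # Day 1: the pattern 1, N, 2, N-1, 3, N-2, ...
    day1 = []
    for i in range(1, N // 2 + 1):
        day1 += [i, N + 1 - i]
    days = [day1]
    for _ in range(N // 2 - 1):
        # Next day: replace employee k by k+1 (and N by 1)
        days.append([e % N + 1 for e in days[-1]])
    if odd:
        days = [[e for e in day if e != N] for day in days]
    return days
```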
>>> place(12)
[[1, 12, 2, 11, 3, 10, 4, 9, 5, 8, 6, 7],
[2, 1, 3, 12, 4, 11, 5, 10, 6, 9, 7, 8],
[3, 2, 4, 1, 5, 12, 6, 11, 7, 10, 8, 9],
[4, 3, 5, 2, 6, 1, 7, 12, 8, 11, 9, 10],
[5, 4, 6, 3, 7, 2, 8, 1, 9, 12, 10, 11],
[6, 5, 7, 4, 8, 3, 9, 2, 10, 1, 11, 12]]
For the anecdote, I was not really happy when I figured out that the problem could be represented with graphs, as I really know nothing about graph theory.
However, I thought that, as we are dealing with graphs in which everyone is connected with everyone, they must have some interesting properties. So I googled fully connected graphs, which led me to Wolfram Mathworld’s article on Complete graphs (apparently that’s their real name), where we can read on the 6th line:
“In the 1890s, Walecki showed that complete graphs Kn admit a Hamilton decomposition for odd n, and decompositions into Hamiltonian cycles plus a perfect matching for even n (Lucas 1892, Bryant 2007, Alspach 2008). Alspach et al. (1990) give a construction for Hamilton decompositions of all Kn.”
That’s not what you’d call crystal clear, but it says decomposition several times, and that sounds like what I want to do. So I looked for the last reference, Alspach 1990. Springer, the publisher, graciously gives you access to the first two pages for free. The good news is, they contain all the properties and proofs that we need, in a compact yet very understandable form. Let us see in detail what they say.
It starts with Hamiltonian cycles. A Hamiltonian cycle is a path that starts from one node, visits every other node exactly once, and comes back to the initial node. The first two figures below are two Hamiltonian cycles for a graph with five nodes:
As you can see, these paths have no edge in common, but put together they cover all the edges of the complete graph. They form what is called a Hamilton decomposition of the complete graph.
Now what happens if you remove one person from the graph, say, the person at the top? You get this:
You obtain two paths that describe a solution of our problem for N=4 employees! And it will always work: if you can find a Hamilton decomposition of the complete graph of N+1 nodes (N being even), just removing one node will give you a decomposition into paths of the complete graph of N nodes, from which you can deduce a solution to our problem with N employees.
So now the important question is: how do we find a Hamiltonian decomposition of the complete graph of N+1 nodes (N+1 being odd)?
This was answered in 1890 by Walecki with the following construction. I use the same notations as in Alspach 1990. Note that node 0 stays in place while all the other numbers rotate clockwise from one cycle to the next.
There is no extensive proof in Alspach 1990 of why this covers all edges, but I guess that a geometrical proof, like the one I gave in a previous section, could do the trick. Now all we have to do is remove one node of the graph: we choose node 0:
With just a few tweaks in the order of the nodes, we come to the solution presented in the previous section.
This post introduces ddeint, a small Python function for solving Delay Differential Equations (DDEs), built on top of Scipy’s odeint.
Say you have a delay differential equation like this:
where $F(y, t)$ can involve delayed values of $y$, of the form $y(t-d)$.
To solve this DDE system at points t=[t1, t2 ...]
you would just write
Let us start with a DDE whose exact solution is known (it is the sine function), just to check that the algorithm works as expected:
Here is how we solve it with ddeint
:
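To see the mechanics at work without installing anything, here is a crude, self-contained explicit-Euler version of the same idea (ddeint itself relies on Scipy's integrator plus interpolation of the past values; this sketch is for illustration only):

```python
import numpy as np

def dde_euler(model, g, tt):
    """Toy explicit-Euler solver for y'(t) = model(Y, t), where Y is a
    function giving (interpolated) past values of y, and g is the
    history function: y(t) = g(t) for t <= tt[0]."""
    ys = [g(tt[0])]

    def Y(t):
        if t <= tt[0]:
            return g(t)                        # before the start: history
        return np.interp(t, tt[:len(ys)], ys)  # else: interpolate the past

    for t, dt in zip(tt[:-1], np.diff(tt)):
        ys.append(ys[-1] + dt * model(Y, t))   # Euler step
    return np.array(ys)

# y'(t) = y(t - 3*pi/2) with history y = sin: the exact solution is sin(t)
tt = np.linspace(0, 10, 4001)
yy = dde_euler(lambda Y, t: Y(t - 3 * np.pi / 2), np.sin, tt)
```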
The resulting plot compares our solution (red) with the exact solution (blue). See how our result eventually detaches itself from the exact solution as a consequence of many successive approximations? As DDEs tend to create chaotic behaviors, you can expect the error to explode very fast. As I am no DDE expert, I would recommend checking for convergence in all cases, i.e. increasing the time resolution and seeing how it affects the result. Keep in mind that the past values of Y(t) are computed by interpolating the values of Y found at the previous integration points, so the more points you ask for, the more precise your result.
You can set the parameters of your model at integration time, like in Scipy’s ODE
and odeint
. As an example, imagine a chemical product with degradation rate $r$, and whose production rate is negatively linked to the quantity of this same product at the time $(t-d)$:
We have three parameters that we can choose freely. For $K = 0.1$, $d = 5$, $r = 1$, we obtain oscillations !
The variable Y can be a vector, which means that you can solve DDE systems of several variables. Here is a version of the famous Lotka-Volterra two-variables system, where we introduce a delay $d$. For $d=0$ the system is a classical Lotka-Volterra system ; for $d\neq 0$ the system undergoes an important amplification:
In this last example the delay depends on the value of $y(t)$ :
Before we start, you must have FFMPEG installed on your computer and you must know the name (or path) of the FFMPEG binary on your computer. It should be one of the following:
To read the audio file “mySong.mp3” we first ask FFMPEG to open this file and to direct its output to Python:
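The command can be rebuilt from the description below (wrapped in a function for convenience; the Popen call is shown commented so the snippet stands alone):

```python
import subprocess as sp

FFMPEG_BIN = "ffmpeg"  # or "ffmpeg.exe" on Windows

def ffmpeg_audio_command(filename, rate=44100, channels=2):
    """Command asking FFMPEG to decode `filename` into raw 16-bit PCM
    written on its standard output."""
    return [FFMPEG_BIN,
            '-i', filename,
            '-f', 's16le',          # raw 16-bit little-endian output
            '-acodec', 'pcm_s16le',
            '-ar', str(rate),       # sampling frequency
            '-ac', str(channels),   # number of channels (2 = stereo)
            '-']                    # direct the output to stdout

# pipe = sp.Popen(ffmpeg_audio_command("mySong.mp3"),
#                 stdout=sp.PIPE, bufsize=10**8)
```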
In the code above -i mySong.mp3
indicates the input file, while s16le/pcm_s16le
asks for a raw 16-bit sound output. The -
at the end tells FFMPEG that it is being used with a pipe by another program. In sp.Popen
, the bufsize
parameter must be bigger than the biggest chunk of data that you will want to read (see below). It can be omitted most of the time in Python 2 but not in Python 3 where its default value is pretty small.
Now you just have to read the output of FFMPEG. In our case we have two channels (stereo sound), so one frame of our output will be represented by a pair of integers, each coded on 16 bits (2 bytes). Therefore one frame will be 4 bytes long. To read 88200 audio frames (2 seconds of sound in our case) we will write:
You can now play this sound using for instance Pygame’s sound mixer:
Finally, you can get information on a file (audio format, frequency, etc.) by calling
Now infos
contains a text describing the file, which you would need to parse to obtain the relevant information. See the section Going Further below for a link to an implementation.
To write an audio file we open FFMPEG and specify that the input will be piped and that it will consist of raw audio data:
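A sketch of that opening command (the output codec and bitrate below are illustrative choices, and the Popen call is commented so the snippet stands alone):

```python
import subprocess as sp

FFMPEG_BIN = "ffmpeg"  # or "ffmpeg.exe" on Windows

command = [FFMPEG_BIN,
           '-y',                     # overwrite the output file if it exists
           '-f', 's16le',            # input format: raw 16-bit samples
           '-acodec', 'pcm_s16le',
           '-ar', '44100',           # input sampling frequency
           '-ac', '2',               # stereo input
           '-i', '-',                # the input comes from a pipe
           '-acodec', 'libmp3lame',  # any valid FFMPEG audio codec
           '-b:a', '192k',           # output bitrate
           'myFile.mp3']
# pipe = sp.Popen(command, stdin=sp.PIPE, stderr=sp.PIPE)
```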
The codec can be any valid FFMPEG audio codec. For some codecs, providing the output bitrate is optional. Now you just have to write raw audio data into the pipe. For instance, if your sound is represented as an Nx2 Numpy array of integers, you will just write
I tried to keep the code as simple as possible here. With a few more lines you can make useful classes to manipulate audio files, like the FFMPEG_AudioReader and FFMPEG_AudioWriter classes that I wrote for my video editing software. These files show in particular how to parse the information on the video, and how to save/load pictures using FFMPEG.
Before we start, you must have FFMPEG installed on your computer and you must know the name (or path) of the FFMPEG binary. It should be one of the following:
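Something like this, stored once at the top of your script:

```python
FFMPEG_BIN = "ffmpeg"        # on Linux and Mac OS
# FFMPEG_BIN = "ffmpeg.exe"  # on Windows
```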
To read the frames of the video “myHolidays.mp4” we first ask FFMPEG to open this file and to direct its output to Python:
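A sketch of that call (the helper names are my choices; the flags match the description below):

```python
import subprocess as sp

FFMPEG_BIN = "ffmpeg"  # "ffmpeg.exe" on Windows

def video_read_command(filename):
    return [FFMPEG_BIN,
            "-i", filename,         # the input file
            "-f", "image2pipe",     # stream the images through a pipe
            "-pix_fmt", "rgb24",    # one byte per color per pixel
            "-vcodec", "rawvideo",  # no encoding of the output
            "-"]                    # the output goes to the pipe

def open_video_reader(filename):
    # bufsize must be bigger than the size of one frame (see below)
    return sp.Popen(video_read_command(filename),
                    stdout=sp.PIPE, bufsize=10**8)
```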
In the code above -i myHolidays.mp4
indicates the input file, while rawvideo/rgb24
asks for a raw RGB output. The format image2pipe
and the -
at the end tell FFMPEG that it is being used with a pipe by another program. In sp.Popen
, the bufsize
parameter must be bigger than the size of one frame (see below). It can be omitted most of the time in Python 2 but not in Python 3 where its default value is pretty small.
Now we just have to read the output of FFMPEG. If the video has a size of 420x320 pixels, then the first 420x320x3 bytes output by FFMPEG will give the RGB values of the pixels of the first frame, line by line, top to bottom. The next 420x320x3 bytes after that will represent the second frame, etc. In the next lines we extract one frame and reshape it as a 320x420x3 Numpy array (lines come top to bottom, so the height is the first dimension):
You can now view the image with, for instance, Pylab's imshow(image)
. By repeating the two lines above you can read all the frames of the video one after the other. Reading one frame with this method takes 2 milliseconds on my computer.
What if you want to read the frame that is at time 01h00 in the video? You could do as above: open the pipe and read all the frames of the video one by one until you reach the one corresponding to t=01h00. But this may be VERY long. A better solution is to call FFMPEG with arguments telling it to start reading “myHolidays.mp4” at time 01h00:
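A sketch of that command (helper names are mine): the first -ss comes before -i, which makes the seek fast but imprecise, while the second -ss comes after -i and is precise:

```python
import subprocess as sp

FFMPEG_BIN = "ffmpeg"  # "ffmpeg.exe" on Windows

def video_seek_command(filename, coarse="00:59:59", fine="1"):
    return [FFMPEG_BIN,
            "-ss", coarse,          # fast, imprecise seek (before -i)
            "-i", filename,
            "-ss", fine,            # slow, precise seek (after -i)
            "-f", "image2pipe",
            "-pix_fmt", "rgb24",
            "-vcodec", "rawvideo",
            "-"]

def open_video_at(filename, **kwargs):
    return sp.Popen(video_seek_command(filename, **kwargs),
                    stdout=sp.PIPE, bufsize=10**8)
```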
In the code above we ask FFMPEG to quickly (and imprecisely) reach 00:59:59, then to skip 1 second of movie with precision (-ss 1
), so that it will effectively start at 01:00:00 sharp (see this page for more info). Then you can start reading frames as previously shown. Seeking a frame with this method takes at most 0.1 second on my computer.
You can also get information about a file (frame size, number of frames per second, etc.) by calling
Now infos contains a text describing the file, which you will need to parse to obtain the relevant information. See the last section for a link to an implementation.
To write a series of frames of size 460x360 into the file 'my_output_videofile.mp4'
, we open FFMPEG and indicate that raw RGB data is going to be piped in:
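A sketch of the corresponding command (the helper names, the mpeg4 default and the -b:v bitrate flag are illustrative choices):

```python
import subprocess as sp

FFMPEG_BIN = "ffmpeg"  # "ffmpeg.exe" on Windows

def video_write_command(filename, size="460x360", fps=24,
                        codec="mpeg4", bitrate=None):
    command = [FFMPEG_BIN,
               "-y",                # overwrite output file if it exists
               "-f", "rawvideo",
               "-vcodec", "rawvideo",
               "-s", size,          # size of one frame
               "-pix_fmt", "rgb24", # one byte per color per pixel
               "-r", str(fps),      # frames per second
               "-i", "-",           # the input comes from a pipe
               "-an",               # no audio
               "-vcodec", codec]
    if bitrate is not None:
        command += ["-b:v", bitrate]  # e.g. "3000k", needed by many codecs
    return command + [filename]

def open_video_writer(filename, **kwargs):
    # the raw frames will be written to the process's stdin
    return sp.Popen(video_write_command(filename, **kwargs),
                    stdin=sp.PIPE, stderr=sp.PIPE)
```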
The codec of the output video can be any valid FFMPEG codec, but for many codecs you will need to provide the bitrate as an additional argument (for instance -b:v 3000k). Now we can write raw frames one after another in the file. These will be raw frames, like the ones output by FFMPEG in the previous section: they should be strings of the form “RGBRGBRGB…” where R,G,B are characters that represent a number between 0 and 255. If our frame is represented as a Numpy array, we simply write:
I tried to keep the code as simple as possible here. With a few more lines you can make useful classes to manipulate video files, like the FFMPEG_VideoReader and FFMPEG_VideoWriter that I wrote for my video editing software. These files show, in particular, how to parse the information on the video, how to save/load pictures using FFMPEG, etc.
I recently coded a method to view movies in Python: it plays the video and, at the same time, in a parallel thread, it renders the audio. The difficult part is that the audio and the video should be exactly synchronized. The pseudo-code looks like this:
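Something along these lines, using the standard threading module (play_audio and play_video are the functions discussed below):

```python
import threading

def view(movie):
    # render the audio in a parallel thread...
    threading.Thread(target=play_audio, args=(movie,)).start()
    # ...while the video plays in the main thread
    play_video(movie)
```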
In this code, play_audio()
and play_video()
will start at approximately the same time and will run in parallel, but these functions need some preparation before actually starting to play anything. Their code looks like this:
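Schematically (load_audio and load_video stand for whatever preparation your player needs; they are placeholders, not real functions):

```python
def play_audio(movie):
    audio = load_audio(movie)  # preparation: takes a variable time
    audio.start_playing()      # should start in sync with the video

def play_video(movie):
    video = load_video(movie)  # preparation: takes a variable time
    video.start_playing()      # should start in sync with the audio
```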
To have a well-synchronized movie we need the internal functions audio.start_playing()
and video.start_playing()
, which are run in two separate threads, to start at exactly the same time. How do we do that?
The solution seems to be using threading.Event
objects. An Event
is an object that can be accessed from all the threads and allows very basic communication between them: each thread can set or unset an Event, or check whether this event has already been set (by another thread).
For our problem we will use two events video_ready
and audio_ready
which will enable our two threads to scream at each other “I am ready! Are you?”. Here is the Python for that:
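A sketch of the two functions using the events (load_audio and load_video remain placeholders for the preparation steps):

```python
import threading

video_ready = threading.Event()
audio_ready = threading.Event()

def play_audio(movie):
    audio = load_audio(movie)  # preparation (placeholder)
    audio_ready.set()          # "I am ready!"
    video_ready.wait()         # "Are you?"
    audio.start_playing()

def play_video(movie):
    video = load_video(movie)  # preparation (placeholder)
    video_ready.set()          # "I am ready!"
    audio_ready.wait()         # "Are you?"
    video.start_playing()
```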
and finally the code for view(movie)
:
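A sketch, assuming play_audio and play_video are defined as described above:

```python
import threading

def view(movie):
    audio_thread = threading.Thread(target=play_audio, args=(movie,))
    video_thread = threading.Thread(target=play_video, args=(movie,))
    audio_thread.start()
    video_thread.start()
    # wait until both threads are finished before returning
    audio_thread.join()
    video_thread.join()
```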
A few tips to go further:

- The example above uses the module threading, and the two threads will run in parallel on the same processor. If you have a computer with several processors you can also use the multiprocessing module to have your threads run on two different processors (which can be MUCH faster). Nicely enough, the two modules have the same syntax: simply replace threading by multiprocessing and Thread by Process in the example above and it should work.
- You can use another Event to exit play_video and play_audio at the same time: when the video playing is exited, play_video unsets that Event. In play_audio, this Event is regularly checked, and when it is seen to be unset, play_audio exits too.
- Instead of using wait to wait for an Event to be set, you can use a loop, so that you decide at which frequency you want to check the Event. Only do that if you don't mind a lag of a few milliseconds between your processes:
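For example (here a threading.Timer stands in for the other thread that would normally set the Event):

```python
import threading
import time

an_event = threading.Event()
# simulate another thread setting the event after 50 milliseconds
threading.Timer(0.05, an_event.set).start()

# instead of an_event.wait(), poll at the frequency of your choice
while not an_event.is_set():
    time.sleep(0.01)  # check every 10 milliseconds
```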