In this post I give a minimalistic version of Sam Lavigne’s Videogrep and use it to promote world peace.
This week Sam Lavigne wrote a very entertaining blog post introducing Videogrep, a Python script that searches through the dialog in videos (using the associated subtitle files), selects scenes (for instance all scenes containing a given word), and cuts them together into a new video.
The script on GitHub implements many tweaks and goodies (such as working on multiple files, identifying complex patterns, etc.). In this post I present the code for a minimal videogrepper in Python and attempt to refine the cuts to get scenes containing whole sentences or single words.
A good place to find public domain videos with subtitles is the White House channel on YouTube. In what follows I will be working on the 2012 State of the Union Address.
To get both the video and the subtitles you can use youtube-dl in a terminal:
This downloads a video file state.mp4 and a text file state.en.srt containing the subtitles. Each subtitle block in that file gives an index, a time span, and the text displayed during that span.
This file can be easily parsed in Python to get a list with one element per subtitle block, each element pairing the block's time span (start and end, in seconds) with its text.
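Since the parsing step is only described in words here, the following is a minimal sketch of one way to do it. The sample subtitles, the helper names `to_seconds` and `parse_srt`, and the `([t_start, t_end], text)` layout are my own assumptions for illustration, not necessarily what the original script used.

```python
import re

# Hypothetical excerpt in SRT format; the real state.en.srt is much longer.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,500
We should invest in schools

2
00:00:04,500 --> 00:00:07,250
and reward
hard work.
"""

def to_seconds(timestamp):
    """Convert an SRT timestamp such as '00:03:20,100' to seconds."""
    hours, minutes, seconds = timestamp.strip().replace(",", ".").split(":")
    return 3600 * int(hours) + 60 * int(minutes) + float(seconds)

def parse_srt(srt_text):
    """Return a list of ([t_start, t_end], text) pairs, one per subtitle block."""
    subtitles = []
    # Blocks are separated by blank lines.
    for chunk in re.split(r"\n\s*\n", srt_text.strip()):
        lines = chunk.splitlines()
        # Locate the timing line rather than assuming it is always the second one.
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = [to_seconds(t) for t in line.split("-->")]
                text = " ".join(part.strip() for part in lines[i + 1:])
                subtitles.append(([start, end], text))
                break
    return subtitles

subtitles = parse_srt(SAMPLE_SRT)
```

To use it on the downloaded file, read `state.en.srt` into a string and pass it to `parse_srt`. Multi-line blocks are joined with a space so that each element holds a single line of text.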
A simple videogrepper
Let us have a look at the most common words in the speech:
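A word count like this can be sketched with `collections.Counter`. The sample texts, the function name `most_common_words` and the `min_length` filter (used to skip short words such as "the" or "and") are assumptions of mine, so the numbers below say nothing about the actual speech.

```python
import re
from collections import Counter

# Hypothetical subtitle texts standing in for the parsed speech.
texts = [
    "We should invest in schools.",
    "We should reward hard work.",
    "Schools matter.",
]

def most_common_words(texts, n=10, min_length=5):
    """Return the n most frequent words having at least min_length letters."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    return counts.most_common(n)

top_words = most_common_words(texts, n=2)
```

On the real data, `texts` would be the list of texts taken from the parsed subtitle blocks.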
It seems that the word “should” is spoken a lot. Let us find the times of all the subtitle blocks in which it appears:
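One possible way to write this lookup is shown below, assuming the subtitles are stored as `([t_start, t_end], text)` pairs. The function name `find_blocks` and the sample data are hypothetical. Matching on word boundaries avoids picking up words such as "shoulder".

```python
import re

# Hypothetical parsed subtitles: ([t_start, t_end], text), times in seconds.
subtitles = [
    ([1.0, 4.5], "We should invest in schools"),
    ([4.5, 7.25], "and reward hard work."),
    ([7.25, 10.0], "Everyone should get a fair shot."),
    ([10.0, 12.0], "Shoulder to shoulder."),
]

def find_blocks(word, subtitles):
    """Return the time spans of all blocks whose text contains `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [span for span, text in subtitles if pattern.search(text)]

cuts = find_blocks("should", subtitles)
```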
Now we cut and put together all these scenes using MoviePy:
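Here is a rough sketch of the assembling step. It assumes MoviePy 1.x, where `VideoFileClip.subclip` and `concatenate_videoclips` are available from `moviepy.editor` (more recent releases changed some of these names, so adapt as needed). The padding-and-merging helper, the half-second padding and the output file name are my own additions and not necessarily part of the original script.

```python
def pad_and_merge(cuts, padding=0.5):
    """Widen each [start, end] span by `padding` seconds and merge overlaps."""
    merged = []
    for start, end in sorted(cuts):
        start, end = max(0.0, start - padding), end + padding
        if merged and start <= merged[-1][1]:
            # This span overlaps the previous one: extend it instead of repeating footage.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def assemble_cuts(cuts, source="state.mp4", output="should.mp4"):
    """Cut `source` at the given time spans and write the result to `output`."""
    # Imported here so that pad_and_merge can be used without MoviePy installed.
    # Assumes the MoviePy 1.x API.
    from moviepy.editor import VideoFileClip, concatenate_videoclips
    video = VideoFileClip(source)
    clips = [video.subclip(start, end) for start, end in cuts]
    concatenate_videoclips(clips).write_videofile(output)
    video.close()

# Hypothetical time spans, in seconds.
cuts = pad_and_merge([[1.0, 4.5], [7.25, 10.0], [10.25, 12.0]])
# assemble_cuts(cuts)  # uncomment once state.mp4 has been downloaded
```

The padding gives each scene a little breathing room, and merging keeps two neighbouring scenes from replaying the same half second twice.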
Here is the result:
It is promising, but in some scenes we don’t get to know exactly what should be done, which is frustrating. In the next section we add a little content-awareness to get more relevant cuts.
Grepping whole sentences
We now want to cut together all the sentences containing the word “should”. We first explore the whole text looking for sentences containing that word, then we find the subtitle blocks corresponding to the start and end of each sentence, and we cut the video file accordingly.
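The three steps just described can be sketched as follows. This is an illustration under assumptions: the sentence splitter is deliberately naive (anything ending in a period, exclamation mark or question mark, so abbreviations like "Mr." will confuse it), and the function name `find_sentences` and the sample subtitles are hypothetical.

```python
import re
from bisect import bisect_right

# Hypothetical parsed subtitles: ([t_start, t_end], text), times in seconds.
subtitles = [
    ([1.0, 4.5], "We should invest in schools"),
    ([4.5, 7.25], "and reward hard work. That is"),
    ([7.25, 10.0], "how we move forward. Everyone"),
    ([10.0, 13.0], "should get a fair shot."),
]

def find_sentences(word, subtitles):
    """Return one [start, end] span per sentence containing `word`."""
    # Concatenate the texts, remembering at which character each block starts.
    offsets, full_text = [], ""
    for _, text in subtitles:
        offsets.append(len(full_text))
        full_text += text + " "
    word_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    cuts = []
    # Naive sentence splitter: a run of characters closed by '.', '!' or '?'.
    for match in re.finditer(r"[^.!?]+[.!?]", full_text):
        sentence = match.group()
        if not word_re.search(sentence):
            continue
        # Skip leading whitespace so the start falls in the right block.
        start_char = match.start() + len(sentence) - len(sentence.lstrip())
        end_char = match.end() - 1
        first = bisect_right(offsets, start_char) - 1  # block holding the first character
        last = bisect_right(offsets, end_char) - 1     # block holding the final punctuation
        cuts.append([subtitles[first][0][0], subtitles[last][0][1]])
    return cuts

cuts = find_sentences("should", subtitles)
```

Note that the cuts still follow block boundaries, so a scene may start with the tail of the previous sentence when a block straddles two sentences.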
It’s much better:
Note that with just a little more code you could achieve much more. In Videogrep the author uses the Python package pattern to look for advanced phrase constructions, such as all phrases of the form gerund-determiner-adjective-noun.
Grepping single words
Let us take a step in the other direction and see if it is possible to automatically cut a scene containing exactly one word or expression, with as little as possible of the words around it. Consider the following subtitle block:
We can roughly estimate that the word “We” will be spoken in the first quarter of the time span (from 3:20.1 to 3:20.85), “can” in the second quarter (from 3:20.85 to 3:21.6), etc. Following this reasoning, here is a function that estimates the times of a word from the relative position of its characters in the subtitle blocks:
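A minimal sketch of this interpolation is given below, assuming that speech is spread evenly over the characters of each block (which is only roughly true). The `padding` parameter and the sample block are my own additions for illustration.

```python
import re

def find_word(word, subtitles, padding=0.0):
    """Estimate the [start, end] times of every occurrence of `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    cuts = []
    for (t_start, t_end), text in subtitles:
        if not text:
            continue
        # Assume every character of the block takes the same amount of time.
        seconds_per_char = (t_end - t_start) / len(text)
        for match in pattern.finditer(text):
            start = t_start + match.start() * seconds_per_char - padding
            end = t_start + match.end() * seconds_per_char + padding
            cuts.append([max(0.0, start), end])
    return cuts

# Hypothetical block: 16 characters spread over 4 seconds.
subtitles = [([200.0, 204.0], "We can fix this.")]
cuts = find_word("can", subtitles)
```

In this example each character gets a quarter of a second, so “can” (characters 3 to 6) is estimated to run from 200.75 to 201.5 seconds. A small positive `padding` helps when the speaker is slightly ahead of or behind the estimate.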
Let us try it on “Americans”:
At least some of the cuts worked properly. If we use frequently spoken words, we may find at least one correct cut for each of them, and we can then build a whole sentence:
Wow! That seemed so real, and it almost made sense. From there the cuts could be refined by hand, but the script did most of the work and surely deserved, if not a Nobel Peace Prize, fourteen minutes of applause: