Popular services like Shazam and Midomi have made music identification easy, provided that you're trying to find a song in their database. But what happens when the song you're searching for isn't in their database--when you know for certain that it won't be in their database?
Usually you're just SOL, hoping that some hero comments on the video you just watched with something other than "Song Name: Darude - Sandstorm". In 2005-2006, I stumbled upon a set of videos produced by Tom's Hardware. As a newbie PC builder at the time, I was interested in their content; who doesn't want to see a Pentium 4 overclocked to 5GHz under LN2, a PC cooled with vegetable oil, or what happens (before throttling and shutdown became ubiquitous) when you remove the heatsink from a running processor?
I was also entranced by the music used in these videos. As someone who never experienced the days of the Amiga, I had never heard anything quite like it--I only knew the world of sampled mp3s and wavs. The closest thing I had ever heard to a music module was a midi played back using whatever terrible sample library shipped with Windows XP by default.
If you're looking for the soundtracks for these two videos, they can be found here:
Anvil - Path to Nowhere
Karsen - Aryx
Acecream - Love Me!
Blue Ion - Fifth Project X
As the soundtracks were never credited in the videos (and I wasn't on YouTube at the time to even begin checking the comments there), I gave up hope of tracking down the music.
A few years later, after I had discovered the genre of "keygen music," a term that is itself really just a bootleg for module or tracker music, I downloaded a "keygen music pack" that contained the song Reed - Dansze Mucyka. I recognized it immediately as the source for this Tom's Hardware video. I now knew that this type of music existed originally in the form of module formats, but I was stuck with the misconception that "keygen" music packs were the primary source of tracker music. The fact that "keygen" music often came mislabeled (labeled based on the keygen it was ripped from and not the actual composer) didn't help, and I didn't find any of the other tracks.
Fast forward another few years, and on a whim I decided to visit those old Tom's Hardware videos on YouTube just for fun. This time, however, most of the videos had comments detailing where the modules used for the soundtracks came from, and some quick googling revealed the modarchive to be the definitive source for modules. Eventually, I discovered that the vast majority of songs used in Tom's Hardware videos were shamelessly ripped from the list of featured modules on the modarchive, with the time the module was featured roughly corresponding to the release date of the videos. This list should have been the last piece in the puzzle to getting the soundtracks to all of the videos, but unfortunately there was a big black hole circa 2004-2005 where the featured module list was not updated. Soundtracks for videos such as the Intel Pentium 560 Heat Problems would remain unidentified.
On the surface, simply knowing that a song came from a tracker music file doesn't seem to be a whole lot of useful information. I didn't think it told me anything obvious other than that a Shazam or Midomi search wouldn't work. However, if you look a little deeper, a few factors come together that make a ridiculous project like Molasses possible. They are:
Molasses was born out of these possibilities, but the easy availability of projects such as dejavu was what really made things doable. It turns out that storing the fingerprints of 120,000 songs in a MySQL database (as dejavu is implemented) isn't exactly feasible for a single user (estimated 500GiB to 1TiB needed), especially if I only intend to perform a few searches. Instead of fingerprinting the target song and matching it against an index of pre-fingerprinted songs, molasses fingerprints everything on the fly to save disk space.
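To make the "on the fly" idea concrete, here is a minimal sketch, not the actual molasses code. It assumes a hypothetical `fingerprint` callable that turns a module name into fingerprints, and it scores candidates by counting shared fingerprints, which is a crude stand-in for real matching. The point is only that each candidate's fingerprints are computed, scored, and thrown away, so nothing but a short list of best matches is ever kept.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def search_on_the_fly(
    target_prints: Set[Hashable],
    candidates: Iterable[str],
    fingerprint: Callable[[str], Iterable[Hashable]],  # hypothetical helper
    keep: int = 5,
) -> List[Tuple[int, str]]:
    """Score every candidate against the target, keeping only the best few."""
    best: List[Tuple[int, str]] = []  # min-heap of (score, name)
    for name in candidates:
        prints = set(fingerprint(name))      # computed now, never stored
        score = len(target_prints & prints)  # crude score: shared fingerprints
        if len(best) < keep:
            heapq.heappush(best, (score, name))
        else:
            # Push the new entry, then drop whichever entry is now the worst.
            heapq.heappushpop(best, (score, name))
    return sorted(best, reverse=True)        # best match first
```

The trade-off is the one described above: disk usage stays tiny, but every new search has to pay the full fingerprinting cost again.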
The FFT-based fingerprinting algorithm used in dejavu works well with tracker music, so long as the modules are converted to a sampled format beforehand. This is "easily" done with gstreamer-based tools such as soundconverter. Ostensibly, all that needed to be done was convert 120,000 modules to wav via soundconverter, fingerprint them, and finally match them.
Unfortunately, some of the songs in the mod archive seem to induce nasty and unrecoverable bugs in soundconverter. Most of them seem to be reasonably benign, causing soundconverter to hang and produce a 0-byte wav file. I just hoped that the song I was looking for didn't happen to be one of these "buggy" modules. However, sometimes a buggy module produces the opposite behavior, where soundconverter will just continue writing garbage to a wav file, sometimes in excess of 10GiB. With 120,000 total modules, garbage wav files will eat a hard disk quickly if unchecked. In the end, I "remedied" these problems with a simple timeout counter for the soundconverter subprocess and by deleting wav files once they are fingerprinted.
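A timeout guard of this kind can be sketched with the standard library's `subprocess.run`. This is an illustration rather than the code molasses uses: the function name and parameters are made up, and the converter command is passed in as a plain argument list because the exact soundconverter invocation isn't shown here.

```python
import os
import subprocess
from typing import List


def convert_with_timeout(cmd: List[str], wav_path: str, timeout_s: float) -> bool:
    """Run a converter command; return True only if it finished and wrote data."""
    try:
        # On timeout, subprocess.run kills the child before raising TimeoutExpired.
        subprocess.run(
            cmd,
            timeout=timeout_s,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        ok = False
    else:
        # A 0-byte (or missing) wav counts as a failed conversion.
        ok = os.path.exists(wav_path) and os.path.getsize(wav_path) > 0
    if not ok and os.path.exists(wav_path):
        os.remove(wav_path)  # don't let partial or runaway output pile up
    return ok
```

A timeout only bounds how long a runaway conversion can keep writing, so the limit has to be short enough that the garbage file stays manageable. After a successful conversion, the caller would fingerprint the wav and then delete it.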
As you can imagine, trying to match a song against 120,000 candidates takes a long time, and this is where Molasses gets its name. However, it doesn't take that long, and modules can be fingerprinted at roughly 10X real-time speed on a fairly modern multicore CPU (molasses uses the multiprocessing module for parallel processing). All in all, it takes on the order of a few days to fingerprint the entire 2007 snapshot.
Even if a few days sounds like a long time, it's nothing compared to the first implementation of Molasses, which was on the order of 40X slower than the version that was eventually used to search the mod archive.
The first 10X performance improvement came from using a not-totally-naive data structure for matching the fingerprints of songs. Each song produces hundreds of thousands of fingerprint-time pairs, which are then compared with other songs. Comparing two songs (a, b) involves checking, for every fingerprint in song a, whether song b has a matching fingerprint, and storing the offset between the two fingerprint times if there is a match. Naively, if fingerprint-time pairs are stored in flat lists or arrays, comparing songs is an O(n²) operation. Storing things as maps from fingerprints->lists of times makes each lookup "O(1)" on average, so comparing two songs becomes roughly linear in the number of fingerprints.
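The map-based comparison can be sketched in a few lines. The names below are illustrative, not taken from molasses. Picking the most common offset as the match score is an assumption, borrowed from how fingerprint matchers like dejavu usually decide that two recordings line up.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, int]  # (fingerprint, time)


def index_song(pairs: Iterable[Pair]) -> Dict[Hashable, List[int]]:
    """Build the fingerprint -> list-of-times map for one song."""
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for fp, t in pairs:
        index[fp].append(t)
    return index


def best_offset(song_a: Iterable[Pair],
                index_b: Dict[Hashable, List[int]]) -> Tuple[int, int]:
    """Return (offset, votes) for the most common time offset between a and b."""
    offsets: Counter = Counter()
    for fp, t_a in song_a:
        # One dict lookup replaces a scan over every pair in song b.
        for t_b in index_b.get(fp, ()):
            offsets[t_b - t_a] += 1
    if not offsets:
        return (0, 0)  # no shared fingerprints at all
    return offsets.most_common(1)[0]
```

For example, if song a's fingerprints all show up in song b ten time units later, `best_offset` returns an offset of 10 with one vote per matching fingerprint, while accidental matches scatter across other offsets.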
The second 4X* performance improvement came from parallel processing via the multiprocessing module. Making things multiprocessing-friendly was relatively straightforward: all that had to be done was a quick hack to pack the state of an object so that it could be handled by a function acceptable to Pool.map().
*Depends on how many cores/threads your hardware supports.
Eventually, I found the song.