Tuesday, December 11, 2012

How to process a million songs in 20 minutes « Music Machinery

How to process a million songs in 20 minutes « Music Machinery: "The recently released Million Song Dataset (MSD), a  collaborative project between The Echo Nest and Columbia’s LabROSA is a fantastic resource for music researchers. It contains detailed acoustic and contextual data for a million songs. However, getting started with the dataset can be a bit daunting. First of all, the dataset is huge (around 300 gb) which is more than most people want to download.  Second, it is such a big dataset that processing it in a traditional fashion, one track at a time, is going to take a long time.  Even if you can process a track in 100 milliseconds, it is still going to take over a day to process all of the tracks in the dataset.  Luckily there are some techniques such as Map/Reduce that make processing big data scalable over multiple CPUs.  In this post I shall describe how we can use Amazon’s Elastic Map Reduce to easily process the million song dataset."

'via Blog this'