Saturday, December 06, 2008

rockstar




I recently had a situation where I needed to screen-scrape data from a website. I did it in Ruby with wget, which worked super well.

wget with --post-data rocks


# Run from Ruby (backticks): #{id} is interpolated before the shell sees it
`wget http://website.com --post-data="thingy=#{id}"`
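The `#{id}` in the command above is Ruby string interpolation, so the command has to be run from Ruby rather than typed straight into a shell. Below is a minimal sketch of doing that. The helpers `wget_post_args` and `fetch_with_post` are hypothetical, and `thingy` is the placeholder field name from the post. The sketch URL-encodes the form fields with `URI.encode_www_form` and hands wget an argument array, so no shell quoting is involved. `-q` silences wget's progress output and `-O -` writes the page to stdout so Ruby can capture it.

```ruby
require "uri"

# Build the argument list for a wget POST. Passing an array to IO.popen
# runs wget directly (no shell), so values with quotes or spaces are safe.
def wget_post_args(url, fields)
  post_data = URI.encode_www_form(fields) # e.g. {"thingy" => 42} -> "thingy=42"
  ["wget", "-q", "-O", "-", "--post-data=#{post_data}", url]
end

# Hypothetical helper: run wget and return the response body as a string.
def fetch_with_post(url, fields)
  body = IO.popen(wget_post_args(url, fields), &:read)
  raise "wget exited with status #{$?.exitstatus}" unless $?.success?
  body
end
```

Usage would look like `page = fetch_with_post("http://website.com", "thingy" => id)`.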


It turned out that this alone didn't work: the site was also POSTing some kind of validation code along with the form data. To figure this out I used the excellent Tamper Data plugin for Firefox, which let me see exactly what data was being POSTed to the website.
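The post doesn't say where the validation code came from. One common arrangement, assumed here, is a hidden `<input>` field in the form that has to be sent back with the rest of the data. The sketch below pulls such fields out of the form page with plain regular expressions. A real HTML parser would be more robust. The helper `hidden_fields` and the `token` field name used in the check are hypothetical.

```ruby
# Collect name => value pairs from every <input type="hidden"> tag.
# Regex-based, so it only handles simple, well-quoted attributes.
def hidden_fields(html)
  html.scan(/<input\b[^>]*>/i).each_with_object({}) do |tag, fields|
    next unless tag =~ /\stype\s*=\s*["']hidden["']/i
    name  = tag[/\sname\s*=\s*["']([^"']*)["']/i, 1]
    value = tag[/\svalue\s*=\s*["']([^"']*)["']/i, 1] || ""
    fields[name] = value if name
  end
end
```

The result can be merged with the field you actually want to submit, e.g. `hidden_fields(form_html).merge("thingy" => id)`, before building the POST data. If the code is tied to a session, wget's `--save-cookies`, `--load-cookies` and `--keep-session-cookies` options let the two requests share cookies.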

It was then just a matter of using Ruby to scan through the returned page for the data I wanted and output it as a .csv file.
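Here is a minimal sketch of that last step. It assumes, hypothetically, that the data sits in an HTML table. `String#scan` pulls out each row's `<td>` cells, and Ruby's standard `csv` library handles the quoting. The function names and column headers are illustrative only.

```ruby
require "csv"

# Pull the text of every <td> cell, row by row, out of an HTML table.
# Rows without <td> cells (e.g. a <th> header row) are dropped.
def table_rows(html)
  rows = html.scan(%r{<tr[^>]*>(.*?)</tr>}im).map do |(row_html)|
    row_html.scan(%r{<td[^>]*>(.*?)</td>}im).map do |(cell)|
      cell.gsub(/<[^>]+>/, "").strip # drop inner tags and surrounding whitespace
    end
  end
  rows.reject(&:empty?)
end

# Turn a header plus rows into CSV text; fields containing commas get quoted.
def rows_to_csv(header, rows)
  CSV.generate do |csv|
    csv << header
    rows.each { |row| csv << row }
  end
end
```

Writing the file is then one line, e.g. `File.write("output.csv", rows_to_csv(%w[name value], table_rows(page)))`.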