Facts, hacks and attacks from my life as a web application developer
Nice! I was just busy this week using BeautifulSoup and mechanize to screen-scrape some supermarket sites - will publish a post on it soon. I definitely like the idea of using screen-scraping to back up data that's held on otherwise-closed systems... I'm lookin' at Facebook here...
Facebook actually has a decent API. I wish Amazon had one for this information. Screen scraping always makes me feel dirty ;)
There is a very small off-by-one bug: range() excludes its upper bound, so the current year is never fetched.

--- amazon_order.py.old 2011-02-24 16:18:03.000000000 +0000
+++ amazon_order.py 2011-02-24 16:18:13.000000000 +0000
@@ -55,7 +55,7 @@ if __name__ == '__main__':
         sys.exit(1)
 
     orders =
-    for year in range(int(options.firstyear), datetime.datetime.now().year):
+    for year in range(int(options.firstyear), datetime.datetime.now().year+1):
         orders_html = br.open("https://www.amazon.com/gp/css/history/orders/view.html?orderFilter=year-%s&startAtIndex=1000" % year)
         new_orders = _parse_orders(orders_html.read())
         if new_orders:
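To see why the +1 matters, here is a minimal standalone sketch of the year loop. The helper names (years_to_fetch, order_urls) and the `today` parameter are made up for illustration and are not part of amazon_order.py; only the URL pattern comes from the patch.

```python
import datetime

# URL pattern taken from the patch; %s is filled with the year.
ORDER_URL = ("https://www.amazon.com/gp/css/history/orders/view.html"
             "?orderFilter=year-%s&startAtIndex=1000")


def years_to_fetch(first_year, today=None):
    """Return every year from first_year up to and including the current one.

    range() stops before its upper bound, so the bound must be year + 1.
    `today` is a hypothetical parameter that makes the function testable.
    """
    today = today or datetime.date.today()
    return list(range(int(first_year), today.year + 1))


def order_urls(first_year, today=None):
    """Build one order-history URL per year (nothing is fetched here)."""
    return [ORDER_URL % year for year in years_to_fetch(first_year, today)]
```

Without the +1, running the script in February 2011 with a first year of 2009 would only cover 2009 and 2010, so the most recent orders would be silently missing.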
Amazon recently refreshed their site layout, which makes this script break, but an easy fix is changing the user agent to one for Opera, which they're still serving the old layout to.
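The workaround above comes down to sending a different User-Agent header. Here is a rough sketch using only the standard library; the Opera string is just one example of such a user agent (whether Amazon still serves the old layout to it is not something I can promise), and build_request is a made-up helper name. With mechanize, which the script appears to use, the equivalent is setting the browser's addheaders list to [("User-agent", ...)].

```python
import urllib.request

# Example Opera user agent string; an assumption, swap in whichever one works.
OPERA_USER_AGENT = "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.7.62 Version/11.01"


def build_request(url, user_agent=OPERA_USER_AGENT):
    """Create a request that identifies itself as Opera.

    This only builds the request object; nothing is sent until it is
    passed to urllib.request.urlopen().
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```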
I was wondering how I can use this to create a sales report for my Amazon account. Thanks.