Scraping - robertervin/amazon-project GitHub Wiki
Notes:
If you are not running this on a Debian-based machine, set the PROJECT_PATH variable in amazon-project/amazon-scraper/amazon-scraper/settings.py to the directory where your project is stored.
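As a rough illustration, the override in settings.py might look like the sketch below. The directory used here is a hypothetical placeholder, not the project's actual default; substitute the location of your own checkout.

```python
import os

# Hypothetical location: replace with the directory that holds your checkout.
# expanduser() turns the leading "~" into the current user's home directory.
PROJECT_PATH = os.path.expanduser("~/amazon-project")
```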
Testing
To test whether the parser works correctly, navigate to amazon-project/amazon_scraper and run scrapy crawl amazon.com. To check that the model dictionaries hold the correct data, you can print them and pause the run with raw_input() while you inspect the values. traceback.print_exc() is also useful for debugging exceptions.
To test a particular search page, paste the page's URL into the start_urls list and comment out all other URLs in that list.
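One way to combine the two debugging aids is sketched below. The helper name inspect_item and its parameters are hypothetical and not part of the project; the sketch assumes each model is a plain dictionary and falls back to input() when raw_input() is unavailable (Python 3).

```python
import traceback

try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input  # Python 3


def inspect_item(build_item, pause=read_line):
    """Build one item, print its fields, and wait for Enter before continuing.

    build_item is any zero-argument callable that returns a dictionary
    (hypothetical stand-in for the code that fills a model dictionary).
    """
    try:
        item = build_item()
    except Exception:
        # Print the full stack trace to stderr instead of failing silently.
        traceback.print_exc()
        return None
    for key in sorted(item):
        print("%s: %r" % (key, item[key]))
    pause("Press Enter to continue...")
    return item
```

Inside a parse callback, this could be called as inspect_item(lambda: build_model(response)), where build_model stands in for whatever code populates the dictionary. Remove the call before a production run, since the prompt blocks the crawl.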
Production
Navigate to amazon-project/query_titles/views.py. In the scrape() function at the bottom of the file, uncomment the commented-out lines and indent all spider configuration so that it sits inside setup_crawler(). This will split the URLs into chunks of 4 and spin up 4 amazon.com spiders.
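The chunking step can be sketched as follows. This is not the code in views.py: split_into_chunks is a hypothetical helper, and the sketch assumes the intent is four roughly equal groups of URLs, one per spider.

```python
def split_into_chunks(urls, n_chunks=4):
    """Split urls into at most n_chunks contiguous, near-equal lists."""
    base, extra = divmod(len(urls), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional URL each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer URLs than chunks: skip empty groups
        chunks.append(urls[start:start + size])
        start += size
    return chunks
```

Each resulting list would then be handed to one spider as its start_urls, so four chunks correspond to four spiders. With fewer than four URLs, the sketch simply produces fewer chunks.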