Deploying the Product Scraper
Note: Because the scraper accesses and retrieves information directly from the ASOS website, backend changes to the ASOS site that alter the identifying attributes of its content will cause the scraper either to break entirely or to return less consistent information.
The product scraper must be deployed because it fetches the product information from the ASOS website that the admin panel and product display of the Hangout applications rely on. To work with the scraper, first open the utilScraper folder within the asos-hangout-app repo that you should have already downloaded.
We can set how often the scraper repeats; this interval should be longer than a full scrape takes to complete, so repeating every 12 hours is recommended. We can also set the product threshold: the time after which items are dropped from the database. For example, we could drop products after they have been in the database for 7 days. Both “repeat” and “threshold” are given as comma-separated days, hours, minutes, seconds and milliseconds (see the sample config after the list below).
We also need to set the following in the config.json file:
errorEmail -> True/False as to whether an email is sent to you if the scraper encounters errors.
email -> The email address where the error email would be sent.
errorThreshold -> the number of errors the scraper must reach before an email is sent.
db ->
host -> the URL of your database, with the port appended at the end of the URL (the default MySQL port is 3306).
user -> Username used to access the database
password -> Password used to access the database
dbName -> if you have run the .sql script, it should be “**asos**”
tName -> if you have run the .sql script, it should be “**fashionItemsUK**”
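For reference, here is a minimal sketch of what a complete config.json might look like. It assumes the top-level keys are named “repeat” and “threshold” and that the timing values are comma-separated strings; the email address, credentials, and numbers are illustrative placeholders, so check the exact structure against the config file shipped in the utilScraper folder:

```json
{
  "repeat": "0,12,0,0,0",
  "threshold": "7,0,0,0,0",
  "errorEmail": true,
  "email": "admin@example.com",
  "errorThreshold": 5,
  "db": {
    "host": "localhost:3306",
    "user": "scraper",
    "password": "secret",
    "dbName": "asos",
    "tName": "fashionItemsUK"
  }
}
```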
Once you have edited and saved the config file, upload the contents of the utilScraper folder to the same folder on the server where you installed the node modules.
Now SSH into the server so you can execute the following command-line instructions. To start the scraper, navigate to the scraper’s folder, type the following command and hit enter:
forever start scrape.js
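If the forever command is not available on the server, it is a regular npm package and can be installed globally:

```sh
npm install -g forever
```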
If an error reports that a node module could not be found, install the missing modules using the “npm install” command.
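For example, if the error names a missing mysql module (an illustrative module name here; install whichever module the error message actually mentions):

```sh
npm install mysql
```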
To stop the scraper, type the following command and hit enter:
forever stop scrape.js
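To check that the scraper is no longer running, forever can list the scripts it is currently managing:

```sh
forever list
```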
To test functionality, run “node scrape.js” to see all of the logs generated by the program printed to the command line; press Ctrl+C to stop the process. It is best to wait until the first batch of results has been written to the database, as this confirms the database credentials are correct.
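One way to confirm that results have landed in the database, assuming the mysql command-line client is installed and you kept the default “asos” database and “fashionItemsUK” table from the .sql script, is to count the rows in the table (substitute your own host and user):

```sh
mysql -h your-db-host -u your-user -p -e "SELECT COUNT(*) FROM asos.fashionItemsUK;"
```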