# To do

## Next Up
Decide between a website and a web app (hosted on Heroku): a RESTful Django (Python) website vs. a RESTful Flask API (Python).
0.0) To mine extra info from all the hackathon main pages, use the tags in the `<head>` (see the sketch below). Google:

- which tags help with SEO
- HTML5 documentation: `<head>` tag conventions
- HTML5 `<head>` tag coding standards
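
A minimal sketch of this head-tag mining, assuming `requests` and `BeautifulSoup` (bs4); the URL is a placeholder and the tag whitelist (description, keywords, OpenGraph) is just the usual SEO set, not a confirmed list for these sites:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example-hackathon.com"  # placeholder URL

# Parse only what we need: the <title> and the SEO-relevant <meta> tags.
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
info = {"title": soup.title.string if soup.title else None}

for meta in (soup.head.find_all("meta") if soup.head else []):
    # Standard meta tags use name=...; OpenGraph tags use property="og:...".
    key = meta.get("name") or meta.get("property")
    if key in ("description", "keywords", "og:title", "og:description"):
        info[key] = meta.get("content")

print(info)
```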
0.1) Make a log file that stores the date and time the crawler/scraper was last run (see the sketches after this list):

- Use a cron job (a Linux daemon) to scrape once every 5 days, or whatever interval suits better.
- Save a text-file version every time the crawler runs, e.g. V2018_02_12.txt for 12th Feb. Implement versioning and DB log files to track changes; start by implementing the concept at a basic level.
- Batch file, config file, etc. Goal: run the scrawl.py script once per day, after the laptop boots or resumes from sleep [executing the script once per day remotely].
- Finally, implement repeated crawling (an IR concept): refresh the database every 24 hours once the project is hosted on the cloud. The script will then reside on the cloud (see the refresh-loop sketch below).
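
A minimal sketch of the last-run log, the dated snapshot files, and a run-gap guard from the list above. The file names (`crawler.log`, `hackathons.txt`) are placeholders, and the scraper call itself is elided:

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

LOG_FILE = Path("crawler.log")   # one ISO timestamp per line, newest last
MIN_GAP = timedelta(days=5)      # scrape at most once every 5 days

def last_run():
    """Return the datetime of the last logged run, or None."""
    if not LOG_FILE.exists():
        return None
    lines = LOG_FILE.read_text().splitlines()
    return datetime.fromisoformat(lines[-1]) if lines else None

def record_run(now):
    """Append this run's timestamp to the log file."""
    with LOG_FILE.open("a") as f:
        f.write(now.isoformat() + "\n")

def save_snapshot(db_file, now):
    """Copy the scraped data to a dated version, e.g. V2018_02_12.txt."""
    shutil.copy(db_file, Path(now.strftime("V%Y_%m_%d.txt")))

now = datetime.now()
prev = last_run()
if prev is None or now - prev >= MIN_GAP:
    # ... run the scraper here, writing its results to hackathons.txt ...
    save_snapshot("hackathons.txt", now)
    record_run(now)
```

The same guard covers the once-per-day case: set `MIN_GAP = timedelta(days=1)`, call the script from a startup/wake hook (batch file, cron, etc.), and it will only actually scrape when a day has passed.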
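
And a minimal sketch of the 24-hour refresh loop for the cloud-hosted version; `crawl()` is a hypothetical stand-in for the real scraper entry point (whatever scrawl.py exposes). On a cloud host a cron entry is the more robust option, but a plain loop works as a first cut:

```python
import time

def crawl():
    """Placeholder for the real scraper entry point (e.g. from scrawl.py)."""
    print("crawling...")

REFRESH_SECONDS = 24 * 60 * 60  # refresh the database once per day

while True:
    crawl()
    time.sleep(REFRESH_SECONDS)
```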