References - UPOLSearch/UPOL-Search-Engine GitHub Wiki
Technologies used
- BeautifulSoup (parsing HTML)
- Requests (HTTP requests)
- Celery (distributed task queue)
- RabbitMQ (message broker)
Web
- http://blog.mischel.com/2011/12/13/writing-a-web-crawler-introduction/ (the whole series on writing a web crawler is very useful)
- http://blog.mischel.com/2008/05/06/more-on-robots-exclusion/
- http://blog.mischel.com/2008/05/05/struggling-with-the-robots-exclusion-standard/
- https://blog.scrapinghub.com/2016/08/25/how-to-crawl-the-web-politely-with-scrapy/
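Two practices discussed in the articles above are obeying a site's robots.txt and spacing out requests to the same host. Below is a minimal sketch of both using only Python's standard `urllib.robotparser`. It assumes the robots.txt text has already been downloaded, for example with Requests. The names `make_robots_parser`, `HostThrottle`, `SAMPLE_ROBOTS_TXT` and the `upol-crawler` agent string are hypothetical illustrations, not taken from the UPOL Search Engine code.

```python
import time
import urllib.robotparser
from urllib.parse import urlsplit

# Hypothetical robots.txt content; a real crawler would first download it from
# https://<host>/robots.txt (for example with Requests).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_robots_parser(robots_txt):
    """Parse already-downloaded robots.txt text without touching the network."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class HostThrottle:
    """Hypothetical helper: keeps at least `delay` seconds between requests to one host."""

    def __init__(self, default_delay=1.0):
        self.default_delay = default_delay
        self._next_allowed = {}  # host -> earliest time the next request may start

    def reserve(self, url, now, delay=None):
        """Book the next request slot for url's host and return how long to wait first."""
        host = urlsplit(url).netloc
        if delay is None:
            delay = self.default_delay
        start = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = start + delay
        return start - now


if __name__ == "__main__":
    agent = "upol-crawler"  # hypothetical User-agent string
    robots = make_robots_parser(SAMPLE_ROBOTS_TXT)
    # crawl_delay() returns None when robots.txt sets no Crawl-delay;
    # reserve() then falls back to the throttle's default delay.
    delay = robots.crawl_delay(agent)
    throttle = HostThrottle()
    urls = ["https://example.org/", "https://example.org/private/x", "https://example.org/about"]
    for url in urls:
        if not robots.can_fetch(agent, url):
            print("skip (robots.txt):", url)
            continue
        time.sleep(throttle.reserve(url, now=time.monotonic(), delay=delay))
        print("fetch:", url)  # the actual HTTP request would be made here
```

Keeping `reserve` free of sleeping and clock reads makes the scheduling logic easy to test. In a Celery-based crawler, the returned wait could instead be passed as a task `countdown` rather than blocking a worker with `time.sleep`.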