Getting Started - GonzaloUlla/unlp-dbd-newsler GitHub Wiki

Welcome to the Newsler wiki!

Getting Started

  • Install Python 3 (you can get it here: https://www.python.org/downloads/release/python-377)
  • Check your environment variables and make sure both the Python install dir (where python.exe lives) and its Scripts dir are in your PATH
  • Download the project
  • Open a PowerShell/CMD window in the project root
  • Install Scrapy and the other requirements using pip: python -m pip install -r requirements.txt
  • Create the Newsler Scrapy project: python -m scrapy startproject newsler
  • Copy the spiders from the scrapy_prototype folder to the spiders folder in the Newsler project (PowerShell): cp .\scrapy_prototype\* .\newsler\newsler\spiders
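  • Run the Newsler spiders from inside the generated newsler folder (the one that contains scrapy.cfg): python -m scrapy crawl TheGuardian (also try CNN, AlJazeera, DW or FoxNews in place of TheGuardian)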
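
If you want to run all the spiders in one go, the crawl step can be scripted. The snippet below is only a minimal sketch and not part of the Newsler repository: the helper names crawl_command, missing_tools and run_all are hypothetical, and it assumes the project directory you pass in is the folder that contains scrapy.cfg. It uses only the Python standard library and the same "python -m scrapy crawl <name>" command shown above.

```python
import shutil
import subprocess
import sys

# Spider names taken from the list in this page.
SPIDERS = ["TheGuardian", "CNN", "AlJazeera", "DW", "FoxNews"]


def crawl_command(spider):
    """Build the command that runs one spider with the current interpreter."""
    return [sys.executable, "-m", "scrapy", "crawl", spider]


def missing_tools(tools=("python", "pip")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def run_all(project_dir, spiders=SPIDERS, dry_run=True):
    """Run every spider from project_dir (assumed to contain scrapy.cfg).

    With dry_run=True nothing is executed; the commands are only returned
    so they can be inspected first.
    """
    commands = [crawl_command(name) for name in spiders]
    if not dry_run:
        for command in commands:
            # check=True stops at the first spider that exits with an error.
            subprocess.run(command, cwd=project_dir, check=True)
    return commands
```

Calling run_all("newsler") just returns the five commands; run_all("newsler", dry_run=False) actually starts the crawls one after another. Calling missing_tools() first is a quick way to confirm the PATH step above worked.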