Usage - gskoljarev/bloshi GitHub Wiki

Introduction

bloshi consists of two components:

  • scraper: component for parsing data
  • bloshi: component for storing, viewing and working with data

scraper uses so-called spiders to parse data.

bloshi stores spider definitions and the data they parse.

General workflow

Add shop data:

  • Add availabilities
  • Add categories
  • Create a shop
  • Add shop availabilities: connect to availabilities; add the keyword identifier that appears in the parsed data
  • Add shop categories: connect to categories; add URL from which to parse data
  • Add spider: connect to shop; add XPaths for the next page, selector, and fields
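
The shop-setup steps above can be sketched as a plain data structure. All names and field shapes below are illustrative assumptions, not bloshi's actual schema:

```python
# Hypothetical sketch of the shop data described above; bloshi keeps this
# in its database, but the information has roughly this shape.
shop = {
    "name": "Example Shop",
    "code": "VMN",  # short code used when launching the spider
    # shop availabilities: keyword identifier in parsed data -> availability
    "availabilities": {
        "in stock": "available",
        "sold out": "unavailable",
    },
    # shop categories: category code -> URL from which to parse data
    "categories": {
        "PC": "https://www.example.com/pc-components",
    },
}

def availability_for(shop, keyword):
    """Map a keyword found in parsed data to an availability (None if unknown)."""
    return shop["availabilities"].get(keyword.strip().lower())
```

For example, `availability_for(shop, " In Stock ")` returns `"available"`, while an unrecognized keyword returns `None`.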

Add spider data:

  • Connect to shop
  • Add initial request URL: for example, for switching to list or grid view
  • Add XPaths for next page & selector
  • Add field data: input and output processors, type, and XPath
  • Check 'parse detailed info' if needed; when enabled, the spider follows each item's URL and parses fields of the 'detail' type. Note that this slows down parsing.
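
Field input and output processors work in the style of Scrapy's ItemLoader: the input processor cleans each extracted value, and the output processor collapses the resulting list into the stored value. A minimal stdlib-only sketch of that idea (these helpers are illustrative, not bloshi's actual code):

```python
def map_compose(*funcs):
    """Input-processor style: apply each function to every value, dropping None."""
    def process(values):
        for func in funcs:
            values = [result for value in values
                      if (result := func(value)) is not None]
        return values
    return process

def take_first(values):
    """Output-processor style: return the first non-empty value."""
    for value in values:
        if value not in (None, ""):
            return value
    return None

# A hypothetical 'price' field: strip whitespace, then drop the currency sign.
clean = map_compose(str.strip, lambda v: v.replace("$", "") or None)
raw_values = ["  $19.99 ", ""]          # what an XPath might extract
price = take_first(clean(raw_values))   # -> "19.99"
```

Splitting the work this way keeps per-value cleanup (input) separate from the decision of what to store (output), which is why each field carries both processors.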

Parse data

workon bloshiproject
cd scraper

Parse & save data for shop with code VMN

scrapy crawl spider -a save=1 -a shop=VMN 

Parse & save data for shop with code AHL and category with code PC

scrapy crawl spider -a category=PC -a save=1 -a shop=AHL 

Parse data for shop with code AHL and category with code PC, without saving (for testing only)

scrapy crawl spider -a category=PC -a save=0 -a save_temp=0 -a shop=AHL
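
Scrapy passes each `-a name=value` pair to the spider as a string keyword argument, so flags like `save=1` arrive as the string `"1"` and must be coerced. A sketch of how such arguments might be normalized (the argument names mirror the commands above; the helper itself and its defaults are assumptions, not bloshi's code):

```python
def normalize_spider_args(**kwargs):
    """Coerce string-valued -a arguments into usable types (illustrative)."""
    args = {
        "shop": kwargs.get("shop"),          # required shop code, e.g. "AHL"
        "category": kwargs.get("category"),  # optional category code, e.g. "PC"
        # Scrapy delivers -a values as strings, so "0"/"1" must become bools.
        "save": kwargs.get("save", "0") == "1",
        "save_temp": kwargs.get("save_temp", "1") == "1",
    }
    if not args["shop"]:
        raise ValueError("a shop code is required (-a shop=CODE)")
    return args

# Equivalent of: scrapy crawl spider -a category=PC -a save=0 -a save_temp=0 -a shop=AHL
args = normalize_spider_args(category="PC", save="0", save_temp="0", shop="AHL")
```

With these inputs, `args` holds `shop="AHL"`, `category="PC"`, and both save flags `False`, matching the "testing only" invocation above.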

View data

workon bloshiproject
cd bloshi
python manage.py runserver 0.0.0.0:8888

View the admin interface at: http://localhost:8888/admin/