Home - nrbrase/WebScraper GitHub Wiki

Welcome to the WebScraper Wiki!

We are building a web scraper that collects data from a website and stores it in a text file. Users will be able to point the scraper at a website and extract information from it.

1. Project standards: MVP

Using the Java library jsoup, we will scrape the website of a pharmacy company (CVS) for one state to retrieve each store's location, store number, and phone number. This information will be used to update databases as needed.
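As a rough sketch of the storage side of the MVP only (the jsoup parsing step is not shown), the three fields per store could be held in a small class and written one store per line to a text file. The names `StoreRecord`, `StoreFileWriter`, and the tab-separated line format are assumptions made for illustration, not the project's actual design, and the sample data is placeholder data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the three fields the MVP collects per store.
final class StoreRecord {
    final String location;
    final String storeNumber;
    final String phone;

    StoreRecord(String location, String storeNumber, String phone) {
        this.location = location;
        this.storeNumber = storeNumber;
        this.phone = phone;
    }

    // One tab-separated line per store (an assumed output format).
    String toLine() {
        return String.join("\t", location, storeNumber, phone);
    }
}

// Hypothetical helper that writes the scraped stores to a text file.
final class StoreFileWriter {
    static void write(Path file, List<StoreRecord> stores) throws IOException {
        List<String> lines = new ArrayList<>();
        for (StoreRecord store : stores) {
            lines.add(store.toLine());
        }
        // Creates the file if needed and replaces any previous content.
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data, not real scraped results.
        List<StoreRecord> stores = Arrays.asList(
                new StoreRecord("123 Example St, Springfield", "00001", "555-0100"),
                new StoreRecord("456 Sample Ave, Springfield", "00002", "555-0101"));
        Path out = Files.createTempFile("stores", ".txt");
        write(out, stores);
        System.out.println("Wrote " + stores.size() + " stores to " + out);
    }
}
```

A delimited line per store keeps the file easy to import into a database later, which is the stated use of the data.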

1.1 Feature 1: Parallel running

For the chosen company (CVS), we plan to run the data scraping in parallel. This will shorten the run time so that results are produced in a usable time frame.
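One possible way to run the scrape in parallel is a fixed-size thread pool from `java.util.concurrent`, with one task per page. The sketch below is an assumption about how this could look, not the project's implementation: `ParallelScrape`, `scrapeAll`, and the URLs are hypothetical, and the per-page scrape function is a stand-in that does no network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: applies a per-page scrape function to many URLs at once.
final class ParallelScrape {
    // Results are returned in the same order as the input URLs.
    static <T> List<T> scrapeAll(List<String> urls, Function<String, T> scrapeOne, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<T> task = () -> scrapeOne.apply(url);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                // Blocks until that page's task has finished.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URLs; the real store pages are not listed here.
        List<String> urls = Arrays.asList(
                "https://example.com/store/1",
                "https://example.com/store/2",
                "https://example.com/store/3");
        // Stand-in for the real scrape step (which would use jsoup).
        List<String> results = scrapeAll(urls, url -> "scraped " + url, 4);
        for (String result : results) {
            System.out.println(result);
        }
    }
}
```

Because scraping is dominated by waiting on the network, a pool with more threads than CPU cores can still help, though the pool size should stay modest to avoid overloading the target site.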

1.2 Feature 2: All Locations

Using one pharmacy (CVS), we will scrape every state and gather the information for every location: location, store number, and phone number.

1.3 Feature 3: Progress bar

For the selected state, we will show a progress bar while each location is printed. This will be an indeterminate progress bar, which shows that scraping is under way.
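Assuming the GUI is built with Swing (the wiki does not say which toolkit is used), an indeterminate progress bar could be sketched with `JProgressBar` and a `SwingWorker` that runs the scrape off the event dispatch thread. The class `ScrapeProgress` and its method names are hypothetical.

```java
import javax.swing.JProgressBar;
import javax.swing.SwingWorker;

// Hypothetical helper for the "scraping in progress" indicator.
final class ScrapeProgress {
    // Creates a bar that animates without showing a percentage.
    static JProgressBar newIndeterminateBar() {
        JProgressBar bar = new JProgressBar();
        bar.setIndeterminate(true);
        bar.setStringPainted(true);
        bar.setString("Scraping...");
        return bar;
    }

    // Runs the scrape job in the background and stops the animation when it ends.
    static SwingWorker<Void, Void> runWithBar(JProgressBar bar, Runnable scrapeJob) {
        SwingWorker<Void, Void> worker = new SwingWorker<Void, Void>() {
            @Override
            protected Void doInBackground() {
                scrapeJob.run();
                return null;
            }

            @Override
            protected void done() {
                // done() is called on the event dispatch thread, so touching the bar is safe.
                bar.setIndeterminate(false);
                bar.setValue(bar.getMaximum());
                bar.setString("Done");
            }
        };
        worker.execute();
        return worker;
    }
}
```

In a real window, the bar returned by `newIndeterminateBar()` would be added to the main frame, and `runWithBar` would be called when the user starts a scrape. Keeping the scrape off the event dispatch thread is what lets the bar keep animating.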

Roles (subject to change at the company's/group's discretion):

Kim: Scraping, GUI, YouTube uploads

  1. Scrape CVS for the store information given
  2. Set up basic GUI
  3. YouTube uploads for each section of the project

Nick: Scraping, parallel running, dropdown menu

  1. Scrape CVS for the store information given
  2. Run the scrape in parallel for a faster, usable run time
  3. Create a dropdown menu to select the state of choice

Stephanie: Create a landing page, progress bar in GUI, general documentation

  1. Create the landing page per the requirements on the course website
  2. Add an indeterminate progress bar to the main GUI so users know that it is making progress
  3. Update general documentation, README, and how-tos