Home - nrbrase/WebScraper GitHub Wiki

Welcome to the WebScraper Wiki!

We are building a web scraper that collects data and stores it in a text file. Users will be able to point the scraper at a website and scrape information from it.
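As a rough illustration of the "store in a text file" step, here is a minimal sketch using only the Java standard library. The class name `ResultWriter`, the file name `stores.txt`, and the tab-separated line layout are assumptions made for this example, not decisions the project has committed to.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: the name and line format are illustrative only.
final class ResultWriter {

    /** Writes one scraped record per line, replacing any existing file. */
    static void writeLines(Path file, List<String> lines) {
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file back, mainly useful for checking what was written. */
    static List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder record; real lines would come from the scraper.
        writeLines(Path.of("stores.txt"), List.of(
                "1234\t1 Main St, Springfield\t(555) 010-0000"));
    }
}
```

A delimited layout such as tab-separated values keeps the text file easy to load into a database later, which matches the stated goal of updating databases from the results.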

1. Project standards: MVP

Using the Java library jsoup, we will scrape the website of a pharmacy company (CVS) for one state to retrieve store locations, store numbers, and phone numbers. This information will be used to update databases as needed.
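The extraction step could look roughly like the sketch below. It parses an inline HTML string so that it runs without network access. The markup and the selectors (`div.store`, `.address`, `.phone`, `data-store-number`) are invented for illustration and do not describe the real CVS pages; the actual selectors would have to be worked out by inspecting the site.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical parser: class name, fields, and selectors are assumptions.
final class StoreParser {

    /** One scraped store: the three fields named in the MVP. */
    static final class Store {
        final String address;
        final String storeNumber;
        final String phone;

        Store(String address, String storeNumber, String phone) {
            this.address = address;
            this.storeNumber = storeNumber;
            this.phone = phone;
        }
    }

    /** Extracts store entries from an HTML string using made-up selectors. */
    static List<Store> parse(String html) {
        Document doc = Jsoup.parse(html);
        List<Store> stores = new ArrayList<>();
        for (Element entry : doc.select("div.store")) {
            // selectFirst returns null when nothing matches, so guard each lookup.
            Element address = entry.selectFirst(".address");
            Element phone = entry.selectFirst(".phone");
            stores.add(new Store(
                    address == null ? "" : address.text(),
                    entry.attr("data-store-number"),
                    phone == null ? "" : phone.text()));
        }
        return stores;
    }

    public static void main(String[] args) {
        String html = "<div class='store' data-store-number='1234'>"
                + "<span class='address'>1 Main St, Springfield</span>"
                + "<span class='phone'>(555) 010-0000</span></div>";
        for (Store s : parse(html)) {
            System.out.println(s.storeNumber + " | " + s.address + " | " + s.phone);
        }
    }
}
```

For a live page, the `Document` would come from `Jsoup.connect(url).get()` instead of `Jsoup.parse(html)`; the rest of the method would stay the same.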

1.1 Feature 1: Parallel running

For the chosen company (CVS), we plan to run the data scraping in parallel. This will shorten the run time so that results are produced in a usable time frame.
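One possible way to parallelize the scrape is a fixed-size thread pool from `java.util.concurrent`, sketched below. This is only one option and not necessarily the approach the project will take. `ParallelScraper`, `scrapeAll`, and the `scrapeOne` function are hypothetical names; `scrapeOne` stands in for whatever fetches and parses a single page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one scrape task per URL on a shared thread pool.
final class ParallelScraper {

    /** Applies scrapeOne to every URL concurrently; results keep the input order. */
    static <R> List<R> scrapeAll(List<String> urls, Function<String, R> scrapeOne, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> scrapeOne.apply(url));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns futures in task order.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Scrape was interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A scrape task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholder URLs and a stand-in "scrape" that just measures the URL length.
        List<String> urls = List.of("https://example.com/a", "https://example.com/b");
        System.out.println(scrapeAll(urls, String::length, 4));
    }
}
```

A bounded pool limits how many requests hit the site at once, which is usually kinder to the target server than starting one thread per page.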

1.2 Feature 2: All Locations

Using one pharmacy (CVS), we will scrape every state and gather all of the location information: locations, store numbers, and phone numbers.

1.3 Feature 3: Progress bar

For the selected state, we will show a progress bar while each location is printed. This will be an indeterminate progress bar, indicating that scraping is in progress.
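The wiki does not say which GUI toolkit will be used. Assuming Swing, an indeterminate progress bar could be sketched as follows. `ScrapeProgress` and its methods are hypothetical names, and the `Runnable` passed to `runWithBar` stands in for the real scraping work.

```java
import javax.swing.JFrame;
import javax.swing.JProgressBar;
import javax.swing.SwingUtilities;
import javax.swing.SwingWorker;

// Hypothetical helper, assuming a Swing GUI.
final class ScrapeProgress {

    /** Creates a bar that animates without showing a percentage. */
    static JProgressBar newIndeterminateBar() {
        JProgressBar bar = new JProgressBar();
        bar.setIndeterminate(true);
        bar.setString("Scraping...");
        bar.setStringPainted(true);
        return bar;
    }

    /** Runs the scrape off the event dispatch thread, then marks the bar as done. */
    static void runWithBar(JProgressBar bar, Runnable scrape) {
        new SwingWorker<Void, Void>() {
            @Override
            protected Void doInBackground() {
                scrape.run();
                return null;
            }

            @Override
            protected void done() {
                // done() is called on the event dispatch thread, so UI updates are safe here.
                bar.setIndeterminate(false);
                bar.setValue(bar.getMaximum());
                bar.setString("Done");
            }
        }.execute();
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("WebScraper");
            JProgressBar bar = newIndeterminateBar();
            frame.add(bar);
            frame.pack();
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
            // Stand-in for the real scrape: sleep for two seconds.
            runWithBar(bar, () -> {
                try {
                    Thread.sleep(2000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        });
    }
}
```

Running the scrape in a `SwingWorker` keeps the window responsive; doing the work directly on the event dispatch thread would freeze the bar's animation.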

Roles (subject to change at the company's/group's discretion):

Kim: Scraping, GUI, YouTube uploads

Scrape CVS for the store information given
Set up basic GUI
YouTube uploads for each section of the project

Nick: Scraping, parallel running, dropdown menu

Scrape CVS for the store information given
Run the scrape in parallel for a faster, usable runtime
Create a dropdown menu to select the state of choice

Stephanie: Create a landing page, progress bar in GUI, general documentation

Create the landing page per the requirements on the course website
Add an indeterminate progress bar to the main GUI so users know the scrape is making progress
Update general documentation, README, and how-tos