Shadow Seals Crawler - ActoKids/web-crawler GitHub Wiki

The SSCrawler opens the calendar page at http://www.shadowsealsswimming.org/Calendar.html, crawls the event data from the table on that page, and generates a JSON file with the results.

Dependencies

Python - https://www.python.org/downloads/
Pip - https://pip.pypa.io/en/stable/installing/
Beautiful Soup 4 - pip install beautifulsoup4
dateutil - pip install python-dateutil
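datetime - part of the Python standard library, so no separate install is needed (the DateTime package on PyPI is a different, Zope-related library)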
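
Before running the crawler, it can help to confirm that the third-party packages are importable. The following is a minimal sketch, not part of SSCrawler.py; the helper name find_missing is hypothetical. Note that the import names differ from the pip package names (beautifulsoup4 is imported as bs4, python-dateutil as dateutil).

```python
import importlib.util

# Maps import name -> pip package name for the third-party dependencies.
REQUIRED_MODULES = {
    "bs4": "beautifulsoup4",
    "dateutil": "python-dateutil",
}


def find_missing(modules):
    """Return the pip names of modules that cannot be found for import."""
    return [
        pip_name
        for module, pip_name in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```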

How it works

The crawler opens the website http://www.shadowsealsswimming.org/Calendar.html and parses all the events from the table. It produces a list of event data and outputs a JSON file. While running, it displays a timestamp for its start time and end time, logs whether connecting to the link succeeded or failed, and lists the links of the events found.
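
The table-parsing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual SSCrawler.py code: the sample HTML, the two-column layout (date, then description), the parse_events helper, and the rule for deriving event_name are all hypothetical, and the real calendar page may be structured differently. It runs on an inline string so that no network access is needed.

```python
from bs4 import BeautifulSoup
from dateutil import parser as date_parser

# Hypothetical stand-in for the calendar table; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Event</th></tr>
  <tr><td>November 18, 2018 2:45 PM</td><td>Practice - Shoreline Pool</td></tr>
  <tr><td>November 25, 2018 2:45 PM</td><td>Practice - Shoreline Pool</td></tr>
</table>
"""


def parse_events(html):
    """Extract one dict per table row that contains at least two data cells."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    table = soup.find("table")
    if table is None:
        return events
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) < 2:
            continue  # header rows use <th>, so they yield no <td> cells
        start = date_parser.parse(cells[0])
        events.append({
            "start_date_time": str(start),
            "description": cells[1],
            # Assumed rule: the text before the first " - " is the event name.
            "event_name": cells[1].split(" - ")[0],
        })
    return events


events = parse_events(SAMPLE_HTML)
for event in events:
    print(event)
```

In a real run, the HTML would come from fetching the calendar URL, and a failed connection would be logged instead of parsed.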

Running the crawler

Make sure ALL dependencies are installed before running. Once that is done, simply run the SSCrawler.py script and wait until it finishes creating the JSON file.

Example of output

Following is the JSON generated by SSCrawler for a single event (all values are strings):

    {
        "activity_type": "Contact organizer for details",
        "approver": "N/A",
        "contact_email": "Contact organizer for details",
        "contact_name": "Contact organizer for details",
        "contact_phone": "Contact organizer for details",
        "cost": "Contact organizer for details",
        "created_timestamp": "2019-03-16 03:32:28.601277",
        "description": "Practice - Shoreline Pool",
        "disability_types": "Contact organizer for details",
        "end_date_time": "Contact organizer for details",
        "event_id": "43a5efa9-2fbd-514e-be09-55f9489071e3",
        "event_link": "http://www.shadowsealsswimming.org/Calendar.html",
        "event_name": "Practice ",
        "event_status": "pending",
        "frequency": "Contact organizer for details",
        "inclusive_event": "Contact organizer for details",
        "location_address": "Unknown",
        "location_name": "Contact organizer for details",
        "max_age": "Contact organizer for details",
        "min_age": "Contact organizer for details",
        "org_name": "Shadow Seals Swimming",
        "picture_url": "<img src=\"http://www.shadowsealsswimming.org/images/c03de8f057c6e9610d95a251f7085d95_944j.png\">",
        "start_date_time": "2018-11-18 14:45:00",
        "user_name": "None"
    }
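
A record with these fields could be assembled along the following lines. This is a sketch under assumptions rather than the crawler's actual code: the build_event helper and its parameters are hypothetical. The sample event_id has the form of a version 5 (name-based) UUID, but how SSCrawler.py derives it is not documented here, so the seed used below (calendar URL plus event name plus start time) is only a guess and will not reproduce the id shown above.

```python
import json
import uuid
from datetime import datetime

PLACEHOLDER = "Contact organizer for details"
CALENDAR_URL = "http://www.shadowsealsswimming.org/Calendar.html"

# Fields the calendar table does not provide, filled with the placeholder text.
PLACEHOLDER_FIELDS = [
    "activity_type", "contact_email", "contact_name", "contact_phone",
    "cost", "disability_types", "end_date_time", "frequency",
    "inclusive_event", "location_name", "max_age", "min_age",
]


def build_event(event_name, description, start_date_time, picture_url):
    """Build one event record with the same keys as the sample output."""
    record = {field: PLACEHOLDER for field in PLACEHOLDER_FIELDS}
    # Assumed seed for the name-based UUID; the real crawler may use another.
    seed = CALENDAR_URL + event_name + start_date_time
    record.update({
        "approver": "N/A",
        "created_timestamp": str(datetime.now()),
        "description": description,
        "event_id": str(uuid.uuid5(uuid.NAMESPACE_URL, seed)),
        "event_link": CALENDAR_URL,
        "event_name": event_name,
        "event_status": "pending",
        "location_address": "Unknown",
        "org_name": "Shadow Seals Swimming",
        "picture_url": picture_url,
        "start_date_time": start_date_time,
        "user_name": "None",
    })
    return record


record = build_event(
    "Practice",
    "Practice - Shoreline Pool",
    "2018-11-18 14:45:00",
    "<img src=\"http://www.shadowsealsswimming.org/images/example.png\">",  # hypothetical image
)
# sort_keys=True gives the alphabetical key order seen in the sample output.
print(json.dumps(record, indent=4, sort_keys=True))
```

Because a name-based UUID is deterministic, crawling the same event twice yields the same event_id under this scheme, which makes duplicates easy to detect.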