Shadow Seals Crawler - ActoKids/web-crawler GitHub Wiki
SSCrawler opens the calendar at http://www.shadowsealsswimming.org/Calendar.html, crawls the event data from the table on that page, and generates a JSON file with the results.
Dependencies -
Python - https://www.python.org/downloads/
Pip - https://pip.pypa.io/en/stable/installing/
Beautiful Soup 4 - pip install beautifulsoup4
dateutil - pip install python-dateutil
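datetime - part of the Python standard library, so no installation is needed (note that pip install DateTime installs a different, third-party package)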
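A quick way to confirm the third-party dependencies are importable before running the crawler is sketched below. The helper name missing_modules is hypothetical and not part of SSCrawler. Note that the import names differ from the pip package names: beautifulsoup4 is imported as bs4 and python-dateutil as dateutil.

```python
from importlib.util import find_spec

# Import names, not pip package names:
# beautifulsoup4 -> bs4, python-dateutil -> dateutil.
REQUIRED_MODULES = ["bs4", "dateutil"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```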
How it works
Opens the website http://www.shadowsealsswimming.org/Calendar.html and parses all the events from the table. Produces a list of event data and outputs a JSON file. While running, it displays timestamps for the start time and end time, logs whether connecting to the link succeeded or failed, and logs the links of the events found.
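The table-parsing step can be sketched roughly as follows. This is not the SSCrawler source: the two-column table layout in SAMPLE_HTML, the parse_events helper, and the rule of taking the event name from the text before " - " are all assumptions made for illustration, and the real calendar markup may differ.

```python
from bs4 import BeautifulSoup
from dateutil import parser as date_parser

# Hypothetical markup; the real calendar table may use different columns.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Event</th></tr>
  <tr><td>November 18, 2018 2:45 PM</td><td>Practice - Shoreline Pool</td></tr>
  <tr><td>TBD</td><td>Practice - Shoreline Pool</td></tr>
</table>
"""


def parse_events(html):
    """Return one dict per table row that has two cells and a parseable date."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 2:
            continue  # header rows use <th>, so they yield no <td> cells
        when, description = cells
        try:
            start = date_parser.parse(when)
        except (ValueError, OverflowError):
            continue  # skip rows whose date text cannot be parsed
        events.append({
            # Assumption: the event name is the text before " - ".
            "event_name": description.split(" - ")[0],
            "description": description,
            "start_date_time": str(start),
        })
    return events


if __name__ == "__main__":
    for event in parse_events(SAMPLE_HTML):
        print(event)
```

Rows with an unparseable date are skipped here for simplicity; a crawler could instead keep them and fill the date with a placeholder value.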
Running the crawler
Make sure ALL dependencies are installed before running. Then simply run the SSCrawler.py script and wait until it finishes creating the JSON file.
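During a run, the start and end timestamps and the connection log lines described under "How it works" could be produced along the following lines. This is a minimal sketch using only the standard library; the function names fetch_page and run, the logger name, and the injectable opener parameter are hypothetical and are not taken from SSCrawler.py.

```python
import logging
from datetime import datetime
from urllib.request import urlopen

CALENDAR_URL = "http://www.shadowsealsswimming.org/Calendar.html"
log = logging.getLogger("sscrawler_sketch")  # hypothetical logger name


def fetch_page(url, opener=urlopen, timeout=10):
    """Return the page body as text, or None if the connection fails."""
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        log.error("Connecting to %s failed: %s", url, exc)
        return None
    log.info("Connecting to %s successful", url)
    return body


def run(url=CALENDAR_URL, opener=urlopen):
    """Fetch the calendar page and log when the run started and ended."""
    started = datetime.now()
    log.info("Start time: %s", started)
    html = fetch_page(url, opener=opener)
    finished = datetime.now()
    log.info("End time: %s", finished)
    return html, started, finished


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run()
```

The opener parameter exists only so the connection step can be exercised without network access, for example by passing a function that returns an in-memory byte stream.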
Example of output
Following is the JSON generated -
SSCrawler
activity_type (String): Contact organizer for details
approver (String): N/A
contact_email (String): Contact organizer for details
contact_name (String): Contact organizer for details
contact_phone (String): Contact organizer for details
cost (String): Contact organizer for details
created_timestamp (String): 2019-03-16 03:32:28.601277
description (String): Practice - Shoreline Pool
disability_types (String): Contact organizer for details
end_date_time (String): Contact organizer for details
event_id (String): 43a5efa9-2fbd-514e-be09-55f9489071e3
event_link (String): http://www.shadowsealsswimming.org/Calendar.html
event_name (String): Practice
event_status (String): pending
frequency (String): Contact organizer for details
inclusive_event (String): Contact organizer for details
location_address (String): Unknown
location_name (String): Contact organizer for details
max_age (String): Contact organizer for details
min_age (String): Contact organizer for details
org_name (String): Shadow Seals Swimming
picture_url (String): <img src=\"http://www.shadowsealsswimming.org/images/c03de8f057c6e9610d95a251f7085d95_944j.png\">
start_date_time (String): 2018-11-18 14:45:00
user_name (String): None
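A record with the fields shown above could be assembled and serialized roughly as follows. This is a sketch under stated assumptions, not the crawler's actual code: build_record and to_json are hypothetical helpers, and the picture_url default is a placeholder. The sample event_id has version digit 5, which is what uuid.uuid5 produces, but the namespace and name string SSCrawler uses are not documented, so the ones below are guesses.

```python
import json
import uuid
from datetime import datetime

DEFAULT = "Contact organizer for details"
CALENDAR_URL = "http://www.shadowsealsswimming.org/Calendar.html"


def build_record(event_name, description, start_date_time, picture_url=DEFAULT):
    """Build one event dict, filling fields the calendar does not provide."""
    # Assumption: the ID is derived from the event's own fields, so crawling
    # the same event again yields the same ID.
    key = "|".join([CALENDAR_URL, description, start_date_time])
    return {
        "activity_type": DEFAULT,
        "approver": "N/A",
        "contact_email": DEFAULT,
        "contact_name": DEFAULT,
        "contact_phone": DEFAULT,
        "cost": DEFAULT,
        "created_timestamp": str(datetime.now()),
        "description": description,
        "disability_types": DEFAULT,
        "end_date_time": DEFAULT,
        "event_id": str(uuid.uuid5(uuid.NAMESPACE_URL, key)),
        "event_link": CALENDAR_URL,
        "event_name": event_name,
        "event_status": "pending",
        "frequency": DEFAULT,
        "inclusive_event": DEFAULT,
        "location_address": "Unknown",
        "location_name": DEFAULT,
        "max_age": DEFAULT,
        "min_age": DEFAULT,
        "org_name": "Shadow Seals Swimming",
        "picture_url": picture_url,
        "start_date_time": start_date_time,
        "user_name": "None",
    }


def to_json(records):
    """Serialize a list of event dicts to an indented JSON string."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    record = build_record("Practice", "Practice - Shoreline Pool",
                          "2018-11-18 14:45:00")
    print(to_json([record]))
```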