Event Brite Crawler - ActoKids/web-crawler GitHub Wiki

EBCrawler crawls the specified URL and collects the URLs it finds there.

Dependencies -

Python - https://www.python.org/downloads/
Pip - https://pip.pypa.io/en/stable/installing/
Beautiful Soup 4 - pip install beautifulsoup4
dateutil - pip install python-dateutil
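datetime - included in the Python standard library, no install needed (pip install DateTime installs a different, third-party package)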

Supported URLs

EVENT_BRITE = "https://www.eventbrite.com/d/wa--seattle/disability/?page=1"
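The page number is carried in the page query parameter of the URL above. As a minimal sketch of how later result pages could be addressed, the helper below rewrites that parameter. The name page_url is hypothetical and is not taken from EBCrawler.py.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

EVENT_BRITE = "https://www.eventbrite.com/d/wa--seattle/disability/?page=1"


def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`.

    Hypothetical helper for illustration; the real crawler may build
    its pagination URLs differently.
    """
    parts = urlsplit(base_url)
    query = parse_qs(parts.query)      # e.g. {"page": ["1"]}
    query["page"] = [str(page)]        # overwrite (or add) the page number
    new_query = urlencode(query, doseq=True)
    return urlunsplit(parts._replace(query=new_query))


third_page = page_url(EVENT_BRITE, 3)
print(third_page)
```

Any other query parameters already present on the URL are preserved, so the same helper would work if the search URL grows extra filters.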

How it works

The crawler opens the designated URL and collects every anchor tag found within the main content of the page. It then follows the pagination and repeats this on each results page. Once all URLs have been found, they are written out as a JSON array.
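The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in EBCrawler.py: it uses the standard library html.parser instead of Beautiful Soup so it runs without extra installs, it assumes the main content is wrapped in a main element, it drops duplicate links, and it stops paginating at the first page that yields nothing new. The names MainLinkParser, extract_links and crawl, and the sample URLs, are hypothetical.

```python
import json
from html.parser import HTMLParser


class MainLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside a <main> element."""

    def __init__(self):
        super().__init__()
        self.main_depth = 0  # greater than 0 while inside <main>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
        elif tag == "a" and self.main_depth > 0:
            href = dict(attrs).get("href")
            if href and href not in self.links:  # assumption: skip duplicates
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "main" and self.main_depth > 0:
            self.main_depth -= 1


def extract_links(html):
    """Return the links found in the main content of one page."""
    parser = MainLinkParser()
    parser.feed(html)
    return parser.links


def crawl(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... until a page adds no new links.

    fetch_page is any callable that returns the HTML of a results page;
    a real crawler would download the page over HTTP here.
    """
    found = []
    for page in range(1, max_pages + 1):
        new = [u for u in extract_links(fetch_page(page)) if u not in found]
        if not new:
            break
        found.extend(new)
    return found


# Made-up page content standing in for a downloaded results page.
sample = """
<nav><a href="https://www.eventbrite.com/signin/">Sign in</a></nav>
<main>
  <a href="https://www.eventbrite.com/e/example-event-tickets-1">One</a>
  <a href="https://www.eventbrite.com/e/example-event-tickets-2">Two</a>
  <a href="https://www.eventbrite.com/e/example-event-tickets-1">One again</a>
</main>
"""

links = extract_links(sample)
print(json.dumps(links, indent=4))  # same shape as the crawler's JSON output
```

Passing the page fetcher in as a callable keeps the link extraction testable without network access; swapping in a function that downloads each paginated URL would turn the sketch into a live crawl.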

Running the crawler

Make sure all dependencies are installed before running. Then run the EBCrawler.py script (for example, python EBCrawler.py) and wait until it finishes creating the JSON.

Example of output

The generated JSON looks like this -
EventBrite

[
    "https://www.eventbrite.com/e/help-im-invisible-invisible-disabilities-in-the-workplace-tickets-56439603373?aff=ebdssbdestsearch",
    "https://www.eventbrite.com/e/p4p-presents-models-of-disability-how-do-we-ensure-everyone-belongs-tickets-55034038290?aff=ebdssbdestsearch",
    "https://www.eventbrite.com/e/blackafrican-american-community-social-2019-tickets-48468415335?aff=ebdssbdestsearch",
    "https://www.eventbrite.com/e/529-college-savings-plans-for-grandparents-tickets-56454943255?aff=ebdssbdestsearch",
    "https://www.eventbrite.com/e/bridging-family-culture-graduate-school-tickets-55541710752?aff=ebdssbdestsearch",
    ...
]

What to do next

Once the JSON has been created, start EBScraper to parse the URLs.
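For reference, a later stage could read the crawler output back in along these lines. This is only a sketch: the file name eventbrite_urls.json and the function load_urls are hypothetical, and how EBScraper actually receives the URLs is not described here. The demo writes a small sample file first so that it is self-contained.

```python
import json
import os
import tempfile


def load_urls(path):
    """Load a JSON array of URL strings, as shown in the example output."""
    with open(path, encoding="utf-8") as f:
        urls = json.load(f)
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        raise ValueError("expected a JSON array of URL strings")
    return urls


# Two entries copied from the example output above.
sample = [
    "https://www.eventbrite.com/e/help-im-invisible-invisible-disabilities-in-the-workplace-tickets-56439603373?aff=ebdssbdestsearch",
    "https://www.eventbrite.com/e/529-college-savings-plans-for-grandparents-tickets-56454943255?aff=ebdssbdestsearch",
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eventbrite_urls.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=4)
    loaded = load_urls(path)

for url in loaded:
    print(url)  # each URL would be handed to the scraper in turn
```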