Event Brite Crawler - ActoKids/web-crawler GitHub Wiki
EBCrawler crawls the specified URL and collects the URLs it finds there.
Dependencies -
Python - https://www.python.org/downloads/
Pip - https://pip.pypa.io/en/stable/installing/
Beautiful Soup 4 - pip install beautifulsoup4
dateutil - pip install python-dateutil
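datetime - part of the Python standard library, no installation needed (pip install DateTime installs a different, third-party package)
Supported URLs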
EVENT_BRITE = "https://www.eventbrite.com/d/wa--seattle/disability/?page=1"
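The page query parameter at the end of this URL appears to select the results page. As a minimal sketch, assuming pagination works by changing that parameter, the URL for any page could be derived as follows. The helper name page_url is hypothetical and not taken from EBCrawler.py.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

EVENT_BRITE = "https://www.eventbrite.com/d/wa--seattle/disability/?page=1"


def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`.

    Hypothetical helper: assumes the site paginates via ?page=N.
    """
    parts = urlsplit(base_url)
    query = parse_qs(parts.query)      # {'page': ['1']}
    query["page"] = [str(page)]        # overwrite the page number
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


print(page_url(EVENT_BRITE, 3))
# https://www.eventbrite.com/d/wa--seattle/disability/?page=3
```

Rebuilding the URL with urllib.parse rather than string replacement keeps any other query parameters intact if the base URL changes later.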
How it works
The crawler opens the designated URL and collects every anchor tag found in the main content of the page, then follows the pagination to do the same on the remaining result pages. Once all URLs have been found, it creates a JSON object containing them.
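The steps above can be sketched roughly as follows. This is not the code from EBCrawler.py: it assumes the main content is wrapped in a main element, it works on inline sample HTML instead of fetching pages, and it uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra packages. With Beautiful Soup, soup.find_all("a", href=True) would play the same role. The names AnchorCollector, extract_links, PAGE_1 and PAGE_2 are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.eventbrite.com/d/wa--seattle/disability/?page=1"


class AnchorCollector(HTMLParser):
    """Collect href values of <a> tags nested inside a <main> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.main_depth = 0  # greater than 0 while inside <main>
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
        elif tag == "a" and self.main_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "main" and self.main_depth > 0:
            self.main_depth -= 1


def extract_links(html, base_url):
    """Return all anchor URLs found in the main content of one page."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls


# Made-up sample pages standing in for two paginated result pages.
PAGE_1 = """
<nav><a href="/signin/">Sign in</a></nav>
<main>
  <a href="/e/example-event-one-tickets-1">Event one</a>
  <a href="/e/example-event-two-tickets-2">Event two</a>
</main>
"""
PAGE_2 = """
<main>
  <a href="/e/example-event-three-tickets-3">Event three</a>
</main>
"""

all_urls = []
for page_html in (PAGE_1, PAGE_2):  # stands in for walking the pagination
    all_urls.extend(extract_links(page_html, BASE))

print(json.dumps(all_urls, indent=2))
```

The link in the nav element is skipped because it sits outside the main content, which mirrors the restriction described above.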
Running the crawler
Make sure all dependencies are installed before running the crawler. Then run the EBCrawler.py script and wait until it finishes creating the JSON.
Example of output
The generated JSON looks like this:
EventBrite
[
"https://www.eventbrite.com/e/help-im-invisible-invisible-disabilities-in-the-workplace-tickets-56439603373?aff=ebdssbdestsearch",
"https://www.eventbrite.com/e/p4p-presents-models-of-disability-how-do-we-ensure-everyone-belongs-tickets-55034038290?aff=ebdssbdestsearch",
"https://www.eventbrite.com/e/blackafrican-american-community-social-2019-tickets-48468415335?aff=ebdssbdestsearch",
"https://www.eventbrite.com/e/529-college-savings-plans-for-grandparents-tickets-56454943255?aff=ebdssbdestsearch",
"https://www.eventbrite.com/e/bridging-family-culture-graduate-school-tickets-55541710752?aff=ebdssbdestsearch",
...
]
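A consumer of this output could read it back as sketched below, assuming the JSON is a plain array of URL strings as shown above. The helper name load_event_urls is hypothetical, and the sample is a shortened copy of the example output rather than a real file.

```python
import json

# Shortened sample in the same shape as the crawler output.
SAMPLE = """[
  "https://www.eventbrite.com/e/help-im-invisible-invisible-disabilities-in-the-workplace-tickets-56439603373?aff=ebdssbdestsearch",
  "https://www.eventbrite.com/e/529-college-savings-plans-for-grandparents-tickets-56454943255?aff=ebdssbdestsearch"
]"""


def load_event_urls(text):
    """Parse crawler output and check that it is a list of URL strings."""
    urls = json.loads(text)
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        raise ValueError("expected a JSON array of URL strings")
    return urls


urls = load_event_urls(SAMPLE)
print(len(urls), "event URLs loaded")
```

Validating the shape up front means a malformed or partially written file fails with a clear error instead of causing problems later during scraping.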
What to do next
Once the JSON is created, start EBScraper to parse the URLs.