Google Calendar Crawler - ActoKids/web-crawler GitHub Wiki

The Google Calendar Crawler scrapes an individual Google Calendar to pull event information.

Dependencies

Access to AWS Lambda and DynamoDB

How it works

The crawler uses a Gmail token and OAuth to gain access to individual Google Calendars, then scrapes each calendar for event details. Which calendar gets scraped depends on the email address associated with it. To change the calendar, first add it to the Gmail account you want to use as your base. You can then change the calendar you'd like to scrape by changing the email on line 50 of the lambda_function.py file: events_result=service.events().list(calendarId="whateverEmailYouWant"...
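To illustrate that call, here is a minimal sketch of how the list-request parameters fit together. This follows the standard google-api-python-client pattern; the time window, ordering, and result limit are assumptions for illustration, not values taken from the repo.

```python
from datetime import datetime, timezone

def build_list_params(calendar_id, max_results=250):
    """Build the keyword arguments for service.events().list().

    calendar_id is the email address of the calendar to scrape --
    the value you change on line 50 of lambda_function.py.
    """
    return {
        "calendarId": calendar_id,
        "timeMin": datetime.now(timezone.utc).isoformat(),  # skip past events
        "singleEvents": True,   # expand recurring events into single instances
        "orderBy": "startTime",
        "maxResults": max_results,
    }

# With a real authorized service object, the call would look like:
#   events_result = service.events().list(**build_list_params("you@example.com")).execute()
#   events = events_result.get("items", [])
```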

How to run

Make sure you have access to AWS Lambda.

Go to our GitHub Google Lambda and save the zip folder locally. If you delete the token.pickle file, the program will ask you to choose a Gmail account the first time you run it. The Lambda function already has an email address in place on line 50. Simply change that email to the one associated with the calendar you want to scrape, and the program will scrape that calendar's data and write it to DynamoDB. Once you have successfully run the program outside of Lambda, you can move forward.
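The token.pickle behavior described above typically follows the standard Google OAuth token-caching pattern. A minimal sketch, assuming that pattern; the `run_oauth_flow` callable stands in for `InstalledAppFlow.run_local_server()` from google-auth-oauthlib and is not the repo's actual code:

```python
import os
import pickle

TOKEN_PATH = "token.pickle"

def load_or_create_credentials(run_oauth_flow):
    """Return cached credentials from token.pickle, or run the
    interactive OAuth flow (the Gmail account chooser) and cache
    the result so later runs skip the prompt.
    """
    if os.path.exists(TOKEN_PATH):
        with open(TOKEN_PATH, "rb") as f:
            return pickle.load(f)          # reuse the saved token
    creds = run_oauth_flow()               # prompts you to pick a Gmail account
    with open(TOKEN_PATH, "wb") as f:
        pickle.dump(creds, f)              # cache for subsequent runs
    return creds
```

Deleting token.pickle simply forces the `run_oauth_flow` branch on the next run, which is why the account chooser reappears.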

Open AWS Lambda and create a new function. When you open that function, go down to 'Function code' and, for 'Code entry type', select 'Upload a .zip file'. Upload the zip file and create a new test event at the top of the page (you can name this whatever you want). When you hit 'Test', the function will run and write to DynamoDB.

Visual example

Example output

Fields written to DynamoDB include the event name, description, date, location, minimum/maximum age, contact email, contact name, contact phone, cost, and a link to the event URL.
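As an illustration of how those fields could be assembled from a Calendar API event, here is a minimal sketch. The item field names and the `ActoKidsEvents` table name are assumptions, not taken from the repo, and the boto3 write is shown in comments because it requires AWS credentials. The Calendar API does not carry age, contact, or cost fields directly, so this sketch assumes those are parsed elsewhere (for example, out of the free-text description).

```python
def event_to_item(event):
    """Map a Google Calendar API event resource to a DynamoDB item
    covering the Calendar-provided fields listed above."""
    start = event.get("start", {})
    return {
        "event_name": event.get("summary", ""),
        "description": event.get("description", ""),
        "date": start.get("dateTime", start.get("date", "")),  # all-day events use "date"
        "location": event.get("location", ""),
        "link": event.get("htmlLink", ""),
    }

# With AWS credentials configured, the write would look like:
#   import boto3
#   table = boto3.resource("dynamodb").Table("ActoKidsEvents")  # hypothetical table name
#   table.put_item(Item=event_to_item(event))
```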