UPDATE 03/13 - EC2 is no longer being used; this walkthrough covers an older version of our OFACrawler + SSScraper. Once I have made changes to Dynamo, this will cease to work entirely. Until then, the ec2launcher Lambda function is no longer functional, so please launch EC2 manually.

AWS - EC2

Welcome! This page explains how to use the EC2 instance to scrape Outdoors for All and Shadow Seals.
Outdoors for All -
https://outdoorsforall.org/events-news/calendar/
Shadow Seals -
http://www.shadowsealsswimming.org/Calendar.html

How it works

The EC2 instance is hosted on the AWS cloud. A Lambda function starts the instance, which then runs the scrapers. The scrapers write to Dynamo and produce two log files, ofalog.log and sslog.log. The CloudWatch Agent uploads these log files to CloudWatch, where they can be viewed. We are still discussing the trigger for this Lambda, but it currently fires every 24 hours; a sketch of such a schedule rule is below.
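
For reference, here is a minimal sketch of what a 24-hour trigger could look like as a CloudWatch Events rule in boto3. The rule name, target Id, and account ID are hypothetical placeholders, and the Lambda would also need a permission allowing events.amazonaws.com to invoke it; this is not necessarily how our trigger is actually configured.

import boto3

events = boto3.client('events', region_name='us-east-1')

# Create (or update) a rule that fires once every 24 hours.
events.put_rule(
    Name='crawler-daily-trigger',  # hypothetical rule name
    ScheduleExpression='rate(24 hours)',
)

# Point the rule at the launcher Lambda.
events.put_targets(
    Rule='crawler-daily-trigger',
    Targets=[{
        'Id': 'ec2launcher-target',  # hypothetical target id
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:ad440-w19-lambda-crawler-launchec2',  # placeholder account ID
    }],
)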

Lambda Function

URL - https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ad440-w19-lambda-crawler-launchec2?tab=graph

The following is the code for the Lambda function. On trigger, it opens an EC2 client and starts the EC2 instance by ID, then prints a confirmation message listing the instance(s) that were launched.

import boto3

region = 'us-east-1'
instances = ['i-0a9c5fe477ab2c1cb']  # the scraper instance ID

def lambda_handler(event, context):
    # Open an EC2 client in the target region and start the instance(s) by ID.
    ec2 = boto3.client('ec2', region_name=region)
    ec2.start_instances(InstanceIds=instances)
    # Confirmation message for the instance(s) that were launched.
    print('started your instances: ' + str(instances))


To test this Lambda, simply press "Test". You should see a green success box once the instance starts. You can also invoke the function from code, as sketched below.
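
If you would rather trigger it from code than from the console, here is a minimal sketch using boto3, assuming your credentials are allowed to invoke the function:

import boto3

# Invoke the launcher Lambda and wait for it to finish.
client = boto3.client('lambda', region_name='us-east-1')
response = client.invoke(
    FunctionName='ad440-w19-lambda-crawler-launchec2',
    InvocationType='RequestResponse',  # synchronous invocation
)
print(response['StatusCode'])  # 200 means the invocation succeeded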

EC2

URL - https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:sort=instanceId

The EC2 instance runs Windows Server 2016 on a t2.micro (which is free tier for AWS). Once you have started the Lambda, head over to the EC2 console; you should see that the instance is now running. From here you can stop, restart, or terminate the instance. While running, the instance executes both OFAScraper and SSScraper and stays up until the scripts end; a sketch for checking and stopping it from code follows.
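
Since the launcher only starts the instance, here is a minimal sketch for checking its state and stopping it from code once the scrapers are done. Stopping (rather than terminating) preserves the Windows environment and the scraper scripts.

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
instance_id = 'i-0a9c5fe477ab2c1cb'  # the scraper instance from the Lambda above

# Look up the current state of the instance.
reservations = ec2.describe_instances(InstanceIds=[instance_id])['Reservations']
state = reservations[0]['Instances'][0]['State']['Name']
print('instance state: ' + state)  # e.g. 'pending', 'running', 'stopped'

# Stop (don't terminate) the instance once the scrapers have finished.
if state == 'running':
    ec2.stop_instances(InstanceIds=[instance_id])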

Dynamo

Dynamo - https://console.aws.amazon.com/dynamodb/home?region=us-east-1#tables:selected=events;tab=overview

As we put entries into Dynamo, UUIDs are created and stored with the rest of the data to ensure unique entries. Below is example output of the data from EC2; a sketch of the write pattern follows the examples.

OFA data

SS data
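
For context, here is a minimal sketch of the write pattern described above, assuming the events table is keyed on the UUID; the attribute names are hypothetical stand-ins for whatever the scrapers actually produce.

import uuid
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('events')

# Generate a UUID and store it with the scraped data so every entry is unique.
table.put_item(Item={
    'UUID': str(uuid.uuid4()),          # assumed key attribute name
    'event_name': 'Adaptive Cycling',   # hypothetical scraped fields
    'source': 'https://outdoorsforall.org/events-news/calendar/',
})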

CloudWatch

URL - https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logStream:group=crawlerlogs;streamFilter=typeLogStreamPrefix

Once the scripts finish, the log files ofalog.log and sslog.log are sent over to CloudWatch. A sketch for reading them back from code follows the examples below.

OFA logs

SS logs
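
To read the uploaded logs back without opening the console, here is a minimal sketch using boto3; the stream names are assumptions based on the log file names and depend on how the CloudWatch Agent is configured.

import boto3

logs = boto3.client('logs', region_name='us-east-1')

# Print the most recent events from each scraper's log stream.
for stream in ('ofalog.log', 'sslog.log'):  # assumed stream names
    response = logs.get_log_events(
        logGroupName='crawlerlogs',
        logStreamName=stream,
        limit=10,
    )
    for event in response['events']:
        print(event['message'])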