Big Data Technologies - urcuqui/Data-Science GitHub Wiki

Amazon Web Services & Google Cloud Platform

Data Lake / Data Warehouse

They are digital stores that can hold raw data.

Comparing Athena to Redshift is not simple. Athena has an edge in terms of portability and cost, whereas Redshift stands tall in terms of performance and scale.

  • Redshift -> Amazon
  • Athena -> Amazon
  • BigQuery -> Google

Object Storage

All data is stored as objects, each addressed by a key inside a bucket.

  • S3 -> Amazon
  • Cloud Storage -> Google

ETL: Extract, Transform and Load

  • Glue -> Amazon
  • DataFlow -> Google
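
The three ETL stages can be sketched in plain Python. This is only an illustration of the idea, not how Glue or DataFlow are used: the sample CSV, the `payments` table, and the function names are assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from files or an API.
RAW_CSV = """user_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["user_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT user_id, amount, currency FROM payments ORDER BY user_id"
).fetchall()
print(result)
```

Keeping each stage as its own function makes it easy to test the transformation rules separately from the source and the destination.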

Data Processing - Hadoop

It allows us to store and process data across a distributed cluster.

  • EMR -> Amazon
  • DataProc -> Google

Data Streaming

It allows anyone to collect, process, and analyze data in real time.

  • Kinesis -> Amazon
  • Pub/Sub -> Google
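
Real-time processing usually means handling records as they arrive instead of waiting for a complete dataset. The sketch below shows one common pattern, counting events per fixed time window, using only the Python standard library. It does not call Kinesis or Pub/Sub; `event_stream` is a hypothetical stand-in for a stream consumer, and the code assumes events arrive in timestamp order.

```python
from collections import Counter


def event_stream():
    """Hypothetical stand-in for a stream consumer.

    Yields (timestamp_seconds, event_type) pairs in arrival order.
    """
    yield from [
        (0, "click"), (1, "view"), (4, "click"),
        (5, "click"), (9, "view"), (10, "click"),
    ]


def tumbling_window_counts(stream, window_seconds):
    """Yield (window_start, counts) each time a fixed-size window closes."""
    current_start = None
    counts = Counter()
    for ts, kind in stream:
        window_start = ts - ts % window_seconds
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            current_start, counts = window_start, Counter()
        counts[kind] += 1
    if counts:
        yield current_start, dict(counts)  # flush the last open window


windows = list(tumbling_window_counts(event_stream(), 5))
for start, counts in windows:
    print(start, counts)
```

Because the function is a generator, each window is emitted as soon as it closes, which is the property that distinguishes streaming from batch processing.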

Infrastructure as Code

It is a way to define and configure infrastructure remotely through scripts or templates instead of manual setup.

  • CloudFormation -> Amazon
  • Deployment Manager -> Google
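
As a rough illustration of the idea, the sketch below builds a minimal CloudFormation-style template in Python and prints it as JSON. The logical ID `RawDataBucket`, the bucket name, and the `build_template` helper are assumptions made up for the example; the script only generates the template text and does not deploy anything.

```python
import json


def build_template(bucket_name):
    """Return a CloudFormation-style template that declares one S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Raw data bucket (illustrative example)",
        "Resources": {
            # "RawDataBucket" is a hypothetical logical ID chosen for this sketch.
            "RawDataBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"BucketName": bucket_name},
            }
        },
    }


template = build_template("example-raw-data-bucket")
template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the infrastructure is described as data, the template can be versioned, reviewed, and reused like any other source file.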

Workflow Management

  • Step Functions -> Amazon
  • Cloud Composer -> Google

BI and Visualization

  • QuickSight -> Amazon
  • Datalab -> Google
  • Data Studio -> Google

Data Pipelines

It is the process of moving and transforming data as it flows from one system to another.

  • DataPipeline -> Amazon
  • DataPrep -> Google

Functions

Serverless functions let applications respond quickly to new events or data.

  • Lambda -> Amazon
  • Cloud Functions -> Google
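
A function of this kind is typically a small handler that runs once per incoming event. The sketch below follows the `(event, context)` handler shape that AWS Lambda uses for Python and reads the bucket name and object key from an S3-style notification. The sample event is hand-written and trimmed to the fields used here, and the return value is an assumption for illustration.

```python
def handler(event, context):
    """Collect the bucket/key of every new object reported in the event."""
    new_objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        new_objects.append(
            f'{s3_info["bucket"]["name"]}/{s3_info["object"]["key"]}'
        )
    # A real function would process each object here (parse, validate, forward).
    return {"processed": len(new_objects), "objects": new_objects}


# Hand-written sample event for a local test; not captured from a real service.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/2024/data.csv"},
            }
        }
    ]
}

response = handler(sample_event, None)
print(response)
```

Calling the handler directly with a sample event, as done here, is a simple way to test the logic before deploying it.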

Log Visualization and Management

  • ELK (Elasticsearch, Logstash, Kibana)
  • Elasticsearch