Big Data Technologies - urcuqui/Data-Science GitHub Wiki
Amazon Web Services & Google Cloud Platform
Data Lake / Data Warehouse
They are digital stores that can hold raw data (data lake) or data structured for analysis (data warehouse).
- Redshift -> Amazon
- Athena -> Amazon
- BigQuery -> Google
Comparing Athena to Redshift is not simple. Athena has an edge in portability and cost, whereas Redshift is stronger in performance and scale.
Object Storage
All data is stored as objects, each addressed by a key inside a bucket.
- S3 -> Amazon
- Cloud Storage -> Google
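The object model can be sketched without a cloud account: a bucket maps a key to a blob of bytes plus metadata. The `InMemoryBucket` and `StoredObject` classes below are hypothetical stand-ins written for illustration. They are not the S3 or Cloud Storage API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryBucket:
    """Hypothetical stand-in for a bucket: a flat mapping from key to object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Writing to an existing key replaces the whole object.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> bytes:
        return self._objects[key].data

    def list_keys(self, prefix: str = "") -> List[str]:
        # "Folders" are only a naming convention: listing filters by key prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = InMemoryBucket("raw-data")
bucket.put("logs/2024/01.csv", b"id,value\n1,10\n", content_type="text/csv")
bucket.put("logs/2024/02.csv", b"id,value\n2,20\n", content_type="text/csv")
bucket.put("images/cat.png", b"\x89PNG")
print(bucket.list_keys("logs/"))
```

The namespace is flat: the slashes in a key are ordinary characters, and prefix filtering is what makes them look like directories.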
ETL: Extract, Transform and Load
- Glue -> Amazon
- Dataflow -> Google
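The three ETL stages can be illustrated locally with the Python standard library. This is a minimal sketch, not how Glue or Dataflow are programmed: the sample CSV, the `sales` table, and the function names are all assumptions made up for the example, and an in-memory SQLite database stands in for the target store.

```python
import csv
import io
import sqlite3

# Made-up raw input with typical defects: stray spaces and a missing value.
RAW_CSV = """name,amount
alice, 10.5
bob,
carol,7
"""


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalise names and types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Keeping each stage as a separate function makes it possible to test the transformation rules without touching the source or the target.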
Data Processing - Hadoop
It lets us store data and run its processing across a distributed cluster of machines.
- EMR -> Amazon
- Dataproc -> Google
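Hadoop's classic programming model is MapReduce. The sketch below runs the map, shuffle, and reduce phases of a word count in a single Python process. It only illustrates the data flow: the function names are hypothetical, and a real cluster would run each phase in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


# Each string stands in for a block of input held on a different node.
partitions = ["big data big cluster", "data lake", "big lake"]

mapped = [pair for line in partitions for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```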
Data Streaming
It lets us collect, process, and analyze data in real time, as it arrives.
- Kinesis -> Amazon
- Pub/Sub -> Google
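A common real-time operation is a windowed aggregate that is updated as each event arrives. The sketch below assumes events arrive in timestamp order. The `event_stream` generator is a hypothetical stand-in for a consumer reading from a service such as Kinesis or Pub/Sub, and the values are made up.

```python
from collections import deque


def event_stream():
    """Hypothetical producer: yields (timestamp_seconds, value) events in order."""
    yield from [(0, 3), (1, 5), (4, 2), (6, 8), (7, 1), (12, 4)]


def sliding_sum(events, window_seconds):
    """Yield, per event, the sum of values seen in the last `window_seconds`."""
    window = deque()
    total = 0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the time window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total


results = list(sliding_sum(event_stream(), window_seconds=5))
print(results)
```

The aggregate is maintained incrementally, so each event is processed once and nothing has to be re-read from storage.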
Infrastructure as Code
It is a way to define and configure infrastructure through scripts or templates instead of setting it up by hand.
- CloudFormation -> Amazon
- Deployment Manager -> Google
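As an illustration of the declarative style, the sketch below builds a small CloudFormation template as a Python dictionary and serialises it to JSON. The logical name `DataLakeBucket` and the description text are assumptions chosen for this example. The script only generates the template and does not deploy anything.

```python
import json

# Declarative description of the desired infrastructure: one S3 bucket.
# "DataLakeBucket" is a hypothetical logical name chosen for this example.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example: a single bucket for raw data",
    "Resources": {
        "DataLakeBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    },
}

template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the template is plain text, it can be kept in version control and reviewed like any other code.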
Workflow Management
- Step Functions -> Amazon
- Cloud Composer -> Google
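A workflow can be modelled as a graph of tasks in which each task runs only after its dependencies finish. The sketch below uses the standard-library `graphlib` module (Python 3.9 or later) to run made-up tasks in dependency order. The task names are hypothetical, and this is neither the Step Functions state language nor the Cloud Composer API.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dag = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"validate", "transform"},
    "report": {"load"},
}

log = []
# Each hypothetical task just records its own name when it runs.
tasks = {name: (lambda n=name: log.append(n)) for name in dag}

for name in TopologicalSorter(dag).static_order():
    tasks[name]()

print(log)
```

`validate` and `transform` do not depend on each other, so a real scheduler would be free to run them in parallel.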
BI and Visualization
- QuickSight -> Amazon
- Datalab -> Google
- Data Studio -> Google
Data Pipelines
It is the process of moving and transforming data as it flows from one system to another.
- Data Pipeline -> Amazon
- Dataprep -> Google
Functions
They let us build applications that respond quickly to new information (events) without managing servers.
- Lambda -> Amazon
- Cloud Functions -> Google
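A function of this kind is a handler that the platform calls once per event. The sketch below follows the `handler(event, context)` shape used by AWS Lambda for Python and reads the bucket and key fields of an S3 notification event. The sample event is hand-written with made-up values and is invoked locally.

```python
def handler(event, context):
    """Entry point in the style of an AWS Lambda Python handler."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed, "count": len(processed)}


# Local invocation with a hand-written event; the field values are made up.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-data"}, "object": {"key": "logs/01.csv"}}}
    ]
}
result = handler(sample_event, context=None)
print(result)
```

The handler keeps no state between calls, which is what allows the platform to run many copies of it at once.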
Log Visualization and Management
- ELK (Elasticsearch, Logstash, Kibana)
- Elasticsearch