Big Data - amitbhilagude/userfullinks GitHub Wiki
- Big Data Technology Evolution
- Teradata
- Apache Hadoop
- Apache Spark
- Data Lake
- Data Warehouse
- AI and machine learning using ML Studio, TensorFlow, etc.
- Big Data Scenarios
- Analytics on captured clicks and logs
- Data from IoT devices such as RFID tags
- What is big data?
- Data volumes in the hundreds of terabytes (TB) or in petabytes (PB)
- The data is processed in parallel using technologies such as Hadoop and Spark.
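Parallel processing in Hadoop follows the MapReduce pattern: the input is split into chunks, each chunk is processed independently (map), and the partial results are merged (reduce). The sketch below is a simplified, single-machine illustration of that pattern as a word count over log lines. It omits Hadoop's shuffle step, uses a thread pool as a stand-in for cluster nodes (Python threads will not speed up CPU-bound work because of the GIL), and the helper names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(lines, num_chunks):
    """Split the input into roughly equal chunks, one per worker."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(lines, num_workers=4):
    chunks = split_into_chunks(lines, num_workers)
    # Each worker thread stands in for a node processing its own chunk.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partial_counts, Counter())


# Hypothetical log lines captured from a web application.
logs = [
    "GET /home",
    "GET /cart",
    "POST /cart",
    "GET /home",
]
print(word_count(logs))
```

Because each chunk is counted without looking at the others, the map step can run on as many machines as there are chunks; only the small partial counts need to be combined at the end.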
- Data Transformation and Pipelines
- Pipelines work in one of two ways:
- ETL: Extract, Transform and Load
- Extract data from the source and store it in the data lake
- Transform the data using Azure Data Factory or Databricks
- Load the data into destinations such as a SQL data warehouse
- ELT: Extract, Load and Transform
- Extract data from the source and store it in the data lake
- Load the data into destinations such as a SQL data warehouse
- Transform the data using Azure Data Factory or Databricks
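The only difference between the two patterns is where the transformation happens relative to loading. The sketch below shows both on the same hypothetical order records, using Python's built-in `sqlite3` in-memory databases as a stand-in for the destination warehouse, and plain Python or SQL as a stand-in for Azure Data Factory or Databricks. Table names, column names, and the cleaning rules are illustrative assumptions.

```python
import sqlite3

# Hypothetical raw records, standing in for data extracted from a source
# system and landed in the data lake.
RAW_ORDERS = [
    ("o1", "us", "10.50"),
    ("o2", "de", "7.25"),
    ("o3", "us", "3.00"),
]


def transform(record):
    """Clean one raw record: upper-case the country and parse the amount."""
    order_id, country, amount = record
    return order_id, country.upper(), float(amount)


def run_etl(conn):
    # ETL: transform in the pipeline first, then load only the clean rows.
    conn.execute("CREATE TABLE orders (id TEXT, country TEXT, amount REAL)")
    clean_rows = [transform(r) for r in RAW_ORDERS]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)


def run_elt(conn):
    # ELT: load the raw rows as-is, then transform inside the destination.
    conn.execute("CREATE TABLE raw_orders (id TEXT, country TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT id, UPPER(country) AS country, CAST(amount AS REAL) AS amount
        FROM raw_orders
        """
    )


def totals_by_country(conn):
    return conn.execute(
        "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
    ).fetchall()


etl_db = sqlite3.connect(":memory:")
run_etl(etl_db)
elt_db = sqlite3.connect(":memory:")
run_elt(elt_db)
print(totals_by_country(etl_db))  # [('DE', 7.25), ('US', 13.5)]
print(totals_by_country(elt_db))  # [('DE', 7.25), ('US', 13.5)]
```

Both runs end with the same `orders` table. ELT additionally keeps `raw_orders` in the destination, so the transformation can be changed and re-run later without extracting from the source again.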
- Common big data technologies
- Hadoop
- Cloud providers that offer their own Hadoop service
- HDInsight in Azure
- EMR in Amazon Web Services (AWS)
- Dataproc in Google Cloud Platform (GCP)
- Spark
- Improves on Hadoop by keeping data sets in memory instead of on disk. With Spark SQL, these data sets are represented as DataFrames.
- Databricks is the most commonly used platform in the Spark space.
- Azure offers Azure Databricks, which is built on the Databricks platform.
- Kafka
- Hive
- Presto
- Big data roles
- Data Analyst: Focuses on analyzing data and understanding it in a business context
- Data Engineer: A coder who builds data pipelines and transforms data using code or visual tools
- Data Steward: Puts governance on data
- Data Scientist: AI and machine learning expert
- Machine Learning Engineer: Handles the administrative and operational tasks of machine learning
- Chief Data / Analytics Officer: In charge of data and data-driven business decisions
- Data Lake
- Repository for storing big data
- Parquet files
- A columnar file format, used as a newer alternative to CSV files
- Heavily used for storing files in a data lake because it compresses well and needs less storage space
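Parquet files are smaller than CSV partly because Parquet stores data column by column. Repeated values in a column then sit next to each other, and Parquet can apply encodings such as dictionary encoding and run-length encoding before general-purpose compression. The sketch below illustrates those two ideas on a single hypothetical column in plain Python. It is a conceptual illustration with hypothetical helper names, not Parquet's actual on-disk format, which you would normally write with a library such as pyarrow.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer id, in first-seen order."""
    dictionary = []
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(ids[value])
    return dictionary, encoded


def run_length_encode(ids):
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(ids)]


def decode(dictionary, runs):
    """Reverse both encodings to recover the original column."""
    return [dictionary[value] for value, length in runs for _ in range(length)]


# A hypothetical table stored row by row, as in a CSV file.
rows = [
    ("o1", "US", 10.5),
    ("o2", "US", 7.25),
    ("o3", "US", 3.0),
    ("o4", "DE", 12.0),
    ("o5", "DE", 1.0),
    ("o6", "US", 4.5),
]

# Columnar layout: pull out one column so its values are stored together.
country_column = [country for _, country, _ in rows]

dictionary, ids = dictionary_encode(country_column)
runs = run_length_encode(ids)
print(dictionary)  # ['US', 'DE']
print(runs)        # [(0, 3), (1, 2), (0, 1)]
```

Six country strings shrink to a two-entry dictionary plus three runs. In a row-oriented CSV file the same values are interleaved with the other columns, so neither encoding applies as directly.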