Training plan and references

Updated on 30/10/2020

Skills for Data Engineers

Azure for the data engineer (learning path)

https://docs.microsoft.com/en-us/learn/paths/azure-for-the-data-engineer/

Products

Azure Storage (Data Lake Gen2)

Large-scale data processing with Azure Data Lake Storage Gen2 (learning path)

https://docs.microsoft.com/en-us/learn/paths/data-processing-with-azure-adls/

Overview of Azure Data Lake Storage Gen2

https://channel9.msdn.com/Shows/Azure-Friday/Azure-Data-Lake-Storage-Gen2-overview?term=data%20lake%20gen%202&lang-en=true

Introduction to Azure Data Lake Storage Gen2

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction

Multi-protocol access on Azure Data Lake Storage

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-multi-protocol-access

Open-source platforms that support Azure Data Lake Storage Gen2

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-supported-open-source-platforms

AzCopy Tool

https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10

Azure Data Lake Storage Gen2: Security recommendations

https://docs.microsoft.com/en-us/azure/storage/blobs/security-recommendations

Immutable Storage

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-immutable-storage

Object Replication

https://docs.microsoft.com/en-us/azure/storage/blobs/object-replication-configure?tabs=portal

Azure Data Factory - ADF

Integrate Data with Data Factory (learning module)

https://docs.microsoft.com/en-us/learn/modules/data-integration-azure-data-factory/

Receive Data with Azure Data Share and transform it with Azure Data Factory (learning module)

https://docs.microsoft.com/en-us/learn/modules/receive-data-with-azure-data-share-transforming-with-azure-data-factory/

Transform data by running a Python activity (ADF) in Azure Databricks

https://docs.microsoft.com/en-us/azure/data-factory/transform-data-databricks-python

Transform data by running a JAR activity in Azure Databricks

https://docs.microsoft.com/en-us/azure/data-factory/transform-data-databricks-jar

Transform data by running a Databricks notebook (submitting jobs to Databricks)

https://docs.microsoft.com/en-us/azure/data-factory/transform-data-databricks-notebook

Databricks

Microsoft

Data Engineering with Databricks

https://docs.microsoft.com/en-us/learn/paths/data-engineer-azure-databricks/

Databricks training (virtual classes, paid)

ETL Part 1

https://academy.databricks.com/course/MID-DE-DAEX-v1-SP-C

ETL Part 2

https://academy.databricks.com/course/MID-DE-DTLO-v1-SP-C

ETL Part 3

https://academy.databricks.com/course/MID-DE-ETLP-v1-SP-C

Structured Streaming

https://academy.databricks.com/course/MID-AL-STST-v1-SP-C

Other training by role (data engineer, data scientist, or administrator)

https://academy.databricks.com/pathway/INT-AL-FREE-SP

Instructor-led training

https://academy.databricks.com/category/public-trainings

Webinar: Using SQL to query your Data Lake with Delta Lake

Free, on demand.

https://databricks.com/p/webinar/using-sql-to-query-your-data-lake-with-delta-lake

Azure Databricks Essential Training - LinkedIn Learning

https://www.linkedin.com/learning/azure-databricks-essential-training/optimize-data-pipelines?u=3322

More references for Databricks

(some may be outdated or no longer available)

Databricks 101

Security - ISO 27001 Certified Security

https://vimeo.com/282588913

Delta Lake Website (Webinars)

https://databricks.com/product/delta-lake-on-databricks

Apache Spark Documentation

http://spark.apache.org/

Spark Docs

http://spark.apache.org/docs/latest/

PySpark documentation

https://spark.apache.org/docs/latest/api/python/index.html

Delta Lake

https://docs.azuredatabricks.net/delta/index.html

DataFrames

https://docs.databricks.com/getting-started/spark/dataframes.html

Introduction to DataFrames

https://docs.azuredatabricks.net/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html#write-the-unioned-dataframe-to-a-parquet-file

Spark SQL Reference (Hive Spark)

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select

Spark API (DataFrames and Datasets)

https://docs.databricks.com/spark/latest/dataframes-datasets/index.html#dataframes

Databases and Tables

https://docs.databricks.com/data/tables.html

Data types

http://spark.apache.org/docs/latest/sql-programming-guide.html#data-types

Metastores

https://docs.databricks.com/data/metastores/index.html

Introduction to DataFrames (working with DataFrames)

https://docs.azuredatabricks.net/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html#work-with-dataframes

Spark cluster configuration

https://docs.microsoft.com/en-us/azure/databricks/clusters/configure

Python Tutorial

https://www.tutorialspoint.com/python/

Mix Languages in Notebooks

https://docs.databricks.com/notebooks/notebooks-use.html#mix-languages

databricks-cli (Databricks Command Line Interface)

https://pypi.org/project/databricks-cli/

Delta Lake

Delta Lake

https://docs.azuredatabricks.net/delta/index.html

Delta Lake API Reference

https://docs.azuredatabricks.net/delta/delta-apidoc.html

Delta Lake Website

http://delta.io

Delta Lake Documentation

https://docs.delta.io/latest/index.html

Delta Lake Quick Start Python

https://docs.azuredatabricks.net/delta/delta-batch.html#write-to-a-table

Delta Lake Quick Start SQL

https://docs.azuredatabricks.net/_static/notebooks/delta/quickstart-sql.html

Streaming

Sample Notebook 1

https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/structured-streaming-python.html

Structured Streaming and Event Hubs Integration Guide

https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/structured-streaming-eventhubs-integration.md

Azure Event Hubs Spark Connector

https://github.com/Azure/azure-event-hubs-spark

Other (advanced)

Libraries

https://docs.azuredatabricks.net/libraries.html

MLflow

https://docs.azuredatabricks.net/applications/mlflow/index.html

GraphFrames

https://docs.azuredatabricks.net/spark/latest/graph-analysis/graphframes/index.html

Azure Samples - Streaming at Scale

https://github.com/Azure-Samples/streaming-at-scale

Loading Avro Files into Databricks

https://docs.databricks.com/data/data-sources/read-avro.html

Databricks logs in Azure

https://docs.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/azure-diagnostic-logs

Databricks Cloud Automation

https://github.com/databrickslabs/databricks-cloud-automation

High Performance Spark Queries with Databricks Delta

https://docs.azuredatabricks.net/_static/notebooks/delta/optimize-python.html

Azure Databricks Operator - Container image

https://hub.docker.com/_/microsoft-k8s-azure-databricks-operator

Azure Databricks API container

https://hub.docker.com/_/microsoft-azure-databricks-api

MLeap model export demo (Python)

https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/mleap-model-export-demo-python.html

Deploy models with Azure Machine Learning

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where

Power BI

Beginner

Create and use analytics reports with Power BI

https://docs.microsoft.com/en-us/learn/paths/create-use-analytics-reports-power-bi/

Intermediate

Prepare Data in Power BI

https://docs.microsoft.com/en-us/learn/paths/prepare-data-power-bi/

Visualize Data in Power BI

https://docs.microsoft.com/en-us/learn/paths/visualize-data-power-bi/

Perform analytics in Power BI

https://docs.microsoft.com/en-us/learn/paths/perform-analytics-power-bi/

Use DAX in Power BI

https://docs.microsoft.com/en-us/learn/paths/dax-power-bi/

Manage Workspaces and Datasets in Power BI

https://docs.microsoft.com/en-us/learn/paths/manage-workspaces-datasets-power-bi/