Dataproc
Dataproc is a managed service that can run Apache Spark, Apache Hadoop, Apache Flink, Presto, and 30+ open source tools and frameworks for batch processing, querying, streaming, data lake modernization, ETL, secure data science, and machine learning.
https://cloud.google.com/dataproc/docs/quickstarts
https://codelabs.developers.google.com/codelabs/cloud-dataproc-gcloud
Enterprises are migrating on-premises Apache Hadoop and Spark clusters to Dataproc to control costs and to scale clusters up and down on demand.
https://www.youtube.com/watch?v=h1LvACJWjKc
Dataproc Serverless lets you run Spark batch workloads without requiring you to provision and manage your own cluster.
https://cloud.google.com/dataproc-serverless/docs
https://medium.com/geekculture/creating-serverless-spark-jobs-with-google-cloud-dd84c375947d
https://mkuthan.github.io/blog/2022/03/24/gcp-dataproc-spark-tuning/
https://cloud.google.com/vertex-ai/docs/pipelines/dataproc-component
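As a rough illustration of what "no cluster to provision" looks like in practice, the sketch below assembles a `gcloud dataproc batches submit pyspark` command line for a Serverless batch. It only builds and prints the command; it does not call Google Cloud. The helper name `build_batch_submit_command`, the bucket path, the region, and the job arguments are placeholders chosen for this example, not values from the linked guides.

```python
import shlex


def build_batch_submit_command(main_file, region, project=None, job_args=()):
    """Build the argv list for submitting a PySpark batch to Dataproc Serverless.

    main_file: URI of the PySpark driver script (placeholder in this sketch).
    region:    Google Cloud region in which the batch runs.
    project:   optional project ID; gcloud falls back to its configured default.
    job_args:  arguments passed through to the Spark job after "--".
    """
    cmd = [
        "gcloud", "dataproc", "batches", "submit", "pyspark",
        main_file,
        f"--region={region}",
    ]
    if project:
        cmd.append(f"--project={project}")
    if job_args:
        # Everything after "--" is handed to the job, not parsed by gcloud.
        cmd.append("--")
        cmd.extend(job_args)
    return cmd


# Hypothetical values, for illustration only.
example = build_batch_submit_command(
    "gs://my-bucket/jobs/wordcount.py",
    region="us-central1",
    job_args=["gs://my-bucket/input/", "gs://my-bucket/output/"],
)
print(shlex.join(example))
```

Note that no cluster name appears anywhere in the command: the batch is the only resource requested. To actually run it, the list could be passed to `subprocess.run`, assuming the `gcloud` CLI is installed and authenticated.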
Dataproc Metastore is a managed Hive metastore. It acts as a centralized metadata repository shared among ephemeral Dataproc clusters, even when those clusters run different open source components.
Component Gateway provides secure access to web endpoints for Dataproc default and optional components.
https://www.youtube.com/watch?v=YK_-yS9y_0k
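Both features above are typically switched on when a cluster is created. The following is a minimal sketch, under stated assumptions, that assembles a `gcloud dataproc clusters create` command with Component Gateway enabled, an optional component installed, and a Dataproc Metastore service attached. The helper name `build_cluster_create_command`, the cluster name, the region, and the metastore resource path are placeholders invented for this example. The code only builds and prints the command.

```python
import shlex


def build_cluster_create_command(
    cluster,
    region,
    component_gateway=True,
    optional_components=(),
    metastore_service=None,
):
    """Build the argv list for creating a Dataproc cluster.

    component_gateway:   add --enable-component-gateway for web endpoint access.
    optional_components: component names such as "JUPYTER".
    metastore_service:   full resource name of a Dataproc Metastore service,
                         e.g. projects/<project>/locations/<region>/services/<name>.
    """
    cmd = ["gcloud", "dataproc", "clusters", "create", cluster, f"--region={region}"]
    if component_gateway:
        cmd.append("--enable-component-gateway")
    if optional_components:
        cmd.append("--optional-components=" + ",".join(optional_components))
    if metastore_service:
        cmd.append(f"--dataproc-metastore={metastore_service}")
    return cmd


# Hypothetical names, for illustration only.
example = build_cluster_create_command(
    "demo-cluster",
    region="us-central1",
    optional_components=["JUPYTER"],
    metastore_service="projects/my-project/locations/us-central1/services/my-metastore",
)
print(shlex.join(example))
```

Because the metastore lives outside the cluster, several short-lived clusters could be created with the same `metastore_service` value and see the same table definitions.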
Dataproc integrates with Apache Hadoop and the Hadoop Distributed File System (HDFS).
https://cloud.google.com/dataproc/docs/concepts/dataproc-hdfs
https://medium.com/@datacouch/big-data-processing-using-google-dataproc-d911d0b05313
https://cloud.google.com/dataproc-serverless/docs/guides/bigquery-connector-spark-example
https://cloud.google.com/dataproc/docs/tutorials
Write Spark Scala Jobs (From Spark to DataProc)
Machine Learning with Spark on Google Cloud Dataproc
Distributed Image Processing in Cloud Dataproc
Using Apache Spark DStreams with Dataproc and Pub/Sub
Cloud Bigtable map reduce word count example with Dataproc
Install and run a Jupyter notebook on a Dataproc cluster
Apache Spark and Jupyter Notebooks made easy with Dataproc component gateway