Spark - bobbae/gcp GitHub Wiki

Apache Spark is an open-source unified analytics engine for large-scale data processing.

Spark on Dataproc

Dataproc Spark cluster on GCP

https://medium.com/google-cloud/dataproc-spark-cluster-on-gcp-in-minutes-3843b8d8c5f8

Apache Spark and Jupyter Notebooks on Cloud Dataproc

https://codelabs.developers.google.com/codelabs/spark-jupyter-dataproc#0

Dataproc Serverless PySpark templates

https://medium.com/@ppaglilla/getting-started-with-dataproc-serverless-pyspark-templates-e32278a6a06e

PySpark and Jupyter Notebook

https://www.freecodecamp.org/news/what-is-google-dataproc/

Tuning Spark Applications to Efficiently Utilize Dataproc Cluster

https://medium.com/paypal-tech/tuning-spark-applications-to-efficiently-utilize-dataproc-cluster-11bd51b36fe1

BigQuery Stored Procedures for Apache Spark

https://cloud.google.com/blog/products/data-analytics/build-limitless-workloads-on-bigquery/

Serverless Spark

https://cloud.google.com/blog/products/data-analytics/making-serverless-spark-even-more-powerful

Spark and Airflow

https://medium.com/google-cloud/serverless-spark-etl-pipeline-orchestrated-by-airflow-on-gcp-199efbf9a9f3

Apache Spark Tutorial

https://www.youtube.com/watch?v=IQfG0faDrzE4

Apache Spark and machine learning

https://www.datacamp.com/community/tutorials/apache-spark-tutorial-machine-learning

A Scala tutorial for Java programmers

https://docs.scala-lang.org/tutorials/scala-for-java-programmers.html

Some online courses to learn Hadoop and Spark

https://medium.com/swlh/5-free-online-courses-to-learn-big-data-hadoop-and-spark-in-2019-a553e6ccfe30

Spark by Examples

https://sparkbyexamples.com/

Main Spark GitHub source tree

https://github.com/apache/spark

Apache Beam vs Spark

https://blog.allegro.tech/2021/06/1-task-2-solutions-spark-or-beam.html

Apache Flink vs Spark

https://data-flair.training/blogs/comparison-apache-flink-vs-apache-spark/

Presto vs Spark

https://ahana.io/learn/comparisons/spark-sql-vs-presto/

Apache Hudi vs Apache Kudu

https://hudi.apache.org/docs/comparison/

Examples

Spark examples source code

https://github.com/apache/spark/tree/master/examples/src/main

User churn prediction

https://medium.com/@frederik-schmidt/churn-prediction-with-pyspark-and-google-cloud-dataproc-ba9bca6981d4