Spark - bobbae/gcp GitHub Wiki
Apache Spark is an open-source unified analytics engine for large-scale data processing.
Spark on Dataproc
Dataproc Spark cluster on GCP
https://medium.com/google-cloud/dataproc-spark-cluster-on-gcp-in-minutes-3843b8d8c5f8
Apache Spark and Jupyter Notebooks on Cloud Dataproc
https://codelabs.developers.google.com/codelabs/spark-jupyter-dataproc#0
Dataproc Serverless PySpark templates
PySpark and Jupyter Notebook
https://www.freecodecamp.org/news/what-is-google-dataproc/
Tuning Spark Applications to Efficiently Utilize Dataproc Cluster
BigQuery Stored Procedures for Apache Spark
https://cloud.google.com/blog/products/data-analytics/build-limitless-workloads-on-bigquery/
Serverless Spark
https://cloud.google.com/blog/products/data-analytics/making-serverless-spark-even-more-powerful
Spark and Airflow
Apache Spark Tutorial
https://www.youtube.com/watch?v=IQfG0faDrzE4
Apache Spark and machine learning
https://www.datacamp.com/community/tutorials/apache-spark-tutorial-machine-learning
A Scala tutorial for Java programmers
https://docs.scala-lang.org/tutorials/scala-for-java-programmers.html
Some online courses to learn Hadoop and Spark
Spark by Example
Main Spark GitHub source tree
https://github.com/apache/spark
Apache Beam vs Spark
https://blog.allegro.tech/2021/06/1-task-2-solutions-spark-or-beam.html
Apache Flink vs Spark
https://data-flair.training/blogs/comparison-apache-flink-vs-apache-spark/
Presto vs Spark
https://ahana.io/learn/comparisons/spark-sql-vs-presto/
Apache Hudi vs Apache Kudu
https://hudi.apache.org/docs/comparison/
Examples
Spark examples source code
https://github.com/apache/spark/tree/master/examples/src/main