Hadoop - bobbae/gcp GitHub Wiki
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation.
Basic introduction to Apache Hadoop
https://www.youtube.com/watch?v=OoEpfb6yga8
MapReduce job example
See how to create a small three-node Hadoop cluster and submit an example MapReduce job.
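To give a feel for what such a job computes, here is a minimal sketch of the classic word-count example in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and the shuffle step that Hadoop performs across the cluster is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key.

    On a real cluster the framework does this between map and reduce;
    here it is simulated in memory (an assumption for illustration).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(sample))
```

In an actual Hadoop job the mapper and reducer run on different nodes and the input comes from HDFS, but the division of work between the two phases is the same.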
MrJob
mrjob lets you write MapReduce jobs in Python 2.7/3.4+ and run them on several platforms.
Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
https://www.youtube.com/watch?v=cMziv1iYt28
Using Apache Hive on Dataproc.
Apache Hive is often considered similar to BigQuery: both are data warehouses queried with a SQL-like language.
Migrating from Hive to BigQuery
https://cloud.google.com/blog/products/data-analytics/apache-hive-to-bigquery
Hadoop Pig
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
https://www.youtube.com/watch?v=Hve24pRW_Ps
Hive vs Pig vs SQL
https://www.whizlabs.com/blog/hive-vs-pig-vs-sql/
Pig Latin SQL Challenge
A comparison of doing ETL in SQL and in Pig Latin, giving a more detailed feel for why one might prefer one or the other when solving common real-world problems:
http://www.olric.org/2019/09/pig-latin-sql-challenge-or-window.html?m=1
Sawzall
A perspective on the Sawzall DSL (domain-specific language) over Google MapReduce and the Pig DSL over Hadoop MapReduce.