Hadoop

Quick Start

HDFS and MapReduce

Parallel processing across nodes to increase I/O throughput
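As a rough illustration of the MapReduce model, here is a minimal word-count sketch in plain Python. It does not use Hadoop itself: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool stands in for map tasks that a real cluster would run on separate nodes against HDFS blocks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(splits):
    # Each split is mapped independently, so the map tasks can run in parallel.
    # Assumption: a local thread pool stands in for cluster worker nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["hadoop stores data in HDFS", "MapReduce reads data from HDFS"]
    print(word_count(splits))
```

Because each split is processed without reference to the others, adding more workers (or nodes) lets more splits be read and mapped at the same time, which is where the throughput gain comes from.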

Apache Hive - SQL-like queries (HiveQL) over data in HDFS

Apache HBase - NoSQL database on top of HDFS

Apache Sqoop - bulk transfer between relational databases and HDFS

Apache Flume

Apache Avro

Apache Kafka

Apache Solr

Apache Mahout

Resources

What is YARN in Hadoop

https://www.quora.com/What-is-YARN-in-Hadoop

Set up a Spark cluster on Hadoop YARN

https://www.linode.com/docs/databases/hadoop/install-configure-run-spark-on-top-of-hadoop-yarn-cluster/

Spark Installation

https://spark.apache.org/downloads.html