Welcome to the bigdata.github.io wiki!
This page follows the course Big Data Platform at Kasetsart University.
Syllabus and materials
- What is big data?
- Introduction to HDFS and Hadoop ecosystem
- HDFS commands cheat sheet
- MapReduce Concepts and Wordcount program
- Data store examples on HDFS, Hive, HBase, Pig
Installation:
Lecture:
Video:
Tools:
- hive beeline
- pyhive_pyarrow notebook
- happybase
- parquet intro notebook1 notebook2
- pig udf
- pyhive demo
- hive install demo query demo
Hive SQL Command Reference:
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
HBase: Installation guide
Pig:
- Spark ecosystem: PySpark, SparkML, Streaming with Spark, GraphFrame
- Spark RDD video lecture and demo
- Spark 3.0 demo
The current version is on the official page.
GraphX
SparkML
- Messaging service with Kafka (optional MQTT & Python)
Streaming Twitter with Kafka
** A full running system at this point **
You should have HDFS, Hive, HBase, and Kafka running.
- Elasticsearch ecosystem (ELK)
Elasticsearch, Filebeat, Logstash, Kibana
Their connectivity to Spark and Kafka
Alternative