Home - cchantra/bigdata.github.io GitHub Wiki

Welcome to the bigdata.github.io wiki!


This page follows the course Big Data Platform at Kasetsart University.

Syllabus and materials

  1. What is big data?
  1. Introduction to HDFS and Hadoop ecosystem
  1. MapReduce Concepts and Wordcount program
  1. Data store examples on HDFS, Hive, HBase, and Pig
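As a rough illustration of the MapReduce concepts behind the Wordcount program, the sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine. It is not the course's Hadoop code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for a file stored on HDFS.
    sample = ["big data platform", "big data on hadoop"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run in parallel on different nodes and the shuffle is handled by the framework; the single-process version here only shows the data flow.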

Installation:

Lecture:

Video:

Tools:

Hive SQL Command Reference:

HBase: Installation guide

Pig:

  1. Spark ecosystem: PySpark, SparkML, Streaming with Spark, GraphFrames

The current version is on the official page.

GraphX

SparkML

  1. Messaging service with Kafka (optional MQTT & Python)

Streaming Twitter with Kafka

**A full running system at this point**

You should have HDFS, Hive, HBase, and Kafka running.
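One quick way to confirm that these services are up is to test whether their ports accept TCP connections. The sketch below is a minimal example, not part of the course material. The host and port numbers in `SERVICES` are assumptions based on common defaults (NameNode RPC 8020, HiveServer2 10000, HBase Master 16000, Kafka broker 9092) and must be changed to match your own configuration; the helper names are hypothetical.

```python
import socket
from typing import Dict, Tuple

# Assumed hosts and ports; these are common defaults, not values from the course.
# Adjust them to match core-site.xml, hive-site.xml, hbase-site.xml,
# and Kafka's server.properties on your machine.
SERVICES: Dict[str, Tuple[str, int]] = {
    "HDFS NameNode": ("localhost", 8020),
    "HiveServer2": ("localhost", 10000),
    "HBase Master": ("localhost", 16000),
    "Kafka broker": ("localhost", 9092),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, is_up in check_services(SERVICES).items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```

An open port only shows that something is listening; it does not prove that the service is healthy, so treat this as a first check before running real HDFS, Hive, HBase, or Kafka commands.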

  1. Elasticsearch ecosystem (ELK)

Elasticsearch, Filebeat, Logstash, Kibana

Their connectivity to Spark and Kafka

Alternative: OpenSearch

https://opensearch.org/downloads.html