Spark SQL links - vaquarkhan/Apache-Kafka-poc-and-notes GitHub Wiki

The Spark SQL architecture has three layers: Language API, Schema RDD, and Data Sources.

Language API − Spark SQL is available from several languages through their APIs (Python, Scala, Java) and also accepts HiveQL queries.

Schema RDD − Spark Core is built around a special data structure called the RDD. Spark SQL, by contrast, works with schemas, tables, and records, so it uses a Schema RDD (an RDD with an attached schema), which can be registered as a temporary table. Since Spark 1.3 the Schema RDD has been called a DataFrame.

Data Sources − Spark Core usually reads its data from sources such as text files and Avro files. Spark SQL works with a different set of data sources: Parquet files, JSON documents, Hive tables, and Cassandra databases.

https://rklicksolutions.wordpress.com/2016/03/03/tutorial-spark-1-6-sql-and-dataframe-operations/

Build Spark : http://mbonaci.github.io/mbo-spark/

http://spark.apache.org/docs/latest/building-spark.html

Test Data: https://www.mapr.com/blog/using-apache-spark-dataframes-processing-tabular-data

Good Notes: https://www.supergloo.com/fieldnotes

Spark SQL example practice: https://www.infoq.com/articles/apache-spark-sql