spark - JasonWayne/personal-wiki GitHub Wiki
Configuring Spark + IPython on macOS https://gist.github.com/ololobus/4c221a0891775eaa86b0
Spark tuning tips http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
Talk on dynamic repartitioning https://spark-summit.org/2016/events/handling-data-skew-adaptively-in-spark-using-dynamic-repartitioning/
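Difference between YARN cluster mode and YARN client mode
YARN cluster
: The Spark driver runs inside the YARN ApplicationMaster (AM).
YARN client
: The Spark driver runs on the local machine, i.e. the machine that submits the job.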
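
The two modes are selected with the `--deploy-mode` flag of `spark-submit`. The following is a minimal sketch that only builds the two command lines so they can be compared side by side; the helper name `build_submit_cmd` and the application file `my_app.py` are hypothetical and not part of these notes.

```python
def build_submit_cmd(deploy_mode, app):
    """Build a spark-submit argument list for YARN (illustrative helper)."""
    if deploy_mode not in ("cluster", "client"):
        raise ValueError("deploy_mode must be 'cluster' or 'client'")
    return [
        "spark-submit",
        "--master", "yarn",
        # cluster: driver runs inside the YARN ApplicationMaster
        # client:  driver runs on the machine that submits the job
        "--deploy-mode", deploy_mode,
        app,
    ]


# my_app.py is a placeholder application name.
for mode in ("cluster", "client"):
    print(" ".join(build_submit_cmd(mode, "my_app.py")))
```

In client mode the driver's output appears in the submitting terminal, which tends to be convenient for interactive work; in cluster mode the driver keeps running even if the submitting machine disconnects.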
Spark SQL
Spark basics: how Spark SQL works and its architecture
Spark basics: integrating Spark SQL with Hive and configuring ThriftServer
https://blog.csdn.net/eric_sunah/article/details/49705307
https://www.jianshu.com/p/0aa4b1caac2e
https://zhuanlan.zhihu.com/p/29407368
Spark SQL External DataSource (part 2): source code analysis
The past and present of Spark SQL
https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
http://www.cnblogs.com/zwCHAN/p/4276240.html
https://www.zhihu.com/question/23182567
Spark Streaming
Running your first Streaming program
Long-running Spark Streaming Jobs on YARN Cluster
Spark Programming Guide (Simplified Chinese edition), Streaming chapter
Source code reading & debugging
Stash
Performance tuning
Data skew
Spark performance optimization: N ways to solve Spark data skew
Skew Join Optimization - databricks
Optimize Spark with DISTRIBUTE BY & CLUSTER BY
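
As a rough illustration of why a skewed key hurts, and of one commonly used mitigation (key salting, which is not necessarily the approach taken in the articles linked above), the following pure-Python sketch simulates a hash partitioner without needing Spark. The partition count, salt count, key names, and the use of `zlib.crc32` as the hash are all assumptions made for the example.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of shuffle partitions
NUM_SALTS = 8       # hypothetical number of distinct salt values


def partition_of(key):
    """Deterministic stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    """Count how many records land in each partition."""
    return Counter(partition_of(k) for k in keys)


def salted(key, rng):
    """Append a random salt so one hot key maps to several partition keys."""
    return f"{key}#{rng.randrange(NUM_SALTS)}"


rng = random.Random(0)
# 9000 records share a single hot key; 1000 records have distinct keys.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

plain = partition_sizes(keys)
with_salt = partition_sizes(salted(k, rng) for k in keys)

print("largest partition without salting:", max(plain.values()))
print("largest partition with salting:", max(with_salt.values()))
```

Without salting, every record with the hot key lands in the same partition, so one task does most of the work. With salting the hot key is spread over several partitions; in a real aggregation the salted partial results would then need a second pass that strips the salt and combines them.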