Spark - namgunghyeon/wiki GitHub Wiki

Spark Summary

1. What is Spark

Apache Spark is a fast, general-purpose cluster computing system. Unlike Hadoop, which only runs MapReduce, or Storm, which only handles streaming, Spark is a general-purpose distributed platform that lets computations run across many distributed nodes; modules such as MapReduce-style batch processing and stream processing are layered on top of it. It started as an attempt to speed up MR jobs, which are slow on Hadoop because they are disk-based, by moving the work into memory.

2. Architecture

A Spark application is coordinated by a Driver Program. The Driver Program splits the work into multiple parallel tasks, which run on the Executors located on Spark's Worker Nodes.


Cluster Manager Type: Standalone, Apache Mesos, Hadoop YARN

Driver Program: The process that holds the main function. Your code is submitted through spark-submit; that code creates a SparkContext object and creates RDDs, and the submitted application is converted into the actual units of execution, called Tasks. The driver bundles the tasks and hands them to the Executors on the Worker Nodes, and the Executors run the tasks they receive against the RDD data they hold. (Open question: does spark-submit know the worker nodes and assign work to all of them itself, or does it receive the worker node information from the cluster manager and pass it along?)

Worker Node: A node in the cluster that hosts executors and does the actual work.

SparkContext: Created by the Driver Program and connected to the Cluster Manager (which is responsible for allocating resources).

Executor: A process that performs computation and stores data; it lives for the same lifecycle as the application. If an executor fails, its work is assigned to a replacement executor. An Executor runs tasks on multiple threads and sends the results back to the Driver Program, which is the side that schedules the tasks.


Spark Application execution flow

1. The user runs the application they wrote with spark-submit.
2. spark-submit launches the Driver Program and calls its main() method.
3. The SparkContext created in the Driver requests resources from the Cluster Manager for launching Executors.
4. The Cluster Manager launches the Executors.
5. The Driver Program splits the application into Tasks and sends them to the Executors.
6. The Executors run the Tasks.
7. When the application finishes, the Executors deliver their results to the Driver Program and the resources are returned to the Cluster Manager.
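
The flow above can be sketched in plain Python. This is only a toy model and not Spark's API: a thread pool stands in for an Executor, and the names `split_into_tasks`, `run_task`, and `run_application` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(data, num_tasks):
    """Driver side: cut the input into roughly equal slices, one per task."""
    size = max(1, (len(data) + num_tasks - 1) // num_tasks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(chunk):
    """Executor side: the work done for one task (here, a partial sum)."""
    return sum(chunk)


def run_application(data, num_tasks=4, executor_threads=2):
    # Step 5: the driver splits the application into tasks.
    tasks = split_into_tasks(data, num_tasks)
    # Step 4: an "executor" is started (modelled as a pool of threads).
    with ThreadPoolExecutor(max_workers=executor_threads) as executor:
        # Step 6: the executor runs the tasks on multiple threads.
        partial_results = list(executor.map(run_task, tasks))
    # Step 7: results come back to the driver, which combines them.
    return sum(partial_results)


total = run_application(list(range(1, 101)))
print(total)
```

In real Spark the executors are separate processes on worker nodes and the cluster manager decides where they start; the sketch only mirrors the order of the steps.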

3. RDD (Resilient Distributed Dataset)

A Resilient Distributed Dataset is an immutable, partitioned collection of records. Because it cannot be modified, an RDD can only be created by loading data from storage and converting it into an RDD, or by deriving it from another RDD. Being immutable, it is used read-only, and the lineage that records how it was created makes it possible to build the RDD object again. This is how the fault-tolerance problem is solved.

Lineage means "genealogy", and it is designed as a DAG (directed acyclic graph). It records the loading of the data and the series of steps applied afterwards. When a failure occurs later, the lost RDD can be regenerated by following its recorded lineage, which makes fast recovery possible.
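
A minimal sketch of the lineage idea, assuming a toy class rather than Spark's real RDD: each dataset remembers only its parent and the function that produced it, so the data can be rebuilt by replaying those steps. `ToyRDD` and its methods are hypothetical names for illustration.

```python
class ToyRDD:
    """Toy stand-in for an RDD: it stores how it was built, not the data."""

    def __init__(self, source=None, parent=None, fn=None, name="source"):
        self.source = source   # only set on the root dataset
        self.parent = parent   # the dataset this one was derived from
        self.fn = fn           # function that turns parent rows into our rows
        self.name = name

    def map(self, fn):
        return ToyRDD(parent=self, fn=lambda rows: [fn(r) for r in rows], name="map")

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda rows: [r for r in rows if pred(r)], name="filter")

    def lineage(self):
        """Walk back to the root and return the recorded steps in order."""
        node, steps = self, []
        while node is not None:
            steps.append(node.name)
            node = node.parent
        return list(reversed(steps))

    def compute(self):
        """Rebuild the rows by replaying the lineage from the root."""
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.compute())


base = ToyRDD(source=[1, 2, 3, 4, 5, 6])
even_squares = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(even_squares.lineage())
first_result = even_squares.compute()
# Simulate losing the computed rows: nothing was stored, so we simply replay.
recovered = even_squares.compute()
```

Because the recorded steps are deterministic, replaying them yields the same rows, which is the property the recovery described above relies on.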

An RDD is an immutable collection of data stored across multiple distributed nodes, and each RDD is split into several partitions. Because an RDD cannot be changed, changing it means creating a new dataset. Only two kinds of operations are supported.

Transformation : Produces a new RDD from the data of an existing RDD. Examples include filter, which extracts only specific data, and map, which applies a function to every record.

Action : Computes something from the values in an RDD and produces a result; count() is an example. RDD data is loaded lazily: for example, loading a file with sc.textFile('file') does not actually load it. The file is loaded into memory only at the moment an action computes a result. Transformations on an RDD are not processed until an action is reached, and an RDD holds a reference to the data rather than the data itself.
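
The lazy behaviour described here can be illustrated with a small plain-Python sketch. It is not PySpark: `LazyDataset`, `load_lines`, and the sample log lines are hypothetical, and the class only mimics the transformation/action split.

```python
class LazyDataset:
    """Toy dataset: transformations are recorded, actions trigger the work."""

    def __init__(self, loader, ops=()):
        self._loader = loader     # called only when an action runs
        self._ops = tuple(ops)    # pending transformations, in order

    def map(self, fn):
        # Transformation: returns a new dataset, nothing is executed yet.
        return LazyDataset(self._loader, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: returns a new dataset, nothing is executed yet.
        return LazyDataset(self._loader, self._ops + (("filter", pred),))

    def _run(self):
        rows = iter(self._loader())
        for kind, fn in self._ops:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return rows

    def count(self):
        # Action: loads the data and applies every pending transformation.
        return sum(1 for _ in self._run())

    def collect(self):
        # Action: same as count(), but returns the rows themselves.
        return list(self._run())


load_calls = []


def load_lines():
    """Stands in for reading a file; records every time it is really called."""
    load_calls.append(1)
    return ["INFO start", "ERROR disk full", "INFO done", "ERROR timeout"]


lines = LazyDataset(load_lines)
errors = lines.filter(lambda s: s.startswith("ERROR")).map(str.lower)
calls_before_action = len(load_calls)   # the "file" has not been read yet

error_count = errors.count()            # the action triggers the load
calls_after_action = len(load_calls)
```

Defining `errors` costs nothing; the loader runs only once `count()` is called, which mirrors what happens between `sc.textFile(...)` and the first action.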

Data Partitioning: In a distributed program communication is very expensive, so reducing network load is important, and partitioning is an efficient way to do that. It guarantees that the records belonging to a given set of keys sit together on some node. It only pays off when the data is reused several times in key-oriented operations.
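
As a rough illustration of why partitioning keeps equal keys together, here is a hash-partitioning sketch in plain Python. The function names and the sample records are made up for the example, and `zlib.crc32` is used only to get a deterministic hash; it is not a claim about the hash function Spark uses.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a string key to a partition index with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_by_key(pairs, num_partitions):
    """Place every (key, value) record in the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


events = [("alice", 1), ("bob", 5), ("alice", 3), ("carol", 2), ("bob", 7)]
parts = partition_by_key(events, 3)

for index, part in enumerate(parts):
    print(index, part)
```

Since the partition depends only on the key, all records for one key land in one partition, so a later per-key operation on the same data does not need to move them again. In PySpark the corresponding call on a pair RDD is `partitionBy`.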

Cache: Transformations and Actions process the data and load it into memory, but once an Action has run, the data returns its result and disappears from memory. Every action therefore repeats the work of processing, loading, and returning the data. To reuse an RDD while it stays in memory, use persist. This keeps the RDD resident in memory until it is evicted by the LRU algorithm or removed explicitly with unpersist. cache() is the same as persist() with the storage option set to MEMORY_ONLY.

When the data volume is large, it can be faster not to persist at all and to recompute whenever needed than to use an option that stores to DISK, because the serialize/deserialize overhead is high. The advantage of lazy execution is that, because execution is deferred, Spark can use the DAG to estimate the cost of the transformations in advance and run the computation in an optimized way.
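
The effect of persist can be made visible with a toy dataset that counts how often its data is recomputed. This is a sketch under simplifying assumptions (a single in-memory copy, no LRU eviction); `CountingDataset` is a hypothetical class, not part of Spark.

```python
class CountingDataset:
    """Toy dataset that counts how many times its rows are recomputed."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn
        self._persisted = False
        self._cached = None
        self.compute_calls = 0

    def persist(self):
        # Ask for the rows to be kept after the next action computes them.
        self._persisted = True
        return self

    def unpersist(self):
        # Drop the kept rows; later actions recompute from scratch.
        self._persisted = False
        self._cached = None
        return self

    def _materialize(self):
        if self._persisted and self._cached is not None:
            return self._cached
        self.compute_calls += 1
        rows = self._compute_fn()
        if self._persisted:
            self._cached = rows
        return rows

    def count(self):
        return len(self._materialize())

    def total(self):
        return sum(self._materialize())


def squares():
    return [x * x for x in range(5)]


plain = CountingDataset(squares)
plain.count()
plain.total()

kept = CountingDataset(squares).persist()
kept.count()
kept.total()

print(plain.compute_calls, kept.compute_calls)
```

Without persist each action recomputes the rows; with persist the second action reuses what the first one produced.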

Narrow Dependency: Work that can be handled on a single node is grouped and processed on that node, so no data has to cross the network.

Wide Dependency: The computation has to gather work from all nodes, so it causes network I/O and is slow.
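
The difference between the two dependency types can be sketched with lists standing in for partitions. Everything here is a toy model: `map_values`, `reduce_by_key`, the sample data, and the byte-sum partitioner are assumptions for illustration, not Spark internals.

```python
from collections import defaultdict
from functools import reduce

partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]


def map_values(parts, fn):
    """Narrow: each output partition is built from exactly one input partition."""
    return [[(key, fn(value)) for key, value in part] for part in parts]


def reduce_by_key(parts, fn, num_partitions):
    """Wide: records with the same key must first be brought together."""
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    moved = 0
    for source_index, part in enumerate(parts):
        for key, value in part:
            # Deterministic toy partitioner based on the key's bytes.
            target_index = sum(key.encode("utf-8")) % num_partitions
            if target_index != source_index:
                moved += 1  # this record would travel over the network
            shuffled[target_index][key].append(value)
    result = [
        [(key, reduce(fn, values)) for key, values in sorted(bucket.items())]
        for bucket in shuffled
    ]
    return result, moved


doubled = map_values(partitions, lambda v: v * 2)
summed, records_moved = reduce_by_key(partitions, lambda x, y: x + y, 3)

print(doubled)
print(summed, records_moved)
```

`map_values` never looks outside its own partition, while `reduce_by_key` has to relocate records before it can combine them; that relocation is the shuffle that makes wide dependencies expensive.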

4. Spark on Mesos


Mesos - Installing Spark on the Master

https://spark.apache.org/downloads.html
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.1-bin-hadoop2.7.tgz
In the conf folder of the Spark installation (/home/nkh/spark-2.0.0-bin-hadoop2.7/conf), copy spark-env.sh.template to spark-env.sh

Set the following two variables in the copied spark-env.sh file.
So that Spark can use Mesos:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so

On the Mesos Master server, run python -m SimpleHTTPServer 9914 (with Python 3 the equivalent is python3 -m http.server 9914).
So that Mesos Slaves can download Spark and run it:
export SPARK_EXECUTOR_URI=http://130.211.188.2:9914/spark-2.0.0-bin-hadoop2.7.tgz


Running Spark

Client Mode
~/spark-2.0.1-bin-hadoop2.7/bin
bash pyspark --master mesos://10.128.0.2:5050 <- Using the external address instead of the internal one does not work properly. Need to check whether some other setting is required.


Once Spark starts, every Mesos Slave is prepared so that it can run Spark.


Spark job status: http://130.211.188.2:4040/executors/

cluster-mode: start the Mesos dispatcher on the master

~/spark-2.0.1-bin-hadoop2.7/sbin
bash start-mesos-dispatcher.sh --master mesos://10.128.0.2:5050

Once started, the dispatcher opens a server that accepts spark-submit requests, as shown below.
Spark Command: /usr/lib/jvm/java-8-oracle/bin/java -cp starting org.apache.spark.deploy.mesos.MesosClusterDispatcher, logging to /home/mesos-master-1/spark-2.0.1-bin-hadoop2.7/logs/spark-mesos-master-1-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-mesos-master-1.out
mesos-master-1@mesos-master-1:~/spark-2.0.1-bin-hadoop2.7/sbin$ tail -f /home/mesos-master-1/spark-2.0.1-bin-hadoop2.7/logs/spark-mesos-master-1-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-mesos-master-1.out
16/10/05 08:00:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mesos-master-1); groups with view permissions: Set(); users  with modify permissions: Set(mesos-master-1); groups with modify permissions: Set()
16/10/05 08:00:29 INFO Utils: Successfully started service on port 8081.
16/10/05 08:00:29 INFO MesosClusterUI: Bound MesosClusterUI to 0.0.0.0, and started at http://10.0.2.15:8081
I1005 08:00:29.956048 15739 sched.cpp:226] Version: 1.0.1
I1005 08:00:29.959381 15736 sched.cpp:330] New master detected at [email protected]:5050
I1005 08:00:29.959638 15736 sched.cpp:341] No credentials provided. Attempting to register without authentication
I1005 08:00:29.960786 15736 sched.cpp:743] Framework registered with 56f0b494-020b-4ef5-99d6-cd36ed1fdccc-0003
16/10/05 08:00:29 INFO MesosClusterScheduler: Registered as framework ID 56f0b494-020b-4ef5-99d6-cd36ed1fdccc-0003
16/10/05 08:00:29 INFO Utils: Successfully started service on port 7077.
16/10/05 08:00:29 INFO MesosRestServer: Started REST server for submitting applications on port 7077

Running in cluster mode

Install Spark on another server, then submit the program with spark-submit
/home/nkh/spark-2.0.0-bin-hadoop2.7/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master mesos://10.128.0.2:7077 \
--deploy-mode cluster \
http://130.211.188.2:9914/cluster-test.py    <- the file to run, served by python -m SimpleHTTPServer 9914

The steps above were run on GCE.
The steps below were run on VirtualBox.

/home/mesos-master-3/spark-2.0.1-bin-hadoop2.7/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master mesos://192.168.56.101:7077 \
--deploy-mode cluster \
http://192.168.56.106:9914/test.py



https://github.com/namkunghyeon/python_spark

Important

If the connection to --master mesos://192.168.56.101:7077 fails, use netstat -pln to check that port 7077 is bound to the correct IP address:
tcp6       0      0 192.168.56.101:7077     :::*                    LISTEN      15852/java
If it looks like the following instead,
tcp6       0      0 127.0.1.1:7077     :::*                    LISTEN      15852/java

open the hosts file with vi /etc/hosts and delete this entry:
127.0.1.1      mesos-master-1

A correct file should contain entries like the following.
192.168.56.101  mesos-master-1
192.168.56.102  mesos-master-2
192.168.56.103  mesos-master-3
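
The same check can be done from Python before starting the dispatcher. This is a small diagnostic sketch using only the standard library; the helper names `is_loopback` and `check_hostname_binding` are hypothetical, and the script assumes the machine's hostname is what the service will bind to.

```python
import ipaddress
import socket


def is_loopback(ip):
    """True for any address in the loopback range, such as 127.0.1.1."""
    return ipaddress.ip_address(ip).is_loopback


def check_hostname_binding(hostname=None):
    """Resolve the hostname and report whether it maps to a routable address.

    Returns (hostname, resolved_ip_or_None, ok).
    """
    hostname = hostname or socket.gethostname()
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        return hostname, None, False
    return hostname, ip, not is_loopback(ip)


if __name__ == "__main__":
    name, ip, ok = check_hostname_binding()
    if ok:
        print(f"{name} resolves to {ip}")
    else:
        print(f"{name} resolves to {ip}; check the entries in /etc/hosts")
```

If the hostname resolves to a loopback address, a service bound to that hostname is only reachable from the machine itself, which matches the 127.0.1.1:7077 symptom shown in the netstat output.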

Sources:
http://ourcstory.tistory.com/124
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-mesos.html
https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/
https://vanwilgenburg.wordpress.com/2015/05/10/how-to-run-a-spark-cluster-on-mesos-on-your-mac/
http://bcho.tistory.com/1024