Docker - MaxGolden/Personal_Blogs GitHub Wiki
How to use Docker
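// Reading several files at once in Spark
df = spark.read.json(["fileName1","fileName2"]) // PySpark: list of paths
df = spark.read.json("data/*json") // PySpark: glob pattern
val text = sc.textFile("file1,file2....") // Scala: comma-separated paths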
1 Install Docker
// docker login
2 Pull image
docker pull maxjin/5540projectjupyter:2 // image name is maxjin/5540projectjupyter and tag is 2
// Linux
docker run -p 8888:8888 -v `pwd`:/home/jovyan/work --name spark jupyter/pyspark-notebook
// Win10
docker run -p 8888:8888 -v /c/Users/Max/docker/jupyter:/home/jovyan/work --name spark maxjin/5540projectjupyter:2
// Then open Jupyter Notebook at localhost:8888 as usual.
// Put your dataset in the host directory mounted above: `/c/Users/Max/docker/jupyter` (Win10) or `pwd` (Linux).
// In the notebook, load the data from the mounted path: `XX = spark.read.json("/home/jovyan/work/tweets.json")`
docker exec -it spark bash // optional: open a bash shell inside the running container
There are some .ipynb notebooks created by me. Upload your local files, point the notebooks at your own dataset, and run them to get the results.
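// Copy the work folder's contents into /home/jovyan in the spark container
docker cp //home/jovyan/work/. spark://home/jovyan/
// Or copy a single file from a shell inside the container
cp //home/jovyan/work/tweets.json //home/jovyan/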
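The `-v` option in the run commands above maps a host directory onto `/home/jovyan/work`, so a dataset saved on the host shows up in the notebook under a different path. The sketch below shows that mapping as a small shell function. The name `container_path` is hypothetical (it is not part of Docker), and the default container directory is an assumption taken from the `-v` argument used on this page.

```shell
#!/bin/sh
# Hypothetical helper: translate a file path under the mounted host
# directory into the path the notebook sees inside the container.
container_path() {
    host_dir=$1                            # host side of -v, e.g. /c/Users/Max/docker/jupyter
    host_file=$2                           # a file somewhere under host_dir
    container_dir=${3:-/home/jovyan/work}  # container side of -v (assumed default)
    case "$host_file" in
        "$host_dir"/*)
            # Strip the host prefix and prepend the container directory
            printf '%s/%s\n' "$container_dir" "${host_file#"$host_dir"/}"
            ;;
        *)
            echo "error: $host_file is not under $host_dir" >&2
            return 1
            ;;
    esac
}
```

For example, `container_path /c/Users/Max/docker/jupyter /c/Users/Max/docker/jupyter/tweets.json` prints the path to pass to `spark.read.json` in the notebook.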
Others
Docker image name by Max
maxjin/5540projectjupyter:2
C:\Users\Max\docker
docker run -d -p 8888:8888 -v /c/docker/user/max/pyspark_jupyter --name spark jupyter/pyspark-notebook
Linux
docker run -p 8888:8888 -v `pwd`:/home/jovyan/work --name spark jupyter/pyspark-notebook
Win10
docker run -p 8888:8888 -v /c/Users/Max/docker/jupyter:/home/jovyan/work --name spark jupyter/pyspark-notebook
docker run -p 8888:8888 -v /c/Users/Max/docker/jupyter:/home/jovyan/work --name spark maxjin/5540projectjupyter:2
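The Linux and Win10 commands above differ only in the host directory and the image name. A minimal sketch of that pattern follows. The function `run_cmd` is a hypothetical name, and it only prints the command instead of running it, so you can check the result before starting a container. The defaults (current directory, `jupyter/pyspark-notebook`) are assumptions taken from the Linux command above.

```shell
#!/bin/sh
# Hypothetical helper: print the docker run command for a given host
# directory and image. Nothing is executed here.
run_cmd() {
    host_dir=${1:-$(pwd)}                 # host directory to mount
    image=${2:-jupyter/pyspark-notebook}  # image to start
    printf 'docker run -p 8888:8888 -v %s:/home/jovyan/work --name spark %s\n' \
        "$host_dir" "$image"
}
```

Calling `run_cmd /c/Users/Max/docker/jupyter maxjin/5540projectjupyter:2` prints the Win10 command; piping the output to `sh` would run it.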
pull and push image
docker pull "image"
docker login
docker run -it ubuntu // start a container with an interactive terminal
docker commit ID username/name:2
docker push username/name:2
docker exec -it spark bash
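The commit and push steps above can be collected into one place. The sketch below prints the commands in order for a given container, repository, and tag. The function name `publish_cmds` and all three arguments are placeholders for illustration, not values from this page, and nothing is sent to Docker Hub unless you run the printed commands yourself.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that save a container as an
# image and push it. Nothing is executed here.
publish_cmds() {
    container=$1   # container name or ID, as listed by `docker ps -a`
    repo=$2        # Docker Hub repository, e.g. username/name
    tag=$3         # image tag, e.g. 2
    printf 'docker login\n'
    printf 'docker commit %s %s:%s\n' "$container" "$repo" "$tag"
    printf 'docker push %s:%s\n' "$repo" "$tag"
}
```

For example, `publish_cmds spark username/name 2` lists the three commands; review them, then run them one at a time.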
Reference
URL
https://hub.docker.com/u/maxjin/ (My Docker Hub page and image)
https://www.youtube.com/watch?v=PtT32MW2j9c