EMR 015 Spark HTTP HTTPS Proxy - qyjohn/AWS_Tutorials GitHub Wiki

(1) pyspark command line

pyspark --driver-java-options="-Dhttp.proxyHost=hostname -Dhttp.proxyPort=8080 -Dhttps.proxyHost=hostname -Dhttps.proxyPort=8443" --packages <somePackage>

(2) spark-shell command line

spark-shell --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=hostname -Dhttp.proxyPort=8080 -Dhttps.proxyHost=hostname -Dhttps.proxyPort=8443" --packages <somePackage>
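Both commands above pass the same four JVM system properties to the driver, so the string can be built once and reused. The following is a minimal sketch, assuming the proxy host and ports shown in the commands; the variable names (PROXY_HOST, HTTP_PROXY_PORT, HTTPS_PROXY_PORT, JAVA_PROXY_OPTS) are made up for illustration and are not part of Spark or EMR.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace with your real proxy host and ports.
PROXY_HOST="hostname"
HTTP_PROXY_PORT=8080
HTTPS_PROXY_PORT=8443

# Build the JVM proxy options once so both commands use the same string.
JAVA_PROXY_OPTS="-Dhttp.proxyHost=${PROXY_HOST} -Dhttp.proxyPort=${HTTP_PROXY_PORT} -Dhttps.proxyHost=${PROXY_HOST} -Dhttps.proxyPort=${HTTPS_PROXY_PORT}"

# Print the string so it can be checked before launching Spark.
echo "$JAVA_PROXY_OPTS"

# On a node where Spark is installed, uncomment one of these:
# pyspark --driver-java-options="$JAVA_PROXY_OPTS" --packages <somePackage>
# spark-shell --conf "spark.driver.extraJavaOptions=$JAVA_PROXY_OPTS" --packages <somePackage>
```

Keeping the options in one variable avoids the two commands drifting apart when the proxy host or a port changes.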

(3) Alternatively, edit /etc/spark/conf/spark-defaults.conf, find the line that sets spark.driver.extraJavaOptions, and append the following to the end of that line:

-Dhttp.proxyHost=hostname -Dhttp.proxyPort=8080 -Dhttps.proxyHost=hostname -Dhttps.proxyPort=8443
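The manual edit described above could also be scripted. The sketch below is one possible approach, not a tested EMR procedure: it runs against a scratch file rather than the real /etc/spark/conf/spark-defaults.conf, and the existing option on the sample line (-XX:+UseG1GC) is only a stand-in for whatever your cluster already has there. The variable names CONF, PROXY_OPTS, and RESULT are assumptions for illustration.

```shell
#!/bin/sh
# Proxy options to append (same placeholder host and ports as above).
PROXY_OPTS="-Dhttp.proxyHost=hostname -Dhttp.proxyPort=8080 -Dhttps.proxyHost=hostname -Dhttps.proxyPort=8443"

# Work on a scratch copy so the sketch is safe to run anywhere.
# On a real cluster, CONF would point at /etc/spark/conf/spark-defaults.conf.
CONF="$(mktemp)"
# Stand-in content: the real line on EMR holds different, longer options.
printf '%s\n' 'spark.driver.extraJavaOptions -XX:+UseG1GC' > "$CONF"

# Append the proxy options to the end of the existing line.
# -i.bak edits in place and keeps a backup copy next to the file.
sed -i.bak "s|^\(spark\.driver\.extraJavaOptions.*\)\$|\1 ${PROXY_OPTS}|" "$CONF"

# Show the resulting line for a quick visual check.
RESULT="$(grep '^spark.driver.extraJavaOptions' "$CONF")"
echo "$RESULT"

# Remove the scratch file and its backup.
rm -f "$CONF" "$CONF.bak"
```

The substitution only changes a line that already starts with spark.driver.extraJavaOptions; if the file has no such line, nothing is modified and the property would need to be added by hand.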