Configure Log4J on AWS EMR for Spark

Provide the following configuration in /usr/lib/spark/conf/spark-defaults.conf:

  spark.driver.extraClassPath /my/appender/appender.jar
  spark.executor.extraClassPath /my/appender/appender.jar

==========================================================================

Steps for modifying spark-defaults.conf via SSH (SSH into the master node):

  1) Take a backup of the file /usr/lib/spark/conf/spark-defaults.conf

  2) Copy the ps-log4j.jar file to /home/hadoop (this can be any local directory where the file is accessible by the user running the job)

  3) Append the "ps-log4j.jar" location to the properties 'spark.driver.extraClassPath' and 'spark.executor.extraClassPath' in spark-defaults.conf
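Steps 1 and 3 can be scripted. The following is a minimal sketch, not a definitive implementation: the function name append_jar_to_classpath is hypothetical, and it assumes each property sits on a single line of the file, as in a default EMR spark-defaults.conf. It takes the backup only once and skips lines that already end with the jar, so it is safe to re-run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EMR or Spark): back up a Spark config file
# and append a jar to both extraClassPath properties.
append_jar_to_classpath() {
  local conf="$1" jar="$2" tmp

  # Step 1: keep a backup of the original file (only on the first run).
  [ -e "${conf}.bak" ] || cp -p "$conf" "${conf}.bak"

  # Step 3: append ":<jar>" to each extraClassPath line,
  # unless the line already ends with the jar path.
  tmp="$(mktemp)"
  awk -v jar="$jar" '
    $1 == "spark.driver.extraClassPath" || $1 == "spark.executor.extraClassPath" {
      n = length(jar)
      if (substr($0, length($0) - n + 1) != jar) $0 = $0 ":" jar
    }
    { print }
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf"   # cat keeps the file's owner and mode
  rm -f "$tmp"
}
```

After copying the jar (step 2), the function would be called as append_jar_to_classpath /usr/lib/spark/conf/spark-defaults.conf /home/hadoop/ps-log4j.jar from a shell that has write access to the file, which on EMR typically means running as root.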

After these modifications, the extraClassPath properties should look like this:

  spark.driver.extraClassPath      :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/ps-log4j.jar

  spark.executor.extraClassPath    :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/ps-log4j.jar

Note: The ps-log4j.jar is referenced at the end of each of these extraClassPath parameters.
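To confirm the edit took effect, a small check along these lines may help. It is only a sketch: the function name classpath_ends_with_jar is hypothetical, and it assumes each property appears on one line of the file.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if both extraClassPath properties
# end with the given jar path.
classpath_ends_with_jar() {
  local conf="$1" jar="$2" prop line
  for prop in spark.driver.extraClassPath spark.executor.extraClassPath; do
    # Take the last matching line in case the property is defined more than once.
    line="$(grep -E "^${prop}[[:space:]]" "$conf" | tail -n 1)"
    case "$line" in
      *":${jar}") ;;   # property value ends with ":<jar>"
      *) echo "${prop} does not end with ${jar}" >&2; return 1 ;;
    esac
  done
}
```

For example, classpath_ends_with_jar /usr/lib/spark/conf/spark-defaults.conf /home/hadoop/ps-log4j.jar returns a non-zero status and names the offending property if either line was missed.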

Once spark-defaults.conf has been modified, the spark-submit command no longer needs the log4j-related --conf options. The spark-submit command then looks like this:

  spark-submit --deploy-mode cluster \
  --files /local/path/to/my_custom_log4j.properties \
  --class <your-class-name> \
  --jars /local/path/to/appender.jar \
  /path/to/application/jar