Spark Application logs location on AWS EMR

To start with, Spark executes in a distributed fashion: containers are launched on different worker (CORE/TASK) nodes, so the application logs reside on those nodes. For a "COMPLETED" application, these logs can be retrieved through YARN by running the command "yarn logs -applicationId APPLICATION_ID".
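
For example, the aggregated logs of a completed application can be saved to a local file as below (the application ID is hypothetical; list your own IDs with "yarn application -list -appStates ALL"):

yarn logs -applicationId application_1572839353552_0008 > application_logs.txt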

Whereas for "RUNNING" applications, YARN log aggregation stores these logs in HDFS on the EMR cluster (browsable from the MASTER node), and they can be listed using the command below:

hdfs dfs -ls /var/log/hadoop-yarn/apps/hadoop/logs/
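
For example, to drill down into the aggregated log directory of a single running application (the application ID is hypothetical; substitute one from the listing above):

hdfs dfs -ls /var/log/hadoop-yarn/apps/hadoop/logs/application_1572839353552_0008/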

These running applications can also be monitored from the Spark History Server UI [1]. The Spark History Server event-log data is stored in HDFS as well and can be listed by running the command

hdfs dfs -ls /var/log/spark/apps/
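
For example, the event log of a single application can be copied out of HDFS for offline inspection (the application ID is hypothetical; note that a still-running application's event log may carry an ".inprogress" suffix):

hdfs dfs -copyToLocal /var/log/spark/apps/application_1572839353552_0008 /tmp/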

In addition, for a running YARN application, each executor (i.e., the tasks of a given application running on CORE/TASK nodes) also writes its own logs, which are stored locally on the node where it runs. The location of these logs on the CORE/TASK nodes is shown below. Please run "sudo su" first to get the permissions needed to read them.

/mnt/var/log/hadoop-yarn/containers/application_ID/container_ID
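
For example, on a CORE/TASK node the stdout/stderr of a specific executor container can be read as follows (the application and container IDs are hypothetical; list the containers directory first to find the real ones):

sudo su
ls /mnt/var/log/hadoop-yarn/containers/
cat /mnt/var/log/hadoop-yarn/containers/application_1572839353552_0008/container_1572839353552_0008_01_000002/stderr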

Please consider using the options above to access the logs in real time while the Spark application is running on YARN.