Yarn Log Aggregation vs EMR Log Aggregation
To enable or disable YARN log aggregation on an AWS EMR cluster, modify the property below in yarn-site.xml; set it to false to disable aggregation.
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>
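The same setting can also be applied at cluster launch through the EMR configurations API. A minimal sketch, with an illustrative file name, that could be passed to aws emr create-cluster via --configurations:

# Write the yarn-site classification to a file (file name is illustrative)
cat > disable-yarn-log-aggregation.json <<'EOF'
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.log-aggregation-enable": "false"
    }
  }
]
EOF

# Reference it when launching the cluster:
#   aws emr create-cluster ... --configurations file://disable-yarn-log-aggregation.json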
Once done, restart the following services - you can stop and then start them:
A. hadoop-hdfs-namenode
B. hadoop-mapreduce-historyserver
C. hadoop-yarn-resourcemanager
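A minimal sketch of that stop/start sequence on the master node, assuming an EMR release where these daemons are managed by systemd (on older AMIs they are upstart jobs, e.g. sudo stop hadoop-yarn-resourcemanager followed by sudo start hadoop-yarn-resourcemanager):

# Stop the daemons
sudo systemctl stop hadoop-hdfs-namenode
sudo systemctl stop hadoop-mapreduce-historyserver
sudo systemctl stop hadoop-yarn-resourcemanager

# Start them again so the new yarn-site.xml setting is picked up
sudo systemctl start hadoop-hdfs-namenode
sudo systemctl start hadoop-mapreduce-historyserver
sudo systemctl start hadoop-yarn-resourcemanager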
For /var/log/spark/apps -->> The Spark History Server manages the Spark logs present in the "/var/log/spark/apps" location. By default, "spark.history.fs.cleaner.enabled" is set to false, so the Spark History Server does not delete the files. Hence, in order to rotate/periodically delete the Spark history logs, add the following properties to spark-defaults.conf (present in "/etc/spark/conf") and restart the Spark History Server.
========>
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 12h
spark.history.fs.cleaner.interval 1h
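A minimal sketch of applying this on the master node, assuming the EMR defaults of /etc/spark/conf/spark-defaults.conf and a systemd-managed spark-history-server service (older AMIs restart it through init/upstart scripts):

# Append the cleaner settings to spark-defaults.conf
sudo tee -a /etc/spark/conf/spark-defaults.conf <<'EOF'
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 12h
spark.history.fs.cleaner.interval 1h
EOF

# Restart the Spark History Server so the cleaner settings take effect
sudo systemctl restart spark-history-server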
- EMR Log aggregation vs YARN Log aggregation:
EMR has its own log aggregation into an S3 bucket for persistent logging and debugging. If the debugging feature is turned on, the EMR web console provides functionality similar to "yarn logs -applicationId".
YARN log aggregation stores the application container logs in HDFS, whereas EMR's LogPusher (the process that pushes logs to S3 as the persistent option) needs the files on the local file system. With aggregation enabled, the default behavior of YARN is to copy the container logs from the local disks of the core nodes to HDFS and then, after aggregation, DELETE those local files on the individual core nodes. Since they are deleted by the NodeManager on the individual nodes, EMR has no way to save those logs to more persistent storage such as S3.
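For reference, the aggregated logs in HDFS can be pulled back with the standard YARN CLI. The application id below is a placeholder, and the HDFS path is only the assumed EMR default for yarn.nodemanager.remote-app-log-dir, so verify it in your cluster's yarn-site.xml:

# Fetch aggregated container logs for one application (placeholder application id)
yarn logs -applicationId application_1234567890123_0001

# List aggregated logs in HDFS (assumed EMR default for the remote app-log dir)
hdfs dfs -ls /var/log/hadoop-yarn/apps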
- The property "yarn.log-aggregation.enable-local-cleanup" in yarn-site.xml on the respective core/task nodes controls this behavior. This property is not public and can only be set on EMR distributions. It is set to FALSE by default, which means the cleanup on the local machines WILL NOT take place. LogPusher needs these logs on the local machines to push them to S3, and LogPusher is the one responsible for removing the local logs after they have been copied over and after a certain retention period (4 hours for containers).
For the logs to be deleted from the local disks, we need to flip it to true with the configurations API while launching the cluster. On a live cluster, yarn-site.xml on all core/task nodes should be updated and the NodeManager restarted (a sketch follows). After the restart, old container logs might still be present.
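A minimal sketch of the live-cluster route, assuming a systemd-managed EMR release (older AMIs use upstart: sudo stop hadoop-yarn-nodemanager followed by sudo start hadoop-yarn-nodemanager):

# On every core/task node, after adding the following property to /etc/hadoop/conf/yarn-site.xml:
#   yarn.log-aggregation.enable-local-cleanup = true
sudo systemctl restart hadoop-yarn-nodemanager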
*** This option might not be recommended, because LogPusher will NOT be able to push the local container logs to the customer's (service's) S3 bucket if it is set to true, *** and the only source of container logs will then be the aggregated logs in HDFS, which are not as persistent.
If you decide not to rely on either the EMR LogPusher (which pushes to S3) or the YARN NodeManager (which aggregates logs to HDFS), then there are a few things to consider:
- Disable YARN log aggregation using yarn.log-aggregation-enable = false. This means the YARN NodeManager on the core nodes will not push the respective container logs from local disk to centralized HDFS. Note that LogPusher can still delete (and can try to upload) the container logs to S3 after 4 hours (see /etc/logpusher/hadoop.config).
- Once log aggregation is disabled, yarn.nodemanager.log.retain-seconds comes into the picture; it deletes the logs on the local disks after 3 hours by default. This means the NodeManager can remove the logs even before LogPusher tries to send them to S3 and deletes them itself. So make sure you increase this time so that your custom monitoring application has enough time to send the logs to your preferred destination (a sketch follows this list).
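A minimal sketch of raising that retention at cluster launch through the configurations API (the 24-hour value and the file name are only illustrative):

# 86400 seconds = 24 hours; the default of 10800 seconds corresponds to the 3 hours mentioned above
cat > raise-nm-log-retention.json <<'EOF'
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.log.retain-seconds": "86400"
    }
  }
]
EOF

# Pass it when launching the cluster:
#   aws emr create-cluster ... --configurations file://raise-nm-log-retention.json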
The logs on the EMR cluster are handled by the LogPusher daemon. LogPusher monitors all application logs and periodically uploads them to S3; once logs are older than the retention period mentioned in the config files present under "/etc/logpusher" (counted since their last update), they are removed. You can change the retention period of the logs in the config file and restart LogPusher to pick up the new values.

========> $ sudo /etc/init.d/logpusher restart
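A short sketch for checking what LogPusher works with; the container-log directory is the assumed EMR default for yarn.nodemanager.log-dirs, so verify it on your cluster:

# Per-application LogPusher configs (hadoop.config covers the YARN container logs)
ls /etc/logpusher/
cat /etc/logpusher/hadoop.config

# Local YARN container logs that LogPusher uploads (assumed EMR default log directory)
ls /var/log/hadoop-yarn/containers/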