YARN Log Aggregation Default Values - isgaur/AWS-BigData-Solutions GitHub Wiki

  1. yarn.log-aggregation.retain-seconds - Default Value - 172800. Aggregated logs are deleted once they are older than "yarn.log-aggregation.retain-seconds" seconds (by default 172800 seconds, i.e. 48 hours). If you launch an EMR cluster with the "Logging" feature enabled, a daemon called "LogPusher", which runs on each EMR node, copies all the cluster logs to the specified S3 path.

  2. yarn.log-aggregation-enable - Default Value - true. Log aggregation is enabled by default.
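
The retention arithmetic above can be checked with a minimal Python sketch. The helper name `describe_retention` is hypothetical and only for illustration. It assumes the usual Hadoop convention that a negative value such as -1 disables deletion of aggregated logs.

```python
from datetime import timedelta


def describe_retention(retain_seconds: int) -> str:
    """Describe a yarn.log-aggregation.retain-seconds value in plain words.

    Hypothetical helper for illustration; not part of YARN or EMR.
    """
    # Assumption: a negative value (for example -1) disables deletion.
    if retain_seconds < 0:
        return "aggregated logs are kept indefinitely"
    # Dividing two timedeltas yields a plain float number of hours.
    hours = timedelta(seconds=retain_seconds) / timedelta(hours=1)
    return f"aggregated logs are deleted after {hours:g} hours"


print(describe_retention(172800))  # prints: aggregated logs are deleted after 48 hours
```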

You can override these defaults by supplying a configuration object when you launch the cluster. The configuration can be given in a shorthand syntax or referenced as a configuration object stored in a JSON file. A single JSON object can hold multiple classifications for multiple applications.
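
As a rough sketch of what "multiple classifications in a single JSON object" looks like, the Python snippet below builds a configuration list with two entries and prints it as JSON. The second classification (`core-site`) and its property value are illustrative assumptions, not settings recommended by this page.

```python
import json

# Each dictionary is one classification; the list as a whole is the
# configuration object that EMR accepts.
configurations = [
    {
        "Classification": "yarn-site",
        "Properties": {"yarn.log-aggregation-enable": "true"},
    },
    {
        # Illustrative second classification; the value is an assumption.
        "Classification": "core-site",
        "Properties": {"hadoop.security.groups.cache.secs": "250"},
    },
]

# Property values are written as strings, as in the example on this page.
print(json.dumps(configurations, indent=2))
```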

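For example, to enable log aggregation, keep the aggregated logs indefinitely (a retain-seconds value of -1 disables deletion), and write them to an S3 location, create a configuration file named myConfig.json with the following contents and supply it when launching an EMR cluster: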

      [
        {
          "Classification": "yarn-site",
          "Properties": {
            "yarn.log-aggregation-enable": "true",
            "yarn.log-aggregation.retain-seconds": "-1",
            "yarn.nodemanager.remote-app-log-dir": "s3:\/\/mybucket\/logs"
          }
        }
      ]
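
One possible way to generate this file and pass it to the AWS CLI is sketched below in Python. The function names, the release label `emr-5.36.0`, the instance type, and the instance count are assumptions chosen for illustration. The command is only assembled as a list of arguments and is not executed.

```python
import json
from pathlib import Path


def render_yarn_log_config(log_uri: str, retain_seconds: int = -1) -> str:
    """Return the yarn-site configuration object as JSON text (hypothetical helper)."""
    config = [
        {
            "Classification": "yarn-site",
            "Properties": {
                "yarn.log-aggregation-enable": "true",
                # Property values are strings, as in the example above.
                "yarn.log-aggregation.retain-seconds": str(retain_seconds),
                "yarn.nodemanager.remote-app-log-dir": log_uri,
            },
        }
    ]
    return json.dumps(config, indent=2)


def create_cluster_command(config_path: str) -> list:
    """Build (but do not run) an `aws emr create-cluster` command line."""
    return [
        "aws", "emr", "create-cluster",
        "--release-label", "emr-5.36.0",   # assumed release label
        "--applications", "Name=Hadoop",
        "--instance-type", "m5.xlarge",    # assumed instance type
        "--instance-count", "3",           # assumed cluster size
        "--use-default-roles",
        "--configurations", f"file://{config_path}",
    ]


if __name__ == "__main__":
    # Write the file next to the script, then show the command to run.
    Path("myConfig.json").write_text(render_yarn_log_config("s3://mybucket/logs"))
    print(" ".join(create_cluster_command("./myConfig.json")))
```

Keeping the rendering separate from the file write makes it easy to inspect the JSON before anything touches disk or the cluster.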

More information on how to configure applications by supplying a configuration object can be found in the Amazon EMR Release Guide [2].

Reference Documentation:

[1] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-config.html

[2] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html