Setting the FAIR Scheduler for Spark on AWS EMR - isgaur/AWS-BigData-Solutions GitHub Wiki

The files that need to be modified are /etc/spark/conf/spark-defaults.conf and /etc/spark/conf/fair-scheduler.xml.
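A minimal sketch of what fair-scheduler.xml can contain; the pool names, weights, and minimum shares below are illustrative assumptions, not values from an actual setup:

```xml
<!-- /etc/spark/conf/fair-scheduler.xml (pool names and weights are illustrative) -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

Note that for Spark to schedule jobs fairly within an application, spark.scheduler.mode should also be set to FAIR in spark-defaults.conf alongside the allocation file property.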

  1. In /etc/spark/conf/spark-defaults.conf, set spark.scheduler.allocation.file to /etc/spark/conf/fair-scheduler.xml.
  
  
  2. In yarn-site.xml, change the yarn.scheduler.fair.allocation.file property to /etc/hadoop/conf/fair-scheduler.xml.
  3. Restart the YARN ResourceManager using sudo stop hadoop-yarn-resourcemanager followed by sudo start hadoop-yarn-resourcemanager.
  4. Check the YARN UI to confirm the scheduler has changed from Capacity to FAIR.
  5. Run a Spark application to test the changes.
  6. To run applications in cluster mode, all of these configurations must propagate to every worker node. To apply configurations while provisioning a new EMR cluster, please refer to [1]; to apply them on a running EMR cluster, please refer to [2].

  7. For Spark dynamic allocation, set spark.dynamicAllocation.enabled to true or false based on the application workload. This works best when configured at the application level rather than at the EMR cluster level; see [3] for reference.
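Step 2 above can be sketched as a yarn-site.xml fragment. The scheduler class property is an assumption not shown in the original steps, but it is normally what switches YARN's ResourceManager from the Capacity scheduler to the Fair scheduler:

```xml
<!-- /etc/hadoop/conf/yarn-site.xml (sketch; the scheduler class property is an
     assumption, but is typically required to switch from Capacity to Fair) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```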
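For step 6, the same settings can be expressed as an EMR configurations JSON (passed, for example, via aws emr create-cluster --configurations) so they reach all nodes at provisioning time. Treat this as a sketch; the exact properties needed may vary with the EMR release:

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.scheduler.mode": "FAIR",
      "spark.scheduler.allocation.file": "/etc/spark/conf/fair-scheduler.xml"
    }
  },
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.scheduler.fair.allocation.file": "/etc/hadoop/conf/fair-scheduler.xml"
    }
  }
]
```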
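For step 7, a sketch of per-application dynamic allocation settings in spark-defaults.conf style; the executor counts are illustrative assumptions, and spark.shuffle.service.enabled is included because dynamic allocation normally depends on the external shuffle service:

```properties
# Per-application dynamic allocation settings (executor counts are illustrative)
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   20
```

The same settings can also be passed per job via spark-submit --conf instead of editing spark-defaults.conf.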

[1] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-create-cluster.html
[2] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html
[3] https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/