Spark on YARN

Understanding the difference between YARN Client and Cluster Modes

Objective: To run the same Spark word count application on YARN in client mode and in cluster mode, observe both runs in the ResourceManager UI, and list, inspect, and kill applications with the yarn command.

  1. Open a terminal.

  2. Change to the ~/spark3/examples/jars directory.

    cd ~/spark3/examples/jars
    
  3. Open a second terminal window.

  4. Once again, change to the ~/spark3/examples/jars directory.

    cd ~/spark3/examples/jars
    
  5. Position the two terminal windows so that both are visible on your screen side-by-side.

  6. Import the sherlock.txt data into HDFS.

    hdfs dfs -mkdir -p data/
    hdfs dfs -put ~/data/sherlock.txt data/
    
  7. Type the spark3-submit command for the word count example in each terminal window, as listed under Terminal One and Terminal Two, but do not press Enter yet.

Terminal One

    spark3-submit --class org.apache.spark.examples.JavaWordCount --master yarn --deploy-mode client spark-examples_2.12-3.1.2.jar data/sherlock.txt
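Before pressing Enter, it can be worth confirming that sherlock.txt actually reached HDFS in step 6. The sketch below assumes the `hdfs` client is on the PATH; `check_input` is a hypothetical helper name, not part of Hadoop or Spark.

```shell
# Sketch: confirm the input file from step 6 exists in HDFS before submitting.
# check_input is a hypothetical helper, not a Hadoop or Spark command.
check_input() {
  input_path="$1"
  # `hdfs dfs -test -e PATH` exits with status 0 when PATH exists.
  if hdfs dfs -test -e "$input_path"; then
    echo "found: $input_path"
  else
    echo "missing: $input_path" >&2
    return 1
  fi
}

# Usage on the lab machine:
#   check_input data/sherlock.txt
```

If the helper reports the file as missing, repeat the `hdfs dfs -put` step before submitting the job.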

List Application

  1. While the application is running, open another terminal window and run the following command:

    yarn application -list
    


To view the logs of the application, run the following command:

    yarn logs -applicationId {your application id}

Kill an Application

  1. The following command kills a running application:

    yarn application -kill {your application id}
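The application ID used by the logs and kill commands is the one printed by `yarn application -list`, and it has the form application_<cluster timestamp>_<sequence number>. As a minimal sketch, the hypothetical helper `extract_app_ids` below pulls such IDs out of any text piped into it, so the ID does not have to be copied by hand.

```shell
# Hypothetical helper: print each distinct YARN application ID found on stdin.
extract_app_ids() {
  # -o prints only the matching part; sort -u drops repeats, since the same
  # ID can appear more than once on a line (for example inside a tracking URL).
  grep -oE 'application_[0-9]+_[0-9]+' | sort -u
}

# Usage on a live cluster (not executed here):
#   app_id=$(yarn application -list 2>/dev/null | extract_app_ids | head -n 1)
#   yarn logs -applicationId "$app_id"
#   yarn application -kill "$app_id"
```

Taking the first ID with `head -n 1` is only sensible when a single application is running; with several, pick the ID by name from the listing instead.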

Exploring the YARN Cluster

  1. Open a browser and connect to Cloudera Manager (CM) at the URL http://localhost:7180

  2. Log in to the CM UI with the username admin and the password admin.

  3. Click Services in the CM UI.

  4. Select the YARN service on the Services page.

  5. Click Web UI and select ResourceManager UI.

The ResourceManager UI opens in another browser tab.

NOTE: If the quick link fails to open the ResourceManager UI, replace the URL in the browser tab with datacouch.training.io:8088/cluster

  6. The page refreshes once applications are running. Leave this tab open.

Terminal Two

    spark3-submit --class org.apache.spark.examples.JavaWordCount --master yarn --deploy-mode cluster spark-examples_2.12-3.1.2.jar data/sherlock.txt

Open the ResourceManager UI in another browser tab and click the application ID.
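The Terminal One and Terminal Two commands differ only in the value of --deploy-mode. In client mode the driver runs in the submitting terminal, so the word counts typically print there; in cluster mode the driver runs inside the YARN ApplicationMaster container, so the counts end up in the container logs, which `yarn logs -applicationId` can retrieve. The sketch below makes that single difference explicit; `submit_cmd` is a hypothetical helper that only prints the command and does not run it.

```shell
# Hypothetical helper: print the lab's spark3-submit command for a deploy mode.
submit_cmd() {
  mode="$1"
  case "$mode" in
    client|cluster) ;;  # the two deploy modes used in this lab
    *) echo "deploy mode must be client or cluster" >&2; return 1 ;;
  esac
  # echo joins its arguments with single spaces, producing one command line.
  echo "spark3-submit --class org.apache.spark.examples.JavaWordCount" \
       "--master yarn --deploy-mode $mode" \
       "spark-examples_2.12-3.1.2.jar data/sherlock.txt"
}

# Usage:
#   submit_cmd client    # command for Terminal One
#   submit_cmd cluster   # command for Terminal Two
```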