Enable Spark UI for AWS EMR - isgaur/AWS-BigData-Solutions GitHub Wiki

The web interfaces in EMR are hosted on various ports on the master node and are enabled by default. To access these web UIs from your local machine, you can establish an SSH tunnel with dynamic port forwarding. However, if the EMR cluster is hosted in a VPC and its master node does not have a public IP, establishing a direct tunnel to it is not possible.

A workaround is to create a bastion host in a public subnet of your VPC and establish the tunnel to that host instead. The bastion host can then relay the connection to the EMR cluster's master node.

====== Create and Configure Bastion Host ==============================================================

1) Launch a bastion host (an EC2 instance in a public subnet) and make sure that the bastion and the EMR cluster are in the same VPC.

2) Check that the bastion can reach the EMR master node. To do this, SSH into the bastion from your local machine, and then, from the bastion, try to SSH into the master node of the cluster (using the master node's internal IP).

3) Next, we establish a tunnel. To do this, please execute the following on your local machine:
        ssh -i key_name.pem -ND 8157 ec2-user@<Bastion Host Public IP>

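The above command opens a dynamic (SOCKS) port forward (-D) on local port 8157 that routes traffic through your bastion. The -N option tells ssh not to run a remote command, so the session only carries the tunnel. Leave it running while you use the web UIs.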
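As a minimal sketch, the tunnel command can be parameterized so that the port stays consistent with the proxy settings configured later. The variable names and the address 203.0.113.10 are placeholders chosen for illustration, not values from this setup, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Illustrative values only: substitute your own key file and bastion address.
KEY_FILE="key_name.pem"
BASTION_IP="203.0.113.10"   # placeholder address, not a real bastion
SOCKS_PORT=8157             # must match the port configured in the browser proxy

# Build the tunnel command: -N runs no remote command, -D opens a local SOCKS port.
tunnel_cmd() {
  printf 'ssh -i %s -N -D %s ec2-user@%s\n' "$KEY_FILE" "$SOCKS_PORT" "$BASTION_IP"
}

# Print the command for review before running it in its own terminal.
tunnel_cmd
```

Keeping the port in a single variable makes it harder for the tunnel and the FoxyProxy configuration to drift apart.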

====== Enable Proxies ==============================================================================

Note: If you already have proxies enabled (FoxyProxy), skip the setup steps in this section and continue with the URL format further down.

1) In your web browser (I used Firefox and it worked like a charm!), install the FoxyProxy extension [1], and make sure the "Use Enabled Proxies By Patterns and Priority" option is selected.
 
2) I am attaching my FoxyProxy config, which you can import as follows:

    FoxyProxy options > Import > Under "Import Settings from FoxyProxy 6.x (current version)", click Browse > Select the attached file (foxyproxy.json) > Accept any overwrite warnings.
    After the import completes, make sure emr-socks-proxy is enabled as well.
            
    Note: If you used a port other than 8157 when establishing the tunnel, update the port number in your FoxyProxy settings (FoxyProxy options > Edit emr-socks-proxy > Update Port > Save).

You should now be able to open the web UIs in Firefox using the following URL format:

    http://<EMR DNS>:<PORT>/

The EMR DNS will look something like ip-10-xx-xx-xx.ec2.internal.
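The tunnel can also be checked from the command line without a browser. The following is a rough sketch, assuming curl is installed and the tunnel from the previous section is running on port 8157. The host name ip-10-0-0-1.ec2.internal and the function names are hypothetical. The `--socks5-hostname` option hands the host name to the SOCKS proxy instead of resolving it locally, which matters because internal EMR DNS names generally do not resolve outside the VPC.

```shell
#!/bin/sh
SOCKS_PORT=8157                      # port passed to ssh -D
EMR_DNS="ip-10-0-0-1.ec2.internal"   # hypothetical master node DNS name

# Build a web UI URL for the given port.
ui_url() {
  printf 'http://%s:%s/\n' "$EMR_DNS" "$1"
}

# Request the URL through the SOCKS tunnel and print only the HTTP status code.
check_ui() {
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
       --max-time 10 --socks5-hostname "localhost:${SOCKS_PORT}" "$(ui_url "$1")"
}

# Show the URL that would be requested for port 8088.
ui_url 8088
# With the tunnel running, call: check_ui 8088
```

If `check_ui` prints a status code such as 200 or 302, the tunnel and name resolution are working, and any remaining problem is likely in the browser proxy settings.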

Assuming you have an active SparkContext (e.g. you ran 'pyspark'), you can access its UI on port 4040. Note that the first active Spark application uses port 4040, and each additional concurrent application takes the next port (4041, 4042, and so on).

The Spark History Server can be accessed on port 18080, and the YARN ResourceManager on port 8088. For a complete list of available ports, please see the AWS documentation [2].
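The ports mentioned above can be collected into a small helper. This is only a sketch: the service labels, the function names, and the host name ip-10-0-0-1.ec2.internal are made up for illustration, and the port numbers are the ones named in the text.

```shell
#!/bin/sh
EMR_DNS="ip-10-0-0-1.ec2.internal"   # hypothetical master node DNS name

# Map an illustrative service label to the port named in the text.
ui_port() {
  case "$1" in
    spark)           echo 4040 ;;
    spark-history)   echo 18080 ;;
    resourcemanager) echo 8088 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Print candidate URLs for the first N Spark application UIs (4040, 4041, ...).
# The body runs in a subshell so the loop variables stay local.
spark_ui_urls() (
  count="$1"
  base="$(ui_port spark)"
  i=0
  while [ "$i" -lt "$count" ]; do
    printf 'http://%s:%s/\n' "$EMR_DNS" "$((base + i))"
    i=$((i + 1))
  done
)

printf 'http://%s:%s/\n' "$EMR_DNS" "$(ui_port spark-history)"
spark_ui_urls 3
```

Listing a few candidate URLs is handy when several Spark sessions are open at once and it is not obvious which port a given session received.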

References:

[1] FoxyProxy Extension for Firefox: https://addons.mozilla.org/en-US/firefox/addon/foxyproxy-standard/

[2] EMR Web Interfaces: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html
