EMR 012 Presto Unable to execute HTTP request: Timeout waiting for connection from pool - qyjohn/AWS_Tutorials GitHub Wiki

When you run queries in Presto against a large number of files on S3, you might encounter the following exception:

2019-06-06T02:02:59.923Z	ERROR	remote-task-callback-677	com.facebook.presto.execution.StageStateMachine	Stage 20190606_020208_00041_fnadn.1 failed
com.facebook.presto.spi.PrestoException: Error opening Hive split s3://xxxx-xxxx-xxxx/part-00002-6d5a3961-e8ea-47ff-a631-0293ba44f5c5.c000.gz.parquet (offset=0, length=32514): Unable to execute HTTP request: Timeout waiting for connection from pool
	at com.facebook.presto.hive.parquet.ParquetHiveRecordCursor.createParquetRecordReader(ParquetHiveRecordCursor.java:386)
	at com.facebook.presto.hive.parquet.ParquetHiveRecordCursor.<init>(ParquetHiveRecordCursor.java:165)
... ...
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1116)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1066)
... ...
	at com.facebook.presto.hive.parquet.ParquetHiveRecordCursor.createParquetRecordReader(ParquetHiveRecordCursor.java:333)
	... 17 more
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
	at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286)
	at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263)
... ...
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
	... 48 more
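
The innermost cause in the trace, ConnectionPoolTimeoutException, is raised while a caller waits to lease a connection from a bounded pool. The following Python sketch is only a toy model of that behavior, not EMRFS or Apache HttpClient code: the class name ToyConnectionPool, its parameters, and the timeout value are all hypothetical. It shows how a pool with a fixed number of slots refuses a new lease once every slot is held and the wait timeout expires.

```python
import threading


class ToyConnectionPool:
    """Toy stand-in for a bounded HTTP connection pool (hypothetical, not EMRFS code)."""

    def __init__(self, max_connections: int, wait_timeout_s: float):
        # One semaphore slot per connection the pool is allowed to hand out.
        self._slots = threading.BoundedSemaphore(max_connections)
        self._wait_timeout_s = wait_timeout_s

    def lease(self) -> None:
        # Wait up to wait_timeout_s for a free slot, then give up.
        if not self._slots.acquire(timeout=self._wait_timeout_s):
            raise TimeoutError("Timeout waiting for connection from pool")

    def release(self) -> None:
        # Return a slot so another caller can lease it.
        self._slots.release()


# Two slots, three callers that never release: the third lease times out.
pool = ToyConnectionPool(max_connections=2, wait_timeout_s=0.05)
pool.lease()
pool.lease()
try:
    pool.lease()
    timed_out = False
except TimeoutError:
    timed_out = True

print("third lease timed out:", timed_out)
```

In this model the two ways out are the same as on a real cluster: return connections sooner, or allow more of them at once. The rest of this page takes the second route.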

This error usually means that Presto has reached the Amazon EMR File System (EMRFS) connection limit for Amazon Simple Storage Service (Amazon S3). To resolve it, increase the value of the fs.s3.maxConnections property (default: 500). To change the value on a running cluster, follow these steps:

  • SSH to the master node.

  • Open the emrfs-site.xml file with sudo. The file is located in the /usr/share/aws/emr/emrfs/conf directory.

sudo nano /usr/share/aws/emr/emrfs/conf/emrfs-site.xml
  • Set the fs.s3.maxConnections property to a value above 500. In the following example, the value is set to 1000. You might need to choose a higher value, depending on how many concurrent S3 connections your Presto queries need.
<property>
  <name>fs.s3.maxConnections</name>
  <value>1000</value>
</property>
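  • Restart presto-server using the following commands. These are Upstart commands; on EMR releases that use systemd, the equivalent should be sudo systemctl stop presto-server followed by sudo systemctl start presto-server.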
sudo stop presto-server
sudo start presto-server
  • Repeat the steps above on every core node and task node, using the same fs.s3.maxConnections value that you set on the master node.

  • Run the Presto query again. Your application (Presto) should use the new value for fs.s3.maxConnections after the service restart.
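
Editing the file by hand on every node is easy to get wrong, so the property change in the steps above can also be sketched in Python. This is a minimal sketch, assuming emrfs-site.xml follows the usual Hadoop layout (a configuration root containing property elements with name and value children); the function name set_hadoop_property is hypothetical. It works on the XML text only, so reading and writing the real file (with root privileges) and restarting presto-server are left to the caller. Note that xml.etree.ElementTree drops comments when it parses, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with the named property set, adding it if it is missing."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Sample input standing in for the contents of emrfs-site.xml.
sample = (
    "<configuration>"
    "<property><name>fs.s3.maxConnections</name><value>500</value></property>"
    "</configuration>"
)
updated = set_hadoop_property(sample, "fs.s3.maxConnections", "1000")
print(updated)
```

The function is idempotent: running it again with the same value leaves a single fs.s3.maxConnections entry, which makes it safe to repeat across nodes.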

To set the value of the fs.s3.maxConnections property on all nodes when you launch a new cluster, use a configuration object similar to the following. For more information, see Configuring Applications.

[
    {
      "Classification": "emrfs-site",
      "Properties": {
        "fs.s3.maxConnections": "1000"
      }
    }
]
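
A configuration object like this must be strict JSON (no trailing commas) to be parsed reliably, so it can help to generate it rather than type it. The following is a minimal sketch in Python; the function name emrfs_max_connections_config is hypothetical, and it only builds and serializes the object shown above.

```python
import json


def emrfs_max_connections_config(max_connections: int) -> list:
    """Build the emrfs-site configuration object for a given connection limit."""
    if max_connections <= 0:
        raise ValueError("max_connections must be a positive integer")
    return [
        {
            "Classification": "emrfs-site",
            # Property values are passed as strings, as in the example above.
            "Properties": {"fs.s3.maxConnections": str(max_connections)},
        }
    ]


config_json = json.dumps(emrfs_max_connections_config(1000), indent=2)
print(config_json)
```

Saved to a file (the name configurations.json is only an example), the output can be supplied when creating a cluster, for instance through the --configurations option of aws emr create-cluster.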