EMR 016 Thrift Transport TTransportException Socket Read Timeout - qyjohn/AWS_Tutorials GitHub Wiki
A Spark application working with Hive threw the following exception:
```
19/06/19 08:06:42 WARN RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
... ...
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 60 more
19/06/19 08:16:45 WARN HiveClientImpl: Deadline exceeded
19/06/19 08:16:45 ERROR CreateDataSourceTableAsSelectCommand: Failed to write to table db_name.table_name
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out;
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitions(HiveExternalCatalog.scala:1190)
... ...
```
Reviewing the logs for this particular Spark application (application_1560183435838_xxxxx) showed two execution attempts, both of which failed with the same error message.
In attempt #1, we observed the following activity related to the Hive metastore:

```
19/06/19 07:36:10 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
19/06/19 07:36:10 INFO metastore: Trying to connect to metastore with URI thrift://xxxxxxxxxx:9083
19/06/19 07:36:10 INFO metastore: Connected to metastore.
... ...
19/06/19 07:46:13 WARN RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
... ...
19/06/19 07:46:14 INFO metastore: Trying to connect to metastore with URI thrift://xxxxxxxxxx:9083
19/06/19 07:46:14 INFO metastore: Connected to metastore.
19/06/19 07:56:14 WARN HiveClientImpl: HiveClient got thrift exception, destroying client and retrying (0 tries remaining)
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2134)
... ...
19/06/19 07:56:15 WARN HiveClientImpl: Deadline exceeded
19/06/19 07:56:15 ERROR CreateDataSourceTableAsSelectCommand: Failed to write to table db_name.table_name
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out;
... ...
```
In attempt #2, we observed the same pattern of activity related to the Hive metastore:

```
19/06/19 07:56:39 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
19/06/19 07:56:40 INFO metastore: Trying to connect to metastore with URI thrift://xxxxxxxxxx:9083
19/06/19 07:56:40 INFO metastore: Connected to metastore.
... ...
19/06/19 08:06:42 WARN RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
... ...
19/06/19 08:06:43 INFO metastore: Trying to connect to metastore with URI thrift://xxxxxxxxxx:9083
19/06/19 08:06:43 INFO metastore: Connected to metastore.
19/06/19 08:16:44 WARN HiveClientImpl: HiveClient got thrift exception, destroying client and retrying (0 tries remaining)
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2134)
... ...
19/06/19 08:16:45 WARN HiveClientImpl: Deadline exceeded
19/06/19 08:16:45 ERROR CreateDataSourceTableAsSelectCommand: Failed to write to table db_name.table_name
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out;
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
... ...
```
In both attempts, a read timeout occurred roughly 10 minutes after the application connected to the metastore; the connection was then re-established, and another timeout followed roughly 10 minutes later. The stack traces point at partition listing (HiveExternalCatalog.listPartitions / Hive.getPartitions) during the CreateDataSourceTableAsSelectCommand. This leads to the speculation that the metastore call took more than 10 minutes to complete, exceeding the hive.metastore.client.socket.timeout setting on the EMR cluster, which defaults to 600 seconds (10 minutes).
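As a quick sanity check, the timestamps in the log excerpts above line up with a 600-second timeout (using GNU date, as found on the EMR master node):

```shell
# Gap between "Connected to metastore." (07:36:10) and the first read
# timeout (07:46:13) in attempt #1 -- timestamps taken from the log above.
connected=$(date -d "07:36:10" +%s)
timed_out=$(date -d "07:46:13" +%s)
echo $(( timed_out - connected ))   # 603 seconds: the 600 s timeout plus a few seconds of slack
```

Attempt #2 shows the same gap (07:56:40 to 08:06:42), which is what you would expect if the client is hitting a fixed socket timeout rather than a transient network problem.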
Solution: add the following property to your hive-site.xml. The example below uses 1800 seconds, but you may want a larger value, depending on your actual use case.
```xml
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>
```
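If you are launching a new cluster, the same property can be set at creation time through the EMR hive-site configuration classification instead of editing hive-site.xml by hand. A sketch using the AWS CLI (the cluster name, release label, and instance options are placeholders for illustration):

```shell
# Set hive.metastore.client.socket.timeout at cluster launch via the
# hive-site classification, so every node gets the setting from the start.
aws emr create-cluster \
  --name "my-cluster" \
  --release-label emr-5.24.1 \
  --applications Name=Hive Name=Spark \
  --use-default-roles \
  --instance-type m5.xlarge --instance-count 3 \
  --configurations '[{"Classification":"hive-site","Properties":{"hive.metastore.client.socket.timeout":"1800s"}}]'
```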
You need to restart HiveServer2 for the new configuration to take effect (Spark applications read hive-site.xml when they launch, so newly submitted applications pick up the change on their own):

```
sudo stop hive-server2
sudo start hive-server2
```
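Alternatively, if editing hive-site.xml on the cluster is not an option, the timeout can be raised for a single Spark application through Spark's spark.hadoop.* prefix, which forwards the property to the Hadoop/Hive configuration used by the metastore client in the driver. A sketch (my_app.py is a hypothetical application name; verify the behavior on your EMR release):

```shell
# Raise the metastore client socket timeout for this application only,
# without touching the cluster-wide hive-site.xml.
spark-submit \
  --conf spark.hadoop.hive.metastore.client.socket.timeout=1800s \
  my_app.py
```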