Hbase - 9dian/Index GitHub Wiki
[root@**2 ~]# hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot TestTable-snapshot1 -copy-to hdfs://**4:8020/tmp/bak/
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
21/02/05 14:22:40 INFO snapshot.ExportSnapshot: Copy Snapshot Manifest
21/02/05 14:22:40 INFO hdfs.DFSClient: Created token for hdfs: HDFS_DELEGATION_TOKEN owner=hdfs/**[email protected], renewer=yarn, realUser=, issueDate=1612506160711, maxDate=1613110960711, sequenceNumber=285, masterKeyId=112 on 10.18.60.113:8020
21/02/05 14:22:40 INFO security.TokenCache: Got dt for hdfs://**4:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.18.60.113:8020, Ident: (token for hdfs: HDFS_DELEGATION_TOKEN owner=hdfs/**[email protected], renewer=yarn, realUser=, issueDate=1612506160711, maxDate=1613110960711, sequenceNumber=285, masterKeyId=112)
21/02/05 14:22:40 INFO client.RMProxy: Connecting to ResourceManager at **4/10.18.60.113:8032
21/02/05 14:22:41 INFO snapshot.ExportSnapshot: Loading Snapshot 'TestTable-snapshot1' hfile list
21/02/05 14:22:41 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
21/02/05 14:22:41 INFO mapreduce.JobSubmitter: number of splits:1
21/02/05 14:22:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1612494344882_0001
21/02/05 14:22:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 10.18.60.113:8020, Ident: (token for hdfs: HDFS_DELEGATION_TOKEN owner=hdfs/**[email protected], renewer=yarn, realUser=, issueDate=1612506160711, maxDate=1613110960711, sequenceNumber=285, masterKeyId=112)
21/02/05 14:22:42 INFO impl.YarnClientImpl: Submitted application application_1612494344882_0001
21/02/05 14:22:42 INFO mapreduce.Job: The url to track the job: http://**4:8088/proxy/application_1612494344882_0001/
21/02/05 14:22:42 INFO mapreduce.Job: Running job: job_1612494344882_0001
21/02/05 14:22:45 INFO mapreduce.Job: Job job_1612494344882_0001 running in uber mode : false
21/02/05 14:22:45 INFO mapreduce.Job: map 0% reduce 0%
21/02/05 14:22:46 INFO mapreduce.Job: Job job_1612494344882_0001 failed with state FAILED due to: Application application_1612494344882_0001 failed 2 times due to AM Container for appattempt_1612494344882_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://**4:8088/proxy/application_1612494344882_0001/Then, click on links to logs of each attempt.
Diagnostics: Application application_1612494344882_0001 initialization failed (exitCode=255) with output:
main : command provided 0
main : run as user is hdfs
main : requested yarn user is hdfs
Requested user hdfs is banned
Failing this attempt. Failing the application.
21/02/05 14:22:46 INFO mapreduce.Job: Counters: 0
21/02/05 14:22:46 ERROR snapshot.ExportSnapshot: Snapshot export failed
org.apache.hadoop.hbase.snapshot.ExportSnapshotException: Copy Files Map-Reduce Job failed
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot.runCopyJob(ExportSnapshot.java:825)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:1020)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:1094)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:1098)
The root cause is 'Requested user hdfs is banned': in the YARN configuration, banned.users includes hdfs, yarn, mapred, and bin.
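One workaround is to run the export as a user that is not banned (e.g. the hbase user). Alternatively, the ban list can be changed for the LinuxContainerExecutor. As a sketch only, the list lives in container-executor.cfg; the path and the values below are illustrative and vary by distribution (in CDH this file is managed by Cloudera Manager, so edit it through the management UI instead):

```properties
# /etc/hadoop/conf/container-executor.cfg (location varies by distribution)
# hdfs is banned by default on purpose; remove it from this comma-separated
# list only if you understand the security implications.
banned.users=yarn,mapred,bin
allowed.system.users=nobody
min.user.id=500
```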
[root@**2 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/
Found 6 items
drwxrwxrwx   - hdfs   supergroup          0 2021-02-05 14:26 hdfs://**4:8020/tmp/.cloudera_health_monitoring_canary_files
drwxrwxr-x   - hbase  supergroup          0 2021-02-04 14:45 hdfs://**4:8020/tmp/bak
drwxr-xr-x   - yarn   supergroup          0 2020-07-09 22:47 hdfs://**4:8020/tmp/hadoop-yarn
drwx--x--x   - hbase  supergroup          0 2020-11-11 16:50 hdfs://**4:8020/tmp/hbase-staging
drwx-wx-wx   - hive   supergroup          0 2020-10-30 15:15 hdfs://**4:8020/tmp/hive
drwxrwxrwt   - mapred hadoop              0 2020-09-16 17:14 hdfs://**4:8020/tmp/logs
[root@**2 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/bak/
Found 1 items
drwxr-xr-x   - hbase supergroup          0 2021-02-04 14:45 hdfs://**4:8020/tmp/bak/.hbase-snapshot
[root@**2 ~]# hdfs dfs -chmod -R g+w hdfs://**4:8020/tmp/bak/
[root@**2 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/bak/
Found 1 items
drwxrwxr-x   - hbase supergroup          0 2021-02-04 14:45 hdfs://**4:8020/tmp/bak/.hbase-snapshot
[root@**2 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/bak/.hbase-snapshot
Found 1 items
drwxrwxr-x   - hbase supergroup          0 2021-02-05 14:22 hdfs://**4:8020/tmp/bak/.hbase-snapshot/.tmp
[root@**1 ~]# pssh -h /cdhdata/bak/list_krb_clients -P -l root usermod -a -G supergroup hive
[1] 15:06:31 [SUCCESS] **1
[2] 15:06:31 [SUCCESS] **3
[3] 15:06:31 [SUCCESS] **2
[4] 15:06:31 [SUCCESS] **4
[root@**1 ~]# pssh -h /cdhdata/bak/list_krb_clients -P -l root usermod -a -G supergroup hbase
[1] 15:06:38 [SUCCESS] **1
[2] 15:06:38 [SUCCESS] **3
[3] 15:06:38 [SUCCESS] **2
[4] 15:06:38 [SUCCESS] **4
[root@**1 ~]# pssh -h /cdhdata/bak/list_krb_clients -P -l root tail -2 /etc/group
**2: yy:x:1006:
supergroup:x:1101:hive,hbase
[1] 15:06:49 [SUCCESS] **2
**1: yy:x:1007:
supergroup:x:1101:hbase,hive
[2] 15:06:49 [SUCCESS] **1
**4: yy:x:1009:
supergroup:x:1103:hive,hbase
[3] 15:06:49 [SUCCESS] **4
**3: yy:x:1009:
supergroup:x:1103:hive,hbase
[4] 15:06:49 [SUCCESS] **3
[root@**1 ~]# su - hive
[hive@**1 ~]$ hdfs dfs -rm -f -skipTrash hdfs://**4:8020/tmp/bak/info.txt
Deleted hdfs://**4:8020/tmp/bak/info.txt
[hive@**1 ~]$ hdfs dfs -copyFromLocal /cdhdata/bak/list_krb_clients hdfs://**4:8020/tmp/bak/
[hive@**1 ~]$ hdfs dfs -ls hdfs://**4:8020/tmp/bak
Found 1 items
-rw-r--r--   3 hive supergroup         32 2021-02-05 15:11 hdfs://**4:8020/tmp/bak/list_krb_clients
[root@**1 ~]# sudo -u hbase hdfs dfs -cp hdfs://**4:8020/tmp/bak/.hbase-snapshot/TestTable-snapshot1 hdfs://**4:8020/hbase/.hbase-snapshot/
21/02/05 16:27:02 WARN security.UserGroupInformation: PriviledgedActionException as:hbase (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
21/02/05 16:27:02 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
21/02/05 16:27:02 WARN security.UserGroupInformation: PriviledgedActionException as:hbase (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
cp: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "**1/10.18.60.114"; destination host is: "**4":8020;
[root@**1 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hbase/**[email protected]

Valid starting       Expires              Service principal
02/05/2021 14:36:36  02/06/2021 14:36:36  krbtgt/[email protected]
	renew until 02/09/2021 18:16:03
[root@**1 ~]# hdfs dfs -cp hdfs://**4:8020/tmp/bak/.hbase-snapshot/TestTable-snapshot1 hdfs://**4:8020/hbase/.hbase-snapshot/
[root@**1 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/bak/
[root@**1 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/bak/archive/
[root@**1 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/bak/archive/data/
[root@**1 ~]# hdfs dfs -ls hdfs://**4:8020/tmp/bak/archive/data/default/
[root@**1 ~]# hbase hbck -details
[root@**1 ~]# hdfs dfs -cp -f hdfs://**4:8020/tmp/bak/archive/data/default/TestTable hdfs://**4:8020/hbase/archive/data/default/
While connecting to HBase through the Java API, an RpcRetryingCaller Call exception appeared; the log shows the connection failing and being retried repeatedly. A sample exception log follows.
This looks like an error while contacting the ZooKeeper servers ... to be analyzed.

[INFO ] 2021-07-20 20:24:41.154 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t1] RpcRetryingCaller - Call exception, tries=10, retries=35, started=38390 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
[INFO ] 2021-07-20 20:24:51.234 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t1] RpcRetryingCaller - Call exception, tries=11, retries=35, started=48472 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
[INFO ] 2021-07-20 20:25:29.671 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t2] RpcRetryingCaller - Call exception, tries=10, retries=35, started=38330 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
[INFO ] 2021-07-20 20:25:39.687 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t2] RpcRetryingCaller - Call exception, tries=11, retries=35, started=48346 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
[INFO ] 2021-07-20 20:26:18.125 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t3] RpcRetryingCaller - Call exception, tries=10, retries=35, started=38231 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
[INFO ] 2021-07-20 20:26:28.178 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t3] RpcRetryingCaller - Call exception, tries=11, retries=35, started=48285 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
[INFO ] 2021-07-20 20:27:06.826 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t4] RpcRetryingCaller - Call exception, tries=10, retries=35, started=38342 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
[INFO ] 2021-07-20 20:27:16.858 [hconnection-0x71cf1b07-metaLookup-shared--pool2-t4] RpcRetryingCaller - Call exception, tries=11, retries=35, started=48374 ms ago, cancelled=false, msg=**3 row 'speechdialog,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=**3,60020,1626745338656, seqNum=0
Solution: add the hostname of every HBase cluster node to the client's hosts file, e.g.:
**.**.**.114 ***
**.**.**.1*5 ***
**.**.**.1*2 ***
**.**.**.1*3 ***
For HBase backed by HDFS storage, there are two main read-path tuning options:
- an option that bypasses the RPC stack, known as short-circuit reads;
- an option that lets the HDFS client speculatively read the same data from multiple DataNodes, known as hedged reads.
Typically an HBase RegionServer is co-located with an HDFS DataNode, which gives very good data locality. In early Hadoop versions (through 1.0.0), however, the RegionServer went through the full RPC path when communicating with the DataNode, just like any other regular client. After Hadoop 1.0.0, a short-circuit read option was added: it bypasses the RPC stack entirely and lets local clients read data directly from the underlying filesystem.
Hadoop 2.x further improved this implementation. The DataNode and HDFS clients (HBase being one of them) can now use a mechanism called file descriptor passing, so the data exchange happens entirely at the OS kernel level. This is faster and more efficient than the earlier implementation, and lets multiple processes on the same host interact efficiently.
In Hadoop, short-circuit reads can be enabled by following the official documentation:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
Below is a reference configuration. It must be added to both hbase-site.xml and hdfs-site.xml, and the relevant processes must be restarted afterwards:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
  <description>
    This configuration parameter turns on short-circuit local reads.
  </description>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
  <description>
    Optional. This is a path to a UNIX domain socket that will be used for
    communication between the DataNode and local HDFS clients.
    If the string "_PORT" is present in this path, it will be replaced by the
    TCP port of the DataNode.
  </description>
</property>
Note that the file referenced by dfs.domain.socket.path (it need not exist beforehand) must be owned by the OS root user or by the user running the DataNode service.
Finally, the default size of the short-circuit read buffers is set by dfs.client.read.shortcircuit.buffer.size, and that default may be too high for a very busy HBase cluster. HBase therefore lowers it from the 1 MB default to 128 KB when the value is not explicitly set (via the hbase.dfs.client.read.shortcircuit.buffer.size property, which defaults to 128 KB).
The HDFS client inside HBase allocates one direct byte buffer, of the size given by hbase.dfs.client.read.shortcircuit.buffer.size, for every open data block. Because this feature lets HBase keep its HDFS files open permanently, that memory can add up quickly.
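If the default needs adjusting, the buffer size can be set explicitly in hbase-site.xml. The 64 KB value below is only an illustration; size it against the RegionServer's direct-memory budget and the expected number of open blocks:

```xml
<property>
  <name>hbase.dfs.client.read.shortcircuit.buffer.size</name>
  <!-- Per-open-block direct buffer, in bytes; 65536 = 64 KB (example value). -->
  <value>65536</value>
</property>
```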
Hedged reads are an HDFS feature introduced in Hadoop 2.4.0. Normally, each read request is handled by a single spawned thread. With hedged reads enabled, the client waits for a preconfigured amount of time, and if the read has not returned, spawns a second read request against another replica of the same block. Whichever read returns first is used, and the other request is discarded.
The use case for hedged reads is smoothing out rare slow reads (possibly caused by transient problems such as a disk error or network jitter).
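To make the mechanism concrete, here is a toy Python sketch of the hedged-read idea. This is not HDFS code: the replica names, the fake read function, and the 100 ms threshold are all made up for illustration.

```python
import concurrent.futures
import time

def hedged_read(read_fn, replicas, threshold_ms):
    """Start a read against the first replica; if it has not finished
    within threshold_ms, start a second read against another replica
    and return whichever result completes first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(read_fn, replicas[0])]
        done, _ = concurrent.futures.wait(
            futures, timeout=threshold_ms / 1000.0,
            return_when=concurrent.futures.FIRST_COMPLETED)
        if not done:  # primary is slow: hedge with a second replica
            futures.append(pool.submit(read_fn, replicas[1]))
            done, _ = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

def fake_read(replica):
    # Simulate a slow datanode and a fast one.
    time.sleep(0.5 if replica == "dn-slow" else 0.05)
    return f"data-from-{replica}"

print(hedged_read(fake_read, ["dn-slow", "dn-fast"], threshold_ms=100))
# prints: data-from-dn-fast
```

Note that, as in HDFS, the hedge costs an extra read even when the primary would eventually have succeeded; only the first result is kept.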
Since the HBase RegionServer is an HDFS client, hedged reads can be enabled for HBase by adding the following parameters to the RegionServer's hbase-site.xml and tuning them for the actual environment:
- dfs.client.hedged.read.threadpool.size (default 0): the number of threads dedicated to serving hedged reads. With the default of 0, hedged reads are disabled.
- dfs.client.hedged.read.threshold.millis (default 500, i.e. 0.5 s): how long to wait before spawning the second read.

Below is an example configuration that sets the wait threshold to 10 ms and the thread pool size to 20:
<property>
  <name>dfs.client.hedged.read.threadpool.size</name>
  <value>20</value>
</property>
<property>
  <name>dfs.client.hedged.read.threshold.millis</name>
  <value>10</value>
</property>
Note that hedged reads in HDFS resemble speculative execution in MapReduce: they consume extra resources. Depending on cluster load and settings, the feature may trigger many additional read operations, most of them against remote block replicas. The resulting extra I/O and network traffic can noticeably affect cluster performance, so test against production-like load before deciding whether to enable it.