Hue 3.9 Installation and Usage
Introduction
HUE stands for Hadoop User Experience. Hue is an open-source Apache Hadoop UI system that evolved from Cloudera Desktop; Cloudera later contributed it to the Apache Foundation's Hadoop community. It is implemented on top of the Python web framework Django.
Through Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example operating on data in HDFS, running MapReduce jobs, executing Hive SQL statements, and browsing HBase tables.
Installation
Install Hue's dependencies
yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
Note: installing ant with yum pulls in OpenJDK automatically, which changes the active Java version. To fix this, re-point the /usr/bin/java symlink at your own JDK:
rm /usr/bin/java
ln -s /opt/jdk1.8.0_112/bin/java /usr/bin/java
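You can verify the symlink now resolves to the intended JDK (the path above is where this cluster's JDK is installed):
java -version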
Unpack the Hue package
cd /software
tar -zxf hue-3.9.0-cdh5.5.0.tar.gz
mv hue-3.9.0-cdh5.5.0 /opt/hue
Build Hue
cd /opt/hue
make apps
Configure Hue
vim /opt/hue/desktop/conf/hue.ini
Modify the following:
# Webserver listens on this address and port
http_host=node3
http_port=8888
# Time zone name
time_zone=Asia/Shanghai
Configure MySQL to manage Hue's metadata
On node1, create the database and user for Hue's metadata
mysql -uroot -plg1011
Add a new user and grant it both remote and local access
grant all privileges on *.* to 'hue'@'%' identified by 'lg1011';
grant all privileges on *.* to 'hue'@'node3' identified by 'lg1011';
grant all privileges on *.* to 'hue'@'localhost' identified by 'lg1011';
Flush the privileges
flush privileges;
Check the grants
select host, user from mysql.user;
Log in with the hue account and create the hue database
mysql -uhue -plg1011
create database hue;
Edit the Hue configuration file
vim /opt/hue/desktop/conf/hue.ini
Modify the following:
[[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, sqlite3 or oracle.
    #
    # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name.
    # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
    # Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=<host>:<port>/<service_name>".
    # Note for MariaDB use the 'mysql' engine.
    engine=mysql
    host=node1
    port=3306
    user=hue
    password=lg1011
    name=hue
    ## options={}
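Before initializing, it is worth checking that these credentials actually work from the node running Hue; a quick connectivity test using the values configured above:
mysql -h node1 -P 3306 -uhue -plg1011 hue -e 'select 1;'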
Initialize the database
This step creates the tables and inserts some initial data. Table creation is handled by the hue syncdb command (run from Hue's build environment, as below); you will be prompted for a username and password during creation.
Sync the database
/opt/hue/build/env/bin/hue syncdb
Username: hdfs, password: lg1011
Import data, mainly the tables required by oozie, pig, and desktop
/opt/hue/build/env/bin/hue migrate
Log in to MySQL with the hue account and check whether the Hue metadata tables have been created
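For example (a minimal check; the exact table list depends on which apps were built):
mysql -uhue -plg1011
use hue;
show tables;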
Start Hue
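The redirect below fails if the log directory does not exist yet, so create it first:
mkdir -p /home/hdfs/hue/log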
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
Check whether port 8888 is listening
netstat -npl | grep 8888
Open the Hue web UI
http://node3:8888/
Username: hdfs, password: lg1011
Configure Hadoop
Edit the Hadoop configuration file
vim /opt/hadoop-2.7.2/etc/hadoop/core-site.xml
Add the proxy-user configuration below. In hadoop.proxyuser.${user}.hosts/groups, the first user (hdfs) is the user Hadoop was installed as, i.e. the user that can access HDFS; it is the Owner shown under node1:50070 -> Utilities -> Browse the file system. The second (hue) grants the same proxy rights to Hue, and the third to httpfs:
<property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
</property>
Note: what is httpfs for?
Hue can connect to Hadoop in one of two ways:
1. WebHDFS
Provides fast data transfer; the client communicates with the DataNodes directly.
2. HttpFS
A proxy service that makes it easy for systems outside the cluster to integrate; in HA mode it is the only option.
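Both mechanisms expose the same REST API, so the difference is mainly where the request lands; a quick comparison from the command line (assuming WebHDFS on the NameNode's default HTTP port 50070 and HttpFS on its default port 14000, hosts as configured later on this page):
# WebHDFS: served by the NameNode, data reads/writes are redirected to DataNodes
curl 'http://node1:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs'
# HttpFS: a standalone proxy with the same API, a single entry point that also works with HA
curl 'http://node1:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs'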
Enable HDFS access from the Hue web UI
vim /opt/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
Add the following:
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
Configure httpfs
vim /opt/hadoop-2.7.2/etc/hadoop/httpfs-site.xml
<property>
    <name>httpfs.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.hue.groups</name>
    <value>*</value>
</property>
Distribute the modified configuration files to the other nodes
xsync /opt/hadoop-2.7.2/etc/hadoop/core-site.xml
xsync /opt/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
xsync /opt/hadoop-2.7.2/etc/hadoop/httpfs-site.xml
Restart Hadoop
cluster.sh start
Start httpfs
/opt/hadoop-2.7.2/sbin/httpfs.sh start
Check that the port is listening
netstat -anop | grep 14000
Configure Hue to integrate with Hadoop
[hadoop]

  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs

    [[[default]]]
      # Enter the filesystem uri
      # Hadoop runs in HA mode here, so HDFS can only be accessed via HttpFS
      fs_defaultfs=hdfs://ns:8020

      # NameNode logical name.
      logical_name=ns

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://node1:14000/webhdfs/v1

      # Change this if your HDFS cluster is Kerberos-secured
      ## security_enabled=false

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

      # Directory of the Hadoop configuration
      hadoop_conf_dir=/opt/hadoop-2.7.2/etc/hadoop/

  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=node1

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      logical_name=ns-yarn

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://node1:8088

      # URL of the ProxyServer API
      ## proxy_api_url=http://localhost:8088

      # URL of the HistoryServer API
      history_server_api_url=http://node3:19888

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

[desktop]
  # Webserver runs as this user
  server_user=hue
  server_group=hue

  # This should be the Hue admin and proxy user
  default_user=hue

  # This should be the hadoop cluster admin
  default_hdfs_superuser=hdfs
Restart Hue
Check whether Hue is running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
HDFS files are now visible in Hue.
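Since Hue has to be restarted after each of the configuration changes below, the stop/start sequence is worth wrapping in a small helper script; a minimal sketch assuming the paths used on this cluster:
#!/bin/bash
# restart-hue.sh: kill the running Hue (CherryPy) server, if any, then start it again
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs -r kill
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &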
Hive integration
vim /opt/hue/desktop/conf/hue.ini
[beeswax]
  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=node1

  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/opt/hive/conf/
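Before restarting Hue, you can confirm that HiveServer2 is actually reachable on that host and port; one way, assuming beeline ships with the Hive install at /opt/hive:
/opt/hive/bin/beeline -u jdbc:hive2://node1:10000 -n hdfs -e 'show databases;'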
Restart the Hue service, and the Hive databases and tables become visible.
Check whether Hue is running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
Spark integration
Introduction
Hue integrates with Spark through a Livy server, which acts much like HiveServer2: it exposes a RESTful service that accepts HTTP requests from clients and forwards them to the Spark cluster. The Livy server is not included in the Spark distribution and must be downloaded separately.
Note: to write Scala or Python programs in a Hue notebook, Hadoop's httpfs process must be running for the notebook to be usable.
Note: download a reasonably recent version, otherwise some classes will not be found.
Installation
Unpack: unzip apache-livy-0.7.0-incubating-bin.zip -d /opt/livy-server
Start the Livy server
/opt/livy-server/bin/livy-server
Exception in thread "main" java.lang.IllegalArgumentException: Livy requires the SPARK_HOME environment variable
If this error appears, configure the Livy server's runtime environment:
vim /opt/livy-server/conf/livy-env.sh
JAVA_HOME=/opt/jdk1.8.0_112
HADOOP_CONF_DIR=/opt/hadoop-2.7.2/etc/hadoop
SPARK_HOME=/opt/spark
SPARK_CONF_DIR=/opt/spark/conf
LIVY_LOG_DIR=/opt/livy-server/logs
LIVY_PID_DIR=/opt/livy-server/pids
Restart the Livy service
nohup /opt/livy-server/bin/livy-server >> /opt/livy-server/logs/start-livy-server.log 2>&1 &
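Once the server is up, Livy's REST API can be exercised directly, which is exactly what Hue does behind the scenes; a minimal smoke test (host and port match the Hue configuration below; session ids start at 0 on a fresh server):
# open an interactive Scala session
curl -X POST -H 'Content-Type: application/json' -d '{"kind": "spark"}' http://node3:8998/sessions
# once the session is idle, submit a statement to it (session id 0 assumed)
curl -X POST -H 'Content-Type: application/json' -d '{"code": "1 + 1"}' http://node3:8998/sessions/0/statements
# poll for the result
curl http://node3:8998/sessions/0/statements/0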
Configure Hue
Local or yarn mode is recommended for launching jobs; here we configure spark://node1:7077.
[spark]
  # Host address of the Livy Server.
  livy_server_host=node3

  # Port of the Livy Server.
  livy_server_port=8998

  # Configure livy to start with 'process', 'thread', or 'yarn' workers.
  livy_server_session_kind=spark://node1:7077

  # If livy should use proxy users when submitting a job.
  ## livy_impersonation_enabled=true

  # List of available types of snippets
  ## languages='[{"name": "Scala Shell", "type": "spark"},{"name": "PySpark Shell", "type": "pyspark"},{"name": "R Shell", "type": "r"},{"name": "Jar", "type": "Jar"},{"name": "Python", "type": "py"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'
Restart the Hue service
Check whether Hue is running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
Troubleshooting
1. Chinese text in the Hue window causes errors
Change the character encoding of Hue's metadata database:
alter database hue character set latin1;
alter table beeswax_queryhistory modify `query` longtext CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
ALTER TABLE desktop_document2 modify column name varchar(255) CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column description longtext CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column search longtext CHARACTER SET utf8;
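To confirm the changes took effect, inspect the column collations afterwards (shown for one of the altered tables):
show full columns from desktop_document2;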