Hue 3.9 Installation and Usage
Introduction
HUE stands for Hadoop User Experience. Hue is an open-source Apache Hadoop UI system that evolved from Cloudera Desktop; Cloudera later contributed it to the Apache Foundation's Hadoop community. It is implemented on top of the Python web framework Django.
Through Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example operating on data in HDFS, running MapReduce jobs, executing Hive SQL statements, and browsing HBase tables.
Installation
Install Hue's dependencies
yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
Note: installing ant with yum pulls in OpenJDK automatically, which changes the active Java version. To fix this, re-point the /usr/bin/java symlink at your own JDK:
rm /usr/bin/java
ln -s /opt/jdk1.8.0_112/bin/java /usr/bin/java
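You can verify the symlink now resolves to the intended JDK (the path above is where this cluster's JDK is installed):
java -version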
Unpack the Hue package
cd /software
tar -zxf hue-3.9.0-cdh5.5.0.tar.gz
mv hue-3.9.0-cdh5.5.0 /opt/hue
Build Hue
cd /opt/hue
make apps
Configure Hue
vim /opt/hue/desktop/conf/hue.ini
Modify the following:
# Webserver listens on this address and port
http_host=node3
http_port=8888
# Time zone name
time_zone=Asia/Shanghai
Configure MySQL to manage Hue's metadata
On node1, create the database and user for Hue's metadata
mysql -uroot -plg1011
Add a new user and grant it both remote and local access
grant all privileges on *.* to 'hue'@'%' identified by 'lg1011';
grant all privileges on *.* to 'hue'@'node3' identified by 'lg1011';
grant all privileges on *.* to 'hue'@'localhost' identified by 'lg1011';
Flush the privileges
flush privileges;
Check the grants
select host, user from mysql.user;
Log in with the hue account and create the hue database
mysql -uhue -plg1011
create database hue;
Edit the Hue configuration file
vim /opt/hue/desktop/conf/hue.ini
Modify the following:
[[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, sqlite3 or oracle.
    #
    # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name.
    # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
    # Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=<host>:<port>/<service_name>".
    # Note for MariaDB use the 'mysql' engine.
    engine=mysql
    host=node1
    port=3306
    user=hue
    password=lg1011
    name=hue
    ## options={}
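Before initializing, it is worth checking that these credentials actually work from the node running Hue; a quick connectivity test using the values configured above:
mysql -h node1 -P 3306 -uhue -plg1011 hue -e 'select 1;'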
Initialize the database
This step creates the tables and inserts some initial data. Table creation is handled by the hue syncdb command (run from Hue's build environment, as below); you will be prompted for a username and password during creation.
Sync the database
/opt/hue/build/env/bin/hue syncdb
Username: hdfs, password: lg1011
Import data, mainly the tables required by oozie, pig, and desktop
/opt/hue/build/env/bin/hue migrate
Log in to MySQL with the hue account and check whether the Hue metadata tables have been created
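For example (a minimal check; the exact table list depends on which apps were built):
mysql -uhue -plg1011
use hue;
show tables;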
Start Hue
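The redirect below fails if the log directory does not exist yet, so create it first:
mkdir -p /home/hdfs/hue/log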
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
Check whether port 8888 is listening
netstat -npl | grep 8888
Open the Hue web UI
http://node3:8888/
Username: hdfs, password: lg1011
Configure Hadoop
Edit the Hadoop configuration file
vim /opt/hadoop-2.7.2/etc/hadoop/core-site.xml
Add the proxy-user configuration below. In hadoop.proxyuser.${user}.hosts/groups, the first user (hdfs) is the user Hadoop was installed as, i.e. the user that can access HDFS; it is the Owner shown under node1:50070 -> Utilities -> Browse the file system. The second (hue) grants the same proxy rights to Hue, and the third to httpfs:
<property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
</property>
Note: what is httpfs for?
Hue can connect to Hadoop in one of two ways:
1. WebHDFS
Provides fast data transfer; the client communicates with the DataNodes directly.
2. HttpFS
A proxy service that makes it easy for systems outside the cluster to integrate; in HA mode it is the only option.
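Both mechanisms expose the same REST API, so the difference is mainly where the request lands; a quick comparison from the command line (assuming WebHDFS on the NameNode's default HTTP port 50070 and HttpFS on its default port 14000, hosts as configured later on this page):
# WebHDFS: served by the NameNode, data reads/writes are redirected to DataNodes
curl 'http://node1:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs'
# HttpFS: a standalone proxy with the same API, a single entry point that also works with HA
curl 'http://node1:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs'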
Enable HDFS access from the Hue web UI
vim /opt/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
Add the following:
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
Configure httpfs
vim /opt/hadoop-2.7.2/etc/hadoop/httpfs-site.xml
<property>
    <name>httpfs.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.hue.groups</name>
    <value>*</value>
</property>
Distribute the modified configuration files to the other nodes
xsync /opt/hadoop-2.7.2/etc/hadoop/core-site.xml
xsync /opt/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
xsync /opt/hadoop-2.7.2/etc/hadoop/httpfs-site.xml
Restart Hadoop
cluster.sh start
Start httpfs
/opt/hadoop-2.7.2/sbin/httpfs.sh start
Check that the port is listening
netstat -anop | grep 14000
Configure Hue to integrate with Hadoop
[hadoop]

  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs

    [[[default]]]
      # Enter the filesystem uri
      # Hadoop runs in HA mode here, so HDFS can only be accessed via HttpFS
      fs_defaultfs=hdfs://ns:8020

      # NameNode logical name.
      logical_name=ns

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://node1:14000/webhdfs/v1

      # Change this if your HDFS cluster is Kerberos-secured
      ## security_enabled=false

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

      # Directory of the Hadoop configuration
      hadoop_conf_dir=/opt/hadoop-2.7.2/etc/hadoop/

  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=node1

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      logical_name=ns-yarn

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://node1:8088

      # URL of the ProxyServer API
      ## proxy_api_url=http://localhost:8088

      # URL of the HistoryServer API
      history_server_api_url=http://node3:19888

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

[desktop]
  # Webserver runs as this user
  server_user=hue
  server_group=hue

  # This should be the Hue admin and proxy user
  default_user=hue

  # This should be the hadoop cluster admin
  default_hdfs_superuser=hdfs
Restart Hue
Check whether Hue is running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
HDFS files are now visible in Hue.
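Since Hue has to be restarted after each of the configuration changes below, the stop/start sequence is worth wrapping in a small helper script; a minimal sketch assuming the paths used on this cluster:
#!/bin/bash
# restart-hue.sh: kill the running Hue (CherryPy) server, if any, then start it again
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs -r kill
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &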
Hive integration
vim /opt/hue/desktop/conf/hue.ini
[beeswax]
  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=node1

  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/opt/hive/conf/
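Before restarting Hue, you can confirm that HiveServer2 is actually reachable on that host and port; one way, assuming beeline ships with the Hive install at /opt/hive:
/opt/hive/bin/beeline -u jdbc:hive2://node1:10000 -n hdfs -e 'show databases;'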
Restart the Hue service, and the Hive databases and tables become visible.
Check whether Hue is running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
Spark integration
Introduction
Hue integrates with Spark through a Livy server, which acts much like HiveServer2: it exposes a RESTful service that accepts HTTP requests from clients and forwards them to the Spark cluster. The Livy server is not included in the Spark distribution and must be downloaded separately.
Note: to write Scala or Python programs in a Hue notebook, Hadoop's httpfs process must be running for the notebook to be usable.
Note: download a reasonably recent version, otherwise some classes will not be found.
Installation
Unpack: unzip apache-livy-0.7.0-incubating-bin.zip -d /opt/livy-server
Start the Livy server
/opt/livy-server/bin/livy-server
Exception in thread "main" java.lang.IllegalArgumentException: Livy requires the SPARK_HOME environment variable
If this error appears, configure the Livy server's runtime environment:
vim /opt/livy-server/conf/livy-env.sh
JAVA_HOME=/opt/jdk1.8.0_112
HADOOP_CONF_DIR=/opt/hadoop-2.7.2/etc/hadoop
SPARK_HOME=/opt/spark
SPARK_CONF_DIR=/opt/spark/conf
LIVY_LOG_DIR=/opt/livy-server/logs
LIVY_PID_DIR=/opt/livy-server/pids
Restart the Livy service
nohup /opt/livy-server/bin/livy-server >> /opt/livy-server/logs/start-livy-server.log 2>&1 &
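Once the server is up, Livy's REST API can be exercised directly, which is exactly what Hue does behind the scenes; a minimal smoke test (host and port match the Hue configuration below; session ids start at 0 on a fresh server):
# open an interactive Scala session
curl -X POST -H 'Content-Type: application/json' -d '{"kind": "spark"}' http://node3:8998/sessions
# once the session is idle, submit a statement to it (session id 0 assumed)
curl -X POST -H 'Content-Type: application/json' -d '{"code": "1 + 1"}' http://node3:8998/sessions/0/statements
# poll for the result
curl http://node3:8998/sessions/0/statements/0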
Configure Hue
Local or yarn mode is recommended for launching jobs; here we configure spark://node1:7077.
[spark]
  # Host address of the Livy Server.
  livy_server_host=node3

  # Port of the Livy Server.
  livy_server_port=8998

  # Configure livy to start with 'process', 'thread', or 'yarn' workers.
  livy_server_session_kind=spark://node1:7077

  # If livy should use proxy users when submitting a job.
  ## livy_impersonation_enabled=true

  # List of available types of snippets
  ## languages='[{"name": "Scala Shell", "type": "spark"},{"name": "PySpark Shell", "type": "pyspark"},{"name": "R Shell", "type": "r"},{"name": "Jar", "type": "Jar"},{"name": "Python", "type": "py"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'
Restart the Hue service
Check whether Hue is running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
Troubleshooting
1. Chinese text in the Hue window causes errors
Change the character encoding of Hue's metadata database:
alter database hue character set latin1;
alter table beeswax_queryhistory modify `query` longtext CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
ALTER TABLE desktop_document2 modify column name varchar(255) CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column description longtext CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column search longtext CHARACTER SET utf8;
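To confirm the changes took effect, inspect the column collations afterwards (shown for one of the altered tables):
show full columns from desktop_document2;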