Hue 3.9 Installation and Usage - lg1011/SparkLearn GitHub Wiki

Introduction
HUE = Hadoop User Experience. Hue is an open-source Apache Hadoop UI that evolved from Cloudera Desktop; Cloudera later contributed it to the Apache Hadoop community. It is built on the Python web framework Django.
Through Hue's browser-based web console we can interact with a Hadoop cluster to analyze and process data: manipulate data on HDFS, run MapReduce jobs, execute Hive SQL statements, browse the HBase database, and so on.

Installation
Install Hue's dependencies
yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel

Note: installing ant via yum automatically pulls in OpenJDK, which changes the default Java version. You can fix this by re-pointing the /usr/bin/java symlink at your own JDK:
rm /usr/bin/java
ln -s /opt/jdk1.8.0_112/bin/java /usr/bin/java

Extract the Hue package
cd /software
tar -zxf hue-3.9.0-cdh5.5.0.tar.gz
mv hue-3.9.0-cdh5.5.0 /opt/hue

Build Hue
cd /opt/hue
make apps

Configure Hue
vim /opt/hue/desktop/conf/hue.ini
Modify the following:

  # Webserver listens on this address and port
  http_host=node3
  http_port=8888
  # Time zone name
  time_zone=Asia/Shanghai

Configure MySQL as the Hue metadata store
On node1, create the database and user for Hue's metadata
mysql -uroot -plg1011
Add the new user and grant both remote and local access
grant all privileges on *.* to 'hue'@'%' identified by 'lg1011';
grant all privileges on *.* to 'hue'@'node3' identified by 'lg1011';
grant all privileges on *.* to 'hue'@'localhost' identified by 'lg1011';

Flush the privileges
flush privileges;

Check the grants
select host,user from mysql.user;

Log in as the hue user and create the hue database
mysql -uhue -plg1011
create database hue;

Edit the Hue configuration file
vim /opt/hue/desktop/conf/hue.ini
Modify the following:
  [[database]]
  # Database engine is typically one of:
  # postgresql_psycopg2, mysql, sqlite3 or oracle.
  #
  # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name.
  # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
  # Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=<host>:<port>/<service_name>".
  # Note for MariaDB use the 'mysql' engine.
  engine=mysql
  host=node1
  port=3306
  user=hue
  password=lg1011
  name=hue
  ## options={}

Initialize the database
This step creates the tables and inserts some initial data. Initialization is performed by the `hue syncdb` command; during creation you will be prompted for a username and password.
Sync the database
/opt/hue/build/env/bin/hue syncdb
Username: hdfs, password: lg1011
Import data (mainly the tables needed by oozie, pig, and desktop)
/opt/hue/build/env/bin/hue migrate

Log in to MySQL as the hue user and verify that the Hue metadata tables have been created

Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &

Check that port 8888 is listening
netstat -npl | grep 8888

Access the Hue web UI
http://node3:8888/
Username: hdfs, password: lg1011

Configure Hadoop
Edit the Hadoop configuration file
vim /opt/hadoop-2.7.2/etc/hadoop/core-site.xml
Add the configuration below. The hadoop.proxyuser.${user}.hosts/groups properties define Hadoop proxy users. The first user (hdfs) is the user that installed Hadoop, i.e. the user allowed to access HDFS; it matches the Owner shown at node1:50070 -> Utilities -> Browse the file system. The hue entries grant the same proxy rights to the hue user, and the httpfs entries to the httpfs user:

<property>
  <name>hadoop.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>

Note: what is httpfs for?
Hue can connect to Hadoop in two ways:
1. WebHDFS
Provides high-speed data transfer; the client talks to the DataNodes directly.
2. HttpFS
A proxy service, convenient for integrating systems outside the cluster. When HDFS runs in HA mode, this is the only option.
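Both mechanisms expose the same REST interface under the /webhdfs/v1 path; only the endpoint differs. A minimal sketch of the URL convention, using this guide's hostnames (node1, HttpFS on 14000, NameNode web UI on 50070) — the helper function `rest_url` is hypothetical, written only to illustrate the URL shape:

```python
def rest_url(host: str, port: int, path: str, op: str, user: str) -> str:
    """Build a WebHDFS/HttpFS REST URL (hypothetical helper for illustration)."""
    return f"http://{host}:{port}/webhdfs/v1{path}?op={op}&user.name={user}"

# Going through HttpFS (the proxy service, required in HA mode):
print(rest_url("node1", 14000, "/tmp", "LISTSTATUS", "hdfs"))
# → http://node1:14000/webhdfs/v1/tmp?op=LISTSTATUS&user.name=hdfs

# Talking to the NameNode's WebHDFS directly (data reads/writes are
# redirected to the DataNodes):
print(rest_url("node1", 50070, "/tmp", "LISTSTATUS", "hdfs"))
```

This is why switching Hue from WebHDFS to HttpFS only requires changing the host/port in webhdfs_url, not the path.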

Enable Hue web access to HDFS
vim /opt/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
Add the following:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

Configure HttpFS
vim /opt/hadoop-2.7.2/etc/hadoop/httpfs-site.xml

<property>
  <name>httpfs.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hue.groups</name>
  <value>*</value>
</property>

Distribute the modified configuration files to the other nodes
xsync /opt/hadoop-2.7.2/etc/hadoop/core-site.xml
xsync /opt/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
xsync /opt/hadoop-2.7.2/etc/hadoop/httpfs-site.xml

Restart Hadoop
cluster.sh start

Start HttpFS
/opt/hadoop-2.7.2/sbin/httpfs.sh start

Check that the port is listening
netstat -anop | grep 14000

Configure Hue to integrate with Hadoop
[hadoop]

  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
  # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      # Hadoop runs in HA mode, so HDFS can only be accessed through HttpFS
      fs_defaultfs=hdfs://ns:8020
      # NameNode logical name.
      logical_name=ns
      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://node1:14000/webhdfs/v1
      # Change this if your HDFS cluster is Kerberos-secured
      ## security_enabled=false
      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True
      # Directory of the Hadoop configuration
      hadoop_conf_dir=/opt/hadoop-2.7.2/etc/hadoop/

  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=node1
      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032
      # Whether to submit jobs to this cluster
      submit_to=True
      # Resource Manager logical name (required for HA)
      logical_name=ns-yarn
      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false
      # URL of the ResourceManager API
      resourcemanager_api_url=http://node1:8088
      # URL of the ProxyServer API
      ## proxy_api_url=http://localhost:8088
      # URL of the HistoryServer API
      history_server_api_url=http://node3:19888
      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

[desktop]

  # Webserver runs as this user
  server_user=hue
  server_group=hue
  # This should be the Hue admin and proxy user
  default_user=hue
  # This should be the hadoop cluster admin
  default_hdfs_superuser=hdfs

Restart Hue
Check whether Hue is already running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &
You should now be able to browse HDFS files in Hue.

Hive integration
vim /opt/hue/desktop/conf/hue.ini
[beeswax]

  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=node1
  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000
  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/opt/hive/conf/

Restart the Hue service, and you should see Hive's databases and tables.
Check whether Hue is already running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &

Spark integration
Introduction
Hue integrates with Spark through a Livy server, which plays a role similar to HiveServer2: it provides a RESTful service that accepts HTTP requests from clients and forwards them to the Spark cluster. The Livy server is not included in the Spark distribution and must be downloaded separately.
Note: to write Scala or Python programs in Hue's notebooks, Hadoop's HttpFS process must be running.
Note: download a reasonably recent version, otherwise some classes will not be found.
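The Livy relay works over two documented REST endpoints: POST /sessions creates an interactive session (kind: spark, pyspark, or sparkr), and POST /sessions/{id}/statements submits code to it. A minimal sketch that only builds the JSON payloads (no live server here; the host/port match this guide's livy_server_host/port, and the session id 0 is illustrative):

```python
import json

base = "http://node3:8998"  # livy_server_host:livy_server_port from this guide

# 1. Create an interactive session; "kind" selects the interpreter.
create_session = json.dumps({"kind": "spark"})

# 2. Submit a statement to an existing session (id 0 is hypothetical).
run_statement = json.dumps({"code": "sc.parallelize(1 to 10).sum()"})

print("POST", base + "/sessions", create_session)
print("POST", base + "/sessions/0/statements", run_statement)
```

Hue's notebook app performs essentially this exchange on your behalf, polling the statement until Livy reports the result.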

Installation
Extract it: unzip apache-livy-0.7.0-incubating-bin.zip -d /opt/livy-server

Start the Livy server
/opt/livy-server/bin/livy-server
Exception in thread "main" java.lang.IllegalArgumentException: Livy requires the SPARK_HOME environment variable
If this error appears, configure the runtime environment for livy-server:
vim /opt/livy-server/conf/livy-env.sh
JAVA_HOME=/opt/jdk1.8.0_112
HADOOP_CONF_DIR=/opt/hadoop-2.7.2/etc/hadoop
SPARK_HOME=/opt/spark
SPARK_CONF_DIR=/opt/spark/conf
LIVY_LOG_DIR=/opt/livy-server/logs
LIVY_PID_DIR=/opt/livy-server/pids
Restart the Livy service
nohup /opt/livy-server/bin/livy-server >> /opt/livy-server/logs/start-livy-server.log 2>&1 &

Configure Hue
Launching jobs in local or yarn mode is recommended; here we configure the standalone master spark://node1:7077.
[spark]

  # Host address of the Livy Server.
  livy_server_host=node3
  # Port of the Livy Server.
  livy_server_port=8998
  # Configure livy to start with 'process', 'thread', or 'yarn' workers.
  livy_server_session_kind=spark://node1:7077
  # If livy should use proxy users when submitting a job.
  ## livy_impersonation_enabled=true
  # List of available types of snippets
  ## languages='[{"name": "Scala Shell", "type": "spark"},{"name": "PySpark Shell", "type": "pyspark"},{"name": "R Shell", "type": "r"},{"name": "Jar", "type": "Jar"},{"name": "Python", "type": "py"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'

Restart the Hue service
Check whether Hue is already running
netstat -npl | grep 8888
If it is running, kill it first
ps -ef | grep runcherrypyserver | grep -v grep | awk '{print $2}' | xargs kill
Start Hue
/opt/hue/build/env/bin/supervisor > /home/hdfs/hue/log/hue.log 2>&1 &

Troubleshooting
1. Chinese text in the Hue editor causes errors
Change the character encoding of the Hue metadata database:
alter database hue character set latin1;
alter table beeswax_queryhistory modify `query` longtext CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
ALTER TABLE desktop_document2 modify column name varchar(255) CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column description longtext CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column search longtext CHARACTER SET utf8;
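The root cause of the error above: latin-1 is a single-byte encoding that cannot represent CJK characters, so inserting Chinese query text into latin1 columns fails until those columns are converted to utf8. A small sketch demonstrating the encoding gap:

```python
# latin-1 cannot encode CJK text; utf-8 uses 3 bytes per CJK character.
text = "中文查询"  # a sample Chinese query string
try:
    text.encode("latin-1")
    ok = True
except UnicodeEncodeError:
    ok = False

print("latin-1 can store this text:", ok)          # → False
print("utf-8 byte length:", len(text.encode("utf-8")))  # → 12 (4 chars × 3 bytes)
```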
