# Hive on Spark Configuration

## Three Ways to Launch Spark SQL

Spark SQL is a Spark module for processing structured data.

Note the difference between Spark SQL and Hive on Spark: with Spark SQL, Spark is the entry point and only reads Hive's metastore and data, whereas Hive on Spark keeps Hive as the entry point and swaps its execution engine from MapReduce to Spark.
## Environment Preparation

Copy hive-site.xml from $HIVE_HOME/conf to the $SPARK_HOME/conf directory, and copy mysql-connector-java-5.1.38-bin.jar from $HIVE_HOME/lib to the ~/software directory:
```bash
cp /opt/hive/conf/hive-site.xml /opt/spark/conf/
xsync /opt/spark/conf/hive-site.xml
```
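`xsync` is not a standard tool; in setups like this one it is usually a small wrapper around rsync that pushes a file to the same path on every other node. A minimal sketch, assuming passwordless SSH and the node2/node3 hostnames that appear in the beeline URL at the end of this page:

```bash
#!/usr/bin/env bash
# xsync: distribute a file (or directory) to the same absolute path on the other nodes.
file=$1
dir=$(cd -P "$(dirname "$file")" && pwd)   # resolve to an absolute directory
name=$(basename "$file")
for host in node2-mrli.com node3-mrli.com; do   # hostnames are an assumption
  rsync -av "$dir/$name" "$host:$dir/"
done
```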

Edit hive-site.xml under Spark's conf directory and set the Hive execution engine to spark:

```bash
vim /opt/spark/conf/hive-site.xml
```

```xml
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
```
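To confirm that Hive picks the new engine up, the value can be echoed from the Hive CLI (assuming hive is on the PATH):

```bash
hive -e "set hive.execution.engine;"
# expected output includes: hive.execution.engine=spark
```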

```bash
mkdir -p ~/software
cp /opt/hive/lib/mysql-connector-java-5.1.38-bin.jar /home/hdfs/software
xsync /home/hdfs/software/mysql-connector-java-5.1.38-bin.jar
```
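To verify the JDBC driver jar actually landed on every node, a quick check over SSH (hostnames again assumed from the beeline URL below):

```bash
for host in node1-mrli.com node2-mrli.com node3-mrli.com; do
  ssh "$host" ls -lh /home/hdfs/software/mysql-connector-java-5.1.38-bin.jar
done
```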

Copy hdfs-site.xml, mapred-site.xml, yarn-site.xml, and core-site.xml into Spark's conf directory so that Spark can locate HDFS and YARN:
```bash
cp /opt/hadoop-2.7.2/etc/hadoop/hdfs-site.xml /opt/spark/conf/
cp /opt/hadoop-2.7.2/etc/hadoop/mapred-site.xml /opt/spark/conf/
cp /opt/hadoop-2.7.2/etc/hadoop/yarn-site.xml /opt/spark/conf/
cp /opt/hadoop-2.7.2/etc/hadoop/core-site.xml /opt/spark/conf/
```
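Equivalently, one loop copies all four files (same paths as above):

```bash
for f in hdfs-site.xml mapred-site.xml yarn-site.xml core-site.xml; do
  cp "/opt/hadoop-2.7.2/etc/hadoop/$f" /opt/spark/conf/
done
```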

Create symbolic links so that the Spark jars Hive depends on are visible in Hive's lib directory:
```bash
ln -snf /opt/spark/jars/spark-core_2.11-2.2.0.jar /opt/hive/lib/spark-core_2.11-2.2.0.jar
ln -snf /opt/spark/jars/scala-library-2.11.8.jar /opt/hive/lib/scala-library-2.11.8.jar
```
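If your Spark or Scala version differs, the jar names differ too; linking by glob avoids hard-coding versions (a sketch, assuming each pattern matches exactly one jar):

```bash
for jar in /opt/spark/jars/spark-core_2.11-*.jar /opt/spark/jars/scala-library-*.jar; do
  ln -snf "$jar" "/opt/hive/lib/$(basename "$jar")"
done
```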

## Launch Method 1: spark-shell

```bash
/opt/spark/bin/spark-shell --master yarn --jars ~/software/mysql-connector-java-5.1.38-bin.jar
```

Inside the shell, query Hive with:

```scala
spark.sql("show tables").show(false)
```
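For a non-interactive smoke test, the same statements can be fed to spark-shell through a here-document (a sketch using the paths above):

```bash
/opt/spark/bin/spark-shell --master yarn \
  --jars ~/software/mysql-connector-java-5.1.38-bin.jar <<'EOF'
spark.sql("show databases").show(false)
spark.sql("show tables").show(false)
EOF
```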

## Launch Method 2: spark-sql

```bash
/opt/spark/bin/spark-sql --master yarn --driver-class-path ~/software/mysql-connector-java-5.1.38-bin.jar
```

Inside the CLI, run SQL directly:

```sql
select count(1) from gmall.dws_uv_detail_mn;
```
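spark-sql also accepts a statement on the command line via -e, which is convenient for scripting (same query and paths as above):

```bash
/opt/spark/bin/spark-sql --master yarn \
  --driver-class-path ~/software/mysql-connector-java-5.1.38-bin.jar \
  -e "select count(1) from gmall.dws_uv_detail_mn;"
```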

## Launch Method 3: Thrift Server and beeline

Start the thrift server:

```bash
/opt/spark/sbin/start-thriftserver.sh --master yarn --jars ~/software/mysql-connector-java-5.1.38-bin.jar
```

Then connect with beeline. The URL below locates the server through ZooKeeper service discovery (the 2181 ports are the ZooKeeper quorum, not the thrift port):

```bash
/opt/spark/bin/beeline -u 'jdbc:hive2://node1-mrli.com:2181,node2-mrli.com:2181,node3-mrli.com:2181/gmall;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' -n hdfs
```
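If ZooKeeper discovery is not set up, a direct connection also works; this is a sketch that assumes the thrift server runs on node1-mrli.com with its default port 10000:

```bash
/opt/spark/bin/beeline \
  -u 'jdbc:hive2://node1-mrli.com:10000/gmall' -n hdfs \
  -e 'show tables;'
```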
