Configure Spark to Enable LZO Compression - lg1011/SparkLearn GitHub Wiki

Edit the spark-env.sh and spark-defaults.conf files.

vim spark-env.sh
Add the following:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/hadoop-2.7.2/lib/native

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/hadoop-2.7.2/lib/native

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/hadoop-2.7.2/share/hadoop/yarn/:/opt/hadoop-2.7.2/share/hadoop/yarn/lib/:/opt/hadoop-2.7.2/share/hadoop/common/:/opt/hadoop-2.7.2/share/hadoop/common/lib/:/opt/hadoop-2.7.2/share/hadoop/hdfs/:/opt/hadoop-2.7.2/share/hadoop/hdfs/lib/:/opt/hadoop-2.7.2/share/hadoop/mapreduce/:/opt/hadoop-2.7.2/share/hadoop/mapreduce/lib/:/opt/hadoop-2.7.2/share/hadoop/tools/lib/:/opt/spark/jars/:/opt/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar
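The long SPARK_CLASSPATH line above is easy to mistype. As a sketch, it can also be assembled in a loop, assuming the same /opt/hadoop-2.7.2 layout used in this guide:

```shell
# Sketch: build the same SPARK_CLASSPATH as the hand-written line above.
# Paths assume the /opt/hadoop-2.7.2 layout used in this guide.
HADOOP_HOME=/opt/hadoop-2.7.2
CP=""
for d in yarn common hdfs mapreduce; do
  CP="$CP:$HADOOP_HOME/share/hadoop/$d/:$HADOOP_HOME/share/hadoop/$d/lib/"
done
CP="$CP:$HADOOP_HOME/share/hadoop/tools/lib/:/opt/spark/jars/"
CP="$CP:$HADOOP_HOME/share/hadoop/common/hadoop-lzo-0.4.20.jar"
export SPARK_CLASSPATH="${SPARK_CLASSPATH:-}$CP"
echo "$SPARK_CLASSPATH"
```

Echoing the result before pasting it into spark-env.sh is a quick way to spot a missing or duplicated entry.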

vim spark-defaults.conf
Add the following:

spark.driver.extraClassPath /opt/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar
spark.executor.extraClassPath /opt/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar
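If you prefer scripting the change, the two settings can be appended like this. This sketch writes to a temp file so nothing real is modified; on a cluster, point conf at /opt/spark/conf/spark-defaults.conf instead:

```shell
# Sketch: append the two extraClassPath settings to a conf file.
# Writes to a temp file here; use /opt/spark/conf/spark-defaults.conf
# on a real cluster.
conf=$(mktemp)
LZO_JAR=/opt/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar
printf 'spark.driver.extraClassPath %s\n'   "$LZO_JAR" >> "$conf"
printf 'spark.executor.extraClassPath %s\n' "$LZO_JAR" >> "$conf"
cat "$conf"
```

Both the driver and executor settings are needed: the driver plans the read, but the executors are the processes that actually decompress LZO splits.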

Distribute the spark-env.sh and spark-defaults.conf files to the other nodes.
xsync /opt/spark/conf/spark-env.sh
xsync /opt/spark/conf/spark-defaults.conf
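xsync is a common homegrown rsync wrapper; if your cluster does not have it, a plain loop does the same job. A dry-run sketch (hadoop102/hadoop103 are placeholder host names, and echo only prints the commands; remove it to actually copy):

```shell
# Dry-run sketch of distributing the two files without xsync.
# hadoop102/hadoop103 are placeholder host names; 'echo' prints each
# rsync command instead of running it - remove it to really copy.
cmds=$(
  for host in hadoop102 hadoop103; do
    for f in /opt/spark/conf/spark-env.sh /opt/spark/conf/spark-defaults.conf; do
      echo rsync -av "$f" "$host":/opt/spark/conf/
    done
  done
)
printf '%s\n' "$cmds"
```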

Copy the LZO jar into Spark's jars directory.
cp /opt/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar /opt/spark/jars/

Restart the Spark cluster.
Launch spark-sql and run a query against an LZO-compressed table.
/opt/spark/bin/spark-sql --master yarn --driver-class-path ~/software/mysql-connector-java-5.1.38-bin.jar
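Once the codec is on the classpath, an LZO-backed table is queried like any other. For a one-off check, spark-sql also accepts an inline statement via -e; the table name below is hypothetical:

```shell
# 'ods_log' is a hypothetical table name; substitute a real
# LZO-compressed table from your metastore.
/opt/spark/bin/spark-sql --master yarn \
  --driver-class-path ~/software/mysql-connector-java-5.1.38-bin.jar \
  -e "SELECT count(*) FROM ods_log"
```

If the classpath setup is wrong, this is the point where a ClassNotFoundException for com.hadoop.compression.lzo.LzoCodec typically surfaces.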
