SparkSQL, Spark - stanislawbartkowski/hdpactivedirectory GitHub Wiki
SparkSQL, Thrift Server
Configure
Like Hive, Spark SQL / Thrift Server can be accessed through the beeline command line. Identify the host where the Spark2 Thrift Server is installed and the connection port (default: 10016). The beeline command line looks like:
kinit ...
beeline -u "jdbc:hive2://aa1.fyre.ibm.com:10016/;principal=hive/[email protected];transportMode=binary;httpPath=cliservice"
[perf@varlet1 ~]$ thr
Connecting to jdbc:hive2://aa1.fyre.ibm.com:10016/;principal=hive/[email protected];transportMode=binary;httpPath=cliservice
Connected to: Spark SQL (version 2.3.0.2.6.5.1050-37)
Driver: Hive JDBC (version 1.2.1000.2.6.5.1050-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.5.1050-37 by Apache Hive
0: jdbc:hive2://aa1.fyre.ibm.com:10016/> show databases;
+---------------+--+
| databaseName |
+---------------+--+
| bigsql |
| datalake |
| default |
| perfdb |
| perfte |
+---------------+--+
5 rows selected (0,129 seconds)
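The JDBC URL in the session above packs several parameters into one string: host, port, the Kerberos principal, the transport mode, and the HTTP path. A minimal Python sketch of how those parts compose (the `build_hive2_url` helper is hypothetical, introduced only for illustration; host, port, and principal values are copied from the example above):

```python
# Hypothetical helper: compose a jdbc:hive2 URL for a Kerberized
# Spark Thrift Server from its individual parameters.
def build_hive2_url(host, port, principal,
                    transport_mode="binary", http_path="cliservice"):
    return (
        "jdbc:hive2://{host}:{port}/;principal={principal};"
        "transportMode={mode};httpPath={path}"
    ).format(host=host, port=port, principal=principal,
             mode=transport_mode, path=http_path)

# Values as they appear in the beeline example above.
url = build_hive2_url("aa1.fyre.ibm.com", 10016,
                      "hive/[email protected]")
print(url)
```

The resulting string is exactly the argument passed to `beeline -u` above.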
Ranger
Although SparkSQL runs on top of Hive tables, Hive Ranger policies do not apply to SparkSQL: SparkSQL is a separate SQL engine and bypasses the Hive server. There is no dedicated Ranger plugin for SparkSQL, so protection has to be orchestrated by other means. https://hortonworks.com/blog/sparksql-ranger-llap-via-spark-thrift-server-bi-scenarios-provide-row-column-level-security-masking/
SparkSQL shell, Spark
Spark SQL can also be launched directly, without the Thrift Server. There is a startup time penalty until the Spark shell is up and ready.
export SPARK_MAJOR_VERSION=2
kinit
spark-sql --master yarn --num-executors 2 -S
...............
spark-sql> show databases;
bigsql
datalake
default
perfdb
perfte
pyspark
SPARK_MAJOR_VERSION is set to 2, using Spark2
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.3.0.2.6.5.1050-37
/_/
Using Python version 2.7.5 (default, Oct 30 2018 23:45:53)
SparkSession available as 'spark'.
>>> spark
<pyspark.sql.session.SparkSession object at 0x7f89c94c57d0>