Pyhive Beeline - cchantra/bigdata.github.io GitHub Wiki
Requirements
- Install the MySQL or Derby JDBC driver for the Hive metastore.
See: https://dzone.com/articles/how-configure-mysql-metastore
or https://mapr.com/docs/51/Hive/UsingMySQLfortheHiveMetas_29655346-d3e108.html
- Install python pip
Install the related modules:
sudo apt-get install libsasl2-dev
sudo pip install sasl thrift
sudo pip install pyhive
sudo pip install thrift_sasl
If you get a locale error, set the following in .bashrc:
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
and run
sudo dpkg-reconfigure locales
Next, create the Hive data.
Assume demo.txt contains:
chantana
ploy
arnon
putchong
mudang
bundit
wandee
Then create a folder in HDFS called /user1/ and upload the file:
hdfs dfs -put demo.txt /user1/
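The load command below assumes the target table already exists. A minimal sketch of creating it from the hive prompt, assuming a single STRING column named name (the column name is taken from the query output later on this page):

```sql
-- Hypothetical minimal DDL for the demo2 table used below;
-- demo.txt has one name per line, so a single STRING column suffices.
CREATE TABLE demo2 (name STRING);
```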
Then start the hive shell and load the file into the table:
hive> load data inpath 'hdfs://localhost:9000/user1/demo.txt' into table demo2;
Loading data to table default.demo2
OK
Time taken: 2.041 seconds
Start hiveserver2 (if you've already done this, skip it).
In this example, Hive is installed at /home/hadoop:
cd hive/
nohup bin/hive --service metastore &
bin/hiveserver2 &
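Before moving on, it can help to verify that the services are actually listening. A small sketch using plain Python sockets (the helper name is my own, not part of Hive; 10000 is HiveServer2's default port and 9083 the metastore's default):

```python
import socket

# Illustrative helper (not part of Hive): report whether a TCP port on the
# given host accepts connections within the timeout.
def port_open(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# HiveServer2 listens on 10000 by default, the metastore on 9083.
print("hiveserver2 up:", port_open("localhost", 10000))
print("metastore up:", port_open("localhost", 9083))
```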
Use beeline to connect via jdbc
Test the connection:
bin/beeline -u jdbc:hive2://
You should get output similar to:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/chantana/apache-hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://
19/02/22 19:07:00 [main]: WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive
===
0: jdbc:hive2://> select * from demo2;
OK
+-------------+
| demo2.name |
+-------------+
| chantana |
| ploy |
| arnon |
| putchong |
| mudang |
| bundit |
+-------------+
6 rows selected (2.844 seconds)
=====================
- Try the code: save the following as testhive.py
from pyhive import hive

host_name = "localhost"
port = 10000        # HiveServer2 default port
user = ""           # HiveServer2 user name
password = ""       # HiveServer2 password
database = "default"

def hiveconnection(host_name, port, user, password, database):
    conn = hive.Connection(host=host_name, port=port, username=user,
                           password=password, database=database, auth='CUSTOM')
    cur = conn.cursor()
    cur.execute('select name from demo2 limit 2')
    result = cur.fetchall()
    cur.close()
    conn.close()
    return result

# Call the function above
output = hiveconnection(host_name, port, user, password, database)
print(output)
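fetchall() in the script above returns a list of one-element tuples, one tuple per row of the single-column SELECT. A small sketch of flattening that shape into plain names, using only standard Python (the helper name is my own, not part of pyhive):

```python
# Each row of a one-column SELECT arrives as a 1-tuple, e.g. ('chantana',).
# rows_to_names is an illustrative helper, not part of pyhive.
def rows_to_names(rows):
    return [name for (name,) in rows]

# Sample data in the shape returned by `select name from demo2 limit 2`
sample_rows = [('chantana',), ('ploy',)]
print(rows_to_names(sample_rows))  # ['chantana', 'ploy']
```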
See more with pyarrow + hdfs
References
http://dwgeek.com/guide-connecting-hiveserver2-using-python-pyhive.html/