Pyhive Beeline - cchantra/bigdata.github.io GitHub Wiki

Requirements

  • Install the MySQL or Derby JDBC driver

For setup, see: https://dzone.com/articles/how-configure-mysql-metastore

or https://mapr.com/docs/51/Hive/UsingMySQLfortheHiveMetas_29655346-d3e108.html

  • Install Python pip

Install related modules

sudo apt-get install libsasl2-dev
sudo pip install sasl thrift
sudo pip install pyhive
sudo pip install thrift_sasl

If you get a locale error, set the following in .bashrc

export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"

and run

sudo dpkg-reconfigure locales

Next, create the Hive data

Assume demo.txt contains:

chantana
ploy
arnon
putchong
mudang
bundit
wandee
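The input file above can also be generated with a short Python script (a sketch; the file name and the names come from the listing above):

```python
# Write the sample input file used in this tutorial, one name per line
names = ["chantana", "ploy", "arnon", "putchong",
         "mudang", "bundit", "wandee"]

with open("demo.txt", "w") as f:
    f.write("\n".join(names) + "\n")
```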

Then create a folder in HDFS called /user1/ and upload the file:

hdfs dfs -mkdir /user1
hdfs dfs -put demo.txt /user1/

Then call hive to get the prompt. Create the table first (a single string column matching demo.txt), then load the data:

hive> create table demo2 (name string);
hive> load data inpath 'hdfs://localhost:9000/user1/demo.txt' into table demo2;

Loading data to table default.demo2
OK
Time taken: 2.041 seconds

Start hiveserver2 (if you've already done this, skip it)

In my case, Hive is installed under /home/hadoop

cd hive/
nohup bin/hive --service metastore &
bin/hiveserver2 &
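Before pointing beeline or PyHive at the server, you can check that HiveServer2 is actually listening (a generic TCP probe sketch; 10000 is the default HiveServer2 port):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds,
    e.g. HiveServer2 on its default port 10000."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("localhost", 10000))
```

If this prints False, check the nohup.out log for metastore or hiveserver2 startup errors before going further.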

Use beeline to connect via jdbc

Test the connection:

bin/beeline -u jdbc:hive2://

you should get

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/chantana/apache-hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://
19/02/22 19:07:00 [main]: WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive


===
0: jdbc:hive2://> select * from demo2;
OK
+-------------+
| demo2.name  |
+-------------+
| chantana    |
| ploy        |
| arnon       |
| putchong    |
| mudang      |
| bundit      |
+-------------+
6 rows selected (2.844 seconds)

=====================

  • Try the code:

Create testhive.py and copy in the following


from pyhive import hive

host_name = "localhost"
port = 10000           # default HiveServer2 port is 10000
user = ""              # HiveServer2 user name (empty if no authentication)
password = ""          # HiveServer2 password
database = "default"

def hiveconnection(host_name, port, user, password, database):
    # auth='CUSTOM' sends the user name/password pair;
    # use auth='NONE' if HiveServer2 runs without authentication
    conn = hive.Connection(host=host_name, port=port, username=user, password=password,
                           database=database, auth='CUSTOM')
    cur = conn.cursor()
    cur.execute('select name from demo2 limit 2')
    result = cur.fetchall()
    cur.close()
    conn.close()
    return result

# Call the function above
output = hiveconnection(host_name, port, user, password, database)
print(output)
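PyHive's fetchall() returns rows as tuples, one tuple per row, so the output above is a list like [('chantana',), ('ploy',)]. A small helper can flatten a single-column result into a plain list (a sketch using sample data in place of a live query):

```python
def flatten_names(rows):
    """Flatten single-column result rows such as [('chantana',), ('ploy',)]
    into a plain list of strings."""
    return [row[0] for row in rows]

# Sample rows shaped like a `select name from demo2 limit 2` result
sample_rows = [("chantana",), ("ploy",)]
print(flatten_names(sample_rows))  # → ['chantana', 'ploy']
```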

For reading HDFS files directly from Python, see pyarrow's HDFS support.

References

http://dwgeek.com/guide-connecting-hiveserver2-using-python-pyhive.html/