Hive and PyHive - Nantawat6510545543/big-data-summary GitHub Wiki
Download and extract Hive:
cd /home/hadoop
wget https://archive.apache.org/dist/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
tar xzf apache-hive-2.3.9-bin.tar.gz
mv apache-hive-2.3.9-bin hive
Edit .bashrc:
nano ~/.bashrc
Add:
export HADOOP_USER_CLASSPATH_FIRST=true
export HIVE_HOME=/home/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH
Apply changes:
source ~/.bashrc
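Before moving on, it can help to confirm that the three variables added to .bashrc are visible in the current shell. The following is only a minimal sketch: the helper name check_hive_env is hypothetical, and it assumes the same values used above (HIVE_HOME set, $HIVE_HOME/bin on PATH, HADOOP_USER_CLASSPATH_FIRST=true).

```python
import os


def check_hive_env(env=None):
    """Return a list of human-readable problems with the Hive environment.

    Hypothetical helper: it only inspects environment variables and does
    not start Hive or touch HDFS.
    """
    env = os.environ if env is None else env
    problems = []

    hive_home = env.get("HIVE_HOME")
    if not hive_home:
        problems.append("HIVE_HOME is not set")
    else:
        # The PATH entry added in .bashrc is $HIVE_HOME/bin.
        hive_bin = os.path.join(hive_home, "bin")
        if hive_bin not in env.get("PATH", "").split(os.pathsep):
            problems.append(f"{hive_bin} is not on PATH")

    if env.get("HADOOP_USER_CLASSPATH_FIRST") != "true":
        problems.append("HADOOP_USER_CLASSPATH_FIRST is not 'true'")

    return problems


if __name__ == "__main__":
    for problem in check_hive_env():
        print("WARNING:", problem)
```

Run it with python3 from the same shell in which you sourced .bashrc; no output means none of the three checks failed.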
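Create the Hive warehouse directory in HDFS and make it and /tmp group-writable:
hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user1/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user1/hive/warehouse
Replace the Guava jar bundled with Hive with the newer one from Hadoop to avoid a version conflict:
cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
rm $HIVE_HOME/lib/guava-14.0.1.jar
Start the Hive CLI:
hive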
You should see the Hive prompt:
hive>
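If the prompt does not appear and the error mentions Guava, one thing worth checking is whether $HIVE_HOME/lib still contains more than one Guava jar. The sketch below is an illustration only; find_jars and has_single_guava are hypothetical helper names, and the "guava-*.jar" naming pattern is assumed from the file names used in the commands above.

```python
import glob
import os


def find_jars(lib_dir, prefix):
    """Return the sorted file names in lib_dir that match '<prefix>-*.jar'."""
    pattern = os.path.join(lib_dir, f"{prefix}-*.jar")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def has_single_guava(lib_dir):
    """True when exactly one Guava jar is present in lib_dir."""
    return len(find_jars(lib_dir, "guava")) == 1


if __name__ == "__main__":
    # Assumes HIVE_HOME is exported as in .bashrc; falls back to the path
    # used in this guide.
    lib = os.path.join(os.environ.get("HIVE_HOME", "/home/hadoop/hive"), "lib")
    print(find_jars(lib, "guava"))
```

After the cp and rm steps, the printed list would be expected to contain only guava-27.0-jre.jar.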
Install MySQL and start the service:
sudo apt-get update
sudo apt-get install mysql-server
sudo systemctl start mysql
Download MySQL JDBC driver:
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-5.1.48.tar.gz
tar xvf mysql-connector-java-5.1.48.tar.gz
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar $HIVE_HOME/lib/
Add the connector to the classpath. Edit .bashrc:
nano ~/.bashrc
Add:
export CLASSPATH="$CLASSPATH:/home/hadoop/hive/lib/*"
Apply changes:
source ~/.bashrc
Create or edit hive-site.xml:
nano $HIVE_HOME/conf/hive-site.xml
Paste this minimal configuration:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:9000/user1/hive/warehouse</value>
</property>
</configuration>
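A typo in hive-site.xml usually surfaces later as a confusing metastore error, so it can be useful to parse the file and confirm the connection properties are present. This is a minimal sketch using only the Python standard library; read_hive_site, missing_properties, and the REQUIRED tuple are hypothetical names chosen for illustration, and the list of required properties is taken from the configuration above.

```python
import xml.etree.ElementTree as ET

# Connection properties used by the configuration in this guide.
REQUIRED = (
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
)


def read_hive_site(xml_text):
    """Parse hive-site.xml content into a {name: value} dictionary."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(props):
    """Return the required property names that are absent from props."""
    return [name for name in REQUIRED if name not in props]
```

To use it, read $HIVE_HOME/conf/hive-site.xml into a string, pass it to read_hive_site, and check that missing_properties returns an empty list. A malformed file raises xml.etree.ElementTree.ParseError, which is itself a useful signal.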
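Open the MySQL shell as the system root user:
sudo mysql
Inside the MySQL prompt, set the root password so that it matches ConnectionPassword in hive-site.xml: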
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
EXIT;
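Log back in with the new password:
mysql -u root -p
Inside MySQL, drop any existing metastore database so the schema can be initialized from scratch: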
DROP DATABASE IF EXISTS metastore;
EXIT;
Initialize the metastore schema:
$HIVE_HOME/bin/schematool -initSchema -dbType mysql
If successful, you’ll see output ending with:
Initialization script completed
schemaTool completed
After this, you can start Hive services:
hive --service metastore &
hiveserver2 &
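Both services start in the background, and HiveServer2 in particular can take a while before it accepts connections. A rough way to check is to test whether the ports are open. The sketch below assumes the default ports (9083 for the metastore, 10000 for HiveServer2, the latter matching the PyHive script in this guide); port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed default ports; adjust if your configuration overrides them.
    for name, port in (("metastore", 9083), ("hiveserver2", 10000)):
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove the service is healthy, so treat this as a quick first check before running a query.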
Install PyHive and its dependencies:
sudo apt-get install libsasl2-dev
sudo pip install sasl thrift
sudo pip install pyhive
sudo pip install thrift_sasl
Edit .bashrc:
nano ~/.bashrc
Add:
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
Then run:
source ~/.bashrc
sudo dpkg-reconfigure locales
Create testhive.py:
nano testhive.py
Paste:
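from pyhive import hive


def hiveconnection():
    # Connect to HiveServer2 (started earlier with `hiveserver2 &`).
    conn = hive.Connection(
        host="localhost",
        port=10000,
        username="root",
        password="password",
        database="default",
        auth="CUSTOM",
    )
    try:
        cur = conn.cursor()
        # The demo2 table must already exist in the default database.
        cur.execute("SELECT name FROM demo2 LIMIT 2")
        return cur.fetchall()
    finally:
        conn.close()


print(hiveconnection())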
Then run:
python3 testhive.py
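Once the test script works, the query logic can be separated from the connection setup so that it is easier to reuse. The sketch below is one possible shape, not part of PyHive itself: run_query is a hypothetical helper that relies only on the standard DB-API methods cursor(), execute(), fetchall(), and close(), which PyHive connections and cursors provide.

```python
from contextlib import closing


def run_query(conn, sql):
    """Execute sql on a DB-API connection and return all rows as a list.

    The cursor is closed even if the query raises an exception.
    """
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()
```

With the connection from testhive.py, a call would look like run_query(conn, "SELECT name FROM demo2 LIMIT 2"). Because the helper depends only on the DB-API interface, it can also be exercised against an in-memory sqlite3 database when no Hive server is available.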