Hive and PyHive

Apache Hive Installation and Setup

Install Hive

Download and extract Hive:

cd /home/hadoop
wget https://archive.apache.org/dist/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
tar xzf apache-hive-2.3.9-bin.tar.gz
mv apache-hive-2.3.9-bin hive

Configure Environment

Edit .bashrc:

nano ~/.bashrc

Add:

export HADOOP_USER_CLASSPATH_FIRST=true
export HIVE_HOME=/home/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH

Apply changes:

source ~/.bashrc
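
To confirm the new variables are in effect, you can check that the shell now resolves the Hive launcher (paths assume the layout above):

echo $HIVE_HOME        # should print /home/hadoop/hive
which hive             # should print /home/hadoop/hive/bin/hive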

Prepare HDFS for Hive

hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user1/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user1/hive/warehouse
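
If you want to double-check before moving on, listing the directories should show them with group write permission:

hadoop fs -ls /user1/hive
hadoop fs -ls -d /tmp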

Resolve Guava Conflict (Hive 2.3.9)

Hive 2.3.9 bundles Guava 14.0.1, which clashes with the newer Guava shipped with Hadoop 3.x and causes errors when Hive starts, so replace Hive's copy with Hadoop's:

cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
rm $HIVE_HOME/lib/guava-14.0.1.jar
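
A quick listing confirms that only the newer Guava jar is left in Hive's lib directory (the exact version name depends on your Hadoop build):

ls $HIVE_HOME/lib | grep guava
# expected: guava-27.0-jre.jar, with guava-14.0.1.jar gone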

Start Hive CLI

hive

You should see the Hive prompt:

hive>
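
Type quit; to leave the prompt. If you only want to confirm the installation without opening the interactive shell, the version flag works as well:

hive --version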

Set Up Metastore (MySQL)

Install MySQL and start the service:

sudo apt-get update
sudo apt-get install mysql-server
sudo systemctl start mysql
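
Before continuing, you can confirm that the MySQL service actually came up (on Ubuntu the unit is named mysql):

sudo systemctl status mysql --no-pager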

Download MySQL JDBC driver:

wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-5.1.48.tar.gz
tar xvf mysql-connector-java-5.1.48.tar.gz
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar $HIVE_HOME/lib/

Add connector to classpath:

Edit .bashrc:

nano ~/.bashrc

Add:

export CLASSPATH=$CLASSPATH:/home/hadoop/hive/lib

Apply changes:

source ~/.bashrc

Configure hive-site.xml

Create or edit:

nano $HIVE_HOME/conf/hive-site.xml

Paste this minimal configuration (the ConnectionUserName and ConnectionPassword must match the MySQL credentials you set in the next section):

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://localhost:9000/user1/hive/warehouse</value>
  </property>
</configuration>

Initialize or Fix Hive Metastore Schema (MySQL)

1. Set or Reset MySQL Root Password

sudo mysql

Inside MySQL prompt:

ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
EXIT;
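
To confirm the new password works, log back in and run a trivial query (you will be prompted for the password you just set):

mysql -u root -p -e "SELECT VERSION();"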

2. Drop Existing Metastore Database (Optional Cleanup)

mysql -u root -p

Inside MySQL:

DROP DATABASE IF EXISTS metastore;
EXIT;

3. Initialize Hive Schema

$HIVE_HOME/bin/schematool -initSchema -dbType mysql

If successful, you’ll see output ending with:

Initialization script completed
schemaTool completed
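
For extra confirmation, schematool can report the schema version, and the metastore tables should now exist in MySQL; both commands below are read-only checks:

$HIVE_HOME/bin/schematool -dbType mysql -info
mysql -u root -p -e "USE metastore; SHOW TABLES;"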

After this, you can start Hive services:

hive --service metastore &
hiveserver2 &
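
HiveServer2 can take a little while to come up. Before moving on to PyHive, you can check that both services are running (each appears as RunJar in jps) and that the metastore (9083) and HiveServer2 (10000) ports are listening:

jps
ss -ltn | grep -E ':(9083|10000)'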

PyHive Setup and Hive Access via Python

Install Required Packages

Install with pip3 so the packages land in the same Python 3 environment used to run the script later:

sudo apt-get install libsasl2-dev
sudo pip3 install sasl thrift
sudo pip3 install pyhive
sudo pip3 install thrift_sasl
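
A quick import check confirms the packages installed into the Python 3 environment you will run the script with:

python3 -c "from pyhive import hive; import thrift, sasl, thrift_sasl; print('pyhive imports OK')"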

Fix Locales (if you encounter errors)

Edit .bashrc:

nano ~/.bashrc

Add:

export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"

Then run:

source ~/.bashrc
sudo dpkg-reconfigure locales

Python Hive Query Example

Create testhive.py:

nano testhive.py

Paste:

from pyhive import hive

def hiveconnection():
    # Connect to HiveServer2 on its default port; the credentials and auth
    # mode must match how HiveServer2 authentication is configured.
    conn = hive.Connection(
        host="localhost",
        port=10000,
        username="root",
        password="password",
        database="default",
        auth='CUSTOM'
    )
    cur = conn.cursor()
    # Run a small test query, then fetch the rows before closing the connection.
    cur.execute("SELECT name FROM demo2 LIMIT 2")
    result = cur.fetchall()
    cur.close()
    conn.close()
    return result

print(hiveconnection())

Then run:

python3 testhive.py
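
Note that the query assumes a table named demo2 with a name column already exists in the default database; this guide never creates it. If you are following along from scratch, you can create a small throwaway table first (the name and schema here are only an example):

hive -e "CREATE TABLE IF NOT EXISTS demo2 (id INT, name STRING); INSERT INTO demo2 VALUES (1, 'alice'), (2, 'bob');"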