python - youdar/How-to GitHub Wiki

Python things

When you work in an environment where you have no root permissions but still need a Python newer than the 2.6.6 that is installed on many corporate machines, the following method might help.

Problem installing thrift_sasl

sudo yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64

Installing python on linux (without admin permissions)

Based on
https://danieleriksson.net/2017/02/08/how-to-install-latest-python-on-centos
For older CentOS look at https://benad.me/blog/2018/07/17/python-3.7-on-centos-6/

$ cd
$ mkdir python
$ export PYTHON_BASE="$HOME/python"

Download the Python source archive and extract it (run cd "$PYTHON_BASE" first if you want the source to land in the new folder)

$ wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
$ tar xvf Python-3.6.8.tgz
$ cd Python-3.6.8
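If configure or make fails because build dependencies are missing, someone with admin rights might need to run:
sudo yum install yum-utils
sudo yum-builddep python
sudo yum install -y libffi-devel
sudo yum install openssl-devel

To be able to use sqlite3, sqlite-devel must also be installed before building:
sudo yum install sqlite-devel

Note: the last time I tried, the same process did not work for Python 3.7.2.
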
$ ./configure --with-sqlite3 --enable-unicode=ucs4 --enable-shared --enable-optimizations --with-ensurepip=install --prefix="$HOME/python/Python-3.6.8" LDFLAGS="-Wl,-rpath=$HOME/python/Python-3.6.8/lib"
$ make -j 4
$ make altinstall
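
Once the build finishes, it can be worth checking that the optional standard-library modules were actually compiled, since a missing devel package only shows up later as an import error. The following is a minimal sketch, not part of the original procedure: the helper name check_modules and the BUILD_DEPS mapping are made up for illustration, and the mapping only pairs each module with the yum package mentioned on this page.

```python
import importlib

# Standard-library modules and the devel package each one needs at build time
# (package names taken from the yum commands on this page)
BUILD_DEPS = {
    "sqlite3": "sqlite-devel",
    "ssl": "openssl-devel",
    "ctypes": "libffi-devel",
}


def check_modules(names):
    """Return {module_name: True/False} depending on whether it imports."""
    result = {}
    for name in names:
        try:
            importlib.import_module(name)
            result[name] = True
        except ImportError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in check_modules(BUILD_DEPS).items():
        hint = "" if ok else " (is %s installed?)" % BUILD_DEPS[name]
        print("%-8s %s%s" % (name, "ok" if ok else "MISSING", hint))
```

Save it to a file and run it with the newly built interpreter (bin/python3.6 under the install prefix). If a module is reported as MISSING, install the matching devel package and rebuild.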

Add to .bash_profile

export PYTHON36="$HOME/python/Python-3.6.8/bin"
PATH=$PATH:$PYTHON36
export PATH

Make sure the latest setuptools and pip are installed and install packages:

# make altinstall installs the interpreter as python3.6, so pip is called through it
$PYTHON36/python3.6 -m ensurepip
$PYTHON36/python3.6 -m pip install --upgrade setuptools pip
$PYTHON36/python3.6 -m pip install --upgrade virtualenv wheel
$PYTHON36/python3.6 -m pip install --upgrade pysam matplotlib pandas cython scipy
$PYTHON36/python3.6 -m pip install --upgrade jupyter scikit-learn datarobot seaborn numpy
# Note that impyla will not work with the latest thrift package, so thrift is pinned
$PYTHON36/python3.6 -m pip install --upgrade vertica_python thrift==0.9.3 impyla bs4 avro
# If you are having issues with bs4 run
$PYTHON36/python3.6 -m pip install --upgrade --force-reinstall bs4
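
After the installs, a quick way to see which packages are still missing is to look them up without importing them. This is a small sketch under the assumption that you only care about a handful of packages. The helper name missing_packages and the IMPORT_NAMES table are illustrative. Note that the pip name and the import name differ for some packages (scikit-learn is imported as sklearn, impyla as impala).

```python
from importlib.util import find_spec

# pip name -> import name (they differ for some packages)
IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "impyla": "impala",
    "vertica_python": "vertica_python",
    "bs4": "bs4",
    "pandas": "pandas",
    "thrift": "thrift",
}


def missing_packages(import_names):
    """Return the import names that cannot be found by this interpreter."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES.values())
    print("missing:", ", ".join(missing) if missing else "none")
```

Because find_spec only locates the package, this does not prove that the package works, only that it is installed where the interpreter can see it.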

Using Pandas to look at Hadoop data

#!/usr/bin/env python
from impala.dbapi import connect
import pandas as pd
import pandas_profiling as ppd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Replace db_name.table_name with the table you want to look at
SQL_CMD = 'select * from db_name.table_name limit 10'

# Show up to 300 columns when printing a DataFrame
pd.options.display.max_columns = 300

# GSSAPI means Kerberos authentication
impyla_conn = connect(host='impalad.host.com', auth_mechanism='GSSAPI')
# pandas runs the query through the connection itself, so no cursor is needed
df = pd.read_sql_query(SQL_CMD, impyla_conn)

impyla_conn.close()

print('Done')
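
If the query fails, the close calls in a script like the one above are never reached. A hedged sketch of one way around that is to wrap the cursor and connection in contextlib.closing so they are closed whatever happens. An in-memory sqlite3 database is used here purely as a stand-in, because it follows the same DB-API pattern and needs no cluster. The helper name fetch_rows and the table t are made up for the example.

```python
import sqlite3
from contextlib import closing


def fetch_rows(conn, sql):
    """Run sql on a DB-API connection and return (column_names, rows)."""
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        # cursor.description holds one tuple per column; item 0 is the name
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    return columns, rows


# sqlite3 in-memory database, only a stand-in for the Impala connection
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("create table t (id integer, name text)")
    conn.execute("insert into t values (1, 'a'), (2, 'b')")
    columns, rows = fetch_rows(conn, "select * from t order by id")

print(columns)
print(rows)
```

With pandas available, pd.DataFrame(rows, columns=columns) turns the result into a DataFrame. The same closing(...) wrapper should work around the impyla connection, assuming it exposes close() as in the script above.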

Packaging code

After creating setup.py, install the package in developer (editable) mode with
pip install -e path_to_package
Developer mode makes updating the package simple: changes to the source take effect without reinstalling.
Do not run python setup.py directly; when I did that, the autocomplete functionality
of PyCharm did not work well.

Command line Code Execution

Run a command-line script that continues to run after the Python script is done

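import os
import subprocess

# Placeholders: set the script path and its content for your use case
sh_fn = 'job.sh'
shell_scripts_string = '#!/bin/bash\nsleep 60\n'

# Create the script file sh_fn and make it executable
with open(sh_fn, 'wt') as f:
    f.write(shell_scripts_string)
os.chmod(sh_fn, 0o755)

# nohup keeps the script running after this Python process exits;
# the script's output is discarded
with open(os.devnull, 'wb') as devnull:
    p = subprocess.Popen(['nohup', os.path.abspath(sh_fn)],
                         stdout=devnull, stderr=devnull)
print(p.pid)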
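
As an alternative to nohup, subprocess can start the child in its own session, which detaches it from the terminal that launched the Python script. The sketch below is not the method used above but a variation on it, and it is POSIX only. The function name launch_detached is hypothetical, and the script is run through /bin/sh so the file does not need the executable bit.

```python
import os
import subprocess
import tempfile


def launch_detached(script_text, directory=None):
    """Write script_text to a file and run it with /bin/sh in a new session.

    Returns (pid, script_path). The child keeps running after this
    Python process exits.
    """
    fd, path = tempfile.mkstemp(suffix=".sh", dir=directory)
    with os.fdopen(fd, "wt") as f:
        f.write(script_text)

    proc = subprocess.Popen(
        ["/bin/sh", path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # discard output, as in the nohup version
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # child calls setsid() before exec
    )
    return proc.pid, path
```

For example, launch_detached("sleep 60\n") returns the pid of the background shell and the path of the temporary script. The script file is left in place on purpose, because deleting it while the shell is still reading it could cut the job short.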