python - youdar/How-to GitHub Wiki
Python things
When working in an environment where you have no root permissions but still need a Python newer than the 2.6.6 installed on many corporate machines, the following method might help.
Problem installing thrift_sasl
sudo yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64
Installing Python on Linux (without admin permissions)
Based on:
https://danieleriksson.net/2017/02/08/how-to-install-latest-python-on-centos
For older CentOS versions, see https://benad.me/blog/2018/07/17/python-3.7-on-centos-6/
$ cd
$ mkdir python
$ export PYTHON_BASE="$HOME/python"
Download the Python source into this new folder and extract it:
$ cd $PYTHON_BASE
$ wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
$ tar xvf Python-3.6.8.tgz
$ cd Python-3.6.8
(If the build fails, you might need to run the following; these commands require root:
sudo yum install yum-utils
sudo yum-builddep python
sudo yum install -y libffi-devel
sudo yum install openssl-devel
You also need to run:
sudo yum install sqlite-devel
in order to be able to use sqlite3.
On my last attempt, the same process did not work for Python 3.7.2.
)
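$ ./configure --with-sqlite3 --enable-unicode=ucs4 --enable-shared --enable-optimizations --with-ensurepip=install --prefix=$PYTHON_BASE/Python-3.6.8 LDFLAGS="-Wl,-rpath=$PYTHON_BASE/Python-3.6.8/lib"
(--enable-unicode=ucs4 is a Python 2 option; Python 3's configure ignores it with an "unrecognized options" warning. The rpath lets the --enable-shared build find its own libpython shared library without setting LD_LIBRARY_PATH.)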
$ make -j 4
$ make altinstall
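The Python build does not fail when an optional module's headers are missing (for example sqlite3 without sqlite-devel, ssl without openssl-devel, ctypes without libffi); it only lists the skipped modules near the end of the make output. A small check script run with the new interpreter can confirm they were built. This is a sketch: the module list and the file name check_build.py are my own choices, not part of the original instructions.

```python
#!/usr/bin/env python3
"""check_build.py (hypothetical name): report optional stdlib modules that failed to build."""
import importlib
import sys

# Modules that depend on the -devel packages listed above (assumed list, extend as needed)
OPTIONAL_MODULES = ('sqlite3', 'ssl', 'ctypes')


def missing_modules(names=OPTIONAL_MODULES):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == '__main__':
    # Show which interpreter is being checked
    print(sys.executable, sys.version.split()[0])
    missing = missing_modules()
    print('missing modules:', ', '.join(missing) if missing else 'none')
```

Run it as $PYTHON_BASE/Python-3.6.8/bin/python3.6 check_build.py; if sqlite3 is reported missing, install sqlite-devel and rebuild.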
Add to ~/.bash_profile (then open a new shell or run: source ~/.bash_profile):
export PYTHON36="$HOME/python/Python-3.6.8/bin"
PATH=$PATH:$PYTHON36
export PATH
Make sure the latest setuptools and pip are installed, then install packages. Because make altinstall only creates versioned binaries (python3.6, pip3.6), call pip through the interpreter:
$PYTHON36/python3.6 -m ensurepip
$PYTHON36/python3.6 -m pip install --upgrade setuptools pip
$PYTHON36/python3.6 -m pip install --upgrade virtualenv wheel
$PYTHON36/python3.6 -m pip install --upgrade pysam matplotlib pandas cython scipy
$PYTHON36/python3.6 -m pip install --upgrade jupyter scikit-learn datarobot seaborn numpy
# Note that impyla will not work with the latest thrift package, so thrift is pinned to 0.9.3
$PYTHON36/python3.6 -m pip install --upgrade vertica_python thrift==0.9.3 impyla bs4 avro
# If you are having issues with bs4, run
$PYTHON36/python3.6 -m pip install --upgrade --force-reinstall bs4
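Since impyla depends on the pinned thrift version, it can help to confirm what actually got installed. A minimal sketch, assuming it is run with the new interpreter: importlib.metadata only exists from Python 3.8, so on 3.6 the function falls back to pkg_resources, which ships with setuptools. The name installed_version is my own.

```python
import sys


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    if sys.version_info >= (3, 8):
        from importlib.metadata import PackageNotFoundError, version
        try:
            return version(dist_name)
        except PackageNotFoundError:
            return None
    # Python 3.6/3.7: fall back to pkg_resources from setuptools
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == '__main__':
    # thrift is expected to be 0.9.3 for impyla (see the note above)
    for name in ('thrift', 'impyla', 'pandas'):
        print(name, installed_version(name))
```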
Using Pandas to look at Hadoop data
#!/usr/bin/env python3.6
from impala.dbapi import connect
import pandas as pd
# Show up to 300 columns when printing wide DataFrames
pd.options.display.max_columns = 300
SQL_CMD = 'select * from db_name.table_name limit 10'
# GSSAPI = Kerberos authentication; run kinit first to obtain a ticket
impyla_conn = connect(host='impalad.host.com', auth_mechanism='GSSAPI')
try:
    # pandas runs the query over the DB-API connection and returns a DataFrame
    df = pd.read_sql_query(SQL_CMD, impyla_conn)
    print(df.head())
finally:
    # Always release the connection, even if the query fails
    impyla_conn.close()
print('Done')
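Besides pd.read_sql_query, the connection can be queried through a cursor directly: impyla's impala.dbapi follows the Python DB-API 2.0 (PEP 249), so the standard cursor calls apply. The sketch below uses the standard-library sqlite3 module as a stand-in so it runs without a cluster; fetch_rows is a hypothetical helper, and placeholder syntax differs between drivers (check the driver module's paramstyle attribute).

```python
import sqlite3


def fetch_rows(conn, sql, params=None):
    """Run `sql` on any DB-API 2.0 connection; return (column_names, rows)."""
    cur = conn.cursor()
    try:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        # cursor.description has one sequence per result column; item 0 is the name
        columns = [desc[0] for desc in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows


if __name__ == '__main__':
    # In-memory sqlite3 database standing in for Impala
    conn = sqlite3.connect(':memory:')
    conn.execute('create table t (id integer, name text)')
    conn.executemany('insert into t values (?, ?)', [(1, 'a'), (2, 'b')])
    columns, rows = fetch_rows(conn, 'select * from t limit 10')
    print(columns, rows)
    conn.close()
```

The result converts to a DataFrame with pd.DataFrame(rows, columns=columns); with impyla, pass the connect(...) object instead of the sqlite3 connection.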
Packaging code
After creating setup.py, use pip install -e path_to_package to install the package in developer (editable) mode. Developer mode installs a link to the source folder, so changes to the code take effect without reinstalling.
Do not run python setup.py; when I did that, PyCharm's autocomplete did not work well.
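For reference, a minimal setup.py that works with pip install -e might look like the sketch below. The package name my_package, the version, and the layout (a my_package/ folder containing an __init__.py next to setup.py) are placeholders, not taken from the original.

```python
# setup.py: minimal sketch; 'my_package' and the version are hypothetical
from setuptools import find_packages, setup

setup(
    name='my_package',
    version='0.1.0',
    # Pick up every folder that contains an __init__.py (e.g. my_package/)
    packages=find_packages(),
)
```

From the folder that contains setup.py, pip install -e . then installs it in developer mode.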
Command line Code Execution
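Execute a command-line script that continues to run after the Python script is done:
import os
import subprocess
# Hypothetical example values: sh_fn is the path of the script to create,
# shell_scripts_string is its contents (start it with a shebang line)
sh_fn = 'my_script.sh'
shell_scripts_string = '#!/bin/bash\nsleep 60\n'
# Create the script file sh_fn
with open(sh_fn, 'wt') as f:
    f.write(shell_scripts_string)
# Make it executable (same as chmod 755)
os.chmod(sh_fn, 0o755)
# nohup keeps the script running after this Python script exits; output is discarded.
# An absolute path is used because nohup looks up bare names on PATH, not in the cwd.
p = subprocess.Popen(['nohup', os.path.abspath(sh_fn)],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
print(p.pid)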
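An alternative to nohup is to start the child in a new session with Popen's start_new_session=True (POSIX only). This calls setsid() in the child, detaching it from the parent's terminal so it does not receive the SIGHUP sent when that terminal closes; subprocess.DEVNULL replaces opening os.devnull by hand. A sketch under those assumptions; launch_detached_script is a hypothetical helper that writes the script to a temporary file.

```python
import os
import subprocess
import tempfile


def launch_detached_script(script_text, script_dir=None):
    """Write `script_text` to a new executable .sh file and start it detached.

    Returns (script_path, Popen). The child runs in its own session with
    stdin/stdout/stderr attached to /dev/null, so it keeps running after
    this Python process exits.
    """
    fd, script_path = tempfile.mkstemp(suffix='.sh', dir=script_dir)
    with os.fdopen(fd, 'w') as f:
        f.write(script_text)
    os.chmod(script_path, 0o755)  # same as chmod 755
    proc = subprocess.Popen(
        [script_path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # setsid() in the child (POSIX only)
    )
    return script_path, proc


if __name__ == '__main__':
    # Hypothetical short background job
    path, proc = launch_detached_script('#!/bin/sh\nsleep 5\n')
    print(path, proc.pid)
```

Calling proc.wait() is only needed if the parent wants the exit status; otherwise the parent can simply exit and the script keeps running.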