HDP, Zeppelin and Python3 - stanislawbartkowski/wikis GitHub Wiki

Problem

HDP Zeppelin does not come with a Python 3 interpreter out of the box. A few steps are necessary to enable it.

https://zeppelin.apache.org/docs/0.6.2/interpreter/python.html

Install Python 3

Python 3

The first step is to install Python 3, which is not included in base CentOS or RHEL. Install it on every node participating in the cluster.

yum install python36

Verify.

python3

Python 3.6.8 (default, Aug  7 2019, 17:28:10) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
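Because the installation has to be repeated on every node, it can help to generate the per-node commands from a host list. The following is a minimal sketch, assuming SSH access with sudo rights on each node. The host names are hypothetical placeholders, and `build_install_commands` is an illustrative helper, not part of HDP.

```python
import shlex

# Hypothetical host names; replace with the nodes of your cluster.
CLUSTER_NODES = ["node1.example.com", "node2.example.com", "node3.example.com"]


def build_install_commands(nodes, package="python36"):
    """Return one ssh command line per node that installs the package with yum."""
    remote = f"sudo yum install -y {shlex.quote(package)}"
    return [f"ssh {shlex.quote(node)} {shlex.quote(remote)}" for node in nodes]


for command in build_install_commands(CLUSTER_NODES):
    # The commands are only printed here; review them before running anything.
    print(command)
```

The commands are printed rather than executed so that the list can be checked first.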

Install ML Python packages

Verify that pip is enabled.

python3 -m ensurepip

WARNING: Running pip install with root privileges is generally not a good idea. Try `__main__.py install --user` instead.
Requirement already satisfied: setuptools in /usr/lib/python3.6/site-packages
Requirement already satisfied: pip in /usr/lib/python3.6/site-packages

Upgrade pip

python3 -m pip install --upgrade pip

WARNING: Running pip install with root privileges is generally not a good idea. Try `__main__.py install --user` instead.
Collecting pip
  Downloading https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl (1.4MB)
    100% |████████████████████████████████| 1.4MB 1.0MB/s 
Installing collected packages: pip
Successfully installed pip-19.2.3

Install packages

python3 -m pip install numpy
python3 -m pip install pandas
python3 -m pip install scikit-learn
python3 -m pip install matplotlib
python3 -m pip install py4j

Verify

python3
Python 3.6.8 (default, Aug  7 2019, 17:28:10) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> import matplotlib
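The interactive check can also be scripted, which is convenient when the same verification has to be repeated on every node. This is a minimal sketch using only the standard library. `missing_modules` is an illustrative helper, and the module list is simply the import names of the packages installed in this section.

```python
import importlib.util

# Import names of the installed packages (scikit-learn is imported as sklearn).
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "matplotlib", "py4j"]


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for large packages.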

Zeppelin

List existing interpreters

/usr/hdp/current/zeppelin-server/bin/install-interpreter.sh -l

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
........
alluxio			Alluxio interpreter
angular			HTML and AngularJS view rendering
beam			Beam interpreter
bigquery			BigQuery interpreter
cassandra			Cassandra interpreter built with Scala 2.11
elasticsearch			Elasticsearch interpreter
file			HDFS file interpreter
flink			Flink interpreter built with Scala 2.11
hbase			Hbase interpreter
ignite			Ignite interpreter built with Scala 2.11
jdbc			Jdbc interpreter
kylin			Kylin interpreter
lens			Lens interpreter
livy			Livy interpreter
md			Markdown support
pig			Pig interpreter
python			Python interpreter
scio			Scio interpreter
shell			Shell command

Enable python interpreter

/usr/hdp/current/zeppelin-server/bin/install-interpreter.sh -n python

(Be patient, this can take several minutes.)

...
Interpreter python installed under /usr/hdp/current/zeppelin-server/interpreter/python.

1. Restart Zeppelin
2. Create interpreter setting in 'Interpreter' menu on Zeppelin GUI
3. Then you can bind the interpreter on your note

Ambari, Zeppelin configuration

Ambari->Zeppelin->Configs

| Parameter | Value | Example |
|---|---|---|
| zeppelin.interpreter.group.order | Add python group at the end | spark,angular,jdbc,livy,md,sh,python |

Restart Zeppelin from Ambari console.
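The new value of zeppelin.interpreter.group.order is the existing comma-separated list with python appended. The following minimal sketch derives it from the current value. `append_group` is a hypothetical helper for illustration only; it does not talk to Ambari, and the result still has to be pasted into the configuration.

```python
def append_group(order, group="python"):
    """Append group to a comma-separated group order unless it is already listed."""
    groups = [g.strip() for g in order.split(",") if g.strip()]
    if group not in groups:
        groups.append(group)
    return ",".join(groups)


print(append_group("spark,angular,jdbc,livy,md,sh"))
# prints: spark,angular,jdbc,livy,md,sh,python
```

Calling the helper on a value that already ends with python returns it unchanged, so applying it twice is harmless.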

Add Python 3 interpreter

Log on to Zeppelin as a user with admin authority and open the Interpreter settings. Create a new interpreter.

| Parameter | Value | Example |
|---|---|---|
| Interpreter name | Name of the interpreter | python3 |
| Interpreter group | python | |
| zeppelin.python | python3 | |
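Since zeppelin.python is set to the bare command python3, that command has to be resolvable for the account that runs Zeppelin. The following is a minimal sketch of that check, to be run on the Zeppelin host as that account. It assumes the command is looked up on PATH, and `resolve_command` is an illustrative helper.

```python
import shutil
import subprocess


def resolve_command(command="python3"):
    """Return the absolute path the command resolves to on PATH, or None."""
    return shutil.which(command)


path = resolve_command("python3")
if path is None:
    print("python3 not found on PATH; consider an absolute path for zeppelin.python")
else:
    # Ask the resolved binary for its version (arguments kept Python 3.6 compatible).
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    print(path, result.stdout.strip())
```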

Test

Create a simple python3 notebook.

from platform import python_version
print(python_version())

3.6.8
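The version string alone does not show which binary the notebook is running. A minimal sketch that can be pasted into the same notebook to print the interpreter path as well. `is_python3` is an illustrative helper.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3."""
    return version_info[0] == 3


# Path of the Python binary behind this notebook paragraph.
print(sys.executable)
# Whether that binary is a Python 3 interpreter.
print(is_python3())
```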

Another simple plot test. https://matplotlib.org/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.xlabel('x label')
plt.ylabel('y label')

plt.title("Simple Plot")

plt.legend()

plt.show()