Python and Version management - cchantra/bigdata.github.io GitHub Wiki

pyenv allows you to install and switch between multiple Python versions easily without interfering with the system interpreter.

Install the system packages needed to build Python from source (these commands are for Debian/Ubuntu):

sudo apt update
sudo apt install build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git

Install pyenv by following the installation instructions on the pyenv GitHub page. A common way is the automatic installer:

curl -fsSL https://pyenv.run | bash
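The installer only downloads pyenv into ~/.pyenv. Your shell still has to put it on PATH and load its shims before pyenv install works. A minimal sketch, assuming bash and the default ~/.pyenv location: it appends the same initialisation lines that the start script later on this page uses to an rc file, and skips the step if they are already there. The function name add_pyenv_init and the marker comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append pyenv's shell initialisation to an rc file once.
# The init lines mirror the ones in start-pyspark-jupyter.sh.
add_pyenv_init() {
  local rc="${1:-$HOME/.bashrc}"
  local marker="# >>> pyenv init >>>"

  # Do nothing if a previous run already added the block.
  if [ -f "$rc" ] && grep -qF "$marker" "$rc"; then
    return 0
  fi

  # Quoted heredoc delimiter: $HOME, $PATH and $(...) are written literally
  # and only expanded later, when the rc file is sourced.
  cat >> "$rc" <<'EOF'
# >>> pyenv init >>>
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
# <<< pyenv init <<<
EOF
}
```

Run add_pyenv_init ~/.bashrc once, then open a new shell (or source ~/.bashrc) so the pyenv command is available.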

Then use pyenv to install Python 3.8:

pyenv install 3.8

Set Python 3.8 as the default, either for your user or for a specific project. To make it the global default for your user:

pyenv global 3.8

Or set it only for a specific project directory:

cd my_project_folder
pyenv local 3.8
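Both commands only write a version string to a file: pyenv global writes $PYENV_ROOT/version, and pyenv local writes .python-version in the current directory. The start script later uses pyenv shell, which sets the PYENV_VERSION variable instead. To show how pyenv chooses between these, here is a simplified sketch of its documented lookup order (PYENV_VERSION, then .python-version in the current directory or a parent, then the global file, then system). resolve_pyenv_version is a hypothetical name and this is not pyenv's own code; pyenv version gives the authoritative answer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: a simplified model of pyenv's version lookup.
# Not pyenv's implementation; run `pyenv version` for the real result.
resolve_pyenv_version() {
  local dir root
  dir=$(cd "${1:-$PWD}" && pwd) || return 1   # absolute path, so the loop ends at /
  root="${PYENV_ROOT:-$HOME/.pyenv}"

  # 1. `pyenv shell <ver>` sets PYENV_VERSION, which overrides any file.
  if [ -n "${PYENV_VERSION:-}" ]; then
    echo "$PYENV_VERSION"
    return 0
  fi

  # 2. `pyenv local <ver>` writes .python-version; check this dir, then parents.
  while :; do
    if [ -f "$dir/.python-version" ]; then
      head -n 1 "$dir/.python-version"
      return 0
    fi
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done

  # 3. `pyenv global <ver>` writes $PYENV_ROOT/version.
  if [ -f "$root/version" ]; then
    head -n 1 "$root/version"
    return 0
  fi

  # 4. Nothing configured: pyenv falls back to the system Python.
  echo system
}
```

For example, after pyenv local 3.8 in my_project_folder, resolve_pyenv_version run in that folder or any subfolder prints 3.8, unless PYENV_VERSION is set.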

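With Python 3.8 active, install the Jupyter packages into that interpreter (the start script below runs JupyterLab from it):

pip install ipykernel jupyterlab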

Create a start script in /home/hadoop (the path the service file below expects) that launches JupyterLab as the PySpark driver using this environment:

vi ./start-pyspark-jupyter.sh
#!/bin/bash

export HOME=/home/hadoop

# pyenv
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
pyenv shell 3.8

# Java & Spark (IMPORTANT)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_HOME=/home/hadoop/spark
export PATH=$SPARK_HOME/bin:$JAVA_HOME/bin:$PATH

# Python for Spark
export PYSPARK_PYTHON=$(pyenv which python)

# JupyterLab
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="lab \
  --no-browser \
  --ip=0.0.0.0 \
  --port=8888 \
  --ServerApp.token='' \
  --ServerApp.password=''"

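# Start (absolute jar path, so this also works when systemd launches it from another directory)
GRAPHFRAMES_JAR=$SPARK_HOME/jars/graphframes-0.8.2-spark3.1-s_2.12.jar
# exec replaces this shell, so systemd manages the pyspark process directly
exec "$SPARK_HOME/bin/pyspark" \
  --py-files "$GRAPHFRAMES_JAR" \
  --jars "$GRAPHFRAMES_JAR"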
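When the script is started by systemd there is no terminal to show a wrong path, so an existence check near the top of the script can make such failures obvious in the error log. A minimal sketch; require_paths is a hypothetical helper, and the commented example call reuses the paths from the script above.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: report every path that does not exist
# (on stderr) and return non-zero if any are missing.
require_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use after the exports in start-pyspark-jupyter.sh:
# require_paths "$JAVA_HOME" "$SPARK_HOME/bin/pyspark" \
#   "$SPARK_HOME/jars/graphframes-0.8.2-spark3.1-s_2.12.jar" || exit 1
```

Because the service sends stderr to /var/log/pyspark-jupyter-error.log, each missing path would show up there as its own line.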

Make the script executable:

chmod +x ./start-pyspark-jupyter.sh

Then run it directly to test it:

./start-pyspark-jupyter.sh
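The script runs in the foreground, so check it from a second terminal. A minimal sketch, assuming bash (it uses bash's /dev/tcp redirection, which plain sh does not have) and port 8888 as set in the script; wait_for_port is a hypothetical helper that polls once per second until something accepts TCP connections on the port.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll HOST:PORT until it accepts a TCP connection.
# Uses bash's /dev/tcp pseudo-device, so run it with bash, not sh.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connect;
    # the subshell exits right away, closing the connection again.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_port 127.0.0.1 8888 60 && echo "JupyterLab is listening". An open port only shows that something is listening; opening http://<server-ip>:8888 in a browser confirms it is JupyterLab.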

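If JupyterLab starts correctly, stop the script (Ctrl+C) and create a systemd service file so it runs in the background and starts at boot:

sudo vi /etc/systemd/system/Jupyter-3.8.service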
[Unit]
Description=PySpark Jupyter Notebook Service
After=network.target

[Service]
Type=simple
User=hadoop
WorkingDirectory=/home/hadoop
ExecStart=/home/hadoop/start-pyspark-jupyter.sh
Restart=always
RestartSec=10

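# Environment (optional: the start script exports these itself, and its values take precedence)
# Quote the whole assignment, otherwise systemd splits the value at spaces
Environment=PYSPARK_DRIVER_PYTHON=jupyter
Environment="PYSPARK_DRIVER_PYTHON_OPTS=lab --no-browser --ip=0.0.0.0 --port=8888"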

# Logging
StandardOutput=append:/var/log/pyspark-jupyter.log
StandardError=append:/var/log/pyspark-jupyter-error.log

[Install]
WantedBy=multi-user.target

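Then reload systemd so it picks up the new unit, enable the service at boot, and start it:

sudo systemctl daemon-reload
sudo systemctl enable Jupyter-3.8.service
sudo systemctl start Jupyter-3.8.service

Check that it is running:

sudo systemctl status Jupyter-3.8.service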
<img width="1435" height="428" alt="Screen Shot 2569-04-08 at 09 37 19" src="https://github.com/user-attachments/assets/6c56b612-ae01-4afe-ad3a-171961d98f2d" />