Local SparkSession - pathfinder-analytics-uk/dab_project GitHub Wiki
Links and Resources
Project Files
scratch/spark.py
# Prefer Databricks Connect when it is installed.
try:
    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()
    print("Using DatabricksSession.")
except ImportError:
    # Fall back to a local PySpark session.
    try:
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()
        print("Using SparkSession.")
    except ImportError:
        raise ImportError("Neither DatabricksSession nor SparkSession could be imported.")
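The fallback order used in scratch/spark.py can also be checked without starting a session. The following is a minimal sketch and not part of the project files: `module_available` and `pick_backend` are hypothetical helper names, and the code only reports which of the two packages can be found in the active environment.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "databricks") is not installed.
        return False


def pick_backend() -> str:
    """Mirror the try/except order of scratch/spark.py (hypothetical helper)."""
    if module_available("databricks.connect"):
        return "DatabricksSession"
    if module_available("pyspark.sql"):
        return "SparkSession"
    return "none"


print("Backend that would be selected:", pick_backend())
```

Running this inside .venv_pyspark should report SparkSession, assuming databricks-connect is not installed there.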
requirements-pyspark.txt
pyspark==3.5.0
py4j==0.10.9.7
pandas==1.5.3
pyarrow==14.0.1
numpy==1.23.5
grpcio==1.60.0
grpcio-status==1.60.0
googleapis-common-protos==1.63.2
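After installation, the pinned versions can be compared with what is actually installed. This is a rough sketch using the standard library's `importlib.metadata`; `parse_pins` and `check_pins` are hypothetical helper names, and the parser assumes every line has the simple `name==version` form used above.

```python
from importlib import metadata


def parse_pins(text: str) -> dict[str, str]:
    """Parse 'name==version' lines into a {name: version} mapping."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Map each package to 'ok', 'missing', or the differing installed version."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else installed
    return report


# Two of the pins from requirements-pyspark.txt, inlined for illustration.
sample = "pyspark==3.5.0\npy4j==0.10.9.7\n"
for package, status in check_pins(parse_pins(sample)).items():
    print(f"{package}: {status}")
```

To check the whole file, the contents of requirements-pyspark.txt can be read and passed to `parse_pins` in place of the inlined sample.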
Commands
Deactivating the current Virtual Environment
deactivate
Creating a Virtual Environment
The commands below create a virtual environment named .venv_pyspark using Python 3.11.
macOS/Linux
python3.11 -m venv .venv_pyspark
Windows
py -3.11 -m venv .venv_pyspark
Activating a Virtual Environment
The commands below activate the virtual environment.
macOS/Linux
source .venv_pyspark/bin/activate
Windows
.venv_pyspark\Scripts\activate
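To confirm that activation worked, the interpreter can be asked whether it is running inside a virtual environment. This is a small sketch, assuming an environment created with `venv` as above; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```

With .venv_pyspark active, the printed prefix should end in .venv_pyspark.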
Installing Python Dependencies
pip install -r requirements-pyspark.txt