Local SparkSession - pathfinder-analytics-uk/dab_project GitHub Wiki

Project Files

scratch/spark.py

# Prefer Databricks Connect when it is installed; otherwise fall back to local PySpark.
try:
    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()
    print("Using DatabricksSession.")
except ImportError:
    try:
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()
        print("Using SparkSession.")
    except ImportError as exc:
        # Neither package is installed in the active environment.
        raise ImportError("Neither DatabricksSession nor SparkSession could be imported.") from exc

requirements-pyspark.txt

pyspark==3.5.0
py4j==0.10.9.7
pandas==1.5.3
pyarrow==14.0.1
numpy==1.23.5
grpcio==1.60.0
grpcio-status==1.60.0
googleapis-common-protos==1.63.2
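
Because every dependency is pinned to an exact version, it can be useful to confirm that the active environment matches the pins. The following is a sketch only, assuming each line has the form `name==version`. The function names `parse_pins` and `find_mismatches` are illustrative and not part of the project.

```python
from __future__ import annotations

from importlib import metadata


def parse_pins(text: str) -> dict[str, str]:
    """Parse 'name==version' lines, skipping blank lines and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned, installed)} for every pin that is not satisfied."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Two of the pins from requirements-pyspark.txt, used here as sample input.
SAMPLE = """\
pyspark==3.5.0
py4j==0.10.9.7
"""
print(parse_pins(SAMPLE))
```

To check the real file, pass its contents to `parse_pins` and then call `find_mismatches` on the result. An empty dictionary means every pin is satisfied.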

Commands

Deactivating the current Virtual Environment

deactivate

Creating a Virtual Environment

The commands below create a virtual environment named .venv_pyspark using Python 3.11.

MacOS/Linux

python3.11 -m venv .venv_pyspark

Windows

py -3.11 -m venv .venv_pyspark
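
The same environment can also be created from Python with the standard library `venv` module, which avoids platform-specific launchers. This is a sketch under one assumption: the environment is built from whichever interpreter runs the script, so the script needs to be run with Python 3.11 to match the commands above. The function name `create_env` is hypothetical.

```python
import venv
from pathlib import Path


def create_env(env_dir: str = ".venv_pyspark", with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir using the running interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)
```

Calling `create_env()` from the project root should produce a .venv_pyspark directory like the one the shell commands create.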

Activating a Virtual Environment

The commands below activate the virtual environment.

MacOS/Linux

source .venv_pyspark/bin/activate

Windows

.venv_pyspark\Scripts\activate
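
The two commands differ only in where the activation script lives: `bin/` on MacOS/Linux and `Scripts\` on Windows. As a small illustration, the helper below builds the command string for either platform. The name `activation_command` and the platform labels `"posix"` and `"windows"` are assumptions made for this sketch.

```python
from pathlib import PurePosixPath, PureWindowsPath


def activation_command(env_dir: str, platform: str) -> str:
    """Return the shell command that activates env_dir.

    platform is "windows" or "posix" (MacOS/Linux); both labels are illustrative.
    """
    if platform == "windows":
        return str(PureWindowsPath(env_dir) / "Scripts" / "activate")
    return f"source {PurePosixPath(env_dir) / 'bin' / 'activate'}"


print(activation_command(".venv_pyspark", "posix"))
print(activation_command(".venv_pyspark", "windows"))
```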

Installing Python Dependencies

pip install -r requirements-pyspark.txt
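
pip installs into whichever environment is active, so it is worth confirming that .venv_pyspark is active before installing. A minimal check, assuming the environment was created with `venv` as above (the function name `in_virtualenv` is hypothetical):

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


print(f"Virtual environment active: {in_virtualenv()}")
print(f"Interpreter: {sys.executable}")
```

When the environment is active, the interpreter path should point inside the .venv_pyspark directory.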