Spark

Ploosh can be executed on Spark (in Databricks, Microsoft Fabric, or locally) by using the Spark connectors and calling it from Python code.
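
In every environment the integration point is the same: import execute_cases from the ploosh package and pass it the test cases, the connections definition and an active Spark session. A minimal sketch of that call pattern (paths are placeholders):

from pyspark.sql import SparkSession
from ploosh import execute_cases

spark = SparkSession.builder.appName("Ploosh").getOrCreate()  # Databricks and Fabric notebooks already provide "spark"

execute_cases(
    cases="path/to/cases",                    # folder (or file) containing the YAML test cases
    connections="path/to/connections.yaml",   # connections definition
    spark_session=spark,                      # the Spark session to run the tests on
)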

Examples

Microsoft Fabric

Cell 1 : Install the Ploosh package from the PyPI package manager

%pip install ploosh

Cell 2 : Mount the lakehouse to access the case and connection files

mount_point = "/ploosh_config"
workspace_name = "ploosh"
lakehouse_name = "data"

if mssparkutils.fs.mount(f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/", mount_point):
    ploosh_config_path = mssparkutils.fs.getMountPath(mount_point)

Cell 3 : Execute the Ploosh framework

from ploosh import execute_cases

connections_file_path = f"{ploosh_config_path}/Files/connections.yaml"
cases_folder_path = f"{ploosh_config_path}/Files/cases"

execute_cases(cases=cases_folder_path, connections=connections_file_path, spark_session=spark)
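
The Databricks example below also uses the path_output parameter to write the test results to a folder. Assuming the mounted lakehouse path can be used the same way here, the results could be written back to the lakehouse Files area:

execute_cases(
    cases=cases_folder_path,
    connections=connections_file_path,
    path_output=f"{ploosh_config_path}/Files/output",  # assumed output folder inside the lakehouse
    spark_session=spark,
)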

Databricks

Cell 1 : Install the Ploosh package from the PyPI package manager

%pip install ploosh

Cell 2 : Restart Python to make the package available

dbutils.library.restartPython()

Cell 3 : Execute the Ploosh framework

from ploosh import execute_cases

root_folder = "/Workspace/Shared"

execute_cases(cases=f"{root_folder}/cases", path_output=f"{root_folder}/output", spark_session=spark)
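
The cell above does not pass a connections file; if one is kept in the workspace, it can be provided through the connections parameter exactly as in the Fabric and local examples (the file location below is an assumption):

execute_cases(
    cases=f"{root_folder}/cases",
    connections=f"{root_folder}/connections.yaml",  # assumed location of the connections file
    path_output=f"{root_folder}/output",
    spark_session=spark,
)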

Local

Step 1 : Install the Ploosh package from the PyPI package manager

pip install ploosh

Step 2 : Initialize the Spark session

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Ploosh").getOrCreate()

Step 3 : Execute the Ploosh framework

from ploosh import execute_cases

execute_cases(cases="test_cases", connections="connections.yml", spark_session=spark)
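
When the same three steps run as a standalone Python script rather than a notebook, the Spark session can be stopped once the cases have been executed:

from pyspark.sql import SparkSession
from ploosh import execute_cases

spark = SparkSession.builder.appName("Ploosh").getOrCreate()

execute_cases(cases="test_cases", connections="connections.yml", spark_session=spark)

spark.stop()  # release local Spark resources when the run is finished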