# Spark
Ploosh can be executed on Spark (in Databricks, Microsoft Fabric, or locally) using the Spark connectors, by calling it from Python code.
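Whichever environment you use, the pattern is the same: install the package, import `execute_cases`, and pass it the test cases, the connections file, and an active Spark session. A minimal sketch (the paths below are placeholders, see the full examples that follow):

```python
from ploosh import execute_cases

# Placeholder paths, adapt them to your environment (see the examples below)
execute_cases(
    cases="cases",                   # folder containing the YAML test cases
    connections="connections.yaml",  # connections definition file
    spark_session=spark,             # an active SparkSession
)
```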
## Examples
### Microsoft Fabric
Cell 1: Install the Ploosh package from the PyPI package manager
```
%pip install ploosh
```
Cell 2: Mount the lakehouse to access the case and connection files
```python
mount_point = "/ploosh_config"
workspace_name = "ploosh"
lakehouse_name = "data"

if mssparkutils.fs.mount(f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/", mount_point):
    ploosh_config_path = mssparkutils.fs.getMountPath(mountPoint=mount_point)
```
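Since `getMountPath` returns a local filesystem path, a quick way to confirm the mount worked is to list the mounted files before running the tests (a simple sanity check, assuming the case and connection files live under `Files`):

```python
import os

# Sanity check: the lakehouse Files folder should now be visible through the mount
print(os.listdir(f"{ploosh_config_path}/Files"))
```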
Cell 3: Execute the Ploosh framework
```python
from ploosh import execute_cases

connections_file_path = f"{ploosh_config_path}/Files/connections.yaml"
cases_folder_path = f"{ploosh_config_path}/Files/cases"

execute_cases(cases=cases_folder_path, connections=connections_file_path, spark_session=spark)
```
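If you also want Ploosh to write its result files back to the lakehouse, the `path_output` parameter (used in the Databricks example below) can point to a folder under the mounted Files area. A possible variant, assuming an `output` folder exists in the lakehouse:

```python
execute_cases(
    cases=cases_folder_path,
    connections=connections_file_path,
    path_output=f"{ploosh_config_path}/Files/output",  # assumed output folder in the lakehouse
    spark_session=spark,
)
```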
### Databricks
Cell 1: Install the Ploosh package from the PyPI package manager
```
%pip install ploosh
```
Cell 2: Restart Python to make the package available
```python
dbutils.library.restartPython()
```
Cell 3: Execute the Ploosh framework
```python
from ploosh import execute_cases

root_folder = "/Workspace/Shared"

execute_cases(cases=f"{root_folder}/cases", path_output=f"{root_folder}/output", spark_session=spark)
```
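If your cases rely on external connections, the `connections` parameter works the same way as in the other environments; for instance, assuming a `connections.yaml` file stored next to the cases in the workspace:

```python
execute_cases(
    cases=f"{root_folder}/cases",
    connections=f"{root_folder}/connections.yaml",  # assumed location of the connections file
    path_output=f"{root_folder}/output",
    spark_session=spark,
)
```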
### Local
Step 1: Install the Ploosh package from the PyPI package manager
```
pip install ploosh
```
Step 2: Initialize the Spark session
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Ploosh").getOrCreate()
```
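If the defaults do not fit your setup, the builder can be configured explicitly; for instance, pinning the session to local mode (the `local[*]` master and the app name are only examples):

```python
from pyspark.sql import SparkSession

# Run Spark locally on all available cores (adjust the master URL as needed)
spark = (
    SparkSession.builder
    .appName("Ploosh")
    .master("local[*]")
    .getOrCreate()
)
```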
Step 3: Execute the Ploosh framework
```python
from ploosh import execute_cases

execute_cases(cases="test_cases", connections="connections.yml", spark_session=spark)
```