Home - CSharplie/ploosh GitHub Wiki
What is Ploosh?
Ploosh is a YAML-based framework used to automate the testing process in data projects. It is designed to be simple to use, easy to integrate into any CI/CD pipeline, and easy to extend with new data connectors.
Connectors
| Type | Native connectors | Spark connectors |
|---|---|---|
| Databases | | |
| Files | | |
| Others | | |
| Not yet but soon | | |

*(The connector logos shown in the original table could not be recovered.)*
Get started
Steps
- Install Ploosh package
- Setup connection file
- Setup test cases
- Run tests
- Get results
Install Ploosh package
Install from PyPi package manager:
pip install ploosh
Setup connection file
Add a YAML file named "connections.yml" with the following content:

mysql_demo:
  type: mysql
  hostname: my_server_name.database.windows.net
  database: my_database_name
  username: my_user_name
  # using a parameter is highly recommended
  password: $var.my_sql_server_password
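The `$var.` syntax keeps secrets out of the connection file; their values are supplied at run time (here via the `--p_my_sql_server_password` flag). As a rough sketch of how such substitution could work — this is an illustration, not Ploosh's actual implementation:

```python
# Illustrative sketch of "$var." parameter substitution -- NOT Ploosh's real code.
def resolve_params(config: dict, params: dict) -> dict:
    """Replace string values of the form "$var.<name>" with params["<name>"]."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested sections such as a connection definition.
            resolved[key] = resolve_params(value, params)
        elif isinstance(value, str) and value.startswith("$var."):
            resolved[key] = params[value[len("$var."):]]
        else:
            resolved[key] = value
    return resolved

connections = {
    "mysql_demo": {
        "type": "mysql",
        "password": "$var.my_sql_server_password",
    }
}
print(resolve_params(connections, {"my_sql_server_password": "mypassword"}))
```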
Setup test cases
Add a folder named "test_cases" containing a YAML file with any name — in this example, "example.yaml" — with the following content:

Test aggregated data:
  options:
    sort:
      - gender
      - domain
  source:
    connection: mysql_demo
    type: mysql
    query: |
      select gender, right(email, length(email) - position("@" in email)) as domain, count(*) as count
      from users
      group by gender, domain
  expected:
    type: csv
    path: ./data/test_target_agg.csv

Test invalid data:
  source:
    connection: mysql_demo
    type: mysql
    query: |
      select id, first_name, last_name, email, gender, ip_address
      from users
      where email like "%%.gov"
  expected:
    type: empty
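Conceptually, each test case fetches the source dataset, fetches or builds the expected dataset (a CSV file, an empty set, and so on), and compares the two after applying any `sort` options. A minimal sketch of that comparison step — assumed behavior for illustration only, Ploosh's actual comparison logic is richer:

```python
# Minimal sketch of a sorted, row-by-row dataset comparison -- illustration only.
def compare_datasets(source, expected, sort_keys):
    """Return (state, error) after sorting both datasets on sort_keys."""
    key = lambda row: tuple(row[k] for k in sort_keys)
    source, expected = sorted(source, key=key), sorted(expected, key=key)
    if len(source) != len(expected):
        return "failed", (
            f"The count in source dataset ({len(source)}) is different "
            f"than the count in the expected dataset ({len(expected)})")
    for src_row, exp_row in zip(source, expected):
        if src_row != exp_row:
            return "failed", f"Row mismatch: {src_row} != {exp_row}"
    return "passed", None

# An "empty" expectation is just an empty expected dataset:
source = [{"gender": "F", "domain": "example.com", "count": 3}]
print(compare_datasets(source, [], ["gender", "domain"]))
```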
Run tests
ploosh --connections "connections.yml" --cases "test_cases" --export "JSON" --p_my_sql_server_password "mypassword"

Test results
[
  {
    "name": "Test aggregated data",
    "state": "passed",
    "source": {
      "start": "2024-02-05T17:08:36Z",
      "end": "2024-02-05T17:08:36Z",
      "duration": 0.0032982
    },
    "expected": {
      "start": "2024-02-05T17:08:36Z",
      "end": "2024-02-05T17:08:36Z",
      "duration": 6.0933333333333335e-05
    },
    "compare": {
      "start": "2024-02-05T17:08:36Z",
      "end": "2024-02-05T17:08:36Z",
      "duration": 0.00046468333333333334
    }
  },
  {
    "name": "Test invalid data",
    "state": "failed",
    "source": {
      "start": "2024-02-05T17:08:36Z",
      "end": "2024-02-05T17:08:36Z",
      "duration": 0.00178865
    },
    "expected": {
      "start": "2024-02-05T17:08:36Z",
      "end": "2024-02-05T17:08:36Z",
      "duration": 1.49e-05
    },
    "compare": {
      "start": "2024-02-05T17:08:36Z",
      "end": "2024-02-05T17:08:36Z",
      "duration": 1.8333333333333333e-07
    },
    "error": {
      "type": "count",
      "message": "The count in source dataset (55) is different than the count in the expected dataset (0)"
    }
  }
]
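In a CI/CD pipeline, the exported JSON can be inspected to fail the build when any case did not pass. A small example working from the result shape shown above (in a real pipeline you would `json.load()` the file written by `--export "JSON"`; its exact name and location may vary by setup):

```python
import json

# Sample results in the shape shown above, trimmed to the fields we need.
results = json.loads("""[
  {"name": "Test aggregated data", "state": "passed"},
  {"name": "Test invalid data", "state": "failed"}
]""")

def failed_cases(results):
    """Return the names of test cases whose state is not 'passed'."""
    return [case["name"] for case in results if case["state"] != "passed"]

failures = failed_cases(results)
print(failures)
```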
Run with spark
It's possible to run the tests with Spark. To do so, install the Spark package or use a platform that already provides it, such as Databricks or Microsoft Fabric.
See the Spark connector page for more information.