Quality Assurance process - d-fine/Dataland GitHub Wiki
In Dataland, external contributors can aid in quality assurance by uploading so-called Quality Assurance Reports (QA reports) for individual data sets. Based on these reports, data uploaders or Dataland itself can improve the quality of their data sets. QA reports are visible to any Dataland user and will be displayed in the front end next to the data set itself. Generally, the company association of the provider of a QA report can also be made visible.
Uploading quality assurance reports requires `REVIEWER` rights for the API. If you are interested in aiding in quality assurance for Dataland and receiving `REVIEWER` rights, please get in contact with Erik Breen via [email protected].
The creation and upload of a quality assurance report is described in the following:
Downloading a data set to perform QA
Quality assurance can be performed for data sets in `QaStatus` `ACCEPTED` or `PENDING`.
- Datasets for which quality assurance is outstanding can be identified by using the `GET /datasets` endpoint of the Dataland QA API. `Reviewer` rights are required to access these.
- An overview of datasets that are already accepted can be accessed through the `GET /metadata` endpoint of the Dataset API.
- For datasets in both `PENDING` and `ACCEPTED` status, the `GET /datasets/{dataId}` endpoint can be used to identify the `DataTypeEnum` of the dataset, which can e.g. be `sfdr`.
- A data set can be retrieved with its framework's corresponding `GET` endpoint, e.g. `GET /data/sfdr/{dataId}` for an SFDR data set.
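The lookup steps above could be sketched as follows. This is a minimal sketch under stated assumptions, not an official client: the base URL `https://dataland.com/qa` is inferred from the Swagger links on this page, `https://dataland.com/api` is a guess, the bearer-token header is an assumed authentication scheme, and all helper names are hypothetical. It also assumes that `GET /datasets/{dataId}` lives on the QA API. The requests are only constructed here, not sent.

```python
import urllib.request

# Assumed base URLs; verify both against the Dataland Swagger UI.
QA_API_BASE = "https://dataland.com/qa"
DATASET_API_BASE = "https://dataland.com/api"


def build_get(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        url=base_url + path,
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )


def pending_datasets_request(token: str) -> urllib.request.Request:
    # GET /datasets on the QA API: datasets with outstanding quality assurance.
    return build_get(QA_API_BASE, "/datasets", token)


def dataset_info_request(data_id: str, token: str) -> urllib.request.Request:
    # GET /datasets/{dataId}: used to find the DataTypeEnum, e.g. "sfdr".
    # Assumed here to be a QA API endpoint.
    return build_get(QA_API_BASE, f"/datasets/{data_id}", token)


def framework_data_request(data_type: str, data_id: str, token: str) -> urllib.request.Request:
    # GET /data/{framework}/{dataId}: retrieves the data set itself.
    return build_get(DATASET_API_BASE, f"/data/{data_type}/{data_id}", token)
```

A request built this way can be sent with `urllib.request.urlopen(request)` and the response body parsed with `json.load`.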
Downloading referenced documents
In a data set, many data points will be accompanied by a data source. Consider this sample extract of an SFDR data set:
```json
{
  "companyId": "7475cfd8-4715-4495-bfcf-ae1bdaf92466",
  "reportingPeriod": "2023",
  "data": {
    "general": {},
    "environmental": {
      "greenhouseGasEmissions": {
        "scope1GhgEmissionsInTonnes": {
          "value": 1100,
          "quality": "Reported",
          "comment": "The company's greenhouse gas emissions for Scope 1 is 1,100 tons of CO2e in 2023.",
          "dataSource": {
            "page": 60,
            "tagName": null,
            "fileName": "YearlyReport.pdf",
            "fileReference": "dfc3d090d4d1265f1b7dc41f52fdadf7b249af9fd852079936c4c830f4b91200"
          }
        }
      }
    },
    "social": {}
  }
}
```
The value of 1,100 tons of CO2e Scope 1 greenhouse gas emissions should be verifiable against page 60 of the document with reference `dfc3d090d4d1265f1b7dc41f52fdadf7b249af9fd852079936c4c830f4b91200`.
With this reference, the document in question can be downloaded from the `GET /{documentId}` endpoint of the Document API.
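To see which documents and pages need checking, a reviewer could walk the downloaded data set and collect every `dataSource` entry. The following is a minimal sketch; `iter_data_sources` is a hypothetical helper, and it assumes that every data point stores its source under the key `dataSource` with a `fileReference` and a `page`, as in the extract above.

```python
from typing import Any, Iterator


def iter_data_sources(node: Any, path: tuple[str, ...] = ()) -> Iterator[tuple[str, dict]]:
    """Yield (dotted field path, dataSource dict) for each data point citing a document."""
    if isinstance(node, dict):
        source = node.get("dataSource")
        if isinstance(source, dict) and source.get("fileReference"):
            yield ".".join(path), source
        for key, value in node.items():
            if key != "dataSource":
                yield from iter_data_sources(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_data_sources(item, path + (str(index),))


# Shortened version of the SFDR extract shown above.
dataset = {
    "data": {
        "general": {},
        "environmental": {
            "greenhouseGasEmissions": {
                "scope1GhgEmissionsInTonnes": {
                    "value": 1100,
                    "dataSource": {
                        "page": 60,
                        "fileName": "YearlyReport.pdf",
                        "fileReference": "dfc3d090d4d1265f1b7dc41f52fdadf7b249af9fd852079936c4c830f4b91200",
                    },
                }
            }
        },
    }
}

for field, source in iter_data_sources(dataset["data"]):
    # Each fileReference can then be passed to GET /{documentId} of the Document API.
    print(field, source["fileReference"], source["page"])
```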
Creating a QA Report
For each framework, a corresponding QA report exists, along with a corresponding endpoint through which QA reports for that framework can be uploaded. Examples are:
- SFDR: https://dataland.com/qa/swagger-ui/index.html#/sfdr-data-qa-report-controller/postSfdrDataQaReport
- EU Taxonomy non-financials: https://dataland.com/qa/swagger-ui/index.html#/eutaxonomy-non-financials-data-qa-report-controller/postEutaxonomyNonFinancialsDataQaReport
Generally, the QA report data model mimics the framework data model but has a `comment`, `verdict`, and `correctedData` field for each data point. The `correctedData` field should still contain all values of the data point that were correct, with incorrect values corrected. For instance, in the example above, if the value of 1100 was correct but the information is found on page 62 instead of page 60, the QA report should look as follows:
```json
{
  "companyId": "7475cfd8-4715-4495-bfcf-ae1bdaf92466",
  "reportingPeriod": "2023",
  "data": {
    "general": {},
    "environmental": {
      "greenhouseGasEmissions": {
        "scope1GhgEmissionsInTonnes": {
          "comment": "The stated page is incorrect",
          "verdict": "QaRejected",
          "correctedData": {
            "value": 1100,
            "quality": "Reported",
            "comment": "The company's greenhouse gas emissions for Scope 1 is 1,100 tons of CO2e in 2023.",
            "dataSource": {
              "page": 62,
              "tagName": null,
              "fileName": "YearlyReport.pdf",
              "fileReference": "dfc3d090d4d1265f1b7dc41f52fdadf7b249af9fd852079936c4c830f4b91200"
            }
          }
        }
      }
    },
    "social": {}
  }
}
```
If the quality assurer can't correct any values, the `correctedData` field should be left empty.
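Building the QA entry for a single data point could look roughly like this. It is a sketch only: `deep_merge` and `make_qa_data_point` are hypothetical helpers, and representing an "empty" `correctedData` as an empty object is an assumption (the API may expect `null` or an omitted field instead).

```python
import copy
from typing import Optional


def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively overwrite values in `base` with those from `overrides`."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base


def make_qa_data_point(original: dict, verdict: str, comment: str,
                       corrections: Optional[dict] = None) -> dict:
    """Wrap one data point in the comment / verdict / correctedData structure."""
    if corrections is None:
        corrected = {}  # nothing could be corrected (assumed representation of "empty")
    else:
        # Keep all correct values and overwrite only the incorrect ones.
        corrected = deep_merge(copy.deepcopy(original), corrections)
    return {"comment": comment, "verdict": verdict, "correctedData": corrected}


original_point = {
    "value": 1100,
    "quality": "Reported",
    "dataSource": {"page": 60, "fileName": "YearlyReport.pdf"},
}
qa_point = make_qa_data_point(
    original_point,
    verdict="QaRejected",
    comment="The stated page is incorrect",
    corrections={"dataSource": {"page": 62}},
)
```

Copying the original before merging keeps the downloaded data set unchanged, so it can still be compared against the report afterwards.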
QA Report Norms
When creating a QA report, please follow these norms:
QA Verdict
The `verdict` field can be filled with the following values:

| Verdict value | Meaning |
|---|---|
| `QaAccepted` | Some or all of the fields of the data point were validated and no data quality issue was found. |
| `QaInconclusive` | For some of the fields for which QA has been attempted, no clear verdict could be made. More details should be given in the `comment` field. |
| `QaRejected` | Some or all of the fields of the data point were validated and a data quality issue was found for any of the fields. If possible, the corrected value should be provided in the `correctedData` field. If this is not possible, the `comment` field should state why the data point was rejected. |
| `QaNotAttempted` | None of the fields of the data point were validated. |
This implies that a data point can be marked as `QaAccepted` even if not all fields were reviewed, e.g. if the quality assurer does not check the `quality` field or the `page` field.
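Before uploading, the verdict norms could be checked locally along these lines. This is a sketch of one possible reading of the norms, not a validation that Dataland performs; `QaVerdict` and `check_norms` are hypothetical names.

```python
from enum import Enum


class QaVerdict(str, Enum):
    ACCEPTED = "QaAccepted"
    INCONCLUSIVE = "QaInconclusive"
    REJECTED = "QaRejected"
    NOT_ATTEMPTED = "QaNotAttempted"


def check_norms(qa_point: dict) -> list[str]:
    """Return a list of norm violations for one QA data point (empty if none)."""
    try:
        verdict = QaVerdict(qa_point.get("verdict"))
    except ValueError:
        return [f"unknown verdict: {qa_point.get('verdict')!r}"]

    has_comment = bool((qa_point.get("comment") or "").strip())
    has_correction = bool(qa_point.get("correctedData"))
    problems = []
    if verdict is QaVerdict.REJECTED and not (has_correction or has_comment):
        problems.append("QaRejected needs corrected data or a comment explaining why")
    if verdict is QaVerdict.INCONCLUSIVE and not has_comment:
        problems.append("QaInconclusive should give details in the comment")
    return problems
```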
Page Numbers
On Dataland, page numbers are defined as the n-th page of the document, i.e. the page number entered when viewing the PDF in, e.g., the Chrome, Edge or Firefox browser. This may deviate from the "human-readable" page number printed on the page itself. It may also deviate from the page number displayed in Adobe Acrobat Reader, which may pick up specially defined page numbering such as Roman numerals at the beginning of the document.
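As a worked illustration of the difference, consider a hypothetical report whose first pages (cover, contents, preface) carry no Arabic page number, with printed numbering starting at 1 afterwards. Under that assumption, which will not hold for every document, the conversion is a simple offset; `physical_page` is an illustrative helper, not part of Dataland.

```python
def physical_page(printed_page: int, front_matter_pages: int) -> int:
    """Convert a printed Arabic page number to the n-th page of the PDF.

    Assumes the PDF starts with `front_matter_pages` pages without Arabic
    numbering and that printed numbering starts at 1 right after them.
    """
    if printed_page < 1 or front_matter_pages < 0:
        raise ValueError("printed_page must be >= 1 and front_matter_pages >= 0")
    return front_matter_pages + printed_page


# Printed page 56 in a report with 4 unnumbered front-matter pages
# is the 60th page of the PDF, so 60 is the value to enter on Dataland.
page_for_dataland = physical_page(printed_page=56, front_matter_pages=4)
```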
Promoting a dataset from `PENDING` to `ACCEPTED`

> [!NOTE]
> The preconditions under which a data set will be moved from `PENDING` to `ACCEPTED` have not been determined yet.