Quality Buckets - d-fine/Dataland GitHub Wiki
Overview
A data point can be assigned one of several quality buckets. Alternatively, it can be left entirely blank, meaning that neither a value nor quality information is provided. The table below summarizes what each bucket means:
Quality Bucket | Meaning |
---|---|
Reported | Data point was found in the supporting document |
Incomplete | Data point was partially found but relevant parts are missing |
No Data Found | Data point was searched for but not found |
Data Point is null | Data point was not searched for or is not relevant |
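As a rough illustration, the table above can be read as a small mapping from the outcome of a search to a bucket. The following is a minimal sketch, not Dataland code: the names `QualityBucket` and `choose_bucket` and the `found` flag with its values `"full"`, `"partial"` and `"none"` are assumptions made for this example.

```python
from enum import Enum
from typing import Optional


class QualityBucket(Enum):
    # Labels follow the table above; the member names are illustrative.
    REPORTED = "Reported"
    INCOMPLETE = "Incomplete"
    NO_DATA_FOUND = "No Data Found"


def choose_bucket(searched: bool, found: str = "none") -> Optional[QualityBucket]:
    """Map a search outcome to a bucket; None means the data point stays null.

    `found` is a hypothetical flag: "full", "partial" or "none".
    """
    if not searched:
        # Not searched for or not relevant: the data point is left blank.
        return None
    if found == "full":
        return QualityBucket.REPORTED
    if found == "partial":
        return QualityBucket.INCOMPLETE
    if found == "none":
        return QualityBucket.NO_DATA_FOUND
    raise ValueError(f"unknown search outcome: {found!r}")
```

Returning `None` instead of a fourth enum member mirrors the idea that a null data point carries no quality information at all.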
Decision Tree
FAQ
Q1: What is helpful information about unavailability?
A1: Any information in the documents that explains why the value could not be determined. This could be:
- A statement from the company explaining why they did not report a value for the field in question.
- Information on the topic of the field that provides some insights, even if it is insufficient to determine a specific value.
(See example 3)
Q2: When is a value ambiguous?
A2: A value can be ambiguous for three main reasons:
- From the information available, it is unclear if the value fits the definition of the field precisely.
- There are two or more conflicting values derived from the document.
- The information is incomplete: some information needed to confirm the value with certainty is missing from the documents.
(See examples 4 and 5)
Q3: What could an explanation of ambiguity look like?
A3: After identifying the reason for the ambiguity, the explanation should answer the question that corresponds to it:
- Why is the value assessed to potentially not fit the definition of the field precisely?
- What are the conflicting values, and on what grounds was one of them chosen?
- Which necessary information is missing?
(See examples 4 and 5)
Q4: When is an explanatory comment for unambiguous values helpful?
A4: Not all data points with unambiguous values need a comment. However, a comment is helpful (1) or required (2) in the following cases:
- (1) Helpful: If the field is qualitative (the values are usually “Yes” or “No”), an explanation of how the verdict was reached usually adds value. If the field is quantitative and the value is explicitly stated in the document but sits in a larger table, it can still be helpful to note in the comment which column and row contain it. (See examples 7 and 8)
- (2) Required: If the field is quantitative and its value results from a calculation or (currency) conversion applied by the data collector, the comment must mention all applied operations (including exchange rates and their dates, if applicable). (See example 6)
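To make the second case more concrete, the sketch below shows one way a data collector could produce a value together with a comment that lists every applied operation. It is only an illustration: the function names, the comment wording, and the idea of returning a `(value, comment)` pair are assumptions for this example and are not prescribed by Dataland.

```python
from datetime import date
from decimal import Decimal


def summed_value_comment(parts: dict[str, Decimal], unit: str) -> tuple[Decimal, str]:
    """Sum the reported parts and describe the calculation in a comment."""
    total = sum(parts.values(), Decimal("0"))
    terms = " + ".join(f"{name} ({value} {unit})" for name, value in parts.items())
    return total, f"Calculated as {terms} = {total} {unit}."


def converted_value_comment(
    amount: Decimal, source: str, target: str, rate: Decimal, rate_date: date
) -> tuple[Decimal, str]:
    """Convert a currency amount and record the exchange rate and its date."""
    converted = (amount * rate).quantize(Decimal("0.01"))
    comment = (
        f"Converted {amount} {source} to {converted} {target} "
        f"at a rate of {rate} {target}/{source} as of {rate_date.isoformat()}."
    )
    return converted, comment
```

For instance, `summed_value_comment({"Scope 1": Decimal("120.5"), "Scope 2": Decimal("79.5")}, "tCO2e")` uses made-up figures to show how a summed emissions value and its comment could be generated in one step, so the comment cannot drift from the calculation.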
Examples
Taunus-Sparkasse 2024
1. Datapoint:
Insurance/Reinsurance – All Fields
Quality:
null
Explanation:
Data points are left empty.
Image:
Mercedes Benz Group AG 2024
2. Datapoint:
Scope 4 GHG Emissions
Quality:
NoDataFound
Explanation:
The field was searched for, but neither a value nor helpful information about unavailability was found.
Image:
STMICROELECTRONICS N.V. 2024
3. Datapoint:
ISO 14001 Certificate
Quality:
Incomplete
Explanation:
The company’s Annual Report provides a table listing sites and indicating which are ISO 14001-certified. Since most, but not all, sites are certified, the field cannot be filled with ‘Yes’ or ‘No’. Instead, the page with the table is provided and helpful information about unavailability is included in the comment.
Image:
AMPHENOL CORPORATION 2024
4. Datapoint:
Scope 3 Upstream GHG Emissions
Quality:
Incomplete
Explanation:
The field asks for the total amount of Scope 3 upstream GHG emissions (categories 1-8). The company’s report, however, does not distinguish between categories 4 and 9. As a result, the total upstream emissions value is ambiguous and cannot be reliably matched to the definition. The comment contains an explanation of the ambiguity.
Image:
Daimler Truck Holding AG 2024
5. Datapoint:
Substantial Contribution to Climate Change Mitigation In Percent - Aligned
Quality:
Incomplete
Explanation:
Two conflicting values exist for this field, depending on which cells of the reporting table are used. One value (100%) comes from the cell marked in pink in the “Climate Change Mitigation” column, while another value (1.5%) results from summing aligned activities (see yellow-marked cells). The latter value (1.5%) is likely the correct one, because the former is expressed relative to the total aligned share of revenue. The comment explains the ambiguity in detail.
Image:
VOLKSWAGEN AKTIENGESELLSCHAFT 2024
6. Datapoint:
Scope 1 and 2 GHG Emissions Location-Based
Quality:
Reported
Explanation:
The value was not directly stated but calculated by summing related fields explicitly outlined in the document. The result is unambiguous but accompanied by an explanatory comment to clarify the calculation.
Image:
DEUTSCHE TELEKOM AG 2024
7. Datapoint:
Carbon Reduction Initiatives
Quality:
Reported
Explanation:
The field explicitly specifies Carbon Reduction Initiatives aligned with the Paris Agreement. The company lists several initiatives in its Annual Report to achieve climate neutrality by 2040. This information is summarized in the explanatory comment.
Image:
Daimler Truck Holding AG 2024
8. Datapoint:
Revenue - Aligned Share - Absolute Share
Quality:
Reported
Explanation:
The value is explicitly stated in the reporting table. An explanatory comment is included to help users locate the value, though the comment could be omitted because it adds no explanatory information about the value itself.
Image: