entropy - ObjectVision/GeoDMS GitHub Wiki
Aggregation functions entropy
- entropy(a)
- entropy(a, relation)
- entropy(a) results in a parameter with the total Shannon entropy (in bits) of the non-null values of attribute a.
- entropy(a, relation) results in an attribute with the total Shannon entropy (in bits) of the non-null values of attribute a, grouped by relation. The domain unit of the resulting attribute is the values unit of the relation.
The total Shannon entropy of a set of N observations is defined as:
entropy(a) = N · H(a) = -∑ᵢ nᵢ · log₂(nᵢ / N)
where nᵢ is the count of each distinct non-null value and N = ∑ nᵢ is the total number of non-null observations.
Here H(a) is the average (per-element) Shannon entropy, so the total is N times H(a). See average_entropy, which computes H(a) directly.
For a uniform distribution over k distinct values, entropy(a) equals N · log₂(k).
The result is 0 when all observations have the same value (no uncertainty), or when N = 0 (empty partition).
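The definition above can be checked with a small Python sketch. It is not the GeoDMS implementation: the function name `total_entropy` is hypothetical, and `None` stands in for a GeoDMS null value. It counts each distinct non-null value (nᵢ), takes N as their sum, and evaluates -∑ nᵢ · log₂(nᵢ / N), written as ∑ nᵢ · log₂(N / nᵢ) so every term is non-negative.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def total_entropy(values: Iterable[Optional[Hashable]]) -> float:
    """Hypothetical sketch: total Shannon entropy N * H(a), in bits.

    None represents a null value (an assumption of this sketch);
    nulls are excluded before counting.
    """
    # n_i: count of each distinct non-null value
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())  # N: number of non-null observations
    # -sum n_i * log2(n_i / N) rewritten as sum n_i * log2(N / n_i);
    # an empty Counter (N = 0) gives the start value 0.0
    return sum((c * math.log2(n / c) for c in counts.values()), 0.0)


# Data from this page's City table; None marks the null LifeStyleCode.
print(round(total_entropy([2, 0, 1, 0, 1, 1, None]), 3))  # 8.755
# Uniform distribution over k = 4 values with N = 8: N * log2(k) = 16.0
print(total_entropy([0, 1, 2, 3] * 2))  # 16.0
```

Dividing the result by N gives the per-element entropy H(a) that average_entropy reports.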
Applies to:
- attribute a with any scalar value type
- relation with value type of the group CanBeDomainUnit

Conditions:
- The domain of argument a and relation must match.

Available since version 14.4.0.
parameter<float64> entropyLifeStyleCode := entropy(City/LifeStyleCode);
// result ≈ 8.755
attribute<float64> entropyLifeStyleCodePerRegion (Region) := entropy(City/LifeStyleCode, City/Region_rel);
| City/LifeStyleCode | City/Region_rel |
|---|---|
| 2 | 0 |
| 0 | 1 |
| 1 | 2 |
| 0 | 1 |
| 1 | 3 |
| 1 | null |
| null | 3 |
domain City, nr of rows = 7
For the total: the non-null values are [2, 0, 1, 0, 1, 1], so N = 6, with counts 0→2, 1→3, 2→1.
entropy = -(2·log₂(2/6) + 3·log₂(3/6) + 1·log₂(1/6)) = 2·log₂(3) + 3 + log₂(6) ≈ 3.170 + 3 + 2.585 ≈ 8.755
| entropyLifeStyleCodePerRegion |
|---|
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
domain Region, nr of rows = 5
Each region contains at most one distinct non-null value, so the entropy is 0 for every region. Region 1 contains City 1 and City 3, both with LifeStyleCode 0. Region 3 contains City 4 (LifeStyleCode 1) and City 6, whose null LifeStyleCode is excluded, leaving a single distinct value. City 5 has a null Region_rel and is not counted in any region. Region 4 has no cities at all, so N = 0 and the entropy is 0.
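The grouped form can be sketched the same way. This is a hypothetical Python illustration, not the GeoDMS implementation: `grouped_entropy` and its `n_groups` parameter (standing in for the size of the relation's values unit) are invented names, `None` marks nulls, and skipping rows whose value or group is null is an assumption that matches the worked example.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def grouped_entropy(
    values: Sequence[Optional[Hashable]],
    relation: Sequence[Optional[int]],
    n_groups: int,
) -> List[float]:
    """Hypothetical sketch: total Shannon entropy (bits) per group.

    relation[i] is the group index of row i; None marks a null.
    n_groups stands in for the size of the relation's values unit.
    """
    if len(values) != len(relation):
        # mirrors the condition that a and relation share one domain
        raise ValueError("values and relation must have the same length")
    counts = [Counter() for _ in range(n_groups)]
    for v, g in zip(values, relation):
        if v is None or g is None:
            continue  # assumption: null value or null group adds nothing
        counts[g][v] += 1
    result = []
    for group_counts in counts:
        n = sum(group_counts.values())
        # groups without rows (N = 0) get 0.0 because the sum is empty
        result.append(sum((c * math.log2(n / c) for c in group_counts.values()), 0.0))
    return result


life_style_code = [2, 0, 1, 0, 1, 1, None]
region_rel = [0, 1, 2, 1, 3, None, 3]
print(grouped_entropy(life_style_code, region_rel, 5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

A group only gets a non-zero entropy once it holds at least two distinct non-null values; for example, values [0, 1, 0, 1] with relation [0, 0, 1, 1] give 2.0 bits for each of the two groups.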
- average_entropy - the Shannon entropy per element (H = entropy / N), i.e. the standard Shannon entropy formula
- modus - the most frequently occurring value
- unique_count - number of distinct non-null values