A1 Central Tendency Measures - JulTob/R GitHub Wiki
📜 Ledger of Central Tendency Measures
“If data be a sea of numbers, these are the islands where they most often gather.”
Measures of central tendency summarize where values tend to cluster. They preserve the units of the data, and the best choice depends on the data’s nature and distribution.
⚖️ Arithmetic Mean
Also called the Average, this is the most common measure of central tendency. Computed on a sample, it estimates the Expected Value of the underlying distribution.
📐 Definition
For a sample x of size n:
$$ \text{Mean}(x) = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
If values repeat, let n_i be the number of times the distinct value x_i occurs, so that the n_i add up to n:
$$ \bar{x} = \frac{\sum x_i \cdot n_i}{n} $$
🧪 In R:
> mean(x)
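As a minimal sketch of the two formulas above, the snippet below checks that the plain mean and the frequency-weighted form agree. The sample `x` and the variable names are made up for illustration; `weighted.mean()` comes from the base `stats` package.

```r
# Hypothetical sample with repeated values
x <- c(2, 2, 2, 3, 5, 5)

# Plain arithmetic mean: sum of all values divided by n
m_plain <- mean(x)

# Frequency form: distinct values x_i and their counts n_i
freq   <- table(x)
values <- as.numeric(names(freq))
counts <- as.vector(freq)
m_freq <- sum(values * counts) / sum(counts)

# The same computation using stats::weighted.mean()
m_weighted <- weighted.mean(values, w = counts)

print(c(plain = m_plain, freq = m_freq, weighted = m_weighted))
```

All three numbers should coincide (19/6 for this sample). If the data contain missing values, `mean(x, na.rm = TRUE)` drops them before averaging.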
🧮 Other Means
These are less common but useful in specific cases.
▪️ Quadratic Mean (Root Mean Square)
Used when large deviations matter (e.g., error magnitudes).
$$ \text{RMS}(x) = \sqrt{\frac{1}{n} \sum x_i^2} $$
> sqrt(mean(x^2))
The RMS has a Pythagorean flavor: treat the values of x as perpendicular legs (catheti) of a right "hyper-triangle" in n dimensions. Its hyper-hypotenuse is the Euclidean length of the vector x, the square root of the sum of squares. (Strictly speaking, the name "Pythagorean means" is traditionally reserved for the arithmetic, geometric, and harmonic means.)
So, if you divide that length by √n, you get the RMS: the average contribution per dimension to the vector’s length, a kind of hypotenuse per cathetus.
In other words, RMS tells you the average “energy” or “magnitude” of the values in a way that’s deeply rooted in Euclidean distance. It’s not literally the shortest distance from the origin to a hyperplane or diagonal, but it is tied to the vector length in Euclidean space, which is the very heart of the Pythagorean idea.
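The link between the RMS and Euclidean length can be checked numerically. This is only a sketch with a hypothetical two-component vector (the classic 3-4-5 triangle); the variable names are illustrative.

```r
# Hypothetical vector: legs of a 3-4-5 right triangle
x <- c(3, 4)

# Euclidean length of the vector (the hypotenuse)
vec_length <- sqrt(sum(x^2))

# RMS obtained by dividing the length by sqrt(n)
rms_from_length <- vec_length / sqrt(length(x))

# RMS computed directly from its definition
rms_direct <- sqrt(mean(x^2))

print(c(length = vec_length, from_length = rms_from_length, direct = rms_direct))
```

Both RMS values should match, which shows that the normalizing factor is √n rather than n.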
⸻
▪️ Geometric Mean
Best for ratios and growth rates (e.g., finance, biology):
$$ \text{GM}(x) = \left( \prod_{i=1}^{n} x_i \right)^{1/n} $$
> exp(mean(log(x))) # Only works for x > 0
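To see why the geometric mean suits growth rates, here is a small sketch with made-up yearly growth factors (+10%, -10%, +20%). Applying the geometric mean every year reproduces the total growth, while the arithmetic mean overstates it.

```r
# Hypothetical yearly growth factors: +10%, -10%, +20%
growth <- c(1.10, 0.90, 1.20)

# Geometric mean via logs (requires all values > 0)
gm <- exp(mean(log(growth)))

# Arithmetic mean, for comparison
am <- mean(growth)

# Total growth over the whole period
total <- prod(growth)

# Compounding the geometric mean n times recovers the total;
# compounding the arithmetic mean does not
print(c(total = total,
        gm_compounded = gm^length(growth),
        am_compounded = am^length(growth)))
```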
⸻
▪️ Harmonic Mean
Used when averaging rates over equal amounts of the numerator quantity (e.g., speeds over equal distances, where speed = distance/time):
$$ \text{HM}(x) = \frac{n}{\sum \frac{1}{x_i}} $$
> 1 / mean(1 / x)
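A quick sketch of the speed case, under the assumption of a round trip with equal distances each way. The speeds and the leg distance are hypothetical numbers chosen for easy arithmetic.

```r
# Hypothetical round trip: same distance out and back, different speeds (km/h)
speeds   <- c(60, 40)
distance <- 120  # km per leg (assumed)

# Harmonic mean of the speeds
hm <- 1 / mean(1 / speeds)

# True average speed: total distance divided by total time
total_time <- sum(distance / speeds)
true_avg   <- (length(speeds) * distance) / total_time

# The arithmetic mean of the speeds overstates the average
am <- mean(speeds)

print(c(harmonic = hm, true_average = true_avg, arithmetic = am))
```

The harmonic mean matches the true average speed (48 km/h here), whereas the arithmetic mean gives 50.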
⸻
🔪 Median
The middle value when data are ordered. For odd n, it’s the center. For even n, it’s the average of the two middle values.
> median(x)
The median resists outliers better than the mean.
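The effect of a single outlier can be illustrated with a tiny made-up sample: replacing one value with an extreme one moves the mean a long way but leaves the median where it was.

```r
# Hypothetical samples: identical except for one extreme value
x_clean   <- c(1, 2, 3, 4, 5)
x_outlier <- c(1, 2, 3, 4, 100)

# The mean is dragged toward the outlier
mean_clean   <- mean(x_clean)
mean_outlier <- mean(x_outlier)

# The median stays at the middle value
median_clean   <- median(x_clean)
median_outlier <- median(x_outlier)

print(c(mean_clean = mean_clean, mean_outlier = mean_outlier,
        median_clean = median_clean, median_outlier = median_outlier))
```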
⸻
📦 Quantiles
Quantiles divide the ordered data into equal parts.
- Quartiles: 4 parts (Q1 = 25%, Q2 = median, Q3 = 75%)
- Percentiles: 100 parts (P25 = 25%, P90 = 90%, etc.)
> quantile(x) # Default: min, quartiles, and max (0%, 25%, 50%, 75%, 100%)
> quantile(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9)) # Custom percentiles
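Here is a short sketch of quantiles on a hypothetical sample of nine values. Note that `quantile()` supports several estimation rules through its `type` argument (1 through 9, default 7), so a percentile that falls between two observations may be interpolated or snapped to an observed value depending on the rule.

```r
# Hypothetical ordered sample
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# Quartiles with the default rule (type = 7, linear interpolation)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Interquartile range: Q3 - Q1
iqr <- IQR(x)

# 90th percentile: interpolated under the default rule...
p90_default <- quantile(x, probs = 0.9)

# ...but always an observed value under type = 1
p90_type1 <- quantile(x, probs = 0.9, type = 1)

print(q)
print(c(IQR = iqr))
print(c(p90_default, p90_type1))
```

For this sample the default quartiles land exactly on observations (4, 7, and 10), so the IQR is 6.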
⸻
🎯 Mode
The most frequent value in the data.
R doesn’t have a built-in mode() function for this meaning, so we use:
> table(x)
> which.max(table(x)) # Position of the mode within the table
> names(which.max(table(x))) # Actual mode value (as a character string)
If multiple values tie, which.max() returns only the first of them; ye can add custom handling for multimodal cases.
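One possible way to handle ties is sketched below. The helper `all_modes()` is a hypothetical name, not a base R function; it returns every value that shares the highest frequency.

```r
# Hypothetical helper: returns every value tied for the highest frequency
all_modes <- function(x) {
  freq   <- table(x)
  counts <- as.vector(freq)
  top    <- names(freq)[counts == max(counts)]
  # table() stores values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(top) else top
}

# Bimodal numeric sample: 2 and 3 both appear twice
modes_numeric <- all_modes(c(1, 2, 2, 3, 3, 4))

# Categorical sample with a single mode
modes_char <- all_modes(c("a", "b", "a"))

print(modes_numeric)
print(modes_char)
```

Keep in mind that `table()` ignores `NA` values by default, so missing data never count as a mode in this sketch.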
⸻
🧭 Choosing the Right Marker
| Measure | Best When… |
|---|---|
| Mean | Data are symmetric and numeric |
| Median | Data are skewed or contain outliers |
| Mode | Data are categorical or multimodal |
| Geometric | Data are multiplicative or percentage-based |
| Harmonic | Data are rates averaged over equal amounts (e.g., speeds over equal distances) |
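To compare the markers side by side, the sketch below computes each of them on one hypothetical right-skewed sample of positive values. For positive data the harmonic, geometric, arithmetic, and quadratic means always come out in that increasing order, which the example lets ye verify.

```r
# Hypothetical right-skewed sample (all values positive)
x <- c(1, 2, 2, 3, 4, 20)

hm  <- 1 / mean(1 / x)        # harmonic mean
gm  <- exp(mean(log(x)))      # geometric mean
am  <- mean(x)                # arithmetic mean
rms <- sqrt(mean(x^2))        # quadratic mean (RMS)
med <- median(x)              # median

print(round(c(harmonic = hm, geometric = gm, arithmetic = am,
              rms = rms, median = med), 3))
```

The single large value pulls the arithmetic mean and the RMS well above the median, which is why the median is the safer summary for skewed data.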