A1 Central Tendency Measures - JulTob/R GitHub Wiki

📜 Ledger of Central Tendency Measures

“If data be a sea of numbers, these are the islands where they most often gather.”

Measures of central tendency summarize where values tend to cluster. They preserve the units of the data, and the best choice depends on the data’s nature and distribution.

⚖️ Arithmetic Mean

Also called the Average, this is the most common measure of central tendency. For a sample, it estimates the Expected Value of the underlying distribution.

📐 Definition

For a sample x of size n:

$$ \text{Mean}(x) = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

If values repeat, with k distinct values x_i each occurring n_i times (so the n_i sum to n):

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{k} x_i \, n_i $$

🧪 In R:

> mean(x)
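As a quick check that the two formulas above agree, here is a small sketch that computes the mean directly and again from a frequency table. The sample values are made up for illustration; `table()` and `weighted.mean()` are base R functions.

```r
# Illustrative sample (made-up values) with repeated observations
x <- c(2, 2, 3, 5, 5, 5)

# Direct arithmetic mean: sum of the values divided by n
mean_direct <- mean(x)

# Frequency form: distinct values x_i weighted by their counts n_i
freq   <- table(x)
values <- as.numeric(names(freq))   # distinct values: 2, 3, 5
counts <- as.vector(freq)           # their counts:    2, 1, 3
mean_freq <- sum(values * counts) / sum(counts)

# weighted.mean() performs the same computation
mean_weighted <- weighted.mean(values, counts)

print(c(mean_direct, mean_freq, mean_weighted))
```

All three results should match, since the frequency form only regroups the same sum.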

🧮 Other Means

These are less common but useful in specific cases.

▪️ Quadratic Mean (Root Mean Square)

Used when large deviations matter (e.g., error magnitudes).

$$ \text{RMS}(x) = \sqrt{\frac{1}{n} \sum x_i^2} $$

> sqrt(mean(x^2))

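The RMS has a Pythagorean reading (not to be confused with the classical Pythagorean means: arithmetic, geometric, and harmonic). Treat the values of x as perpendicular catheti: the hyper-hypotenuse they span has a length equal to the square root of the sum of their squares, which is the Euclidean norm of x.

So, if you divide that length by the square root of the number of components, you get the RMS: the typical contribution per dimension to the vector’s length, a kind of hypotenuse per cathetus.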

In other words, RMS tells you the average “energy” or “magnitude” of the values in a way that’s deeply rooted in Euclidean distance. It’s not literally the shortest distance from the origin to a hyperplane or diagonal, but it is tied to the vector length in Euclidean space, which is the very heart of the Pythagorean idea.
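The link to Euclidean length can be sketched with a made-up two-component vector: the RMS is the vector's norm divided by `sqrt(n)`. The variable names here are illustrative only.

```r
# Made-up vector whose Euclidean length is easy to verify (3-4-5 triangle)
x <- c(3, 4)
n <- length(x)

# Euclidean norm: length of the "hyper-hypotenuse"
euclid_norm <- sqrt(sum(x^2))

# RMS computed two ways: directly, and from the norm
rms           <- sqrt(mean(x^2))
rms_from_norm <- euclid_norm / sqrt(n)

print(c(euclid_norm, rms, rms_from_norm, mean(x)))
```

The RMS is never smaller than the arithmetic mean, because squaring gives the larger values more weight.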

⸻

▪️ Geometric Mean

Best for ratios and growth rates (e.g., finance, biology):

$$ \text{GM}(x) = \left( \prod_{i=1}^{n} x_i \right)^{1/n} $$

> exp(mean(log(x)))  # Only works for x > 0
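To see why the geometric mean suits growth rates, consider a sketch with hypothetical yearly growth factors. Applying the geometric mean once per period reproduces the total growth, which the arithmetic mean does not.

```r
# Hypothetical yearly growth factors: +10%, -10%, +20%
growth <- c(1.10, 0.90, 1.20)
n <- length(growth)

# Log form (as above) and the product form from the definition
gm_log  <- exp(mean(log(growth)))
gm_prod <- prod(growth)^(1 / n)

# Compounding the geometric mean n times recovers the total growth
total_growth <- prod(growth)
print(c(gm_log, gm_prod))
print(c(total_growth, gm_log^n, mean(growth)^n))
```

The log form is usually preferred for long vectors, since a running product can overflow or underflow.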

⸻

▪️ Harmonic Mean

Used with rates (e.g., speed = distance/time):

$$ \text{HM}(x) = \frac{n}{\sum \frac{1}{x_i}} $$

> 1 / mean(1 / x)
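A worked sketch of the speed case, assuming a hypothetical trip with two legs of equal length: the harmonic mean of the speeds equals total distance over total time, while the arithmetic mean overstates it.

```r
# Hypothetical trip: two legs of equal length at different speeds
speeds   <- c(60, 40)   # km/h
leg_dist <- 120         # km per leg (any equal length gives the same result)

# Harmonic mean of the speeds
hm <- 1 / mean(1 / speeds)

# Average speed from first principles: total distance / total time
total_time <- sum(leg_dist / speeds)                   # 2 h + 3 h
avg_speed  <- (leg_dist * length(speeds)) / total_time

print(c(hm, avg_speed, mean(speeds)))
```

This only holds when each speed covers the same distance; if each speed is held for the same time instead, the arithmetic mean is the right average.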

⸻

🔪 Median

The middle value when data are ordered. For odd n, it’s the center. For even n, it’s the average of the two middle values.

> median(x)

The median resists outliers better than the mean.
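A small sketch with made-up data shows that resistance: replacing one value with an extreme outlier drags the mean far away but leaves the median where it was.

```r
# Made-up samples: identical except for one extreme value
clean  <- c(1, 2, 3, 4, 5)
spoilt <- c(1, 2, 3, 4, 100)

mean_clean    <- mean(clean)
mean_spoilt   <- mean(spoilt)
median_clean  <- median(clean)
median_spoilt <- median(spoilt)

# Even n: the median is the average of the two middle values
median_even <- median(c(1, 2, 3, 4))

print(c(mean_clean, mean_spoilt))
print(c(median_clean, median_spoilt, median_even))
```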

⸻

📦 Quantiles

Quantiles are cut points that divide the ordered data into parts holding equal shares of the observations.

Quartiles: 4 parts (Q1 = 25%, Q2 = median, Q3 = 75%)
Percentiles: 100 parts (P25 = 25%, P90 = 90%, etc.)

> quantile(x)        # Default quartiles
> quantile(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))  # Custom percentiles
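For a concrete picture, here is a sketch on a made-up sample of nine evenly spaced values, where the quartiles are easy to verify by eye. It relies on R's default interpolation rule (`type = 7`); other `type` values can give slightly different cut points on small samples.

```r
# Made-up sample: the integers 1 to 9
x <- 1:9

# Default call returns the minimum, the three quartiles, and the maximum
q <- quantile(x)
print(q)

# The 50% quantile is the median; Q3 - Q1 is the interquartile range
q2  <- unname(q["50%"])
iqr <- IQR(x)
print(c(q2, median(x), iqr))
```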

⸻

🎯 Mode

The most frequent value in the data.

R’s built-in mode() returns an object’s storage mode, not the statistical mode, so we use:

> table(x)
> which.max(table(x))      # Index of the mode within the table
> names(which.max(table(x)))  # Mode value (as a character string)

If multiple values tie, this returns only the first of them; ye can add custom handling for multimodal cases.
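One possible way to handle ties is sketched below. The helper name `all_modes` is hypothetical (it is not part of base R); it returns every value tied for the highest frequency and keeps the original data type rather than converting to character.

```r
# Hypothetical helper: return every value tied for the highest frequency
all_modes <- function(x) {
  if (length(x) == 0) return(x)      # nothing to count
  ux     <- unique(x)                # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))   # frequency of each distinct value
  ux[counts == max(counts)]          # keep all values that reach the maximum
}

print(all_modes(c(1, 2, 2, 3, 3, 4)))   # two values tie
print(all_modes(c("a", "b", "a")))      # works for categorical data too
```

If every value occurs equally often, this returns all distinct values, so callers may want to treat that case as "no mode".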

⸻

🧭 Choosing the Right Marker

| Measure | Best When… |
|---|---|
| Mean | Data are symmetric and numeric |
| Median | Data are skewed or contain outliers |
| Mode | Data are categorical or multimodal |
| Geometric | Data are multiplicative, such as growth factors or percentage changes |
| Harmonic | Data are rates, such as speeds over equal distances |