Density Estimation - shivamvats/notes GitHub Wiki

Kernel Density Estimation

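Non-parametric approach, related to nearest-neighbour methods: training points close to the query contribute the most.
Takes two parameters: a kernel function K and a bandwidth h.
Given an input x, it estimates the density at x by summing a contribution from every point x_i in the training data. Each contribution is the kernel applied to the distance between x and x_i divided by the bandwidth. In one dimension the sum is normalized by n * h so that the estimate integrates to 1: p(x) = (1 / (n * h)) * sum_i K((x - x_i) / h).
A low bandwidth gives more weight to nearby points, which lowers bias but raises variance (a spiky estimate). A high bandwidth gives a smooth density function but raises bias.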
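
A minimal sketch of the estimator described above, assuming one-dimensional data and a Gaussian kernel. The names gaussian_kernel, kde and total_variation are illustrative and not taken from any library. The two bandwidths at the end are arbitrary choices meant to show the spiky versus smooth behaviour.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, bandwidth, kernel=gaussian_kernel):
    """Density estimate at x: sum of kernel contributions from every
    training point, with distances scaled by the bandwidth."""
    total = sum(kernel((x - xi) / bandwidth) for xi in data)
    # Dividing by n * h makes the estimate integrate to 1 (1-D case).
    return total / (len(data) * bandwidth)


def total_variation(ys):
    """Sum of absolute jumps between neighbouring grid values;
    a rough measure of how wiggly a curve is."""
    return sum(abs(b - a) for a, b in zip(ys, ys[1:]))


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(200)]
grid = [-4.0 + 0.05 * i for i in range(161)]

# Low bandwidth: follows individual points closely (high variance).
narrow = [kde(g, samples, bandwidth=0.05) for g in grid]
# High bandwidth: smooth curve that can blur real structure (high bias).
wide = [kde(g, samples, bandwidth=1.0) for g in grid]

print(total_variation(narrow) > total_variation(wide))
```

In practice the bandwidth is usually chosen by a rule of thumb or by cross-validation rather than fixed by hand as it is here.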

Gaussian Mixture Model

The density is approximated with a mixture of k Gaussian distributions, where k is specified in advance. The density at a point x is p(x) = sum_i a_i * N(x; mu_i, sigma_i), where a_i >= 0 and sum_i a_i = 1.
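
A small sketch of evaluating such a mixture density, assuming one-dimensional components and parameters that are already known. The names normal_pdf and gmm_pdf and the example weights, means and standard deviations are made up for illustration. Fitting the parameters from data, typically done with expectation-maximization, is not shown.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gmm_pdf(x, weights, means, sigmas):
    """Mixture density: sum_i a_i * N(x; mu_i, sigma_i)."""
    # The weights must be non-negative and sum to 1 for a valid density.
    if min(weights) < 0.0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(a * normal_pdf(x, mu, s)
               for a, mu, s in zip(weights, means, sigmas))


# Hypothetical two-component mixture (k = 2).
weights = [0.3, 0.7]
means = [-2.0, 1.0]
sigmas = [0.5, 1.5]

print(gmm_pdf(0.0, weights, means, sigmas))
```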

CDF Estimation

Estimating the CDF is easy. Use the Empirical Distribution (frequentist approach): assign probability mass 1/n to every data point and use the resulting step-function CDF F_n as the estimate. This empirical CDF converges to the true CDF F exponentially fast in the number of data points n, i.e., the probability that the maximum error between the two exceeds epsilon decays exponentially in n. The Dvoretzky-Kiefer-Wolfowitz inequality makes this precise: P(sup_x |F_n(x) - F(x)| > epsilon) <= 2 * exp(-2 * n * epsilon^2).
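
A short sketch of the empirical CDF, together with the Dvoretzky-Kiefer-Wolfowitz bound 2 * exp(-2 * n * epsilon^2), which is one standard way to quantify the exponential convergence mentioned above. The function names ecdf and dkw_bound are illustrative, not from a library.

```python
import bisect
import math


def ecdf(data):
    """Return the empirical CDF of data as a function F(x).

    Each of the n points carries probability mass 1/n, so F(x) is the
    fraction of points that are <= x (a right-continuous step function).
    """
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect.bisect_right(xs, x) / n

    return F


def dkw_bound(n, eps):
    """Upper bound on P(sup_x |F_n(x) - F(x)| > eps) from the DKW inequality."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps * eps))


F = ecdf([3.0, 1.0, 2.0])
print(F(0.0), F(2.5), F(3.0))

# The bound shrinks exponentially as n grows, for a fixed eps.
for n in (100, 1000, 10000):
    print(n, dkw_bound(n, 0.05))
```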
