by Ping
A lazy-evaluated multivariate kernel density estimation library for Julia.
Kernel density estimation, also known as KDE, is a classic algorithm in nonparametric statistics. The objective of KDE is to estimate a distribution from a set of observations without assuming a closed-form model. KDE requires a kernel function, which is essentially a probability density function, used here as a building block for estimating an unknown distribution. In most cases, it is desirable for the kernel function to have two properties:
Imagine we have \(D=\{x_{1}, x_{2}, \ldots, x_{n}\}\), a set of observations of a random variable \(X\). We want to estimate the probability distribution of \(X\) using a kernel function \(K\). With KDE, the probability density at a point \(x\) is estimated as \(\widehat{P}(X=x)=\frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x-x_{i}}{h}\right)\), where \(h\) is a hyperparameter called the bandwidth that controls how strongly each observation in \(D\) influences \(\widehat{P}\). The factor \(\frac{1}{nh}\) is a normalization term: substituting \(v=\frac{u-x_{i}}{h}\) shows that each of the \(n\) terms integrates to \(h\int_{-\infty}^{\infty} K(v)dv\), so \(\int_{-\infty}^{\infty} \widehat{P}(X=u)du = \int_{-\infty}^{\infty} K(u)du\). It is then easy to see that when \(K\) is normalized, \(\widehat{P}\) is normalized as well!
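The estimator above can be sketched in a few lines of plain Julia. This is only an illustration of the formula, not the API of this library or of any other package: the names `gaussian_kernel` and `kde`, the sample observations, and the bandwidth value are all assumptions made for the example.

```julia
# Standard Gaussian kernel: a density that integrates to 1 over the real line.
gaussian_kernel(u) = exp(-u^2 / 2) / sqrt(2π)

# Evaluate the estimate at x: (1 / (n * h)) * sum_i K((x - x_i) / h).
# `kde` is a hypothetical helper name used only in this sketch.
function kde(x, data, h; kernel = gaussian_kernel)
    n = length(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)
end

# Hypothetical observations and bandwidth, chosen only for illustration.
data = [-1.2, -0.3, 0.1, 0.8, 1.5]
h = 0.5

# Riemann-sum check of the normalization claim: because the Gaussian kernel
# is normalized, the estimate should integrate to approximately 1.
dx = 0.01
grid = -10:dx:10
total = sum(kde(x, data, h) for x in grid) * dx
println("integral ≈ ", round(total, digits = 4))
```

Passing a different function as `kernel` swaps the kernel without touching the estimator, which is one simple way to keep the two concerns separate.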
Although there are several nice KDE implementations in Julia, such as KernelDensity.jl and KernelDensityEstimate.jl, I felt it was necessary to write another one while I was implementing BOHB. As remarkable as those packages are, they do not support several features I needed very well:
Demo 1: KDE visualization over 50 random observations from \( \mathcal{N}(0, 1) \) using a Gaussian kernel with different bandwidths. A smaller \(h\) makes the curve fluctuate more and be more sensitive to individual observations; a larger \(h\) makes it smoother.
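The effect of the bandwidth can also be checked numerically without a plot. The sketch below counts the local maxima of a Gaussian-kernel estimate on a grid for a few bandwidths. It is a stand-alone illustration, not this library's API: `kde`, `count_modes`, the observations, the grid, and the bandwidth values are assumptions made for the example.

```julia
# Standard Gaussian kernel.
gaussian_kernel(u) = exp(-u^2 / 2) / sqrt(2π)

# Hypothetical helper: the 1-D estimate (1 / (n * h)) * sum_i K((x - x_i) / h).
kde(x, data, h) = sum(gaussian_kernel((x - xi) / h) for xi in data) / (length(data) * h)

# Count strict local maxima of the estimate on an evenly spaced grid.
function count_modes(data, h; grid = -4:0.01:4)
    f = [kde(x, data, h) for x in grid]
    return count(i -> f[i] > f[i-1] && f[i] > f[i+1], 2:length(f)-1)
end

# Hypothetical observations, chosen only for illustration.
data = [-2.0, -0.5, 0.0, 0.4, 1.8]

for h in (0.1, 0.5, 4.0)
    println("h = ", h, " -> ", count_modes(data, h), " local maxima")
end
```

With these particular values, the smallest bandwidth gives one peak per observation, while the largest merges everything into a single peak.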
Demo 2: The same setting as above, but in 2 dimensions, with observations drawn from \( \mathcal N(\begin{bmatrix} 0\\0 \end{bmatrix}, \mathrm{diag}(2)) \).
Reference(s):
[1] Nonparametric statistics, Kernel, Hyperparameter optimization, Wikipedia.
[2] BOHB: Robust and Efficient Hyperparameter Optimization at Scale, Falkner et al. 2018.
[3] Algorithms for Hyper-Parameter Optimization, Bergstra et al. 2011.
[4] Kernel density estimation slides, NCCU.