For any two distributions $p(x)$ and $q(x)$ over a [[Random Variable]] $X$, the KL divergence is the [[average]] [[log-likelihood ratio]] under $p$:

$$
D_{KL}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \geq 0
$$

You can interpret KL divergence as the relative [[Information Entropy|entropy]] between two [[probability distributions]]. In other words, you can interpret it as a measure of how far apart two distributions are.

## Rough Notes

- The KL divergence is also called the relative entropy and is equal to the difference between the [[Cross-entropy]] and the [[Information Entropy|Shannon Entropy]] (equivalently $\text{Cross Entropy} = \text{Entropy} + \text{KL Divergence}$); see the numerical sketch at the end of this note.
- KL divergence is **not** symmetric, which is why it's called a divergence rather than a distance.
- KL divergence is a special case of the more general Bregman divergence. Like KL divergence, Bregman divergences do not satisfy the triangle inequality in general.

## Resources

---

- Links:
- Created at: 2023-05-10
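A minimal numerical sketch of the identity $\text{Cross Entropy} = \text{Entropy} + \text{KL Divergence}$ and of the asymmetry of KL divergence, assuming NumPy and two small discrete distributions chosen purely for illustration:

```python
import numpy as np

# Two example discrete distributions over the same support (illustrative values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# KL divergence: the expected log-likelihood ratio under p.
kl = np.sum(p * np.log(p / q))

# Shannon entropy of p and cross-entropy H(p, q).
entropy = -np.sum(p * np.log(p))
cross_entropy = -np.sum(p * np.log(q))

print(f"KL(p || q)        = {kl:.6f}")
print(f"H(p) + KL(p || q) = {entropy + kl:.6f}")  # matches H(p, q)
print(f"H(p, q)           = {cross_entropy:.6f}")

# Asymmetry: KL(q || p) generally differs from KL(p || q).
kl_reverse = np.sum(q * np.log(q / p))
print(f"KL(q || p)        = {kl_reverse:.6f}")
```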