For any two distributions $p(x)$ and $q(x)$ over a [[Random Variable]] $X$, the KL divergence is the [[average]] [[log-likelihood ratio]] $\log \frac{p(x)}{q(x)}$, taken under $p$.
$$
D_{KL}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \geq 0
$$
You can interpret KL Divergence as the relative [[Information Entropy|entropy]] between two [[probability distributions]]. In other words, you can interpret it as the expected number of extra bits (or nats, with the natural log) needed to encode samples from $p$ using a code optimized for $q$ rather than for $p$.
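A minimal numerical sketch of the definition above, assuming two made-up discrete distributions `p` and `q` over the same support (natural log, so the result is in nats):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as probability vectors.

    Assumes p and q are valid distributions over the same support and that
    q(x) > 0 wherever p(x) > 0 (otherwise the divergence is infinite).
    Uses the natural log, so the result is in nats.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Made-up example distributions over a 3-element support
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]
print(kl_divergence(p, q))  # ≈ 0.232
print(kl_divergence(q, p))  # ≈ 0.315 -- not symmetric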
## Rough Notes
- measures how far apart two distributions are
- the KL Divergence is also called the relative entropy and is equal to the difference between the [[Cross-entropy]] and the [[Information Entropy|Shannon Entropy]] (equivalently, $\text{Cross Entropy} = \text{Entropy} + \text{KL Divergence}$; see the numerical check after this list)
- KL divergence is **not** symmetric, which is why it's called a divergence rather than a distance.
- KL divergence is a special case of the more general Bregman divergence. The triangle inequality does **not** hold for Bregman divergences in general, nor for KL divergence.
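A quick numerical check of the cross-entropy identity above, reusing the same hypothetical `p` and `q` from the sketch near the top:

```python
import numpy as np

# Same made-up distributions as in the sketch above
p = np.array([0.5, 0.4, 0.1])
q = np.array([0.3, 0.3, 0.4])

entropy = -np.sum(p * np.log(p))        # H(p), Shannon entropy of p
cross_entropy = -np.sum(p * np.log(q))  # H(p, q), cross-entropy of p w.r.t. q
kl = np.sum(p * np.log(p / q))          # D_KL(p || q)

# Cross Entropy = Entropy + KL Divergence
assert np.isclose(cross_entropy, entropy + kl)
print(cross_entropy, entropy + kl)  # both ≈ 1.175
```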
## Resources
---
- Links:
- Created at: 2023-05-10