The Shannon Entropy can be expressed as ^[It's often represented in its simplified form $-\sum_i{p_i \log p_i}$. In fact Shannon presented it in this simplified form in his original paper. I prefer the unsimplified ratio form because it evokes the correct intuition - that entropy is an additive, probability-weighted measure. It also resembles the common formulation of the [[KL-Distribution divergence | KL Divergence]]]^[log base 2]
$$
H = \sum_i{p_i \log \frac{1}{p_i}}
$$
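A minimal Python sketch of this formula (the function name `shannon_entropy` and the example distributions are illustrative, not from any particular library):

```python
import math

def shannon_entropy(probs, base=2):
    """H = sum_i p_i * log(1/p_i); base 2 gives bits."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# A fair coin is maximally unpredictable: 1 bit per toss.
print(shannon_entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so it carries less information on average.
print(shannon_entropy([0.9, 0.1]))   # ~0.47
```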
## Rough notes
- We want a measure of the rate at which information is produced by a process
- the expected value of the self-information of a random variable
- Entropy is the average amount of information needed to represent a message.
- It gives a lower bound on the average number of bits needed to encode events drawn from a distribution
- If we witness a rare event, we gain more information than if we witness a common event.
- Lack of order and predictability
- Learning that an unlikely event has occurred is more informative than learning that a likely event has occurred
- Shannon's insight: the more predictable the information is, the less space is required to store it
- If the message space has size $s^n$, where $s$ is the number of symbols in the alphabet and $n$ is the message length, Hartley defined information $H$ as $H = \log{s^n}$ ^[often simplified as $H = n \log{s}$], i.e. information is the logarithm of the number of possible symbol sequences of a given length. ^[in the binary case we have 2 symbols, $s = 2$, so taking the log base 2 the information simplifies to $n$, the number of bits] The base used is arbitrary; so long as we're consistent with the choice of base, the information content $H$ of different message spaces is comparable. When using log base 2 we call the units bits; for log base $e$, i.e. the natural logarithm, we call the units natural units or nats; for base 10 we call the units decimal digits. (See the Hartley sketch after this list.)
- But communication (and the state of the world) is rarely completely uniform. We shouldn't need to ask $n$ yes/no questions if we already know something about the context.
- Shannon's key insight - the information contained in a message must somehow be equivalent to a process that can generate those messages (e.g. a [[Markov chain]] / probabilistic state machine) ^[this is quite similar to the correspondence between [[computability theory]] and language parsing]
- You can't simply turn the sum into an integral for the continuous case, because the value of the entropy wouldn't converge. You instead have to use another approach, like [[KL-Distribution divergence]] (see also the differential entropy link under resources)
- Being able to calculate the entropy of a random variable lets us compute other measures like [[Mutual Information]], and also provides the basis for measuring the difference between two [[probability distribution|probability distributions]], either with [[Cross-entropy]] or [[KL-Distribution divergence|KL-divergence]] (see the cross-entropy/KL sketch below)
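As referenced above, a rough sketch of Hartley's measure, and of how Shannon entropy reduces to it when every symbol is equally likely (names and example values are mine):

```python
import math

def hartley_information(s, n, base=2):
    """Hartley's H = log(s^n) = n * log(s): the log of the number of possible length-n messages."""
    return n * math.log(s, base)

# Binary alphabet (s = 2), message length 8: log2(2^8) = 8 bits, i.e. just the message length.
print(hartley_information(s=2, n=8))   # 8.0

# Shannon entropy of a uniform distribution over s symbols recovers log2(s) per symbol.
s = 4
print(sum((1 / s) * math.log(s, 2) for _ in range(s)))   # 2.0 == log2(4)
```

And a sketch of how cross-entropy and KL divergence build on entropy (again illustrative; `p` and `q` are arbitrary example distributions over the same three events):

```python
import math

def entropy(p, base=2):
    """Average bits to encode events from p with a code optimised for p."""
    return sum(pi * math.log(1.0 / pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2):
    """Average bits to encode events from p with a code optimised for q."""
    return sum(pi * math.log(1.0 / qi, base) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q, base=2):
    """The extra bits paid for using q's code instead of p's: KL(p||q) = H(p, q) - H(p)."""
    return cross_entropy(p, q, base) - entropy(p, base)

p = [0.7, 0.2, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]
print(entropy(p))            # ~1.16 bits
print(cross_entropy(p, q))   # ~1.58 bits (log2(3), since q is uniform)
print(kl_divergence(p, q))   # ~0.43 bits of overhead
```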
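The relationship in the last sketch, $H(p, q) = H(p) + D_{KL}(p \| q)$, is why minimising cross-entropy against a fixed target distribution is equivalent to minimising the KL divergence to it.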
## Resources
- [A Gentle Introduction to Information Entropy - MachineLearningMastery.com](https://machinelearningmastery.com/what-is-information-entropy/)
- [Information Entropy. A layman’s introduction to information… | by A S | Towards Data Science](https://towardsdatascience.com/information-entropy-c037a90de58f)
- [Understanding Shannon entropy: (1) variability within a distribution - YouTube](https://www.youtube.com/watch?v=-Rb868HKCo8&list=WL&index=6&t=6s)
- [Log probability, entropy, and intuition about uncertainty in random events – (Michael Chinen)](https://michaelchinen.com/2020/12/12/log-probability-entropy-and-intuition-about-the-uncertainty-in-random-events/#:~:text=Setting%20that%20aside%2C%20there%20are%20a)
- "The Transmission of Information - Hartley (building on work of Nyquist)
- "A Mathematical Theory of Communication" - Claud Shannon [shannon1948.dvi (harvard.edu)](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf) presents not only a description of entropy but inherent bandwidth limits on a noisy channel (i.e. the channel's capacity)
- [Differential entropy - Wikipedia](https://en.wikipedia.org/wiki/Differential_entropy)
---
- Links:
- Created at: 2023-05-08