The Shannon Entropy can be expressed as ^[It's often represented in its simplified form $-\sum_i{p_i \log p_i}$. In fact Shannon presented it in this simplified form in his original paper. I prefer the unsimplified ratio form because it evokes the correct intuition - that entropy is an additive, probability-weighted measure. It also resembles the common formulation of the [[KL-Distribution divergence | KL Divergence]]]^[log base 2]
$$
H = \sum_i{p_i \log \frac{1}{p_i}}
$$
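A minimal Python sketch of this formula (the function name `shannon_entropy` and the example distributions are illustrative, not from any particular library):

```python
import math

def shannon_entropy(probs, base=2):
    """H = sum_i p_i * log(1/p_i); base 2 gives bits."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# A fair coin is maximally unpredictable: 1 bit per toss.
print(shannon_entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so it carries less information on average.
print(shannon_entropy([0.9, 0.1]))   # ~0.47
```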
## Rough notes
- We want a measure of the rate at which information is produced by a process
- the expected value of the self-information of a random variable
- Entropy is the average amount of information needed to represent a message.
- It gives a lower bound on the average number of bits needed to encode events drawn from a distribution
- If we witness a rare event, we gain more information than if we witness a common event.
- Lack of order and predictability
- Learning that an unlikely event has occurred is more informative than learning that a likely event has occurred
- Shannon's insight: the more predictable the information is, the less space is required to store it
- If the message space has size $s^n$, where $s$ is the number of symbols in the alphabet and $n$ is the message length, Hartley defined information $H$ as $H = \log{s^n}$ ^[often simplified as $H = n \log{s}$], i.e. information is the logarithm of the number of possible symbol sequences of a given length. ^[in the binary case we have 2 symbols, $s = 2$, so taking the log base 2 the information simplifies to $n$, the number of bits] The base used is arbitrary; so long as we're consistent with the choice of base, the information content $H$ of different message spaces is comparable. When using log base 2 we call the units bits; for log base $e$, i.e. the natural logarithm, we call the units natural units or nats; for base 10 we call the units decimal digits. (See the Hartley sketch after this list.)
- But communication (and the state of the world) is rarely completely uniform. We shouldn't need to ask $n$ yes/no questions if we already know something about the context.
- Shannon's key insight - the information contained in a message must somehow be equivalent to a process that can generate those messages (e.g. a [[Markov chain]] / probabilistic state machine) ^[this is quite similar to the correspondence between [[computability theory]] and language parsing]
- You can't simply turn the sum into an integral for the continuous case, because the value of the entropy wouldn't converge. You instead have to use another approach, like [[KL-Distribution divergence]] (see also the differential entropy link under resources)
- Being able to calculate the entropy of a random variable lets us compute other measures like [[Mutual Information]], and also provides the basis for measuring the difference between two [[probability distribution|probability distributions]], either with [[Cross-entropy]] or [[KL-Distribution divergence|KL-divergence]] (see the cross-entropy/KL sketch below)
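As referenced above, a rough sketch of Hartley's measure, and of how Shannon entropy reduces to it when every symbol is equally likely (names and example values are mine):

```python
import math

def hartley_information(s, n, base=2):
    """Hartley's H = log(s^n) = n * log(s): the log of the number of possible length-n messages."""
    return n * math.log(s, base)

# Binary alphabet (s = 2), message length 8: log2(2^8) = 8 bits, i.e. just the message length.
print(hartley_information(s=2, n=8))   # 8.0

# Shannon entropy of a uniform distribution over s symbols recovers log2(s) per symbol.
s = 4
print(sum((1 / s) * math.log(s, 2) for _ in range(s)))   # 2.0 == log2(4)
```

And a sketch of how cross-entropy and KL divergence build on entropy (again illustrative; `p` and `q` are arbitrary example distributions over the same three events):

```python
import math

def entropy(p, base=2):
    """Average bits to encode events from p with a code optimised for p."""
    return sum(pi * math.log(1.0 / pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2):
    """Average bits to encode events from p with a code optimised for q."""
    return sum(pi * math.log(1.0 / qi, base) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q, base=2):
    """The extra bits paid for using q's code instead of p's: KL(p||q) = H(p, q) - H(p)."""
    return cross_entropy(p, q, base) - entropy(p, base)

p = [0.7, 0.2, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]
print(entropy(p))            # ~1.16 bits
print(cross_entropy(p, q))   # ~1.58 bits (log2(3), since q is uniform)
print(kl_divergence(p, q))   # ~0.43 bits of overhead
```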
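The relationship in the last sketch, $H(p, q) = H(p) + D_{KL}(p \| q)$, is why minimising cross-entropy against a fixed target distribution is equivalent to minimising the KL divergence to it.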
## Resources
- [A Gentle Introduction to Information Entropy - MachineLearningMastery.com](https://machinelearningmastery.com/what-is-information-entropy/)
- [Information Entropy. A layman’s introduction to information… | by A S | Towards Data Science](https://towardsdatascience.com/information-entropy-c037a90de58f)
- [Understanding Shannon entropy: (1) variability within a distribution - YouTube](https://www.youtube.com/watch?v=-Rb868HKCo8&list=WL&index=6&t=6s)
- [Log probability, entropy, and intuition about uncertainty in random events – (Michael Chinen)](https://michaelchinen.com/2020/12/12/log-probability-entropy-and-intuition-about-the-uncertainty-in-random-events/#:~:text=Setting%20that%20aside%2C%20there%20are%20a)
- "The Transmission of Information - Hartley (building on work of Nyquist)
- "A Mathematical Theory of Communication" - Claud Shannon [shannon1948.dvi (harvard.edu)](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf) presents not only a description of entropy but inherent bandwidth limits on a noisy channel (i.e. the channel's capacity)
- [Differential entropy - Wikipedia](https://en.wikipedia.org/wiki/Differential_entropy)
---
- Links:
- Created at: 2023-05-08