Claude Shannon's seminal 1948 paper, "A Mathematical Theory of Communication" (1), introduced the notion of the entropy of a random variable. He sought a way to quantify the amount of "choice" involved in the selection of an event, and proposed the following properties for such a function H(X):
1. That it be continuous in the probabilities of X.
2. That it should increase monotonically with the number of possible outcomes when the distribution of X is uniform.
3. If a choice can be broken down into two successive choices, the original H should be the weighted sum of the individual values of H.
The only function that satisfies these conditions is the (Shannon) entropy:
H = -Σ_i p_i log(p_i)
Here the sum runs over all outcomes i with probabilities p_i, and the units are set by the base of the logarithm: with a base-2 logarithm, the units are bits. When the p_i are uniform, we recover the better-known thermodynamic (Boltzmann) entropy. A generalized second law of thermodynamics can be proved, in which any averaging of the p_i that moves them toward equality increases the entropy. Furthermore, with base-2 logarithms, H is the average number of bits necessary to specify a choice from the probability distribution of X, or equivalently the average number of bits per symbol. As such, it provides a theoretical limit for lossless data compression. It can also be thought of as the amount of information that needs to be put into the axiom of choice to choose a right inverse for some surjective function, among numerous other interpretations. Intuitively, a sharply peaked distribution has more certainty and hence lower entropy than a broad, flat distribution. The uniform distribution has the maximum entropy among discrete distributions over a fixed finite set of outcomes, and the Gaussian has the maximum entropy among continuous distributions with a given standard deviation. (1,2)
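As a minimal sketch of the definition above, the following Python snippet computes H for a discrete distribution and checks two of the claims numerically: that a uniform distribution has higher entropy than a sharply peaked one, and that property 3 holds for the decomposition H(1/2, 1/3, 1/6) = H(1/2, 1/2) + 1/2 H(2/3, 1/3). The function name shannon_entropy and the example distributions are illustrative assumptions, and terms with p_i = 0 are skipped by the usual convention that 0 log 0 = 0.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Entropy of a discrete distribution; units are set by the log base."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Skip zero-probability terms (convention: 0 * log(0) = 0).
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Illustrative distributions over four outcomes (assumed for this sketch).
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

h_uniform = shannon_entropy(uniform)  # equals log2(4) = 2 bits
h_peaked = shannon_entropy(peaked)    # well below 2 bits

# Property 3: one three-way choice versus two successive choices.
lhs = shannon_entropy([1/2, 1/3, 1/6])
rhs = shannon_entropy([1/2, 1/2]) + 0.5 * shannon_entropy([2/3, 1/3])

print(h_uniform, h_peaked)
print(lhs, rhs)
```

Passing base=math.e gives the result in nats instead of bits.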
Entropy does not imply content or semantics; it merely bounds the amount of symbolic information that can be contained therein. Also beware of equating uncertainty with entropy and entropy with information, lest you equate uncertainty with information. One should think of an increase in uncertainty/entropy as a decrease in information.
Example: Suppose that the distribution of nucleotides in a certain DNA sequence is 37% purines and 63% pyrimidines. Calculate the entropy in nats (base e).
H = -(0.37 ln(0.37) + 0.63 ln(0.63)) ≈ 0.659 nats
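The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the conversion to bits (dividing by ln 2) is added here for comparison and is not part of the original exercise.

```python
import math

# Frequencies from the example: purines vs. pyrimidines.
p_purine, p_pyrimidine = 0.37, 0.63

# Natural logarithm gives the entropy in nats.
h_nats = -(p_purine * math.log(p_purine)
           + p_pyrimidine * math.log(p_pyrimidine))

# Dividing by ln(2) converts nats to bits.
h_bits = h_nats / math.log(2)

print(f"{h_nats:.3f} nats")
print(f"{h_bits:.3f} bits")
```

The result is below the maximum of ln(2) ≈ 0.693 nats (1 bit) for a two-outcome distribution, as expected when the two classes are not equally likely.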
Information-theoretic quantities based on entropy and relative entropy arise in BLAST searches (Karlin-Altschul statistics) (3).