For a random variable $X$ following distribution $p$, the information content (or surprisal) of a sample $x$ is:
$$
I(x) = \log_2 \frac{1}{p(x)} = - \log_2 p(x)
$$
The entropy of the random variable $X$ is the expected information of the random variable, i.e.:
$$
H(X) = E_X [I(X)] = -\sum_x p(x) \log_2 p(x)
$$
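As a minimal sketch, the entropy formula above can be computed directly from a list of probabilities (the `entropy` helper here is illustrative, not from any particular library):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2 p (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy; a biased coin carries less,
# because its outcome is more predictable.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ~0.469
```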
It can also be thought of as the amount of uncertainty or surprise in the random variable. A uniform distribution, where every outcome is equally likely to occur, has maximal entropy.
The $\log_2$ (i.e. measuring in binary bits) can be thought of as counting "yes/no" questions: the entropy is the expected number of such questions needed to identify an outcome.
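To make the yes/no intuition concrete, here is a small sketch: with $n$ equally likely outcomes, each question can halve the remaining candidates, so $\log_2 n$ questions suffice, and this matches the uniform entropy exactly.

```python
import math

# Uniform distribution over n = 8 outcomes: each yes/no question halves
# the candidate set, so 3 questions pin down the answer.
n = 8
uniform_entropy = -sum((1 / n) * math.log2(1 / n) for _ in range(n))
print(uniform_entropy)   # 3.0
print(math.log2(n))      # 3.0, the number of halving questions needed
```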