For a random variable $X$ following distribution $p$, the *information* of a sample $x$ is:

$$
I(x) = \log_2 \frac{1}{p(x)} = - \log_2 p(x)
$$
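As a minimal sketch, the information of a single outcome can be computed directly from its probability (function name `information` is my own choice, not from the text):

```python
import math

def information(p_x: float) -> float:
    """Information (in bits) of a sample x with probability p_x."""
    return -math.log2(p_x)

# A fair coin flip (p = 0.5) carries exactly 1 bit of information;
# a rarer outcome (p = 0.25) carries more: 2 bits.
print(information(0.5))   # 1.0
print(information(0.25))  # 2.0
```

Note how rarer outcomes carry more information: the lower $p(x)$, the larger $-\log_2 p(x)$.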

The *entropy* of the random variable $X$ is the expected information of the random variable, i.e.:

$$
H(X) = E_X [I(X)] = -\sum_x p(x) \log_2 p(x)
$$
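The entropy sum above translates almost directly into code. A sketch, assuming the distribution is given as a list of probabilities (terms with $p(x) = 0$ are skipped, following the convention $0 \log 0 = 0$):

```python
import math

def entropy(dist) -> float:
    """Entropy H(X) in bits: the expected information over the distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# A fair coin has two equally likely outcomes -> 1 bit of entropy.
print(entropy([0.5, 0.5]))  # 1.0

# A certain outcome carries no surprise -> 0 bits.
print(entropy([1.0]))  # 0.0
```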

It can also be thought of as the amount of uncertainty or surprise in the random variable. A uniform distribution, where every outcome is equally likely, maximizes entropy; a distribution concentrated on a single outcome has zero entropy.

The $\log_2$ (i.e. binary bits) can be thought of as counting "yes/no" questions: the entropy is the average number of optimally chosen yes/no questions needed to identify the outcome.
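The yes/no-questions intuition can be checked numerically (reusing the same entropy sum as above): a uniform distribution over 8 outcomes needs $\log_2 8 = 3$ questions, like a binary search, while a skewed distribution needs fewer on average.

```python
import math

def entropy(dist) -> float:
    """Entropy H(X) in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Uniform over 8 outcomes: 3 bits, i.e. 3 yes/no questions
# (each question halves the remaining possibilities).
print(entropy([1 / 8] * 8))  # 3.0

# Skewed over the same 8 outcomes: less surprise on average,
# so fewer questions are needed.
print(entropy([0.65] + [0.05] * 7))
```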