
Measures of Uncertainty: Shannon's Entropy


Let $X$ be a discrete random variable taking a finite number of possible values $x_1,x_2,...,x_n$ with probabilities $p_1,p_2,...,p_n$ respectively, such that $p_i \geq 0,\ i=1,2,...,n$, and $\sum_{i=1}^n{p_i}=1$. We attempt to arrive at a number that measures the amount of uncertainty. Let $h$ be a function defined on the interval $(0,1]$, and let $h(p_i)$ be interpreted as the uncertainty associated with the event $\{X=x_i\}$, $i=1,2,...,n$, or as the information conveyed by revealing that $X$ has taken the value $x_i$ in a given performance of the experiment. For each $n$, we shall define a function $H_n$ of the $n$ variables $p_1,p_2,...,p_n$. The function $H_n(p_1,p_2,...,p_n)$ is to be interpreted as the average uncertainty associated with the events $\{X=x_i\},\ i=1,2,...,n$, given by

$\displaystyle H_n(p_1,p_2,...,p_n) = \sum_{i=1}^n{p_ih(p_i)}.$
    (1.1)

Thus $ H_n(p_1,p_2,...,p_n)$ is the average uncertainty removed by revealing the value of $ X$. For simplicity we shall denote

$\displaystyle \Delta_n =\left\{ P=(p_1,p_2,...,p_n):p_i\geq 0,\ \sum_{i=1}^n{p_i}=1\right\}.$
We shall now present some axiomatic characterizations of the measure of uncertainty $H_n(p_1,p_2,...,p_n)$ in order to arrive at its exact expression. For that, let $X$ and $Y$ be two independent experiments with $n$ and $m$ values respectively. Let $P=(p_1,p_2,...,p_n)\ \in\ \Delta_n$ be the probability distribution associated with $X$ and $Q=(q_1,q_2,...,q_m)\ \in\ \Delta_m$ be the probability distribution associated with $Y$. Since the experiments are independent, the uncertainty of the joint experiment should be the sum of the individual uncertainties, which leads us to write
$\displaystyle H_{nm}(P*Q) = H_n(P)+H_m(Q),$
    (1.2)

for all $P=(p_1,p_2,...,p_n) \in \Delta_n,$ $Q=(q_1,q_2,...,q_m)\in \Delta_m,$ and $P*Q=(p_1q_1,..., p_1q_m,p_2q_1,..., p_2q_m,..., p_nq_1,..., p_nq_m)\in\ \Delta_{nm}$. Replacing $p_ih(p_i)$ by $f(p_i)$ for all $i=1,2,...,n$, we get

$\displaystyle H_n(P)=\sum_{i=1}^n{f(p_i)}.$
    (1.3)

Based on (1.2) and (1.3) we present the following theorem.

Theorem 1.1. Let $H_n:\Delta_n \to I\!\!R \ (n \geq 2)$ be a function satisfying (1.2) and (1.3), where $f$ is a real-valued continuous function defined over $[0,1]$. Then $H_n$ is given by

$\displaystyle H_n(p_1,p_2,...,p_n)=- C \sum_{i=1}^n p_i \log_b p_i,$
    (1.4)

where $ C>0,\ b>1,$ with $ 0\log_b0=0$.
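
As a quick check (an illustration, not part of the characterization itself), the expression (1.4) does satisfy the additivity (1.2). For instance, with $C=1$ and $b=2$, take $P=\big({1\over 2},{1\over 2}\big)$ and $Q=\big({1\over 2},{1\over 4},{1\over 4}\big)$, so that $P*Q=\big({1\over 4},{1\over 8},{1\over 8},{1\over 4},{1\over 8},{1\over 8}\big)$. Then

$\displaystyle H_6(P*Q)= 2\cdot{1\over 4}\log_2 4 + 4\cdot{1\over 8}\log_2 8 = 1+{3\over 2}={5\over 2} = H_2(P)+H_3(Q),$

since $H_2\big({1\over 2},{1\over 2}\big)=1$ and $H_3\big({1\over 2},{1\over 4},{1\over 4}\big)={3\over 2}$.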

The proof is based on the following Lemma (ref. Chaundy and McLeod, 1961 [27]).

Lemma 1.1. Let $ f:[0,1] \to I\!\!R$ be a continuous function satisfying

$\displaystyle \sum_{i=1}^n{\sum_{j=1}^m{f(p_iq_j)}} =\sum_{i=1}^n{f(p_i)}+\sum_{j=1}^m{f(q_j)},$
    (1.5)

for all $ p_i \geq 0,\ q_j \geq 0,\ \sum_{i=1}^n{p_i} =\sum_{j=1}^m{q_j}=1.$ Then

$\displaystyle f(p)= - C p \log_b p,\ C>0,\ b>1,$
    (1.6)

for all $ p\ \in\ [0,1]$ with $ 0\log_b0=0$.
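
That (1.6) indeed satisfies (1.5) follows from the additivity of the logarithm: with $\sum_{i=1}^n{p_i}=\sum_{j=1}^m{q_j}=1$,

$\displaystyle \sum_{i=1}^n\sum_{j=1}^m f(p_iq_j) = -C\sum_{i=1}^n\sum_{j=1}^m p_iq_j\big(\log_b p_i+\log_b q_j\big) = -C\sum_{i=1}^n p_i\log_b p_i - C\sum_{j=1}^m q_j\log_b q_j = \sum_{i=1}^n f(p_i)+\sum_{j=1}^m f(q_j).$

The substance of the lemma is the converse, namely that these are the only continuous solutions.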

Alternatively, the measure (1.4) can be characterized as follows (ref. Shannon, 1948 [86]; Feinstein, 1958 [36]).

Theorem 1.2. Let $H_n:\Delta_n \to I\!\!R \ (n \geq 2)$ be a function satisfying the following axioms:

(i) $ H_2(p,1-p)$ is a continuous function of $ p\ \in\ [0,1]$.
(ii) $H_n(p_1,p_2,...,p_n)$ is a symmetric function of its arguments.
(iii) $\displaystyle H_n(p_1,p_2,...,p_n) = H_{n-1}(p_1+p_2,p_3,...,p_n) + (p_1+p_2)H_2\Big({p_1\over p_1+p_2},{p_2\over p_1+p_2}\Big), \ p_1+p_2 > 0.$
Then $ H_n(p_1,p_2,...,p_n)$ is given by (1.4).
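
For illustration, with $C=1$ and $b=2$ in (1.4), axiom (iii) can be checked directly on $(p_1,p_2,p_3)=\big({1\over 2},{1\over 4},{1\over 4}\big)$:

$\displaystyle H_2\Big({3\over 4},{1\over 4}\Big)+{3\over 4}H_2\Big({2\over 3},{1\over 3}\Big) = \Big[2-{3\over 4}\log_2 3\Big]+{3\over 4}\Big[\log_2 3-{2\over 3}\Big] = {3\over 2}=H_3\Big({1\over 2},{1\over 4},{1\over 4}\Big).$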

A third way to characterize the measure (1.4) is as follows (ref. Aczél and Daróczy, 1975 [2]).

Theorem 1.3. Let $ H_n:\Delta_n \to I\!\!R,\ n\geq 2$ be a function satisfying the following axioms:

(i) $ H_n(p_1,p_2,...,p_n)$ is a continuous and symmetric function with respect to its arguments.
(ii) $ H_{n+1}(p_1,p_2,...,p_n,0) = H_n(p_1,p_2,...,p_n).$
(iii) $ H_n(p_1,p_2,...,p_n)\leq H_n\Big({1\over n},...,{1\over n}\Big).$
(iv) For $q_{kj}\geq 0,\ \sum_{k=1}^n{\sum_{j=1}^m{q_{kj}}}=1,\ p_k=\sum_{j=1}^m{q_{kj}},\ \forall \ k=1,2,...,n,$ we have $H_{nm}(q_{11},...,q_{1m},q_{21},...,q_{2m},...,q_{n1},...,q_{nm})$
$\displaystyle =H_n(p_1,p_2,...,p_n)+\sum_{k=1}^n{p_k H_m\Big({q_{k1}\over p_k},...,{q_{km} \over p_k}\Big)},\ p_k>0, \ \forall\ k.$
Then $ H_n(p_1,p_2,...,p_n)$ is given by (1.4).
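
In probabilistic terms, axiom (iv) says that the uncertainty of a joint experiment equals the uncertainty of the first component plus the average conditional uncertainty of the second. As a small worked instance (with $C=1,\ b=2$ in (1.4)), take $n=m=2$ and the joint distribution $q_{11}={1\over 2},\ q_{12}=0,\ q_{21}={1\over 4},\ q_{22}={1\over 4}$, so that $p_1=p_2={1\over 2}$. Then, using $0\log_20=0$,

$\displaystyle H_4\Big({1\over 2},0,{1\over 4},{1\over 4}\Big)={3\over 2} = H_2\Big({1\over 2},{1\over 2}\Big)+{1\over 2}H_2(1,0)+{1\over 2}H_2\Big({1\over 2},{1\over 2}\Big) = 1+0+{1\over 2}.$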

The following is a different way to characterize the measure (1.4). It is based on the functional equation known as the fundamental equation of information.

Theorem 1.4. Let $ H_n:\Delta_n \to I\!\!R,\ n\geq 2$ be a function satisfying

$\displaystyle H_n(p_1,p_2,...,p_n)=\sum_{t=2}^n{(p_1+...+p_t) \psi\Big({p_t\over p_1+...+p_t}\Big)},$
where $ \psi$ satisfies the following functional equation 
$\displaystyle \psi(p)+(1-p)\psi\Big({q\over 1-p}\Big)=\psi(q)+(1-q)\psi\Big({p\over 1-q}\Big), \ p,q\ \in\ [0,1),\ p+q\leq 1,$
with $K \leq \psi(p) \leq 0$ for all $p \in [0,1)$ and some finite constant $K$. Then $H_n(p_1,p_2,...,p_n)$ is given by (1.4).
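
To see where this equation comes from, note that for $n=2$ the representation above gives $H_2(p_1,p_2)=\psi(p_2)$, while for $n=3$ (so that $p_1+p_2+p_3=1$) it reduces to

$\displaystyle H_3(p_1,p_2,p_3)=(1-p_3)\psi\Big({p_2\over 1-p_3}\Big)+\psi(p_3);$

requiring the right-hand side to be symmetric under the interchange of $p_2$ and $p_3$, and putting $p=p_3,\ q=p_2$, yields precisely the functional equation above.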

For simplicity, let us take $b=2$ in (1.4). If we impose the normalization $H_2\big({1\over 2},{1\over 2}\big)=1$ in the above theorems, we get $C=1$. This yields

$\displaystyle H_n(p_1,p_2,...,p_n)=-\sum_{i=1}^n{p_i\log_2p_i}.$
    (1.7)
The expression (1.7) is known as Shannon's entropy, or Shannon's measure of uncertainty.
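
For example, a fair coin has entropy $H_2\big({1\over 2},{1\over 2}\big)=1$ bit, and a uniform distribution on $n$ outcomes has entropy $\log_2 n$ bits, the maximum possible by axiom (iii) of Theorem 1.3, while a heavily biased coin gives

$\displaystyle H_2(0.9,\,0.1)=-0.9\log_2 0.9-0.1\log_2 0.1\approx 0.469 \ \hbox{bits},$

reflecting the fact that its outcome is far less uncertain.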

For more characterizations of the measure (1.4) or (1.7) refer to Aczél and Daróczy (1975) [2] and Mathai and Rathie (1975) [71].
 

