
# Measures of Uncertainty: Shannon's Entropy

Let $X$ be a discrete random variable taking a finite number of possible values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ respectively, such that $p_i \geq 0$ and $\sum_{i=1}^n p_i = 1$. We attempt to arrive at a number that will measure the amount of uncertainty. Let $h$ be a function defined on the interval $(0,1]$, and let $h(p_i)$ be interpreted as the uncertainty associated with the event $\{X = x_i\}$, or the information conveyed by revealing that $X$ has taken on the value $x_i$ in a given performance of the experiment. For each $n$, we shall define a function $H_n$ of the $n$ variables $p_1, p_2, \ldots, p_n$. The function $H_n(p_1, \ldots, p_n)$ is to be interpreted as the average uncertainty associated with the events $\{X = x_i\}$, given by

$$H_n(p_1, p_2, \ldots, p_n) = \sum_{i=1}^n p_i\, h(p_i). \qquad (1.1)$$

Thus $H_n(p_1, \ldots, p_n)$ is the average uncertainty removed by revealing the value of $X$. For simplicity we shall denote

$$\Delta_n = \Big\{ P = (p_1, p_2, \ldots, p_n) \;\Big|\; p_i \geq 0,\ \sum_{i=1}^n p_i = 1 \Big\}, \qquad n \geq 2.$$
We shall now present some axiomatic characterizations of the measure of uncertainty $H_n$ to arrive at its exact expression. For that, let $X$ and $Y$ be two independent experiments with $n$ and $m$ values respectively. Let $P = (p_1, \ldots, p_n) \in \Delta_n$ be a probability distribution associated with $X$ and $Q = (q_1, \ldots, q_m) \in \Delta_m$ be a probability distribution associated with $Y$. Since the experiments are independent, the uncertainty of the joint experiment should be the sum of the individual uncertainties. This leads us to write

$$H_{nm}(p_1 q_1, \ldots, p_1 q_m, \ldots, p_n q_1, \ldots, p_n q_m) = H_n(p_1, \ldots, p_n) + H_m(q_1, \ldots, q_m) \qquad (1.2)$$

for all $P \in \Delta_n$ and $Q \in \Delta_m$. Replacing $p\, h(p)$ by $f(p)$ in (1.1) and using (1.2), we get

$$\sum_{i=1}^n \sum_{j=1}^m f(p_i q_j) = \sum_{i=1}^n f(p_i) + \sum_{j=1}^m f(q_j). \qquad (1.3)$$
Based on (1.2) and (1.3) we present the following theorem.

Theorem 1.1. Let $H_n$ be a function satisfying (1.2) and (1.3), where $h$ is a real-valued continuous function defined over $[0,1]$. Then $H_n$ is given by

$$H_n(p_1, p_2, \ldots, p_n) = -c \sum_{i=1}^n p_i \log p_i, \qquad (1.4)$$

where $c$ is an arbitrary constant with $c > 0$.

The proof is based on the following Lemma (ref. Chaundy and McLeod, 1961 [27]).

Lemma 1.1. Let $f$ be a continuous function satisfying

$$\sum_{i=1}^n \sum_{j=1}^m f(p_i q_j) = \sum_{i=1}^n f(p_i) + \sum_{j=1}^m f(q_j) \qquad (1.5)$$

for all $p_i \geq 0$, $q_j \geq 0$ with $\sum_{i=1}^n p_i = \sum_{j=1}^m q_j = 1$. Then

$$f(p) = c\, p \log p \qquad (1.6)$$

for all $p \in [0,1]$, with $c$ an arbitrary constant.
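As a quick numerical illustration (not part of the original text; the helper names below are my own), one can check that the solution $f(p) = c\, p \log p$ of (1.6) does satisfy the functional equation (1.5) for arbitrary probability distributions:

```python
import math
import random

def f(p, c=1.0):
    # The solution (1.6): f(p) = c * p * log p, with the convention f(0) = 0
    return c * p * math.log(p) if p > 0 else 0.0

def random_distribution(n):
    # A random point of the simplex: nonnegative weights normalized to sum 1
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

random.seed(42)
P = random_distribution(5)
Q = random_distribution(3)

# Equation (1.5): sum_i sum_j f(p_i q_j) = sum_i f(p_i) + sum_j f(q_j)
lhs = sum(f(p * q) for p in P for q in Q)
rhs = sum(f(p) for p in P) + sum(f(q) for q in Q)
print(abs(lhs - rhs) < 1e-12)
```

The identity holds because $\log(pq) = \log p + \log q$ and each distribution sums to one.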

Alternatively, the measure (1.4) can be characterized as follows (ref. Shannon, 1948 [86]; Feinstein, 1958 [36]).

Theorem 1.2. Let $H_n$ be a function satisfying the following axioms:

(i) $H_n(p_1, \ldots, p_n)$ is a continuous function of $(p_1, \ldots, p_n) \in \Delta_n$.
(ii) $H_n(p_1, \ldots, p_n)$ is a symmetric function of its arguments.
(iii) $H_{n+1}(p_1, \ldots, p_{n-1}, q_1, q_2) = H_n(p_1, \ldots, p_{n-1}, p_n) + p_n H_2\!\left(\dfrac{q_1}{p_n}, \dfrac{q_2}{p_n}\right)$, where $p_n = q_1 + q_2 > 0$.

Then $H_n$ is given by (1.4).
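Shannon's entropy satisfies the branching axiom (iii): splitting one outcome of probability $p_n$ into two sub-outcomes $q_1, q_2$ adds $p_n H_2(q_1/p_n, q_2/p_n)$ to the uncertainty. A minimal numerical sketch of this check, with illustrative function names of my own:

```python
import math

def H(probs):
    # Shannon entropy in bits: H_n(p_1, ..., p_n) = -sum p_i log2 p_i
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Split the outcome p3 = 0.5 into q1 = 0.2 and q2 = 0.3
p1, p2, q1, q2 = 0.3, 0.2, 0.2, 0.3
p3 = q1 + q2

# Branching axiom: H(p1, p2, q1, q2) = H(p1, p2, p3) + p3 * H(q1/p3, q2/p3)
lhs = H([p1, p2, q1, q2])
rhs = H([p1, p2, p3]) + p3 * H([q1 / p3, q2 / p3])
print(abs(lhs - rhs) < 1e-12)
```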

A third way to characterize the measure (1.4) is as follows (ref. Aczél and Daróczy, 1975 [2]).

Theorem 1.3. Let $H_n$ be a function satisfying the following axioms:

(i) $H_n(p_1, \ldots, p_n)$ is a continuous and symmetric function with respect to its arguments.
(ii) $H_{n+1}(p_1, \ldots, p_n, 0) = H_n(p_1, \ldots, p_n)$ (expansibility).
(iii) $H_2\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1$ (normalization).
(iv) For $P = (p_1, \ldots, p_n) \in \Delta_n$ and $Q_i = (q_{i1}, \ldots, q_{im}) \in \Delta_m$, $i = 1, \ldots, n$, we have

$$H_{nm}(p_1 q_{11}, \ldots, p_1 q_{1m}, \ldots, p_n q_{n1}, \ldots, p_n q_{nm}) = H_n(p_1, \ldots, p_n) + \sum_{i=1}^n p_i H_m(q_{i1}, \ldots, q_{im}).$$

Then $H_n$ is given by (1.4).

The following is a different way to characterize the measure (1.4). It is based on a functional equation known as the fundamental equation of information.

Theorem 1.4. Let $H_n$ be a function satisfying

$$H_n(p_1, p_2, \ldots, p_n) = H_{n-1}(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)\, f\!\left(\frac{p_1}{p_1 + p_2}\right), \qquad p_1 + p_2 > 0,$$

where $f$ satisfies the following functional equation

$$f(x) + (1 - x)\, f\!\left(\frac{y}{1 - x}\right) = f(y) + (1 - y)\, f\!\left(\frac{x}{1 - y}\right),$$

with $f(0) = f(1) = 0$, for all $x, y \in [0, 1)$ with $x + y \leq 1$. Then $H_n$ is given by (1.4).
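As an illustrative check (not from the original text; the function name is my own), the binary entropy function $f(x) = -x \log_2 x - (1-x) \log_2 (1-x)$, i.e. $f(x) = H_2(x, 1-x)$, satisfies the fundamental equation of information:

```python
import math

def f(x):
    # Binary entropy: f(x) = -x log2 x - (1 - x) log2 (1 - x), f(0) = f(1) = 0
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# Fundamental equation of information, for x, y in [0, 1) with x + y <= 1:
# f(x) + (1 - x) f(y / (1 - x)) = f(y) + (1 - y) f(x / (1 - y))
x, y = 0.25, 0.4
lhs = f(x) + (1 - x) * f(y / (1 - x))
rhs = f(y) + (1 - y) * f(x / (1 - y))
print(abs(lhs - rhs) < 1e-12)
```

Both sides express the uncertainty of a three-outcome distribution $(x, y, 1 - x - y)$ resolved in two different orders, which is why they must agree.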

For simplicity, let us take logarithms to the base 2 in (1.4). If we put the restriction $H_2\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1$ in the above theorems, with $\log = \log_2$, we get $c = 1$. This yields

$$H_n(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^n p_i \log_2 p_i. \qquad (1.7)$$

The expression (1.7) is known as Shannon's entropy, or Shannon's measure of uncertainty.
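A direct computation of (1.7), as a small illustrative script (the function name is my own):

```python
import math

def shannon_entropy(probs):
    # H_n(p_1, ..., p_n) = -sum p_i log2 p_i, in bits, with 0 * log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(shannon_entropy([0.25] * 4))   # 2.0 bits: uniform over four outcomes
print(shannon_entropy([0.9, 0.1]))   # about 0.469 bits: a biased coin is more predictable
```

For a fixed $n$, (1.7) is maximized by the uniform distribution, while any degenerate distribution gives zero uncertainty.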

For more characterizations of the measure (1.4) or (1.7) refer to Aczél and Daróczy (1975) [2] and Mathai and Rathie (1975) [71].

21-06-2001
Inder Jeet Taneja
Departamento de Matemática - UFSC
88.040-900 Florianópolis, SC - Brazil