
Multivariate Entropies

Let $ X=\{ x_1,x_2,...,x_n \}$ and $ Y=\{y_1,y_2,...,y_m \}$ be two discrete finite random variables with joint and individual probability distributions given by

$\displaystyle p(x_i,y_j)=Pr\{ X=x_i,Y=y_j \},\ p(x_i,y_j)\geq 0,\ \sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)}}=1,$
$\displaystyle p(x_i)=Pr\{ X=x_i\},\ p(x_i)\geq 0,\ \sum_{i=1}^n{p(x_i)}=1,$
$\displaystyle p(y_j)=Pr\{Y=y_j \},\ p(y_j)\geq 0,\ \sum_{j=1}^m{p(y_j)}=1.$
The conditional probability of $ Y=y_j$ given $ X=x_i$ is given by
$\displaystyle p(y_j\vert x_i)=Pr\{ Y=y_j\vert X=x_i \},\ p(y_j\vert x_i)\geq 0 \ {\rm for \ each} \ i$
$\displaystyle \sum_{j=1}^m{p(y_j\vert x_i)}=1, \ \forall\ i=1,2,...,n.$
The conditional probability of $ X=x_i$ given $ Y=y_j$ is given by
$\displaystyle p(x_i\vert y_j)=Pr\{ X=x_i\vert Y=y_j \},\ p(x_i\vert y_j)\geq 0 \ {\rm for \ each} \ j$
$\displaystyle \sum_{i=1}^n{p(x_i\vert y_j)}=1, \ \forall\ j=1,2,...,m.$
The following relations are well known in the literature:

$\displaystyle p(x_i,y_j)=p(x_i)p(y_j\vert x_i)=p(y_j)p(x_i\vert y_j),$

$\displaystyle p(x_i)=\sum_{j=1}^m{p(x_i,y_j)},$
$\displaystyle p(y_j)=\sum_{i=1}^n{p(x_i,y_j)},$
for each $ i=1,2,...,n;j=1,2,...,m.$ When $ X$ and $ Y$ are independent, we have

$\displaystyle p(x_i\vert y_j)=p(x_i);\ p(y_j\vert x_i)=p(y_j), $

$\displaystyle p(x_i,y_j)=p(x_i)p(y_j),$
for each $ i=1,2,...,n; \ j=1,2,...,m.$ Based on the above notation, we now give the joint, individual and conditional measures of uncertainty. The joint measure of uncertainty of $ (X,Y)$ is given by
$\displaystyle H(X,Y)=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(x_i,y_j)}}.$
The individual measures of uncertainty of $ X$ and $ Y$ are given by 

$\displaystyle H(X)=-\sum_{i=1}^n{p(x_i)\log p(x_i)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(x_i)}},$

$\displaystyle H(Y)=-\sum_{j=1}^m{p(y_j)\log p(y_j)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(y_j)}},$
respectively. The conditional uncertainty of $ Y$ given $ X=x_i$ is given by
$\displaystyle H(Y\vert X=x_i)=-\sum_{j=1}^m{p(y_j\vert x_i) \log p(y_j\vert x_i)},$
for each $ i=1,2,...,n$. The conditional uncertainty of $ Y$ given $ X$ is the average of the uncertainties $ H(Y\vert X=x_i)$ weighted by the probabilities $ p(x_i)$, and is given by
$\displaystyle H(Y\vert X)=\sum_{i=1}^n{p(x_i)H(Y\vert X=x_i)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(y_j\vert x_i)}}.$
Similarly, we can write the conditional uncertainty of $ X$ given $ Y$ as
$\displaystyle H(X\vert Y)=\sum_{j=1}^m{p(y_j)H(X\vert Y=y_j)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(x_i\vert y_j)}}.$
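The two-variable definitions above can be checked numerically on a small example. The following is only a sketch under stated assumptions: the joint table `p_xy` is hypothetical, logarithms are taken to base 2 (the text does not fix a base), and the usual convention $ 0\log 0=0$ is assumed. It computes $ H(X,Y)$, $ H(X)$, $ H(Y)$ and evaluates $ H(Y\vert X)$ in both of its forms.

```python
from math import isclose, log2

# Hypothetical joint table p(x_i, y_j): rows are x_i (n = 2), columns are y_j (m = 3).
p_xy = [
    [0.10, 0.20, 0.10],
    [0.30, 0.05, 0.25],
]
n, m = len(p_xy), len(p_xy[0])

# Marginals: p(x_i) sums over j, p(y_j) sums over i.
p_x = [sum(p_xy[i][j] for j in range(m)) for i in range(n)]
p_y = [sum(p_xy[i][j] for i in range(n)) for j in range(m)]


def plogp(p):
    """Return p * log2(p), with the assumed convention 0 * log 0 = 0."""
    return p * log2(p) if p > 0 else 0.0


# Joint and individual measures of uncertainty.
H_XY = -sum(plogp(p_xy[i][j]) for i in range(n) for j in range(m))
H_X = -sum(plogp(p) for p in p_x)
H_Y = -sum(plogp(p) for p in p_y)


def H_Y_given_x(i):
    """H(Y | X = x_i), built from the conditional probabilities p(y_j | x_i)."""
    return -sum(plogp(p_xy[i][j] / p_x[i]) for j in range(m))


# H(Y|X) as the p(x_i)-weighted average of H(Y | X = x_i) ...
H_Y_given_X_avg = sum(p_x[i] * H_Y_given_x(i) for i in range(n))

# ... and as the double sum over the joint distribution.
H_Y_given_X_sum = -sum(
    p_xy[i][j] * log2(p_xy[i][j] / p_x[i])
    for i in range(n)
    for j in range(m)
    if p_xy[i][j] > 0
)

print(H_XY, H_X, H_Y, H_Y_given_X_avg, H_Y_given_X_sum)
```

The two values of $ H(Y\vert X)$ agree up to floating-point rounding, which is the equality stated in the definition.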
In the case of three random variables $ X=\{x_1,...,x_n \}, \ Y=\{ y_1,...,y_m \}$ and $ Z=\{ z_1,...,z_l \}$ with their respective probability distributions, we have the following measures of uncertainty:

$\displaystyle H(X,Y,Z) =-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i,y_j,z_k) $

$\displaystyle H(X)=-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i)$
$\displaystyle H(X\vert Y) =-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i\vert y_j)$
$\displaystyle H(X,Y) =-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i,y_j)$
$\displaystyle H(X,Y\vert Z) = \sum_{k=1}^{l}p(z_k)H(X,Y\vert Z=z_k)= -\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i,y_j\vert z_k) $
$\displaystyle H(X\vert Y,Z) =\sum_{j=1}^{m}\sum_{k=1}^{l}p(y_j,z_k)H(X\vert Y=y_j,Z=z_k) $
$\displaystyle =-\sum_{i=1}^{n}\sum_{j=1}^{m} \sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i\vert y_j,z_k) $
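The three-variable measures can be evaluated in the same way. The sketch below stores a hypothetical joint distribution `p_xyz` on binary outcomes as a dictionary, assumes base-2 logarithms, and uses an illustrative helper `marginal` that is not part of the text. It evaluates $ H(X)$ both directly and as a triple sum, and $ H(X,Y\vert Z)$ both as an average over $ z_k$ and as a triple sum.

```python
from collections import defaultdict
from math import isclose, log2

# Hypothetical joint distribution p(x_i, y_j, z_k), keyed by (x, y, z).
p_xyz = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.10, (1, 1, 1): 0.25,
}


def marginal(joint, axes):
    """Sum the joint distribution down to the coordinates listed in `axes`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out


p_x = marginal(p_xyz, (0,))
p_z = marginal(p_xyz, (2,))
p_yz = marginal(p_xyz, (1, 2))

# H(X): directly from p(x_i), and as a triple sum over the joint distribution.
H_X_direct = -sum(p * log2(p) for p in p_x.values() if p > 0)
H_X_triple = -sum(p * log2(p_x[(x,)]) for (x, y, z), p in p_xyz.items() if p > 0)

# H(X,Y,Z): the joint measure of uncertainty.
H_XYZ = -sum(p * log2(p) for p in p_xyz.values() if p > 0)


def H_XY_given_z(z0):
    """H(X,Y | Z = z0), using p(x_i, y_j | z0) = p(x_i, y_j, z0) / p(z0)."""
    pz = p_z[(z0,)]
    return -sum(
        (p / pz) * log2(p / pz)
        for (x, y, z), p in p_xyz.items()
        if z == z0 and p > 0
    )


# H(X,Y|Z): as a p(z_k)-weighted average, and as a triple sum.
H_XY_given_Z_avg = sum(pz * H_XY_given_z(z) for (z,), pz in p_z.items())
H_XY_given_Z_sum = -sum(
    p * log2(p / p_z[(z,)]) for (x, y, z), p in p_xyz.items() if p > 0
)

# H(X|Y,Z): triple sum with p(x_i | y_j, z_k) = p(x_i, y_j, z_k) / p(y_j, z_k).
H_X_given_YZ = -sum(
    p * log2(p / p_yz[(y, z)]) for (x, y, z), p in p_xyz.items() if p > 0
)

print(H_XYZ, H_X_direct, H_XY_given_Z_sum, H_X_given_YZ)
```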

The following properties hold for the above uncertainty measures.

Property 1.38. We have

(i) $ H(X) \geq 0, \ H(X,Y) \geq 0,\ H(X,Y,Z) \geq 0.$
(ii) $ H(X\vert Y) \geq 0, \ H(X\vert Y,Z) \geq 0, \ H(X,Y\vert Z) \geq 0.$
Property 1.39. We have
(i) $ H(X,Y)=H(X)+H(Y\vert X)=H(Y)+H(X\vert Y).$
(ii) $ H(X,Y,Z) = H(X)+H(Y,Z\vert X)$ $ = H(X,Y)+H(Z\vert X,Y) $ $ = H(X)+H(Y\vert X)+H(Z\vert X,Y).$
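Property 1.39(i) follows directly from the definitions. As a brief sketch (written with the `align*` environment, which assumes the `amsmath` package), substitute the factorization $ p(x_i,y_j)=p(x_i)p(y_j\vert x_i)$ into $ H(X,Y)$ and split the logarithm:

```latex
\begin{align*}
H(X,Y) &= -\sum_{i=1}^n\sum_{j=1}^m p(x_i,y_j)\log\bigl[p(x_i)\,p(y_j\vert x_i)\bigr] \\
       &= -\sum_{i=1}^n\sum_{j=1}^m p(x_i,y_j)\log p(x_i)
          -\sum_{i=1}^n\sum_{j=1}^m p(x_i,y_j)\log p(y_j\vert x_i) \\
       &= H(X)+H(Y\vert X).
\end{align*}
```

The second equality in (i) is obtained in the same way from $ p(x_i,y_j)=p(y_j)p(x_i\vert y_j)$, and (ii) follows by applying (i) repeatedly.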
Property 1.40. We have
(i) $ H(X\vert Y) \leq H(X)$, with equality iff $ X$ and $ Y$ are independent i.e., $ p(x_i,y_j)=p(x_i)p(y_j),$ $ \forall \,\, i,j.$
(ii) $ H(X\vert Y,Z) \leq H(X\vert Z)$, with equality iff $ X$ and $ Y$ are conditionally independent given $ Z$ i.e., $ p(x_i,y_j\vert z_k)=p(x_i\vert z_k)p(y_j\vert z_k),\, \forall \,\, i,j$ and each $ k$.
(iii) $ H(X,Y\vert Z) \leq H(X,Y)$, with equality iff $ (X,Y)$ and $ Z$ are independent i.e., $ p(x_i,y_j,z_k)=p(x_i,y_j)p(z_k), \, \forall\,\, i,j,k.$
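Note 1.2. Since the random variables $ Y$ and $ Z$ play symmetric roles, Property 1.40(ii) also gives $ H(X\vert Y,Z) \leq H(X\vert Y)$, and hence we can write

$\displaystyle H(X\vert Y,Z) \leq \max \{H(X\vert Y),H(X\vert Z)\}.$

Since both bounds hold simultaneously, the maximum on the right-hand side can in fact be replaced by the minimum.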
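Property 1.40 can be checked numerically on a single example; this illustrates the inequalities but is of course not a proof. The sketch below assumes a hypothetical joint distribution `p_xyz`, base-2 logarithms, and two illustrative helpers (`marginal`, `cond_entropy`) that evaluate the conditional uncertainty directly from its defining sum.

```python
from collections import defaultdict
from math import log2

# Hypothetical joint distribution p(x_i, y_j, z_k), keyed by (x, y, z).
p_xyz = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.10, (1, 1, 1): 0.25,
}
X, Y, Z = (0,), (1,), (2,)  # coordinate positions of X, Y, Z in each key


def marginal(joint, axes):
    """Sum the joint distribution down to the coordinates listed in `axes`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out


def cond_entropy(joint, target, given=()):
    """H(target | given) = -sum p(outcome) * log2 p(target | given).

    With `given` empty this reduces to the unconditional entropy H(target).
    """
    p_tg = marginal(joint, target + given)
    p_g = marginal(joint, given)
    h = 0.0
    for outcome, p in joint.items():
        if p > 0:
            tg = tuple(outcome[a] for a in target + given)
            g = tuple(outcome[a] for a in given)
            h -= p * log2(p_tg[tg] / p_g[g])
    return h


H_X = cond_entropy(p_xyz, X)
H_X_given_Y = cond_entropy(p_xyz, X, Y)
H_X_given_Z = cond_entropy(p_xyz, X, Z)
H_X_given_YZ = cond_entropy(p_xyz, X, Y + Z)
H_XY = cond_entropy(p_xyz, X + Y)
H_XY_given_Z = cond_entropy(p_xyz, X + Y, Z)

eps = 1e-12  # tolerance for floating-point rounding
print("(i)  ", H_X_given_Y <= H_X + eps)
print("(ii) ", H_X_given_YZ <= H_X_given_Z + eps)
print("(iii)", H_XY_given_Z <= H_XY + eps)
print("Note ", H_X_given_YZ <= max(H_X_given_Y, H_X_given_Z) + eps)
```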

Property 1.41. We have
(i) $ H(X,Y) \geq \max\{H(X),H(Y)\}.$
(ii) $ H(X,Y,Z) \geq \max\{H(X),H(Y),H(Z)\}.$
(iii) $ H(X,Y,Z) \geq \max\{H(X,Y),H(Y,Z),H(Z,X)\}.$
(iv) $ H(X,Y\vert Z) \geq \max\{H(X\vert Z),H(Y\vert Z)\}.$
Property 1.42. We have
(i) $ H(X,Y) \leq H(X)+H(Y)$, with equality iff $ X$ and $ Y$ are independent i.e., $ \ p(x_i,y_j)=p(x_i)p(y_j), \ \forall\ i,j.$
(ii) $ H(X,Y,Z) \leq H(X)+H(Y)+H(Z)$, with equality iff $ X$, $ Y$ and $ Z$ are independent i.e., iff $ p(x_i,y_j,z_k)=p(x_i)p(y_j)p(z_k), \ \forall\ i,j,k.$
(iii) $ H(X,Y\vert Z) \leq H(X\vert Z)+H(Y\vert Z)$, with equality iff $ X$ and $ Y$ are conditionally independent given $ Z$ i.e., iff $ p(x_i,y_j\vert z_k)=p(x_i\vert z_k)p(y_j\vert z_k), \ \forall\ i,j,k.$
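The equality condition in Property 1.42(i) can be illustrated by comparing two joint tables with the same marginals. In this sketch the marginals `p_x`, `p_y` and the dependent table `dependent` are hypothetical, and logarithms are taken to base 2. The product table attains $ H(X,Y)=H(X)+H(Y)$, while the dependent table gives a strictly smaller joint uncertainty.

```python
from math import isclose, log2


def entropy(probs):
    """Shannon entropy (base 2) of a list of probabilities, with 0 * log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)


# Hypothetical marginal distributions of X and Y.
p_x = [0.4, 0.6]
p_y = [0.4, 0.6]

# Independent case: p(x_i, y_j) = p(x_i) * p(y_j).
independent = [[px * py for py in p_y] for px in p_x]

# Dependent case: a hypothetical table with the same marginals, not a product.
dependent = [
    [0.35, 0.05],
    [0.05, 0.55],
]


def joint_entropy(table):
    """H(X,Y) for a joint table given as a list of rows."""
    return entropy([p for row in table for p in row])


H_X, H_Y = entropy(p_x), entropy(p_y)
H_XY_independent = joint_entropy(independent)
H_XY_dependent = joint_entropy(dependent)

print(H_X + H_Y, H_XY_independent, H_XY_dependent)
```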
Property 1.43. We have
(i) $ H(X\vert Z) \leq H(X\vert Y)+H(Y\vert Z).$
(ii) If $ H(\cdot,\cdot) \neq 0$, then
$\displaystyle {H(X\vert Z)\over H(X,Z)} \leq {H(X\vert Y)\over H(X,Y)}+{H(Y\vert Z)\over H(Y,Z)}.$
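Both parts of Property 1.43 can be evaluated on a concrete example. The sketch below is a numerical illustration only, with a hypothetical joint distribution `p_xyz`, base-2 logarithms, and an illustrative helper `cond_entropy` that implements the defining sum of the conditional uncertainty.

```python
from collections import defaultdict
from math import log2

# Hypothetical joint distribution p(x_i, y_j, z_k), keyed by (x, y, z).
p_xyz = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.10, (1, 1, 1): 0.25,
}
X, Y, Z = (0,), (1,), (2,)  # coordinate positions of X, Y, Z in each key


def cond_entropy(joint, target, given=()):
    """H(target | given) from its defining sum; H(target) when `given` is empty."""
    p_tg, p_g = defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        p_tg[tuple(outcome[a] for a in target + given)] += p
        p_g[tuple(outcome[a] for a in given)] += p
    h = 0.0
    for outcome, p in joint.items():
        if p > 0:
            tg = tuple(outcome[a] for a in target + given)
            g = tuple(outcome[a] for a in given)
            h -= p * log2(p_tg[tg] / p_g[g])
    return h


H_X_given_Z = cond_entropy(p_xyz, X, Z)
H_X_given_Y = cond_entropy(p_xyz, X, Y)
H_Y_given_Z = cond_entropy(p_xyz, Y, Z)

# Part (i): H(X|Z) <= H(X|Y) + H(Y|Z).
lhs_i, rhs_i = H_X_given_Z, H_X_given_Y + H_Y_given_Z

# Part (ii): the same inequality after dividing each term by a joint entropy.
lhs_ii = H_X_given_Z / cond_entropy(p_xyz, X + Z)
rhs_ii = (H_X_given_Y / cond_entropy(p_xyz, X + Y)
          + H_Y_given_Z / cond_entropy(p_xyz, Y + Z))

print(lhs_i, rhs_i)
print(lhs_ii, rhs_ii)
```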
Property 1.44. For each $ k$, define
$\displaystyle A(z_k)=\sum_{i=1}^n{\sum_{j=1}^m{p(y_j)p(z_k\vert x_i,y_j)}}.$
Then
$\displaystyle H(X\vert Y) \leq H(Z)+ \sum_{k=1}^l{p(z_k)\log A(z_k)}.$
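Property 1.45. Let $ X$ and $ Y$ take values in the same set of $ n$ elements, and let $ P_e=Pr\{X\not= Y\}$. Then

$\displaystyle H(X\vert Y)\leq H(P_e)+ P_e \log (n-1),$

where $ H(P_e)=-P_e\log P_e-(1-P_e)\log (1-P_e)$ is the entropy of the binary distribution $ (P_e,1-P_e)$.

Note 1.3. Property 1.45 is known as Fano's inequality.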
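As an illustration of Property 1.45, the sketch below evaluates both sides of the inequality for a hypothetical joint table `p_xy` in which $ X$ and $ Y$ share an alphabet of size $ n=3$. It assumes base-2 logarithms and reads $ H(P_e)$ as the entropy of the two-point distribution $ (P_e,1-P_e)$. The function name `binary_entropy` is illustrative.

```python
from math import log2

# Hypothetical joint table p(x_i, y_j) on a common alphabet of size n = 3
# (rows: x_i, columns: y_j). Diagonal entries are the events X = Y.
p_xy = [
    [0.30, 0.02, 0.03],
    [0.03, 0.25, 0.02],
    [0.02, 0.03, 0.30],
]
n = len(p_xy)

# Marginal p(y_j) and error probability P_e = Pr{X != Y} (off-diagonal mass).
p_y = [sum(p_xy[i][j] for i in range(n)) for j in range(n)]
P_e = sum(p_xy[i][j] for i in range(n) for j in range(n) if i != j)

# H(X|Y) = -sum_{i,j} p(x_i, y_j) * log2 p(x_i | y_j).
H_X_given_Y = -sum(
    p_xy[i][j] * log2(p_xy[i][j] / p_y[j])
    for i in range(n)
    for j in range(n)
    if p_xy[i][j] > 0
)


def binary_entropy(q):
    """Entropy (base 2) of the two-point distribution (q, 1 - q)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


fano_bound = binary_entropy(P_e) + P_e * log2(n - 1)

print(H_X_given_Y, fano_bound)
```

In this example the bound is fairly tight because the off-diagonal probability is spread almost evenly over the $ n-1$ wrong values in each column.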

For four discrete random variables $ X_1,\ X_2,\ X_3{\rm\ and \ } X_4$ the following property holds.

Property 1.46. We have

(i) $ H(X_1,X_2,X_3,X_4)=H(X_1,X_2,X_3)+H(X_4\vert X_1,X_2,X_3)$ $ =H(X_1,X_2) + H(X_3\vert X_1,X_2) + H(X_4\vert X_1,X_2,X_3)$ $ =H(X_1) + H(X_2\vert X_1) + H(X_3\vert X_1,X_2) + H(X_4\vert X_1,X_2,X_3).$
(ii) $ H(X_1,X_2\vert X_3,X_4)=H(X_1\vert X_3,X_4)+H(X_2\vert X_1,X_3,X_4).$
(iii) $ H(X_1,X_2,X_3\vert X_4)=H(X_1,X_2\vert X_3,X_4)+H(X_3\vert X_4).$
(iv) $ H(X_1,X_2,X_3,X_4)\geq \max\{H(X_t)\}, \ t=1,2,3 {\rm\ and \ } 4.$
(v) $ H(X_1,X_2,X_3,X_4)\geq \max\{H(X_t,X_r)\}, \ t,r=1,2,3 {\rm\ and \ } 4.$
(vi) $ H(X_1,X_2,X_3,X_4)\geq \max\{H(X_t,X_r,X_p)\}, \ t,r,p=1,2,3 {\rm\ and \ } 4.$
(vii) $ \max\{H(X_1\vert X_2,X_3,X_4),H(X_2\vert X_1,X_3,X_4)\}$ $ \leq H(X_1,X_2\vert X_3,X_4) $ $ \leq \min\{H(X_1,X_2,X_3\vert X_4),H(X_1,X_2,X_4\vert X_3)\}.$
(viii) $ H(X_1,X_2\vert X_3,X_4) \leq H(X_1\vert X_3,X_4)+H(X_2\vert X_3,X_4).$
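Parts (i), (ii), (vii) and (viii) of Property 1.46 can be checked on a randomly generated example. This sketch is an illustration, not a proof. It assumes four binary variables, a seeded pseudo-random joint distribution, base-2 logarithms, and an illustrative helper `H(target, given)` that evaluates the defining sum of the conditional uncertainty. Variables $ X_1,...,X_4$ correspond to coordinate positions 0 to 3.

```python
import random
from collections import defaultdict
from itertools import product
from math import isclose, log2

# Hypothetical joint distribution of four binary variables (X_1, ..., X_4),
# drawn pseudo-randomly with a fixed seed and normalized to sum to 1.
random.seed(0)
outcomes = list(product((0, 1), repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}


def H(target, given=()):
    """H(target | given) from its defining sum; H(target) when `given` is empty.

    `target` and `given` are tuples of coordinate positions (0 stands for X_1).
    """
    p_tg, p_g = defaultdict(float), defaultdict(float)
    for o, p in joint.items():
        p_tg[tuple(o[a] for a in target + given)] += p
        p_g[tuple(o[a] for a in given)] += p
    h = 0.0
    for o, p in joint.items():
        if p > 0:
            tg = tuple(o[a] for a in target + given)
            g = tuple(o[a] for a in given)
            h -= p * log2(p_tg[tg] / p_g[g])
    return h


eps = 1e-9  # tolerance for floating-point rounding

# (i) Chain rule for the joint uncertainty of all four variables.
chain = H((0,)) + H((1,), (0,)) + H((2,), (0, 1)) + H((3,), (0, 1, 2))
check_i = isclose(H((0, 1, 2, 3)), chain, abs_tol=eps)

# (ii) Chain rule conditioned on (X_3, X_4).
mid = H((0, 1), (2, 3))
check_ii = isclose(mid, H((0,), (2, 3)) + H((1,), (0, 2, 3)), abs_tol=eps)

# (vii) Lower and upper bounds on H(X_1, X_2 | X_3, X_4).
lower = max(H((0,), (1, 2, 3)), H((1,), (0, 2, 3)))
upper = min(H((0, 1, 2), (3,)), H((0, 1, 3), (2,)))
check_vii = lower <= mid + eps and mid <= upper + eps

# (viii) Subadditivity conditioned on (X_3, X_4).
check_viii = mid <= H((0,), (2, 3)) + H((1,), (2, 3)) + eps

print(check_i, check_ii, check_vii, check_viii)
```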

Inder Jeet Taneja
Departamento de Matemática - UFSC
88.040-900 Florianópolis, SC - Brazil