
Information Measures for Discrete Random Vectors


Let us consider a random vector $ \underline{{\bf X}}=(X_1,X_2,...,X_N)$, where the $ X_t$'s are discrete finite random variables. The distribution of $ \underline{{\bf X}}$, i.e., the joint distribution of $ (X_1,X_2,...,X_N)$, is the function $ p(x_{i_1},x_{i_2},...,x_{i_N})=Pr\{X_1=x_{i_1},...,X_N=x_{i_N}\}$, where each $ x_{i_j}\ (i_j=1,2,...,n_j,\ j=1,2,...,N)$ is a value in the range of $ X_j$. The entropy of a discrete random vector $ \underline{{\bf X}}=(X_1,X_2,...,X_N)$ is defined as

$\displaystyle H(\underline{{\bf X}})$ $\displaystyle =$ $\displaystyle H(X_1,X_2,...,X_N)$  
  $\displaystyle =$ $\displaystyle -\sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2}...\sum_{i_N=1}^{n_N}p(x_{i_1},x_{i_2},...,x_{i_N})\log p(x_{i_1},x_{i_2},...,x_{i_N})$  
  $\displaystyle =$ $\displaystyle -\sum_{i} p(\underline{{\bf x}}_i)\log p(\underline{{\bf x}}_i)$  
  $\displaystyle =$ $\displaystyle E_{\underline{{\bf X}}}[-\log p(\underline{{\bf X}})],$  

where 

$\displaystyle \sum_{i_1=1}^{n_1}{ \sum_{i_2=1}^{n_2}{...\sum_{i_N=1}^{n_N}{ p(x_{i_1},x_{i_2},...,x_{i_N})}}}=\sum_{i}{p(\underline{{\bf x}}_i)}=1.$
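To make the definition concrete, here is a minimal numeric sketch (not from the text) that evaluates $ H(\underline{{\bf X}})$ directly from a joint probability mass function stored as an array; the helper name H and the example pmf are illustrative choices, and logarithms are taken to base 2.

    import numpy as np

    def H(p):
        """Entropy in bits of a pmf given as an array of any shape (0 log 0 = 0)."""
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Hypothetical joint pmf of (X_1, X_2): rows index x_1, columns index x_2.
    p = np.array([[0.25, 0.25],
                  [0.25, 0.25]])
    print(H(p))                    # 2.0 bits: two independent fair bits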

Similarly, we can write the other measures given in Section 1.4 for discrete random vectors, and Properties 1.38 to 1.53 carry over to them. We also have the following additional properties.

Property 1.54. We have 

$\displaystyle H(\underline{{\bf X}})\leq \sum_{t=1}^N{H(X_t)},$
with equality iff
$\displaystyle p(\underline{{\bf x}}_i)= \prod_{t=1}^N{p(x_{i_t})},\ \forall\ i.$
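As a quick numerical check of Property 1.54 (an illustrative sketch, reusing the helper H from the sketch above): a correlated pmf has strictly smaller joint entropy than the sum of its marginal entropies, while an independent pmf attains equality.

    p_corr = np.array([[0.4, 0.1],
                       [0.1, 0.4]])           # correlated components
    print(H(p_corr), H(p_corr.sum(axis=1)) + H(p_corr.sum(axis=0)))  # 1.72... < 2.0

    p_ind = np.outer([0.5, 0.5], [0.5, 0.5])  # independent components
    print(H(p_ind), H(p_ind.sum(axis=1)) + H(p_ind.sum(axis=0)))     # 2.0 = 2.0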
Property 1.55. We have 
$\displaystyle H(\underline{{\bf X}}\vert\underline{{\bf Y}})\leq \sum_{t=1}^N{H(X_t\vert Y_t)},$
with equality iff 
$\displaystyle p(\underline{{\bf x}}_i\vert\underline{{\bf y}}_j)=\prod_{t=1}^N{p(x_{i_t}\vert y_{j_t})}.$
Property 1.56. We have

$\displaystyle H(X_1,...,X_N)=H(X_1)+H(X_2\vert X_1)+...+H(X_N\vert X_1,...,X_{N-1})$

$\displaystyle =\sum_{t=1}^N{H(X_t\vert X_1,...,X_{t-1})}.$

Note 1.4. The above property is known as the "chain rule".
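A small sketch of the chain rule for $ N=2$ (illustrative, with a made-up pmf, reusing the helper H defined in the first sketch): $ H(X_1,X_2)=H(X_1)+H(X_2\vert X_1)$, where the conditional entropy is computed directly from its definition $ H(X_2\vert X_1)=-\sum p(x_1,x_2)\log p(x_2\vert x_1)$.

    p = np.array([[0.5, 0.25],
                  [0.0, 0.25]])            # hypothetical joint pmf of (X_1, X_2)
    h_cond = -sum(p[i, j] * np.log2(p[i, j] / p[i].sum())
                  for i in range(2) for j in range(2) if p[i, j] > 0)
    print(np.isclose(H(p), H(p.sum(axis=1)) + h_cond))  # True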

Property 1.57. We have the following sequence of inequalities:

$\displaystyle H(X_N\vert X_1,...,X_{N-1})\leq H(X_N\vert X_2,...,X_{N-1})\leq ... \leq H(X_N\vert X_{N-1})\leq H(X_N).$
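The following sketch spot-checks this sequence for $ N=3$ on an arbitrary randomly generated pmf (illustrative; conditional entropies are obtained as differences of joint entropies, again reusing the helper H from the first sketch).

    rng = np.random.default_rng(0)
    p = rng.random((2, 2, 2))
    p /= p.sum()                   # arbitrary joint pmf of (X_1, X_2, X_3)

    h3_given_12 = H(p) - H(p.sum(axis=2))                    # H(X_3|X_1,X_2)
    h3_given_2 = H(p.sum(axis=0)) - H(p.sum(axis=(0, 2)))    # H(X_3|X_2)
    h3 = H(p.sum(axis=(0, 1)))                               # H(X_3)
    print(h3_given_12 <= h3_given_2 <= h3)                   # True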

Property 1.58. We have

(i) $ H(\underline{{\bf X}}\vert Y)=\sum_{t=1}^N{H(X_t\vert Y,X_1,...,X_{t-1})}.$
(ii) $ I(\underline{{\bf X}}\wedge Y)=\sum_{t=1}^N{I(X_t\wedge Y\vert X_1,...,X_{t-1})}.$
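Again as an illustrative check (not from the text), identity (ii) for $ N=2$ can be verified by expressing each term through joint entropies, with $ X_1$ on axis 0, $ X_2$ on axis 1, and $ Y$ on axis 2 (H as in the first sketch):

    rng = np.random.default_rng(1)
    p = rng.random((2, 2, 2))
    p /= p.sum()                   # arbitrary joint pmf of (X_1, X_2, Y)

    i_vec = H(p.sum(axis=2)) + H(p.sum(axis=(0, 1))) - H(p)  # I((X_1,X_2) ^ Y)
    i_1 = (H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 1)))
           - H(p.sum(axis=1)))                               # I(X_1 ^ Y)
    i_2g1 = (H(p.sum(axis=2)) + H(p.sum(axis=1))
             - H(p) - H(p.sum(axis=(1, 2))))                 # I(X_2 ^ Y | X_1)
    print(np.isclose(i_vec, i_1 + i_2g1))                    # True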
Property 1.59. We have 
$\displaystyle H(\underline{{\bf X}}\vert\underline{{\bf Y}})\leq Ed_H(\underline{{\bf X}},\underline{{\bf Y}})\log(n-1)+k\ \psi \Big({1\over k}Ed_H(\underline{{\bf X}},\underline{{\bf Y}})\Big),$
where $ d_H$ denotes the Hamming distance and $ \psi$ is as given in (1.8).

Property 1.60. If the components $ X_1,X_2,...,X_N$ of $ \underline{{\bf X}}$ are independent, i.e.,

$\displaystyle p(\underline{{\bf x}}_i)=\prod_{t=1}^N{p(x_{i_t})},\ \forall \ i,$
then 
$\displaystyle I(\underline{{\bf X}}\wedge\underline{{\bf Y}}) \geq \sum_{t=1}^N{I(X_t\wedge Y_t)},$
with equality iff 
$\displaystyle p(\underline{{\bf x}}_i\vert\underline{{\bf y}}_j)=p(x_{i_1},x_{i_2},...,x_{i_N}\vert y_{j_1},y_{j_2},...,y_{j_N})=\prod_{t=1}^N{p(x_{i_t}\vert y_{j_t})}.$
Property 1.61. If 
$\displaystyle p(\underline{{\bf x}}_i\vert\underline{{\bf y}}_j)= \prod_{t=1}^N{p(x_{i_t}\vert y_{j_t})},$
then 
$\displaystyle I(\underline{{\bf X}}\wedge \underline{{\bf Y}})\leq\sum_{t=1}^N{ I(X_t\wedge Y_t) },$
with equality iff
$\displaystyle p(\underline{{\bf y}}_j)= \prod_{t=1}^N{ p(y_{j_t}) },\ \forall\ j.$
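To see the shared equality case of Properties 1.60 and 1.61 numerically: when the pairs $ (X_1,Y_1)$ and $ (X_2,Y_2)$ are independent of each other, both conditions hold and $ I(\underline{{\bf X}}\wedge\underline{{\bf Y}})=\sum_{t=1}^N I(X_t\wedge Y_t)$. A sketch with illustrative pmfs, reusing the helper H from the first sketch:

    def I(pxy):
        """Mutual information of a joint pmf with X on axis 0, Y on axis 1."""
        return H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)

    p1 = np.array([[0.4, 0.1],
                   [0.1, 0.4]])    # hypothetical joint pmf of (X_1, Y_1)
    p2 = np.array([[0.3, 0.2],
                   [0.1, 0.4]])    # hypothetical joint pmf of (X_2, Y_2)

    # p(x1, x2, y1, y2) = p1(x1, y1) p2(x2, y2); reshape so that the vector
    # X = (X_1, X_2) indexes axis 0 and Y = (Y_1, Y_2) indexes axis 1.
    p = np.einsum('ik,jl->ijkl', p1, p2).reshape(4, 4)
    print(np.isclose(I(p), I(p1) + I(p2)))  # True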
