Polar Coding Notes III

For any B-DMC \(W\), the channels \(\{W_N^{(i)}\}\) polarize in the sense that, for any fixed \(\delta \in (0, 1)\), as \(N\) goes to infinity through powers of two, the fraction of indices \(i \in \{1, \dots, N\}\) for which \(I(W_N^{(i)}) \in (1 - \delta, 1]\) goes to \(I(W)\), and the fraction for which \(I(W_N^{(i)}) \in [0, \delta)\) goes to \(1-I(W)\)\(^{[1]}\).

Mrs. Gerber’s Lemma

Mrs. Gerber’s Lemma provides a lower bound on the entropy of the modulo-\(2\) sum of two binary random vectors\(^{[2][3]}\).
Let \(h^{-1} : [0, 1] \to [0, 1/2]\) be the inverse of the binary entropy function \(h(p) = -p\log p - (1-p)\log(1-p)\) (logarithms are base \(2\) throughout).
Here we also fix a binary symmetric channel (BSC) with crossover probability \(p_0\).
The convolution of \(a\) and \(b\) is denoted by

$$a \ast b := a(1 − b) + (1 − a)b$$

Convexity: The function \(f(u) = h(h^{-1}(u)\ast p_0)\), \(u \in [0,1]\), is convex in \(u\) for every fixed \(p_0 \in (0,1/2]\)\(^{[2]}\).
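A quick numerical sanity check of this convexity claim is possible; below is a minimal Python sketch (the helper names `h`, `h_inv`, and `conv` are my own, with `h_inv` computed by bisection), testing discrete convexity of \(f\) on a grid.

```python
import numpy as np

def h(p):
    """Binary entropy in bits; clipping avoids log(0) at the endpoints."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def h_inv(u):
    """Inverse of h restricted to [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) < u else (lo, mid)
    return (lo + hi) / 2

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

# Discrete convexity check of f(u) = h(h_inv(u) * p0) on a uniform grid:
# convexity implies all second differences are nonnegative.
p0 = 0.11  # an arbitrary crossover probability in (0, 1/2]
grid = np.linspace(0.0, 1.0, 201)
fvals = np.array([h(conv(h_inv(u), p0)) for u in grid])
assert np.all(np.diff(fvals, 2) >= -1e-9)
```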
Scalar MGL: Let \(X\) be a binary random variable and let \(U\) be an arbitrary random variable. If \(Z \sim Bern(p)\) is independent of \((X, U)\) and \(Y = X \oplus Z\), then

$$h(Y|U) \ge h\big(h^{−1}(h(X|U)) \ast p\big)$$

Vector MGL: Let \(X^n\) be a binary random vector and \(U\) be an arbitrary random variable. If \(Z^n\) is a vector of independent and identically distributed \(Bern(p)\) random variables independent of \((X^n, U)\) and \(Y^n = X^n \oplus Z^n\), then

$${h(Y^n|U) \over n} \ge h\big(h^{−1}({h(X^n|U) \over n}) \ast p\big)$$
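Reusing `h`, `h_inv`, and `conv` from the sketch above, the scalar MGL can be spot-checked on random joint distributions; here \(U\) ranges over four values and \(p = 0.2\), both arbitrary choices.

```python
rng = np.random.default_rng(0)

p = 0.2  # Z ~ Bern(p), independent of (X, U)
for _ in range(1000):
    pu = rng.dirichlet(np.ones(4))   # a random distribution of U over four values
    qx = rng.uniform(0, 1, size=4)   # qx[u] = P(X = 1 | U = u)
    H_X_given_U = np.sum(pu * h(qx))
    H_Y_given_U = np.sum(pu * h(conv(qx, p)))  # P(Y = 1 | U = u) = qx[u] * p
    # scalar MGL: h(Y|U) >= h(h^{-1}(h(X|U)) * p)
    assert H_Y_given_U >= h(conv(h_inv(H_X_given_U), p)) - 1e-9
```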

Let \(X\) and \(Y\) be two binary random variables. Following\(^{[2]}\),

$$H(X|Y) = \sum_{y\in\{0,1\}} p(Y=y)H(X|Y=y) \tag{conditional entropy definition}$$

Let

$$\beta_0(y) = p(X=1|Y=y) \tag{1}$$
$$1 - \beta_0(y) = p(X=0|Y=y)$$

Then

$$\begin{align} H(X|Y=y) &= -\sum_{x\in\{0,1\}} p(x|y)\log p(x|y) \tag{conditional entropy definition} \\&= h(\beta_0(y)) \tag{2}\end{align}$$

So that, provided \(\beta_0(y) \le 1/2\) (otherwise replace \(\beta_0(y)\) by \(1 - \beta_0(y)\), which leaves \(h(\beta_0(y))\) unchanged),

$$h^{-1}(H(X|Y=y)) = p(X=1|Y=y) \tag{3}$$

and

$$\begin{align} H(X|Y) &= \sum_{y\in\{0,1\}} p(Y=y)H(X|Y=y) \\&= \sum_{y\in\{0,1\}} p(Y=y)h(\beta_0(y)) \\&= Eh(\beta_0) \tag{$\beta_0 = \beta_0(Y)$ viewed as a random variable} \end{align}$$
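For instance, for a \(\mathrm{BSC}(p_0)\) with uniform input, Bayes' rule gives \(\beta_0(0) = p_0\) and \(\beta_0(1) = 1 - p_0\), so \(Eh(\beta_0) = h(p_0)\), recovering the familiar \(H(X|Y) = h(p_0)\).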

Let us first consider the chain-rule expansion

$$\begin{align} H(X_1,X_2, \dots ,X_n) &= \sum_{k=1}^n H(X_k|X_{k-1}, \dots , X_1) \\&= \sum_{k=1}^n Eh(\beta_k) \tag{$\beta_k = p(X_k=1 \mid X_1, \dots , X_{k-1}),\ 1 \le k \le n$} \end{align}$$
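As a sanity check of this chain-rule identity for \(n = 2\) (reusing `h` from the first sketch; the random joint pmf below is an arbitrary test case):

```python
rng = np.random.default_rng(1)

pmf = rng.dirichlet(np.ones(4)).reshape(2, 2)  # pmf[x1, x2], a random joint pmf
H_joint = -np.sum(pmf * np.log2(pmf))          # H(X1, X2)
p_x1 = pmf.sum(axis=1)                         # marginal of X1
beta_1 = p_x1[1]                               # beta_1 = P(X1 = 1), a constant
beta_2 = pmf[:, 1] / p_x1                      # beta_2(x1) = P(X2 = 1 | X1 = x1)
assert np.isclose(H_joint, h(beta_1) + np.sum(p_x1 * h(beta_2)))
```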

Now we have

$$\begin{align} p(Y_k=1 \mid X_1, \dots , X_{k-1}) &= \sum_{x_k \in \{0,1\}} p(Y_k=1 \mid X_k = x_k)\, p(X_k = x_k \mid X_1, \dots , X_{k-1}) \\&= p_0 \ast \beta_k \end{align}$$

and

$$H(Y_k \mid X_1, \dots , X_{k-1}) = Eh(p_0 \ast \beta_k)$$

The vector MGL then follows by combining these two identities with the convexity of \(f\) and Jensen's inequality; see\(^{[2]}\) for the remaining steps.

Strict Polarization for Binary-Input Channels

First notice that

$$\begin{align} I(W^-) &= I(U_1; Y_1,Y_2) \\&= I(X_1+X_2; Y_1,Y_2) \\&= H(X_1+X_2) - H(X_1+X_2|Y_1,Y_2) \\&= 1 - H(X_1+X_2|Y_1,Y_2) \tag{$X_1, X_2$ i.i.d. uniform $\Rightarrow X_1+X_2$ uniform} \end{align}$$

and

$$I(W) = I(X_1; Y_1) = 1- H(X_1|Y_1)$$

Since \(H(X_1|Y_1) \in (0, 1)\), there exists an \(\alpha \in (0, 1/2)\) such that \(H(X_1|Y_1) = h(\alpha)\).

$$\begin{align} H(X_1+X_2|Y_1,Y_2) &= \sum_{y_1,y_2} p(y_1)p(y_2)H(X_1+X_2|y_1,y_2) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(\beta_0(y_1,y_2)\big) \tag{from (2)} \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1+X_2 = 1|y_1,y_2)\big) \tag{from (1)} \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1=1|y_1)p(X_2=0|y_2) + p(X_1=0|y_1)p(X_2=1|y_2)\big) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1=1|y_1)\ast p(X_2=1|y_2)\big) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\Big(h^{-1}\big(H(X_1|y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \tag{from (3)} \\&\ge \sum_{y_2} p(y_2)h\Big(h^{-1}\big(\sum_{y_1}p(y_1)H(X_1|y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \tag{Jensen’s inequality, by convexity of $f$} \\&= \sum_{y_2} p(y_2)h\Big(h^{-1}\big(H(X_1|Y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \\&\ge h\Big(h^{-1}\big(H(X_1|Y_1)\big)\ast h^{-1}\big(H(X_2|Y_2)\big)\Big) \tag{Jensen’s inequality} \\&= h(\alpha \ast \alpha) \end{align}$$

Thus, since \(\alpha \ast \alpha = 2\alpha(1-\alpha) > \alpha\) for \(\alpha \in (0, 1/2)\) and \(h\) is strictly increasing on \([0, 1/2]\), we have that

$$I(W^-) \le 1 − h(\alpha \ast \alpha) < 1 − h(\alpha) = I(W)$$
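This conclusion can be spot-checked numerically for random binary-input channels. The sketch below (reusing `h`, `h_inv`, `conv` from the Mrs. Gerber section; the ternary output alphabet is an arbitrary choice) computes \(H(X_1+X_2\mid Y_1,Y_2)\) exactly and compares it with \(h(\alpha \ast \alpha)\).

```python
rng = np.random.default_rng(2)

for _ in range(200):
    W = rng.dirichlet(np.ones(3), size=2)  # W[x, y]: binary input, ternary output
    py = W.mean(axis=0)                    # uniform input: p(y) = (W[0,y] + W[1,y]) / 2
    q = 0.5 * W[1] / py                    # q[y] = P(X = 1 | Y = y)
    alpha = h_inv(np.sum(py * h(q)))       # H(X1|Y1) = h(alpha), alpha in [0, 1/2]
    # exact H(X1 + X2 | Y1, Y2), summing over the product output alphabet
    H_sum = np.sum(np.outer(py, py) * h(conv(q[:, None], q[None, :])))
    assert H_sum >= h(conv(alpha, alpha)) - 1e-9
```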

In summary: for every \(\delta > 0\), there exists a \(\kappa(\delta) > 0\) such that if \(I(W) \in (\delta, 1 - \delta)\), we have

$$\Delta(W) = {1\over 2}\big[I(W^+) - I(W^-)\big] \ge \kappa(\delta) > 0$$

(By the chain rule \(I(W^+) + I(W^-) = 2I(W)\), so equivalently \(\Delta(W) = I(W) - I(W^-) = I(W^+) - I(W)\).)
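For a concrete family, take \(W = \mathrm{BSC}(p)\): the minus channel is equivalent to a \(\mathrm{BSC}(p \ast p)\) (a standard fact), and \(I(W^+)\) then follows from the chain rule, so \(\Delta(W)\) has a closed form. A standalone sketch:

```python
import numpy as np

def h(p):
    """Binary entropy in bits (p assumed strictly inside (0, 1))."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for p in [0.05, 0.11, 0.30, 0.45]:
    I_W = 1 - h(p)
    I_minus = 1 - h(2 * p * (1 - p))  # W^- is equivalent to BSC(p * p)
    I_plus = 2 * I_W - I_minus        # chain rule: I(W^+) + I(W^-) = 2 I(W)
    delta = (I_plus - I_minus) / 2
    assert I_minus < I_W < I_plus     # strict polarization
    print(f"p={p:.2f}  I(W^-)={I_minus:.4f}  I(W)={I_W:.4f}  "
          f"I(W^+)={I_plus:.4f}  Delta(W)={delta:.4f}")
```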

Proof of Channel Polarization

Given \(W\) and \(\delta > 0\), define\(^{[4][5]}\)

$$\begin{align} \theta_n(\delta) &:= {1\over 2^n} \#\big\{s \in \{+, -\}^n : I(W^s) \in (\delta, 1-\delta)\big\} \tag{$\#$ means the cardinality of a set} \\&= {1\over 2^n} \sum_{s \in \{\pm\}^n} \Bbb{1}_{\{I(W^s) \in (\delta, 1-\delta)\}} \tag{$\Bbb{1}_{\{\cdot\}}=1$ if $\{\cdot\}$ is true, else $0$} \end{align}$$

Let

$$\begin{align} \mu_n &= {1\over 2^n} \sum_{s \in \{\pm\}^n} I(W^s) \tag{e.g. $\mu_1 = {1\over 2}[I(W^+)+I(W^-)]$} \\ \nu_n &= {1\over 2^n} \sum_{s \in \{\pm\}^n} [I(W^s)]^2 \tag{e.g. $\nu_1 = {1\over 2} [I^2(W^+)+I^2(W^-)]$} \end{align}$$
$$\begin{align} \mu_{n+1} &= {1\over 2^{n+1}} \sum_{s \in \{\pm\}^{n+1}} I(W^s) \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} {1\over 2} [I(W^{t+})+I(W^{t-})] \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} I(W^t) \tag{chain rule: $I(W^{t+})+I(W^{t-}) = 2I(W^t)$} \\&= \mu_n = \mu_0 = I(W) \\ \nu_{n+1} &= {1\over 2^{n+1}} \sum_{s \in \{\pm\}^{n+1}} [I(W^s)]^2 \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} {[I(W^{t+})]^2+[I(W^{t-})]^2 \over 2} \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} \Big([I(W^t)]^2+[\Delta(W^t)]^2\Big) \tag{${a^2+b^2 \over 2} = ({a+b \over 2})^2+({a-b \over 2})^2$} \\&\ge \nu_n + \theta_n(\delta)\kappa(\delta)^2 \tag{$\Delta(W^t) \ge \kappa(\delta)$ on the $\theta_n(\delta)$-fraction where $I(W^t) \in (\delta,1-\delta)$} \end{align}$$

The sequence \(\nu_n\) is thus bounded and monotone, and consequently convergent; in particular, \(\nu_{n+1}-\nu_n\) converges to zero. As \(\theta_n(\delta)\) is sandwiched

$$0 \le \theta_n(\delta) \le {\nu_{n+1} - \nu_n \over \kappa(\delta)^2}$$

between two quantities that both converge to zero, we conclude

$$\lim_{n \to \infty} \theta_n(\delta) = 0, \forall\delta > 0$$

This means that for large enough \(n\), the fraction of mediocre channels (i.e., those with symmetric capacities in \((\delta, 1 - \delta)\)) vanishes. But by the preservation of mutual information (\(\mu_n = I(W)\) for all \(n\)), we also know that if we define

$$\alpha_n(\delta) = {1\over 2^n} \sum_{s \in \{\pm\}^n} \Bbb{1}_{\{I(W^s) \ge 1-\delta\}}$$
$$\beta_n(\delta) = {1\over 2^n} \sum_{s \in \{\pm\}^n} \Bbb{1}_{\{I(W^s) \le \delta\}}$$

we automatically have (combining \(\mu_n = I(W)\) with \(\theta_n(\delta') \to 0\) for every \(\delta' \in (0, \delta)\))

$$\lim_{n \to \infty} \alpha_n(\delta) = I(W)$$
$$\lim_{n \to \infty} \beta_n(\delta) = 1 - I(W)$$
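For the BEC all of this can be verified exactly, since both polar transforms of a \(\mathrm{BEC}(\epsilon)\) are again BECs, with erasure probabilities \(2\epsilon - \epsilon^2\) and \(\epsilon^2\). A standalone sketch (the choices \(\epsilon = 0.5\), \(\delta = 0.05\), \(n = 20\) are arbitrary):

```python
import numpy as np

eps0, delta, n = 0.5, 0.05, 20
eps = np.array([eps0])                # erasure probabilities of the channels W^s
for _ in range(n):
    eps = np.concatenate([2 * eps - eps**2, eps**2])  # W^- and W^+ of each BEC

I = 1 - eps                           # I(BEC(eps)) = 1 - eps
theta = np.mean((I > delta) & (I < 1 - delta))
alpha = np.mean(I >= 1 - delta)
beta = np.mean(I <= delta)
mu = np.mean(I)                       # preserved: equals I(W) = 1 - eps0 exactly
print(f"n={n}: theta={theta:.4f}  alpha={alpha:.4f}  beta={beta:.4f}  mu={mu:.4f}")
# theta -> 0, alpha -> I(W), beta -> 1 - I(W) as n grows
```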

This proof, due to M. Alsan and E. Telatar\(^{[4]}\), is much simpler than the martingale convergence argument used by Arıkan\(^{[1]}\).

References:

  1. E. Arikan. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inform. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
  2. A. D. Wyner and J. Ziv. A theorem on the entropy of certain binary sequences and applications (Part I). IEEE Trans. Inform. Theory, vol. 19, no. 6, pp. 769–772, Nov. 1973.
  3. A. El Gamal and Y.-H. Kim. Network Information Theory. Cambridge University Press, 2011.
  4. M. Alsan and E. Telatar. A simple proof of polarization and polarization for non-stationary memoryless channels. IEEE Trans. Inform. Theory, vol. 62, no. 9, pp. 4873–4878, 2016.
  5. V. Y. F. Tan. EE5139R: Information Theory for Communication Systems, 2016/7, Semester 1. https://www.ece.nus.edu.sg/
  6. E. Sasoglu. Polarization and Polar Codes. Foundations and Trends in Communications and Information Theory, vol. 8, no. 4, pp. 259–381, 2011.