Correcting many errors

This term I’m happy to be lecturing on error-correcting codes, particularly because I’ve now cleared up a misconception I somehow managed to develop in my undergraduate days about the relationship between the Hamming and Shannon theories of coding.

Let $C \subseteq \mathbf{F}_2^n$ be a code (not necessarily linear). For $x,y \in \mathbf{F}_2^n$ let $d(x,y)$ be the number of positions in which $x$ and $y$ have different entries; this is the Hamming distance between $x$ and $y$. Let $d(C)$ denote the minimum Hamming distance between two distinct elements of $C$.
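These definitions are easy to experiment with. Here is a minimal Python sketch (the helper names and the toy repetition code are my own illustrations, nothing canonical) computing Hamming distance, minimum distance, and the nearest neighbour decoding radius discussed below:

```python
from itertools import combinations

def hamming(x, y):
    """Hamming distance: the number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def min_distance(C):
    """d(C): the minimum Hamming distance between two distinct codewords."""
    return min(hamming(x, y) for x, y in combinations(C, 2))

def decode(C, received):
    """Nearest neighbour decoding: return the codeword closest to received."""
    return min(C, key=lambda c: hamming(c, received))

# Toy example: the length-5 repetition code, with d(C) = 5,
# so e = floor((5 - 1) / 2) = 2 errors are always correctable.
C = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
d = min_distance(C)
e = (d - 1) // 2
assert decode(C, (0, 0, 1, 1, 1)) == (1, 1, 1, 1, 1)  # 2 flips, still decoded
```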

When $C$ is used as a code in the Hamming setup of nearest neighbour decoding, we can guarantee to correct up to $e$ errors, where $e = \lfloor (d-1)/2 \rfloor$ and $d = d(C)$. (For the balls of radius $r$ about different $x,y \in C$ are disjoint if and only if $2r < d(C)$.) If each bit is corrupted independently with probability $p$ then a typical codeword is corrupted in about $pn$ places when it is transmitted. So it might seem that to have any hope of communicating reliably when $p$ is near $1/2$, we need $d(C)$ to be near $n$. However one of the Plotkin bounds says that if $d > n/2$ then

$|C| \le \frac{2d}{2d-n} \le 2d \le 2n$

so such codes are necessarily rather small. Indeed, $|C| \le 2$ if $d > 3n/4$. Now to correct a typical error pattern we need $e \ge pn$, that is $d > 2pn$; when $p \ge 3/8$ this forces $d > 3n/4$. So the Plotkin bound shows that whenever $p \ge 3/8$, it is impossible to communicate reliably with the Hamming worst-case guarantee, except by using a trivial two-word codebook.
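One can watch the bound bite on small examples. The sketch below greedily builds a ‘lexicode’ of minimum distance at least $d$ (a convenient construction of my own choosing, with no claim of optimality) and checks that it never beats the Plotkin bound:

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def greedy_code(n, d):
    """Greedily scan all words of length n in lexicographic order,
    keeping each word at distance >= d from every word kept so far."""
    C = []
    for w in product((0, 1), repeat=n):
        if all(hamming(w, c) >= d for c in C):
            C.append(w)
    return C

n, d = 10, 6                                  # d > n/2, so Plotkin applies
assert len(greedy_code(n, d)) <= 2 * d // (2 * d - n)   # |C| <= 6

n, d = 8, 7                                   # d > 3n/4
assert len(greedy_code(n, d)) <= 2            # only a trivial code survives
```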

On the other hand, Shannon’s Noisy Coding Theorem says that it is possible to communicate reliably at any rate $R < 1 - H(p)$, where $H(p) = -p \log p - (1-p) \log (1-p)$ is Shannon's entropy function. (All logs in this post are to base $2$ of course.) I won't try to define rate here, but it is certainly the case that if a code $C \subseteq \mathbf{F}_2^n$ can be used to communicate reliably at rate $R$, then $|C| \ge 2^{nR}$. This seems to contradict what was just said.

The resolution to the paradox is that the high-rate codes whose existence is guaranteed by Shannon's Noisy Coding Theorem do not satisfy the Hamming criterion of high minimum distance. Hamming’s criterion makes a code proof against an adversary who chooses errors to cause us maximum inconvenience. In the Shannon setup all we need is good performance on average. The usual proof of Shannon’s Noisy Coding Theorem uses a randomly chosen code of size about $2^{nR}$. Such a code will (with high probability) contain many pairs of codewords at distance about $pn$; the critical, and surprising, point is that with random errors, it is not particularly likely that applying $pn$ errors to a codeword will move it significantly closer to any of the other codewords. So on average, we still decode correctly.
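This averaging behaviour is easy to see in simulation. Below is a sketch in the spirit of the random-coding argument; the parameters (length $60$, $64$ random codewords, flip probability $0.2$) are illustrative choices of mine, not taken from the proof. The rate is $\log_2(64)/60 = 0.1$, well below the capacity $1 - H(0.2) \approx 0.28$, and nearest neighbour decoding almost always succeeds even though the code was not designed for worst-case errors:

```python
import random

random.seed(0)
n, M, p = 60, 64, 0.2   # block length, code size, crossover probability

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

# A uniformly random code, as in the standard proof.
code = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(M)]

trials, successes = 500, 0
for _ in range(trials):
    sent = random.choice(code)
    # Pass through the binary symmetric channel: flip each bit w.p. p.
    received = tuple(b ^ (random.random() < p) for b in sent)
    decoded = min(code, key=lambda c: hamming(c, received))
    successes += decoded == sent

print(successes / trials)   # close to 1
```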

It’s conventional to draw pictures of Hamming balls as circles on the blackboard. With this mental picture, the ‘critical point’ above is far from intuitive. Instead one should perhaps imagine $\mathbf{F}_2^n$ as a huge amorphous cloud with movement possible in a bewildering number of directions. (Two steps in the same direction return you to where you started, but moving at random, you’ll hardly ever take them!) Given two points $A$ and $B$ in $\mathbf{R}^2$ at distance $1$, if you walk distance $1$ from $A$ in a uniformly random direction, the probability you finish nearer to $B$ than to $A$ is $1/3$: the arc of the unit circle about $A$ lying on $B$’s side of the perpendicular bisector of $AB$ subtends an angle of $2\pi/3$. If instead $A$ and $B$ are at distance $pn$ in $\mathbf{F}_2^n$, where $n$ is large, then after walking distance $pn$ from $A$, the probability you finish nearer to $B$ than to $A$ is negligible.
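Both claims in the last paragraph can be checked by simulation (the dimensions and the value of $p$ below are my own illustrative choices): the planar experiment lands nearer to $B$ about a third of the time, while in $\mathbf{F}_2^{1000}$ a random walk of length $pn$ from $A$ essentially never does.

```python
import math
import random

random.seed(1)

# Plane: A at the origin, B = (1, 0); take a unit step from A in a
# uniformly random direction and record whether we end nearer to B.
trials = 20000
nearer_plane = 0
for _ in range(trials):
    theta = random.uniform(0, 2 * math.pi)
    if (math.cos(theta) - 1) ** 2 + math.sin(theta) ** 2 < 1:
        nearer_plane += 1
print(nearer_plane / trials)            # about 1/3

# Hypercube: A = 0, B supported on the first w = pn coordinates;
# flip a uniformly random set of w coordinates of A.
n, p = 1000, 0.3
w = int(p * n)
cube_trials = 2000
nearer_cube = 0
for _ in range(cube_trials):
    flips = random.sample(range(n), w)
    overlap = sum(i < w for i in flips)  # flips landing in B's support
    # Distance to A is w; distance to B is 2*w - 2*overlap.
    if 2 * w - 2 * overlap < w:
        nearer_cube += 1
print(nearer_cube / cube_trials)        # essentially 0
```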