## Correcting many errors

This term I’m happy to be lecturing error correcting codes, particularly because I’ve now cleared up a misconception I somehow managed to develop in my undergraduate days about the relationship between the Hamming and Shannon theories of coding.

Let $C \subseteq \mathbf{F}_2^n$ be a code (not necessarily linear). For $x,y \in \mathbf{F}_2^n$ let $d(x,y)$ be the number of positions in which $x$ and $y$ have different entries; this is the Hamming distance between $x$ and $y$. Let $d(C)$ denote the minimum Hamming distance between two elements of $C$.

When $C$ is used as a code in the Hamming setup of nearest neighbour decoding, we can guarantee to correct up to $e$ errors, where $e = \lfloor (d-1)/2 \rfloor$. (For the balls of radius $r$ about different $x,y \in C$ are disjoint if and only if $2r < d(C)$.) If each bit is corrupted with probability $p$ then a typical codeword is corrupted in $pn$ places when it is transmitted. So to have any hope of communicating reliably when $p$ is near to $1/2$, we need $d(C)$ to be near to $n$. However one of the Plotkin bounds says that if $d > n/2$ then

$|C| \le \frac{2d}{2d-n} \le 2d \le 2n$

so such codes are necessarily rather small. Indeed, $|C| \le 2$ if $d > 3n/4$. So the Plotkin bound shows that whenever $p \ge 3/8$, it is impossible to reliably communicate using codes given by the Hamming setup, except by using a trivial two-word codebook.

On the other hand, Shannon’s Noisy Coding Theorem says that it is possible to communicate reliably at any rate $R < 1 - H(p)$, where $H(p) = -p \log p - (1-p) \log (1-p)$ is Shannon's entropy function. (All logs in this post are to base $2$ of course.) I won't try to define rate here, but it is certainly the case that if a code $C \subseteq \mathbf{F}_2^n$ can be used to communicate reliably at rate $R$, then $|C| \ge 2^{nR}$. This seems to contradict what was just said.

The resolution to the paradox is that the high-rate codes whose existence is guaranteed by Shannon's Noisy Coding Theorem do not satisfy the Hamming criterion of high minimum distance. This is allowable because in the Shannon setup the decoder is not specified, and all we need is good performance on average. These remarks show that this extra freedom makes a really big difference.