This term I’m happy to be lecturing error correcting codes, particularly because I’ve now cleared up a misconception I somehow managed to develop in my undergraduate days about the relationship between the Hamming and Shannon theories of coding.

Let be a code (not necessarily linear). For let be the number of positions in which and have different entries; this is the *Hamming distance* between and . Let denote the minimum Hamming distance between two elements of .

When is used as a code in the Hamming setup of nearest neighbour decoding, we can guarantee to correct up to errors, where . (For the balls of radius about different are disjoint if and only if .) If each bit is corrupted with probability then a typical codeword is corrupted in places when it is transmitted. So it might seem that to have any hope of communicating reliably when is near to , we need to be near to . However one of the Plotkin bounds says that if then

so such codes are necessarily rather small. Indeed, if . So the Plotkin bound shows that whenever , it is impossible to reliably communicate using codes given by the Hamming setup, except by using a trivial two-word codebook.

On the other hand, Shannon’s Noisy Coding Theorem says that it is possible to communicate reliably at any rate , where is Shannon's entropy function. (All logs in this post are to base of course.) I won't try to define rate here, but it is certainly the case that if a code can be used to communicate reliably at rate , then . This seems to contradict what was just said.

The resolution to the paradox is that the high-rate codes whose existence is guaranteed by Shannon's Noisy Coding Theorem do *not* satisfy the Hamming criterion of high minimum distance. Hamming’s criterion makes a code proof against an adversary who chooses errors to cause us maximum inconvenience. In the Shannon setup all we need is good performance *on average*. The usual proof of Shannon’s Noisy Coding Theorem uses a randomly chosen code of size about . Such a code will (with high probability) contain many pairs of codewords at distance about ; the critical, and surprising, point is that with *random errors*, it is not particularly likely that applying errors to a codeword will move it significantly closer to any of the other codewords. So on average, we still decode correctly.

It’s convention to draw pictures of Hamming balls as circles on the blackboard. With this mental picture, the `critical point’ above is far from intuitive. Instead one should perhaps imagine as a huge amorphous cloud with movement possible in a bewildering number of directions. (Two steps in the direction return you to where you started, but moving at random, you’ll hardly ever make them!) Given two points and in at distance , and walking distance from , the probability you finish nearer to than is (since in the unit circle on , the angle subtended by the perpendicular bisector of is ). If instead and are at distance in , where is large, then after walking distance from , the probability you finish nearer to than is negligible.