## How to invent the Hamming code

My course on error correcting codes has now finished. I think most of it went fairly well, but I’m not satisfied with the way I treated linear codes. I think I spent too much time on results requiring only basic linear algebra, and not enough time on the more interesting results that really are part of coding theory—in particular: how to encode and decode with linear codes, and how to construct linear codes with good error correcting properties.

For instance, I spent several lectures defining generator and parity check matrices and using them to show that if $C$ is a linear binary code of length $n$, then $C$ has dimension $m$ if and only if $C$ can be defined by a system of $n-m$ independent linear equations (which may be recorded in a parity check matrix). Next year I plan to just state this result and give examples, and not, as this year, to give a matrix based proof; this proof probably did not appeal to the stronger students—who should be shown the elegant proof using bilinear forms and the rank-nullity theorem—while, predictably, the weaker students just got lost.

Anyway, the purpose of this post is to show that it is possible for someone who is familiar with linear algebra and the first isomorphism theorem, but knows only the basic definitions of coding theory, to discover the rather elegant Hamming code of length 7.

Let $n \in \mathbf{N}$ and let $e(i) \in \mathbf{F}_2^n$ be the row vector of length $n$ which has a $1$ in position $i$ and $0$ in all other positions. Suppose that $C \subseteq \mathbf{F}_2^n$ is a one-error correcting linear binary code. If we transmit the codeword $u \in C$ and a single error occurs in transmission, so that we receive $u + e(i)$ for some $i$, then since $C$ is one-error correcting, $u+e(i)$ must be strictly nearer to $u$ (in the Hamming distance) than to any other codeword. It follows that if $u' \in C$ then $u+e(i) \not= u' + e(j)$, except when $u = u'$ and $i = j$. Since $C$ is linear, $C$ and its cosets $C+e(1), \ldots, C+e(n)$ are all mutually disjoint. (Standard array decoding motivates this observation quite nicely.)

How can we design $C$ so that it has this property? By the result from linear algebra mentioned above, if $C$ has dimension $m$ then there is a $n \times (n-m)$ matrix $H$ such that

$C = \{ u \in \mathbf{F}_2^n : uH = 0 \}.$

Now the critical observation is that the cosets $C + v$ and $C+ v'$ are equal if and only if $vH = v'H$. This can easily be shown directly; alternatively, it follows at once from first isomorphism theorem for the map $v \mapsto vH$. Since $e(i)H$ is the $i$-th row of $H$, it follows that $C$ is one-error correcting if and only the rows of $H$ are distinct and non-zero. (The non-zero condition ensures that $C \neq C+e(i)$ for any $i$.)

There are $2^{n-m}$ binary words of length $n-m$ and $n+1$ cosets of $C$ that $H$ must distinguish. Hence we must have $2^{n-m} \ge n + 1$. The most efficient choice is therefore to take $n=2^r - 1$ and $m = n - r$ where $r \in \mathbf{N}$, and to take for $H$ the $n \times r$-matrix which has all non-zero binary words of length $r$ as its rows. Provided that $r \ge 2$, the resulting code $C$ is a $1$-error correcting code of length $2^r-1$ and dimension $2^r-1-r$.

For example, if $r=2$ then we take

$H = \left( \begin{matrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{matrix} \right)$

to get the code $C \subseteq \mathbf{F}_2^3$ defined by the parity check equations $u_1+u_3 = 0$, $u_2+u_3 = 0$. So in this case $C = \{(0,0,0), (1,1,1)\}$ is the binary repetition code of length $3$.

If $r = 3$ then we expect to get a code of dimension $4$. Taking

$H = \left( \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1\\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{matrix} \right)$

we get the code

$C = \left\{ u \in \mathbf{F}_2^7 : \begin{matrix} u_1 + u_3+u_5 + u_7 = 0 \\ u_2+u_3+u_6+u_7 = 0 \\ u_4+u_5+u_6+u_7 = 0 \end{matrix} \right\}.$

This is the Hamming $[7,4,3]$-code. One advantage of this construction is that it makes clear an easy decoding algorithm: if $v \in \mathbf{F}_2^7$ is received, then decode $v$ as $v+e(i)$ where $vH$ is the $i$-th row of $H$. It is also clear how to generalize the construction to get perfect one-error correcting codes over larger fields.