In 1949 Claude Shannon published *Communication theory of secrecy systems*, Bell System Technical Journal **28** 656–715. It’s available online here. Bell Labs must have been an amazing place to work: it’s probably not overkill to compare it to the Institute for Advanced Study at Princeton. Its members around this time included Richard Hamming, the three inventors of the transistor, and Harry Nyquist, of the Nyquist bound. Its technical journal published many seminal papers, including Shannon’s 1948 paper *A mathematical theory of communication*, defining entropy, and Hamming’s 1950 paper *Error detecting and error correcting codes*, defining Hamming distance and essentially inventing the modern ‘adversarial’ setting for coding theory.

Incidentally, the story goes that Shannon asked Von Neumann what he should call his new measure of information content (or, equivalently, redundancy), and Von Neumann replied

‘*You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.*’

Like Hamming’s paper, Shannon’s two papers still amply repay reading today. One idea introduced in the 1949 paper is ‘perfect secrecy’.

### Perfect secrecy

Consider a cryptosystem with plaintexts $\mathcal{P}$ and ciphertexts $\mathcal{C}$, with encryption functions $e_k : \mathcal{P} \rightarrow \mathcal{C}$ parametrised by a set of keys $\mathcal{K}$. We assume that every ciphertext is the encryption (under some key) of at least one plaintext.

Suppose we observe a ciphertext $y \in \mathcal{C}$: what, if anything, do we learn about the corresponding plaintext $x$? Shannon supposes that there is a probability distribution on the plaintexts, assigning an *a priori* probability $p(x)$ to each $x \in \mathcal{P}$. He defines the *a posteriori* probability $p(x|y)$ to be the conditional probability of $x$ given that we observed $y$; the system then has *perfect secrecy* if, for any *a priori* probability distribution $p$, we have $p(x|y) = p(x)$ for all $x \in \mathcal{P}$, and all $y \in \mathcal{C}$.
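To make the definition concrete, here is a minimal Python sketch (my own illustration, not from Shannon’s paper) that computes the *a posteriori* probabilities $p(x|y)$ for a one-bit one-time pad and checks that they agree with the *a priori* probabilities:

```python
from fractions import Fraction
from itertools import product

# Toy cryptosystem (my own illustration): a one-time pad on a single bit.
# Keys and plaintexts are bits, and e_k(x) = x XOR k.
KEYS = PLAINTEXTS = CIPHERTEXTS = (0, 1)

def encrypt(k, x):
    return x ^ k

def posteriors(p_plain, p_key):
    """Return the a posteriori probabilities p(x|y) as a dict keyed by (x, y)."""
    # p(x, y) = p(x) * (total probability of the keys sending x to y)
    p_joint = {(x, y): p_plain[x] * sum(p_key[k] for k in KEYS if encrypt(k, x) == y)
               for x, y in product(PLAINTEXTS, CIPHERTEXTS)}
    # p(y) = sum over x of p(x, y)
    p_cipher = {y: sum(p_joint[x, y] for x in PLAINTEXTS) for y in CIPHERTEXTS}
    return {(x, y): p_joint[x, y] / p_cipher[y]
            for x, y in product(PLAINTEXTS, CIPHERTEXTS) if p_cipher[y] != 0}

# With equiprobable keys the pad has perfect secrecy: p(x|y) = p(x)
# for whatever a priori distribution we try, e.g. (2/3, 1/3).
p_plain = {0: Fraction(2, 3), 1: Fraction(1, 3)}
p_key = {0: Fraction(1, 2), 1: Fraction(1, 2)}
post = posteriors(p_plain, p_key)
assert all(post[x, y] == p_plain[x] for (x, y) in post)
```

Exact rational arithmetic via `Fraction` avoids any floating-point fuzz in the equality checks.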

Shannon’s Theorem 6 states that a necessary and sufficient condition for perfect secrecy is that $p(y|x) = p(y)$ for all $x \in \mathcal{P}$ and $y \in \mathcal{C}$.

The proof is a short application of Bayes’ Theorem: since $p(x|y)p(y) = p(y|x)p(x)$, and since we may choose $p$ so that $p(x) \neq 0$, we have $p(x|y) = p(x)$ if and only if $p(y|x) = p(y)$.
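Theorem 6 is easy to check numerically for a small system. The sketch below (my own example, using a shift cipher on $\mathbb{Z}_3$ with equiprobable keys) verifies that $p(y|x) = p(y)$ for every pair:

```python
from fractions import Fraction

# Shift cipher on Z_3 with equiprobable keys: e_k(x) = (x + k) mod 3.
# A numerical check of Theorem 6: p(y|x) = p(y) for all x and y.
n = 3
p_plain = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
p_key = {k: Fraction(1, 3) for k in range(n)}  # equiprobable keys

# p(y|x) is the total probability of the keys sending x to y
p_y_given_x = {(y, x): sum(p_key[k] for k in range(n) if (x + k) % n == y)
               for x in range(n) for y in range(n)}
# p(y) = sum over x of p(x) p(y|x)
p_y = {y: sum(p_plain[x] * p_y_given_x[y, x] for x in range(n)) for y in range(n)}

assert all(p_y_given_x[y, x] == p_y[y] for x in range(n) for y in range(n))
```

Here each $p(y|x)$ and each $p(y)$ equals $1/3$, whatever skewed *a priori* distribution we feed in.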

**Corollary.** *In a system with perfect secrecy, $|\mathcal{K}| \ge |\mathcal{C}|$. Moreover, if equality holds then every key must be used with equal probability and for each $x \in \mathcal{P}$ and $y \in \mathcal{C}$ there exists a unique key $k \in \mathcal{K}$ such that $e_k(x) = y$.*

*Proof* Fix a plaintext $x \in \mathcal{P}$. We claim that *for each $y \in \mathcal{C}$ there exists a key $k$ such that $e_k(x) = y$.* Indeed, if $y$ is never an encryption of $x$ then, for any choice of *a priori* probabilities that gives some probability to $x$, we have $p(y|x) = 0$. But, by our standing assumption, $y = e_\ell(x')$ for some plaintext $x'$ and key $\ell$; taking an *a priori* distribution that also gives some probability to $x'$, and any such $\ell$ that is chosen with non-zero probability, we have

$$p(y) \ge p(x')p(\ell) \neq 0 = p(y|x),$$

contradicting Theorem 6. Since $x$ has at most $|\mathcal{K}|$ different encryptions, the claim implies that $|\mathcal{K}| \ge |\mathcal{C}|$. Moreover if equality holds then for every $y \in \mathcal{C}$ there exists a unique $k \in \mathcal{K}$ such that $e_k(x) = y$. The conclusion of Theorem 6, that $p(y|x)$ is constant for $x \in \mathcal{P}$, then implies that each key is equiprobable.
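The equality case of the corollary cuts both ways: if $|\mathcal{K}| = |\mathcal{C}|$ and the keys are *not* equiprobable, perfect secrecy must fail. A quick check of this with a one-bit pad and biased keys (again my own example):

```python
from fractions import Fraction

# One-bit pad (e_k(x) = x XOR k) with |K| = |C| = 2 but biased keys.
# By the corollary, perfect secrecy must fail; Theorem 6 gives an easy test.
p_plain = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_key = {0: Fraction(3, 4), 1: Fraction(1, 4)}  # not equiprobable

# p(y|x): total probability of the keys sending x to y
p_y_given_x = {(y, x): sum(p_key[k] for k in (0, 1) if x ^ k == y)
               for x in (0, 1) for y in (0, 1)}
p_y = {y: sum(p_plain[x] * p_y_given_x[y, x] for x in (0, 1)) for y in (0, 1)}

# Theorem 6 fails: p(y=0 | x=0) = 3/4 while p(y=0) = 1/2.
assert p_y_given_x[0, 0] == Fraction(3, 4)
assert p_y[0] == Fraction(1, 2)
```

Observing the ciphertext $0$ now makes the plaintext $0$ more likely than it was *a priori*, so an eavesdropper learns something.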

The ‘only if’ direction of Theorem 2.4 in *Cryptography: theory and practice* (3rd edition) by Douglas Stinson is the corollary above, but, according to my reading of pages 48 and 50, interpreted with a different definition of perfect secrecy, in which the *a priori* distribution is fixed as part of the cryptosystem. Unfortunately this makes the result false. The diagram below shows a toy cryptosystem with two keys and two plaintexts.

Take $p(x_1) = 1$ and $p(x_2) = 0$. Then $p(x_1|y) = 1 = p(x_1)$ and $p(x_2|y) = 0 = p(x_2)$ for each possible ciphertext $y$, so the system has perfect secrecy, no matter what probabilities we assign to the keys.
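This can be checked mechanically. In the sketch below the encryption table is my own choice of a system of this shape (not copied from the diagram); the *a priori* distribution is fixed and degenerate, the keys are heavily biased, and yet every *a posteriori* probability equals the corresponding *a priori* one:

```python
from fractions import Fraction

# Two keys, two plaintexts, two ciphertexts; the table below is my own
# toy choice of such a system.  The a priori distribution is fixed with
# p(x1) = 1 and p(x2) = 0, as in the text.
enc = {('k1', 'x1'): 'y1', ('k1', 'x2'): 'y2',
       ('k2', 'x1'): 'y2', ('k2', 'x2'): 'y1'}
p_plain = {'x1': Fraction(1), 'x2': Fraction(0)}
p_key = {'k1': Fraction(9, 10), 'k2': Fraction(1, 10)}  # far from equiprobable

p_joint = {}
for (k, x), y in enc.items():
    p_joint[x, y] = p_joint.get((x, y), Fraction(0)) + p_plain[x] * p_key[k]
p_y = {y: sum(p_joint.get((x, y), Fraction(0)) for x in p_plain)
       for y in ('y1', 'y2')}

# Every ciphertext that can occur satisfies p(x|y) = p(x): perfect secrecy
# in the fixed-distribution sense, with keys that are not equiprobable.
for y in ('y1', 'y2'):
    if p_y[y] != 0:
        for x in p_plain:
            assert p_joint.get((x, y), Fraction(0)) / p_y[y] == p_plain[x]
```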

The error in the proof of Theorem 2.4 comes in the application of Bayes’ Law, where it is implicitly assumed that $p(x) \neq 0$. This shows that the extra layer of quantification in Shannon’s paper is not a mere technicality. Given the difficulties students have with nested quantifiers, I’m inclined to keep Stinson’s definition, and fix the problem by assuming $p(x) \neq 0$ for each $x \in \mathcal{P}$. (To be fair to Stinson, he observes before the theorem that plaintexts $x$ such that $p(x) = 0$ are never an obstacle to perfect secrecy, so clearly he was aware of the issue. He also assumes, as we have done, that every ciphertext $y \in \mathcal{C}$ is a possible encryption, but this is something else.)

Incidentally, there is a subtle English trap in Shannon’s paper: he says, quite correctly, that when $|\mathcal{K}| = |\mathcal{C}|$, ‘it is possible to obtain perfect secrecy with only this number of keys’. Here ‘with only’ does not mean the same as ‘only with’.