Shannon’s 1949 paper

September 20, 2017

In 1949 Claude Shannon published Communication theory of secrecy systems, Bell System Technical Journal 28, 656–715. The members of Bell Labs around this time included Richard Hamming, the three inventors of the transistor, and Harry Nyquist, of the Nyquist bound. Its technical journal published many seminal papers, including Shannon’s 1948 paper A mathematical theory of communication, defining entropy, and Hamming’s 1950 paper Error detecting and error correcting codes, defining Hamming distance and essentially inventing the modern ‘adversarial’ setting for coding theory.

Incidentally, the story goes that Shannon asked Von Neumann what he should call his new measure of information content and Von Neumann replied

You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.

Like Hamming’s paper, Shannon’s two papers still amply repay reading today. One idea introduced in the 1949 paper is ‘perfect secrecy’.

Perfect secrecy

Consider a cryptosystem with plaintexts \mathcal{P} and ciphertexts \mathcal{C}, with encryption functions e_k : \mathcal{P} \rightarrow \mathcal{C} parametrised by a set \mathcal{K} of keys.

Suppose we observe a ciphertext y \in \mathcal{C}: what, if anything, do we learn about the corresponding plaintext x \in \mathcal{P}? Shannon supposes that there is a probability distribution on the plaintexts, assigning an a priori probability p_x to each x \in \mathcal{P}. He defines the a posteriori probability \mathbb{P}[X = x | Y = y] to be the conditional probability of the plaintext x \in \mathcal{P} given that we observed the ciphertext y \in \mathcal{C}; the system then has perfect secrecy if, for any a priori probability distribution p_x, we have \mathbb{P}[X = x | Y = y] = p_x for all x \in \mathcal{P} and all y \in \mathcal{C}. (This implicitly assumes that \mathbb{P}[Y = y] > 0 for every y \in \mathcal{C}.)

Shannon proves in his Theorem 6 that a necessary and sufficient condition for perfect secrecy is that \mathbb{P}[Y = y|X = x] = \mathbb{P}[Y = y] for all x \in \mathcal{P} and y \in \mathcal{C}.

The proof is a short application of Bayes’ Theorem: since \mathbb{P}[X = x | Y = y] = \mathbb{P}[Y = y|X = x]p_x/\mathbb{P}[Y = y], and since we may choose p_x \not=0 (which is necessary anyway for the conditional probability to be well-defined), we have p_x = \mathbb{P}[X = x | Y = y] if and only if \mathbb{P}[Y = y|X = x] = \mathbb{P}[Y = y].

Corollary. In a system with perfect secrecy, |\mathcal{K}| \ge |\mathcal{C}|. Moreover, if |\mathcal{K}| = |\mathcal{C}| = |\mathcal{P}| then every key must be used with equal probability 1/|\mathcal{K}| and for each x \in \mathcal{P} and y \in \mathcal{C} there exists a unique k \in \mathcal{K} such that e_k(x) = y.

Proof. Fix a plaintext x \in \mathcal{P}. We claim that for each y \in \mathcal{C} there exists a key k such that e_k(x) = y. Indeed, if y_{\mathrm{bad}} \in \mathcal{C} is never an encryption of x then, for any choice of a priori probabilities that gives some probability to x, we have \mathbb{P}[Y = y_{\mathrm{bad}}|X = x] = 0, so by Shannon’s Theorem 6, \mathbb{P}[Y = y_\mathrm{bad}] = 0. But, by the implicit assumption, there is a non-zero chance of observing y_\mathrm{bad}, a contradiction.

Since x has at most |\mathcal{K}| different encryptions, the claim implies that |\mathcal{K}| \ge |\mathcal{C}|. Moreover, if |\mathcal{K}| = |\mathcal{C}| then for every y \in \mathcal{C} there exists a unique key k_{x,y} \in \mathcal{K} such that e_{k_{x,y}}(x) = y, and so \mathbb{P}[Y = y|X = x] = \mathbb{P}[K = k_{x,y}]. By Theorem 6 this probability equals \mathbb{P}[Y = y], so it does not depend on x. Now fix y \in \mathcal{C}. Since each e_k is injective, the keys k_{x,y} for x \in \mathcal{P} are distinct; when also |\mathcal{P}| = |\mathcal{K}|, every key is therefore k_{x,y} for some x. Hence every key has probability \mathbb{P}[Y = y], and so each key is equiprobable. \Box
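For small systems, Theorem 6 and the corollary can be checked by direct computation. The Haskell sketch below is an illustration, not taken from Shannon or Stinson: it assumes a hypothetical shift cipher on \mathbb{Z}_n, with e_k(x) = x + k mod n, so that |\mathcal{K}| = |\mathcal{C}| = |\mathcal{P}| = n, and computes a posteriori probabilities exactly with Rational arithmetic. Note that it tests one prior at a time, whereas Shannon's definition quantifies over all priors.

```haskell
-- A finite probability distribution as (value, probability) pairs.
type Dist = [(Int, Rational)]

-- Hypothetical toy cipher: the shift cipher on Z_n, e_k(x) = x + k mod n.
encrypt :: Int -> Int -> Int -> Int
encrypt n k x = (x + k) `mod` n

-- Probability that a distribution assigns to a value.
prob :: Dist -> Int -> Rational
prob d v = sum [ p | (v', p) <- d, v' == v ]

-- P[X = x, Y = y], with the key chosen independently of the plaintext.
joint :: Int -> Dist -> Dist -> Int -> Int -> Rational
joint n prior keys x y =
  prob prior x * sum [ pk | (k, pk) <- keys, encrypt n k x == y ]

-- The a posteriori probability P[X = x | Y = y], by Bayes' Theorem.
-- (Assumes P[Y = y] > 0, as in Shannon's implicit assumption.)
posterior :: Int -> Dist -> Dist -> Int -> Int -> Rational
posterior n prior keys x y = joint n prior keys x y / pY
  where pY = sum [ joint n prior keys x' y | x' <- [0 .. n - 1] ]

-- Perfect secrecy for this one prior: posterior equals prior for all x and y.
perfectFor :: Int -> Dist -> Dist -> Bool
perfectFor n prior keys =
  and [ posterior n prior keys x y == prob prior x
      | x <- [0 .. n - 1], y <- [0 .. n - 1] ]

-- Equiprobable keys 0, ..., n-1.
uniformKeys :: Int -> Dist
uniformKeys n = [ (k, 1 / fromIntegral n) | k <- [0 .. n - 1] ]
```

With the prior (1/2, 1/3, 1/6) on \mathbb{Z}_3, equiprobable keys give perfect secrecy, while keys 0, 1, 2 with probabilities 1/2, 1/4, 1/4 give \mathbb{P}[X = 0 | Y = 0] = 2/3 \not= 1/2.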

The ‘only if’ direction of Theorem 2.4 in Cryptography: theory and practice (3rd edition) by Douglas Stinson is the corollary above but, according to my reading of pages 48 and 50, interpreted with a different definition of perfect secrecy, in which the a priori distribution is fixed as part of the cryptosystem. Unfortunately this makes the result false. The diagram below shows a toy cryptosystem with two keys and two plaintexts.

Take p_w = 0 and p_x = 1. Then

\mathbb{P}[X = x | Y = y] = \mathbb{P}[X = x | Y = y'] = p_x = 1

and

\mathbb{P}[X = w | Y = y] = \mathbb{P}[X = w | Y = y'] = p_w = 0,

so the system has perfect secrecy, no matter what probabilities we assign to the keys. (Incidentally, setting p_{x} = 1 and \mathbb{P}[K = k] = 1 gives a cryptosystem where we always send x and observe the ciphertext y; the a posteriori probability of x is therefore the same as the a priori probability, so the system has perfect secrecy. This shows that Shannon’s implicit assumption is not just a technicality required to make the conditional probabilities well-defined.)

The error in the proof of Theorem 2.4 comes in the application of Bayes’ Law, where it is implicitly assumed that p_x \not= 0. This shows that the extra layer of quantification in Shannon’s paper is not a mere technicality. Given the difficulties students have with nested quantifiers, I’m inclined to keep Stinson’s definition, and fix the problem by assuming p_x\not =0 for each x \in \mathcal{P}. (To be fair to Stinson, he observes before the theorem that plaintexts x such that p_x = 0 are never an obstacle to perfect secrecy, so clearly he was aware of the issue. He also assumes, as we have done, that \mathbb{P}[Y = y] \not=0 for each y \in \mathcal{C}, but this is something else.)

Incidentally, there is a subtle English trap in Shannon’s paper: he says, quite correctly, that when |\mathcal{K}| = |\mathcal{C}|, ‘it is possible to obtain perfect secrecy with only this number of keys’. Here ‘with only’ does not mean the same as ‘only with’.

Example from permutation groups

Given a finite group G acting transitively on a set \Omega we obtain a cryptosystem with plaintexts and ciphertexts \Omega and encryption maps \alpha \mapsto \alpha g indexed by the elements of G. For which probability distributions on G does this cryptosystem have perfect secrecy?

Let H_\beta denote the point stabiliser of \beta \in \Omega. Since

\mathbb{P}[Y = \beta | X = \alpha] = \mathbb{P}[K \in g_{\alpha \beta} H_\beta],

where g_{\alpha \beta} is any element such that \alpha g_{\alpha\beta}= \beta, Shannon’s Theorem 6 shows that the cryptosystem has perfect secrecy if and only if, for each \beta \in \Omega,

\mathbb{P}[K \in g_{\alpha\beta} H_\beta]

is constant as \alpha varies over \Omega. Call this condition (\star).

Lemma 2. Suppose that G has a regular normal subgroup N. Any probability distribution constant on the cosets of N satisfies (\star).

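Proof. Since G = N \rtimes H_\beta, the subgroup H_\beta meets each coset of N in a unique element. The same holds for each left coset g H_\beta, since gh \in xN if and only if h \in g^{-1}xN. Hence g_{\alpha\beta}H_\beta and H_\beta each contain exactly one element of each coset of N and, since the distribution is constant on these cosets,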

\sum_{k \in g_{\alpha \beta} H_\beta} \mathbb{P}[k] = \sum_{k \in H_\beta} \mathbb{P}[k]

is independent of \alpha. \Box

In the case of the affine cipher, the converse also holds.

Theorem 3. Let p be a prime and let G be the affine group on \mathbb{F}_p with normal translation subgroup N. A probability distribution on G satisfies (\star) if and only if it is constant on each of the cosets of N.

Proof. Let r generate the multiplicative group of integers modulo p. Then G is a semidirect product N \rtimes H where N = \langle t \rangle is the regular subgroup generated by translation by +1 and the point stabiliser H = \langle h \rangle is generated by multiplication by r.

A probability distribution on G can be regarded as an element of the group algebra \mathbb{C}G. The probability distributions satisfying (\star) correspond to those \sum_{k \in G} c_k k \in \mathbb{C}G such that, for each \beta \in \mathbb{F}_p,

\sum_{k \in t^{-\alpha + \beta} H^{t^\beta}} c_k

is constant as \alpha varies over \mathbb{F}_p. Thus if U is the corresponding subspace of \mathbb{C}G, then U is invariant under left-multiplication by N and right-conjugation by N. But since kn = nk^n for k \in G and n \in N, taken together, these conditions are equivalent to invariance under left- and right-multiplication by N. It is therefore natural to ask for the decomposition of \mathbb{C}G as a representation of N \times N, where N \times N acts by

k \cdot (g,g') = g^{-1}k g'.

Since G/N is abelian, (Nh^\beta)t = (Nh^\beta)^t = Nh^\beta for all \beta. Hence \mathbb{C}G = \bigoplus_{\beta = 0}^{p-2} Nh^\beta, where h has order p-1. The calculation behind this observation is

h^\beta t = h^\beta t h^{-\beta} h^\beta = t^{s^\beta} h^\beta

where s = r^{-1} in \mathbb{F}_p. This shows that h^\beta \in Nh^\beta is stabilised by (t^{s^\beta}, t) \in N \times N (indeed t^{-s^\beta} h^\beta t = h^\beta), and so Nh^\beta, regarded as a representation of N \times N, is induced from the trivial representation of the subgroup of N \times N generated by (t^{s^\beta}, t). By Frobenius Reciprocity, Nh^\beta decomposes as a direct sum of a trivial representation of N \times N, spanned by

h^\beta + th^\beta + \cdots + t^{p-1}h^\beta,

and a further p-1 non-trivial representations, each with kernel \langle (t^{s^\beta}, t) \rangle. Critically, all (p-1)^2 of these non-trivial representations are non-isomorphic.

By Lemma 2, U contains all p-1 trivial representations. Writing T for their direct sum, we have \mathbb{C}G = T \oplus C for a unique complement C that decomposes uniquely as a sum of the non-trivial representations in the previous paragraph. Therefore if U properly contains T then U contains a non-trivial summand of some Nh^\beta, spanned by some u \in \mathbb{C}G of the form

b_0 h^\beta + b_1 t h^\beta + \cdots + b_{p-1} t^{p-1} h^\beta .

The support of u is h^\beta N, which meets each point stabiliser in a unique element. Hence all the b_i must be equal, a contradiction. \Box
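Condition (\star) is easy to test by brute force for small primes, which makes Theorem 3 concrete. In this sketch (an illustration, not part of the argument above) an element of the affine group is encoded as a pair (a, b) standing for the map x \mapsto ax + b on \mathbb{F}_p, acting on the right; the two example distributions for p = 5 are hypothetical.

```haskell
-- An element (a, b) of the affine group on F_p: the map x -> a*x + b, a /= 0.
type Affine = (Integer, Integer)

affineGroup :: Integer -> [Affine]
affineGroup p = [ (a, b) | a <- [1 .. p - 1], b <- [0 .. p - 1] ]

-- Right action of a group element on a point of F_p.
act :: Integer -> Integer -> Affine -> Integer
act p x (a, b) = (a * x + b) `mod` p

-- Condition (*): for each ciphertext beta, the probability that the key
-- sends alpha to beta is the same for every plaintext alpha.
satisfiesStar :: Integer -> (Affine -> Rational) -> Bool
satisfiesStar p prob = all constantIn [0 .. p - 1]
  where
    mass alpha beta = sum [ prob g | g <- affineGroup p, act p alpha g == beta ]
    constantIn beta = allEqual [ mass alpha beta | alpha <- [0 .. p - 1] ]
    allEqual xs = and (zipWith (==) xs (drop 1 xs))

-- Hypothetical example for p = 5: total weight a/10 on the coset of N of maps
-- with multiplier a, spread evenly over its 5 elements (constant on cosets of N).
cosetConstant :: Affine -> Rational
cosetConstant (a, _) = fromInteger a / 50

-- Hypothetical counter-example: uniform on the stabiliser of 0 (maps x -> a*x).
onStabiliser :: Affine -> Rational
onStabiliser (_, b) = if b == 0 then 1 / 4 else 0
```

As Lemma 2 predicts, the coset-constant distribution satisfies (\star); the distribution concentrated on a point stabiliser does not.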

This should generalize to an arbitrary Frobenius group with abelian kernel, and probably to other permutation groups having a regular normal subgroup.

The key equivocation given a ciphertext

In practice, we might care more about what an observed ciphertext tells us about the key. This question is considered in Part B of Shannon’s paper. Let X, Y and K be random variables representing the plaintext, ciphertext and key, respectively. We assume that X and K are independent. (This is easily motivated: for instance, it holds if the key is chosen before the plaintext, and the plaintext is not a message about the key.) The conditional entropy H(K | Y), defined by

\begin{aligned} H(K | Y) &= \sum_{y \in \mathcal{C}} \mathbb{P}[Y=y] H(K | Y = y) \\ &= \sum_{y \in \mathcal{C}} \mathbb{P}[Y=y] \sum_{k \in \mathcal{K}} \mathbb{P}[K = k | Y = y] \log \frac{1}{\mathbb{P}[K = k | Y = y]} \end{aligned}

represents our expected uncertainty, or ‘equivocation’ to use Shannon’s term, in the key after we have observed a ciphertext. (Throughout, \log denotes logarithm in base 2.)

If we know the key then we know X if and only if we know Y. Therefore the joint entropies H(K, X) and H(K, Y) agree and

H(K | Y) + H(Y) = H(K,Y) = H(K,X) = H(K) + H(X)

by the independence assumption. Hence

H(K | Y) = H(K) + H(X) - H(Y),

a formula that is now a textbook staple.

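Shannon considers a toy cryptosystem with keys \mathcal{K} = \{k,k'\} and \mathcal{P} = \mathcal{C} = \{0,1\}^n, in which a plaintext is encrypted bit-by-bit: as itself if K = k, and by flipping each bit if K = k'. Suppose that the keys are equiprobable, and that each plaintext bit is 0 independently with probability p, so the probability of a plaintext having exactly s zeros is \binom{n}{s} p^s (1-p)^{n-s}. Denoting this quantity by p_s we have H(X) = -\sum_{s=0}^n p_s \log p_s. The probability that the ciphertext Y has exactly s zeros is (p_s + p_{n-s})/2. (Strictly, these sums are the entropies of the number of zeros in X and in Y. Given its number of zeros s, each of X and Y is uniform on the \binom{n}{s} strings with s zeros, so the true entropies H(X) and H(Y) both exceed these sums by \sum_{s=0}^n p_s \log \binom{n}{s}, and the difference H(X) - H(Y) needed below is unaffected.) Therefore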

\begin{aligned} H(Y) &= -\sum_{s=0}^n \frac{p_s + p_{n-s}}{2} \log \frac{p_s + p_{n-s}}{2} \\ &= -\sum_{s=0}^n p_s \log \frac{p_s + p_{n-s}}{2}. \end{aligned}

and so

\begin{aligned} H(K{}&{} |Y) \\ &= 1 - \sum_{s=0}^n p_s \log \frac{p_s}{(p_s+p_{n-s})/2} \\ &= -\sum_{s=0}^n p_s \log \frac{p_s}{p_s + p_{n-s}} \\ &= -\sum_{s=0}^n \binom{n}{s} p^s (1-p)^{n-s} \log \frac{p^s (1-p)^{n-s}}{p^s (1-p)^{n-s} + p^{n-s}(1-p)^s} \\ &= \sum_{s=0}^n \binom{n}{s} p^s (1-p)^{n-s} \log \bigl(1 + p^{n-2s}(1-p)^{2s-n} \bigr) \\ &= \sum_{s=0}^n \binom{n}{s} p^s (1-p)^{n-s} \log \bigl(1 + q^{2s-n} \bigr). \end{aligned}

where q = (1-p)/p.
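The closed form can be checked numerically against H(K) + H(X) - H(Y), computed from the true entropies of X and Y. In this Double-precision sketch the function names are hypothetical; it groups the 2^n strings by their number of zeros, since all strings with s zeros have the same probability.

```haskell
-- Shannon's toy cipher on n-bit strings: each plaintext bit is 0 with
-- probability p; key k sends x to x, key k' flips every bit; keys equiprobable.

log2 :: Double -> Double
log2 = logBase 2

-- Binomial coefficient as a Double (accurate enough for the small n used here).
choose :: Int -> Int -> Double
choose n s = product [ fromIntegral (n - s + i) / fromIntegral i | i <- [1 .. s] ]

-- Probability of one particular plaintext string with s zeros.
single :: Double -> Int -> Int -> Double
single p n s = p ^ s * (1 - p) ^ (n - s)

-- H(K | Y) from the closed form, with q = (1 - p) / p.
equivocationClosed :: Double -> Int -> Double
equivocationClosed p n =
  sum [ choose n s * single p n s * log2 (1 + q ** fromIntegral (2 * s - n))
      | s <- [0 .. n] ]
  where q = (1 - p) / p

-- H(K | Y) = H(K) + H(X) - H(Y), using the true entropies of X and Y.
-- A ciphertext with s zeros comes from itself or its complement (n - s zeros).
equivocationEntropy :: Double -> Int -> Double
equivocationEntropy p n = 1 + hX - hY
  where
    entropy xs = negate (sum [ c * x * log2 x | (c, x) <- xs, x > 0 ])
    hX = entropy [ (choose n s, single p n s) | s <- [0 .. n] ]
    hY = entropy [ (choose n s, (single p n s + single p n (n - s)) / 2) | s <- [0 .. n] ]
```

For p = 1/2 both functions return 1, and for other p they agree up to rounding error.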

The graph on page 690 of Shannon’s paper shows H(K | Y) against n for two values of p. He does not attempt to analyse the formula any further, remarking ‘yet already the formulas are so involved as to be nearly useless’. I’m not convinced this is the case, but the analysis is certainly fiddly.

When p = \frac{1}{2}, Shannon’s formula gives H(K | Y) = 1, as expected. By the symmetry between p and 1-p we may assume that p > 1/2, and so q < 1. Take n = 2m. The summand for s = m+u is

\binom{2m}{m+u} p^{m+u}(1-p)^{m-u} \log (1 + q^{2u}).

Since \binom{2m}{m+u}(m-u) = \binom{2m}{m+u+1}(m+u+1), the ratio of the summands for m+u and m+u+1 is

\displaystyle \frac{m+u+1}{m-u} \frac{1-p}{p} \frac{\log (1 + q^{2u})}{\log (1 + q^{2u+2})}.

When u \ge 0 we can use the inequalities

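x \ge \ln (1+x) \ge x - x^2/2

for x \in [0,1], where \ln is the natural logarithm (the base does not matter here, since only a ratio of logarithms appears), to bound the ratio below by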

\displaystyle \frac{m+u+1}{m-u} \Bigl( \frac{1}{q} -  q^{2u-1}/2 \Bigr).

Therefore the summands in Shannon’s formula get exponentially small as s increases from n/2: each is at least roughly 1/q > 1 times the next. Their total contribution can be bounded by summing a geometric series, and is of the same order as the middle term. (This does not require the contribution from the binomial coefficient, which is initially small but eventually dominates.)

Going the other way, the ratio of the summands for m-t and m-t-1 is

\displaystyle \frac{m+t+1}{m-t} \frac{p}{1-p} \frac{\log (1 + q^{-2t})}{\log (1 + q^{-2t-2})}

which can be rewritten as

\displaystyle \frac{m+t+1}{m-t} \frac{1}{q} \frac{2t \log \frac{1}{q} + \log (1+q^{2t})}{(2t+2)\log \frac{1}{q} + \log (1+q^{2t+2})}.

It is useful to set p = 1/2 + \rho. If the final fraction were t/(t+1) then the maximum would occur when t/(t+1) = q, i.e. when t = q/(1-q) = (1-p)/(2p-1) = (1/2 -\rho)/(2\rho). Numerical tests suggest this gives about the right location of the maximum when m is large (so the first fraction can be neglected) and p is near to 1/2.
For t of the same order as m the first fraction is dominant and gives exponential decay. It is therefore not so surprising that the middle term gives a reasonable lower bound for the entire series, namely

H(K | Y) \ge \frac{A}{\sqrt{n}} \exp (-2 \rho^2 n )

for some constant A. The graph below shows H(K|Y) and the middle summand for p = 1/2 + 1/20, 3/5, 2/3, 4/5 with colours red, green, blue, black.

It seems possible that the lower bound is, up to a multiplicative constant, also an upper bound. However the argument above will show at most that H(K | Y) \le A'\sqrt{n} \exp(-2 \rho^2 n).

Upper bound

A slightly weaker upper bound follows from tail estimates for binomial probabilities. (Update. There might be a stronger result using the Central Limit Theorem.) Let S be distributed as \mathrm{Bin}(p,2m), so \mathbb{P}[S= s] = \binom{2m}{s}p^s (1-p)^{2m-s}. By Hoeffding’s inequality,

\mathbb{P}[S - 2pm \le -2\epsilon m] \le \mathrm{e}^{-4 \epsilon^2 m}.

The argument by exponential decay shows that the contribution to H(K | Y) from the summands for s > m is of the same order as the middle term. Using \log(1+ q^{2s-n}) \le 1 + (n-2s) \log (1/q) we get

\mathbb{P}[S \le n/2] + 2\log (1/q) \sum_{s=0}^m (m-s) \mathbb{P}[S = s]

as an upper bound for the remaining terms. By a standard trick, related to the formula \mathbb{E}[X] = \mathbb{P}[X \ge 0] + \mathbb{P}[X \ge 1] + \cdots for the expectation of a random variable taking values in \mathbb{N}_0, we have

\sum_{s=0}^m (m-s)\mathbb{P}[S=s] = \sum_{s=0}^{m-1} \mathbb{P}[S\le s].
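The ‘standard trick’ amounts to exchanging the order of summation. As a quick sanity check, this sketch compares both sides exactly (Rational arithmetic, so equality is exact) for a few values of p and m, writing \mathrm{Bin}(p, 2m) as in the text; the function names are hypothetical.

```haskell
-- P[S = s] for S ~ Bin(p, n), in the text's (probability, trials) notation.
binomPmf :: Rational -> Integer -> Integer -> Rational
binomPmf p n s = fromInteger (choose n s) * p ^ s * (1 - p) ^ (n - s)
  where choose a b = product [a - b + 1 .. a] `div` product [1 .. b]

-- Left-hand side: sum over s = 0..m of (m - s) P[S = s], with S ~ Bin(p, 2m).
lhs :: Rational -> Integer -> Rational
lhs p m = sum [ fromInteger (m - s) * binomPmf p (2 * m) s | s <- [0 .. m] ]

-- Right-hand side: sum over s = 0..m-1 of P[S <= s].
rhs :: Rational -> Integer -> Rational
rhs p m = sum [ cdf s | s <- [0 .. m - 1] ]
  where cdf s = sum [ binomPmf p (2 * m) j | j <- [0 .. s] ]
```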

Take \epsilon = p - 1/2 + \alpha in this version of Hoeffding’s inequality; since S - 2pm \le -2\epsilon m if and only if S \le m(1-2\alpha), we get

\begin{aligned} \mathbb{P}[S \le m(1-2\alpha)] &\le \mathrm{e}^{-4 (p-1/2+\alpha)^2 m} \\ &= \mathrm{e}^{-4 (p-1/2)^2 m} \mathrm{e}^{-8 (p-1/2) \alpha m - 4 \alpha^2 m} \\ &\le \mathrm{e}^{-4 \rho^2 m} \mathrm{e}^{-8 \rho \alpha m}. \end{aligned}

Thus the upper bounds for \mathbb{P}[S \le s] become exponentially smaller as we decrease s from m by as much as \alpha m, for any \alpha > 0. By summing a geometric series, as before, we get

\sum_{s=0}^{m-1} \mathbb{P}[S\le s] \le Bm \mathrm{e}^{-4 \rho^2 m}

for some constant B. The neglected contributions are of the order of the middle term, so bounded above by \mathrm{e}^{-4 \rho^2 m}. Therefore

H(K | Y) \le Cm \mathrm{e}^{-4 \rho^2 m}

for some further constant C.

Random ciphers and unicity distance

A selection effect

Imagine a hypothetical world where families have either no children, an only child, or two twins, with probabilities 1/4, 1/2, 1/4. The mean number of children per family is therefore 1. In an attempt to confirm this empirically, a researcher goes to a primary school and asks each child to state his or her number of siblings. Twin-families are half as frequent as only-families, but send two representatives to the school rather than one: these effects cancel, so the researcher observes equal numbers of children reporting 0 and 1 siblings. (Families with no children are, of course, never sampled.) The estimate for the mean number of children is therefore the inflated 3/2.
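The selection effect is a two-line calculation: surveying children rather than families weights each family by its number of children. A minimal sketch with the hypothetical probabilities above hard-coded:

```haskell
-- The hypothetical world: (number of children, probability of such a family).
families :: [(Integer, Rational)]
families = [(0, 1/4), (1, 1/2), (2, 1/4)]

-- True mean number of children per family.
trueMean :: Rational
trueMean = sum [ fromInteger c * w | (c, w) <- families ]

-- Surveying children weights each family by its number of children;
-- a child reporting j siblings comes from a family with j + 1 children.
surveyMean :: Rational
surveyMean = sum [ fromInteger (c * c) * w | (c, w) <- families ]
           / sum [ fromInteger c * w | (c, w) <- families ]
```

Here trueMean is 1 while surveyMean is the inflated 3/2.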

The random cipher

Shannon’s proof of the Noisy Coding Theorem for the memoryless binary channel is a mathematical gem; his chief insight was that a random binary code of suitable size can (with high probability) be used as part of a coding scheme that achieves the channel capacity. In his 1949 paper he considers the analogous random cipher.

Let \mathcal{A} = \{\mathrm{a}, \ldots, \mathrm{z}\} be the Roman alphabet. Let \mathcal{P}_n = \mathcal{C}_n = \mathcal{A}^n and let X_n and Y_n be the random variables recording the plaintext and ciphertext, respectively. We suppose that the message is chosen uniformly at random from those plaintexts that make good sense in English (once spaces are inserted). In yet another fascinating paper Shannon estimated that the per-character redundancy of English, R say, is between 3.2 and 3.7 bits. Thus H(X_n) = (\log_2 26 - R)n and the number of plausible plaintexts is S_n = 2^{(\log_2 26 - R)n}. Let T_n = |\mathcal{P}_n| = 26^n.

Fix k \in \mathbb{N}. The random cipher with k keys is constructed by choosing, for each ciphertext y \in \mathcal{C}_n, exactly k plaintexts in \mathcal{P}_n to be the decryptions of y under the k keys, which are each chosen with equal probability 1/k. The choice of plaintexts is made uniformly at random from \mathcal{P}_n, so the decryptions need not be distinct. Let g(y) be the number of plausible decryptions of the ciphertext y \in \mathcal{C}_n. Thus g : \mathcal{C}_n \rightarrow \mathbb{N}_0 is a random quantity, where the randomness comes from the choices made in the construction of the random cipher. If Z_n is chosen uniformly at random from \mathcal{C}_n then

\displaystyle \mathbb{P}[g(Z_n) = m] = \binom{k}{m} \bigl( \frac{S_n}{T_n} \bigr)^m \bigl( 1 - \frac{S_n}{T_n} \bigr)^{k-m}

and so g(Z_n) is distributed binomially as \mathrm{Bin}(S_n/T_n, k).

However this is not the distribution of g(Y_n): since Y_n is the encryption of a plausible plaintext, ciphertexts y with a high g(y) are more frequent while, as in the family example, ciphertexts y such that g(y) = 0 are never seen at all.

Lemma 4. \mathbb{P}[g(Y_n) = m] = \displaystyle \frac{T_n m}{S_n k} \mathbb{P}[g(Z_n) = m].

Proof. Let \mathcal{L}_n(y) be the multiset containing the g(y) plausible plaintexts that are decryptions of y \in \mathcal{C}_n. Conditioning on the event that x \in \mathcal{L}_n(y) we have

\begin{aligned} \mathbb{P}[g(Y_n) = m] &= \sum_{y \in \mathcal{C}_n \atop g(y) = m} \mathbb{P}[Y_n=y] \\ &= \sum_{y \in \mathcal{C}_n \atop g(y) = m} \sum_{x \in \mathcal{L}_n(y)} \mathbb{P}[Y_n=y|X_n=x]\mathbb{P}[X_n=x] \\  &= \sum_{y \in \mathcal{C}_n \atop g(y) = m} \sum_{x \in \mathcal{L}_n(y)} \frac{1}{k} \frac{1}{S_n} \\ &= \sum_{y \in \mathcal{C}_n \atop g(y) = m} \frac{m}{kS_n} \\ &= T_n \mathbb{P}[g(Z_n) = m] \frac{m}{kS_n} \\ &= \frac{T_n m}{S_n k} \mathbb{P}[g(Z_n) = m] \end{aligned}

as required. \Box

Corollary 5. The random variable g(Y_n) is distributed as 1+ \mathrm{Bin}(S_n/T_n, k-1).

Proof. By Lemma 4 and the identity m \binom{k}{m} = k \binom{k-1}{m-1} we have

\begin{aligned} \mathbb{P}[g(Y_n) = m] &= \frac{T_n m}{S_n k}\binom{k}{m} \bigl( \frac{S_n}{T_n} \bigr)^m \bigl( 1 - \frac{S_n}{T_n} \bigr)^{k-m}\\ &= \binom{k-1}{m-1}\bigl( \frac{S_n}{T_n} \bigr)^{m-1} \bigl( 1 - \frac{S_n}{T_n} \bigr)^{k-m}. \end{aligned}

Hence g(Y_n)-1 is distributed as \mathrm{Bin}(S_n/T_n, k-1). \Box
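Lemma 4 and Corollary 5 reduce to an identity between binomial probabilities, which can be checked exactly. In this sketch theta stands for S_n/T_n; for English this ratio is astronomically small, so small rational values are used purely to keep the check exact. The function names are hypothetical.

```haskell
-- Binomial coefficient, zero outside 0 <= r <= n.
choose :: Integer -> Integer -> Integer
choose n r
  | r < 0 || r > n = 0
  | otherwise      = product [n - r + 1 .. n] `div` product [1 .. r]

-- P[Bin(theta, k) = m], with Bin(probability, trials) as in the text; needs 0 <= m <= k.
binom :: Rational -> Integer -> Integer -> Rational
binom theta k m = fromInteger (choose k m) * theta ^ m * (1 - theta) ^ (k - m)

-- Lemma 4 with theta = S_n / T_n: P[g(Y_n) = m] = (m / (theta k)) P[g(Z_n) = m].
lemma4 :: Rational -> Integer -> Integer -> Rational
lemma4 theta k m = fromInteger m / (theta * fromInteger k) * binom theta k m

-- Corollary 5: the same probability is P[1 + Bin(theta, k - 1) = m].
corollary5 :: Rational -> Integer -> Integer -> Rational
corollary5 theta k m = binom theta (k - 1) (m - 1)
```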

It feels like there should be a quick direct proof of the corollary, along the lines ‘we know Y has one plausible decrypt; each of the remaining k-1 is plausible with probability S_n/T_n, hence …’. But this seems dangerously close to ‘we know two fair coin flips gave at least one head; the other flip is a head with probability 1/2, hence …’, which gives the wrong answer. The difference is that the plausible decrypt of Y comes with a known key, whereas the ‘at least one head’ could be either flip. Given the subtle nature of selection effects and conditional probability, I prefer the calculation in the lemma.

Shannon’s paper replaces the lemma with the comment ‘The probability of such a cryptogram [our event g(Y) = m] is mT/SK, since it can be produced by m keys from high probability messages [our plausible plaintexts] each with probability T/S.’ I cannot follow this: in particular T/S cannot be a probability, since it is far greater than 1.

Given that Y_n = y \in \mathcal{C}_n, the entropy in the key is H(K|Y_n = y) = \log g(y). Therefore, going back to the lemma, we have

\begin{aligned} H(K|Y_n) &= \sum_{m=1}^k \mathbb{P}[g(Y_n) = m] \log m  \\ &= \sum_{m=1}^k \frac{T_n}{S_n k} \mathbb{P}[g(Z_n) = m] m \log m.\end{aligned}

Shannon argues that if k is large compared to n then \log m is almost constant for m near the mean kS_n/T_n of g(Z_n), and so the expected value can be approximated by

\frac{T_n}{S_n k} \frac{kS_n}{T_n} \log \frac{kS_n}{T_n} = \log k + \log S_n - \log T_n.

Observe that \log S_n - \log T_n = -nR, where R is the per-character redundancy of English. Therefore Shannon’s approximation becomes

H(K|Y_n) \approx \log k - nR.

When n is large compared to k, Shannon uses a Poisson approximation. As an alternative, we argue from Corollary 5. Let Z^- be distributed as \mathrm{Bin}(S_n/T_n, k-1). We have

\begin{aligned} H(K|Y_n) &= \mathbb{E}[\log (1 + Z^-)]  \\ &\approx \log \bigl( 1+\mathbb{E} [Z^-] \bigr) \\ &= \log \bigl( 1 + (k-1)S_n/T_n \bigr) \\ &\approx (k-1)S_n/T_n  \\ &\approx k2^{-nR}.\end{aligned}

The graph of H(K|Y_n) is therefore as sketched below.

The quantity H(K) / R = (\log k) / R is known as the unicity distance of the cipher. Roughly, one expects that once the number n of observed ciphertext characters exceeds the unicity distance, the key will be substantially known.
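To see how good these approximations are, one can compute H(K|Y_n) = \mathbb{E}[\log(1 + Z^-)] exactly from Corollary 5. The sketch below assumes R = 3.2 bits (the lower end of Shannon's estimate) and a hypothetical cipher with k = 2^{12} keys, so the unicity distance is 12/3.2 = 3.75 characters; the function names are illustrative.

```haskell
-- Assumed per-character redundancy of English, in bits (the text quotes 3.2 to 3.7).
redundancy :: Double
redundancy = 3.2

-- theta = S_n / T_n = 2^(-nR), the chance that a random n-character string is plausible.
theta :: Int -> Double
theta n = 2 ** negate (redundancy * fromIntegral n)

-- Exact H(K | Y_n) from Corollary 5: E[log2 (1 + Z)] with Z ~ Bin(theta, k - 1).
-- The pmf comes from P[Z = j + 1] = P[Z = j] * (k-1-j)/(j+1) * theta/(1 - theta).
-- Assumes n >= 1, so that theta < 1.
equivocation :: Int -> Int -> Double
equivocation k n = sum (zipWith term [0 ..] pmf)
  where
    t = theta n
    trials = k - 1
    p0 = (1 - t) ^ trials
    step pj j = pj * fromIntegral (trials - j) / fromIntegral (j + 1) * t / (1 - t)
    pmf = scanl step p0 [0 .. trials - 1]
    term :: Int -> Double -> Double
    term j pj = pj * logBase 2 (1 + fromIntegral j)

-- Shannon's approximation log2 k - nR, for k large compared with T_n / S_n.
largeKeyApprox :: Int -> Int -> Double
largeKeyApprox k n = logBase 2 (fromIntegral k) - redundancy * fromIntegral n

-- The approximation (k - 1) S_n / T_n, for when this quantity is small.
smallKeyApprox :: Int -> Int -> Double
smallKeyApprox k n = fromIntegral (k - 1) * theta n
```

With these assumptions, the exact value at n = 1 is within 0.05 of \log k - nR = 8.8, and at n = 10 it agrees with (k-1)S_n/T_n to within 1%.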


Cipher Systems: possible extras

September 12, 2017

This year I’m lecturing our course Cipher Systems for the first time. The purpose of this post is to record some of the ideas that I hope to at least touch on in the course.

Cryptocurrency (1) RSA Scheme

Of course the Bitcoin blockchain is a splendid advertisement for hash functions. But here I want to give a much simpler example of a toy cryptocurrency in the RSA setup.

TTTT (Totally Trusted Transmission Technology) is going into the cryptocurrency game. In readiness, its RSA modulus N and encryption exponent e are prominently posted on the corporate website. TTTT issues currency units by signing the string ‘TTTT promises faithfully to redeem BitFlop x for £100’, where x is the serial number of the relevant BitFlop. Let f(x) \in \{0,1,\ldots, N-1\} be the number representing this string for BitFlop x. (The function f is public knowledge and is injective.) We say f(x) is an unsigned BitFlop.

In the proposed protocol, a customer wishing to buy a BitFlop, Alice say, sends TTTT a self-addressed envelope stuffed with 20 used fivers. TTTT removes the fivers and inserts a signed BitFlop f(x)^d mod N, neatly typed out on letterhead paper. The serial number x is inscribed in red ink in the company’s ledgers. Alice, as well as anyone she gives the signed BitFlop to, can calculate (f(x)^d)^e = f(x) mod N, read the string containing the serial number, and know they have a legitimate BitFlop signed by TTTT. To redeem a BitFlop the postal process is reversed, and x is crossed off from the ledger.
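A toy version of the signing step, using the classic textbook key N = 61 \cdot 53 = 3233, e = 17, d = 2753 (far too small for any real use). The encoding unsignedBitFlop x = 1000 + x is a hypothetical stand-in for the public injective function f.

```haskell
-- Square-and-multiply modular exponentiation: b^e mod m, for e >= 0.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 m = 1 `mod` m
powMod b e m
  | even e    = half * half `mod` m
  | otherwise = b * powMod b (e - 1) m `mod` m
  where half = powMod b (e `div` 2) m

-- Classic textbook toy key: N = 61 * 53, phi(N) = 3120, 17 * 2753 = 1 mod 3120.
modulusN, pubE, privD :: Integer
modulusN = 3233
pubE = 17
privD = 2753

-- Hypothetical injective public encoding f of the promise for serial number x
-- (valid for 0 <= x < 2233, so that f(x) < N).
unsignedBitFlop :: Integer -> Integer
unsignedBitFlop x = 1000 + x

-- TTTT's signing step, which needs the secret exponent d.
signBitFlop :: Integer -> Integer
signBitFlop x = powMod (unsignedBitFlop x) privD modulusN

-- Anyone can check a signed BitFlop with the public pair (N, e).
checkBitFlop :: Integer -> Integer -> Bool
checkBitFlop x s = powMod s pubE modulusN == unsignedBitFlop x
```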

One of the many problems with this scheme is that there is nothing to stop a nefarious Alice (with access to a photocopier) from passing the same signed BitFlop onto both Bob and Charlie. To get round this, TTTT decides to publish its ledger of issued BitFlop serial numbers on the web, reasoning that Bob can then check that he has received an unredeemed BitFlop.

Another drawback is that if Alice buys a BitFlop from TTTT and then gives it to Bob, who then redeems it, TTTT can connect Alice and Bob. (Admittedly not necessarily as parties in the same transaction because of the possibility of a chain Alice to Charlie to Bob.)

Blind signing

To get around the second drawback, TTTT decides to use blind signing.

In the new protocol, if Alice wishes to transfer money to Bob, Bob (not Alice) gets TTTT to send him f(x), representing a ‘candidate-BitFlop’ with serial number x. TTTT enters x in a new ledger. Bob passes f(x) to Alice, who calculates the product a^e f(x) mod N for some privately chosen a coprime to N. Alice then sends TTTT this number, along with the usual envelope of used fivers, and receives by return

(a^e f(x))^d = a^{ed}f(x)^d = a f(x)^d \text{ mod } N.

To transfer the BitFlop, Alice divides by a and gives the signed BitFlop f(x)^d to Bob. Bob can check that x appears in the public ledger and later redeem the BitFlop. Over the course of the protocol:

  • Alice learns: a, f(x), f(x)^d
  • Bob learns: f(x), f(x)^d
  • TTTT learns f(x), that Alice has paid for a signature of a^e f(x), that Bob has redeemed the signed BitFlop f(x)^d.

Assuming (generously enough) that TTTT has many customers, there is no way for TTTT to associate f(x) with a^e f(x), or Alice’s transaction with Bob’s.
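The blinding algebra can be checked with the same hypothetical toy key N = 61 \cdot 53, e = 17, d = 2753. The inverse of a modulo N is computed by the extended Euclidean algorithm; the values of f(x) and a used in the check are arbitrary.

```haskell
-- Square-and-multiply modular exponentiation: b^e mod m, for e >= 0.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 m = 1 `mod` m
powMod b e m
  | even e    = half * half `mod` m
  | otherwise = b * powMod b (e - 1) m `mod` m
  where half = powMod b (e `div` 2) m

-- Inverse of a modulo m via the extended Euclidean algorithm (needs gcd a m == 1).
invMod :: Integer -> Integer -> Integer
invMod a m = x `mod` m
  where
    (_, x, _) = egcd a m
    -- egcd u v returns (g, s, t) with u*s + v*t == g == gcd u v.
    egcd 0 v = (v, 0, 1)
    egcd u v = let (g, s, t) = egcd (v `mod` u) u
               in (g, t - (v `div` u) * s, s)

-- Hypothetical toy key: N = 61 * 53, e = 17, d = 2753.
modulusN, pubE, privD :: Integer
modulusN = 3233
pubE = 17
privD = 2753

-- Alice blinds Bob's unsigned BitFlop f(x) with her secret a.
blind :: Integer -> Integer -> Integer
blind a fx = powMod a pubE modulusN * fx `mod` modulusN

-- TTTT signs whatever it is paid to sign; it only ever sees a^e f(x) mod N.
ttttSign :: Integer -> Integer
ttttSign m = powMod m privD modulusN

-- Alice removes the blinding factor: a f(x)^d * a^(-1) = f(x)^d mod N.
unblind :: Integer -> Integer -> Integer
unblind a s = s * invMod a modulusN `mod` modulusN
```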


  1. What protection does BitFlop have against double spending?
  2. TTTT decides it would be a nice touch to allow customers to choose their preferred serial number and encryption exponent. (TTTT knows the factorization N = pq, so can easily compute the decryption exponent d' for any encryption exponent e' invertible modulo \phi(N).) What could possibly go wrong?
  3. While still depending on the postal system to receive used fivers, TTTT decides that email could be used for all other customer communications. Emails have to be encrypted, but no problem, TTTT already has N and e published on its website. What could possibly go wrong, under (a) the blind-signing scheme, (b) the original scheme?

Answers and discussion

  1. TTTT is protected by its ledger system. Using a private ledger there is nothing to stop Alice passing the same BitFlop to both Bob and Charlie; if Bob redeems the BitFlop first, he gets its full value, while Charlie gets a nasty surprise later. Using a public ledger Bob and Charlie can verify the BitFlop is unspent at the time they accept it, but are then in a race to redeem it. (Switching from post to email doesn’t really help: it just means the race is run faster.)

    This might be acceptable: for instance if Alice is in debt to Bob, and pays Bob by BitFlop, then Bob can simply wait for the envelope of used fivers to arrive before agreeing the debt is settled. Alice cannot settle two debts in this way, while Bob cannot plausibly claim to have had the transaction refused by TTTT when x is removed from the public ledger.

    In other cases, for instance if Alice wishes to purchase a physical object from Bob, some escrow system or degree of trust is needed even for less dubious schemes than BitFlop.

  2. When Alice’s request for an encryption exponent e such that (e, \phi(N)) \not= 1 is politely refused by TTTT, she learns a prime factor of p-1 or q-1. This could be useful in a Pollard p-1 attack.
  3. (a) Eve, who is snooping on communications between Bob and TTTT, intercepts an encrypted message M^e mod N from Bob to TTTT. Eve sends M^e mod N to TTTT, along with the usual envelope of used fivers. TTTT, believing that M^e mod N is an obfuscated unsigned BitFlop a^e f(x) mod N, happily sends back (M^e)^d = M to Eve. So for the price of one BitFlop, Eve has obtained the supposedly confidential message M.

    (b) One day Alice asks TTTT to start signing its messages. Clearly authentication is good practice, so TTTT agrees. A typical message from TTTT to Alice, signed using the hash function h, is a pair (M^{e_A} \text{ mod } N_A,h(M)^d \text{ mod } N). Alice notes that any email to TTTT is immediately bounced back with response ‘Dear Ms. Alice, your custom is important to us. TTTT will reply within four months …’. She carefully crafts a message from a Ms. TYH!ubN(CZ…, such that TTTT’s automatic response M has hash h(M) = f(x) for some serial number x. (Again this is rather easy if f is bijective.) The signed message is then (?, f(x)^d \text{ mod } N), giving Alice (and anyone else snooping) the signed BitFlop with serial number x. Of course this x does not appear in the company’s ledger, so this is most obviously a problem for the first scheme. But even with the public ledger, a malicious Alice can bombard TTTT with emails and so acquire a large number of ‘protoBitFlops’. Publishing each f(x)^d destroys an unledgered BitFlop x, and creates a race to redeem if x is already in the ledger.
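Answer 3(a) can be replayed with the hypothetical toy key N = 61 \cdot 53, e = 17, d = 2753: TTTT's blind-signing service is simply the RSA decryption map, so it decrypts anything it is sent. The function names are illustrative.

```haskell
-- Square-and-multiply modular exponentiation: b^e mod m, for e >= 0.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 m = 1 `mod` m
powMod b e m
  | even e    = half * half `mod` m
  | otherwise = b * powMod b (e - 1) m `mod` m
  where half = powMod b (e `div` 2) m

-- Hypothetical toy key: N = 61 * 53, e = 17, d = 2753.
modulusN, pubE, privD :: Integer
modulusN = 3233
pubE = 17
privD = 2753

-- Bob encrypts a confidential message M < N for TTTT with the published (N, e).
encryptForTTTT :: Integer -> Integer
encryptForTTTT m = powMod m pubE modulusN

-- TTTT's blind-signing service raises anything it is paid for to the power d.
blindSigningService :: Integer -> Integer
blindSigningService c = powMod c privD modulusN

-- Eve simply submits the intercepted ciphertext, as if it were a^e f(x) mod N.
eveRecovers :: Integer -> Integer
eveRecovers intercepted = blindSigningService intercepted
```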

The Boomerang Attack

For the moment this is just a link to Wagner’s original paper.

Cryptocurrency (2) Block chain

Linked lists of Ints may be defined in Haskell by data List = Empty | Cons Int List. For example Cons 2 (Cons 1 Empty) represents the list [2,1]. We define an analogous data type HashList in which each cons-cell with data x is augmented by a hash value h. At the level of types h could be any Int, but in a well-formed hash list, we will suppose that h is the sum of the hash of x and the hash of the tail of the list. Assume that hashInt :: Int -> Int has already been defined.

   type Hash = Int
   data HashList = Empty | Cons Int Hash HashList

   hash :: HashList -> Hash
   hash Empty = 0
   hash (Cons _ hashValue _) = hashValue

   cons :: Int -> HashList -> HashList
   cons x tail = Cons x h tail 
      where h = hashInt x + hash tail

For example, if (to make a simple illustration) hashInt x = x*x then the hash list with data 3, 2, 1 is

   Cons 3 14 (Cons 2 5 (Cons 1 1 Empty))

the hash values being 1^2 = 1, 2^2 + 1 = 5 and 3^2 + 5 = 14. Note that data values contribute to the hash (albeit at the head only) as soon as they enter a hash list.

As defined above, hash lists provide warning of accidental corruption. But they offer no protection against malicious corruption. Indeed, the function cons allows anyone to create well-formed hash lists having arbitrary values.

In the Bitcoin blockchain, malicious corruption is prevented—or at least, made as hard as inverting a one-way function—by digital signatures. Again I find the Haskell code the clearest way to present the idea. We assume there is a basic public key infrastructure in place, including functions with the types below such that (unsign m) . (sign m) is the identity for each person m.

   type Person = Int

   -- Assumed to be provided by the public key infrastructure.
   sign   :: Person -> Int -> Int
   unsign :: Person -> Int -> Int

   type SignedHash = (Person, Int)
   data BlockChain = Empty
                   | Cons Int SignedHash BlockChain

   signedHash :: BlockChain -> SignedHash
   signedHash Empty = (0, 0)          -- the empty chain contributes hash 0
   signedHash (Cons _ sh _) = sh

   cons :: Person -> Int -> BlockChain -> BlockChain
   cons m x tail = Cons x (m, sign m h) tail
      where h = let (_, h') = signedHash tail
                in  hashInt m + hashInt x + h'

It is a simple exercise to write a function verify :: BlockChain -> Bool that checks a block chain is valid.

   verify :: BlockChain -> Bool
   verify Empty = True
   verify (Cons x (m, h) tail) =
     let (_, h') = signedHash tail
     in  unsign m h == hashInt m + hashInt x + h'
           && verify tail

Note that verify only calls the publicly known function unsign m. For example, if (thanks to an appalling programming error swapping the encryption and decryption functions in RSA) Person 0's signature function turns out to be x \mapsto x^3 (modulo some large RSA modulus), then the block chain with data 3, 2, 1 is

   Cons 3 (0, 134*134*134) (Cons 2 (0, 5*5*5) 
                          (Cons 1 (0, 1*1*1) Empty))

which passes verification. (Person 0 is particularly unfortunate in that his identity enters the hash by adding 0^2.) A well-formed block chain of length one, Cons x (m, sign m (hashInt m + hashInt x)) Empty, is equivalent to a signed x from Person m.
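The following sketch makes the example above executable. Everything about the keys is an assumption made for illustration: every Person shares one toy key pair modulo 2003 · 2027, signing is cubing (the "appalling error"), and unsign undoes cubing with the matching private exponent. Because unsign returns a residue, verify is only meaningful while hashes stay below the modulus, as they do here.

```haskell
type Person = Int
type SignedHash = (Person, Int)

data BlockChain = Empty | Cons Int SignedHash BlockChain
  deriving (Eq, Show)

-- Hypothetical toy modulus 2003 * 2027 = 4060081. Both primes are 2 mod 3,
-- so cubing is a bijection on the residues.
modulus :: Int
modulus = 2003 * 2027

-- 3 * 2704035 = 2 * (2002 * 2026) + 1, so this exponent undoes cubing.
inverseExponent :: Int
inverseExponent = 2704035

-- Modular exponentiation by repeated squaring.
powMod :: Int -> Int -> Int -> Int
powMod _ 0 _ = 1
powMod b e m
  | even e = let h = powMod b (e `div` 2) m in h * h `mod` m
  | otherwise = (b `mod` m) * powMod b (e - 1) m `mod` m

-- The broken scheme: signing is the public operation of cubing.
sign :: Person -> Int -> Int
sign _ x = powMod x 3 modulus

unsign :: Person -> Int -> Int
unsign _ y = powMod y inverseExponent modulus

hashInt :: Int -> Int
hashInt x = x * x

signedHash :: BlockChain -> SignedHash
signedHash Empty = (0, 0)
signedHash (Cons _ sh _) = sh

cons :: Person -> Int -> BlockChain -> BlockChain
cons m x rest = Cons x (m, sign m h) rest
  where
    (_, h') = signedHash rest
    h = hashInt m + hashInt x + h'

verify :: BlockChain -> Bool
verify Empty = True
verify (Cons x (m, s) rest) =
  unsign m s == hashInt m + hashInt x + h' && verify rest
  where
    (_, h') = signedHash rest
```

With these definitions, cons 0 3 (cons 0 2 (cons 0 1 Empty)) reproduces the chain displayed above and passes verify. Replacing the 2 by a 7 makes verify fail at the second block.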

Using BlockChain one could implement a simple cryptocurrency, along the lines of the 'ScroogeCoin' considered in the recent book Bitcoin and cryptocurrency technologies (draft available online). Here Scrooge is a trusted party who must sign every transaction, and has sole responsibility for the integrity of the currency. The brilliance of Bitcoin lies in how it achieves decentralization by rewarding (with Bitcoins) anyone willing to play the role of an honest Scrooge, while still thwarting double spending.

A minimal working implementation of BlockChain is available here.

Generalizations of Nim

June 18, 2017

Probably everyone knows that the P-positions (previous player has won) in Nim are exactly those (x_1,\ldots,x_r) such that x_1 \oplus \cdots \oplus x_r = 0, where \oplus denotes bitwise XOR. For example, (3,2,1) is a P-position, because 3 \oplus 2 \oplus 1 = 11_2 \oplus 10_2 \oplus 01_2 = 00_2 = 0, whereas one glance at (9,7,5,1) shows it is an N-position (next player wins), because only the pile of size 9 contains 2^3 in binary. Any winning move must take at least 2 counters from this pile: doing so leaves a pile of size 7 = 4+2+1, which is ideal for balancing the contribution 7 \oplus 5 \oplus 1 = 3 from the other piles. Therefore we leave 3 counters, taking 6, reaching the P-position (3,7,5,1).
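The XOR rule translates directly into a search for winning moves. This is a minimal Haskell sketch. The names nimSum and winningMoves are ours.

```haskell
import Data.Bits (xor)

-- Nim-sum of a position: the bitwise XOR of its pile sizes.
nimSum :: [Int] -> Int
nimSum = foldr xor 0

-- All winning moves from a Nim position, as (pile index, counters to leave).
-- Leaving y counters in pile i wins exactly when y is the XOR of the other
-- piles, i.e. y = x_i `xor` nimSum xs, and this is a legal move iff y < x_i.
winningMoves :: [Int] -> [(Int, Int)]
winningMoves xs =
  [ (i, y)
  | (i, x) <- zip [0 ..] xs
  , let y = x `xor` s
  , y < x
  ]
  where
    s = nimSum xs
```

For example, winningMoves [9,7,5,1] should give [(0,3)], meaning leave 3 counters in the first pile, as found above. A P-position such as [3,2,1] has no winning moves.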

Every finite impartial game is equal to a nimber. Nim is unusual in that knowing the P-positions immediately determines all its nimbers: (x_1,\ldots,x_r) = m \star if and only if (x_1,\ldots,x_r) + m\star = 0 (disjoint sum of games) if and only if (x_1,\ldots,x_r,m) is a P-position.

Theorem. The nimber of (x_1,\ldots, x_r) is (x_1 \oplus \cdots \oplus x_r)\star.

Proof. By the previous paragraph, it suffices to find a winning move when x_1 \oplus \cdots \oplus x_r = 2^s + M with M < 2^s. By induction, we may assume that all options of (x_1,\ldots,x_r) have their expected nimbers. By reordering piles we may assume without loss of generality that x_1 = 2^{s+1}A + 2^s + B where A \in \mathbb{N}_0 and B < 2^s. We take all but 2^{s+1}A + 2^s-1 counters from the first pile, pause to admire the situation, and then take further counters to leave exactly 2^{s+1}A + (2^{s+1}A \oplus x_2 \oplus \cdots \oplus x_r). The parenthetical quantity is

2^s \oplus B \oplus x_1 \oplus \cdots \oplus x_r = 2^s \oplus B \oplus 2^s \oplus M = B \oplus M < 2^s,

so this is always possible. \Box

A small generalization of this argument finds explicit moves giving options of (x_1,\ldots,x_r) with all nimbers m' \star with m' < x_1 \oplus \cdots \oplus x_r.

Example. Before I learned about Combinatorial Game Theory I got thrashed at Nim by someone whose main tactic was to reduce to the zero position (3,2,1). (Of course my opponent could also win any non-balanced position with two piles.) In memory of this occasion, consider the position (19,13,3,2,1) = 19\star + 13\star = 30 \star. To reach (16 + c) \star with 0 \le c \le 7, we observe that 30 and 16 + c both have 16 in their binary expansion, but only 30 has 8 (present in the pile 13). We mentally cancel out the 16s, leaving (3,13,3,2,1) with target c \star, and play, as a first step, to (3,7,3,2,1). We then calculate that

3 \oplus 3 \oplus 2 \oplus 1 \oplus c = 3 \oplus c.

We must leave 3 \oplus c counters in the half-eaten pile of size 7; since 0 \le c \le 7, this is always possible. For instance, if c = 5 then since 3 \oplus 5 = 6, we leave 6 counters. Since 3 \oplus 6 \oplus 3 \oplus 2 \oplus 1 = 5, on restoring the cancelled subpiles, we reach a position of nimber (19 \oplus 6 \oplus 3 \oplus 2 \oplus 1)\star = 21 \star, as required.
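The same reasoning gives a simple rule for moving to any smaller nimber in Nim. If s is the nim-sum, leaving y counters in pile i gives nimber y \oplus s \oplus x_i, so y is forced. This sketch uses a hypothetical helper, movesTo, which lists every single-pile move to a given target nimber.

```haskell
import Data.Bits (xor)

nimSum :: [Int] -> Int
nimSum = foldr xor 0

-- Moves from xs to a position of nimber target, as (pile index, counters
-- left): the forced pile size is y = x_i `xor` s `xor` target, legal iff y < x_i.
movesTo :: Int -> [Int] -> [(Int, Int)]
movesTo target xs =
  [ (i, y)
  | (i, x) <- zip [0 ..] xs
  , let y = x `xor` s `xor` target
  , y < x
  ]
  where
    s = nimSum xs
```

For instance, movesTo 21 [19,13,3,2,1] returns [(1,6)], which matches the move above from the pile of 13 down to 6. Every target below 30 has at least one such move, and the target 30 itself has none.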

Strong players of games instinctively maximize their mobility. This principle gives a decent strategy for games ranging from Settlers of Catan (a resource acquisition game in which early mobility is essential) to Othello (in which weak players often minimize their mobility by playing to maximize their counters in the early game; typically this creates multiple good moves for the opponent), and it is even a reasonable heuristic for chess, albeit one that has been criticized by Turing. It can be seen at work in the proof and example above, where the half-eaten pile of size

\begin{aligned} 2^{s+1}A + 2^s-1 &= 2^{s+1}A + 1 + 2 + \cdots + 2^{s-1} \\ &= 2^{s+1}A \oplus 1 \oplus 2 \oplus \cdots \oplus 2^{s-1} \end{aligned}

gives us ample options.


Nim boils down to a fight over the nimber x_1 \oplus \cdots \oplus x_r. With the end firmly in mind, let us ask for a generalization in which \oplus is replaced with \oplus_p, the analogue of \oplus defined by adding base p digits modulo p, with no carries. For instance,

12 \oplus_3 3 \oplus_3 2 \oplus_3 1 = 110_3 \oplus_3 010_3 \oplus_3 002_3 \oplus_3 001_3 = 120_3 = 15,
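Here is a small sketch of \oplus_p as digit-wise addition modulo p of base p expansions. The names addP and sumP are ours.

```haskell
-- Digit-wise addition modulo p of base p expansions (no carries): the
-- operation written \oplus_p in the text. For p = 2 this is bitwise XOR.
addP :: Int -> Int -> Int -> Int
addP p a b
  | a == 0 && b == 0 = 0
  | otherwise = (a `mod` p + b `mod` p) `mod` p + p * addP p (a `div` p) (b `div` p)

-- Fold \oplus_p over a list of pile sizes.
sumP :: Int -> [Int] -> Int
sumP p = foldr (addP p) 0
```

For example, sumP 3 [12,3,2,1] evaluates to 15, as in the display above.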

and in the hoped-for generalization, (12,3,2,1) will have nimber 15{}\star and (this is just a restatement) options with nimbers 0,\star,\ldots, 14{}\star, but not 15{}\star. Consider the option 9\star. This can be reached only by removing 3 counters from each of the piles of sizes 12 and 3. So it seems we have to permit moves in multiple piles. The position (1,1,\ldots,1) with p-1 singleton piles shows that moves in up to p-1 piles may be required. So, as a first try, we make the following definition.

Definition. Naive p-Nim is the impartial game in which each move consists of up to p-1 moves in Nim.

Thus, writing d for Hamming distance, a position x \in \mathbb{N}_0^r in naive p-Nim has y \in \mathbb{N}_0^r as an option if and only if y_i \le x_i for each i and 1 \le d(x,y) \le p-1.
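Claims about naive p-Nim, such as those below, are easy to check by brute force with the mex rule. The following sketch is our own and is not the computer search mentioned in the text. naiveNimbers p r n tabulates the nimbers of all positions with r piles of size at most n, processing positions in order of increasing total, so that every option is already known when it is needed. The association-list lookups avoid dependencies but limit the sketch to small boards.

```haskell
import Data.List (sortOn)

-- Minimal excludant: the least non-negative integer not in the list.
mex :: [Int] -> Int
mex xs = head [m | m <- [0 ..], m `notElem` xs]

-- Options in naive p-Nim: reduce between 1 and p - 1 of the piles.
naiveOptions :: Int -> [Int] -> [[Int]]
naiveOptions p xs =
  [ ys
  | ys <- mapM (\x -> [0 .. x]) xs
  , let changed = length (filter id (zipWith (/=) xs ys))
  , changed >= 1
  , changed <= p - 1
  ]

-- Nimbers of all positions with r piles, each of size at most n.
naiveNimbers :: Int -> Int -> Int -> [([Int], Int)]
naiveNimbers p r n = foldl step [] positions
  where
    positions = sortOn sum (mapM (const [0 .. n]) [1 .. r])
    step table xs = (xs, mex [look table ys | ys <- naiveOptions p xs]) : table
    look table ys = maybe (error "option not tabulated") id (lookup ys table)
```

With p = 2, this is ordinary Nim and the table agrees with XOR. With p = 3, two-pile positions get nimber x+y, and the table should also reproduce (2,2,2) = 0 and (3,2,1) = 6\star.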

Suppose, inductively, that the options of (x_1,\ldots, x_r) have the required nimbers, and let m = x_1 \oplus_p \cdots \oplus_p x_r. Then (x_1,\ldots,x_r) has options with every nimber m' \star such that m' < m.

Sketch proof. Take s maximal such that p^s appears more often in m than m'. Cancelling as in the example, we may assume that p^s is the greatest power of p appearing in the base p-forms of x_1, \ldots, x_r. So m = ap^s + \cdots and m' = a'p^s + \cdots . Take a-a'-1 subpiles p^s from piles containing p^s in p-ary, and in a remaining pile of size \alpha p^s + B take all but (\alpha-1)p^s + p^s-1 of its counters; then use its subpile of size (p-1) + (p-1)p + \cdots + (p-1)p^{s-1} to reach m' \star. \Box

However, (x_1,\ldots, x_r) may well have further options. By the mex rule, the problem occurs when there is an option (y_1,\ldots,y_r) with y_1 \oplus_p \cdots \oplus_p y_r = x_1 \oplus_p \cdots \oplus_p x_r. For instance, (3,2,1) has the option (3,0,0), which has nimber 3 \star. This destroys the inductive foundations for the sketch proof above. In fact (3,2,1) = 6\star in naive 3-Nim.

The obvious fix is to bar all moves taking counters [c_1,\ldots,c_r] in which c_1 \oplus_p \cdots \oplus_p c_r = 0. (The square brackets are used to distinguish moves from positions.) But still there are too many options: for instance (6,3) should be a P-position with nimber 0 but, according to the current rules, we can move by [4,2] or [5,1] to the P-positions (2,1) and (1,2) (which really do have nimber 0). A computer search finds the following illustrative examples, listed with their intended nimber and the additional moves that must be made illegal:

  • (3,6), 0, \{[2,4],[1,5]\},
  • (3,7), 1, \{[2,7],[1,5]\},
  • (4,8), 0, \{[2,7]\},
  • (5,6), 2, \{[5,4],[4,5]\},
  • (5,7), 0, \{[4,5]\},
  • (6,6), 3, \{[5,1],[4,2],[2,4],[1,5]\}
  • (1,5,7), 1, \{[0,4,5]\},
  • (2,3,7), 0, \{[2,1,0],[2,2,2],[0,2,7],[0,1,5]\}.

Let v_p(x) denote the highest power of p dividing x \in \mathbb{N}, and let v_p(0) = \infty. Some inspection of these examples may suggest the following definition.

Definition. p-Nim is naive p-Nim barring any move taking counters [c_1,\ldots,c_r] such that

v_p(c_1 \oplus_p \cdots \oplus_p c_r) > \mathrm{min} \{v_p(c_1), \ldots, v_p(c_r) \}.

By the definition of v_p(0), this generalizes our first attempted rule. Moreover, it bars any move not changing the purported nimber x_1 \oplus_p \cdots \oplus_p x_r of (x_1,\ldots,x_r). The following result is a corollary of Lemmas 3.1 and 3.2 in this paper of Irie.

Theorem [Irie] The nimber of the p-Nim position (x_1,\ldots,x_r) is (x_1 \oplus_p \cdots \oplus_p x_r)\star.

Proof. Let m = x_1 \oplus_p \cdots \oplus_p x_r and let m' < m. Let p^b be the highest power of p dividing m-m' and let p^s be the greatest power of p appearing in m-m' in p-ary. Cancelling as before, we may assume that p^s is the greatest power of p appearing in the base p-forms of x_1, \ldots, x_r. Similarly, we may cancel the powers p^a with a < b between m and m'. So the only powers of p that appear are between p^b and p^s. (This will guarantee that our move satisfies the valuation condition.) Now play as in the sketch proof, replacing p^s - 1 with

p^s - p^b = (p-1)p^b + (p-1)p^{b+1} + \cdots + (p-1)p^{s-1}.

Since m \star is not an option of (x_1,\ldots,x_r), the mex rule implies that the nimber of (x_1,\ldots,x_r) is m \star, as required. \Box
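Irie's theorem can likewise be spot-checked by brute force. The sketch below is our own, and its function names are hypothetical. It adds the valuation condition to the naive move generator. A move is legal when it touches between 1 and p-1 piles and v_p(c_1 \oplus_p \cdots \oplus_p c_r) \le \min v_p(c_i). A zero sum is barred, which encodes v_p(0) = \infty.

```haskell
import Data.List (sortOn)

addP :: Int -> Int -> Int -> Int
addP p a b
  | a == 0 && b == 0 = 0
  | otherwise = (a `mod` p + b `mod` p) `mod` p + p * addP p (a `div` p) (b `div` p)

sumP :: Int -> [Int] -> Int
sumP p = foldr (addP p) 0

-- p-adic valuation of a positive integer (never called on 0 below).
valuation :: Int -> Int -> Int
valuation p c
  | c `mod` p == 0 = 1 + valuation p (c `div` p)
  | otherwise = 0

-- Legal moves in p-Nim, taking counters cs; the guards short-circuit so that
-- valuation is only applied to nonzero numbers.
legal :: Int -> [Int] -> Bool
legal p cs =
  k >= 1 && k <= p - 1 && s /= 0
    && valuation p s <= minimum (map (valuation p) nonzero)
  where
    nonzero = filter (/= 0) cs
    k = length nonzero
    s = sumP p cs

mex :: [Int] -> Int
mex xs = head [m | m <- [0 ..], m `notElem` xs]

-- Nimbers of all p-Nim positions with r piles of size at most n.
pNimbers :: Int -> Int -> Int -> [([Int], Int)]
pNimbers p r n = foldl step [] (sortOn sum (mapM (const [0 .. n]) [1 .. r]))
  where
    step table xs =
      (xs, mex [ look table (zipWith (-) xs cs)
               | cs <- mapM (\x -> [0 .. x]) xs, legal p cs ]) : table
    look table ys = maybe (error "option not tabulated") id (lookup ys table)
```

On small boards every tabulated nimber should equal sumP p xs, as the theorem predicts.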

In fact Irie’s result is more general: a surprising effect of the valuation condition is that allowing moves in p or more piles does not create options with new nimbers. This leads to the notion of the p-saturation of a game. The main focus of Irie’s paper is the p-saturation of Welter’s game, which is shown to have a remarkable connection with the representation theory of the symmetric group.

Back to naive p-Nim

It seems that in many cases the nimbers in naive p-Nim are given by ordinary (naive?) addition. For example, this is true for all k-pile positions whenever p > k. When p = 3, the exceptions for 3-pile positions (in increasing order) with a small first pile are listed below.

  • (1,x,y): (1,1,1) = 0,
  • (2,x,y): (2,2,2) = 0, (2,2,3) = \star,
  • (3,x,y): (2,2,3) = \star, (3,3,3) = 0, (3,3,4) = 2\star,
  • (4,x,y): (3,3,4) = 2\star, (4,4,4) = 0, (4,4,5) = \star, (4,4,6) = 3\star, (4,5,5) = 3\star.

For instance, the matrix below shows nimbers for (2,x,y) with 0\le x,y \le 4.

\left(\begin{matrix} 2 & 3 & 4 & 5 & 6 \\ 3 & 4 & 5 & 6 & 7 \\ 4 & 5 & 0 & 1 & 8 \\ 5 & 6 & 1 & 8 & 9 \\ 6 & 7 & 8 & 9 & 10 \end{matrix} \right)

Once the pattern (2,x,y) = (2+x+y)\star is established (this happens by (2,4,4)), the mex rule implies that it continues.

Burnside’s method

April 24, 2017

Burnside proved in 1901 that if p is an odd prime then a permutation group containing a regular subgroup isomorphic to C_{p^2} is either imprimitive or 2-transitive. His proof was an early application of character theory to permutation groups. Groups with this property are now called B-groups.

Burnside attempted to generalize his 1901 result in two later papers: in 1911, he claimed a proof that C_{p^n} is a B-group for any prime p and any n \ge 2, and in 1921, he claimed a proof that all abelian groups, except for elementary abelian groups, are B-groups. The first claim is correct, but his proof has a serious gap. This error appears to have been unobserved (or, just possibly, observed but ignored, since the result was soon proved in another way using Schur’s theory of S-rings) until 1994 when it was noted by Peter Neumann, whose explication may be found in his introduction to Burnside’s collected works. In 1995, Knapp extended Burnside’s argument to give a correct proof. Burnside’s second claim is simply false: for example, S_4 \wr S_2 acts primitively on \{1,2,3,4\}^2, and has a regular subgroup isomorphic to C_4 \times C_4. In one of my current projects, I’ve simplified Knapp’s proof and adapted Burnside’s character-theoretic methods to show, more generally, that any cyclic group of composite order is a B-group.

The purpose of this post is to record some proofs omitted for reasons of space from the draft paper. This companion post has some notes on B-groups that may be of more general interest.

Sums over roots of unity

Let \zeta be a primitive p^n-th root of unity. Define

R(r) = \{ r, r + p^{n-1}, \ldots, r+ (p-1)p^{n-1} \}

for 0 < r < p^{n-1}. Define a subset Z of \{1,\ldots,p^n-1\} to be null if there exist s \in \mathbb{N}_0 and distinct r_{ij} \in \{1,\ldots, p^{n-1}-1\} for 0 \le i \le p-1 and 1 \le j \le s such that r_{ij} \equiv i mod p for each i and j and

Z = \bigcup_{i=0}^{p-1} \bigcup_{j=1}^s R(r_{ij}).

Proposition 6.2 Let \omega = \zeta^{p^{n-1} c} where c is not divisible by p. Let \mathcal{O} \subseteq \{1,\ldots,p^n-1\}. Then

\sum_{i \in \mathcal{O}} \zeta^i = \sum_{i \in \mathcal{O}} \omega^i

if and only if either

  1. \mathcal{O} is null; or
  2. \mathcal{O} = \{p^{n-1}, \ldots, (p-1)p^{n-1} \} \; \cup \; \bigcup_{i=1}^{p-1} R(r_i) \; \cup \; Z where Z is a null set, the r_i are distinct elements of \{1,\ldots,p^{n-1}-1\}\backslash Z and r_i \equiv i mod p for each i.

Proof. Since the minimal polynomial of \zeta is

1+X^{p^{n-1}} + \cdots + X^{(p-1)p^{n-1}}

we have \sum_{i \in R(r)} \zeta^i = 0. Since \omega^{p^{n-1}} = 1, we have \sum_{i \in R(r)} \omega^i = p\omega^r. It follows that \sum_{i \in Z} \zeta^i = \sum_{i \in Z} \omega^i = 0 if Z is a null set. (For the second equality, note that the contributions from the R(r_{ij}) for fixed j combine to give p + p\omega + \cdots + p\omega^{p-1} = 0.) For (2) we have \sum_{i=1}^{p-1} \zeta^{i p^{n-1}} = \omega + \cdots + \omega^{p-1} = -1, \sum_{i=1}^{p-1} \omega^{i p^{n-1}} = p-1 and \sum_{i=1}^{p-1} p \omega^{r_i} = \sum_{i=1}^{p-1} p \omega^i = - p, so both sides equal -1. This proves the ‘if’ direction.

Conversely, by Lemma 2.1 in the paper, \mathcal{O} \backslash \{p^{n-1},\ldots,(p-1)p^{n-1} \} is a union of some of the sets R(r). Hence there exist a unique subset A of \{1,\ldots,p-1\}, unique j_i \in \mathbb{N}_0 for 0 \le i \le p-1, and elements r_{ij} \in \{1,\ldots, p^{n-1}-1\} with r_{ij} \equiv i mod p for 0 \le i \le p-1 and 1 \le j \le j_i, unique up to reordering, such that

\mathcal{O} = \{ p^{n-1} i : i \in A \} \cup \bigcup_{i=0}^{p-1}\bigcup_{j=1}^{j_i} R(r_{ij}).

We have \sum_{i \in \mathcal{O}} \zeta^i = \sum_{i \in A} \omega^i and \sum_{i \in \mathcal{O}} \omega^i = |A| + \sum_{i=0}^{p-1} pj_i \omega^i. Therefore

|A| + \sum_{i=0}^{p-1} (pj_i - [i \in A]) X^i

has \omega as a root. Since this polynomial has degree at most p-1 and the minimal polynomial of \omega is 1+X +\cdots + X^{p-1}, it follows that the coefficients are constant. Hence

|A| + pj_0 = pj_i - [i \in A]

for 1\le i \le p-1. If A = \varnothing then j_0 = j_1 = \ldots = j_{p-1} and \mathcal{O} is null. Otherwise, taking the previous displayed equation mod p we see that A = \{1,\ldots,p-1\}, and, moreover, the j_i are constant for 1 \le i \le p-1. (This holds even if p=2.) Set

s = j_1 = \ldots = j_{p-1}.

We have j_0 = s-1. Hence s \in \mathbb{N}, and taking r_i = r_{i1} for 1 \le i \le p-1, we see that \mathcal{O} is the union of \{p^{n-1}, \ldots, (p-1)p^{n-1}\}, the sets R(r_i) for 1 \le i \le p-1, and a null set. \Box
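This is a quick numerical sanity check of the 'if' direction of Proposition 6.2 for p = 3 and n = 3, using floating-point roots of unity. The tolerance 1e-9 is an arbitrary choice. The sets used are one null set, with r_{01} = 3, r_{11} = 1 and r_{21} = 2, and one set of type (2), with r_1 = 4 and r_2 = 5.

```haskell
import Data.Complex (Complex, cis, magnitude)

-- Sum of z^i over a list of exponents.
powerSum :: Complex Double -> [Int] -> Complex Double
powerSum z exps = sum [z ^ i | i <- exps]

-- R(r) = {r, r + p^(n-1), ..., r + (p-1) p^(n-1)}.
rSet :: Int -> Int -> Int -> [Int]
rSet p n r = [r + k * p ^ (n - 1) | k <- [0 .. p - 1]]

-- Do the sums of zeta^i and omega^i over the set o agree, where
-- zeta = exp(2 pi i / p^n) and omega = zeta^(p^(n-1) c)?
sumsAgree :: Int -> Int -> Int -> [Int] -> Bool
sumsAgree p n c o = magnitude (powerSum zeta o - powerSum omega o) < 1e-9
  where
    zeta = cis (2 * pi / fromIntegral (p ^ n))
    omega = zeta ^ (p ^ (n - 1) * c)
```

Both sample sets should satisfy sumsAgree for c = 1 and c = 2, whereas a set such as {1} does not.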

Ramanujan matrices

For p prime and n \in \mathbb{N} we define R(p^n) to be the matrix

\left( \begin{matrix} \scriptstyle 1 &  \scriptstyle1 & \scriptstyle 1  &  \scriptstyle\ldots & \scriptstyle 1 &  \scriptstyle1 &  \scriptstyle1 \\  \scriptstyle-1 & \scriptstyle p-1 & \scriptstyle p-1  & \scriptstyle \ldots & \scriptstyle p-1 & \scriptstyle p-1 & \scriptstyle p-1 \\  \scriptstyle0 & \scriptstyle -p & \scriptstyle p(p-1)  & \scriptstyle \ldots &  \scriptstyle p(p-1) & \scriptstyle p(p-1) & \scriptstyle p(p-1) \\  \scriptstyle 0 & \scriptstyle 0 & \scriptstyle -p^2  & \scriptstyle \ldots & \scriptstyle p^2(p-1) & \scriptstyle p^2(p-1) & \scriptstyle p^2(p-1) \\  \scriptstyle\vdots & \scriptstyle \vdots & \scriptstyle \vdots  & \scriptstyle \ddots & \scriptstyle \vdots & \scriptstyle \vdots & \scriptstyle \vdots \\  \scriptstyle 0 & \scriptstyle 0 & \scriptstyle 0 & \scriptstyle \ldots & \scriptstyle -p^{n-2} & \scriptstyle p^{n-2}(p-1) & \scriptstyle  p^{n-2}(p-1) \\  \scriptstyle 0 & \scriptstyle 0 & \scriptstyle 0 & \scriptstyle \ldots & \scriptstyle 0 & \scriptstyle -p^{n-1} & \scriptstyle p^{n-1}(p-1) \end{matrix} \right).

More generally, if d has prime factorization p_1^{a_1} \ldots p_s^{a_s}, we define R(d) = R(p_1^{a_1}) \otimes \cdots \otimes R(p_s^{a_s}). The rows and columns of R(d) are labelled by the set D of divisors of d, as indicated below for the case d = 2^3 p, with p an odd prime.

R(2^3 p) = \begin{matrix} 1 \\ 2 \\ 2^2 \\ 2^3 \\ p \\ 2p \\ 2^2p \\ 2^3p \end{matrix} \left( \begin{array}{cccc|cccc}  1 & 1 & 1 & 1  & 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 & -1 & 1 & 1 & 1 \\ 0  & -2  & 2 & 2 & 0 & -2 & 2  & 2 \\ 0 & 0 & -2^2 & 2^2 & 0 & 0 & -2^2 & 2^2\\ \hline -1 & -1 & -1 & -1  & p-1 & p-1 & p-1 & p-1 \\ 1 & -1 & -1 & -1 & -(p-1) & p-1 & p-1 & p-1 \\ 0  & 2  & -2 & -2 & 0 & -2(p-1) & 2(p-1) & 2(p-1) \\ 0 & 0 & 2^2 & -2^2 & 0 & 0 & -2^2(p-1) & 2^2(p-1) \end{array} \right).

Say that a partition of D \backslash \{ d\} is coprime if the highest common factor of the numbers in each part is 1. The aim of the game is to find a non-empty set X of rows of R(d) such that the sets of columns (excluding column d) on which the sum of the rows in X takes the same value form a coprime partition of D \backslash \{d\}.

There is an application to B-groups only when d is even, in which case we may assume that 2 \in X. Proposition 6.7 in the paper states that in this case, the only way to win this game when d = 2^n, d = 2^n p or d = 2 p^n, where p is an odd prime, is to put every single row in X. This implies that C_{2^m p^n} is a B-group when n \le 1 or m \le 1. However the result on the game may well hold more generally.

Constant sums for p^n

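The only coprime partition of D \backslash \{p^n\} = \{1, p, \ldots, p^{n-1}\} is the partition with a single part, since in any partition with two or more parts some part does not contain 1, and so has highest common factor divisible by p. So the row sums over X are equal in every column except p^n. By adding p^{e-1} to each entry in row p^e for e \in \{1,\ldots, n\}, we obtain the matrix below.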

\left( \begin{matrix} \scriptstyle 1 &  \scriptstyle1 & \scriptstyle 1  &  \scriptstyle\ldots & \scriptstyle 1 &  \scriptstyle1 &  \scriptstyle1 \\  \scriptstyle 0 & \scriptstyle p & \scriptstyle p  & \scriptstyle \ldots & \scriptstyle p & \scriptstyle p & \scriptstyle p \\  \scriptstyle p & \scriptstyle 0 & \scriptstyle p^2  & \scriptstyle \ldots &  \scriptstyle p^2 & \scriptstyle p^2 & \scriptstyle p^2  \\  \scriptstyle p^2 & \scriptstyle p^2 & \scriptstyle 0  & \scriptstyle \ldots & \scriptstyle p^3  & \scriptstyle p^3  & \scriptstyle p^3 \\  \scriptstyle\vdots & \scriptstyle \vdots & \scriptstyle \vdots  & \scriptstyle \ddots & \scriptstyle \vdots & \scriptstyle \vdots & \scriptstyle \vdots \\  \scriptstyle p^{n-2} & \scriptstyle p^{n-2} & \scriptstyle p^{n-2} & \scriptstyle \ldots & \scriptstyle 0 & \scriptstyle p^{n-1} & \scriptstyle  p^{n-1} \\  \scriptstyle p^{n-1} & \scriptstyle p^{n-1} & \scriptstyle p^{n-1} & \scriptstyle \ldots & \scriptstyle p^{n-1} & \scriptstyle 0 & \scriptstyle p^n \end{matrix} \right)

which still has constant sums over the rows in X. Let x_i = 1 if p^i \in X and let x_i = 0 otherwise. Writing numbers in base p, the row sums over X are, for each column,

\begin{aligned} 1 \quad &\quad x_nx_{n-1} \ldots x_3x_2x_0 \\ p \quad &\quad  x_n x_{n-1} \ldots x_3x_1x_0 \\ p^2\quad &\quad  x_n x_{n-1} \ldots x_4x_2x_1x_0 \\ \vdots \; \quad &\quad  \qquad \vdots \\ p^{n-2} \quad &\quad  x_nx_{n-2} \ldots x_2x_1x_0 \\ p^{n-1} \quad &\quad  x_{n-1}x_{n-2} \ldots x_2x_1x_0. \end{aligned}

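(Note that in each case there are n digits, the most significant corresponding to p^{n-1}.) The numbers in the right column are equal. Assuming n \ge 2 and comparing digits, we get x_1 = x_2 = \cdots = x_n. However, x_0 is the units digit in every column, so this comparison places no constraint on it, and the sums are also constant when X = \{1\} or X = D \backslash \{1\}. Hence X = D whenever 1 \in X and X \ne \{1\}. Taking p=2 this gives an alternative proof of Proposition 6.7(i).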
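Here is a brute-force sketch of this game for R(p^n). It lists every non-empty set X of rows whose sums agree on all columns other than p^n. Rows and columns are indexed by the exponents 0, \ldots, n, so index e stands for p^e, and the entries are read off the display of R(p^n) above. For n \ge 2 it finds exactly three such sets, namely \{1\}, D \backslash \{1\} and D. The first two arise because x_0 is the units digit in every column.

```haskell
import Data.List (sort, subsequences)

-- Entry of R(p^n) in row p^e and column p^c (both indexed by exponents).
rEntry :: Integer -> Int -> Int -> Integer
rEntry _ 0 _ = 1
rEntry p e c
  | c >= e = p ^ (e - 1) * (p - 1)
  | c == e - 1 = negate (p ^ (e - 1))
  | otherwise = 0

-- Non-empty row sets (as exponent lists) whose sums agree on columns 0..n-1.
equalSumSets :: Integer -> Int -> [[Int]]
equalSumSets p n =
  sort
    [ xs
    | xs <- tail (subsequences [0 .. n])
    , let colSum c = sum [rEntry p e c | e <- xs]
    , all (\c -> colSum c == colSum 0) [1 .. n - 1]
    ]
```

For example, equalSumSets 2 3 should return [[0],[0,1,2,3],[1,2,3]].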

Proofs of further claims on the Ramanujan matrices

Let R(d)^{\circ} denote R(d) rotated by a half turn. The following result was stated without proof in the paper; it implies that each R(d) is invertible, with inverse \frac{1}{d}R(d)^\circ. (Update: I later found a much better proof, now outlined in the paper, using a lower-upper decomposition of R(d).)

Proposition. We have R(p^n)^{-1} = \frac{1}{p^n} R(p^n)^\circ and \det R(p^n) = p^{n(n+1)/2}.

Proof. Since R(p^n)^\circ_{p^sp^t} = R(p^n)_{p^{n-s}p^{n-t}}, we have R(p^n)^\circ_{p^np^t} = 1 for all t \in \{0,\ldots, n\} and

\begin{aligned} R(p^n)^\circ_{p^sp^t} &= \begin{cases} (p-1)p^{n-s-1} & n-t \ge n-s  \\ -p^{n-s-1} & n-t=n-s-1 \\ 0 & n-t \le n-s-2 \end{cases} \\ &= \begin{cases} (p-1)p^{n-s-1} & s \ge t \\ -p^{n-s-1} & s = t-1 \\ 0 & s \le t-2 \end{cases} \end{aligned}

for s \in \{0,\ldots, n-1\} and t \in \{0,\ldots,n\}. We use this to show that

\sum_{c=0}^n R(p^n)_{p^rp^c} R(p^n)^{\circ}_{p^c p^{r'}} = p^n [r=r']

for r,r' \in \{0,\ldots, n\}. When r=0 the first term in each product is 1, so the left-hand side is the sum of the entries in column p^{n-r'} of R(p^n); this is p^n if r'=0 and 0 otherwise, as required. Now suppose that r \ge 1. Since R(p^n)_{p^rp^c} vanishes when c \le r-2, the left-hand side is

-p^{r-1}R(p^n)^\circ_{p^{r-1}p^{r'}} + \sum_{c=r}^n p^{r-1}(p-1) R(p^n)^\circ_{p^cp^{r'}}.

Take out a factor p^{r-1} to define L. Substitute for R(p^n)^\circ_{p^cp^{r'}}, and split off the summand from R(p^n)^\circ_{p^np^{r'}} = 1 to get

\begin{aligned} L &= -\begin{cases} (p-1)p^{n-r} & r' \le r-1 \\ -p^{n-r} & r' = r \\ 0 & r' \ge r+1 \end{cases} \\ & \qquad + (p-1) \sum_{c=r}^{n-1} \begin{cases} (p-1)p^{n-c-1} & c \ge r' \\ -p^{n-c-1} & c = r'-1 \\ 0 & c \le r'-2 \end{cases} + p-1.\end{aligned}

We must now consider three cases. When r=r' we get

\begin{aligned} L &= p^{n-r} + (p-1)\sum_{c=r}^{n-1} (p-1)p^{n-c-1} + p-1 \\ &= p^{n-r} + (p-1)(p^{n-r}-1) + p-1 \\ &= p^{n-r+1}  \end{aligned}

as required. When r \ge r'+1 we have c > r' in all summands so

\begin{aligned} L &= -(p-1)p^{n-r} + (p-1)\sum_{c=r}^{n-1} (p-1)p^{n-c-1} + (p-1) \\ &= (p-1) \bigl( -p^{n-r} + (p^{n-r}-1) + 1 \bigr)  \\ &= 0. \end{aligned}

When r \le r'-1 the first non-zero summand occurs for c=r'-1 so we have

\begin{aligned} L  &= - (p-1)p^{n-r'} + (p-1) \sum_{c=r'}^{n-1} (p-1)p^{n-c-1} + (p-1) \\ &= (p-1) \bigl( -p^{n-r'} + (p^{n-r'}-1) + 1 \bigr) \\ &=0. \end{aligned}

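Now take determinants, using that \det R(p^n)^\circ = \det R(p^n) (the matrices are conjugate by the matrix with 1s on its antidiagonal and 0s elsewhere). Since R(p^n)^{-1} = p^{-n} R(p^n)^\circ and R(p^n) is (n+1) \times (n+1), we get 1 / \det R(p^n) = (p^n)^{-(n+1)} \det R(p^n), and so (\det R(p^n))^2 = p^{n(n+1)}. The determinant is positive: subtracting column p^t from column p^{t+1} for t = n-1, \ldots, 0 leaves a matrix with first row (1,0,\ldots,0) whose lower-right n \times n block is lower triangular with diagonal entries p, p^2, \ldots, p^n. Hence \det R(p^n) = p^{n(n+1)/2}, as required.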
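This is a small exact-arithmetic check of the Proposition, with entries taken from the display of R(p^n) above. It multiplies R(p^n) by its half-turn rotation and computes the determinant by Laplace expansion. The expansion is exponential in the size of the matrix, so it is only suitable for the tiny cases tested.

```haskell
import Data.List (transpose)

-- Entry of R(p^n) in row p^e and column p^c (both indexed by exponents 0..n).
rEntry :: Integer -> Int -> Int -> Integer
rEntry _ 0 _ = 1
rEntry p e c
  | c >= e = p ^ (e - 1) * (p - 1)
  | c == e - 1 = negate (p ^ (e - 1))
  | otherwise = 0

ramanujan :: Integer -> Int -> [[Integer]]
ramanujan p n = [[rEntry p e c | c <- [0 .. n]] | e <- [0 .. n]]

-- Rotation by a half turn: reverse the order of the rows and of each row.
halfTurn :: [[a]] -> [[a]]
halfTurn = reverse . map reverse

mmul :: [[Integer]] -> [[Integer]] -> [[Integer]]
mmul a b = [[sum (zipWith (*) row col) | col <- transpose b] | row <- a]

-- Laplace expansion along the first row.
det :: [[Integer]] -> Integer
det [] = 1
det (row : rows) =
  sum [(-1) ^ j * x * det (map (deleteAt j) rows) | (j, x) <- zip [0 ..] row]
  where
    deleteAt j r = take j r ++ drop (j + 1) r
```

For each small p and n, mmul (ramanujan p n) (halfTurn (ramanujan p n)) should be p^n times the identity, and det (ramanujan p n) should be p^{n(n+1)/2}.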

Jordan block matrices

Let J be the m \times m unipotent upper-triangular Jordan block matrix over \mathbb{F}_p. We have (J-I)^m = 0, and since \mathbb{F}_p has characteristic p, (J-I)^{p^r} = J^{p^r} - I; hence J^{p^r} = I whenever p^r \ge m. On the other hand, if 1 \le k \le m-1 then J^k has a 1 in position (1,k+1), so J^k \ne I. Therefore J^{p^r} = I if and only if p^r \ge m. We also need a result on the relative trace matrix I + J + \cdots + J^{p^{r}-1}. Note that \binom{p}{k} is divisible by p for k \in \{1,\ldots, p-1\}. (Indeed, a p-cycle acts freely on the set of k-subsets of \{1,\ldots, p\}.) An easy inductive argument using \binom{p-1}{k-1} + \binom{p-1}{k} = \binom{p}{k} shows that \binom{p-1}{k} \equiv (-1)^k mod p for 0 \le k \le p-1. Now Lucas’ Theorem implies that \binom{p^r-1}{k} \equiv (-1)^k mod p for 0 \le k \le p^r-1. Since also (-1)^{p^r-1} = 1 in \mathbb{F}_p, the binomial theorem gives

(J-I)^{p^r-1} = I + J + \cdots + J^{p^r-1}.

It follows that the right-hand side is 0 if and only if p^r > m.
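These facts about J are easy to confirm numerically over \mathbb{F}_p. The sketch below represents matrices as lists of rows, with entries reduced mod p, and matPow uses naive repeated multiplication. That is adequate for the small m and p^r tested here.

```haskell
import Data.List (transpose)

type Matrix = [[Int]]

identity :: Int -> Matrix
identity m = [[if i == j then 1 else 0 | j <- [1 .. m]] | i <- [1 .. m]]

-- The m x m unipotent upper-triangular Jordan block.
jordan :: Int -> Matrix
jordan m = [[if j == i || j == i + 1 then 1 else 0 | j <- [1 .. m]] | i <- [1 .. m]]

mulMod :: Int -> Matrix -> Matrix -> Matrix
mulMod p a b = [[sum (zipWith (*) row col) `mod` p | col <- transpose b] | row <- a]

addMod :: Int -> Matrix -> Matrix -> Matrix
addMod p = zipWith (zipWith (\x y -> (x + y) `mod` p))

subMod :: Int -> Matrix -> Matrix -> Matrix
subMod p = zipWith (zipWith (\x y -> (x - y) `mod` p))

-- k-th power over F_p by repeated multiplication.
matPow :: Int -> Matrix -> Int -> Matrix
matPow p a k = iterate (mulMod p a) (identity (length a)) !! k

isZero :: Matrix -> Bool
isZero = all (all (== 0))

-- The relative trace matrix I + J + ... + J^(q-1) over F_p.
relativeTrace :: Int -> Int -> Int -> Matrix
relativeTrace p m q = foldr1 (addMod p) [matPow p (jordan m) k | k <- [0 .. q - 1]]
```

For q = p^r, one expects matPow p (subMod p (jordan m) (identity m)) (q - 1) to equal relativeTrace p m q, J^q to be the identity exactly when q \ge m, and the relative trace to vanish exactly when q > m.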

Model characters for wreath products with symmetric groups

April 24, 2017

Let G be a finite group. The model character for G is \sum_{\chi \in \mathrm{Irr}(G)} \chi. A nice short paper by Inglis, Richardson and Saxl gives a self-contained inductive proof that if \pi_{2r} is the permutation character of \mathrm{Sym}_{2r} acting by conjugacy on its set of fixed-point-free involutions then

\sum_r (\pi_{2r} \times \mathrm{sgn}_{n-2r}) \bigl\uparrow^{\mathrm{Sym}_n}

is the model character for \mathrm{Sym}_n. Assuming Pieri’s rule, that if \alpha is a partition of r then \chi^\alpha \times \mathrm{sgn}_{\mathrm{Sym}_t} \uparrow^{\mathrm{Sym}_{r+t}} = \sum \chi^\lambda, where the sum is over all partitions \lambda obtained from \alpha by adding t boxes, no two in the same row, this follows from the well-known fact (proved inductively in the paper) that \pi_{2r} = \sum_{\gamma \in \mathrm{Par}(r)} \chi^{2\gamma}.

Note that \pi_{2r} \times \mathrm{sgn}_{n-2r}\bigl\uparrow^{\mathrm{Sym}_n} is the induction of a linear character from the centralizer of the involution (1,2)\ldots (2r-1,2r) \in \mathrm{Sym}_n. (When r=0 we count the identity as an honorary involution.) Up to conjugacy, each involution is used exactly once to define the model character.

In an interesting paper, Baddeley generalizes the Inglis–Richardson–Saxl result to a larger class of groups. He makes the following definition.

Definition. An involution model for a finite group G is a collection \{ (\tau_1, \rho_1), \ldots, (\tau_c, \rho_c) \} such that \{\tau_1,\ldots, \tau_c\} is a set of conjugacy-class representatives for the involutions of G and \rho_i : \mathrm{Cent}_G(\tau_i) \rightarrow \mathbb{C} is a linear character for each i \in \{1,\ldots, c\}, chosen so that

\sum_{\chi \in \mathrm{Irr}(G)} \chi = \sum_{i=1}^c \rho_i \bigl\uparrow_{\mathrm{Cent}_G(\tau_i)}^G

For example, if an abelian group A has an involution model then, since each centralizer is A itself, comparing degrees shows that c = |A|, and so A is an elementary abelian 2-group. Conversely, any such group clearly has an involution model. By the Frobenius–Schur count of involutions, a necessary condition for a group to have an involution model is that all its irreducible representations are defined over the reals.

Baddeley’s main theorem is as follows.

Theorem. [Baddeley] If a finite group H has an involution model then so does H \wr \mathrm{Sym}_n.

The aim of this post is to sketch my version of Baddeley’s proof of his theorem in the special case when H = C_2. Some familiarity with the theory of conjugacy classes and representations of wreath products in Chapter 4 of James & Kerber, Representation theory of the symmetric group is assumed. The characters \rho_i defined below differ from Baddeley’s by a factor of \mathrm{Inf}_{\mathrm{Sym}_n}^{H \wr \mathrm{Sym}_n} \mathrm{sgn}_{\mathrm{Sym}_n}; this is done to make \phi_r (defined below) a permutation character, in analogy with the Inglis–Richardson–Saxl character \pi_r.

Aside: the Hyperoctahedral group

The group of all n \times n matrices with entries in \{0, \pm 1\} that become permutation matrices when all -1 entries are changed to 1 is isomorphic to C_2 \wr \mathrm{Sym}_n. Thus C_2 \wr \mathrm{Sym}_n is the hyperoctahedral group of symmetries of the n-hypercube. It is a nice exercise to identify \mathrm{Sym}_4, the rotational symmetry group of the cube, as an explicit index 2 subgroup of C_2 \wr \mathrm{Sym}_3.


From now on let H = \langle h \rangle \cong C_2. The group \mathrm{Sym}_n acts on H^{\times n} by

(b_1,\ldots,b_n)^\sigma = (b_{1\sigma^{-1}},\ldots,b_{n\sigma^{-1}}).

This is a place permutation: the element b_i, in position i on the left-hand side, occupies position i\sigma on the right-hand side. Let G = H^{\times n} \rtimes \mathrm{Sym}_n \cong H \wr \mathrm{Sym}_n.

We write elements of G as (b_1,\ldots,b_n;\sigma) where each b_i \in \{1,h\} and \sigma \in \mathrm{Sym}_n.

Imprimitive action of G

For each i \in \{1,\ldots, n\} introduce a formal symbol \overline{i}. (This could be thought of as -i, but I find the bar notation more convenient.) Let \Omega = \{1,  \ldots, n, \overline{1}, \ldots, \overline{n} \}. Given \sigma \in \mathrm{Sym}_{\{1,\ldots,n\}}, regarded as an element of \mathrm{Sym}_\Omega fixing each barred point, we define \overline{\sigma} \in \mathrm{Sym}_\Omega by i \overline{\sigma} = i for all i \in \{1,\ldots, n\} and \overline{i} \overline{\sigma} = \overline{i\sigma} for all \overline{i} \in \{\overline{1},\ldots,\overline{n} \}. Then G is isomorphic to the subgroup G_n \le \mathrm{Sym}_\Omega defined by G_n = B_n \rtimes T_n where

B_n =  \langle (1, \overline{1}), \ldots, (n, \overline{n}) \rangle

and

T_n = \langle \sigma \overline{\sigma} : \sigma \in \mathrm{Sym}_n \rangle.

So G_n acts imprimitively on \Omega with blocks \{1, \overline{1}\}, \ldots, \{n,\overline{n}\}.

Irreducible representations of G

Let \epsilon : \{1,h\} \rightarrow \{\pm 1\} be the faithful character of H \cong C_2 and let \widetilde{\epsilon}^{\times s} denote the linear character of H \wr \mathrm{Sym}_s on which each of the s factors of H in the base group factor acts as \epsilon. Given a bipartition (\lambda|\mu) \in \mathrm{BPar}(n) with \lambda \in \mathrm{Par}(t) and \mu \in \mathrm{Par}(n-t), we define

\chi^{(\lambda|\mu)} = \bigl(  \mathrm{Inf}_{\mathrm{Sym}_t}^{H \wr \mathrm{Sym}_t} \chi^\lambda \times \widetilde{\epsilon}^{\times (n-t)} \mathrm{Inf}_{\mathrm{Sym}_{n-t}}^{H \wr \mathrm{Sym}_{n-t}} \chi^\mu \bigr) \bigl\uparrow^{H \wr \mathrm{Sym}_n}.

Basic Clifford theory shows that the characters \chi^{(\lambda|\mu)} for (\lambda|\mu) \in \mathrm{BPar}(n) form a complete irredundant set of irreducible characters of G. For example, the n-dimensional representation of G as the symmetry group of the n-hypercube has character labelled by the bipartition \bigl((n-1)|(1)\bigr).
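By the Frobenius–Schur count mentioned above, an involution model forces the sum of the irreducible degrees to equal the number of elements squaring to the identity. Here is a brute-force sketch of this for C_2 \wr \mathrm{Sym}_n. Each \chi^{(\lambda|\mu)} with \lambda \in \mathrm{Par}(t) is induced from a subgroup of index \binom{n}{t}, so it has degree \binom{n}{t} f^\lambda f^\mu, where f^\lambda is computed by the hook length formula. Group elements are encoded, by our own choice, as signed permutations of \{\pm 1, \ldots, \pm n\}.

```haskell
import Data.List (permutations)

-- Signed permutation: images of 1..n; -i is sent to minus the image of i.
type SignedPerm = [Int]

apply :: SignedPerm -> Int -> Int
apply g i
  | i > 0 = g !! (i - 1)
  | otherwise = negate (g !! (negate i - 1))

signedPerms :: Int -> [SignedPerm]
signedPerms n =
  [ zipWith (*) signs perm
  | perm <- permutations [1 .. n]
  , signs <- mapM (const [1, -1]) [1 .. n]
  ]

-- Number of g with g^2 = 1 (the identity included).
involutionCount :: Int -> Int
involutionCount n =
  length [g | g <- signedPerms n, all (\i -> apply g (apply g i) == i) [1 .. n]]

-- Partitions of n as weakly decreasing lists.
partitions :: Int -> [[Int]]
partitions n = go n n
  where
    go 0 _ = [[]]
    go m k = [a : rest | a <- [min m k, min m k - 1 .. 1], rest <- go (m - a) a]

-- Number of standard tableaux of shape lam, by the hook length formula.
fLambda :: [Int] -> Integer
fLambda lam = fact (fromIntegral (sum lam)) `div` product hooks
  where
    conj = [length (filter (>= j) lam) | j <- [1 .. if null lam then 0 else head lam]]
    hooks =
      [ fromIntegral (row - j + (conj !! (j - 1)) - i + 1)
      | (i, row) <- zip [1 ..] lam, j <- [1 .. row] ]
    fact k = product [1 .. k]

choose :: Int -> Int -> Integer
choose a b = product [fromIntegral (a - b + 1) .. fromIntegral a] `div` product [1 .. fromIntegral b]

-- Sum of the degrees binom(n, t) f^lambda f^mu over all bipartitions.
degreeSum :: Int -> Integer
degreeSum n =
  sum [ choose n t * fLambda lam * fLambda mu
      | t <- [0 .. n], lam <- partitions t, mu <- partitions (n - t) ]
```

For n = 1, \ldots, 5 both counts should be 2, 6, 20, 76 and 312.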

Conjugacy classes of involutions in G

Since G_n/B_n \cong T_n \cong \mathrm{Sym}_n, any involution in G is of the form (b_1,\ldots,b_n ; \sigma) where \sigma \in \mathrm{Sym}_n is an involution. Moreover, as the calculation (h,1;(1,2))^2 = (h,h;\mathrm{id}) suggests, \sigma permutes amongst themselves the indices i such that b_i = h. By applying a suitable place permutation we may assume that b_1 = \cdots = b_{n-s} = 1 and b_{n-s+1} = \cdots = b_n = h, for some s \in \{0,\ldots, n\}. Now using that (h,h ; (1,2)) is conjugate, by (h,1), to (1,1;(1,2)), we see that a set of conjugacy class representatives for the involutions in G is

\{ (\stackrel{n-s}{\overbrace{1, \ldots, 1}} , \stackrel{s}{\overbrace{h, \ldots, h}} ; (1,2) \ldots (2r-1,2r)) \}

for r and s such that 2r+s \le n. The generalized cycle-type invariant defined in James–Kerber can be used to show no two of these representatives are conjugate.

Centralizers of involutions in G

As an element of G_n, the involution defined above is

\tau_s^{(r)} = (n-s+1,\overline{n-s+1}) \ldots (n,\overline{n}) \theta_r\overline{\theta_r}

where, by definition, \theta_r = (1,2)\ldots (2r-1,2r) \in \mathrm{Sym}_{2r}, so that \theta_r\overline{\theta_r} \in T_{2r}. If (b_1,\ldots,b_n;\sigma) commutes with \tau_s^{(r)} then, passing to the quotient, \sigma commutes with \theta_r. Therefore the non-singleton orbits

\{1,2\}, \{\overline{1},\overline{2}\}, \ldots, \{2r-1,2r\}, \{\overline{2r-1},\overline{2r}\}

of \tau_s^{(r)} are permuted by (b_1,\ldots,b_n;\sigma), as are the remaining non-singleton orbits \{n-s+1, \overline{n-s+1}\}, \ldots, \{n,\overline{n} \}.

Hence

\mathrm{Cent}_{G_n}(\tau_s^{(r)}) = \mathrm{Cent}_{G_{2r}}(\theta_r\overline{\theta_r}) \times G_{n-(2r+s)} \times G_s

where G_{n-(2r+s)} acts on \{2r+1, \ldots, n-s, \overline{2r+1}, \ldots, \overline{n-s}\} and G_s acts on \{n-s+1,\ldots,n,\overline{n-s+1},\ldots,\overline{n}\}. Clearly (b_1,b_2,\ldots, b_{2r-1},b_{2r}) \in B_{2r} commutes with

(1,2)\ldots (2r-1,2r) (\overline{1},\overline{2})\ldots(\overline{2r-1},\overline{2r})

if and only if b_1 = b_2, \ldots, b_{2r-1} = b_{2r}. Therefore the first factor is permutation isomorphic to

D_r \rtimes \mathrm{Cent}_{T_{2r}}(\theta_r\overline{\theta_r})

where

D_r = \{(b_1,b_1,\ldots,b_r,b_r) : b_1, \ldots, b_r \in H \}.

Set E_r = \mathrm{Cent}_{T_{2r}}(\theta_r\overline{\theta_r}). Note that E_r is permutation isomorphic to C_2 \wr \mathrm{Sym}_r, acting with one orbit on \{1,\ldots,2r\} and another on \{\overline{1},\ldots,\overline{2r}\}. (One has to get used to the two different ways in which the group C_2 \wr \mathrm{Sym}_r arises; in this post I’ve used H when the C_2 comes from the base group.)

For example, if n=7 then

\tau_2^{(2)} = (1,2)(\overline{1},\overline{2})(3,4)(\overline{3},\overline{4})(6,\overline{6})(7,\overline{7}) \in G_7

and the centralizer of \tau_2^{(2)} is generated by (1, \overline{1})(2, \overline{2}), (3,\overline{3})(4,\overline{4}), (5, \overline{5}), (6,\overline{6}), (7,\overline{7}) in the base group B_7 and (1,2)(\overline{1},\overline{2}), (1,3)(2,4)(\overline{1},\overline{3})(\overline{2},\overline{4}) and (6,7)(\overline{6},\overline{7}) in the top group T_7. The first two top group generators generate E_2.
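The decomposition of the centralizer predicts that its order is 2^{2r} r! \cdot 2^{m} m! \cdot 2^s s!, where m = n-(2r+s), using |D_r| = 2^r, |E_r| = 2^r r! and |G_k| = 2^k k!. For the n = 7 example this gives 2^4 \cdot 2! \cdot 2 \cdot 2^2 \cdot 2! = 512. A minimal brute-force check for small n might look like this; as before, modelling G_n by signed permutations, with \tau_s^{(r)} negating the last s points, is an assumption of the sketch and the names are ours.

```python
from itertools import permutations, product
from math import factorial

def signed_perms(n):
    """All elements of C_2 wr S_n as signed permutations w, with w[i-1] the image of i."""
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield tuple(s * x for s, x in zip(signs, perm))

def image(w, i):
    # w(-i) = -w(i): the bar of a point is its negative.
    return w[i - 1] if i > 0 else -w[-i - 1]

def compose(u, v):
    """Apply u first, then v."""
    return tuple(image(v, image(u, i)) for i in range(1, len(u) + 1))

def tau(n, r, s):
    """Model of tau_s^{(r)}: swap 2i-1 and 2i for i <= r, negate the last s points."""
    w = list(range(1, n + 1))
    for i in range(1, r + 1):
        w[2 * i - 2], w[2 * i - 1] = 2 * i, 2 * i - 1
    for j in range(n - s + 1, n + 1):
        w[j - 1] = -j
    return tuple(w)

def centralizer_order(n, r, s):
    t = tau(n, r, s)
    return sum(1 for g in signed_perms(n) if compose(g, t) == compose(t, g))

def predicted_order(n, r, s):
    """|D_r x| E_r| * |G_m| * |G_s| with m = n - (2r + s)."""
    m = n - (2 * r + s)
    return (2 ** (2 * r) * factorial(r)) * (2 ** m * factorial(m)) * (2 ** s * factorial(s))

print(centralizer_order(4, 1, 1), predicted_order(4, 1, 1))  # 16 16
```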

Definition of the linear representations and reduction

The second and third factors of the centralizer are both complete wreath products so, by analogy with the Inglis–Richardson–Saxl paper, it is natural to guess that we should define \rho_s^{(r)}: \mathrm{Cent}_{G_n}(\tau_s^{(r)}) \rightarrow \mathbb{C} so that \rho_s^{(r)} restricts to:

  • The trivial character on D_r \rtimes \mathrm{Cent}_{T_{2r}}(\theta_r\overline{\theta_r}) = D_r \rtimes E_r;
  • \mathrm{Inf}_{\mathrm{Sym}_{n-(2r+s)}}^{H \wr \mathrm{Sym}_{n-(2r+s)}} \mathrm{sgn}_{\mathrm{Sym}_{n-(2r+s)}} on G_{n-(2r+s)};
  • \widetilde{\epsilon}^{\times s} \mathrm{Inf}_{\mathrm{Sym}_s}^{H \wr \mathrm{Sym}_s} \mathrm{sgn}_{\mathrm{Sym}_s} on G_s.

That is (omitting the details of the inflations for brevity),

\rho_s^{(r)} = 1_{D_r \rtimes E_r} \times \mathrm{Inf} \;\mathrm{sgn}_{\mathrm{Sym}_{n-(2r+s)}} \times \widetilde{\epsilon}^{\times s} \mathrm{Inf}\; \mathrm{sgn}_{\mathrm{Sym}_s}.


Define

\phi_r = 1_{D_r \rtimes E_r}\bigl\uparrow^{G_{2r}}.

Since \widetilde{\epsilon}^{\times 2r} restricts to the trivial character of D_r \rtimes E_r, we have \phi_r = \widetilde{\epsilon}^{\times 2r}\phi_r. The definition of \rho_s^{(r)} above is therefore symmetric with respect to 1_H and \epsilon. Moreover, if

\phi_r = \sum_{(\alpha|\beta) \in \mathrm{BPar}(2r)} m_{\alpha\beta} \chi^{(\alpha|\beta)}

then, by Pieri’s rule for the hyperoctahedral group (this follows from Pieri’s rule for the symmetric group in the same way as the hyperoctahedral branching rule follows from the branching rule for the symmetric group — for the latter see Lemma 4.2 in this paper),

\rho_s^{(r)} \bigl\uparrow^{G_n} = \sum_{(\alpha|\beta) \in \mathrm{BPar}(2r)} \sum_{(\lambda|\mu)} m_{\alpha\beta} \chi^{(\lambda|\mu)}

where the second sum is over all bipartitions (\lambda|\mu) of n such that \lambda is obtained by adding n-(2r+s) boxes, no two in the same row, to \alpha and \mu is obtained by adding s boxes, again no two in the same row, to \beta. Therefore Baddeley’s theorem holds if and only if \phi_r is multiplicity-free, with precisely the right constituents for the Pieri inductions as s varies to give us every character of H \wr \mathrm{Sym}_n exactly once. This is the content of the following proposition.

Proposition. \phi_r = \sum_{(\gamma|\delta) \in \mathrm{BPar}(r)} \chi^{(2\gamma|2\delta)}
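Assuming the proposition, the Pieri bookkeeping in the previous paragraph is purely combinatorial: each bipartition (\lambda|\mu) of n should arise from exactly one choice of r, s and (\gamma|\delta). The following sketch (function names are ours) enumerates the ways of adding boxes, no two in the same row, and checks this for small n; it does not prove the proposition itself.

```python
from collections import Counter
from itertools import combinations

def partitions(n, max_part=None):
    """Partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def add_vertical_strip(alpha, k):
    """Partitions obtained from alpha by adding k boxes, no two in the same row."""
    rows = list(alpha) + [0] * k          # room for up to k new rows
    out = []
    for chosen in combinations(range(len(rows)), k):
        new = rows[:]
        for i in chosen:
            new[i] += 1
        if all(new[i] >= new[i + 1] for i in range(len(new) - 1)):
            out.append(tuple(x for x in new if x > 0))
    return out

def constituents(n):
    """Multiset of (lambda|mu) in the sum of the induced rho_s^{(r)}, assuming the Proposition."""
    counts = Counter()
    for r in range(n // 2 + 1):
        for s in range(n - 2 * r + 1):
            for a in range(r + 1):
                for gamma in partitions(a):
                    for delta in partitions(r - a):
                        two_gamma = tuple(2 * x for x in gamma)
                        two_delta = tuple(2 * x for x in delta)
                        for lam in add_vertical_strip(two_gamma, n - 2 * r - s):
                            for mu in add_vertical_strip(two_delta, s):
                                counts[(lam, mu)] += 1
    return counts

def bipartitions(n):
    return {(lam, mu) for a in range(n + 1)
            for lam in partitions(a) for mu in partitions(n - a)}

for n in range(1, 7):
    c = constituents(n)
    print(n, set(c) == bipartitions(n), max(c.values()) == 1)
```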

Proof of the proposition

To avoid some messy notation I offer a ‘proof by example’. I believe it shows all the essential ideas of the general case.

Proof by example. Take r=3. We have

D_3 = \langle (1,\overline{1})(2, \overline{2}), (3,\overline{3})(4,\overline{4}), (5, \overline{5})(6, \overline{6}) \rangle \le B_6

and so \phi_3 is induced from the trivial character of

\mathrm{Cent}_{G_6}(\tau_0^{(3)}) = D_3 \rtimes E_3

(Recall that \tau_0^{(3)} = \theta_3 \overline{\theta_3} where \theta_3 = (1,2)(3,4)(5,6) and E_3 = \mathrm{Cent}_{T_6}(\theta_3\overline{\theta_3}).) To apply Clifford theory, it would be much more convenient if we induced from a subgroup of G_6 = B_6 \rtimes T_6 containing the full base group B_6. We arrange this by first inducing up to B_6 \rtimes E_3. (For the action of E_3, it is best to think of B_6 as (H \times H)^{\times 3}.) The calculation

1_{D_1 \rtimes E_1}\bigl\uparrow^{G_2} = \widetilde{1_H}^{\times 2} + \widetilde{\epsilon}^{\times 2}

shows that, on restriction to B_6, the induced character

1_{D_3 \rtimes E_3} \bigl\uparrow^{B_6 \rtimes E_3}

is the sum of all products \psi_1 \times \psi_2 \times \psi_3 where each \psi_i is one of the irreducible characters 1_{H \wr C_2} = \widetilde{1_H}^{\times 2} or \eta_{H \wr C_2} = \widetilde{\epsilon}^{\times 2} on the right-hand side above. The centralizer \mathrm{Cent}_{T_6}(\theta_3 \overline{\theta_3}) acts transitively on the 3 factors: glueing together the products in the same orbits into induced characters we get that 1_{D_3 \rtimes E_3} \uparrow^{B_6 \rtimes E_3} has the following irreducible constituents:

  • \widetilde{1_{H \wr C_2}}^{\times 3} 1_{S_3}
  • \bigl( \widetilde{1_{H \wr C_2}}^{\times 2} 1_{S_2} \times \eta_{H \wr C_2}  \bigr) \bigl\uparrow_{(B_4 \rtimes E_2) \times (B_2 \rtimes E_1)}^{B_6 \rtimes E_3}
  • \bigl( 1_{H \wr C_2} \times \widetilde{\eta_{H \wr C_2}}^{\times 2} 1_{S_2} \bigr) \bigl\uparrow_{(B_2 \rtimes E_1) \times (B_4 \rtimes E_2)}^{B_6 \rtimes E_3}
  • \widetilde{\eta_{H \wr C_2}}^{\times 3} 1_{S_3}.

Note that the ‘tilde-construction’ enters in two ways: once to combine the characters of each pair of H-factors in the same orbit of \theta_3, and then again to combine the characters obtained in this way. As a small check, observe that the sum of degrees is 1 + 3 + 3 + 1 = 8, which is the index of D_3 \rtimes E_3 in B_6 \rtimes E_3.

Reflecting the isomorphisms

(H \wr C_2) \wr S_3 \cong H \wr (C_2 \wr S_3) \cong H \wr E_3 \cong B_6 \rtimes E_3

we rewrite these characters as follows:

  • \widetilde{1_H}^{\times 6} 1_{E_3}
  • \widetilde{1_H}^{\times 4} 1_{E_2} \times \widetilde{\epsilon}^{\times 2} 1_{E_1} \uparrow^{B_6 \rtimes E_3}
  • \widetilde{1_H}^{\times 2} 1_{E_1} \times \widetilde{\epsilon}^{\times 4} 1_{E_2} \uparrow^{B_6 \rtimes E_3}
  • \widetilde{\epsilon}^{\times 6} 1_{E_3}.

It is now routine to induce ‘in the top group’ up to

G_6 = C_2 \wr S_6 = B_6 \rtimes T_6

using the decomposition of \pi_r into characters labelled by even partitions. For the second summand we use transitivity of induction, starting at the subgroup (B_4 \rtimes E_2) \times (B_2 \rtimes E_1) and going via (B_4 \rtimes T_4) \times (B_2 \rtimes T_2). The third summand is dealt with similarly. Thus \phi_3 is the sum of the \chi^{(\lambda|\mu)} for the following bipartitions (\lambda|\mu):

  • \bigl((6)|\varnothing\bigr), \bigl((4,2)|\varnothing\bigr), \bigl((2,2,2)|\varnothing\bigr)
  • \bigl((4)|(2)\bigr), \bigl((2,2)|(2)\bigr)
  • \bigl((2)|(4)\bigr), \bigl((2)|(2,2)\bigr)
  • \bigl(\varnothing|(6)\bigr), \bigl(\varnothing|(4,2)\bigr), \bigl(\varnothing|(2,2,2)\bigr),

as required. \Box
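A cheap consistency check on the proposition is to compare degrees. The index of D_r \rtimes E_r in G_{2r} is 2^{2r}(2r)!/(2^r \cdot 2^r r!) = (2r)!/r!, and \chi^{(\lambda|\mu)} has degree \binom{n}{|\lambda|} f^\lambda f^\mu, where f^\lambda is the number of standard \lambda-tableaux. The sketch below (ours, using the hook length formula) sums these degrees over the bipartitions (2\gamma|2\delta) for small r; for r=3 both sides are 120.

```python
from math import comb, factorial

def partitions(n, max_part=None):
    """Partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def num_syt(shape):
    """f^shape, the number of standard Young tableaux, by the hook length formula."""
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1
            leg = sum(1 for later in shape[i + 1:] if later > j)
            hooks *= arm + leg + 1
    return factorial(sum(shape)) // hooks

def degree(lam, mu):
    """Degree of chi^{(lam|mu)} for C_2 wr S_n: binom(n, |lam|) f^lam f^mu."""
    return comb(sum(lam) + sum(mu), sum(lam)) * num_syt(lam) * num_syt(mu)

def compare(r):
    """(sum of degrees of the chi^{(2 gamma|2 delta)}, index (2r)!/r!)."""
    total = sum(degree(tuple(2 * x for x in gamma), tuple(2 * x for x in delta))
                for a in range(r + 1)
                for gamma in partitions(a)
                for delta in partitions(r - a))
    return total, factorial(2 * r) // factorial(r)

print(compare(3))  # (120, 120)
```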

Some Stirling Number identities by differentiation

March 20, 2017

Let G(z) be the exponential generating function enumerating a family of combinatorial objects. For example, -\log(1-z) = \sum_{n=1}^\infty z^n/n is the e.g.f. enumerating cycles (there are (n-1)! cycles on \{1,\ldots,n\}) and \mathrm{e}^z -1 = \sum_{n=1}^\infty z^n/n! is the e.g.f. enumerating non-empty sets. Then \exp G(z) is the e.g.f. enumerating set partitions where each part carries a G-structure. For example,

\exp (\mathrm{e}^z - 1) = \sum_{n=0}^\infty B_n \frac{z^n}{n!}

where the Bell Number B_n is the number of set partitions of \{1,\ldots, n\}. We can keep track of the number of parts with a further variable. For example

\exp \bigl( -x \log (1-z) \bigr) = \sum_{n=0}^\infty \sum_{m=0}^\infty \genfrac{[}{]}{0pt}{}{n}{m} x^m \frac{z^n}{n!}

where the (unsigned) Stirling Number of the First Kind \genfrac{[}{]}{0pt}{}{n}{m} is the number of permutations of \{1, \ldots, n\} having exactly m disjoint cycles. Similarly

\exp \bigl( x(\mathrm{e}^z - 1) \bigr) = \sum_{n=0}^\infty \sum_{m=0}^\infty  \genfrac{\{}{\}}{0pt}{}{n}{m} x^m \frac{z^n}{n!}

where the Stirling Number of the Second Kind \genfrac{\{}{\}}{0pt}{}{n}{m} is the number of set partitions of \{1, \ldots, n\} into exactly m parts.
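For small n the definitions above can be checked directly. This sketch (ours; the names are illustrative) computes the Stirling Numbers from their standard recurrences, compares them with brute-force counts of cycles and set partitions, and recovers the Bell Numbers as coefficients of \exp(\mathrm{e}^z - 1) using exact rational arithmetic and the recurrence n f_n = \sum_{k=1}^n k g_k f_{n-k} for F = \exp G, which comes from F' = G'F.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import permutations
from math import factorial

@lru_cache(maxsize=None)
def stirling1(n, m):
    """Unsigned Stirling numbers of the first kind: [n, m] = (n-1)[n-1, m] + [n-1, m-1]."""
    if n == 0 or m == 0:
        return int(n == m)
    return (n - 1) * stirling1(n - 1, m) + stirling1(n - 1, m - 1)

@lru_cache(maxsize=None)
def stirling2(n, m):
    """Stirling numbers of the second kind: {n, m} = m{n-1, m} + {n-1, m-1}."""
    if n == 0 or m == 0:
        return int(n == m)
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

def num_cycles(perm):
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return cycles

def set_partitions(elements):
    """Yield every set partition of the list elements, as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]

def exp_series(g, N):
    """Coefficients f_0, ..., f_N of exp(g(z)), assuming g[0] = 0."""
    f = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        f[n] = sum(k * g[k] * f[n - k] for k in range(1, n + 1)) / n
    return f

N = 7
coeffs = exp_series([Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, N + 1)], N)
bell = [int(factorial(n) * coeffs[n]) for n in range(N + 1)]
print(bell)  # [1, 1, 2, 5, 15, 52, 203, 877]
```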

All this is explained beautifully in Chapter 3 of Wilf’s book generating functionology, in a way that leads readily into the high-brow modern take on these ideas using combinatorial species. For my planned combinatorics textbook I expect to deal with products of exponential generating functions in an ad-hoc way, and probably not do much more, since it’s impossible to compete with Wilf’s exposition.

Two part structures where one part is boring

A special case of the multiplication rule is that \exp(z) G(z) is the e.g.f. enumerating two-part set compositions (A,B) where A carries no extra structure and B carries a G-structure. For example, let d_n be the number of derangements of \{1,\ldots, n\} and let G(z) = \sum_{n=0}^\infty d_n z^n/n!. Since any permutation is uniquely determined by its set of fixed points and the derangement it induces on the remaining points, we have \exp(z) G(z) = \sum_{n=0}^\infty n! z^n/n! = 1/(1-z), giving

G(z) = \exp(-z)/(1-z).

For another example, observe that the e.g.f. enumerating the r^n functions f : \{1,\ldots, n\} \rightarrow \{1,\ldots, r\}, for all r, n \ge 0 and with the convention 0^0 = 1, is

\sum_{r=0}^\infty \sum_{n=0}^\infty r^n \frac{w^r}{r!} \frac{z^n}{n!} = \sum_{r=0}^\infty \frac{w^r}{r!} \sum_{n=0}^\infty \frac{(rz)^n}{n!} = \sum_{r=0}^\infty \frac{w^r}{r!} \mathrm{e}^{rz} = \exp (w \mathrm{e}^z).

Each such function is uniquely determined by a pair (A,B) where A = \{1,\ldots,r\} \backslash \mathrm{im} f carries no extra structure, and B = \mathrm{im} f carries a set composition of \{1,\ldots, n\} into |B| parts. Therefore the e.g.f. for set compositions is

\exp(-w) \exp (w\mathrm{e}^z) = \exp(w (\mathrm{e}^z-1)).

(Note this series is doubly exponential: the number of set compositions of \{1,\ldots,n\} into exactly m parts is the coefficient of w^m/m! z^n/n!.) Since there are m! set compositions associated to each set partition into m parts, we get

\exp\bigl(w (\mathrm{e}^z-1)\bigr) = \sum_{m=0}^\infty \sum_{n=0}^\infty  \genfrac{\{}{\}}{0pt}{}{n}{m} w^m \frac{z^n}{n!},

as claimed earlier.
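Extracting the coefficient of w^m/m! \, z^n/n! from \exp(-w)\exp(w\mathrm{e}^z) gives the familiar count \sum_{j=0}^m \binom{m}{j}(-1)^{m-j} j^n of set compositions of \{1,\ldots,n\} into m parts, equivalently of surjections onto \{1,\ldots,m\}. A short sketch (ours) comparing this with brute force and with m! \genfrac{\{}{\}}{0pt}{}{n}{m}:

```python
from functools import lru_cache
from itertools import product
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(n, m):
    """Stirling numbers of the second kind: {n, m} = m{n-1, m} + {n-1, m-1}."""
    if n == 0 or m == 0:
        return int(n == m)
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

def compositions_brute(n, m):
    """Set compositions of {1..n} into m non-empty ordered parts = surjections onto {1..m}."""
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == m)

def compositions_egf(n, m):
    """Coefficient of w^m/m! z^n/n! in exp(-w) exp(w e^z); Python has 0 ** 0 == 1."""
    return sum(comb(m, j) * (-1) ** (m - j) * j ** n for j in range(m + 1))

print([compositions_egf(4, m) for m in range(5)])  # [0, 1, 14, 36, 24]
```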

Binomial inversion

Multiplication by an exponential series is closely related to binomial inversion. Let G(z) = \sum_{n=0}^\infty a_n z^n/n!. Then

\exp(z) G(z) = \sum_{n=0}^\infty b_n z^n/n!

if and only if b_n = \sum_{m=0}^n \binom{n}{m} a_m.

This gives a very quick and elementary proof of the derangements formula: enumerating permutations by their number of fixed points we have n! = \sum_{m=0}^n \binom{n}{m} d_m, so

\sum_{n=0}^\infty d_n z^n/n! = \exp(-z) \sum_{n=0}^\infty n! z^n/n! = \exp(-z)/(1-z);

now take the coefficient of z^n to get

d_n = n!\bigl( 1-\frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!} \bigr).
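The two descriptions of d_n are easy to compare numerically. The following sketch (illustrative only) counts derangements by brute force, evaluates the formula above exactly with fractions, and checks the relation n! = \sum_{m=0}^n \binom{n}{m} d_m used in the proof.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def derangements_brute(n):
    """Permutations of {0, ..., n-1} with no fixed point."""
    return sum(1 for p in permutations(range(n)) if all(p[i] != i for i in range(n)))

def derangements_formula(n):
    """d_n = n! (1 - 1/1! + 1/2! - ... + (-1)^n / n!), computed exactly."""
    return int(factorial(n) * sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1)))

d = [derangements_formula(n) for n in range(9)]
print(d)  # [1, 0, 1, 2, 9, 44, 265, 1854, 14833]
```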


Let G(z) = \sum_{n=1}^\infty a_n z^n/n! be the exponential generating function for G-structures. We have seen that the weighted exponential generating function for G-structured set partitions is

\sum_{m=0}^\infty \sum_{n=0}^\infty a^{(m)}_n x^m \frac{z^n}{n!} = \exp x G(z).

Differentiating k times with respect to x, dividing by k!, and then setting x=1 we get

\sum_{n=0}^\infty \sum_{m=0}^n a^{(m)}_n \binom{m}{k} \frac{z^n}{n!} = G(z)^k/k! \exp G(z).

Now G(z)^k/k! is the exponential generating function for G-structured set partitions into exactly k parts, so, taking coefficients of z^n/n!, we get

\begin{aligned} \sum_{m=0}^n a^{(m)}_n \binom{m}{k} &= \Bigl[ \frac{z^n}{n!}\Bigr] \sum_{r=0}^\infty  a^{(k)}_r \frac{z^r}{r!} \exp G(z) \\ &= \sum_{r=0}^n a^{(k)}_r \frac{n!}{r!} [z^{n-r}] \exp G(z) \\ &= \sum_{r=0}^n \binom{n}{r} a^{(k)}_r  \bigl[\frac{z^{n-r}}{(n-r)!}\Bigr] \exp G(z) \\ &= \sum_{r=0}^n \binom{n}{r} a^{(k)}_r b_{n-r}   \end{aligned}

where b_n is the number of G-structured set partitions of \{1,\ldots,n\}.

This identity gives uniform proofs of two Stirling Number identities.

Taking G(z) = -\log (1-z) to enumerate Stirling Numbers of the First Kind we have \exp G(z) = 1/(1-z) and b_n = n! so

\sum_{m=0}^n  \genfrac{[}{]}{0pt}{}{n}{m} \binom{m}{k} = \sum_{r=0}^n \binom{n}{r}  \genfrac{[}{]}{0pt}{}{r}{k} (n-r)! = \genfrac{[}{]}{0pt}{}{n+1}{k+1}

where the final equality holds because the middle sum counts triples (X, \sigma, \tau) where X is an r-subset of \{1,\ldots, n\}, \sigma is a permutation of X having exactly k disjoint cycles and \tau is a cycle on \{1,\ldots,n,n+1\}\backslash X; such triples are in obvious bijection with permutations of \{1,\ldots,n, n+1\} having exactly k+1 disjoint cycles.

Taking G(z) = \mathrm{e}^z-1 to enumerate Stirling Numbers of the Second Kind we have \exp G(z) = \sum_{n=0}^\infty B_n \frac{z^n}{n!} and b_n = B_n, so

\sum_{m=0}^n  \genfrac{\{}{\}}{0pt}{}{n}{m} \binom{m}{k}  = \sum_{r=0}^n  \binom{n}{r} \genfrac{\{}{\}}{0pt}{}{r}{k}  B_{n-r}.

This, and the analogous identity for the Stirling Numbers of the First Kind, may be compared with

\sum_{m=0}^n \binom{n}{m} \genfrac{\{}{\}}{0pt}{}{m}{k} = \genfrac{\{}{\}}{0pt}{}{n+1}{k+1}

which follows from counting set partitions according to the size of the part containing n+1.
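All three identities are easy to test numerically. The sketch below (ours, using the standard recurrences for the Stirling Numbers and B_n = \sum_m \genfrac{\{}{\}}{0pt}{}{n}{m}) evaluates each side for small n and k.

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling1(n, m):
    if n == 0 or m == 0:
        return int(n == m)
    return (n - 1) * stirling1(n - 1, m) + stirling1(n - 1, m - 1)

@lru_cache(maxsize=None)
def stirling2(n, m):
    if n == 0 or m == 0:
        return int(n == m)
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

def bell(n):
    return sum(stirling2(n, m) for m in range(n + 1))

def first_kind(n, k):
    """The three expressions in the First Kind identity."""
    lhs = sum(stirling1(n, m) * comb(m, k) for m in range(n + 1))
    middle = sum(comb(n, r) * stirling1(r, k) * factorial(n - r) for r in range(n + 1))
    return lhs, middle, stirling1(n + 1, k + 1)

def second_kind(n, k):
    """The two sides of the Second Kind identity."""
    lhs = sum(stirling2(n, m) * comb(m, k) for m in range(n + 1))
    rhs = sum(comb(n, r) * stirling2(r, k) * bell(n - r) for r in range(n + 1))
    return lhs, rhs

def part_containing_last(n, k):
    """The two sides of sum_m binom(n, m) {m, k} = {n+1, k+1}."""
    return sum(comb(n, m) * stirling2(m, k) for m in range(n + 1)), stirling2(n + 1, k + 1)

print(first_kind(5, 1))  # (274, 274, 274)
```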

Bijective proofs

The left-hand side of the general identity above counts G-structured set partitions of \{1,\ldots,n\} having exactly k distinguished parts, and maybe some further undistinguished parts. (Typically, differentiation transforms generating functions in this way.) The right-hand side counts the same objects, by enumerating triples consisting of an r-subset X of \{1,\ldots,n\}, a G-structured set partition of X into k distinguished parts, and an (undistinguished) G-structured set partition of \{1,\ldots,n\}\backslash X. This gives fully bijective proofs of both identities. So maybe they are not so deep: that said, even the special case k=1 of the first, namely

\sum_{m=0}^n  \genfrac{[}{]}{0pt}{}{n}{m} m =  \genfrac{[}{]}{0pt}{}{n+1}{2}

seemed non-obvious to me.

Regular abelian subgroups of permutation groups

March 15, 2017

A B-group is a group K such that if G is a permutation group containing K as a regular subgroup then G is either imprimitive or 2-transitive. (Regular subgroups are always transitive in this post.) The term ‘B-group’ was introduced by Wielandt, in honour of Burnside, who showed in 1901 that if p is an odd prime then C_{p^2} is a B-group. This is a companion to his important 1906 theorem that a transitive permutation group of prime degree is either solvable or 2-transitive.

One might imagine that, post-classification, it would be clear which groups are B-groups, but this is far from the case. The purpose of this post is to collect some ancillary results with a bearing on this question.


When Burnside’s argument succeeds in proving that G is imprimitive, it does so by showing that the kernel N of an irreducible non-trivial constituent of the permutation character of G is intransitive. The orbits of N are then a non-trivial block system for G. Thus G is not even quasiprimitive. That one always gets this stronger result is explained by Corollary 3.4 in this paper by Li. The details are spelled out below.

Claim. If G is a permutation group containing a regular abelian subgroup K then G is primitive if and only if G is quasiprimitive.

Proof. Every primitive group is quasiprimitive, so it suffices to show that if G is imprimitive then G is not quasiprimitive. Suppose B is part of a non-trivial block system \mathcal{B} for G. Let L be the kernel of the induced action of K on \mathcal{B}. The key observation is that K/L acts regularly on \mathcal{B}: since K is transitive, K/L acts faithfully and transitively on \mathcal{B}, and a faithful transitive abelian permutation group is regular. Similarly, since K is transitive, L is transitive on the block B (given \alpha, \beta \in B, there exists k \in K such that \alpha^k = \beta; then B^k = B and, because K/L acts regularly on \mathcal{B}, we have k \in L). So we can factor K as a top part K/L, acting regularly on \mathcal{B}, and a bottom part L acting regularly on each block. In particular, the kernel N of the action of G on \mathcal{B} is a normal subgroup of G containing L; since N fixes each block setwise and L is transitive on each block, the orbits of N are precisely the blocks in \mathcal{B}. Therefore \mathcal{B} is the set of orbits of a non-trivial intransitive normal subgroup of G, and G is not quasiprimitive. \Box

Dropping the condition that K be abelian, one finds many imprimitive but quasiprimitive groups: they can be obtained from factorizations G = HK of a finite simple group G where H \cap K = 1 and H is not maximal. One nice example is A_7 = F_{21}S_5 where F_{21} is the transitive Frobenius group of order 21 inside A_7 and S_5 is generated by (1,2,3,4,5) and (1,2)(6,7). Here F_{21} is not maximal because it is a subgroup of \mathrm{GL}_3(\mathbb{F}_2) in its action on the 7 non-zero vectors of \mathbb{F}_2^3. For an easier example, take A_5 = \langle (1,2,3,4,5) \rangle A_4.
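The factorization of A_7 is small enough to verify by machine. In this sketch we realise F_{21} as the affine maps x \mapsto ax + b of \mathbb{Z}/7\mathbb{Z} with a \in \{1,2,4\}, writing 7 for 0, so that it is generated by (1,2,3,4,5,6,7) and (1,2,4)(3,6,5); this particular copy of F_{21} is our assumption, made for illustration. Since |F_{21}||S_5| = 2520 = |A_7|, it suffices to check that the products fs are all even and pairwise distinct.

```python
def compose(p, q):
    """Apply p first, then q; a permutation is a tuple whose i-th entry is the image of i."""
    return tuple(q[p[i]] for i in range(len(p)))

def closure(gens):
    """The group generated by gens, by breadth-first multiplication."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group

def from_cycles(n, *cycles):
    """Permutation of {0, ..., n-1} from cycles written on {1, ..., n}, as in the post."""
    p = list(range(n))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a - 1] = b - 1
    return tuple(p)

def is_even(p):
    seen, transpositions = set(), 0
    for i in range(len(p)):
        length, j = 0, i
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        transpositions += max(length - 1, 0)
    return transpositions % 2 == 0

F21 = closure([from_cycles(7, (1, 2, 3, 4, 5, 6, 7)), from_cycles(7, (1, 2, 4), (3, 6, 5))])
S5 = closure([from_cycles(7, (1, 2, 3, 4, 5)), from_cycles(7, (1, 2), (6, 7))])
products = {compose(f, s) for f in F21 for s in S5}
print(len(F21), len(S5), len(products), all(is_even(g) for g in products))  # 21 120 2520 True
```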

Factorizable groups

Say that a finite group K is factorizable if there exist t > 1 and groups K_1, \ldots, K_t such that K \cong K_1 \times \cdots \times K_t where |K_1| = \cdots = |K_t| \ge 3. If K is factorizable with |K_i| = m for each i, then, since each K_i acts as a regular subgroup of S_m, K is a regular subgroup of S_m \wr S_t in its product action on \{1,\ldots,m\}^t. This action is primitive, since m \ge 3, but not 2-transitive, since it preserves the Hamming distance between t-tuples. Therefore no factorizable group is a B-group. This example appears as Theorem 25.7 in Wielandt Permutation Groups.
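For the smallest case m=3, t=2, the following sketch (ours) builds S_3 \wr S_2 in its product action on 9 points, checks that the two coordinate translations generate a regular subgroup C_3 \times C_3, and tests primitivity with the standard criterion that a transitive group is primitive if and only if every non-diagonal orbital graph is connected. Finding two non-diagonal orbitals (Hamming distance 1 and 2) confirms that the action is not 2-transitive.

```python
from itertools import product

M = 3  # the m of the text
POINTS = list(product(range(M), repeat=2))
INDEX = {pt: i for i, pt in enumerate(POINTS)}

def perm_from(fn):
    """Permutation (tuple of images) induced on the 9 points by fn(x, y)."""
    return tuple(INDEX[fn(x, y)] for (x, y) in POINTS)

def compose(p, q):
    return tuple(q[p[i]] for i in range(len(p)))

def closure(gens):
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group

shift_x = perm_from(lambda x, y: ((x + 1) % M, y))
shift_y = perm_from(lambda x, y: (x, (y + 1) % M))
negate_x = perm_from(lambda x, y: ((-x) % M, y))  # a transposition of S_3 on the first coordinate
swap = perm_from(lambda x, y: (y, x))

G = closure([shift_x, negate_x, swap])  # S_3 wr S_2 in product action
K = closure([shift_x, shift_y])         # C_3 x C_3

def orbitals(group, npts):
    remaining = {(a, b) for a in range(npts) for b in range(npts) if a != b}
    found = []
    while remaining:
        a, b = next(iter(remaining))
        orbit = {(g[a], g[b]) for g in group}
        found.append(orbit)
        remaining -= orbit
    return found

def connected(edges, npts):
    adj = {i: set() for i in range(npts)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return len(seen) == npts

orbs = orbitals(G, len(POINTS))
print(len(G), len(K), len(orbs), all(connected(o, len(POINTS)) for o in orbs))  # 72 9 2 True
```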

Affine groups

An interesting source of examples is affine groups of the form G = V \rtimes H where V = \mathbb{F}_p^n for some prime p and H \le \mathrm{GL}(V). Here V acts by translation on itself and H is the point stabiliser of 0. If B is a block containing 0 then, for each v \in B, the translation t_v maps 0 \in B to v \in B, and so fixes B setwise; hence B is closed under addition, and so is a linear subspace of V. Since H fixes 0, B is also H-invariant. Conversely, every H-invariant subspace is a block. Therefore G is primitive if and only if H acts irreducibly on V. The action of G is 2-transitive if and only if the point stabiliser H is transitive on V \backslash \{0\}. Therefore if there exists a linear group H acting irreducibly on \mathbb{F}_p^n but not transitively on its non-zero vectors, then C_p^n is not a B-group.
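For a concrete instance, take p = 3, n = 2 and let H be the cyclic group of order 4 generated by R : (x,y) \mapsto (-y,x); this choice of H is ours, for illustration. Since x^2+1 has no root mod 3, H fixes no line and so acts irreducibly, but its orbits on the 8 non-zero vectors have size 4. Hence V \rtimes H, of order 36 acting on 9 points, is primitive but not 2-transitive, and C_3^2 is not a B-group. A sketch checking this:

```python
from itertools import product

P = 3
V = list(product(range(P), repeat=2))
ZERO = (0, 0)
I = ((1, 0), (0, 1))
R = ((0, 1), (P - 1, 0))  # hypothetical choice: v R = (-v_2, v_1), of order 4

def apply(M, v):
    """Row vector v times the matrix M over F_P (right action, as in the post)."""
    return tuple(sum(v[i] * M[i][j] for i in range(2)) % P for j in range(2))

def mat_mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % P for j in range(2))
                 for i in range(2))

H, M = [I], R
while M != I:
    H.append(M)
    M = mat_mul(M, R)

def invariant_lines():
    """Lines (1-dimensional subspaces) fixed setwise by every element of H."""
    found = set()
    for v in V:
        if v == ZERO:
            continue
        line = frozenset(tuple(c * x % P for x in v) for c in range(P))
        if all(apply(h, w) in line for h in H for w in line):
            found.add(line)
    return found

def orbit_sizes():
    """Sizes of the H-orbits on the non-zero vectors."""
    remaining, sizes = set(V) - {ZERO}, []
    while remaining:
        v = remaining.pop()
        orbit = {apply(h, v) for h in H}
        sizes.append(len(orbit))
        remaining -= orbit
    return sorted(sizes)

print(len(H), len(invariant_lines()), orbit_sizes())  # 4 0 [4, 4]
```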

Symmetric group representations

Consider the symmetric group S_n acting on \langle v_1, \ldots, v_n \rangle by permutation matrices. Let V = \langle v_i - v_j : 1 \le i  2 and n \ge 2 then the orbit of v_1-v_2 does not contain v_2-v_1, and hence C_p^n is not a B-group in these cases. If p = 2 and n \ge 4 then the orbit of v_1+v_2 does not contain v_1+v_2+v_3+v_4. Therefore C_2^m is not a B-group when m is even and m \ge 4.

Exotic regular abelian subgroups of affine groups

An interesting feature of these examples, noted by Li in Remark 1.1 following the main theorem, is that V \rtimes H may have regular abelian subgroups other than the obvious translation subgroup V. Let t_v : V \rightarrow V denote translation by v \in V.
Take n=2r+1, fix s \le r, and consider the subgroup

K = \langle (2,3)t_{v_1+v_2}, \ldots, (2s,2s+1)t_{v_1+v_{2s}}, t_{v_{2s+2}}, \ldots, t_{v_{2r+1}} \rangle .

By the multiplication rule

h t_v h' t_{v'} = h h' t_{vh' + v'}

we see that (2t,2t+1)t_{v_1+v_{2t}} has square t_{v_{2t}+v_{2t+1}} and that the generators of K commute. Therefore K \cong C_4^s \times C_2^{2(r-s)}, and no group of this form, with r \ge 2, is a B-group. (Since these groups are factorizable, this also follows from Wielandt’s result above.)
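The case s = r, where no further translations are needed, is easy to check by machine. In this sketch (ours) vectors of \mathbb{F}_2^n are bitmasks, an element h t_u is stored as the pair (h, u), and multiplication follows the rule above. We check that K has order 2^{2r}, is abelian, acts regularly on the even-weight vectors (the space V when p=2), has exponent 4, and has exactly 2^r elements g with g^2 = 1, as C_4^r should.

```python
def act(v, h):
    """Right action of the coordinate permutation h (tuple of images) on a bitmask v."""
    w = 0
    for i, hi in enumerate(h):
        if v >> i & 1:
            w |= 1 << hi
    return w

def mul(a, b):
    """(h, u)(k, w) = (hk, uk + w), i.e. h t_u k t_w = hk t_{uk + w}."""
    (h, u), (k, w) = a, b
    return tuple(k[i] for i in h), act(u, k) ^ w

def exotic_subgroup(r):
    """K generated by (2t, 2t+1) t_{v_1 + v_{2t}}, t = 1..r, with n = 2r + 1 (case s = r)."""
    n = 2 * r + 1
    gens = []
    for t in range(1, r + 1):
        h = list(range(n))
        h[2 * t - 1], h[2 * t] = 2 * t, 2 * t - 1         # (2t, 2t+1), 0-indexed
        gens.append((tuple(h), 1 | 1 << (2 * t - 1)))    # translation by v_1 + v_{2t}
    identity = (tuple(range(n)), 0)
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                x = mul(g, s)
                if x not in group:
                    group.add(x)
                    new.append(x)
        frontier = new
    return n, identity, group

def summary(r):
    n, identity, K = exotic_subgroup(r)
    even_weight = {v for v in range(2 ** n) if bin(v).count("1") % 2 == 0}
    orbit_of_zero = {u for (_, u) in K}                      # 0 (h t_u) = u
    abelian = all(mul(a, b) == mul(b, a) for a in K for b in K)
    square_trivial = sum(1 for g in K if mul(g, g) == identity)
    exponent_4 = all(mul(mul(g, g), mul(g, g)) == identity for g in K)
    return len(K), abelian, orbit_of_zero == even_weight, square_trivial, exponent_4

print(summary(2))  # (16, True, True, 4, True)
```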

Li’s paper includes this example: the assumption that n is odd, needed to ensure that V is irreducible, is omitted. In fact it is an open problem to decide when C_2^n is a B-group.

Perhaps surprisingly, this example does not generalize to odd primes. To see this, we introduce some ideas from an interesting paper by Caranti, Della Volta and Sala. Observe that if K is a regular abelian subgroup of G then for each v \in V, there exists a unique g_v \in K such that 0g_v = v. There exists a unique h_v \in H such that g_v = h_v t_v. By the multiplication rule above we have

h_u t_u h_v t_v = h_u h_v t_{uh_v + v} = h_vh_u t_{vh_u+u} = h_vt_vh_ut_u

for all u, v \in V. Therefore \{h_v : v \in V\} is an abelian subgroup of H and uh_v + v = vh_u + u for all u,v \in V. Replacing v with v+w, we have

\begin{aligned} uh_{v+w} + (v+w) &= (v+w)h_u + u \\                  &= vh_u + wh_u + u \\                  &= uh_v + v + uh_w + w - u \end{aligned}

and so, cancelling v+w, we get the striking linearity property

h_{v+w} = h_v + h_w - \mathrm{id}.

Equivalently, h_{v+w}-\mathrm{id} = (h_v-\mathrm{id}) + (h_w-\mathrm{id}). Since (h_v - \mathrm{id})(h_w - \mathrm{id}) = (h_vh_w - \mathrm{id}) - (h_v - \mathrm{id}) - (h_w - \mathrm{id}), it follows that the linear maps \{h_v - \mathrm{id} : v \in V \} form a subalgebra of \mathrm{End}(V). (This is essentially Fact 3 in the linked paper.)
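The linearity property can be tested on the regular subgroups constructed earlier, again in the case s = r. The sketch below (ours; same bitmask conventions as before) reads off h_v from the unique element of K with translation part v and checks h_{v+w} = h_v + h_w - \mathrm{id} on every vector of V; over \mathbb{F}_2 the term -\mathrm{id} is the same as +\mathrm{id}, that is, XOR with x.

```python
def act(v, h):
    """Right action of the coordinate permutation h on a bitmask v in F_2^n."""
    w = 0
    for i, hi in enumerate(h):
        if v >> i & 1:
            w |= 1 << hi
    return w

def mul(a, b):
    """h t_u k t_w = hk t_{uk + w}."""
    (h, u), (k, w) = a, b
    return tuple(k[i] for i in h), act(u, k) ^ w

def exotic_subgroup(r):
    """K generated by (2t, 2t+1) t_{v_1 + v_{2t}} for t = 1..r, with n = 2r + 1."""
    n = 2 * r + 1
    gens = []
    for t in range(1, r + 1):
        h = list(range(n))
        h[2 * t - 1], h[2 * t] = 2 * t, 2 * t - 1
        gens.append((tuple(h), 1 | 1 << (2 * t - 1)))
    group = {(tuple(range(n)), 0)}
    frontier = list(group)
    while frontier:
        new = [mul(g, s) for g in frontier for s in gens]
        frontier = [x for x in set(new) if x not in group]
        group.update(frontier)
    return group

def check_linearity(r):
    K = exotic_subgroup(r)
    h_of = {u: h for (h, u) in K}   # K is regular, so each v in V occurs exactly once
    V = list(h_of)
    return all(act(x, h_of[v ^ w]) == act(x, h_of[v]) ^ act(x, h_of[w]) ^ x
               for x in V for v in V for w in V)

print(check_linearity(1), check_linearity(2))  # True True
```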

Abelian regular subgroups of odd degree affine symmetric groups

Suppose that K is a regular abelian subgroup of V \rtimes S_n and that there exists v \in V such that h_v \not= \mathrm{id}. The matrix M representing h_v in the basis v_2-v_1, \ldots, v_n-v_1 of V is a permutation matrix if 1 h_v = 1. If 1h_v = a > 1 and bh_v = 1 then from

(v_i - v_1)h_v = v_{ih_v} - v_a = (v_{ih_v}-v_1) - (v_a-v_1)

we see that M has -1 in each entry in the column for v_a-v_1, and the only other non-zero entries are a unique 1 in the row for v_i-v_1, for each i \not= b. For example, (1,2,3,4) is represented by

\left( \begin{matrix} -1 & 1 & 0 \\ -1 & 0 & 1 \\ -1 & 0 & 0 \end{matrix} \right).

By the linearity property above, h_{2v} = 2h_v - \mathrm{id}. But if p > 2 then the matrix 2M - I representing h_{2v} is not of either of these forms. Therefore, when p > 2, V is the unique regular abelian subgroup of V \rtimes S_n.

Suppose that p = 2. Suppose that h_v has a cycle of length at least 4. By relabelling, we may assume this cycle is (1,2,\ldots,a) where a is a 2-power. The matrix representing h_v has -1 entries in column 2, and the matrix representing h_v^{-1} has -1 entries in column a. Therefore the matrix representing h_v + h_v^{-1} + \mathrm{id} has -1 entries in both columns 2 and a, and so is not of the permitted form. It follows that, when p=2 and n=2r+1 is odd, the only abelian regular subgroups of V \rtimes S_n are isomorphic to C_4^s \times C_2^{2r-2s}. (In fact it seems they are precisely the subgroups constructed above.)

Abelian regular subgroups of even degree affine symmetric groups

Now suppose that n is even. Let U = V / \langle v_1 + \cdots + v_n \rangle and consider U \rtimes S_n. When n=4 the action of S_4 is not faithful: after factoring out the kernel \langle (1,2)(3,4), (1,3)(2,4)\rangle we get \mathbb{F}_2^2 \rtimes S_3 \cong S_4, which has abelian regular subgroups C_2 \times C_2 and C_4. When n=6, there is an abelian regular subgroup isomorphic to C_8 \times C_2, generated by t_{v_1+v_3}(3,4,5,6) and t_{v_1+v_2}. The obstruction seen above does not apply: for example, in the basis v_1+v_3, v_1+v_4,v_1+v_5,v_1+v_6, we have

(3,4,5,6) + (3,6,5,4) + \mathrm{id} \mapsto \left( \begin{matrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{matrix} \right)

which is the matrix representing (1,2)(3,5)(4,6), thanks to the relations present in the quotient module. However, in this case the action of S_6 on the non-zero vectors of U \cong \mathbb{F}_2^4 is transitive (any non-zero element of U is congruent to some v_i + v_j) and in fact C_8 \times C_2 is a B-group. (A related and remarkable fact is that the action of A_6 on U extends to a 2-transitive action of A_7, giving an example of a 3-transitive affine group, U \rtimes A_7.) A computer calculation shows that, up to conjugacy, there are also three abelian regular subgroups of U \rtimes S_6 isomorphic to C_4 \times C_4 and two isomorphic to C_4 \times C_2 \times C_2. If n \ge 8 then a similar argument to the odd degree case shows that any abelian regular subgroup has exponent at most 4, so we get no new examples.
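The displayed matrix can be recomputed by machine. This sketch (ours) works modulo v_1 + \cdots + v_6 by identifying each bitmask with its complement, computes the matrix of a permutation in the basis v_1+v_3, v_1+v_4, v_1+v_5, v_1+v_6 (row i gives the image of the i-th basis vector), and then searches S_6 for the permutations whose matrix equals the sum.

```python
from itertools import permutations, product

N = 6
ALL_ONES = (1 << N) - 1  # v_1 + ... + v_6

def act(v, p):
    """Right action of the permutation p (tuple of images, 0-indexed) on a bitmask v."""
    w = 0
    for i in range(N):
        if v >> i & 1:
            w |= 1 << p[i]
    return w

def canon(v):
    """Canonical representative of v + <v_1 + ... + v_6> in U."""
    return min(v, v ^ ALL_ONES)

def vec(*points):
    """v_{i_1} + ... + v_{i_k} for 1-indexed points."""
    return sum(1 << (i - 1) for i in points)

BASIS = [vec(1, 3), vec(1, 4), vec(1, 5), vec(1, 6)]
COORDS = {}
for c in product((0, 1), repeat=4):
    v = 0
    for ci, b in zip(c, BASIS):
        if ci:
            v ^= b
    COORDS[canon(v)] = c

def matrix(p):
    """Row i holds the coordinates of the image of the i-th basis vector."""
    return tuple(COORDS[canon(act(b, p))] for b in BASIS)

def add(*mats):
    return tuple(tuple(sum(entries) % 2 for entries in zip(*rows)) for rows in zip(*mats))

IDENTITY = tuple(range(N))
CYCLE = (0, 1, 3, 4, 5, 2)      # (3,4,5,6), 0-indexed
CYCLE_INV = (0, 1, 5, 2, 3, 4)  # (3,6,5,4), 0-indexed

M = add(matrix(CYCLE), matrix(CYCLE_INV), matrix(IDENTITY))
print(M)  # ((1, 1, 0, 1), (1, 1, 1, 0), (0, 1, 1, 1), (1, 0, 1, 1))
found = [p for p in permutations(range(N)) if matrix(p) == M]
print([tuple(i + 1 for i in p) for p in found])  # images of 1, ..., 6
```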

Elementary abelian subgroups other than V

The following lemma has a simple ad-hoc proof.

Lemma. If n < p then the only regular abelian subgroups of V \rtimes \mathrm{GL}(V) are elementary abelian.

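Proof. Suppose that K is a regular abelian subgroup of V \rtimes \mathrm{GL}(V) and let h_v t_v \in K. Since |K| = p^n, the element h_v t_v, and hence h_v, has p-power order. Therefore h_v - 1 is a nilpotent n \times n-matrix and, since n < p, we have (h_v - 1)^p = 0, or equivalently h_v^p = 1. The p-th power of h_v t_v is therefore h_v^p t_w = t_w where w = v + vh_v + \cdots + vh_v^{p-1} = v(h_v - 1)^{p-1}, since (x-1)^{p-1} = 1 + x + \cdots + x^{p-1} in characteristic p. Since each Jordan block in h_v has size at most n \le p-1, we have (h_v - 1)^{p-1} = 0, and so w=0. Therefore K has exponent p, and so is elementary abelian. \Box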
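The key step, that (h t_v)^p = 1 when h is unipotent and n < p, can be spot-checked in small cases. In this sketch (ours) it suffices to take h upper unitriangular, since conjugating by \mathrm{GL}(V) preserves orders and every unipotent matrix is conjugate to one, and to take v in the standard basis, since the translation part of (h t_v)^p is linear in v. The last check shows that the bound n < p matters: over \mathbb{F}_2 with n = 2, a unipotent matrix composed with a translation can have order 4.

```python
from itertools import product

def identity(n):
    return tuple(tuple(int(i == j) for j in range(n)) for i in range(n))

def mat_mul(A, B, p):
    n = len(A)
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n))
                 for i in range(n))

def vec_mat(v, A, p):
    n = len(A)
    return tuple(sum(v[k] * A[k][j] for k in range(n)) % p for j in range(n))

def affine_mul(a, b, p):
    """h t_v h' t_{v'} = h h' t_{v h' + v'}, the multiplication rule from the post."""
    (h, v), (k, w) = a, b
    return mat_mul(h, k, p), tuple((x + y) % p for x, y in zip(vec_mat(v, k, p), w))

def unitriangular(n, p):
    """All upper unitriangular n x n matrices over F_p."""
    slots = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for values in product(range(p), repeat=len(slots)):
        M = [list(row) for row in identity(n)]
        for (i, j), x in zip(slots, values):
            M[i][j] = x
        yield tuple(map(tuple, M))

def pth_powers_trivial(n, p):
    """True if (h t_v)^p = 1 for every unitriangular h and every standard basis vector v."""
    one = (identity(n), (0,) * n)
    for h in unitriangular(n, p):
        for v in identity(n):  # rows of the identity are the standard basis vectors
            result = one
            for _ in range(p):
                result = affine_mul(result, (h, v), p)
            if result != one:
                return False
    return True

print(pth_powers_trivial(3, 5), pth_powers_trivial(2, 3), pth_powers_trivial(2, 2))  # True True False
```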

Note however that there exist elementary abelian subgroups other than the obvious V whenever n \ge 2. Possibly they can be classified by the correspondence with nil-algebras in the Caranti, Della Volta, Sala paper.

Algebra groups

Given n \in \mathbb{N} and a field F, define an algebra group to be a subgroup of \mathrm{GL}_n(F) of the form \{I + X : X \in A \} where A is a nilalgebra of n \times n matrices. (Up to conjugacy, we may assume that all elements of A are strictly upper triangular.) The subalgebra property above shows that any abelian regular subgroup of an affine group of dimension n over \mathbb{F}_2 is an extension

1 \rightarrow C_2^r \hookrightarrow K \twoheadrightarrow \{I + X : X \in A \} \rightarrow 1

of an algebra group by an elementary abelian 2-group. If F has characteristic 2 then, since

(I+X)(I+Y) = (I+XY) + (I+X) + (I+Y)

by the dual of the calculation above, a subgroup G of \mathrm{GL}_n(F) is an algebra group if and only if \{ I + g : g \in G \} is additively closed.

An exhaustive search shows that if n \le 4 then every 2-subgroup of \mathrm{GL}_n(\mathbb{F}_2) is an algebra group; note it suffices to consider subgroups up to conjugacy. But when n = 5 there are, up to conjugacy, 110 algebra subgroups of \mathrm{GL}_5(\mathbb{F}_2), and 66 non-algebra subgroups, of which 34 are not isomorphic to an algebra subgroup. Of these:

  • 1 is abelian, namely C_8;
  • 4 are nilpotent of class 2, namely \langle x, c : x^8 = c^2 = 1, x^c = x^5 \rangle, \langle x, y, c : x^4 = y^2 = c^4 = 1, x^c = xy, y^c = y \rangle, \langle x, y, c : x^2 = y^2 = c^8 = 1, x^c = xy, y^c = y \rangle, \langle x,y,c : x^4 = y^2 = c^4 = 1, x^c = xy, y^c = y\rangle;
  • 17 are nilpotent of class 3;
  • 12 are nilpotent of class 4.

The appearance of C_8 is explained by the following lemma, which implies that there is an algebra group isomorphic to C_{2^m} if and only if m \le 2.

Lemma. Let g \in \mathrm{GL}_n(\mathbb{F}_2) have order 2^m. Then \langle g \rangle is an algebra group if and only if all Jordan blocks in g have dimension at most 3.

Proof. Let 1 + X be a Jordan block of g of dimension d and order 2^m. The subalgebra of d\times d-matrices generated by X has dimension d-1, and so has size 2^{d-1}. Therefore if \langle g \rangle is an algebra group then 2^{d-1} \le 2^m. Hence d-1 \le m. On the other hand, (1+X)^{2^r} = 1 + X^{2^r} in characteristic 2 and X^{d-1} \not=0, so if 2^r < d then (1+X)^{2^r} \not=1. Hence the order of 1+X is 2^s where s is minimal such that 2^s \ge d, and so 2^{m-1} < d. Combining these inequalities we get

2^{m-1} < d \le m+1

and so m \le 2, and, since d-1 \le m, we have d \le 3. The converse is easily checked. \Box.
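Using the characteristic 2 criterion above, the lemma can be checked for a single Jordan block. This sketch (ours) builds the d \times d unipotent Jordan block over \mathbb{F}_2, lists its cyclic group, and tests whether \{I + h : h \in \langle g \rangle\} is closed under addition; only d \le 3 should pass, and the group orders should be the least power of 2 that is at least d.

```python
def identity(d):
    return tuple(tuple(int(i == j) for j in range(d)) for i in range(d))

def jordan_block(d):
    """The d x d unipotent Jordan block I + X over F_2 (X has 1s on the superdiagonal)."""
    return tuple(tuple(int(j == i or j == i + 1) for j in range(d)) for i in range(d))

def mul2(A, B):
    d = len(A)
    return tuple(tuple(sum(A[i][k] & B[k][j] for k in range(d)) % 2 for j in range(d))
                 for i in range(d))

def add2(A, B):
    return tuple(tuple(a ^ b for a, b in zip(r, s)) for r, s in zip(A, B))

def cyclic_group(g):
    I = identity(len(g))
    elems, x = [I], g
    while x != I:
        elems.append(x)
        x = mul2(x, g)
    return elems

def is_algebra_group(g):
    """In characteristic 2, <g> is an algebra group iff {I + h : h in <g>} is additively closed."""
    I = identity(len(g))
    xs = {add2(I, h) for h in cyclic_group(g)}
    return all(add2(a, b) in xs for a in xs for b in xs)

for d in range(1, 8):
    print(d, len(cyclic_group(jordan_block(d))), is_algebra_group(jordan_block(d)))
```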