Some Stirling Number identities by differentiation

March 20, 2017

Let G(z) be the exponential generating function enumerating a family of combinatorial objects. For example, -\log(1-z) = \sum_{n=1}^\infty z^n/n is the e.g.f. enumerating cycles (there are (n-1)! cycles on \{1,\ldots,n\}) and \mathrm{e}^z -1 = \sum_{n=1}^\infty z^n/n! is the e.g.f enumerating non-empty sets. Then \exp G(z) is the e.g.f. enumerating set partitions where each part carries a G-structure. For example,

\exp (\mathrm{e}^z - 1) = \sum_{n=0}^\infty B_n \frac{z^n}{n!}

where the Bell Number B_n is the number of set partitions of \{1,\ldots, n\}. We can keep track of the number of parts with a further variable. For example

\exp \bigl( -x \log (1-z) \bigr) = \sum_{n=0}^\infty \sum_{m=0}^\infty \genfrac{|}{|}{0pt}{}{n}{m} x^m \frac{z^n}{n!}

where the (unsigned) Stirling Number of the First Kind \genfrac{|}{|}{0pt}{}{n}{m} is the number of permutations of \{1, \ldots, n\} having exactly m disjoint cycles. Similarly

\exp \bigl( x(\mathrm{e}^z - 1) \bigr) = \sum_{n=0}^\infty \sum_{m=0}^\infty  \genfrac{\{}{\}}{0pt}{}{n}{m} x^m \frac{z^n}{n!}

where the Stirling Number of the Second Kind \genfrac{\{}{\}}{0pt}{}{n}{m} is the number of set partitions of \{1, \ldots, n\} into exactly m parts.

All this is explained beautifully in Chapter 3 of Wilf’s book generating functionology, in a way that leads readily into the high-brow modern take on these ideas using combinatorial species. For my planned combinatorics textbook I expect to deal with products of exponential generating functions in an ad-hoc way, and probably not do much more, since it’s impossible to compete with Wilf’s exposition.

Two part structures where one part is boring

A special case of the multiplication rule is that \exp(z) G(z) is the e.g.f enumerating two-part set compositions (A,B) where A carries no extra structure and B carries a G-structure. For example, if d_n is the number of derangements of \{1,\ldots, n\} then, since any permutation is uniquely determined by its set of fixed points and the derangement it induces on the remaining points, we have \exp(z) G(z) = \sum_{n=0}^\infty n! z^n/n! = 1/(1-z), giving

G(z) = \exp(-z)/(1-z).

For another example, observe that the e.g.f enumerating the r^n functions f : \{1,\ldots, n\} \rightarrow \{1,\ldots, r\} is

\sum_{r=0}^\infty r^n \frac{w^r}{r!} \frac{z^n}{n!} = \sum_{r=0}^\infty \frac{w^r}{r!} \sum_{n=0}^\infty \frac{(rz)^n}{n!} = \sum_{r=0}^\infty \frac{w^r}{r!} \mathrm{e}^{rz} = \exp (w \mathrm{e}^z).

Each such function is uniquely determined by a pair (A,B) where A = \{1,\ldots,r\} \backslash \mathrm{im} f carries no extra structure, and B = \mathrm{im} f carries a set composition of \{1,\ldots, n\} into |B| parts. Therefore the e.g.f. for set compositions is

\exp(-w) \exp (w\mathrm{e}^z) = \exp(w (\mathrm{e}^z-1)).

(Note this series is doubly exponential: the number of set compositions of \{1,\ldots,n\} into exactly m parts is the coefficient of w^m/m! z^n/n!.) Since there are m! set compositions associated to each set partition into m parts, we get

\exp(w (\mathrm{e}^z-1) = \sum_{s=0}^\infty \sum_{n=0}^\infty  \genfrac{\{}{\}}{0pt}{}{n}{m} w^m \frac{z^n}{n!}.

as claimed earlier.

Binomial inversion

Multiplication by an exponential series is closely related to binomial inversion. Let G(z) = \sum_{n=0}^\infty a_n z^n/n!. Then

\exp(z) G(z) = \sum_{n=0}^\infty b_n z^n/n!

if and only if b_n = \sum_{m=0}^n \binom{n}{m} a_m.

This gives a very quick and elementary proof of the derangements formula: enumerating permutations by their number of fixed points we have n! = \sum_{m=0}^n \binom{n}{m} d_m, so

\sum_{n=0}^\infty d_n z^n/n! = \exp(-z) \sum_{n=0}^\infty n! z^n/n! = \exp(-z)/(1-z);

now take the coefficient of z^n to get

d_n = n!\bigl( 1-\frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!} \bigr).


Let G(z) = \sum_{n=1}^\infty a_n z^n/n! be the exponential generating function for G-structures. We have seen that the weighted exponential generating function for G-structured set partitions is

\sum_{m=0}^\infty \sum_{n=0}^\infty a^{(m)}_n x^m \frac{z^n}{n!} = \exp x G(z).

Differentiating k times with respect to x, dividing by k!, and then setting x=1 we get

\sum_{n=0}^\infty \sum_{m=0}^n a^{(m)}_n \binom{m}{k} \frac{z^n}{n!} = G(z)^k/k! \exp G(z).

Now G(z)^k/k! is the exponential generating function for G-structured set partitions into exactly k parts, so, taking coefficients of z^n/n!, we get

\begin{aligned} \sum_{m=0}^n a^{(m)}_n \binom{m}{k} &= \Bigl[ \frac{z^n}{n!}\Bigr] \sum_{r=0}^\infty  a^{(k)}_r \frac{z^r}{r!} \exp G(z) \\ &= \sum_{r=0}^n a^{(k)}_r \frac{n!}{r!} [z^{n-r}] \exp G(z) \\ &= \sum_{r=0}^n \binom{n}{r} a^{(k)}_r  \bigl[\frac{z^{n-r}}{(n-r)!}\Bigr] \exp G(z) \\ &= \sum_{r=0}^n \binom{n}{r} a^{(k)}_r b_{n-r}   \end{aligned}

where b_n is the number of G-structured set partitions of \{1,\ldots,n\}.

This identity gives uniform proofs of two Stirling Number identities.

Taking G(z) = -\log (1-z) to enumerate Stirling Numbers of the First Kind we have \exp G(z) = 1/(1-z) and b_n = n! so

\sum_{m=0}^n  \genfrac{|}{|}{0pt}{}{n}{m} \binom{m}{k} = \sum_{r=0}^n \binom{n}{r}  \genfrac{|}{|}{0pt}{}{r}{k} (n-r)! = \genfrac{|}{|}{0pt}{}{n+1}{k+1}

where the final equality holds because the middle sum counts triples (X, \sigma, \tau) where X is an r-subset of \{1,\ldots, n\}, \sigma is a permutation of X having exactly k disjoint cycles and \tau is a cycle on \{1,\ldots,n,n+1\}\backslash X; such triples are in obvious bijection with permutations of \{1,\ldots,n ,n+1\} having exactly k+1 disjoint cycles.

Taking G(z) = \mathrm{e}^z-1 to enumerate Stirling Numbers of the Second Kind we have \exp G(z) = \sum_{n=0}^\infty B_n \frac{z^n}{n!} and b_n = B_n, so

\sum_{m=0}^n  \genfrac{\{}{\}}{0pt}{}{n}{m} \binom{m}{k}  = \sum_{r=0}^n  \binom{n}{r} \genfrac{\{}{\}}{0pt}{}{r}{k}  B_{n-r}.

Bijective proofs

The left-hand side of the general identity above counts G-structured set partitions of \{1,\ldots,n\} having exactly m distinguished parts (typically differentiation transforms generating functions in this way). The right-hand side counts the same objects, by enumerating triples consisting of an r-subset X of \{1,\ldots,n\}, a G-structured set partition of X into k distinguished parts, and an (undistinguished) G-structured set partition of \{1,\ldots,n\}\backslash X. This gives fully bijective proofs of both identities. So maybe they are not so deep: that said, even the special case k=1 of the first, namely

\sum_{m=0}^n  \genfrac{|}{|}{0pt}{}{n}{m} m =  \genfrac{|}{|}{0pt}{}{n+1}{2}

seemed non-obvious to me.

Regular abelian subgroups of permutation groups

March 15, 2017

A B-group is a group K such that if G is a permutation group containing K as a regular subgroup then G is either imprimitive or 2-transitive. (Regular subgroups are always transitive in this post.) The term ‘B-group’ was introduced by Wielandt, in honour of Burnside, who showed in 1901 that if p is an odd prime then C_{p^2} is a B-group. This is a companion to his important 1906 theorem that a transitive permutation group of prime degree is either solvable or 2-transitive. Both proofs saw early applications of character theory to permutation groups.

Burnside attempted to generalize his 1901 result in two later papers: in 1911, he claimed a proof that C_{p^n} is a B-group for any prime p and any n \ge 2, and in 1921, he claimed a proof that all abelian groups, except for elementary abelian groups, are B-groups. The first claim is correct, but his proof has a serious gap. This error appears to have been unobserved (or, just possibly, observed but ignored, since the result was soon proved in another way using Schur’s theory of S-rings) until 1994 when it was noted by Peter Neumann, whose explication may be found in his introduction to Burnside’s collected works. In 1995, Knapp extended Burnside’s argument to give a correct proof. Burnside’s second claim is simply false: for example, S_4 \wr S_2 acts primitively on \{1,2,3,4\}^2, and has a regular subgroup isomorphic to C_4 \times C_4. In one of my current projects, I’ve simplified Knapp’s proof and adapated Burnside’s character-theoretic methods to show, more generally, that any cyclic group of composite order is a B-group.

One might imagine that, post-classification, it would be clear which groups are B-groups, but this is far from the case. The purpose of this post is to collect some ancillary results with a bearing on this question.


When Burnside’s method succeeds in proving G is imprimitive, it does so by showing that the kernel N of an irreducible non-trivial constituent of the permutation character of G is intransitive. The orbits of N are then a non-trivial block system for G. Thus G is not even quasiprimitive. That Burnside’s method always gives this stronger result is explained by Corollary 3.4 in this paper by Li. The details are spelled out below.

Claim. If G is a permutation group containing a regular abelian subgroup K then G is primitive if and only if G is quasiprimitive.

Proof. Suppose B is part of a non-trivial block system \mathcal{B} for G. Let L be the kernel of the induced action of K on \mathcal{B}. The key observation is that K/L acts regularly on \mathcal{B}: it acts semiregularly, since K/L is abelian, and since K is transitive, K/L is transitive on \mathcal{B}. Similarly, since K is transitive, L is transitive on the block B (given \alpha, \beta \in B, there exists k \in K such that \alpha^k = \beta; then B^k = B and because k acts regularly on \mathcal{B}, we have k \in L). So we can factor K as a top part K/L, acting regularly on \mathcal{B}, and a bottom part L acting regularly on each block. In particular, the kernel of G acting on \mathcal{B} contains L and L has \mathcal{B} as its orbits. Therefore \mathcal{B} is the set of orbits of a normal subgroup of G, and G is not quasiprimitive. \Box

A result of Newell on plethysms of symmetric functions

January 7, 2017

It’s often said that part of the unique character of mathematics is that it builds on itself. Old results may be forgotten, but are only very rarely found to be incorrect. But changes in the ‘expected general background’ make many old papers impenetrable. For example, a long standing conjecture of Foulkes was introduced in a paper with the title ‘On the concomitants of the quintic and sextic up to degree four in the coefficients of the ground form‘. How many algebraists working today would immediately know even roughly what it is about? Of them, I’m sure still fewer would expect to read his paper with any pleasure. Certainly I cannot.

The purpose of this post is to do a small, but detailed, translation exercise on a paper of Newell from 1951, with the title ‘A theorem on the plethysm of S-functions‘. This is a relatively late publication from the stable of British algebraists working on invariant theory, so one might hope it would be fairly accessible.


Fix m, n \in \mathbb{N} and let \lambda be a partition of mn. Newell’s paper is cited in an important paper of Weintraub for the pair of results stated verbatim below:

\begin{aligned} ([m+1] \odot [1^n],[\lambda_1+1,\ldots,\lambda_n+1]) &= ([m] \odot [n], [\lambda]) \\ ([m+1] \odot [n],[\lambda_1+1,\ldots,\lambda_n+1]) &= ([m] \odot [1^n], [\lambda])  \end{aligned}.

Weintraub defines all his notation (which was standard for the time) clearly. For example, [\lambda] denotes the irreducible representation of the symmetric group labelled by the partition \lambda, and \odot is the analogue of the plethysm product \circ for the symmetric group, with the variables swapped. Writing s_\lambda for the Schur functions labelled by the partition \lambda, an equivalent statement in the language of symmetric functions is:

\begin{aligned} \langle s_{(1^n)} \circ s_{(m+1)}, s_{\lambda + (1^n)} \rangle &= \langle s_{(n)} \circ s_{(m)}, s_\lambda \rangle  \\ \langle s_{(n)} \circ s_{(m+1)}, s_{\lambda + (1^n)} \rangle &= \langle s_{(1^n)} \circ s_{(m)}, s_\lambda \rangle  \end{aligned}.

Even a careful visual inspection of Newell’s paper reveals nothing that looks remotely like Weintraub’s statement (or my restatement). But in fact, the upper displayed equations are a special case of Newell’s Theorem 1, stated verbatim below:

Theorem 1.

If \{m\} \otimes \{n\} = \sum \{\nu\} where m and n are integers then for any integer k \le n

\sum g_{(1^k)\xi\nu} \{\nu\} = [(m-1)\otimes (1^k)][\{m\} \otimes \{n-k\}].

To be fair to Newell, g is defined earlier: one reads ‘… where g_{rst} is defined from the multiplication of S-functions by means of \{r\}\{s\} = g_{rst}\{t\}‘. So alles klar?

Translation up to Theorem 1

Much of Newell’s notation was standard at its time: \{\nu\} is the Schur function s_\nu and \otimes is the plethystic product corresonding to Weintraub’s \odot. (So again, the order is reversed compared to \circ.) The hypothesis \{m\} \otimes \{n\} = \sum \{\nu\} may still seem a little mysterious to modern eyes, since a general plethysm is certainly not multiplicity-free: nowadays we might write ‘s_{(n)} \circ s_{(m)} = \sum c_\nu s_{\nu} where c_\nu \in \mathbb{N}_0 for each partition \nu of mn‘. Only the g remains to be understood: Newell defines g_{rst} for natural numbers r, s, t, but then uses general partitions as the coefficients. Even in the special case, and with the benefit of knowing what is meant, his definition seems highly unclear to me.

A clue to the correct interpretation is given by the start of the sentence defining g quoted above: ‘It is known [(2) 349] that if \{\lambda\} \otimes \{n\} = \sum \{\nu\} then \sum g_{1\xi\nu} \{\xi\} = \{\lambda\} \otimes \{n-1\}[g_{1\mu \lambda} \{\mu\}] …’. This must be parsed bearing in mind the summation convention that the repeated letters \xi and \mu are summed over. So, in modern language, it says: ‘if s_n \circ s_\lambda = \sum c_\nu s_\nu then

\sum_\nu c_\nu \sum_\xi  g_{1\xi\nu} s_\xi = (s_{(n-1)} \circ s_\lambda)\sum_\mu g_{1\mu\lambda} s_\mu.'

(I have included the parentheses around s_{(n-1)} \circ s_\lambda as a small editorial mercy.) After staring at this for a while, I decided it must express the following symmetric function identity:

(s_{(n)} \circ s_\lambda)\!\downarrow = (s_{(n-1)} \circ s_\lambda)(s_\lambda\!\downarrow),

where s_\lambda\!\downarrow is the sum of all Schur functions s_\mu labelled by partitions \mu obtained from \lambda by removing a box. (This identity is proved below, using the symmetric group.) Reciprocally, we have

\langle s_\lambda\!\downarrow, s_\mu\rangle = \langle s_\lambda, s_{(1)}s_\mu \rangle.

Thus \downarrow is the adjoint to multiplication by s_{(1)}, and Newell’s g_{1\mu \nu} are the corresponding coefficients. Generalizing freely, we can guess the correct definition of g_{\theta\xi \nu}.

Definition. Given partitions \theta, \mu, \nu we define

g_{\theta \mu\nu} = \langle s_\nu, s_\theta s_\mu \rangle.

If we believe this is correct, then after one more piece of guesswork where we amend g_{(1^k)\xi\nu}\{\nu\} in the statement of Theorem 1 to g_{(1^k)\xi\nu}\{\xi\} (making it consistent with the above, and also with the analogous Theorem 1A), the conclusion of Theorem 1 becomes

\sum_\nu c_\nu \sum_\xi \langle s_\xi s_{(1^k)}, s_\nu \rangle s_\xi = (s_{(1^k)}\circ s_{(m-1)})(s_{(n-k)} \circ s_{(m)}).

The left-hand side is

\sum_\xi  \langle s_\xi s_{(1^k)}, \sum_\nu c_\nu s_\nu \rangle s_\xi = \sum_\xi \langle s_\xi s_{(1^k)}, s_{(n)} \circ s_{(m)} \rangle s_\xi.

Therefore Newell’s Theorem 1 can be stated as follows:

\sum_{\xi} \langle s_{\xi} s_{(1^k)}, s_{(n)} \circ s_{(m)} \rangle s_\xi = (s_{(1^k)}\circ s_{(m-1)})(s_{(n-k)} \circ s_{(m)}).

If s_\nu appears in s_{(n)} \circ s_{(m)} then \nu has at most n parts. On the other hand, by Pieri’s rule, s_{\xi}s_{(1^k)} is the sum of all partitions \nu obtained from \xi by adding k boxes, no two in the same column. Therefore, in the special case when k=n, we have

\langle s_\xi s_{(1^n)}, s_{(n)} \circ s_{(m)} \rangle = \langle s_{\xi + (1^n)}, s_{(n)} \circ s_{(m)} \rangle.

Since s_{(0)} \circ s_{(m)} = s_{(0)} = 1, the unit symmetric function, we obtain

\langle s_{\xi + (1^n)}, s_{(n)} \circ s_{(m)} \rangle = \langle s_\xi, s_{(1^n)} \circ s_{(m-1)}\rangle.

This is equivalent to the first displayed equation in my restatement of Weintraub’s version of Newell’s result. The second displayed equation can be translated similarly.

Proof of Theorem 1

We prove Newell’s theorem in the symmetric group, replacing m with m+1 for consistency with Weintraub’s statement. Let \Omega be the collection of all set partitions of \{1,\ldots,(m+1)n\} into n sets each of size m+1, and let M denote the corresponding permutation module for \mathbb{C}S_{(m+1)n} with basis \Omega. In Weintraub’s notation, the permutation character of M is [m+1] \odot [n].

Consider the restricted module M\!\downarrow_{S_k \times S_{(m+1)n-k}}, where S_k permutes \{1,\ldots, k\} and S_{(m+1)n-k} permutes \{k+1,\ldots, (m+1)n\}. The maximal summand of M on which S_k acts as the sign representation is spanned by the symmetrized set partitions v_\mathcal{P} for \mathcal{P} \in \Omega, where

v_\mathcal{P} = \mathcal{P} \sum_{\sigma \in S_k} \sigma \mathrm{sgn}(\sigma).

Observe that v_\mathcal{P} = 0 unless \mathcal{P} has at most one entry from \{1,\ldots, k\} in each of its n constituent (m+1)-sets. If X and Y are two such subsets, with entries x_1, \ldots, x_{m} and y_1, \ldots, y_{m} in \{k+1,\ldots,(m+1)n\} respectively, then

v_\mathcal{P} (x_1,y_1)\ldots (x_{m},y_{m}) = -v_\mathcal{P}.

It follows that

\begin{aligned}\quad [m+1] \odot [n] & \!\downarrow_{S_k \times S_{(m+1)n-k}} \\ & \hskip0.1in =  \mathrm{sgn}_{S_k} \times ([m] \odot [(1^k)])([m+1] \odot [n-k]) + \pi \end{aligned}

where \pi is a sum of irreducible characters of S_k \times S_{(m+1)n-k} not of the form \mathrm{sgn}_{S_k} \times \chi^\tau. By Frobenius reciprocity,

\begin{aligned}  \langle [m+1] \odot [n], & \mathrm{sgn}_{S_k} \times \chi^\xi \uparrow^{S_{(m+1)n}} \rangle \\ &\hskip0.2in = \langle ([m] \odot [(1^k)])([m+1] \odot [n-k]), \chi^\xi \rangle \end{aligned}

for each partition \xi of (m+1)n -k. This is equivalent to

\langle s_{(n)} \circ s_{(m+1)}, s_{(1^k)}s_\xi \rangle = \langle (s_{(1^k)} \circ s_m)(s_{(n-k)} \circ s_{(m+1)}, s_\xi \rangle

which is obtained from my restatement of Newell’s theorem by taking the inner product with s_\xi. The companion result follows by a sign twist. \Box

No wonder old results are frequently reproved: its invariably less work than reading the original.

Henry IV: Donmar at King’s Cross

December 2, 2016

This is a well established production so hardly needs a review, least of all from me, but here goes anyway. Harriet Walter’s performance as the titular King was astonishing: graceful, subtle and slightly knowing. Her command of Shakespeare’s language was superb, and orders of magnitude beyond the rest of the cast. Often this didn’t matter: the ruthless editing of the two parts into a single play meant many more complicated speeches were cut, and robust physical theatricality carried many other parts.

Jade Anouka’s Hospur made every scene she was in sparkle (even her dead body had a commanding presence!). Clare Dunne’s performance as Prince Hal seemed to get stronger throughout the evening. Shakespeare’s comparisons to his earlier play Richard II survived the editing and gave some needed depth to the death-bed scene between father and son in the final act. Sophie Stanton’s Falstaff left me a bit cold. The tavern scene where she and Prince Hal imitate Henry IV (and his likely reaction to Hal’s disportations) was filled with the camaraderie and bouyant wit I expected from Falstaff, but at other times, particularly in the scenes with the lesser actors, she seemed less full of her role.

There was an attempt, not entirely successful I think, to give Prince Hal’s final rejection of Falstaff extra weight by use of the prison setting. This framing also helps to excuse the many clunky props, such as a toy dog, or the assorted lollipops the actors sucked on intermittently, or the fake swords that were unrecognisable as weapons of any sort. Perhaps the point was to emphasise the juvenile nature of much of the early goings on? But to pick a tough comparison, in Lilies, the prison setting works brilliantly to emphasise the key themes: here it sometimes seemed unhelpful. Maybe it works better in the companion plays, Julius Ceasar and The Tempest? That said, there was effective use of the theatre space (the audience sits on all sides—the atmosphere was frequently intense), and some nice symbolism, for example, when Hotspur is slain on top of the prison yard’s basketball court’s centre circle, a map of England and Wales spray-painted around him.

The all female cast helped to bring out the gender themes, for example the generally contemptuous treatment of Mistress Quickly, and the marriage between Lady Percy (played by Sheila Atim, showing great promise) and Hotspur. The programme has an interesting article by Charlotte Higgins on the 2:1 problem: of the 10 theatres that received the most money from the Arts Council, women account for about 1/3 of the board members, artistic directors and actors. Higgins makes the interesting point that the colossal gender balance in Shakespeare (of 981 roles, 826 are men, 155 are women) shapes audience expectations of casts, and has knock-on effects for careers throughout British theatre.

Overall: an enjoyable evening, but certainly not one for the purists.

Sylow subgroups of symmetric groups

November 27, 2016

The purpose of this post is to collect some proof of known results on Sylow subgroups of symmetric groups that are scattered across the literature, and in one case, wrongly stated in the standard reference. Throughout permutations act on the right. Let S_{p^r} denote the symmetric group on \{0,1,\ldots,p^r-1\}.

Rooted trees

Let T_r be the rooted p-ary tree with levels numbered from 0 at the root to r at the leaves. Let T_r(\ell) denote the set of p^\ell vertices at level \ell. We label the vertices in T_r(\ell) by \{0,1,\ldots, p^\ell-1\}, so that \beta \in T_r(\ell+1) is a child of \alpha \in T_r(\ell) if and only if \beta \equiv \alpha mod p^\ell. Thus the children of \alpha \in T_r(\ell) are

\alpha, \alpha + p^\ell, \ldots, \alpha + (p-1)p^\ell \in T_r(\ell+1)

and \gamma is a descendant of \alpha if and only if \gamma \equiv \alpha mod p^\ell. For example, the children of the root vertex \varnothing are labelled 0,1,\ldots, p-1 and correspond to congruence classes modulo p, the children of 1 \in T_r(1) are labelled (in base p) by 01, 11, \ldots, (p-1)1, and correspond to the congruence classes modulo p^2 of numbers congruent to 1 modulo p, and so on. All this may be seen from the tree T_3 for p=3 shown below.

Let \mathrm{Aut}(T_r) be the group of automorphisms of T_r. Any automorphism is determined by its action on the p^r leaves of T_r, so, when it is useful to do so, we regard \mathrm{Aut}(T_r) as a subgroup of the symmetric group S_{p^r}.

Let 0 \le \ell < r. For each vertex \alpha \in T_r(\ell), let g(\alpha) \in \mathrm{Aut}(T_r) fix \alpha and all vertices of T_r that are not a descendant of \alpha, and permute the descendants of \alpha (of any generation) by adding p^{\ell} modulo p^{\ell+1}. More formally, the vertex \alpha + ip^\ell + jp^{\ell+1} where 0\le i < p and j \in \mathbb{N}_0 is sent to \alpha + (i+c \ \mathrm{mod}\  p)p^\ell +jp^{\ell+1}. For example, if r=3 then g(01) = (1,10,19) and

g(1) = (1,4,7)(10,13,16)(19,22,25).

(We rely on the length of the p-ary form of the vertex to indicate the intended level.) Let B(\ell) be the subgroup of \mathrm{Aut}(T_r) generated by all the g(\alpha) for \alpha \in T_r(\ell). Each subgroup B(\ell) is normalized by the B(j) for j < \ell, so P_r = B(r-1) B(r-2) \ldots B(0) is a p-subgroup of S_{p^r}. We have |P_r| = p^e where

e = 1 + p + \cdots + p^{r-1}.

It is well known that the highest power of p dividing n! is \sum_{i \ge 1} \lfloor n/p^i \rfloor, so P_r is a Sylow p-subgroup of S_{p^r}.

We now make the connection with the other standard construction of P_r. Let \phi : P_r \rightarrow P_{r-1} be the group homomorphism defined by restricting g \in \mathrm{Aut}(T_r) to T_r(r-1). Thus \phi(g) describes the action of g on the p^{r-1} vertices at level r-1, or, equivalently, on the p^{r-1} congruence classes of numbers in \{0,1,\ldots,p^r-1\} modulo p^{r-1}.

Lemma 1.P_r is permutation isomorphic to the iterated wreath product C_p \wr (C_p \wr \cdots \wr C_p) with r factors, in its intransitive action.

Proof. By induction \phi(P_r) \cong B(r-2)\ldots B(0) is permutation isomorphic to the imprimitive wreath product with r-1 factors. Therefore B(r-1) \cong C_p^{p^{r-1}} is the base group in a semidirect product B(r-1) \rtimes B(r-2) \ldots B(0) permutation isomorphic to C_p \wr (C_p \wr \cdots \wr C_p) in its imprimitive action. \Box

Centre and normalizer

The centre of P_r and its normalizer in S_{p^r} can be described by slightly more involved inductive arguments. The following elements of P_r will be useful: for 0 \le \ell < r, let z_r(\ell) = \prod_{\alpha \in T_r(\ell)} g(\alpha). For example, if p=3 and r=3 then

\begin{aligned} z_3(0) &= (0,1,2)(3,4,5)(6,7,8)(9,10,11) \ldots (24,25,26), \\  z_3(1) &= (0,3,6)(1,4,7)(2,5,8)(9,12,15)\ldots (20,23,26), \\ z_3(2) &= (0,9,18)(1,10,19)(2,11,20)(3,12,21) \ldots (8,17,26). \end{aligned}

Observe that \phi(z_r(\ell)) = z_{r-1}(\ell) for 0 \le \ell < r-1, and that \phi(z_r(r-1)) = \mathrm{id}.

Lemma 2. Z(P_r) = \langle z_r(r-1) \rangle.

Proof. If r=1 then z_1(0) = (0,1,\ldots,p) generates P_1. Let r \ge 2 and let z \in Z(P_r). By induction \phi(z) \in \langle \phi(z_r(r-2))\rangle. Hence z \in \phi^{-1} \langle \phi(z_r(r-2)) \rangle = B_{r-1} \langle z_{r}(r-2) \rangle. Let z = b z_r(r-2)^i where 0 \le i \le p-1 and b \in B_{r-1}. Since B(r-1) is abelian, we see that z_r(r-2)^i commutes with all permutations in B(r-1). If i\not=0 then there exists \alpha \in T_r(r-1) such that \alpha z \not= \alpha and so

\alpha g(\alpha) z_r(r-2)^i = (\alpha +p^n) z_r(r-2)^i = \alpha + p^{r-1} + ip^{r-2}


\alpha z_r(r-2)^i g(\alpha)  = (\alpha +ip^{r-2}) g_x = \alpha + ip^{r-2},

a contradiction. Therefore i=0 and z = \prod_{\beta \in T_r(r-1)} g(\beta)^{i_\beta} where 0 \le i_\beta < p for each \beta. A similar argument considering the actions of the generators g(\alpha) for \alpha \in T_r(r-2) now shows that i_\beta is independent of \beta, and so g \in \langle z_r(r-1)\rangle. \Box

The normalizer N_{S_{p^r}}(P_r) turns out to be a split extension of P_r. Let c be a primitive root modulo p. Let \alpha \in T_r(\ell). Let h(\alpha) fix \alpha and every vertex that is not a descendant of \alpha and permute the descendants of \alpha (of any generation) by sending \alpha + ip^\ell + jp^{\ell+1} where 0 \le i < \ell and 0 \le j < p^{r-\ell-1} to \alpha + (ci \ \mathrm{mod}\ p)p^\ell + jp^{\ell+1}. Thus h(\alpha) fixes the child 0\alpha of \alpha, and permutes the remaining p-1 children 1\alpha \ldots (p-1)\alpha by a (p-1)-cycle, acting compatibly on their descendants in turn. Let k_r(\ell) = \prod_{\alpha \in T_r(\ell)}h(\alpha). For example, if p=3 and r=3 then h(01) = (12,21), h(1) = (4,7)(13,16)(22,25) and

\begin{aligned}\scriptstyle k_3(0) = (1, 2)(4, 5)(7, 8)(10, 11)(13, 14)(16, 17)(19, 20)(22, 23)(25, 26), \\ \scriptstyle k_3(1) = (3, 6)(4, 7)(5, 8)(12, 15)(13, 16)(14, 17)(21, 24)(22, 25)(23, 26), \\ \scriptstyle k_3(2) = (9, 18)(10, 19)(11, 20)(12, 21)(13, 22)(14, 23)(15, 24)(16, 25)(17, 26).  \end{aligned}

The permutation k_3(0) is shown below.

Proposition 3. N_{S_{p^r}}(P_r) = P_r \rtimes \langle k_r(\ell) : 0 \le \ell < r-1 \rangle.

Proof. Let g \in N_{S_{p^r}}(P_r). By a similar argument to Lemma 1, we have g \in B(r-2)\ldots B(0) \langle k_3(0), \ldots, k_3(r-2) \rangle h where h \in \ker \phi. By composing h with a suitable power of z_r(r-1), we may assume that 0h = 0. Let h^\star denote the permutation induced by h on \{p^{\ell-1},\ldots,(p-1)p^{\ell-1}\}. Since Z(P_r) = \langle z_r(r-1) \rangle is a characteristic subgroup of P_r, it is normalized by g. Therefore h^\star normalizes the p-cycle (0,p^{\ell-1},\ldots,(p-1)p^{\ell-1}) forming part of the disjoint cycle form of z_r(r-1). Therefore h^\star acts on \{p^{\ell-1},\ldots,(p-1)p^{\ell-1}\} as some power of k_r(r-1). Since h normalizes Z(P_r), it follows that h acts compatibly on all the other p-cycles in the disjoint cycle form of z_r(r-1), and so h \in \langle k_r(r-1) \rangle, as required. \Box

In particular, a Sylow 2-subgroup of S_{2^r} is self-normalizing. The following corollary is also immediate from Proposition 3.

Corollary 4. The number of Sylow p-subgroups of S_{p^r} is (p^r)!/p^{e}(p-1)^r where e = 1 + p + \cdots + p^{r-1}. \Box

Derived group

Using the transitive action of P_r on the vertices at each level of T_r, it is not hard to prove the following lemma.

Lemma 5. The derived group P_r' of P_r contains the permutations g_\alpha g_\beta^{-1} for all \alpha, \beta vertices at the same level of T_r. \Box

Corollary 6. The permutations g(\alpha) g(\beta)^{-1} for \alpha, \beta vertices at the same level of T_r generate P_r' and P_r/P_r' \cong C_p^r.

Proof. Let \gamma_\ell \in T_r(\ell) be the vertex labelled 0 at level l of T_r. Let H be the subgroup generated by the permutations g(\alpha) g(\beta)^{-1}. By Lemma 5, \mathrm{Aut}(T_r) is generated by the g(\gamma_\ell) and H. Moreover, the g(\gamma_\ell) commute modulo H. Therefore P_r/H is abelian, and so H = P_r', with the permutations g(\gamma_\ell) generating the quotient group. \Box

Weisner’s counting argument

In On the Sylow subgroups of the symmetric and alternating groups,
Amer. J. Math. 47 (1925), 121–124, Louis Weisner gives a counting argument that, specialized to P_r, claims to show that P_r has exactly (p^r)!/p^e (p-1)^e Sylow p-subgroups, where, as above e = 1 + p + \cdots + p^{r-1}. The exponent of p is correct, but by Corollary 4, the exponent of p-1 is too large. Weisner’s error appears to be on page 123, where he implicitly assumes that Sylow subgroups of S_{p^r} with the same base group are equal. This is false whenever p \not= 2 and r \ge 2: for example

\begin{aligned} \langle (036), (147), (258) \rangle \rtimes \langle (012)(345)(678) \rangle, \\ \langle (063), (147), (258) \rangle \rtimes \langle (012)(378)(645) \rangle \end{aligned}

are distinct.

An alternative reference for the correct result is La structure des p-groupes de Sylow des groupes symétriques finis, Léo Kaloujnine, Ann. Sci. École Norm. Sup. (3) 65 (1948), 239–276: see page 262. In the footnote, Kaloujnine notes that, in general, the action of N_{S_{p^r}}(P_r) on P_r does not induce the full automorphism group. Indeed, when p=2 and r=2, the base group \langle (02), (13)\rangle of the wreath product

C_2 \wr C_2 = \langle (02), (13), (01)(23) \rangle

is not even a characteristic subgroup of C_2 \wr C_2. For more on this see On the structure of standard wreath products of groups, Peter M. Neumann, Math. Z. 84 1964, 343–373.

The mistake in Weisner’s paper is pointed out on page 73 of an M.Sc. thesis by Sandra Covello.

Final remarks

Lower central series

The lower central series of P_r was found by Weir in The Sylow subgroups of the symmetric groups, Proc. Amer. Math. Soc. 6 (1955) 534–541. Another description is given in the paper of Kaloujnine. There has been much further related work: see for example Lie algebra associated with the group of finitary automorphisms of p-adic tree, N. V. Bondarenko, C. K. Gupta V. I. Sushchansky, J. Algebra 324 (2010), 2198–2218 and the references to earlier papers therein. Using Lemma 6, one gets the description in Proposition 8 below. (This is almost certainly equivalent to a result in Kaloujnine’s paper stated using his formalism of tableaux polynomials, but the close connection with cyclic codes seems worth noting.)

Definition 7. Let P_r^m be the mth term in the lower central series of P_r, so P_r^1 = P_r' and P_r^m = [P_r, P_r^{m-1}] for m \ge 2. Let B(\ell,m) = P_r^m \cap B(\ell) for 0 \le s < r.

Let V = \mathbb{F}_p^{p^\ell}. Index coordinates of vectors in V by the set \{0,1,\ldots,p^{\ell}-1\} labelling the vertices in T_r(\ell). Given v \in V, we may define a corresponding element of B(\ell) by \prod_{\alpha=0}^{p^\ell-1} g(\alpha)^{v_\alpha}. This sets up a bijective correspondence between subgroups of B(\ell) and subspaces of \mathbb{F}_p^{p^\ell}. In turn, a subspace of \mathbb{F}_2^{p^\ell} invariant under the shift map sending (v_0,\ldots,v_{p^\ell-2},v_{p^\ell-1}) to (v_{p^\ell-1},v_0, \ldots, v_{p^\ell-2}) corresponds to an ideal in \mathbb{F}_p[x]/(x^{p^\ell}-1). (This is a basic observation in the theory of cyclic error-correcting codes.)

Proposition 8. We have

P_r^m = B(r-1,m)B(r-2,m) \ldots B(0,m)

where B(\ell, m) corresponds to the ideal in \mathbb{F}_p[x]/(x^{p^\ell}-1) generated by (x-1)^m.

Proof. Under the correspondence above, multiplication by x corresponds to conjugation by (0,1,\ldots,p^{\ell}-1) \in P_\ell. Each B(\ell,m) is invariant under conjugation by P_r, so we see that the subspaces of V corresponding to the B(\ell,m) are all x-invariant. Hence they are ideals in \mathbb{F}_p[x]/(x^{p^\ell}-1). It is easily seen by induction that (x-1)^m is in the ideal corresponding to B(\ell,m). Since the lower central series is strictly decreasing, and (x-1)^m generates an ideal of codimension m, the proposition follows. \Box

Corollary 9. Let M be the natural permutation module for P_r, defined over a field of characteristic p. Then the socle series for M and the radical series for M coincide; the mth term (from the top) in either is the image of (t-1)^m where t = (0,1,\ldots, p^r). \Box

Thus the radical and socle series of M considered just as an \mathbb{F}_p\langle t \rangle-module coincide with the radical and socle series of M as an \mathbb{F}_pP_r-module. This is not really unexpected: the trivial module is the unique simple \mathbb{F}\langle t \rangle-module, and M is uniserial even when considered as an \mathbb{F}_p\langle t \rangle-module. Still, it seems non-obvious without this injection of theory that \mathrm{im}(t-1)^m = \mathrm{ker}(t-1)^{p^r-m} is P_r-invariant.

It is a nice exercise to find the decomposition of the analogously defined permutation module over \mathbb{C}: it has exactly p-1 summands of dimension p^\ell for each \ell \in \{1,\ldots, r-1\} and p 1-dimensional summands, affording the characters of P_r / B(r-1)\ldots B(1) \cong C_p. The correspondence after Definition 7 can also be used to prove this conjecture.

Proposition 10. There are (p-1)^r conjugacy classes of p^r-cycles in P_r. All p^r-cycles in N_{S_{p^r}}(P_r) are conjugate.

Sketch proof. We take the special case r=2. Let g \in P_2 \cong C_p \wr C_p. In the notation of Chapter 4 of The representation theory of the symmetric group, Encyclopedia of Mathematics and its Applications, vol. 16, Addison-Wesley 1981), we have

g = (h^{a_0},\ldots,h^{a_{p-1}}; t )

where h generates a copy of C_p and t is a non-identity power of (0,1,\ldots, p-1). For simplicity, assume that t = (0,1,\ldots,p-1). The base group part of g corresponds to (a_0,\ldots,a_{p-1}) \in \mathbb{F}_p^p. Let c = a_0 + \cdots + a_{p-1}. The identity

x^k = k^{-1}xk = k^{-1}xkx^{-1} h = k^{-1}k^{x^{-1}} h

implies that

\begin{aligned} (h^{a_0}&,\ldots,h^{a_{p-1}}; t)^{(h,1,\ldots,1; 1)} \\ &= (h^{-1},1,\ldots,1)(h,1,\ldots,1)^{t^{-1}}(h^{a_0},\ldots,h^{a_{p-1}};t) \\        &= (h^{-1},1,\ldots,1)(1,h,\ldots,1)(h^{a_0},\ldots,h^{a_{p-1}}; t) \\        &= (h^{a_0-1},h^{a_1+1},\ldots,h^{a_{p-1}};t). \end{aligned}
The new base group part corresponds to (a_0-1,a_1+1,\ldots,a_{p-1}). Similarly by conjugating by (1,h,\ldots,1), and so on, we can shift any two adjacent positions as above. It follows that the conjugacy class of g contains all elements (h^{b_0},\ldots,h^{b_{p-1}}; t) such that b_0 + \cdots + b_{p-1} = c. By the general theory in the reference above, the conjugacy class is no larger. Extending the action to the normalizing group, Proposition 3 implies that any non-zero sum can be attained. \Box

Automorphism groups of cyclic codes

The previous section implies that the automorphism group of a p-ary cyclic code of length p^r contains a Sylow p-subgroup of S_{p^r}. In fact we get more: provided we label positions in the sensible way, so that the cyclic shift is (0,1,\ldots,p^r-1), the automorphism group contains our distinguished Sylow subgroup S_{p^r}. The results on N_{S_{p^r}}(P_r) above can be used to give a more powerful version of a result of Brand on equivalences of codes.

Proposition 11. Cyclic codes C and C' of length p^r are equivalent if and only if they are equivalent by an element of \langle k_r(\ell) : 0 \le \ell < r-1 \rangle.

Proof. Let Cg= C'. We have seen that P_r \le \mathrm{Aut}(C) and P_r \le \mathrm{Aut}(C'). Hence C'g^{-1}P_rg = C' so P_r^g \le \mathrm{Aut}(C'). By Sylow’s Theorem, there exists h \in \mathrm{Aut}(C') such that P_r^{gh} = P_r. Let gh = k \in N_{S_{p^r}}(P_r). Clearly we have C' = C'h = Cgh = Ck. Now apply Proposition 3 to k. \Box

For comparison, Brand’s result (see Lemma 3.1 in Polynomial isomorphisms of combinatorial objects, Graphs and Combinatorics 7 7–14) says that, for codes of arbitrary length, if P is a Sylow p-subgroup of C, then the equivalence may be assumed to conjugate (0,1,\ldots,p^r-1) to another permutation in P.

Application to Sylow’s Theorem

Let G be a finite group of order n. If G acts transitively on a set of size coprime to p then any point stabiliser contains a Sylow p-subgroup of G. The existence of Sylow subgroups of S_n gives a very convenient setting to exploit this observation. By letting G act regularly on itself, we may regard G as a subgroup of S_n. Fix a Sylow p-subgroup P of S_n. The coset space S_n/P has size coprime to p. Let Pg be in an orbit of G of size coprime to p and let H \le G be the stabiliser of Pg. Now PgH = Pg implies that gHg^{-1} is a subgroup of P, so gHg^{-1} is a p-subgroup of G of order divisible by the highest power of p dividing |G|.

Intransitivity of derived group

By Corollary 6, any permutation in P_r' permutes amongst themselves the odd and even numbers in \{0,1,\ldots,p^r-1\}. Hence P_r' is not transitive. More generally, let Q \le S_n be a transitive p-group. By embedding Q in a Sylow p-subgroup P of S_n, we see that Q' \le P' is not transitive. Of course this can be proved in many other ways: for example, if Q' is transitive then Q = RQ' where R is a point stabiliser. But Q' \le \Phi(Q), where \Phi(G) is the Frattini subgroup of `non-generators’ of Q, so we have Q = R, a contradiction.

Automorphisms of permutation groups

Let G \le S_n be a transitive permutation group. It is natural to ask when every automorphism of G is induced by the action of N_{S_n}(G). For example, this is the case whenever G acts regularly.

Magma code

MAGMA code constructing all the permutations and groups defined in this post is available here.

Algebraic numbers and character tables

November 12, 2016

Let \zeta be a primitive pth root of unity in \mathbb{C}. Let a be a primitive root modulo p, so the field automorphism \sigma \in \mathrm{Gal}(\mathbb{Q}(\zeta):\mathbb{Q}) defined by \zeta^\sigma = \zeta^a generates the Galois group. Fix t dividing p-1, let b = a^t, and let \alpha = \zeta + \zeta^b + \cdots + \alpha^{b^{s-1}} where st = p-1. Thus \mathbb{Q}(\alpha) is the fixed field of \tau = \sigma^{t}, the unique element of the Galois group of order s. We have [\mathbb{Q}(\zeta) : \mathbb{Q}(\alpha)] = t and [\mathbb{Q}(\alpha) : \mathbb{Q}] = s.

A few days ago, while reading Burnside’s proof (in Section 252 of his wonderful book, Theory of groups of finite order) that a permutation group of prime degree p is either 2-transitive or a Frobenius group contained in an affine general linear group \mathrm{AGL}_2(\mathbb{F}_p), I needed the identity

\alpha \alpha^{-\sigma^j} + \alpha^{\sigma} \alpha^{-\sigma^{j+1}} + \cdots + \alpha^{\sigma^{t-1}}\alpha^{-\sigma^{j+(t-1)}} = -s

where 0 < j < t. One rather sneaky proof is to look at the intended conclusion of Burnside's argument, namely that G is Frobenius of order ps, and work back. I've put in details of the construction of the character table below, but taking this as read, the proof is essentially one line.

Character tables of affine Frobenius groups

Let G = \langle g, h | g^p = h^s = 1, g^h = g^b \rangle. Setting N = \langle g \rangle, we have G = N \rtimes \langle h \rangle. In the action of G on the cosets of \langle h\rangle, the non-identity elements of N act as p-cycles, and all elements of G \backslash N have a unique fixed point. Therefore G is a Frobenius group. The conjugacy class of g is \{g, g^h, \ldots, g^{h^{s-1}}\} = \{g,g^b, \ldots, g^{b^{s-1}} \}, and the non-identity conjugacy classes have representatives

g, g^a, \ldots, g^{a^{t-1}}, h, h^2, \ldots, h^{s-1}

and are of sizes s, s, \ldots, s, p, p, \ldots, p respectively.

There are s linear characters of G inflated from G / \langle g \rangle \cong C_{s-1}. Let \phi : N \rightarrow \mathbb{C} be defined by \phi(g) = \zeta. Let \theta = \phi\uparrow^G. Then

\begin{aligned} \theta(g) &= \phi(g) + \phi(g^h) + \cdots + \phi(g^{h^{s-1}}) \\&= \phi(g) + \phi(g^b) + \cdots + \phi(g^{b^{s-1}}) \\ &= \zeta + \zeta^b + \cdots + \zeta^{b^{s-1}} \\ &= \alpha, \end{aligned}

and more generally, \theta(g^{a^j}) = \alpha^{\sigma^j} for all j. It follows that \theta, \theta^\sigma, \ldots, \theta^{\sigma^{s-1}} are s distinct characters of G. By Frobenius reciprocity

\begin{aligned} \langle \theta, \theta \rangle &= \langle \theta\downarrow_N, \phi \rangle \\ &= \langle \phi, \phi + \phi^{\tau} + \cdots + \phi^{\tau^{s-1}} \rangle \\ &= 1. \end{aligned}

The sum of the squares of the degrees of the characters constructed so far is 1^2 \times s + s^2 \times t = s + s^2 t = s (1+st) = sp, so we have found all irreducible characters.


Let 0 < j < t. From

|G| \langle \theta, \theta^{\sigma^j} \rangle = s^2 + s(\alpha \alpha^{-\sigma^j} + \alpha^{\sigma} \alpha^{-\sigma^{j+1}} + \cdots + \alpha^{\sigma^{t-1}} \alpha^{-\sigma^{j+(t-1)}})

we get

\alpha \alpha^{-\sigma^j} + \alpha^{\sigma} \alpha^{-\sigma^{j+1}} + \cdots + \alpha^{\sigma^{t-1}}\alpha^{-\sigma^{j+(t-1)}} = \frac{0-s^2}{s} = -s

for j\not= 0. If j=0, the left-hand side is |G| and we get instead p-s.

Two related well known tricks

In Wielandt’s well-known proof of Sylow’s Theorem, one needs to know that if m is coprime to p then \binom{p^a m}{p^a} \not\equiv 0 mod p. This follows immediately from Lucas’ Theorem, or one can follow Wielandt and use a short direct argument, but it is in the spirit of his proof to fix a set \Omega of size p^a m and let g \in \mathrm{Sym}(\Omega) have m disjoint p^a cycles. Then \langle g \rangle clearly has m fixed points, in its action on the set of subsets of \Omega of size p^a and all other orbits have size divisible by p. So \binom{p^a m}{p^a} \equiv m modulo p.

Another nice argument, applying linear algebra to algebraic number theory, shows that if \alpha and \beta are algebraic numbers then \alpha \beta is also algebraic. Indeed, \alpha \beta is an eigenvalue of X_\alpha \otimes X_\beta where X_\alpha and X_\beta are irreducible integer matrices having \alpha and \beta as eigenvalues. (For example, take the companion matrices of the minimal polynomials.) This trick also shows that \alpha \beta^{-1} is an eigenvalue of an integer matrix, Y say, and so 1 + \alpha\beta^{-1} is an eigenvalue of I + Y. Therefore 1 + \alpha\beta^{-1} is algebraic, and multiplying through by \beta (using the result just proved), we get that \alpha + \beta is algebraic.

Putnam 2016

November 4, 2016

Today I picked up the October issue of the American Mathematical Monthly. As traditional, the first article has the questions in this year’s William Lowell Putnam Competition. As displacement activity, I did Paper A, allowing myself two of the usual three hours, but, as is my normal habit, full use of computer algebra. For the entertainment of others (and possibly as a warning to myself), here is a record of the thought processes of an averagely procrastinating mathematician.

A1. Let A and B be points on the same branch of the hyperbola xy=1. Suppose that P is a point lying between A and B on this hyperbola, such that the area of the triangle APB is as large as possible. Show that the region bounded by the hyperbola and the chord AP has the same area as the region bounded by the hyperbola and chord PB.

Thoughts / solution. This looks like a routine optimization problem. Put A at (a,1/a), B at (b,1/b) and P at (p,1/p), with a > 0. The first region is formed from the quadrilateral with vertices (a,0),(p,0),(p,1/p),(a,1/p), missing out the area below the hyperbola from a to p. So its area is

\displaystyle \frac{(p-a)(1/a+1/p)}{2} - \log \frac{p}{a}.

Similarly the second region has area

\displaystyle \frac{(b-p)(1/p+1/b)}{2} - \log \frac{b}{p}.

I can see that p is eliminated when we add the logs. This looks hopeful. I add the areas and promptly make an algebraic slip that says the sum is independent of p. This can’t be right. Turning to Mathematica I get the correct sum, and by differentiating, see that it is minimized when p = \sqrt{ab}. This maximizes the area of the triangle, and sure enough, the expressions above for the areas under the chords turn out to be equal when p = \sqrt{ab}.

(In retrospect, this had to be the right p, because by the moral linear independence of arbitrary transcendentals, there’s no way the expressions will be equal without the logs agreeing.)

A2. Let a_0 = 1, a_1 = 2, and a_n = 4a_{n-1} - a_{n-2} for n \ge 2. Find an odd prime factor of a_{2015}.

Thoughts / solution. I immediately load the problem into Mathematica and compute the factorizations of a_1, \ldots, a_{100}. It looks like 181 is always a factor of a_{5m} when m is odd. Since 2015 = 5 \times 13 \times 31, this would do it. After some dithering with 2 \times 2 matrices versus direct manipulation of the recurrence, I use Mathematica with

b[0] := b0; b[1] := b1; b[n_] := 4b[n-1] - b[n-2]

to find b_{10} = -40545 b_0 + 151316 b_1. It follows that

b_{n+10} = -40545b_{n} + 151316b_{n+1}

for all n. Since 151316 = 181 \times 836, the claim follows by induction. Thinking that this wouldn’t have been much fun to do without computer algebra, I press on.

A3. Compute

\displaystyle \log_2 \Bigl( \prod_{a=1}^{2015} \prod_{b=1}^{2015} (1 + \mathrm{e}^{2\pi i ab/2015}) \Bigr).

Here i is the imaginary unit (that is, i^2 = -1).

Thoughts / solution. My immediate thought is ‘who is going to know the definition of the complex exponential function, but get confused by i‘? Next I put the product into Mathematica, replacing 2015 with a general N. After some confusing errors, I realize that N (even used in pattern matching as N_) is a bad choice, because it clashes with the built in numerical evaluation function. This hitch resolved, the small values suggest that the product is 0 when N (or rather, a more Mathematica-friendly letter) is even, and a power of 2 when N is odd. A trivial change in the code lets me compute

\displaystyle Q(a) = \prod_{b=1}^{N} (1 + \mathrm{e}^{2\pi i ab/N}).

Of course its still a power of 2, but the exponent is still not obvious. Fairly quickly I realise I’ve encountered the product

(1+\zeta)(1+\zeta^2) \ldots (1+\zeta^{M}),

where \zeta is a primitive Mth root of unity, before: the standard (and I feel slightly tired) trick, is to note that, up to a sign, it’s obtained by evaluating X^M-1=(X-1)(X-\zeta)\ldots (X-\zeta^{M-1}) at X=-1. So (1+\zeta)(1+\zeta^2) \ldots (1+\zeta^{M}) = 2 and, after some stupid slips and general lack of progress, I realise that

Q(a) = 2^{\gcd(N,a)}.

This reduces the problem to evaluating \sum_{a=1}^N \gcd(N,a). Turning to a small example, I see how this works for N = 15: we have \phi(15) = 8 coprime numbers with gcd 1, then 3,6,9,12 each contribute 3, and 5,10 each contribute 5. I go off on a tangent trying to get a general formula with the Principle of Inclusion and Exclusion. Nothing seems very nice, and, noting that the sum is over all a, not those a satisfying some ‘sievable’ property, it starts to feel like the wrong approach. Going back to the example, I realise that there are exactly \phi(N/d) numbers with gcd d. So we need

P(N) = \prod_{c \mid n} 2^{\phi(c) N/c}.

It looks like P might be multiplicative. Yes it is. I evaluate P(p^a), then realise I’m wasting time, since I should remember from A2 that 2015 = 5 \times 13 \times 31 is square-free. Clearly

P(p) = 2^p \times 2^{p-1} = 2^{2p-1}..

So P(2015) = 2^{9 \times 25 \times 61}.

A4. For each real number x, let

\displaystyle f(x) = \sum_{n \in S_x} \frac{1}{2^n}

where S_x is the set of positive integers n for which \lfloor nx \rfloor is even. What is the largest real number L such that f(x) \ge L for all x \in [0,1)? (As usual \lfloor z \rfloor denotes the greatest integer less than or equal to z.)

Thoughts / solution. I’m getting a bit tired now, but this looks like an interesting problem. Once again, I load it into Mathematica. This takes a bit more effort, but after a few minutes I’ve got the code and graph shown below.

It looks like the minimum is attained at 2/3. I evaluate N[f[1000, 2/3 - 0.0001], 20] and get 0.57142857142857142857. I should recognise this immediately as 4/7, but instead put it into Google.

I’m now quite keen to solve this problem. I look at some small cases by hand, seeing that if x \in [0,1/2) we get an immediate contribution of 2^{-2}, so x \ge 1/2 is indicated. Then x \in [1/3,2/3) makes sure that we don’t get 2^{-3}, so it looks like x \le 2/3. But then we have x \in [1/2,3/4), so 4 \in S_x and we do get 2^{-4}. Maybe this is unavoidable, … I have my first ‘aha’ moment of the competition, when I realise that this greedy approach is going to solve the problem: since 1/2^n is equal to 1/2^{n+1} + 1/2^{n+2} + \cdots , nothing is more important at the nth step than avoiding n \in S_x, if we can. I celebrate by taking 10 minutes off to play some blitz chess.

Back at the problem, I see that I have to rule out an interval of the form [2r/n, (2r+1)/n) containing 2/3, such that the lower bound from the greedy strategy after n-1 steps is less than 2r/n. Then we would `dodge’ left of this interval, and lose control of x. I try to make this work for at least 10 minutes, not getting anywhere much. Maybe only prime values of n are relevant? I get Mathematica to show me the relevant intervals with the code below.

This shows that x \ge 5/8 is a lower bound from the greedy strategy not coming from prime n. But more importantly, I finally see the pattern: starting at n=3m-1, we have

\frac{2}{3} \in [(2m-1)/(3m-1), 2m/(3m-1));

since 2m-1 is odd, the greedy strategy tells us that x \ge (2m-1)/(3m-1). For the next m we have

\frac{2}{3}  \in [(2m-1)/3m, (2m+1)/3m,]

so x \ge (2m-1)/3m, but this is already implied. Next

\frac{2}{3}  \in [2m/(3m+1), (2m+1)/(3m+2)),

and since 2m/(3m+1) \le (2m-1)/(3m-1), there is no way to avoid 3m+1 \in S_x. So taking x arbitrarily close to 2/3 we can get f(x) arbitrarily close to

\frac{1}{2^1} + \frac{1}{2^4} + \frac{1}{2^7} + \cdots   = \frac{1}{2} \times \frac{1}{1-1/8} = \frac{4}{7}.

A5. Let q be an odd positive integer, and let N_q denote the number of integers a such that 0 < a < q/4 and \mathrm{gcd}(a,q) = 1. Show that N_q is odd if and only if q is of the form p^k with k a positive integer and p a prime congruent to 5 or 7 modulo 8.

Thoughts. My immediate thought: this is too much, and I’m wasting time doing this anyway. Maybe it is something to do with arithmetic sums (which I know I don’t know well). I decide to skip it.

A6. Let n be a positive integer. Suppose that A, B, and M are n \times n matrices with real entries such that AM = MB and such that A and B have the same characteristic polynomial. Prove that \mathrm{det}(A-MX) = \mathrm{det}(B-XM) for every n \times n matrix X with real entries.

Thoughts / sketched solution. Surely as a professional user of linear algebra, this can’t be hard for me? Reluctantly I commit to solving it. Immediately I see that if M is invertible then we’re done, since then M^{-1}AM = B and so

\mathrm{det}(B-XM) = \mathrm{det}(M^{-1}A-X) \mathrm{det}(M)

and similarly

\mathrm{det}(A-MX) = \mathrm{det}(M)\mathrm{det}(M^{-1}A - X).

Thinking about the other extreme case, when M is 0, the AM = MB condition tells us nothing, but now all we need is \mathrm{det}(A) = \mathrm{det}(B), which follows from the assumption on the characteristic polynomials.

By Fitting’s Lemma we can write M as the direct sum of a nilpotent matrix and an invertible matrix. I mess around with block decompositions for a few minutes: it looks very fiddly. Throwing some more technology at the problem, I decide we can assume M is diagonalizable (by density passing to the complex numbers), and in fact diagonal (by conjugation). Then the block decompositions look more tractable: putting

M = \left( \begin{matrix} D & 0 \\ 0 & 0 \end{matrix} \right),  A = \left( \begin{matrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{matrix} \right),  B = \left( \begin{matrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{matrix} \right)

we get

AM = \left( \begin{matrix} A_{11} D & 0 \\ A_{21} D & 0 \end{matrix} \right)  = \left( \begin{matrix} D B_{11} & DB_{12} \\ 0 & 0 \end{matrix} \right) = MB

so A_{21} = 0, B_{12} = 0 and A_{11}D = DB_{11} with D invertible. So we need

\left( \begin{matrix} A_{11} - DX_{11} & A_{12} - DX_{12} \\ 0 & A_{22} \end{matrix} \right),  \left( \begin{matrix} B_{11} - X_{11}D & 0 \\ B_{21} - X_{21} D & B_{22} \end{matrix} \right)

to have the same determinant. The determinants factorize: the contributions from the top left are equal by the invertible case. The characteristic polynomials also factorize, so, as in the zero case, \mathrm{det} A_{22} = \mathrm{det} B_{22}.

Final thoughts. All this took about two hours, with some breaks for chess / tea-making. I don’t feel inclined to go back to A5, so call it a day. I enjoyed doing A4, but otherwise it felt like a bit of a slog. Maybe I am too quick to reach for my trusty computer algebra system? Or maybe my experience also shows the increasing artificiality of time-limited, closed-book, no-internet, no-collaboration, no-technological-assistance tests such as the Putnam?