Differences of permutation characters of symmetric groups

July 13, 2016

Here are a few small observations on permutation characters motivated by discussions at the Herstmonceux conference Algebraic combinatorics and group actions. Let \chi^\lambda denote the irreducible character of the symmetric group S_n labelled by the partition \lambda of n. Let \pi^\lambda denote the permutation character of S_n acting on the cosets of the Young subgroup S_\lambda.

  • Since \pi^\lambda = \chi^\lambda + \phi, where \phi is a non-negative integral linear combination of characters \chi^\mu for partitions \mu strictly dominating \lambda, inverting this unitriangular system shows that any irreducible character of S_n is an integral linear combination of the \pi^\lambda.

  • This can be made more explicit as follows. Let \delta = (n-1,\ldots,1,0). For \sigma \in S_n, let \lambda \cdot \sigma = (\lambda + \delta)\sigma - \delta, where \sigma acts on \lambda + \delta by place permutation; interpret \pi^\mu as zero if \mu has a negative part. The Jacobi–Trudi formula states that

    \chi^\lambda = \sum_{\sigma \in S_n} \pi^{\lambda \cdot \sigma} \mathrm{sgn}(\sigma).

  • It is a special property of symmetric groups that any character is a difference of permutation characters. For example, the cyclic group C_n has \mathrm{d}(n) (the number of divisors of n) distinct transitive permutation characters, but n irreducible characters, so unless n \le 2, the additive group generated by the permutation characters is a proper subgroup of the integral character ring. And clearly no irreducible character of a finite group taking a non-rational value can be an integral linear combination of permutation characters.

  • Since a transitive permutation character contains the trivial character exactly once, the trivial character is not a difference of two transitive permutation characters. A further example is given below.

Claim. \chi^{(n-1,1)} + \chi^{(1^n)} is not a difference of two transitive permutation characters of S_n.

Proof. Let \pi_K denote the permutation character of S_n acting on the cosets of the subgroup K of S_n. Note that the sign character, \chi^{(1^n)}, appears in \pi_K if and only if K \le A_n.

Let s(K) be the number of orbits of K on \{1,\ldots, n\}. Since

\chi^{(n-1,1)} = \pi^{(n-1,1)} - 1_{S_n},

it follows from Frobenius reciprocity and Mackey’s Formula that \langle \pi_K, \chi^{(n-1,1)} \rangle = s(K)-1. Consider \chi^{(2,1^{n-2})}. Multiplying the displayed equation above by \mathrm{sgn}, we get

\chi^{(2,1^{n-2})} = \mathrm{sgn}_{S_{n-1}}\uparrow^{S_n} - \mathrm{sgn}_{S_n}.

By Frobenius reciprocity

\langle \pi_K, \mathrm{sgn}_{S_{n-1}}\uparrow^{S_n} \rangle = \langle \pi_K \downarrow_{S_{n-1}}, \mathrm{sgn}_{S_{n-1}} \rangle.

By Mackey’s Formula, applied to \pi_K = 1_K\uparrow^{S_n}, the right-hand side is

\sum \langle 1_{K^g \cap S_{n-1}}\uparrow^{S_{n-1}}, \mathrm{sgn}_{S_{n-1}} \rangle,

where the sum is over distinct double cosets K g S_{n-1}. It follows that \langle \pi_K, \chi^{(2,1^{n-2})} \rangle \le s(K)-1, with equality when K \le A_n.

Suppose that H and G are subgroups of S_n such that \pi_H - \pi_G contains \chi^{(n-1,1)} and \chi^{(1^n)}. From \chi^{(1^n)} we see that H \le A_n and G \not\le A_n. From \chi^{(n-1,1)}, we see that s(G) = s(H) -1. By the previous paragraph,

\langle \pi_H, \chi^{(2,1^{n-2})} \rangle = s(H)-1

and

\langle \pi_G, \chi^{(2,1^{n-2})} \rangle \le s(G)-1 < s(H)-1.

Hence \chi^{(2,1^{n-2})} also appears in \pi_H -\pi_G. Therefore

\chi^{(n-1,1)} + \chi^{(1^n)}

is not a difference of two transitive permutation characters. \Box

  • Since \pi^{(n-r,r)} - \pi^{(n-r+1,r-1)} = \chi^{(n-r,r)} for 1 \le r \le n/2 (this is a special case of the Jacobi–Trudi formula), each \chi^{(n-r,r)} is a difference of two transitive permutation characters. Moreover, \chi^{(1^n)} = \pi_{A_n} - 1_{S_n}. An exhaustive search over the subgroups of S_n by computer algebra supports the conjecture that, provided n \ge 9, these are the only irreducible characters of S_n that are a difference of two transitive permutation characters.
  • The behaviour of S_6 is interesting: all irreducible characters except for \chi^{(3,2,1)} are the difference of two transitive permutation characters. This is explained in part by the outer automorphism that exists in this case.
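The r = 1 case, \pi^{(n-1,1)} - \pi^{(n)} = \chi^{(n-1,1)}, is easy to check numerically: \pi^{(n-1,1)}(g) is the number of fixed points of g, so g \mapsto \mathrm{fix}(g) - 1 should be an irreducible character. A minimal brute-force sketch (my own, not from the post) in Python for n = 4:

```python
from itertools import permutations

n = 4
group = list(permutations(range(n)))  # S_4 as tuples; group[0] is the identity

def fix(g):
    """Number of fixed points of the permutation g."""
    return sum(g[i] == i for i in range(n))

# chi = pi^{(n-1,1)} - pi^{(n)} evaluated pointwise
chi = [fix(g) - 1 for g in group]

# <chi, chi> = 1 certifies that chi is plus or minus an irreducible character;
# chi(identity) = 3 > 0 rules out the minus sign.
norm = sum(c * c for c in chi) / len(group)
```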

What price Brexit?

June 18, 2016

Next Thursday it looks like I may well have to abandon at least one of three somewhat-cherished beliefs: (1) democracy basically works; (2) leaving the EU will be terrible, economically, socially and culturally, for the UK; (3) my own beliefs on this matter have no gaping inconsistencies. Still, at least the campaign will be over. I thought nothing could be more painful than the pro-Brexit leaflet I was sent last week, including a truly-beyond-parody map of Europe, suitably enlarged to include Iraq and Syria, coloured in a threatening shade of blood, until I saw Farage’s latest poster (no link).

At the time of writing (Saturday 18 June), the FT poll-of-polls has Remain on 43% and leave on 48%, with the remainder undecided. The best odds available from conventional bookmakers are 1:2 for remain and 7:4 for leave, corresponding to 66.7% and 36.4% (with a 3% profit margin). The Betfair prices are 1.51 for remain and 2.94 for leave, corresponding to 66.2% and 34.0%.

Here is a highly inexpert attempt to work out the implied odds from option prices. The GBP/USD exchange rate closed on Friday at 1.4358, having bounced back slightly from a low of 1.4114 on Tuesday. According to the minutes of the MPC meeting on Thursday, if the UK votes to leave the EU then ‘Sterling is also likely to depreciate further, perhaps sharply’. Maybe there is a code, known to experts, that translates phrases such as ‘depreciate further, perhaps sharply’, ‘drop significantly’, ‘cause the end of civilisation as we know it’ into approximate percentages. For this post, I’ll assume that a leave vote means the rate is 1.44(1-e), and a remain vote means the rate is at least 1.44, at any time when this matters. Let p be the probability of a vote to leave. From now on, all prices are in dollars.

The most obvious way to hedge Brexit is to buy a put option with strike price 1.44, expiring after the referendum. According to these prices, a put option with strike 1.44 expiring in July costs 0.0536. So for about 0.05, I (or rather, an investment bank) can purchase the right to sell 1 GBP for 1.44 at some date (I’m not sure when) in July. This option is worthless after a vote to remain. By assumption, it pays off

1.44 - 1.44(1-e) = 1.44e

after a vote to leave. No-arbitrage implies that 1.44 ep = 0.0536, giving p = 0.0372/e. Taking e = 0.1, corresponding to a post-Brexit rate of 1.296, gives p = 0.372.

A slightly more sophisticated hedge also sells a put option at a lower strike. By assumption, the rate does not go below 1.44(1-e), so selling a put option with strike below this level gives some free extra cash. (Of course this contradicts no-arbitrage, but never mind.) Assuming that e \le 0.1, the pay-off is maximised by selling a put option with strike 1.295, netting 0.0069. No-arbitrage now implies that

1.44 ep = 0.0536-0.0069 = 0.0467,

giving p = 0.0324/e. The new hedging strategy makes Brexit more profitable, so the implied odds go down: if e =0.1 then p=0.324.
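The arithmetic of both hedges can be packaged in a few lines (a sketch of the toy binary model above only; the premiums and strikes are the figures quoted in this post, not live market data):

```python
def implied_leave_probability(net_premium, e, strike=1.44):
    """Toy binary model: the 1.44 put pays strike * e on a leave vote and
    0 on a remain vote, so no-arbitrage gives net_premium = p * strike * e."""
    return net_premium / (strike * e)

p_simple = implied_leave_probability(0.0536, e=0.1)           # buy the 1.44 put
p_spread = implied_leave_probability(0.0536 - 0.0069, e=0.1)  # also sell the 1.295 put
```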

Going the other way, if any of this is to be believed, the betting markets predict a 10% drop in the pound on Brexit, and the opinion polls predict a 6% drop (0.061425 = 0.0324 \times (48 + 43)/48).

Post Brexit update, 9th July. At least one somewhat-cherished belief has duly been abandoned. The pound/dollar rate is now 1.2953, almost exactly 10% down.

Access structures and CNF versus DNF

June 9, 2016

Suppose there are n people, numbered from 1 up to n. Some subsets of these people are authorized to launch a nuclear strike. We suppose that the access structure is monotone: that is, if Y is an authorized set and Z contains Y, then Z is also authorized. Thus the set \Gamma of authorized sets is determined by its subset \Delta of minimal elements. For example, if n=4 and any set of at least three people is authorized, then \Delta = \bigl\{ \{1,2,3\}, \{1,2,4\}, \{1,3,4\}, \{2,3,4\} \bigr\} and \Gamma = \Delta \cup \bigl\{ \{1,2,3,4\} \bigr\}. If in addition, \{1,2\} is an authorized set then \Delta = \bigl\{ \{1,2\}, \{1,3,4\}, \{2,3,4\} \bigr\}.

Let x_i be true if Person i is present and false otherwise. Let S be the proposition ‘a nuclear strike can be launched’. Considering each minimal authorized set in turn, and assuming that any person present is happy to play their role in an epoch-defining nuclear strike, we see that

\displaystyle S \longleftrightarrow \bigvee_{X \in \Delta} \bigl( \bigwedge_{i \in X} x_i \bigr).

This expresses S in disjunctive normal form: a disjunction in which each term is a conjunction in the variables x_i and \neg x_i. In this example, corresponding to the monotonicity of the access structure, negated variables never appear.

It is an interesting, and maybe not completely obvious, fact that any proposition is logically equivalent to one in disjunctive normal form. Roughly, one proof goes as follows: consider all true/false assignments to the variables x_1, \ldots, x_n that make a given proposition P true. Then P is logically equivalent to ‘at least one of these assignments holds’, which is easily written in DNF.

For instance, in the first example above, the assignments that make S true are TTTT, FTTT, TFTT, TTFT, TTTF, and S is logically equivalent to

S' = Q \vee P_1 \vee P_2 \vee P_3 \vee P_4

where Q is x_1 \wedge x_2 \wedge x_3 \wedge x_4, P_1 is \neg x_1 \wedge x_2 \wedge x_3 \wedge x_4, P_2 is x_1 \wedge \neg x_2 \wedge x_3 \wedge x_4, and so on. A disjunction, like S' above, in which every clause is of the form y_1 \wedge y_2 \wedge \cdots \wedge y_n, where each y_i is either x_i or \neg x_i, is said to be in full disjunctive normal form. Since there are 2^n such clauses, and taking the disjunction over different subsets of them gives logically inequivalent propositions, there are exactly 2^{2^n} propositional formulae in the variables x_i, up to logical equivalence. (The empty disjunction is false by convention.)

The Dedekind number D_n is the number of such formulae not involving any \neg x_i. Thus D_n is the number of monotone boolean functions in n variables, or the number of monotone access structures with n people, or (by the minimal authorized subsets interpretation), the number of antichains in the poset of subsets of \{1,2,\ldots, n\}, ordered by inclusion. The OEIS link gives several further combinatorial interpretations. The sequence of Dedekind numbers, starting at n = 0, begins 2, 3, 6, 20, 168, 7581, 7828354. No exact formula is known (or likely ever to be known), but the antichain interpretation makes it plausible that

\log_2 \displaystyle D_n \sim \binom{n}{n/2} \sim \sqrt{\frac{2}{\pi}} \frac{2^{n}}{\sqrt{n}},

and this was first proved by Kleitman.
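For very small n the Dedekind numbers can be recovered by brute force over all 2^{2^n} boolean functions; a sketch (my own, feasible up to about n = 4):

```python
from itertools import product

def dedekind(n):
    """Count monotone boolean functions of n variables by brute force."""
    points = list(product([0, 1], repeat=n))
    index = {x: i for i, x in enumerate(points)}
    # index pairs (i, j) with points[i] <= points[j] pointwise
    leq = [(index[x], index[y]) for x in points for y in points
           if all(a <= b for a, b in zip(x, y))]
    # f ranges over all truth tables; keep those respecting the order
    return sum(all(f[i] <= f[j] for i, j in leq)
               for f in product([0, 1], repeat=2 ** n))
```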

Negating formulae in DNF, it follows from de Morgan’s laws that any proposition is equivalent to a conjunction of clauses, each of the form y_1 \vee y_2 \vee \cdots \vee y_n, where each y_i is either x_i or \neg x_i. (The empty conjunction is true by convention.) For a monotone boolean function, negating its positive DNF in this way gives a conjunctive normal form for the negation in which every x_i appears negated. The access structure interpretation makes an alternative conjunctive form apparent: for each non-authorized set Y \not\in \Gamma, we require that at least one person in its complement is present. Clearly it suffices to consider the maximal such Y. Thus

\displaystyle S \longleftrightarrow \bigwedge_{Y \not\in \Gamma} \bigl( \bigvee_{j \not\in Y} x_j \bigr).

Taking the special case where the minimal authorized sets are all r-subsets of the n people we get the appealing

\displaystyle \bigvee_{X \subseteq \{1,\ldots, n\} \atop |X| = r} \bigl( \bigwedge_{i \in X} x_i \bigr) \longleftrightarrow \bigwedge_{Y \subseteq \{1,\ldots, n\} \atop |Y| = n-r+1} \bigl( \bigvee_{j \in Y} x_j \bigr).

For a direct proof, just note that at least r people are present if and only if at most n-r people are absent, and so if and only if at least one person from every subset of size n-r+1 is present.
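The equivalence is also easy to verify exhaustively for small n (a quick check, not a proof):

```python
from itertools import combinations, product

def dnf(present, n, r):
    # at least one r-subset of the n people is fully present
    return any(set(X) <= present for X in combinations(range(n), r))

def cnf(present, n, r):
    # every (n - r + 1)-subset contains at least one present person
    return all(set(Y) & present for Y in combinations(range(n), n - r + 1))

agree = all(
    dnf(present, n, r) == cnf(present, n, r)
    for n in range(1, 6)
    for r in range(1, n + 1)
    for bits in product([False, True], repeat=n)
    for present in [{i for i in range(n) if bits[i]}]
)
```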

LMS Education Day 2016

May 24, 2016

The day started with Jane White and Paola Iannone on Assessment practices and learning mathematics. There was some discussion of the rationale for different forms of assessment, and then Paola reported on an interesting survey that asked students to rate different forms of assessment by (a) their personal preference; (b) their perceived ability to discriminate. For mathematics students there was some correlation. The traditional closed book exam was the most preferred option, and also regarded as having the greatest discriminatory power.

From interviews with students several other themes emerged: students dislike coursework, knowing how easy it is to plagiarise (it was quickly pointed out that plagiarising successfully is not so easy), and group assessment, perhaps because of the ‘free-rider’ problem. They value fairness highly, and were surprisingly open to unusual forms of assessment. Several suggested short oral exams would give them a fair chance to show what they could do. Paola reported that vivas are used instead of one problem sheet in a course at UEA. The workload is roughly equivalent, but shifts from Ph.D markers to the lecturer.

The next talk was Kevin Houston and Mike Robinson on The flipped classroom. Kevin talked about flipping a first term ‘Number Systems’ course at Leeds. I lectured a similar course at Royal Holloway for three years, with Kevin’s book as required additional reading, and most recently gave a ‘partially flipped’ follow-on course on Matrix Algebra, so naturally my ears pricked up. Before flipping, Kevin’s course had 10 weeks of 1 hour lectures, with a 1 hour weekly tutorial in groups of 10 to 13. After flipping, it has two 1 hour lectures, on Mondays and Fridays. On Monday he lectured in his usual style; on Friday students were expected to have worked through written notes in advance, and the hour was spent solving problems and reviewing the key ideas in proofs. (I’m not sure if Kevin fully adopted Eric Mazur’s peer instruction cycle.) He used Socrative to run quizzes in lectures, relying on students to own a smartphone: they all did. I’d be worried about students getting distracted, but there are clearly advantages over clickers: for instance, students can submit short text responses, so questions do not have to be purely multiple choice.

Students were encouraged to do the preliminary work by an assessed online quiz, available before each Friday lecture. Full exam credit was available for any result above 50%, with multiple attempts permitted — this seems a neat way to provide some additional motivation, while keeping an essentially formative character to the assessment. Exam results were ‘comparable to previous years on a slightly harder course’. Student engagement was significantly improved. Kevin commented on the high workload in preparing the new course for the first time, and the need for very accurate timing to get students to the right point on Monday evening.

Mike then talked about his experience flipping a second year course ‘Dynamical systems and Fourier analysis’. Before flipping, this course had a 1 hour lecture, and 1 hour practical workshop in a PC Lab. Post flipping, it had a single 2 hour class, held in a room with plentiful laptop computers. The preliminary work was to watch short online videos prepared by Mike. The advantage of online videos over lectures, that the viewer can pause at any moment, was appreciated by the students. They also observed, however, that there was no immediate opportunity to ask questions to the lecturer or their peers. Mike reported on a survey, addressed only to serial non-attenders at this course: a typical non-attender missed classes for reasons unrelated to the courses, but a few, who admitted not doing the preliminary work, felt their lack of preparation would be exposed. Another survey showed the new course was very popular with students, with many seeing only advantages to the new format.

I would like to know whether strong or weak students benefit more from the flipped classroom. My guess is that flipping can help weak but hard-working students to get somewhere, rather than nowhere, but is maybe of less benefit to stronger students. And under any system there will be a core of students who’ll fail no matter what we do. Kevin made the interesting point that many students adopt a pragmatic attitude and aim to do just enough in any given course. So it is unrealistic to expect huge improvements in exam performance, even from successful interventions.

All speakers were enthusiastic about their experience; both Kevin and Mike felt they ‘hadn’t flipped enough’. A recurring comment was that, post flipping, they had more meaningful interaction with the students. Paola commented that it was often surprising how much students turned out to know. I agree. Going the other way, I’m now only rarely surprised by the many essential things (definitions, for example) that pass them by, causing endless stumbling blocks. The flipped classroom may create — at least in the short term — more work, and is probably not suited to everyone, but it offers a way to tackle these problems head on.

The final talk was by Peter Giblin and Alice Rogers on The Teaching Excellence Framework, HE White Paper, and Quality Assurance. Peter gave an excellent summary of the white paper (he said slides would be available from the LMS website), highlighting the somewhat strange way in which the TEF is expected to be introduced, at the institutional level very soon (but with everyone who agrees to play guaranteed at least the minimum ‘Meets Expectation’ rating), and at the discipline level only by 2019/2020. Peter quoted one particularly discouraging requirement, buried in the additional technical documentation, that submissions should ‘avoid focusing on successful but highly localised practices’. The whole game seems to be a bureaucrat’s dream and an academic’s nightmare.

There was much uncertainty about the way departments would be assessed, and considerable debate about what, if any, benchmarks could fairly be used. On this note, the day ended.

Variance of an estimate for the second moment

May 4, 2016

Suppose there are u urns, numbered from 1 up to u, containing in total b balls, numbered from 1 up to b. Let p_i be the proportion of balls in urn i. Let s be the probability that if we pick two numbers in \{1,\ldots,b\} uniformly at random, then the balls with these numbers lie in the same urn. Clearly s = \sum_{i=1}^u p_i^2.

The honest way to estimate s follows the definition: repeatedly pick pairs of balls uniformly at random, and count the number of ‘hits’ where both balls of a pair lie in the same urn. For example, suppose that each urn contains equally many balls, so p_i = 1/u for each i. Then our estimate for s = 1/u after N samples is H/N, where the number of hits H has distribution \mathrm{Bin}(N,1/u). As expected, \mathbb{E}[H/N] = 1/u = s.

An alternative is to choose N ball numbers uniformly at random, and for each ball, note down its urn number. For each i \in \{1,\ldots, u\}, let B_i be the random variable counting the number of balls lying in urn i. We then estimate s by the statistic

S = \sum_{i=1}^u \bigl( \frac{B_i}{N} \bigr)^2.

By the well known formula

\mathrm{Var} X = \mathbb{E}[(X -\mathbb{E}X)^2] = \mathbb{E}[X^2] - \mathbb{E}[X]^2,

we get

\begin{aligned}  \mathbb{E}[S] &= \frac{1}{N^2} \Bigl( \mathbb{E}\bigl[ \sum_{i=1}^u (B_i-\mathbb{E}[B_i])^2\bigr] + \sum_{i=1}^u \mathbb{E}[B_i]^2 \Bigr) \\  &= \frac{1}{N^2} \sum_{i=1}^u \mathbb{E}[(B_i - \mathbb{E}[B_i])^2]  + s\\ &= \frac{1}{N^2} \sum_{i=1}^u \mathrm{Var} B_i + s.  \end{aligned}

Thus S is a biased estimator for s. Since B_i has distribution \mathrm{Bin}(N, p_i), we have \mathrm{Var}(B_i) = Np_i(1-p_i) and so

\mathbb{E}[S] = s + \frac{1}{N^2} \sum_{i=1}^u Np_i(1-p_i) = s + \frac{1}{N} - \frac{s}{N}.

Since s is small compared to 1, a good correction for the systematic error can be made by subtracting 1/N. All the same, it might well seem that sampling by pairs is the simplest and best approach. But this ignores the variability in our estimates.

Again consider the case where p_i = 1/u for each i. Then, as already seen, H has distribution \mathrm{Bin}(N,1/u) and so \mathrm{Var}[H/N] = \mathrm{Var}[H]/N^2 = N(1/u)(1-1/u)/N^2 \approx 1/Nu. The standard deviation in H/N is therefore 1/\sqrt{Nu}. In particular, to improve the accuracy of our estimate by a factor f, we must increase the number of samples by f^2.

For S, we extend the ‘orthogonality trick’ used earlier to express \mathbb{E}[S] in terms of the variances of the B_i, to get

\begin{aligned}  S &= \sum_{i=1}^u \frac{B_i^2}{N^2} \\  &= \sum_{i=1}^u \Bigl( \bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)^2  + \frac{2B_i}{Nu} - \frac{1}{u^2} \Bigr) \\  &= \sum_{i=1}^u \bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)^2 + \frac{2}{u} - \frac{1}{u} \\  &= \sum_{i=1}^u \bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)^2 + \frac{1}{u}  \end{aligned}
where the penultimate line uses \sum_{i=1}^u B_i = N.

By the Central Limit Theorem, \sqrt{uN}\bigl( \frac{B_i}{N} - \frac{1}{u} \bigr) is approximately normally distributed with mean 0 and variance uN \mathrm{Var}[B_i/N] \approx 1. While the B_i are not independent, it still seems reasonable to approximate

\hat{S} = uN \sum_{i=1}^u \bigl(  \frac{B_i}{N} - \frac{1}{u} \bigr)^2

as the sum of the squares of u standard normal variables, i.e. as the \chi^2 distribution with u degrees of freedom. This distribution has variance 2u, so we have \mathrm{Var}[\hat{S}] \approx 2u. Undoing the scaling we get

\mathrm{Var}[S] \approx \frac{2u}{u^2N^2} = \frac{2}{uN^2} .

The standard deviation in S is then \frac{1}{N} \sqrt{\frac{2}{u}}. Therefore increasing the number of samples by a factor of f reduces the standard deviation by the same factor. Moreover, each trial uses fewer ball samples: N rather than the 2N needed for N pairs.


Data from simulation suggest these approximations work well in practice. For instance, with u = 100 urns and N = 1000 samples, the mean of H/N over 1000 repetitions of the sampling experiment was 0.01004, with standard deviation 3.0417 \times 10^{-3}; the mean of S was 0.010990 \approx 1/u + 1/N, with standard deviation 1.4101 \times 10^{-4}. Increasing the number of samples to 10000 reduced the standard deviations for H/N and S to 1.0114 \times 10^{-3} and 1.4176 \times 10^{-5}, respectively, agreeing with the prediction above. For more skewed distributions of balls into urns the standard deviation of S is still far less than for H/N, but the improvement on increasing the number of samples is less dramatic.
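A sketch of the simulation (my own reconstruction of the experiment, with a fixed seed; the exact figures above came from different runs):

```python
import random
from statistics import mean, stdev

def simulate(u, N, reps, seed=0):
    """Estimate s = 1/u (equally-filled urns) by pair sampling (H/N)
    and by urn counting (S), over several repetitions."""
    rng = random.Random(seed)
    hs, ss = [], []
    for _ in range(reps):
        # pair sampling: a 'hit' is a pair of balls landing in the same urn
        hits = sum(rng.randrange(u) == rng.randrange(u) for _ in range(N))
        hs.append(hits / N)
        # urn counting: B_i = number of the N sampled balls lying in urn i
        counts = [0] * u
        for _ in range(N):
            counts[rng.randrange(u)] += 1
        ss.append(sum((b / N) ** 2 for b in counts))
    return hs, ss
```

With u = 100 and N = 1000 the mean of S should be close to 1/u + 1/N = 0.011, its standard deviation close to (1/N)\sqrt{2/u} \approx 1.4 \times 10^{-4}, and the standard deviation of H/N close to 1/\sqrt{Nu} \approx 3.2 \times 10^{-3}.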

Modular representations that look ordinary

March 23, 2016

Let G be a finite group and let (K, \mathcal{O}, k) be a p-modular system for G. Thus K is a field of characteristic zero sufficiently large that if V is any irreducible representation of G then \mathrm{End}_{KG}(V) \cong K and K contains all the eigenvalues of the matrices representing elements of G in their action on V, \mathcal{O} is an integrally closed subring of K containing all these eigenvalues, and k is a field of prime characteristic p obtained by quotienting \mathcal{O} by a maximal ideal \mathfrak{m}, necessarily lying above \langle p \rangle \unlhd \mathcal{O}. By localizing if necessary, we may assume that all elements of \mathcal{O} not in \mathfrak{m} are invertible. So \mathcal{O} is a local ring with unique maximal ideal \mathfrak{m}.

This is a standard setup for modular representation theory. The main purpose of this post is to prove Theorem 2 below, which gives a precise description of a family of representations that look substantially the same, whether defined over K, \mathcal{O} or k.

As a warm-up we prove the following result on ordinary characters.

Proposition 1. Let \chi be an irreducible character of G such that \chi(1) is divisible by the highest power of p dividing |G|. If g \in G is a p-element then \chi(g) = 0.

We need the following fact: the central primitive idempotent in KG corresponding to \chi is

e_\chi = \frac{\chi(1)}{|G|} \sum_{g \in G} \chi(g^{-1}) g.

Note that the coefficients of e_\chi are contained in \mathcal{O}: each value \chi(g^{-1}) is a sum of eigenvalues, so lies in \mathcal{O}, and for \chi as in Proposition 1 the rational number \chi(1)/|G| has denominator prime to p, so also lies in \mathcal{O}.

Proof of Proposition 1. Let P be a Sylow p-subgroup of G containing g. Since

(1) \qquad \mathcal{O}G =  e_\chi \mathcal{O}G \oplus (1-e_\chi) \mathcal{O}G,

considered as a right \mathcal{O}G-module, e_\chi \mathcal{O}G is projective. Hence the restriction of e_\chi \mathcal{O}G to \mathcal{O}P is also projective. Now \mathcal{O}P is indecomposable as a right \mathcal{O}P-module, since any direct sum decomposition induces a direct sum decomposition of the indecomposable kP-module kP by reduction modulo \mathfrak{m}. Hence the unique projective indecomposable \mathcal{O}P-module is \mathcal{O}P itself and e_\chi \mathcal{O}G\downarrow_P \cong \mathcal{O}P \oplus \cdots \oplus \mathcal{O}P. Restricting further we get

(2)\qquad e_\chi \mathcal{O}G \downarrow_{\langle g \rangle} \cong \mathcal{O} \langle g \rangle^{\oplus \chi(1)/\mathrm{ord}(g)}.

From the canonical basis for the right-hand side we see that the character value of g acting on e_\chi \mathcal{O}G is zero. But, by Wedderburn’s Theorem, e_\chi KG is isomorphic to the direct sum of \chi(1) copies of a simple KG-module with character \chi. Hence \chi(1)\chi(g) = 0 and so \chi(g) = 0.\qquad\Box

I find it quite striking that the modular theory gives a short proof of a result stated entirely in the language of ordinary (i.e. characteristic zero) representation theory. Moreover, from (2), we can easily get a stronger result: \chi(g) = 0 whenever g has order divisible by p.

Outline proof. Let g have order p^a c where c is not divisible by p, let g_p = g^c, and let g_{p'} = g^{p^a}. Take a direct sum decomposition of e_\chi KG into eigenspaces for the action of g_{p'}. By (2), g_p has trace zero in its action on each eigenspace, hence so does g.
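Both Proposition 1 and this stronger statement can be spot-checked against the character table of S_4, hard-coded below from the standard table (rows are labelled by partitions, columns by cycle types):

```python
# Conjugacy classes of S_4 by cycle type, recording element orders:
# classes 1^4, 2.1^2, 2^2, 3.1, 4
class_orders = [1, 2, 2, 3, 4]

# Character table of S_4
char_table = {
    "(4)":       [1,  1,  1,  1,  1],
    "(3,1)":     [3,  1, -1,  0, -1],
    "(2,2)":     [2,  0,  2, -1,  0],
    "(2,1,1)":   [3, -1, -1,  0,  1],
    "(1,1,1,1)": [1, -1,  1,  1, -1],
}

def p_part(m, p):
    """Largest power of p dividing m."""
    q = 1
    while m % (q * p) == 0:
        q *= p
    return q

# For each character whose degree is divisible by the full power of p in
# |S_4| = 24, its value must vanish on elements of order divisible by p.
violations = []
for p in (2, 3):
    for label, row in char_table.items():
        if row[0] % p_part(24, p) == 0:
            for order, value in zip(class_orders, row):
                if order % p == 0 and value != 0:
                    violations.append((p, label, order))
```

For p = 3 the check is non-vacuous: the two degree 3 characters must (and do) vanish on the 3-cycles.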

Proposition 1 is also a corollary of the following theorem. My first version of the proof did not make it entirely clear that e_\chi \mathcal{O}G is a direct sum of isomorphic \mathcal{O}G-modules, each simple as a KG-module. To repair the gap, I borrowed the proof of (1) \implies (2) on page 162 of this recent book by Peter Webb.

Theorem 2. Let \chi be an irreducible character of G such that \chi(1) is divisible by the highest power of p dividing |G|. There is a projective \mathcal{O}G-module M with ordinary character \chi such that M/\mathfrak{m}M is a simple projective kG-module.

Proof. Let M be an \mathcal{O}G-module such that M_K = M \otimes_\mathcal{O} K is a simple KG-module with character \chi. By Wedderburn’s Theorem, e_\chi KG is isomorphic to \mathrm{End}_K(M_K), thought of as the complete matrix algebra of \chi(1) \times \chi(1) matrices with coefficients in K. Let \rho : KG \rightarrow \mathrm{End}_K(M_K) be the corresponding ring homomorphism, with kernel (1-e_\chi)KG. The restriction of \rho to \mathcal{O}G has kernel (1-e_\chi)\mathcal{O}G. So it is sufficient to prove that the image of \rho : \mathcal{O}G \rightarrow \mathrm{End}_K(M_K) is \mathrm{End}_\mathcal{O}(M), thought of as the complete matrix ring of \chi(1) \times \chi(1) matrices with coefficients in \mathcal{O}. Since M is an \mathcal{O}G-module, the matrix coefficients of the image lie in \mathcal{O}.

The surjectivity of the restriction of \rho follows from the following ‘averaging’ formula, which expresses an arbitrary \phi \in \mathrm{End}_\mathcal{O}(M) as the image of an element of \mathcal{O}G; note that each coefficient below lies in \mathcal{O}, since \mathrm{Tr}_M\bigl(\rho(g^{-1})\phi\bigr) \in \mathcal{O} and \chi(1)/|G| \in \mathcal{O}, exactly because \chi(1) is divisible by the full power of p dividing |G|:

\phi = \frac{\chi(1)}{|G|} \sum_{g \in G} \mathrm{Tr}_M\bigl( \rho(g^{-1})  \phi\bigr) \rho(g).

Subproof. Both sides are K-linear in \phi, and the ring homomorphism \rho: KG \rightarrow \mathrm{End}_K(M_K) is surjective, so it is sufficient to prove this when \phi = \rho(x) with x \in G; the right-hand side is

\rho(x) \frac{\chi(1)}{|G|} \sum_{g\in G}  \chi(g^{-1} x) \rho(x^{-1} g) = \rho(x) \rho(e_\chi) = \rho(x),

as required.

We can now think of the right \mathcal{O}G-module M very concretely, as those d \times d matrices with coefficients in \mathcal{O} that are zero outside their first row, where d = \chi(1). Reducing mod \mathfrak{m}, we obtain M/\mathfrak{m}M; this is the unique simple module (up to isomorphism) for \mathrm{Mat}_{d \times d}(k). Hence M/\mathfrak{m}M is a simple projective kG-module, as claimed. \qquad\Box

Cheating one’s way through linear algebra

March 5, 2016

Here are a few arguments that are possibly useful to prove things when one hasn’t set up all the usual machinery of vector spaces, subspaces and dimensions, presented in decreasing order of how seriously they could be used in a first course on matrix algebra. Since it fits well with the syllabus for the first year course I’m lecturing this term, I’m going to assume reduced row-echelon form is available. Throughout, all matrices have coefficients in a field F.

Invertible if and only if injective by row operations

Given a matrix B in reduced row-echelon form it is almost trivial to write down all solutions to B x = 0. There are non-zero solutions if and only if not every column of B contains a pivot. For example, the solutions to

\left( \begin{matrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{matrix}\right) \left( \begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{matrix} \right) = \left( \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} \right)

are
\left\{ \left( \begin{matrix} -2x_2-3x_4 \\ x_2 \\ -4x_4 \\ x_4 \end{matrix} \right) : x_2, x_4 \in F \right \};

the pivot column variables are determined by the non-pivot variables, which may be chosen freely.

Let f : F^n \rightarrow F^n be a linear map represented by the matrix A. Obviously if f is invertible then f is injective. Conversely, suppose that f is injective and that A has reduced row-echelon form B, so B = QA for some invertible matrix Q. If B has a zero row then it has at most n-1 pivots, so by the remark above, B has a non-zero kernel; since Q is invertible, A then has a non-zero kernel too, contradicting the injectivity of f. Therefore B has n pivots and so is the identity matrix. Hence I = QA. Now this only shows that A has a left inverse, but since Q is a product of elementary matrices (corresponding to row operations), and each such elementary matrix is easily seen to be invertible (in the usual sense), Q is invertible and so A = Q^{-1} is invertible.

Very similarly one can show that f is invertible if and only if f is surjective.
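The argument translates directly into code; here is a minimal sketch over the rationals using exact arithmetic (my own implementation, not from the course):

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0  # next pivot row
    for c in range(n):
        piv = next((i for i in range(r, m) if A[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        A[r], A[piv] = A[piv], A[r]
        A[r] = [x / A[r][c] for x in A[r]]  # scale pivot to 1
        for i in range(m):
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

def invertible(rows):
    # invertible iff the reduced row-echelon form is the identity (n pivots)
    n = len(rows)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    return rref(rows) == identity
```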

Linearly dependent if too many vectors

If A is the (n+1) \times n matrix whose rows are the components of n+1 vectors in F^n (expressed in the standard basis) then the reduced row-echelon form of A has at most n pivot columns, and so the bottom row is zero. Correspondingly, some non-trivial F-linear combination of the rows of A is zero.

Invertible if and only if injective by disguised minimal polynomial and Primary Decomposition Theorem

The vector space of all linear endomorphisms of F^n is n^2-dimensional. By the previous result there is a non-trivial linear dependency between the powers I, A, \ldots, A^{n^2}. Let c_0 I + c_1 A +\cdots + c_d A^d = 0 be such a relation with d minimal. If c_0\not=0 then formally multiplying through by c_0^{-1}A^{-1} gives

-c_0^{-1}c_1 I - c_0^{-1}c_2 A - \cdots - c_0^{-1}c_dA^{d-1}

as a plausible (and clearly correct) guess for the inverse of A. If c_0 = 0 then A^r g(A) = 0 where r \in \mathbb{N} and g has degree d-r and non-zero constant term. By minimality of d, we have g(A) \not=0; if w = g(A)v \not=0 then A^r w = 0, so, choosing s \in \mathbb{N} least such that A^s w = 0, we have 0 \not= A^{s-1} w \in \ker A, and so \ker A \not= 0. So either f is invertible, or \ker f \not=0, and so f is not injective.

Row rank is column rank

William P. Wardlaw published a very short proof in Mathematics Magazine 78 (2005) 316–318. Let A be a non-zero n \times n matrix. Let r be minimal such that there exists an n \times r matrix P and an r \times n matrix Q such that PQ = A. By minimality the rows of Q are linearly independent, and clearly the row span of Q contains the row span of A. Hence r is the row rank of A. Dually, r is the column rank of A.

Linear dependencies by pigeonhole principle

Take n+1 vectors v_1, \ldots, v_{n+1} in \mathbb{F}_p^n. There are p^{n+1} linear combinations a_1v_1 + \cdots + a_{n+1}v_{n+1} of these vectors, but only p^n elements of \mathbb{F}_p^n, so two linear combinations with different coefficients are equal; subtracting them gives a non-trivial linear dependency.
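The pigeonhole search is easy to mechanise; a toy instance with p = 2, n = 2 and three hypothetical vectors of my own choosing:

```python
from itertools import product

p, n = 2, 2
vecs = [(1, 0), (0, 1), (1, 1)]  # three vectors in F_2^2

def combo(coeffs):
    """Linear combination of vecs with the given coefficients, mod p."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vecs)) % p
                 for i in range(n))

seen = {}
dependency = None
for coeffs in product(range(p), repeat=len(vecs)):
    key = combo(coeffs)
    if key in seen:
        # the difference of two equal combinations is a non-trivial dependency
        dependency = tuple((a - b) % p for a, b in zip(coeffs, seen[key]))
        break
    seen[key] = coeffs
```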

As I learned from this MathOverflow answer, one can make this argument work in \mathbb{Z}^n, and so in \mathbb{Q}^n by scaling to remove denominators. Let n+1 non-zero elements of \mathbb{Z}^n be given, and let N be the largest component in absolute value. There are (2M+1)^{n+1} \mathbb{Z}-linear combinations with coefficients in \{-M,\ldots,M\}, each of which has components in \{-(n+1)MN, \ldots, (n+1)MN\}. The pigeonhole principle strikes as soon as (2(n+1)MN+1)^n < (2M+1)^{n+1}; since the left-hand side grows like M^n and the right-hand side like M^{n+1}, this is true for large enough M. For instance, since (2(n+1)MN+1)^n \le 3^n (n+1)^n N^n M^n and 2^n M^n \cdot 2M < (2M+1)^{n+1}, it is sufficient if \bigl( \tfrac{3(n+1)N}{2} \bigr)^n \le 2M.

Question. Is there a trick to extend this argument to arbitrary fields?

Modules over general rings

Yiftach Barnea pointed out to me that if A is an n \times n matrix with entries in a principal ideal domain R then, regarded as a linear map R^n \rightarrow R^n, if A is surjective then A is injective. One proof (to my eyes, essentially the same as using the structure theorem for modules over a PID) is to use row and column operations to reduce to the case where A is a diagonal matrix: then all the diagonal entries of A must be invertible in R, so A is itself invertible, and so injective. Clearly the converse holds if and only if R is a field.

Suppose that R is a unital ring containing a non-zero element b with a left inverse. Then multiplication by b, i.e. r \mapsto br, is an injective linear map. This map is surjective if and only if b has a right inverse. So for general rings, neither implication holds.


Question. Is there a nice characterization of rings R such that if f : R^n \rightarrow R^n is a surjective R-linear map then f is injective?

Partial answer.

My Ph.D student Bill O’Donovan told me about Hopfian objects. By definition, an R-module M is Hopfian if and only if it has the property in the question, i.e. any surjective R-module endomorphism of M is injective.

A related property is that M is strongly Hopfian if and only if whenever f : M \rightarrow M is an R-module homomorphism, the kernel chain

\mathrm{ker} f \subseteq \mathrm{ker} f^2 \subseteq \mathrm{ker} f^3 \subseteq \cdots

stabilises. The relevant implication follows from the standard proof of Fitting’s lemma.

Claim. Strongly Hopfian implies Hopfian.

Proof. Suppose that \mathrm{ker} f^n = \mathrm{ker} f^{n+1}; iterating, \mathrm{ker} f^n = \mathrm{ker} f^m for all m \ge n. Since f restricts to a map \mathrm{ker} f^i \rightarrow \mathrm{ker} f^{i-1} for any i \ge 1, we see that f preserves \mathrm{ker} f^n, and acts on this space as a nilpotent map. Let u \in \mathrm{ker} f^n. Since f is surjective, so is f^n, and u = f^n v for some v \in M; by hypothesis, f^n u = 0. Hence f^{2n} v = 0. But \mathrm{ker} f^{2n} = \mathrm{ker} f^n, so f^n v = 0, hence u=0. Therefore \ker f \subseteq \mathrm{ker} f^n = \{ 0 \} and f is injective.

In particular, any Noetherian ring is Hopfian as a module for itself. There are Hopfian modules that are not strongly Hopfian: for example, Proposition 2.7(4) in this paper by A. Hmaimou, A. Kaidi and E. Sánchez Campos gives the example of the \mathbb{Z}-module \bigoplus_{n \ge 1} \mathbb{Z}/p_n^n \mathbb{Z}, where p_n is the nth prime; the kernel chain for the endomorphism acting nilpotently with degree n on the nth summand does not stabilise, but since any endomorphism preserves the summands, the module is Hopfian.

I asked on MathOverflow if there was a ring R and a Hopfian R-module M such that M \oplus M was not Hopfian, and my worst suspicions were confirmed.

