## Differences of permutation characters of symmetric groups

July 13, 2016

Here are a few small observations on permutation characters motivated by discussions at the Herstmonceux conference Algebraic combinatorics and group actions. Let $\chi^\lambda$ denote the irreducible character of the symmetric group $S_n$ labelled by the partition $\lambda$ of $n$. Let $\pi^\lambda$ denote the permutation character of $S_n$ acting on the cosets of the Young subgroup $S_\lambda$.

• Since $\pi^\lambda = \chi^\lambda + \phi$ where $\phi$ is an integral linear combination of characters $\chi^\mu$ for partitions $\mu$ dominating $\lambda$, any irreducible character of $S_n$ is an integral linear combination of the $\pi^\lambda$.

• This can be made more explicit as follows. Let $\delta = (n-1,\ldots,1,0)$, and pad $\lambda$ with zeros so that it has $n$ parts. For $\sigma \in S_n$, let $\lambda \cdot \sigma = (\lambda + \delta)\sigma - \delta$, where $\sigma$ acts on $\lambda + \delta$ by place permutation; by convention $\pi^{\lambda \cdot \sigma} = 0$ if $\lambda \cdot \sigma$ has a negative part. The Jacobi–Trudi formula states that

$\chi^\lambda = \sum_{\sigma \in S_n} \pi^{\lambda \cdot \sigma} \mathrm{sgn}(\sigma).$
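For small $n$ the non-vanishing terms are easy to enumerate directly. Here is a Python sketch (the function name and conventions are mine): it pads $\lambda$ with zeros, applies each place permutation, and discards terms in which $\lambda \cdot \sigma$ has a negative part.

```python
from itertools import permutations

def jacobi_trudi_terms(lam, n):
    """Yield (sign, mu) with chi^lam = sum of sign * pi^mu over the yielded terms.

    Implements lam . sigma = (lam + delta)sigma - delta where
    delta = (n-1, ..., 1, 0); terms with a negative part vanish.
    """
    lam = tuple(lam) + (0,) * (n - len(lam))
    delta = tuple(range(n - 1, -1, -1))
    v = tuple(l + d for l, d in zip(lam, delta))   # strictly decreasing
    for perm in permutations(range(n)):
        mu = tuple(v[perm[i]] - delta[i] for i in range(n))
        if any(m < 0 for m in mu):
            continue   # pi^mu = 0 for this term
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        yield (-1) ** inversions, tuple(sorted((m for m in mu if m > 0),
                                               reverse=True))

# chi^(2,1) = pi^(2,1) - pi^(3) in S_3
print(sorted(jacobi_trudi_terms((2, 1), 3)))  # [(-1, (3,)), (1, (2, 1))]
```

For $\lambda = (3,2)$ and $n = 5$ the same function recovers the two-term identity $\chi^{(3,2)} = \pi^{(3,2)} - \pi^{(4,1)}$ mentioned below.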

• It is a special property of symmetric groups that any character is a difference of permutation characters. For example, the cyclic group $C_n$ has $\mathrm{d}(n)$ (the number of divisors of $n$) distinct transitive permutation characters, one for each subgroup, but $n$ irreducible characters, so unless $n \le 2$, the additive group generated by the permutation characters is a proper subgroup of the integral character ring. And clearly no irreducible character of a finite group taking a non-rational value can be an integral linear combination of permutation characters.

• Since a transitive permutation character contains the trivial character exactly once, the trivial character is not a difference of two transitive permutation characters. A further example is given below.

Claim. $\chi^{(n-1,1)} + \chi^{(1^n)}$ is not a difference of two transitive permutation characters of $S_n$.

Proof. Let $\pi_K$ denote the permutation character of $S_n$ acting on the cosets of the subgroup $K$ of $S_n$. Note that the sign character, $\chi^{(1^n)}$, appears in $\pi_K$ if and only if $K \le A_n$.

Let $s(K)$ be the number of orbits of $K$ on $\{1,\ldots, n\}$. Since

$\chi^{(n-1,1)} = \pi^{(n-1,1)} - 1_{S_n},$

it follows from Frobenius reciprocity and Mackey’s Formula that $\langle \pi_K, \chi^{(n-1,1)} \rangle = s(K)-1$. Consider $\chi^{(2,1^{n-2})}$. Multiplying the displayed equation above by $\mathrm{sgn}$, we get

$\chi^{(2,1^{n-2})} = \mathrm{sgn}_{S_{n-1}}\uparrow^{S_n} - \mathrm{sgn}_{S_n}.$

By Frobenius reciprocity

$\langle \pi_K, \mathrm{sgn}_{S_{n-1}}\uparrow^{S_n} \rangle = \langle \pi_K \downarrow_{S_{n-1}}, \mathrm{sgn}_{S_{n-1}} \rangle.$

By Mackey’s Formula, the right-hand side is

$\sum \langle 1_{K^g \cap S_{n-1}}\uparrow^{S_{n-1}}, \mathrm{sgn}_{S_{n-1}} \rangle,$

where the sum is over distinct double cosets $K g S_{n-1}$. It follows that $\langle \pi_K, \chi^{(2,1^{n-2})} \rangle \le s(K)-1$, with equality when $K \le A_n$.

Suppose that $H$ and $G$ are subgroups of $S_n$ such that $\pi_H - \pi_G$ contains $\chi^{(n-1,1)}$ and $\chi^{(1^n)}$. From $\chi^{(1^n)}$ we see that $H \le A_n$ and $G \not\le A_n$. From $\chi^{(n-1,1)}$, we see that $s(G) = s(H) -1$. By the previous paragraph,

$\langle \pi_H, \chi^{(2,1^{n-2})} \rangle = s(H)-1,$

whereas

$\langle \pi_G, \chi^{(2,1^{n-2})} \rangle \le s(G)-1 < s(H)-1.$

Hence $\chi^{(2,1^{n-2})}$ also appears in $\pi_H -\pi_G$. Therefore

$\chi^{(n-1,1)} + \chi^{(1^n)}$

is not a difference of two transitive permutation characters. $\Box$

• Since $\pi^{(n-r,r)} - \pi^{(n-r+1,r-1)} = \chi^{(n-r,r)}$ for $1 \le r \le n/2$ (this is a special case of the Jacobi–Trudi formula), each $\chi^{(n-r,r)}$ is a difference of two transitive permutation characters. Moreover, $\chi^{(1^n)} = \pi_{A_n} - 1_{S_n}$. An exhaustive computer-algebra search over all subgroups of $S_n$ supports the conjecture that, provided $n \ge 9$, these are the only irreducible characters of $S_n$ that are the difference of two transitive permutation characters.
• The behaviour of $S_6$ is interesting: all irreducible characters except for $\chi^{(3,2,1)}$ are the difference of two transitive permutation characters. This is explained in part by the outer automorphism that exists in this case.

## What price Brexit?

June 18, 2016

Next Thursday it looks like I may well have to abandon at least one of these somewhat-cherished beliefs: (1) democracy basically works; (2) leaving the EU will be terrible, economically, socially and culturally, for the UK; (3) my own beliefs on this matter have no gaping inconsistencies. Still, at least the campaign will be over. I thought nothing could be more painful than the pro-Brexit leaflet I was sent last week, including a truly-beyond-parody map of Europe, suitably enlarged to include Iraq and Syria, coloured in a threatening shade of blood, until I saw Farage’s latest poster (no link).

At the time of writing (Saturday 18 June), the FT poll-of-polls has Remain on 43% and leave on 48%, with the remainder undecided. The best odds available from conventional bookmakers are 1:2 for remain and 7:4 for leave, corresponding to 66.7% and 36.4% (with a 3% profit margin). The Betfair prices are 1.51 for remain and 2.94 for leave, corresponding to 66.2% and 34.0%.

Here is a highly inexpert attempt to work out the implied odds from option prices. The GBP/USD exchange rate closed on Friday at 1.4358, having bounced back slightly from a low of 1.4114 on Tuesday. According to the minutes of the MPC meeting on Thursday, if the UK votes to leave the EU then ‘Sterling is also likely to depreciate further, perhaps sharply’. Maybe there is a code, known to experts, that translates phrases such as ‘depreciate further, perhaps sharply’, ‘drop significantly’, ‘cause the end of civilisation as we know it’ into approximate percentages. For this post, I’ll assume that a leave vote means the rate is $1.44(1-e)$, and a remain vote means the rate is at least $1.44$, at any time when this matters. Let $p$ be the probability of a vote to leave. From now on, all prices are in dollars.

The most obvious way to hedge Brexit is to buy a put option with strike price 1.44, expiring after the referendum. According to these prices, a put option with strike 1.44 expiring in July costs 0.0536. So for about 0.05, I (or rather, an investment bank) can purchase the right to sell 1 GBP for 1.44 at some date (I’m not sure when) in July. This option is worthless after a vote to remain. By assumption, it pays off

$1.44 - 1.44(1-e) = 1.44e$

after a vote to leave. No-arbitrage implies that $1.44 ep = 0.0536$, giving $p = 0.0372/e$. Taking $e = 0.1$, corresponding to a post-Brexit rate of 1.296, gives $p = 0.372$.

A slightly more sophisticated hedge also sells a put option at a lower price. By assumption, the rate does not go below $1.44(1-e)$, so selling a put option at this price gives some free extra cash. (Of course this contradicts no-arbitrage, but never mind.) Assuming that $e \le 0.1$, the pay-off is maximised by selling a put option with strike 1.295, netting 0.0069. No-arbitrage now implies that

$1.44 ep = 0.0536-0.0069 = 0.0467,$

giving $p = 0.0324/e$. The new hedging strategy makes Brexit more profitable, so the implied odds go down: if $e =0.1$ then $p=0.324$.
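Both hedges solve the same no-arbitrage equation; a tiny Python helper (the function name is mine, just to make the arithmetic reproducible):

```python
def implied_leave_prob(put_cost, strike, e, premium_received=0.0):
    """Probability p solving the no-arbitrage equation strike * e * p = net cost.

    put_cost: price of the put struck at `strike`; premium_received: cash from
    any put sold at the assumed post-Brexit floor (0 for the simple hedge).
    """
    return (put_cost - premium_received) / (strike * e)

print(round(implied_leave_prob(0.0536, 1.44, 0.1), 3))          # 0.372, simple hedge
print(round(implied_leave_prob(0.0536, 1.44, 0.1, 0.0069), 3))  # 0.324, selling the 1.295 put too
```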

Going the other way, if any of this is to be believed, the betting markets predict a 10% drop in the pound on Brexit, and the opinion polls predict a 6% drop ($0.061425 = 0.0324 \times (48 + 43)/48$).

Post Brexit update, 9th July. At least one somewhat-cherished belief has duly been abandoned. The pound/dollar rate is now 1.2953, almost exactly 10% down.

## Access structures and CNF versus DNF

June 9, 2016

Suppose there are $n$ people, numbered from $1$ up to $n$. Some subsets of these people are authorized to launch a nuclear strike. We suppose that the access structure is monotone: that is, if $Y$ is an authorized set and $Z$ contains $Y$, then $Z$ is also authorized. Thus the set $\Gamma$ of authorized sets is determined by its subset $\Delta$ of minimal elements. For example, if $n=4$ and any set of at least three people is authorized, then $\Delta = \bigl\{ \{1,2,3\}, \{1,2,4\}, \{1,3,4\}, \{2,3,4\} \bigr\}$ and $\Gamma = \Delta \cup \bigl\{ \{1,2,3,4\} \bigr\}$. If in addition, $\{1,2\}$ is an authorized set then $\Delta = \bigl\{ \{1,2\}, \{1,3,4\}, \{2,3,4\} \bigr\}$.
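In code, a monotone access structure can be stored as its family $\Gamma$ of authorized sets, with $\Delta$ recovered as the minimal elements. A minimal Python sketch (names are mine) reproducing the two examples:

```python
def minimal_sets(gamma):
    """The antichain of minimal elements of a family of frozensets."""
    return {Y for Y in gamma if not any(Z < Y for Z in gamma)}

# first example: every set of at least three of the four people is authorized
gamma = {frozenset(s) for s in
         [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}]}
print(sorted(sorted(s) for s in minimal_sets(gamma)))
# [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]

# second example: {1, 2} is also authorized (its supersets already are)
gamma |= {frozenset({1, 2})}
print(sorted(sorted(s) for s in minimal_sets(gamma)))
# [[1, 2], [1, 3, 4], [2, 3, 4]]
```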

Let $x_i$ be true if Person $i$ is present and false otherwise. Let $S$ be the proposition ‘a nuclear strike can be launched’. Considering each minimal authorized set in turn, and assuming that any person present is happy to play their role in an epoch-defining nuclear strike, we see that

$\displaystyle S \longleftrightarrow \bigvee_{X \in \Delta} \bigl( \bigwedge_{i \in X} x_i \bigr).$

This expresses $S$ in disjunctive normal form: a disjunction in which each term is a conjunction in the variables $x_i$ and $\neg x_i$. In this example, corresponding to the monotonicity of the access structure, negated variables never appear.

It is an interesting, and maybe not completely obvious fact, that any proposition is logically equivalent to one in disjunctive normal form. Roughly, one proof goes as follows: consider all true/false assignments to the variables $x_1, \ldots, x_n$ that make a given proposition $P$ true. Then $P$ is logically equivalent to ‘at least one of these assignments holds’, which is easily written in DNF.

For instance, in the first example above, the assignments that make $S$ true are $TTTT, FTTT, TFTT, TTFT, TTTF$, and $S$ is logically equivalent to

$S' = Q \vee P_1 \vee P_2 \vee P_3 \vee P_4$

where $Q$ is $x_1 \wedge x_2 \wedge x_3 \wedge x_4$, $P_1$ is $\neg x_1 \wedge x_2 \wedge x_3 \wedge x_4$, $P_2$ is $x_1 \wedge \neg x_2 \wedge x_3 \wedge x_4$, and so on. A disjunction, like $S'$ above, in which every clause is of the form $y_1 \wedge y_2 \wedge \cdots \wedge y_n$, where each $y_i$ is either $x_i$ or $\neg x_i$, is said to be in full disjunctive normal form. Since there are $2^n$ such clauses, and taking the disjunction over different subsets of them gives logically inequivalent propositions, there are exactly $2^{2^n}$ propositional formulae in the variables $x_i$, up to logical equivalence. (The empty disjunction is false by convention.)

The Dedekind number $D_n$ is the number of such formulae not involving any $\neg x_i$. Thus $D_n$ is the number of monotone boolean functions in $n$ variables, or the number of monotone access structures with $n$ people, or (by the minimal authorized subsets interpretation), the number of antichains in the poset of subsets of $\{1,2,\ldots, n\}$, ordered by inclusion. The OEIS link gives several further combinatorial interpretations. The sequence of Dedekind numbers, for $n = 0, 1, 2, \ldots$, begins 2, 3, 6, 20, 168, 7581, 7828354. No exact formula is known (or likely ever to be known), but the antichain interpretation makes it plausible that

$\log_2 \displaystyle D_n \sim \binom{n}{n/2} \sim \sqrt{\frac{2}{\pi}} \frac{2^{n}}{\sqrt{n}},$

and this was first proved by Kleitman.
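For small $n$ the Dedekind numbers can be checked by brute force over all $2^{2^n}$ boolean functions; a Python sketch (feasible up to about $n = 4$):

```python
from itertools import product

def dedekind(n):
    """Count monotone boolean functions of n variables by brute force."""
    points = list(product((0, 1), repeat=n))
    # pairs of indices (i, j) with points[i] <= points[j] componentwise
    leq = [(i, j) for i, x in enumerate(points) for j, y in enumerate(points)
           if all(a <= b for a, b in zip(x, y))]
    return sum(1 for f in product((0, 1), repeat=len(points))
               if all(f[i] <= f[j] for i, j in leq))

print([dedekind(n) for n in range(5)])  # [2, 3, 6, 20, 168]
```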

Negating formulae in DNF, it follows from de Morgan’s laws that any proposition is equivalent to a conjunction of clauses, each of the form $y_1 \vee y_2 \vee \cdots \vee y_n$, where each $y_i$ is either $x_i$ or $\neg x_i$. (The empty conjunction is true by convention.) For monotonic boolean functions this gives a normal form in which every $x_i$ appears negated. The access structure interpretation makes an alternative conjunctive form apparent: for each non-authorized set $Y \not\in \Gamma$, we require that at least one person in its complement is present. Clearly it suffices to consider the maximal such $Y$. Thus

$\displaystyle S \longleftrightarrow \bigwedge_{X \in \Delta} \bigwedge_{i \in X} \bigl( \bigvee_{j \in X' \cup \{i\}} x_j \bigr).$

Taking the special case where the minimal authorized sets are all $r$-subsets of the $n$ people we get the appealing

$\displaystyle \bigvee_{X \subseteq \{1,\ldots, n\} \atop |X| = r} \bigl( \bigwedge_{i \in X} x_i \bigr) \longleftrightarrow \bigwedge_{Y \subseteq \{1,\ldots, n\} \atop |Y| = n-r+1} \bigl( \bigvee_{j \in Y} x_j \bigr).$

For a direct proof, just note that at least $r$ people are present if and only if at most $n-r$ people are absent, so if and only if at least one person from every subset of size $n-r+1$ is present.
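The equivalence is also easy to confirm exhaustively for small $n$; a brute-force Python check (function names are mine):

```python
from itertools import combinations, product

def dnf(bits, n, r):
    """Some r-subset X of the people is fully present."""
    return any(all(bits[i] for i in X) for X in combinations(range(n), r))

def cnf(bits, n, r):
    """Every (n - r + 1)-subset Y contains someone present."""
    return all(any(bits[j] for j in Y) for Y in combinations(range(n), n - r + 1))

n = 5
for r in range(1, n + 1):
    assert all(dnf(b, n, r) == cnf(b, n, r) for b in product((0, 1), repeat=n))
print("DNF and CNF agree on all", 2 ** n, "assignments for every r")
```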

## LMS Education Day 2016

May 24, 2016

The day started with Jane White and Paola Iannone on Assessment practices and learning mathematics. There was some discussion of the rationale for different forms of assessment, and then Paola reported on an interesting survey that asked students to rate different forms of assessment by (a) their personal preference; (b) their perceived ability to discriminate. For mathematics students there was some correlation. The traditional closed book exam was the most preferred option, and also regarded as having the greatest discriminatory power.

From interviews with students several other themes emerged: students dislike coursework, knowing how easy it is to plagiarise (it was quickly pointed out that plagiarising successfully is not so easy), and group assessment, perhaps because of the ‘free-rider’ problem. They value fairness highly, and were surprisingly open to unusual forms of assessment. Several suggested that short oral exams would give them a fair chance to show what they could do. Paola reported that vivas are used instead of one problem sheet in a course at UEA. The workload is roughly equivalent, but shifts from Ph.D. markers to the lecturer.

The next talk was Kevin Houston and Mike Robinson on The flipped classroom. Kevin talked about flipping a first term ‘Number Systems’ course at Leeds. I lectured a similar course at Royal Holloway for three years, with Kevin’s book as required additional reading, and most recently gave a ‘partially flipped’ follow-on course on Matrix Algebra, so naturally my ears pricked up. Before flipping, Kevin’s course had 10 weeks of 1 hour lectures, with a 1 hour weekly tutorial in groups of 10 to 13. After flipping, it has two 1 hour lectures, on Mondays and Fridays. On Monday he lectured in his usual style; on Friday students were expected to have worked through written notes in advance, and the hour was spent solving problems and reviewing the key ideas in proofs. (I’m not sure if Kevin fully adopted Eric Mazur’s peer instruction cycle.) He used socrative to run quizzes in lectures, relying on students to own a smartphone: they all did. I’d be worried about students getting distracted, but there are clearly advantages over clickers: for instance, students can submit short text responses, so questions do not have to be purely multiple choice.

Students were encouraged to do the preliminary work by an assessed online quiz, available before each Friday lecture. Full exam credit was available for any result above 50%, with multiple attempts permitted — this seems a neat way to provide some additional motivation, while keeping an essentially formative character to the assessment. Exam results were ‘comparable to previous years on a slightly harder course’. Student engagement was significantly improved. Kevin commented on the high workload in preparing the new course for the first time, and the need for very accurate timing to get students to the right point on Monday evening.

Mike then talked about his experience flipping a second year course ‘Dynamical systems and Fourier analysis’. Before flipping, this course had a 1 hour lecture, and 1 hour practical workshop in a PC Lab. Post flipping, it had a single 2 hour class, held in a room with plentiful laptop computers. The preliminary work was to watch short online videos prepared by Mike. The advantage of online videos over lectures, that the viewer can pause at any moment, was appreciated by the students. They also observed, however, that there was no immediate opportunity to ask questions to the lecturer or their peers. Mike reported on a survey, addressed only to serial non-attenders at this course: a typical non-attender missed classes for reasons unrelated to the courses, but a few, who admitted not doing the preliminary work, felt their lack of preparation would be exposed. Another survey showed the new course was very popular with students, with many seeing only advantages to the new format.

I would like to know whether strong or weak students benefit more from the flipped classroom. My guess is that flipping can help weak but hard-working students to get somewhere, rather than nowhere, but is maybe of less benefit to stronger students. And under any system there will be a core of students who’ll fail no matter what we do. Kevin made the interesting point that many students adopt a pragmatic attitude and aim to do just enough in any given course. So it is unrealistic to expect huge improvements in exam performance, even from successful interventions.

All speakers were enthusiastic about their experience; both Kevin and Mike felt they ‘hadn’t flipped enough’. A recurring comment was that, post flipping, they had more meaningful interaction with the students. Paola commented that it was often surprising how much students turned out to know. I agree. Going the other way, I’m now only rarely surprised by the many essential things (definitions, for example) that pass them by, causing endless stumbling blocks. The flipped classroom may create — at least in the short term — more work, and is probably not suited to everyone, but it offers a way to tackle these problems head on.

The final talk was by Peter Giblin and Alice Rogers on The Teaching Excellence Framework, HE White Paper, and Quality Assurance. Peter gave an excellent summary of the white paper (he said slides would be available from the LMS website), highlighting the somewhat strange way in which the TEF is expected to be introduced, at the institutional level very soon (but with everyone who agrees to play guaranteed at least the minimum ‘Meets Expectation’ rating), and at the discipline level only by 2019/2020. Peter quoted one particularly discouraging requirement, buried in the additional technical documentation, that submissions should ‘avoid focusing on successful but highly localised practices’. The whole game seems to be a bureaucrat’s dream and an academic’s nightmare.

There was much uncertainty about the way departments would be assessed, and considerable debate about what, if any, benchmarks could fairly be used. On this note, the day ended.

## Variance of an estimate for the second moment

May 4, 2016

Suppose there are $u$ urns, numbered from $1$ up to $u$, containing in total $b$ balls, numbered from $1$ up to $b$. Let $p_i$ be the proportion of balls in urn $i$. Let $s$ be the probability that if we pick two numbers in $\{1,\ldots,b\}$ independently and uniformly at random, then the balls with these numbers lie in the same urn. Clearly $s = \sum_{i=1}^u p_i^2$.

The honest way to estimate $s$ follows the definition: repeatedly pick pairs of balls, independently and uniformly at random, and count the number of ‘hits’ where both balls in a pair lie in the same urn. For example, suppose that each urn contains equally many balls, so $p_i = 1/u$ for each $i$. Then our estimate for $s = 1/u$ after $N$ sampled pairs is $H/N$, where the number of hits $H$ has distribution $\mathrm{Bin}(N,1/u)$. As expected, $\mathbb{E}[H/N] = 1/u = s$.

An alternative is to choose $N$ ball numbers uniformly at random, and for each ball, note down its urn number. For each $i \in \{1,\ldots, u\}$, let $B_i$ be the random variable counting the number of balls lying in urn $i$. We then estimate $s$ by the statistic

$S = \sum_{i=1}^u \bigl( \frac{B_i}{N} \bigr)^2$.

By the well known formula

$\mathrm{Var} X = \mathbb{E}[(X -\mathbb{E}X)^2] = \mathbb{E}[X^2] - \mathbb{E}[X]^2,$

we get

\begin{aligned} \mathbb{E}[S] &= \frac{1}{N^2} \Bigl( \sum_{i=1}^u \mathbb{E}\bigl[(B_i-\mathbb{E}[B_i])^2\bigr] + \sum_{i=1}^u \mathbb{E}[B_i]^2 \Bigr) \\ &= \frac{1}{N^2} \sum_{i=1}^u \mathbb{E}\bigl[(B_i - \mathbb{E}[B_i])^2\bigr] + s\\ &= \frac{1}{N^2} \sum_{i=1}^u \mathrm{Var} B_i + s. \end{aligned}

Thus $S$ is a biased estimator for $s$. Since $B_i$ has distribution $\mathrm{Bin}(N, p_i)$, we have $\mathrm{Var}(B_i) = Np_i(1-p_i)$ and so

$\mathbb{E}[S] = s + \frac{1}{N^2} \sum_{i=1}^u Np_i(1-p_i) = s + \frac{1}{N} - \frac{s}{N}.$

Since $s$ is small compared to $1$, a good correction for the systematic error can be made by subtracting $1/N$. All the same, it might well seem that sampling by pairs is the simplest and best approach. But this ignores the variability in our estimates.

Again consider the case where $p_i = 1/u$ for each $i$. Then, as already seen, $H$ has distribution $\mathrm{Bin}(N,1/u)$ and so $\mathrm{Var}[H/N] = \mathrm{Var}[H]/N^2 = N(1/u)(1-1/u)/N^2 \approx 1/Nu$. The standard deviation in $H/N$ is therefore $1/\sqrt{Nu}$. In particular, to improve the accuracy of our estimate by a factor $f$, we must increase the number of samples by $f^2$.

For $S$, we extend the ‘orthogonality trick’ used earlier to express $\mathbb{E}[S]$ in terms of the variances of the $B_i$, to get

\begin{aligned} S &= \sum_{i=1}^u \frac{B_i^2}{N^2} \\ &= \sum_{i=1}^u \Bigl( \bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)^2 + \frac{2B_i}{Nu} - \frac{1}{u^2} \Bigr) \\ &= \sum_{i=1}^u \bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)^2 + \frac{2}{u} - \frac{1}{u} \\ &= \sum_{i=1}^u \bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)^2 + \frac{1}{u} \end{aligned}
where the penultimate line uses $\sum_{i=1}^u B_i = N$.

By the Central Limit Theorem, $\sqrt{uN}\bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)$ is approximately normally distributed with mean $0$ and variance $uN \mathrm{Var}[B_i/N] \approx 1$. While the $B_i$ are not independent, it still seems reasonable to approximate

$\hat{S} = uN \sum_{i=1}^u \bigl( \frac{B_i}{N} - \frac{1}{u} \bigr)^2$

as the sum of the squares of $u$ standard normal variables, i.e. as the $\chi^2$ distribution with $u$ degrees of freedom. This distribution has variance $2u$, so we have $\mathrm{Var}[\hat{S}] \approx 2u$. Undoing the scaling we get

$\mathrm{Var}[S] \approx \frac{2u}{u^2N^2} = \frac{2}{uN^2}$.

The standard deviation in $S$ is then $\frac{1}{N} \sqrt{\frac{2}{u}}$. Therefore increasing the number of samples by a factor of $f$ reduces the standard deviation by the same factor. Moreover, we look at fewer balls: $N$ rather than the $2N$ needed to sample $N$ pairs.

### Simulations

Data from simulation suggest these approximations work well in practice. For instance, with $u = 100$ urns and $N = 1000$ samples, the mean of $H/N$ over $1000$ repetitions of the sampling experiment was $0.01014$, with standard deviation $3.1354 \times 10^{-3}$; the mean of $S$ was $0.01099 \approx 1/u + 1/N$, with standard deviation $1.4101 \times 10^{-4}$. Increasing the number of samples to $10000$ reduced the standard deviations for $H/N$ and $S$ to $1.0114 \times 10^{-3}$ and $1.4176 \times 10^{-5}$, respectively, agreeing with the predictions above. For more skewed distributions of balls into urns the standard deviation of $S$ is still far less than that of $H/N$, but the improvement on increasing the number of samples is less dramatic.
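A script along the following lines reproduces the experiment (a sketch, assuming equal urns so that choosing a random ball amounts to choosing an urn uniformly; parameters are scaled down and exact figures vary with the seed):

```python
import random
from statistics import mean, stdev

def simulate(u, N, reps, seed=0):
    """Return lists of the two estimates of s = 1/u over `reps` repetitions."""
    rng = random.Random(seed)
    hs, ss = [], []
    for _ in range(reps):
        # pair sampling: N pairs of uniform urn choices, count same-urn hits
        hs.append(sum(rng.randrange(u) == rng.randrange(u)
                      for _ in range(N)) / N)
        # count sampling: N uniform urn choices, then square the proportions
        counts = [0] * u
        for _ in range(N):
            counts[rng.randrange(u)] += 1
        ss.append(sum((c / N) ** 2 for c in counts))
    return hs, ss

hs, ss = simulate(u=100, N=1000, reps=200)
print(mean(hs), stdev(hs))  # mean near 1/u = 0.01, sd near 1/sqrt(Nu) ~ 0.0032
print(mean(ss), stdev(ss))  # mean near 1/u + 1/N = 0.011, sd near sqrt(2/u)/N ~ 0.00014
```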

## Modular representations that look ordinary

March 23, 2016

Let $G$ be a finite group and let $(K, \mathcal{O}, k)$ be a $p$-modular system for $G$. Thus $K$ is a field of characteristic zero sufficiently large that if $V$ is any irreducible representation of $G$ then $\mathrm{End}_{KG}(V) \cong K$ and $K$ contains all the eigenvalues of the matrices representing elements of $G$ in their action on $V$, $\mathcal{O}$ is an integrally closed subring of $K$ containing all these eigenvalues, and $k$ is a field of prime characteristic $p$ obtained by quotienting $\mathcal{O}$ by a maximal ideal $\mathfrak{m}$, necessarily lying above $\langle p \rangle \unlhd \mathcal{O}$. By localizing if necessary, we may assume that all elements of $\mathcal{O}$ not in $\mathfrak{m}$ are invertible. So $\mathcal{O}$ is a local ring with unique maximal ideal $\mathfrak{m}$.

This is a standard setup for modular representation theory. The main purpose of this post is to prove Theorem 2 below, which gives a precise description of a family of representations that look substantially the same, whether defined over $K$, $\mathcal{O}$ or $k$.

As a warm-up we prove the following result on ordinary characters.

Proposition 1. Let $\chi$ be an irreducible character of $G$ such that $\chi(1)$ is divisible by the highest power of $p$ dividing $|G|$. If $g \in G$ is a non-identity $p$-element then $\chi(g) = 0$.

We need the following fact: the central primitive idempotent in $KG$ corresponding to $\chi$ is

$e_\chi = \frac{\chi(1)}{|G|} \sum_{g \in G} \chi(g^{-1}) g.$

Note that the coefficients of $e_\chi$ are contained in $\mathcal{O}$.

Proof of Proposition 1. Let $P$ be a Sylow $p$-subgroup of $G$ containing $g$. Since

$(1) \qquad \mathcal{O}G = e_\chi \mathcal{O}G \oplus (1-e_\chi) \mathcal{O}G,$

considered as a right $\mathcal{O}G$-module, $e_\chi \mathcal{O}G$ is projective. Hence the restriction of $e_\chi \mathcal{O}G$ to $\mathcal{O}P$ is also projective. Now $\mathcal{O}P$ is indecomposable as a right $\mathcal{O}P$-module, since any direct sum decomposition induces a direct sum decomposition of the indecomposable $kP$-module $kP$ by reduction modulo $\mathfrak{m}$. Hence the unique projective indecomposable $\mathcal{O}P$-module is $\mathcal{O}P$ itself and $e_\chi \mathcal{O}G\downarrow_P \cong \mathcal{O}P \oplus \cdots \oplus \mathcal{O}P$. Restricting further we get

$(2)\qquad e_\chi \mathcal{O}G \downarrow_{\langle g \rangle} \cong \mathcal{O} \langle g \rangle^{\oplus \chi(1)^2/\mathrm{ord}(g)}.$

From the canonical basis of the right-hand side, and since $g \not= 1$, we see that the trace of $g$ acting on $e_\chi \mathcal{O}G$ is zero. But, by Wedderburn’s Theorem, $e_\chi KG$ is isomorphic to the direct sum of $\chi(1)$ copies of a simple $KG$-module with character $\chi$. Hence $\chi(1)\chi(g) = 0$.$\qquad\Box$

I find it quite striking that the modular theory gives a short proof of a result stated entirely in the language of ordinary (i.e. characteristic zero) representation theory. Moreover, from (2), we can easily get a stronger result: $\chi(g) = 0$ whenever $g$ has order divisible by $p$.

Outline proof. Let $g$ have order $p^a c$, where $a \ge 1$ and $c$ is not divisible by $p$; let $g_p = g^c$ and let $g_{p'} = g^{p^a}$. Take a direct sum decomposition of $e_\chi KG$ into eigenspaces for the action of $g_{p'}$. By (2), $g_p$ has trace zero in its action on each eigenspace, hence so does $g$.
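Proposition 1 and this stronger remark can be sanity-checked numerically in the smallest interesting case $G = S_4$ and $p = 3$, using $\chi^{(3,1)}(g) = \mathrm{fix}(g) - 1$, whose degree $3$ carries the full $3$-part of $|S_4| = 24$. A Python sketch:

```python
from itertools import permutations
from math import lcm

def cycle_lengths(perm):
    """Cycle lengths of a permutation of {0, ..., n-1} in one-line notation."""
    seen, lens = set(), []
    for i in range(len(perm)):
        if i not in seen:
            j, c = i, 0
            while j not in seen:
                seen.add(j)
                j = perm[j]
                c += 1
            lens.append(c)
    return lens

# chi^{(3,1)} has degree 3, divisible by |S_4|_3 = 3, and value fix(g) - 1
for g in permutations(range(4)):
    if lcm(*cycle_lengths(g)) % 3 == 0:   # order of g divisible by p = 3
        fix = sum(1 for i in range(4) if g[i] == i)
        assert fix - 1 == 0
print("chi^{(3,1)} vanishes on every element of S_4 of order divisible by 3")
```

In $S_4$ the elements of order divisible by $3$ are exactly the eight $3$-cycles, each fixing one point, so the assertion never fires.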

Proposition 1 is also a corollary of the following theorem. My first version of the proof did not make it entirely clear that $e_\chi \mathcal{O}G$ is a direct sum of isomorphic $\mathcal{O}G$-modules, each simple as a $KG$-module. To repair the gap, I borrowed the proof of (1) $\implies$ (2) on page 162 of this recent book by Peter Webb.

Theorem 2. Let $\chi$ be an irreducible character of $G$ such that $\chi(1)$ is divisible by the highest power of $p$ dividing $|G|$. There is a projective $\mathcal{O}G$-module $M$ with ordinary character $\chi$ such that $M/\mathfrak{m}M$ is a simple projective $kG$-module.

Proof. Let $M$ be an $\mathcal{O}G$-module such that $M_K = M \otimes_\mathcal{O} K$ is a simple $KG$-module with character $\chi$. By Wedderburn’s Theorem, $e_\chi KG$ is isomorphic to $\mathrm{End}_K(M_K)$, thought of as the complete matrix algebra of $\chi(1) \times \chi(1)$ matrices with coefficients in $K$. Let $\rho : KG \rightarrow \mathrm{End}_K(M_K)$ be the corresponding ring homomorphism, with kernel $(1-e_\chi)KG$. The restriction of $\rho$ to $\mathcal{O}G$ has kernel $(1-e_\chi)\mathcal{O}G$. So it is sufficient to prove that the image of $\rho : \mathcal{O}G \rightarrow \mathrm{End}_K(M_K)$ is $\mathrm{End}_{\mathcal{O}}(M)$, thought of as the complete matrix ring of $\chi(1) \times \chi(1)$ matrices with coefficients in $\mathcal{O}$. Since $M$ is an $\mathcal{O}G$-module, the matrix coefficients of elements in the image lie in $\mathcal{O}$.

The surjectivity of the restriction of $\rho$ follows from the following ‘averaging’ formula which expresses an arbitrary $\phi \in \mathrm{End}_\mathcal{O}(M)$ as something in its image:

$\phi = \frac{\chi(1)}{|G|} \sum_{g \in G} \mathrm{Tr}_M\bigl( \rho(g^{-1}) \phi\bigr) \rho(g).$

Subproof. Since the ring homomorphism $\rho: KG \rightarrow \mathrm{End}_K(M_K)$ is surjective, it is sufficient to prove this when $\phi = \rho(x)$ with $x \in G$; the right-hand side is

$\rho(x) \frac{\chi(1)}{|G|} \sum_{g\in G} \chi(g^{-1} x) \rho(x^{-1} g) = \rho(x) \rho(e_\chi) = \rho(x),$

as required.

We can now think of the right $\mathcal{O}G$-module $M$ very concretely, as those $d \times d$ matrices, where $d = \chi(1)$, with coefficients in $\mathcal{O}$ that are zero outside their first row. Reducing mod $\mathfrak{m}$, we obtain $M/\mathfrak{m}M$; this is the unique simple module (up to isomorphism) for $\mathrm{Mat}_{d \times d}(k)$. Hence $M/\mathfrak{m}M$ is a simple projective $kG$-module, as claimed. $\qquad\Box$
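The averaging formula can be verified numerically in a small example. The following Python sketch (all names mine) realises the $2$-dimensional standard representation of $S_3$ over $\mathbb{Q}$ with exact arithmetic and checks the formula on one arbitrary $\phi$:

```python
from fractions import Fraction
from itertools import permutations

def rho(perm):
    """Matrix of perm in the 2-dim standard rep of S_3, basis e1-e2, e2-e3.

    A vector (a, b, c) with a + b + c = 0 has coordinates (a, -c) here.
    """
    cols = []
    for v in [(1, -1, 0), (0, 1, -1)]:
        img = [0, 0, 0]
        for i, x in enumerate(v):
            img[perm[i]] += x           # perm sends e_i to e_{perm(i)}
        cols.append((Fraction(img[0]), Fraction(-img[2])))
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def inverse_perm(perm):
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return tuple(inv)

G = list(permutations(range(3)))
phi = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]  # arbitrary phi
avg = [[Fraction(0)] * 2 for _ in range(2)]
for g in G:
    t = trace(mul(rho(inverse_perm(g)), phi))   # Tr(rho(g^{-1}) phi)
    R = rho(g)
    for i in range(2):
        for j in range(2):
            avg[i][j] += Fraction(2, 6) * t * R[i][j]   # chi(1)/|G| = 2/6
assert avg == phi
print("averaging formula verified for the standard representation of S_3")
```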

## Cheating one's way through linear algebra

March 5, 2016

Here are a few arguments that are possibly useful to prove things when one hasn’t set up all the usual machinery of vector spaces, subspaces and dimensions, presented in decreasing order of how seriously they could be used in a first course on matrix algebra. Since it fits well with the syllabus for the first year course I’m lecturing this term, I’m going to assume reduced row-echelon form is available. Throughout, all matrices have coefficients in a field $F$.

### Invertible if and only if injective by row operations

Given a matrix $B$ in reduced row-echelon form it is almost trivial to write down all solutions to $B x = 0$. There are non-zero solutions if and only if not every column of $B$ contains a pivot. For example, the solutions to

$\left( \begin{matrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{matrix}\right) \left( \begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{matrix} \right) = \left( \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} \right)$

are

$\left\{ \left( \begin{matrix} -2x_2-3x_4 \\ x_2 \\ -4x_4 \\ x_4 \end{matrix} \right) : x_2, x_4 \in F \right \};$

the pivot column variables are determined by the non-pivot variables, which may be chosen freely.
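The row-reduction itself is easy to implement with exact arithmetic; a Python sketch (names mine) that returns the reduced row-echelon form and pivot columns, applied to the example above:

```python
from fractions import Fraction

def rref(A):
    """Reduced row-echelon form over Q by Gauss-Jordan; returns (R, pivot columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        pr = next((i for i in range(r, m) if R[i][c] != 0), None)
        if pr is None:
            continue                      # no pivot in this column
        R[r], R[pr] = R[pr], R[r]         # swap the pivot row up
        R[r] = [x / R[r][c] for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                R[i] = [a - R[i][c] * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

# the example matrix is already in reduced row-echelon form
_, piv = rref([[1, 2, 0, 3], [0, 0, 1, 4], [0, 0, 0, 0]])
print(piv)  # [0, 2]: columns 2 and 4 lack pivots, so there are non-zero solutions

# an n x n matrix is invertible if and only if every column has a pivot
_, piv2 = rref([[2, 1], [1, 1]])
print(piv2)  # [0, 1]
```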

Let $f : F^n \rightarrow F^n$ be an invertible linear map represented by the matrix $A$. Obviously if $f$ is invertible then $f$ is injective. Conversely, suppose that $f$ is injective and that $A$ has reduced row-echelon form $B$, so $B = QA$ for some invertible matrix $Q$. If $B$ has a zero row then it has at most $n-1$ pivots, so by the remark above, $B$ has a non-zero kernel. Therefore $B$ has $n$ pivots and so is the identity matrix. Hence $I = QA$. Now this only shows that $A$ has a left inverse, but since $Q$ is a product of elementary matrices (corresponding to row operations), and each such elementary matrix is easily seen to be invertible (in the usual sense), $Q$ is invertible and so $A = Q^{-1}$ is invertible.

Very similarly one can show that $f$ is invertible if and only if $f$ is surjective.

### Linearly dependent if too many vectors

If $A$ is the $(n+1) \times n$ matrix whose rows are the components of $n+1$ vectors in $F^n$ (expressed in the standard basis) then the reduced row-echelon form of $A$ has at most $n$ pivot columns, and so its bottom row is zero. Correspondingly, some non-trivial $F$-linear combination of the rows of $A$ is zero.

### Invertible if and only if injective by disguised minimal polynomial and Primary Decomposition Theorem

The vector space of all linear endomorphisms of $F^n$ has dimension $n^2$. By the previous result there is a non-trivial linear dependency between the powers $I, A, \ldots, A^{n^2}$ of $A$. Let $c_0 I + c_1 A +\cdots + c_d A^d = 0$ be such a relation with $d$ minimal. If $c_0\not=0$ then formally multiplying through by $c_0^{-1}A^{-1}$ gives

$-c_0^{-1}c_1 I - c_0^{-1}c_2 A - \cdots - c_0^{-1}c_dA^{d-1}$

as a plausible (and clearly correct) guess for the inverse of $A$. If $c_0 = 0$ then $A^r g(A) = 0$ where $r \in \mathbb{N}$ and $g$ has degree $d-r$ and non-zero constant term. By minimality of $d$, we have $g(A) \not=0$; if $w = g(A)v \not=0$ then $A^r w = 0$, so, choosing $s \in \mathbb{N}_0$ greatest such that $A^s w\not =0$, we have $A^s w \in \ker A$ and so $\ker A \not= 0$. So either $f$ is invertible, or $\ker f \not=0$, and so $f$ is not injective.

### Row rank is column rank

William P. Wardlaw published a very short proof in Mathematics Magazine 78 (2005) 316–318. Let $A$ be a non-zero $n \times n$ matrix. Let $r$ be minimal such that there exists an $n \times r$ matrix $P$ and an $r \times n$ matrix $Q$ such that $PQ = A$. By minimality the rows of $Q$ are linearly independent, and since each row of $A$ is a linear combination of the rows of $Q$, the row span of $Q$ contains the row span of $A$; conversely, expressing each row of $A$ in a basis of the row space of $A$ gives a factorization whose inner dimension is the row rank. Hence $r$ is the row rank of $A$. Since transposing gives a bijection $A = PQ \leftrightarrow A^T = Q^T P^T$ between factorizations of $A$ and of $A^T$, the same $r$ is minimal for $A^T$, and so $r$ is also the column rank of $A$.
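A minimal factorization can be exhibited with SymPy's rref: taking $Q$ to be the non-zero rows of the reduced row echelon form and $P$ the pivot columns of $A$ gives $A = PQ$ with inner dimension the rank. (The matrix below is an illustrative rank-2 example of my own.)

```python
from sympy import Matrix

# An illustrative 3x3 matrix of rank 2 (second row = 2 * first row).
A = Matrix([[1, 2, 3], [2, 4, 6], [1, 0, 1]])
r = A.rank()

B, pivots = A.rref()
Q = B[:r, :]                                      # r independent rows spanning the row space
P = A.extract(list(range(A.rows)), list(pivots))  # pivot columns of A
assert P * Q == A                                 # a factorization with inner dimension r
assert r == A.T.rank()                            # row rank equals column rank
```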

### Linear dependencies by pigeonhole principle

Take $n+1$ vectors $v_1, \ldots, v_{n+1}$ in $\mathbb{F}_p^n$. There are $p^{n+1}$ linear combinations $a_1v_1 + \cdots + a_{n+1}v_{n+1}$ of these vectors, but only $p^n$ elements of $\mathbb{F}_p^n$, so two linear combinations with different coefficients are equal; subtracting them gives a non-trivial linear dependency.

As I learned from this MathOverflow answer, one can make this argument work in $\mathbb{Z}^n$, and so in $\mathbb{Q}^n$ by scaling to remove denominators. Let $n+1$ non-zero elements of $\mathbb{Z}^n$ be given, and let $N$ be the largest component in absolute value. For $M \in \mathbb{N}$ there are $(2M+1)^{n+1}$ $\mathbb{Z}$-linear combinations with coefficients in $\{-M,\ldots,M\}$, each of which has components in $\{-(n+1)MN, \ldots, (n+1)MN\}$. The pigeonhole principle strikes as soon as $(2(n+1)MN+1)^n < (2M+1)^{n+1}$; clearly this is true for large enough $M$. For instance, since $(2(n+1)MN+1)^n \le 3^n (n+1)^n N^n M^n$ and $2^n M^n \cdot M < (2M+1)^{n+1}$, it is sufficient that $\bigl(3(n+1)N/2\bigr)^n \le M$.
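The pigeonhole argument can be run by brute force. Below, three illustrative vectors of my own choosing in $\mathbb{Z}^2$ are combined with coefficients in $\{-M,\ldots,M\}$ until two combinations collide; the difference of the two coefficient tuples is a non-trivial integer dependency. (Here $M = 1$ happens to suffice for this example; in general one needs the larger bound above.)

```python
from itertools import product

# Three illustrative vectors in Z^2; any three vectors in Z^2 are dependent.
vs = [(2, 1), (1, 3), (3, 4)]

def comb(cs):
    # component-wise integer linear combination sum(c * v)
    return tuple(sum(c * v[i] for c, v in zip(cs, vs)) for i in range(2))

M = 1
seen = {}
dep = None
for cs in product(range(-M, M + 1), repeat=3):
    key = comb(cs)
    if key in seen:
        # two distinct coefficient tuples give the same point:
        # their difference is a non-trivial dependency
        dep = tuple(a - b for a, b in zip(cs, seen[key]))
        break
    seen[key] = cs

assert dep is not None and any(dep)
assert comb(dep) == (0, 0)
```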

Question. Is there a trick to extend this argument to arbitrary fields?

### Modules over general rings

Yiftach Barnea pointed out to me that if $A$ is an $n \times n$ matrix with entries in a principal ideal domain $R$, regarded as a linear map $R^n \rightarrow R^n$, then surjectivity of $A$ implies injectivity. One proof (to my eyes, essentially the same as using the structure theorem for modules over a PID) is to use row and column operations to reduce to the case where $A$ is a diagonal matrix: for a surjective map all the diagonal entries of $A$ must be invertible in $R$, so $A$ is itself invertible, and so injective. Clearly the converse (injectivity implies surjectivity) holds if and only if $R$ is a field.
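Once reduced to a diagonal matrix, the dichotomy is visible in a tiny example over $R = \mathbb{Z}$ (my own illustration): a non-zero non-unit on the diagonal gives an injective map that is not surjective.

```python
from sympy import Matrix, Rational

# Diagonal matrix over Z: the entry 2 is non-zero but not a unit in Z.
A = Matrix([[2, 0], [0, 1]])

# Injective: the determinant is non-zero, so the kernel is trivial.
assert A.det() == 2
assert A.nullspace() == []

# Not surjective over Z: the only preimage of e_1 = (1, 0) has a
# non-integer entry, though over Q the map is surjective.
v = A.solve(Matrix([1, 0]))
assert v == Matrix([Rational(1, 2), 0])
```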

Suppose that $R$ is a unital ring containing an element $b$ with a left inverse but no right inverse. Then multiplication by $b$, i.e. $r \mapsto br$, is an injective linear map that is not surjective: the map is surjective if and only if $b$ has a right inverse. Dually, if $b$ has a right inverse but no left inverse, then $r \mapsto br$ is surjective but not injective. So for general rings, neither implication holds.
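The standard example of such a ring is the endomorphism ring of the space of all sequences: the right shift has the left shift as a left inverse, but no right inverse. A small Python sketch, modelling sequences as functions $\mathbb{N}_0 \rightarrow \mathbb{Z}$:

```python
def right_shift(seq):
    # (a_0, a_1, ...) -> (0, a_0, a_1, ...)
    return lambda n: 0 if n == 0 else seq(n - 1)

def left_shift(seq):
    # (a_0, a_1, ...) -> (a_1, a_2, ...)
    return lambda n: seq(n + 1)

a = lambda n: n + 1                 # the sequence 1, 2, 3, ...

# left_shift o right_shift is the identity (a left inverse)...
assert [left_shift(right_shift(a))(n) for n in range(5)] == [a(n) for n in range(5)]

# ...but right_shift o left_shift is not, so there is no right inverse:
assert right_shift(left_shift(a))(0) == 0
assert a(0) == 1
```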

Question. Is there a nice characterization of rings $R$ such that every surjective $R$-linear map $f : R^n \rightarrow R^n$ is injective?

My Ph.D. student Bill O’Donovan told me about Hopfian objects. By definition, an $R$-module $M$ is Hopfian if and only if any surjective $R$-module endomorphism of $M$ is injective; this is exactly the property in the question.

A related property: $M$ is strongly Hopfian if and only if for every $R$-module homomorphism $f : M \rightarrow M$ the kernel chain

$\mathrm{ker} f \subseteq \mathrm{ker} f^2 \subseteq \mathrm{ker} f^3 \subseteq \cdots$

stabilises. The relevant implication follows from the standard proof of Fitting’s lemma.

Claim. Strongly Hopfian implies Hopfian.

Proof. Let $f$ be a surjective endomorphism of $M$. Since $M$ is strongly Hopfian, there exists $n$ such that $\mathrm{ker} f^n = \mathrm{ker} f^{n+1}$; an easy induction then shows that $\mathrm{ker} f^m = \mathrm{ker} f^n$ for all $m \ge n$. (Note that $f$ restricts to a map $\mathrm{ker} f^i \rightarrow \mathrm{ker} f^{i-1}$ for any $i \ge 1$, so $f$ preserves $\mathrm{ker} f^n$ and acts on this space as a nilpotent map of degree at most $n$.) Let $u \in \mathrm{ker} f^n$. Since $f$ is surjective, so is $f^n$, and hence $u = f^n v$ for some $v \in M$. By hypothesis $f^n u = 0$, hence $f^{2n} v = 0$. But $\mathrm{ker} f^{2n} = \mathrm{ker} f^n$, so $f^n v = 0$, hence $u=0$. Therefore $\ker f \subseteq \mathrm{ker} f^n = \{ 0 \}$ and $f$ is injective.

In particular, any Noetherian ring is Hopfian as a module for itself. There are Hopfian modules that are not strongly Hopfian: for example, Proposition 2.7(4) in this paper by A. Hmaimou, A. Kaidi and E. Sánchez Campos gives the $\mathbb{Z}$-module $\bigoplus_{n \ge 1} \mathbb{Z}/p_n^n \mathbb{Z}$, where $p_n$ is the $n$th prime. The kernel chain for the endomorphism acting nilpotently with degree $n$ on the $n$th summand (for instance, multiplication by $p_n$) does not stabilise, but since any endomorphism preserves the summands, the module is Hopfian.
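The failure of stabilisation in this example can be checked numerically. On the summand $\mathbb{Z}/p^n\mathbb{Z}$, multiplication by $p^k$ has kernel of size $p^{\min(k,n)}$, so on a finite truncation of the direct sum the kernel sizes strictly increase for as many steps as there are summands. A sketch using the first five primes:

```python
# First five primes: the truncated module is Z/2 + Z/9 + Z/125 + ...
primes = [2, 3, 5, 7, 11]

def kernel_size(k):
    # |ker f^k| where f multiplies the n-th summand Z/p_n^n Z by p_n:
    # on Z/p^n Z the kernel of multiplication by p^k has p^min(k, n) elements.
    size = 1
    for n, p in enumerate(primes, start=1):
        size *= p ** min(k, n)
    return size

sizes = [kernel_size(k) for k in range(1, 6)]
# The chain ker f < ker f^2 < ... grows strictly at every step shown.
assert all(a < b for a, b in zip(sizes, sizes[1:]))
```

On the full direct sum the same computation works for every $k$, so the chain never stabilises.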

I asked on MathOverflow whether there is a ring $R$ and a Hopfian $R$-module $M$ such that $M \oplus M$ is not Hopfian, and my worst suspicions were confirmed.