## Burnside’s method

April 24, 2017

Burnside proved in 1901 that if $p$ is an odd prime then a permutation group containing a regular subgroup isomorphic to $C_{p^2}$ is either imprimitive or 2-transitive. His proof was an early application of character theory to permutation groups. Groups with this property are now called B-groups.

Burnside attempted to generalize his 1901 result in two later papers: in 1911, he claimed a proof that $C_{p^n}$ is a B-group for any prime $p$ and any $n \ge 2$, and in 1921, he claimed a proof that all abelian groups, except for elementary abelian groups, are B-groups. The first claim is correct, but his proof has a serious gap. This error appears to have been unobserved (or, just possibly, observed but ignored, since the result was soon proved in another way using Schur’s theory of $S$-rings) until 1994 when it was noted by Peter Neumann, whose explication may be found in his introduction to Burnside’s collected works. In 1995, Knapp extended Burnside’s argument to give a correct proof. Burnside’s second claim is simply false: for example, $S_4 \wr S_2$ acts primitively on $\{1,2,3,4\}^2$, and has a regular subgroup isomorphic to $C_4 \times C_4$. In one of my current projects, I’ve simplified Knapp’s proof and adapted Burnside’s character-theoretic methods to show, more generally, that any cyclic group of composite order is a B-group.

The purpose of this post is to record some proofs omitted for reasons of space from the draft paper. This companion post has some notes on B-groups that may be of more general interest.

### Sums over roots of unity

Let $\xi$ be a primitive $n$th root of unity. Define

$R(r) = \{ r, r + p^{n-1}, \ldots, r+ (p-1)p^{n-1} \}$

for $0 < r < p^{n-1}$. Define a subset $Z$ of $\{1,\ldots,p^n-1\}$ to be null if there exists $s \in \mathbb{N}_0$ and distinct $r_{ij} \in \{1,\ldots, p^{n-1}-1\}$ for $0 \le i \le p-1$ and $1 \le j \le s$ such that $r_{ij} \equiv i$ mod $p$ for each $i$ and $j$ and

$Z = \bigcup_{i=0}^{p-1} \bigcup_{j=1}^s R(r_{ij}).$

Proposition 6.2 Let $\omega = \zeta^{p^{n-1} c}$ where $c$ is not divisible by $p$. Let $\mathcal{O} \subseteq \{1,\ldots,p^n-1\}$. Then

$\sum_{i \in \mathcal{O}} \zeta^i = \sum_{i \in \mathcal{O}} \omega^i$

if and only if either

1. $\mathcal{O}$ is null; or
2. $\mathcal{O} = \{p^{n-1}, \ldots, (p-1)p^{n-1} \} \; \cup \; \bigcup_{i=1}^{p-1} R(r_i) \; \cup \; Z$ where $Z$ is a null set, the $r_i$ are distinct elements of $\{1,\ldots,p^{n-1}-1\}\backslash Z$ and $r_i \equiv i$ mod $p$ for each $i$.

Proof. Since the minimum polynomial of $\zeta$ is

$1+X^{p^{n-1}} + \cdots + X^{(p-1)p^{n-1}}$

we have $\sum_{i \in R(r)} \zeta^i = 0$. Since $\xi^{p^{n-1}} = \omega$, we have $\sum_{i \in R(r)} \omega^i = p\omega^r$. It follows that $\sum_{i \in Z} \xi^i = \sum_{i \in Z} \omega^i = 0$ if $Z$ is a null set. (For the second equality, note the contributions from the $R(r_{ij})$ for fixed $j$ combine to give $p + p\omega + \cdots + p\omega^{p-1} = 0$.) For (2) we have $\sum_{i=1}^{p-1} \xi^{i p^{n-1}} = \omega + \cdots + \omega^{p-1} = -1$, $\sum_{i=1}^{p-1} \omega^{i p^{n-1}} = (p-1)$ and $\sum_{i=1}^{p-1} p \omega^i = - p$. This proves the ‘if’ direction.

Conversely, by Lemma 2.1 in the paper, $\mathcal{O} \backslash \{p^{n-1},\ldots,(p-1)p^{n-1} \}$ is a union of some of some of the sets $R(r)$. There exists a unique subset $A$ of $\{1,\ldots,p-1\}$ and unique $j_i \in \mathbb{N}$ for $0 \le i \le p-1$ and unique $r_{ij} \in \{1,\ldots, p^{n-1}-1\}$ for $1 \le j \le s$ and $0 \le i \le j_i$ such that

$\mathcal{O} = \{ p^{n-1} i : i \in A \} \cup \bigcup_{i=0}^{p-1}\bigcup_{j=1}^{i_j} R(r_{ij}).$

We have $\sum_{i \in \mathcal{O}} \zeta^i = \sum_{i \in A} \omega^i$ and $\sum_{i \in \mathcal{O}} \omega^i = |A| + \sum_{i=0}^{p-1} pj_i \omega^i$. Therefore

$|A| + \sum_{i=0}^{p-1} (pj_i - [i \in A]) X^i$

has $\omega$ as a root. Since this polynomial has degree at most $p-1$ and the minimal polynomial of $\omega$ is $1+X +\cdots + X^{p-1}$, it follows that the coefficients are constant. Hence

$|A| + pj_0 = pj_i - [i \in A]$

for $1\le i \le p-1$. If $A = \varnothing$ then $j_0 = j_1 = \ldots = j_{p-1}$ and $\mathcal{O}$ is null. Otherwise, taking the previous displayed equation mod $p$ we see that $A = \{1,\ldots,p-1\}$, and, moreover, the $j_i$ are constant for $1 \le i \le p-1$. (This holds even if $p=2$.) Set

$s = j_1 = \ldots = j_{p-1}.$

We have $j_0 = s-1$. Hence, $s \in \mathbb{N}$ and choosing in any way $r_i \in \{1,\ldots,p^{n-1}-1\}$ for $1 \le i \le p-1$ such that $r_i \in \mathcal{O}$, we see that $\mathcal{O}$ is the union of $\{p^{n-1}, \ldots, (p-1)p^{n-1}\}$, the sets $R(r_i)$ for $1 \le i \le p-1$ and a null set. $\Box$.

### Ramanujan matrices.

For $p$ prime and $n \in \mathbb{N}$ we define $R(p^n)$ to be the matrix

$\left( \begin{matrix} \scriptstyle 1 & \scriptstyle1 & \scriptstyle 1 & \scriptstyle\ldots & \scriptstyle 1 & \scriptstyle1 & \scriptstyle1 \\ \scriptstyle-1 & \scriptstyle p-1 & \scriptstyle p-1 & \scriptstyle \ldots & \scriptstyle p-1 & \scriptstyle p-1 & \scriptstyle p-1 \\ \scriptstyle0 & \scriptstyle -p & \scriptstyle p(p-1) & \scriptstyle \ldots & \scriptstyle p(p-1) & \scriptstyle p(p-1) & \scriptstyle p(p-1) \\ \scriptstyle 0 & \scriptstyle 0 & \scriptstyle -p^2 & \scriptstyle \ldots & \scriptstyle p^2(p-1) & \scriptstyle p^2(p-1) & \scriptstyle p^2(p-1) \\ \scriptstyle\vdots & \scriptstyle \vdots & \scriptstyle \vdots & \scriptstyle \ddots & \scriptstyle \vdots & \scriptstyle \vdots & \scriptstyle \vdots \\ \scriptstyle 0 & \scriptstyle 0 & \scriptstyle 0 & \scriptstyle \ldots & \scriptstyle -p^{n-2} & \scriptstyle p^{n-2}(p-1) & \scriptstyle p^{n-2}(p-1) \\ \scriptstyle 0 & \scriptstyle 0 & \scriptstyle 0 & \scriptstyle \ldots & \scriptstyle 0 & \scriptstyle -p^{n-1} & \scriptstyle p^{n-1}(p-1) \end{matrix} \right).$

More generally, if $d$ has prime factorization $p_1^{a_1} \ldots p_s^{a_s}$, we define $R(d) = R(p_1^{a_1}) \otimes \cdots \otimes R(p_s^{a_s})$. The rows and columns of $R(d)$ are labelled by the divisors $D$ of $2^m p^n$, as indicated below for the case $d = 2^3 p$, with $p$ and odd prime.

$R(2^3 p) = \begin{matrix} 1 \\ 2 \\ 2^2 \\ 2^3 \\ p \\ 2p \\ 2^2p \\ 2^3p \end{matrix} \left( \begin{array}{cccc|cccc} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 & -1 & 1 & 1 & 1 \\ 0 & -2 & 2 &2 & 0 & -2 & 2 & 2 \\ 0 & 0 & -2^2 & 2 & 0 & 0 & -2^2 & 2^2\\ \hline -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 \\ 0 & 2 & -2 & -2 & 0 & -2 & 2 & 2 \\ 0 & 0 & 2^2 & -2^2 & 0 & 0 & -2^2 & 2^2 \end{array} \right).$

Say that a partition of $D \backslash \{ d\}$ is coprime if the highest common factor of the numbers in each part is $1$. The aim of the game is to find a non-empty set of rows, $X$ say, of $R(d)$ such that the subsets of columns (excluding column $d$) on which the sum of the rows in $X$ are equal form a coprime partition of the divisors.

There is an application to B-groups only when $d$ is even, in which case we may assume that $2 \in X$. Proposition 6.7 in the paper states that in this case, the only way to win this game when $d = 2^n$, $d = 2^n p$ or $d = 2 p^n$, where $p$ is an odd prime, is to put every single row in $X$. This implies that $C_{2^m p^n}$ is a B-group when $n \le 1$ or $m \le 1$. However the result on the game may well hold more generally.

#### Constant sums for $p^n$

The only coprime partition of $D \backslash \{p^n\}$ has a singleton part, so the row sums are all equal. By adding $p^{e-1}$ to each entry in row $p^e$ for $e \in \{1,\ldots, n\}$, we obtain the matrix below.

$\left( \begin{matrix} \scriptstyle 1 & \scriptstyle1 & \scriptstyle 1 & \scriptstyle\ldots & \scriptstyle 1 & \scriptstyle1 & \scriptstyle1 \\ \scriptstyle 0 & \scriptstyle p & \scriptstyle p & \scriptstyle \ldots & \scriptstyle p & \scriptstyle p & \scriptstyle p \\ \scriptstyle p & \scriptstyle 0 & \scriptstyle p^2 & \scriptstyle \ldots & \scriptstyle p^2 & \scriptstyle p^2 & \scriptstyle p^2 \\ \scriptstyle p^2 & \scriptstyle p^2 & \scriptstyle 0 & \scriptstyle \ldots & \scriptstyle p^3 & \scriptstyle p^3 & \scriptstyle p^3 \\ \scriptstyle\vdots & \scriptstyle \vdots & \scriptstyle \vdots & \scriptstyle \ddots & \scriptstyle \vdots & \scriptstyle \vdots & \scriptstyle \vdots \\ \scriptstyle p^{n-2} & \scriptstyle p^{n-2} & \scriptstyle p^{n-2} & \scriptstyle \ldots & \scriptstyle 0 & \scriptstyle p^{n-1} & \scriptstyle p^{n-1} \\ \scriptstyle p^{n-1} & \scriptstyle p^{n-1} & \scriptstyle p^{n-1} & \scriptstyle \ldots & \scriptstyle p^{n-1} & \scriptstyle 0 & \scriptstyle p^n \end{matrix} \right)$

which still has constant sums over the rows in $X$. Let $x_i = 1$ if $p^i \in X$ and let $x_i = 0$ otherwise. Writing numbers in base $p$, the row sums over $X$ are, for each column,

\begin{aligned} 1 \quad &\quad x_nx_{n-1} \ldots x_3x_2x_0 \\ p \quad &\quad x_n x_{n-1} \ldots x_3x_1x_0 \\ p^2\quad &\quad x_n x_{n-1} \ldots x_2x_1x_0 \\ \vdots \; \quad &\quad \qquad \vdots \\ p^{n-2} \quad &\quad x_nx_{n-2} \ldots x_2x_1x_0 \\ p^{n-1} \quad &\quad x_{n-1}x_{n-2} \ldots x_2x_1x_2. \end{aligned}

(Note in each case there are $n$ digits, the most significant corresponding to $p^{n-1}$.) The numbers in the right column are equal, hence all the $x_i$ are equal and $X = D$, as required. Taking $p=2$ this gives an alternative proof of Proposition 6.7(i).

#### Proofs of further claims on the Ramanujan matrices

Let $R(d)^{\circ}$ denote $R(d)$ rotated by a half turn. The following result was stated without proof in the paper; it implies that each $R(d)$ is invertible, with inverse $\frac{1}{d}R(d)^\circ$. (Update: I later found a much better proof, now outlined in the paper, using a lower-upper decomposition of $R(d)$.)

Proposition. We have $R(p^n)^{-1} = \frac{1}{p^n} R(p^n)^\circ$ and $\det R(p^n) = p^{n(n+1)}/2$.

Proof. Since $R(p^n)^\circ_{p^sp^t} = R(p^n)_{p^{n-s}p^{n-t}}$, we have $R(p^n)^\circ_{p^np^t} = 1$ for all $t \in \{0,\ldots, n\}$ and

\begin{aligned} R(p^n)^\circ_{p^sp^t} &= \begin{cases} (p-1)p^{n-s-1} & n-t \ge n-s \\ -p^{n-s-1} & n-t=n-s-1 \\ 0 & n-t \le n-s-2 \end{cases} \\ &= \begin{cases} (p-1)p^{n-s-1} & s \ge t \\ -p^{n-s-1} & s = t-1 \\ 0 & s \le t-2 \end{cases} \end{aligned}

for $s \in \{0,\ldots, n-1\}$ and $t \in \{0,\ldots,n\}$. We use this to show that

$\sum_{c=0}^n R(p^n)_{p^rp^c} R(p^n)^{\circ}_{p^c p^{r'}} = p^n [r=r']$

for $r,r' \in \{0,\ldots, n\}$. When $r=0$ the first term in each product is $1$, so the left-hand side is the column sum of the $n-r'$-th of $R(p^n)$; this is $p^n$ if $r'=0$ and $0$ otherwise, as required. Now suppose that $r \ge 1$. Since $R(p^n)_{p^rp^c}$ vanishes when $c \le r-2$, the left-hand side is

$-p^{r-1}R(p^n)^\circ_{p^{r-1}p^{r'}} + \sum_{c=r}^n p^{r-1}(p-1) R(p^n)_{p^cp^{r'}}.$

Take out a factor $p^{r-1}$ to define $L$. Substitute for $R(p^n)^\circ_{p^cp^{r'}}$, and split off the summand from $R(p^n)^\circ_{p^np^{r'}} = 1$ to get

\begin{aligned} L &= \begin{cases} (p-1)p^{n-r} & r' \le r-1 \\ -p^{n-r} & r' = r \\ 0 & r' \ge r+1 \end{cases} \\ & \qquad + (p-1) \sum_{c=r}^{n-1} (p-1) \begin{cases} (p-1)p^{n-c-1} & c \ge r' \\ -p^{n-c-1} & c = r'-1 \\ 0 & c \le r'-2 \end{cases} + p-1.\end{aligned}

We must now consider three cases. When $r=r'$ we get

\begin{aligned} L &= p^{n-r} + (p-1)\sum_{c=r}^{n-1} (p-1)p^{n-c-1} + p-1 \\ &= p^{n-r} + (p-1)(p^{n-r}-1) + p-1 \\ &= p^{n-r+1} \end{aligned}

as required. When $r \ge r'+1$ we have $c > r'$ in all summands so

\begin{aligned} L &= -(p-1)p^{n-r} + (p-1)\sum_{c=r}^{n-1} (p-1)p^{n-c-1} + p^{r-1}(p-1) \\ &= (p-1) \bigl( -p^{n-r} + (p^{n-r}-1) + 1 \bigr) \\ &= 0. \end{aligned}

When $r \le r'-1$ the first non-zero summand occurs for $c=r'-1$ so we have

\begin{aligned} L &= - (p-1)p^{n-r'} + (p-1) \sum_{c=r'}^{n-1} (p-1)p^{n-c-1} + p^{r-1}(p-1) \\ &= (p-1) \bigl( -p^{n-r'} + (p^{n-r'}-1) + 1 \bigr) \\ &=0. \end{aligned}

Now taking determinants, using that $\det R(p^n)^\circ = \det R(p^n)$ (the matrices are conjugate by the matrix with $1$s on its antidiagonal and $0$s elsewhere), we get $1 / \det R(p^n) = (p^n)^{n+1} \det R(p^n)$. Hence $\det R(p^n) = p^{n(n+1)/2}$, as required.

### Jordan block matrices.

Let $J$ be the $m \times m$ unipotent upper-triangular Jordan block matrix over $\mathbb{F}_p$. We have $(J-I)^m = 0$, hence $(J-I)^{p^r} = 0$ whenever $p^r \ge m$. On the other hand if $k \le m-1$ then $J^k$ has a $1$ in position $(1,k)$. Therefore $J^{p^r} = I$ if and only if $p^r \ge m$. We also need a result on the relative trace matrix $I + J + \cdots + J^{p^{r}-1}$. Note that $\binom{p}{k}$ is divisible by $p$ for $k \in \{1,\ldots, p-1\}$. (For instance, a $p$-cycle acts freely on the set of $k$-subsets of $\{1,\ldots, p\}$.) An easy inductive argument using $\binom{p-1}{k-1} + \binom{p-1}{k} = \binom{p}{k}$ shows that $\binom{p-1}{k} \equiv (-1)^k$ mod $p$ for all $k$. Now Lucas’ Theorem implies that $\binom{p^r-1}{k} \equiv (-1)^k$ mod $p$. Hence

$(J-I)^{p^r-1} = I + J + \cdots + J^{p^r-1}.$

It follows that the right-hand side is $0$ if and only if $p^r > m$.

## Regular abelian subgroups of permutation groups

March 15, 2017

A B-group is a group $K$ such that if $G$ is a permutation group containing $K$ as a regular subgroup then $G$ is either imprimitive or $2$-transitive. (Regular subgroups are always transitive in this post.) The term ‘B-group’ was introduced by Wielandt, in honour of Burnside, who showed in 1901 that if $p$ is an odd prime then $C_{p^2}$ is a B-group. This is a companion to his important 1906 theorem that a transitive permutation group of prime degree is either solvable or $2$-transitive.

One might imagine that, post-classification, it would be clear which groups are B-groups, but this is far from the case. The purpose of this post is to collect some ancillary results with a bearing on this question.

### Quasiprimitivity

When Burnside’s argument succeeds in proving that $G$ is imprimitive, it does so by showing that the kernel $N$ of an irreducible non-trivial constituent of the permutation character of $G$ is intransitive. The orbits of $N$ are then a non-trivial block system for $G$. Thus $G$ is not even quasiprimitive. That one always gets this stronger result is explained by Corollary 3.4 in this paper by Li. The details are spelled out below.

Claim. If $G$ is a permutation group containing a regular abelian subgroup $K$ then $G$ is primitive if and only if $G$ is quasiprimitive.

Proof. Suppose $B$ is part of a non-trivial block system $\mathcal{B}$ for $G$. Let $L$ be the kernel of the induced action of $K$ on $\mathcal{B}$. The key observation is that $K/L$ acts regularly on $\mathcal{B}$: it acts semiregularly, since $K/L$ is abelian, and since $K$ is transitive, $K/L$ is transitive on $\mathcal{B}$. Similarly, since $K$ is transitive, $L$ is transitive on the block $B$ (given $\alpha, \beta \in B$, there exists $k \in K$ such that $\alpha^k = \beta$; then $B^k = B$ and because $k$ acts regularly on $\mathcal{B}$, we have $k \in L$). So we can factor $K$ as a top part $K/L$, acting regularly on $\mathcal{B}$, and a bottom part $L$ acting regularly on each block. In particular, the kernel of $G$ acting on $\mathcal{B}$ contains $L$ and $L$ has $\mathcal{B}$ as its orbits. Therefore $\mathcal{B}$ is the set of orbits of a normal subgroup of $G$, and $G$ is not quasiprimitive. $\Box$

Dropping the condition that $K$ be abelian, one finds many imprimitive but quasiprimitive groups: they can be obtained from factorizations $G = HK$ of a finite simple group $G$ where $H \cap K = 1$ and $H$ is not maximal. One nice example is $A_7 = F_{21}S_5$ where $F_{21}$ is the transitive Frobenius group of order $21$ inside $A_7$ and $S_5$ is generated by $(1,2,3,4,5)$ and $(1,2)(6,7)$. Here $F_{21}$ is not maximal because it is a subgroup of $\mathrm{GL}_3(\mathbb{F}_2)$ in its action on the $7$ non-zero vectors of $\mathbb{F}_2^3$. For an easier example, take $A_5 = \langle (1,2,3,4,5) \rangle A_4$.

### Factorizable groups

Say that a finite group $K$ is factorizable if there exists $t > 1$ and groups $K_1, \ldots, K_t$ such that $K \cong K_1 \times \cdots \times K_t$ where $|K_1| = \ldots = |K_t| \ge 3$. If $K$ is factorizable with $|K_i| = m$ for each $i$, then, since each $K_i$ acts as a regular subgroup of $S_m$, $K$ is a regular subgroup of $S_m \wr S_t$ in its primitive action on $\{1,\ldots,m\}^t$. Therefore no factorizable group is a B-group. This example appears as Theorem 25.7 in Wielandt Permutation Groups.

### Affine groups

An interesting source of examples is affine groups of the form $G = V \rtimes H$ where $V = \mathbb{F}_p^n$ for some prime $p$ and $H \le \mathrm{GL}(V)$. Here $V$ acts by translation on itself and $H$ is the point stabiliser of $0$. Any non-trivial block is invariant under the action of $V$ and so is a linear subspace of $V$. Therefore $G$ is primitive if and only if $H$ acts irreducibly on $V$. The action of $G$ is 2-transitive if and only if the point stabiliser $H$ is transitive. Therefore if there exists a linear group $H$ acting irreducibly but not transitively on $\mathbb{F}_p^n$, then $C_p^n$ is not a B-group.

#### Symmetric group representations

Consider the symmetric group $S_n$ acting on $\langle v_1, \ldots, v_n \rangle$ by permutation matrices. Let $V = \langle v_i - v_j : 1 \le i 2$ and $n \ge 2$ then the orbit of $v_1-v_2$ does not contain $v_2-v_1$, and hence $C_p^n$ is not a B-group in these cases. If $p = 2$ and $n \ge 4$ then the orbit of $v_1+v_2$ does not contain $v_1+v_2+v_3+v_4$. Therefore $C_2^m$ is not a B-group when $m$ is even and $m \ge 4$.

#### Exotic regular abelian subgroups of affine groups

An interesting feature of these examples, noted by Li in Remark 1.1 following the main theorem, is that $V \rtimes H$ may have regular abelian subgroups other than the obvious translation subgroup $V$. Let $t_v : V \rightarrow V$ denote translation by $v \in V$.
Take $n=2r+1$, fix $s \le r$, and consider the subgroup

$K = \langle (2,3)t_{v_1+v_2}, \ldots, (2s,2s+1)t_{v_1+v_{2s}}, t_{v_{2s+2}}, \ldots, t_{v_{2r+1}} \rangle$.

By the multiplication rule

$h t_v h' t_{v'} = h h' t_{vh' + v'}$

we see that $(2t,2t+1)t_{v_1+v_{2t}}$ has square $t_{v_{2t}+v_{2t+1}}$ and that the generators of $K$ commute. Therefore $K \cong C_4^s \times C_2^{2(r-s)}$, and no group of this form, with $r \ge 2$ is a B-group. (Since these groups are factorizable, this also follows from Wielandt’s result above.)

Li’s paper includes this example: the assumption that $n$ is odd, needed to ensure that $V$ is irreducible, is omitted. In fact it is an open problem to decide when $C_2^n$ is a B-group.

Perhaps surprisingly, this example does not generalize to odd primes. To see this, we introduce some ideas from an interesting paper by Caranti, Della Volta and Sala. Observe that if $K$ is a regular abelian subgroup of $G$ then for each $v \in V$, there exists a unique $g_v \in K$ such that $0g_v = v$. There exists a unique $h_v \in H$ such that $g_v = h_v t_v$. By the multiplication rule above we have

$h_u t_u h_v t_v = h_u h_v t_{uh_v + v} = h_vh_u t_{vh_u+u} = h_vt_vh_ut_u$

for all $u, v \in V$. Therefore $\{h_v : v \in V\}$ is an abelian subgroup of $H$ and $uh_v + v = vh_u + u$ for all $u,v \in V$. Replacing $v$ with $v+w$, we have

\begin{aligned} uh_{v+w} + (v+w) &= (v+w)h_u + u \\ &= vh_u + wh_u + u \\ &= uh_v + v + uh_w + w - u \end{aligned}

and so, cancelling $v+w$, we get the striking linearity property

$h_{v+w} = h_v + h_w - \mathrm{id}$.

Equivalently, $h_{v+w}-\mathrm{id} = (h_v-\mathrm{id}) + (h_w-\mathrm{id})$. Since $(h_v - \mathrm{id})(h_w - \mathrm{id}) = (h_vh_w - \mathrm{id}) - (h_v - \mathrm{id}) - (h_w - \mathrm{id})$, it follows that the linear maps $\{h_v - \mathrm{id} : v \in V \}$ form a subalgebra of $\mathrm{End}(V)$. (This is essentially Fact 3 in the linked paper.)

#### Abelian regular subgroups of odd degree affine symmetric groups

Suppose that $K$ is a regular abelian subgroup of $V \rtimes S_n$ and that there exists $v \in V$ such that $h_v \not= \mathrm{id}$. The matrix $M$ representing $h_v$ in the basis $v_2-v_1, \ldots, v_n-v_1$ of $V$ is a permutation matrix if $1 h_v = 1$. If $1h_v = a > 1$ and $bh_v = 1$ then from

$(v_i - v_1)h_v = (v_{ih}-1) - (v_a-v_1)$

we see that $M$ has $-1$ in each entry in the column for $v_a-v_1$, and the only other non-zero entries are a unique $1$ in the row for $v_i-v_1$, for each $i \not= b$. For example, $(1,2,3,4)$ is represented by

$\left( \begin{matrix} -1 & 1 & 0 \\ -1 & 0 & 1 \\ -1 & 0 & 0 \end{matrix} \right).$

By the linearity property above, $h_{2v} = 2h_v - \mathrm{id}$. But if $p > 2$ then $2M - I$ is not of either of these forms. Therefore $V$ is the unique regular abelian subgroup of $V \rtimes S_n$.

Suppose that $p = 2$. Suppose that $h_v$ has a cycle of length at least $4$. By relabelling, we may assume this cycle is $(1,2,\ldots,a)$ where $a$ is a 2-power. The matrix representing $h_v$ has $-1$ entries in column $2$, and the matrix representing $h_v^{-1}$ has $-1$ entries in column $a$. Therefore the matrix representing $h_v + h_v^{-1} + \mathrm{id}$ has $-1$ entries in both columns $2$ and $a$, and so is not of the permitted form. It follows that, when $p=2$ and $n=2r+1$ is odd, the only abelian regular subgroups of $V \rtimes S_n$ are isomorphic to $C_4^s \times C_2^{2r-2s}$. (In fact it seems they are precisely the subgroups constructed above.)

#### Abelian regular subgroups of even degree affine symmetric groups

Now suppose that $n$ is even. Let $U = V / \langle v_1 + \cdots + v_n \rangle$ and consider $U \rtimes S_n$. When $n=4$ the action of $S_4$ is not faithful: after factoring out the kernel $\langle (1,2)(3,4), (1,3)(2,4)\rangle$ we get $\mathbb{F}_2^2 \rtimes S_3 \cong S_4$, which has abelian regular subgroups $C_2 \times C_2$ and $C_4$. When $n=6$, there is an abelian regular subgroup isomorphic to $C_8 \times C_2$, generated by $t_{v_1+v_3}(3,4,5,6)$ and $t_{v_1+v_2}$. The obstruction seen above does not apply: for example, in the basis $v_1+v_3, v_1+v_4,v_1+v_5,v_1+v_6$, we have

$(3,4,5,6) + (3,6,5,4) + \mathrm{id} \mapsto \left( \begin{matrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{matrix} \right)$

which is the matrix representing $(1,2)(3,6,5,4)$, thanks to the relations present in the quotient module. However, in this case the action of $S_6$ on $\mathbb{F}_2^4$ is transitive (any element of $U$ is congruent to some $v_i + v_j$) and in fact $C_8 \times C_2$ is a B-group. (A related, remarkable fact, is that the action of $A_6$ on $U$ extends to a $2$-transitive action of $A_7$, giving an example of a $3$-transitive affine group, $U \rtimes A_7$.) A computer calculation shows that, up to conjugacy, there are also three abelian regular subgroups of $U \rtimes S_n$ isomorphic to $C_4 \times C_4$ and two isomorphic to $C_4 \times C_2 \times C_2$. If $n \ge 8$ then a similar argument to the odd degree case shows that any abelian regular subgroup has exponent at most $4$, so we get no new examples.

#### Elementary abelian subgroups other than $V$

The following lemma has a simple ad-hoc proof.

Lemma. If $n < p$ then the only regular abelian subgroups of $V \rtimes \mathrm{GL}(V)$ are elementary abelian.

Proof. Suppose that $K$ is an elementary abelian subgroup of $V \rtimes \mathrm{GL}(V)$. Let $h_v t_v \in K$. Since $h_v - 1$ is a nilpotent $n \times n$-matrix and $n \textless\; n$ we have $(h_v - 1)^p = 0$ The $p$-th power of $h_v t_v \in V$ is therefore $t_w$ where $w = v + vh_v + \cdots + vh_v^{p-1}$. Since each Jordan block in $h_v$ has size at most $p-1$, we have $w=0$. Therefore $K$ has exponent $p$. $\Box$

Note however that there exist elementary abelian subgroups other than the obvious $V$ whenever $n \ge 2$. Possibly they can be classified by the correspondence with nil-algebras in the Caranti, Della Volta, Sala paper.

#### Algebra groups

Given $n \in \mathbb{N}$ and a field $F$, define an algebra group to be a subgroup of $\mathrm{GL}_n(F)$ of the form $\{I + X : X \in A \}$ where $A$ is a nilalgebra of $n \times n$ matrices. (We may assume all elements of $A$ are strictly upper triangular.) The subalgebra property above shows that any abelian regular subgroup of an affine group of dimension $n$ over $\mathbb{F}_2$ is an extension

$1 \rightarrow C_2^r \hookrightarrow K \twoheadrightarrow \{I + X : X \in A \} \rightarrow 1$

of an algebra group by an elementary abelian 2-group. If $F$ has characteristic $2$ then, since

$(I+X)(I+Y) = (I+XY) + (I+X) + (I+Y)$

by the dual of the calculation above, a subgroup $G$ of $\mathrm{GL}_n(F)$ is an algebra group if and only if $\{ I + g : g \in G \}$ is additively closed.

An exhaustive search shows that if $n \le 4$ then every 2-subgroup of $\mathrm{GL}_n(\mathbb{F}_2)$ is an algebra group; note it suffices to consider subgroups up to conjugacy. But when $n = 5$ there are, up to conjugacy, 110 algebra subgroups of $\mathrm{GL}_5(\mathbb{F}_2)$, and 66 non-algebra subgroups, of which 34 are not isomorphic to an algebra subgroup. Of these:

• 1 is abelian, namely $C_8$;
• 4 are nilpotent of class 2, namely $\langle x, c : x^8 = c^2 = 1, x^c = x^5 \rangle$, $\langle x, y, c : x^4 = y^2 = c^4 = 1, x^c = xy, y^c = y \rangle$, $\langle x, y, c : x^2 = y^2 = c^8 = 1, x^c = xy, y^c = y \rangle$, $\langle x,y,c : x^4 = y^2 = c^4 = 1, x^c = xy, y^c = y\rangle$;
• 17 are nilpotent of class 3;
• 12 are nilpotent of class 4.

The appearance of $C_8$ is explained by the following lemma, which implies that there is an algebra group isomorphic to $C_{2^m}$ if and only if $m \le 2$.

Lemma. Let $g \in \mathrm{GL}_n(\mathbb{F}_2)$ have order $2^m$. Then $\langle g \rangle$ is an algebra group if and only if all Jordan blocks in $g$ have dimension at most $3$.

Proof. Let $1 + X$ be a Jordan block of $g$ of dimension $d$ and order $2^m$. The subalgebra of $d\times d$-matrices generated by $X$ has dimension $d-1$, and so has size $2^{d-1}$. Therefore if $\langle g \rangle$ is an algebra group then $2^{d-1} \le 2^m$. Hence $d-1 \le m$. On the other hand, since $X^{d-1} \not=0$, if $2^r < d$ then $(1+X)^{2^r} \not=0$, and so the order of $1+X$ is the minimal $s$ such that $2^s \ge d$. Hence $2^{m-1} < d$. Combining these inequalities we get

$2^{m-1} < d \le m+1$

and so $m \le 2$, and, since $d-1 \le m$, we have $d \le 3$. The converse is easily checked. $\Box$.

## Cheating ones way through linear algebra

March 5, 2016

Here are a few arguments that are possibly useful to prove things when one hasn’t set up all the usual machinery of vector spaces, subspaces and dimensions, presented in decreasing order of how seriously that could be used in a first course on matrix algebra. Since it fits well with the syllabus for the first year course I’m lecturing this term, I’m going to assume reduced row-echelon form is available. Throughout all matrices have coefficients in a field $F$.

### Invertible if and only if injective by row operations

Given a matrix $B$ in reduced row-echelon form it is almost trivial to write down all solutions to $B x = 0$. There are non-zero solutions if and only if not every column of $B$ contains a pivot. For example, the solutions to

$\left( \begin{matrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{matrix}\right) \left( \begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{matrix} \right) = \left( \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} \right)$

are

$\left\{ \left( \begin{matrix} -2x_2-3x_4 \\ x_2 \\ -4x_4 \\ x_4 \end{matrix} \right) : x_2, x_4 \in F \right \};$

the pivot column variables are determined by the non-pivot variables, which may be chosen freely.

Let $f : F^n \rightarrow F^n$ be an invertible linear map represented by the matrix $A$. Obviously if $f$ is invertible then $f$ is injective. Conversely, suppose that $f$ is injective and that $A$ has reduced row-echelon form $B$, so $B = QA$ for some invertible matrix $Q$. If $B$ has a zero row then it has at most $n-1$ pivots, so by the remark above, $B$ has a non-zero kernel. Therefore $B$ has $n$ pivots and so is the identity matrix. Hence $I = QA$. Now this only shows that $A$ has a left inverse, but since $Q$ is a product of elementary matrices (corresponding to row operations), and each such elementary matrix is easily seen to be invertible (in the usual sense), $Q$ is invertible and so $A = Q^{-1}$ is invertible.

Very similarly one can show that $f$ is invertible if and only if $f$ is surjective.

### Linearly dependent if too many vectors

If $A$ is the $(n+1) \times n$ matrix whose rows are the components of $n+1$ vectors in $F^n$ (expressed in the standard basis) then the reduced row echelon form of $A$ has at most $n$ pivot columns, and so the bottom row is zero. Correspondingly an $F$-linear combination of the rows of $A$ is zero.

### Invertible if and only if injective by disguised minimal polynomial and Primary Decomposition Theorem

The vector space of all linear endomorphisms of $F^n$ is $n^2$ dimensional. By the previous result there is a non-trivial linear dependency between the powers of $A$. Let $c_0 I + c_1 A +\cdots + c_d A^d$ be a relation with $d$ minimal. If $c_0\not=0$ then formally multiplying through by $c_0^{-1}A^{-1}$ gives

$-c_0^{-1}c_1 I - c_0^{-1}c_2 A - \cdots - c_0^{-1}c_dA^{d-1}$

as a plausible (and clearly correct) guess for the inverse of $A$. If $c_0 = 0$ then $A^r g(A) = 0$ where $r \in \mathbb{N}$ and $g$ has degree $d-r$. By minimality of $d$, we have $g(A) \not=0$; if $w = g(A)v \not=0$ then $A^r w = 0$, so, choosing $s \in \mathbb{N}_0$ least such that $A^s w\not =0$, we find that $\ker A \not= 0$. So either $f$ is invertible, or $\ker f \not=0$, and so $f$ is not injective.

### Row rank is column rank

William P. Wardlaw published a very short proof in Mathematics Magazine (2005) 78 316—318. Let $A$ be a non-zero $n \times n$ matrix. Let $r$ be minimal such that there exists an $n \times r$ matrix $P$ and an $r \times n$ matrix $Q$ such that $PQ = A$. By minimality the rows of $Q$ are linearly independent, and clearly the row span of $Q$ contains the row span of $A$. Hence $r$ is the row rank of $A$. Dually, $r$ is the column rank of $A$.

### Linear dependencies by pigeonhole principle

Take $n+1$ vectors $v_1, \ldots, v_{n+1}$ in $\mathbb{F}_p^n$. There are $p^{n+1}$ linear combinations $a_1v_1 + \cdots + a_{n+1}v_{n+1}$ of these vectors, but only $p^n$ elements of $\mathbb{F}_p^n$, so two linear combinations with different coefficients are equal.

As I learned from this MathOverflow answer, one can make this argument work in $\mathbb{Z}^n$, and so in $\mathbb{Q}^n$ by scaling to remove denominators. Let $n+1$ non-zero elements of $\mathbb{Z}^n$ be given, and let $N$ be the largest component in absolute value. There are $(2M+1)^{n+1}$ $\mathbb{Z}$-linear combinations with coefficients in $\{-M,\ldots,M\}$, each of which has components in $\{-MN, \ldots, MN\}$. The pigeonhole principle strikes as soon as $(2MN+1)^n < (2M+1)^{n+1}$; clearly this is true for large enough $M$. For instance, since $(2MN+1)^n < 3^n N^n M^n$ and $2^n M^n M < (2M+1)^{n+1}$, it is sufficient if $(3/2)^n N^n < M$.

Question. Is there a trick to extend this argument to arbitrary fields?

### Modules over general rings

Yiftach Barnea pointed out to me that if $A$ is an $n \times n$ matrix with entries in a principal ideal domain $R$ then, regarded as a linear map $R^n \rightarrow R^n$, if $A$ is surjective then $A$ is injective. One proof (to my eyes, essentially the same as using the structure theorem for modules over a PID) is to use row and column operations to reduce to the case where $A$ is a diagonal matrix: then all the diagonal entries of $A$ must be invertible in $R$, so $A$ is itself invertible, and so injective. Clearly the converse holds if and only if $R$ is a field.

Suppose that $R$ is a unital ring containing a non-zero element $b$ with a left inverse. Then multiplication by $b$, i.e. $r \mapsto br$, is an injective linear map. This map is surjective if and only if $b$ has a right inverse. So for general rings, neither implication holds.

### Question.

Is there a nice characterization of rings $R$ such that if $f : R^n \rightarrow R^n$ is a surjective $R$-linear map then $f$ is injective?

My Ph.D student Bill O’Donovan told me about Hopfian objects. By definition, an $R$-module $M$ is Hopfian if and only if it has the property in the question, i.e. any surjective $R$-module endomorphism of $M$ is injective.

A related property is that $M$ is strongly Hopfian if and only if whenever $f : M \rightarrow M$ is an $R$-module homomorphism, the kernel chain

$\mathrm{ker} f \subseteq \mathrm{ker} f^2 \subseteq \mathrm{ker} f^3 \ldots$

stabilises. The relevant implication follows from the standard proof of Fitting’s lemma.

Claim. Strongly Hopfian implies Hopfian.

Proof. Suppose that $\mathrm{ker} f^n = \mathrm{ker} f^{n+1}$. Since $f$ restricts to a map $\mathrm{ker} f^i \rightarrow \mathrm{ker} f^{i-1}$ for any $i \ge 1$, we see that $f$ preserves $\mathrm{ker} f^n$, and acts on this space as a nilpotent map of degree $n$. Let $u \in \mathrm{ker} f^n$. Since $f$ is surjective, $u = f^n v$ for some $v \in M$, and by hypothesis, $f^n u = 0$. Hence $f^{2n} v = 0$. But $\mathrm{ker} f^{2n} = \mathrm{ker} f^n$, so $f^n v = 0$, hence $u=0$. Therefore $\ker f \subseteq \mathrm{ker} f^n = \{ 0 \}$ and $f$ is injective.

In particular, any Noetherian ring is Hopfian as a module for itself. There are Hopfian modules that are not strongly Hopfian modules, for example Proposition 2.7(4) in this paper by A. Hmaimou, A. Kaidi and E. Sánchez Campos gives the example of the $\mathbb{Z}$-module $\oplus (\mathbb{Z}/p_n \mathbb{Z}^n)$, where $p_n$ is the $n$th prime; the kernel chain for the endomorphism acting nilpotently with degree $n$ on the $n$th summand does not stabilise, but since any endomorphism preserves the summands, the module is Hopfian.

I asked on MathOverflow is there was a ring $R$ and a Hopfian $R$-module $M$ such that $M \oplus M$ was not Hopfian, and my worst suspicions were confirmed.

## The diagonal fallacy and James’ Submodule Theorem

December 9, 2014

The diagonal fallacy is my name for the mistaken belief that the direct sum decomposition of $\mathbb{R}^2$ as the direct sum $\langle (1,0) \rangle \oplus \langle (0,1) \rangle$ is in some way unique. It is frequently committed by undergraduates, many of whom have the even more mistaken belief that $\oplus$ somehow means $\cup$. (Probably they are being mislead by their deep insight that $\oplus$ and $\cup$ are both categorical coproducts.)

James’ Submodule Theorem is a fundamental theorem in the representation theory of the symmetric group. To give an illustrative example, fix a prime $p$, and let $M$ be the permutation module for $\mathbb{F}_pS_n$ acting on the set $\Omega$ of all ordered pairs $(i,j)$ with $1 \le i ,j \le n$ and $i\not= j$. Thus the elements of $M$ are formal linear combination of the elements of $\Omega$ with coefficients in the finite field $\mathbb{F}_p$. Let $\langle \ , \ \rangle$ be the bilinear form on $M$ defined on basis elements so that

$\langle (i,j), (i',j') \rangle = \begin{cases} 1 & (i,j) = (i',j') \\ 0 & (i,j)\not= (i',j'). \end{cases}$

Write $U^\perp$ for the orthogonal complement of a subspace $U$ of $M$ with respect to this form. Let

$b = \sum_{\sigma \in S_3} \sigma\; \mathrm{sgn}(\sigma)$.

The Specht module $S^{(n-2,1,1)}$ can be defined as the $\mathbb{F}_pS_n$-submodule of $M$ generated by the element

$e = (2,3)b = (2,3) - (3,2) + (3,1) - (1,3) + (1,2) - (2,1).$

Writing $U$ for $S^{(n-2,1,1)}$, a special case of James’ Submodule Theorem states that if $V$ is a $\mathbb{F}_pS_n$-submodule of $M$ then either $Vb = 0$, in which case $V \subseteq U^\perp$, or $Vb = \langle e \rangle$, in which case it follows from the definition of $U$ that $V \supseteq U$.

A corollary is that in any direct sum decomposition of $M$ there is a unique indecomposable summand containing $U$, namely the unique summand $Y$ such that $Yb \not= 0$. This summand $Y$ is unique up to isomorphism, by the Krullâ€“Schmidt theorem.

If $p$ is odd then $Y$ is in fact unique as a submodule of $M$, since then $Y$ is a summand of the submodule $\langle (i,j)-(j,i) : 1 \le i < j \le n\rangle$ of $M$, and this module is indecomposable if $p$ divides $n$ and otherwise semisimple with two non-isomorphic summands (so no `diagonal' submodules), namely the Specht modules $S^{(n-1,1)}$ and $S^{(n-2,1,1)}$. When $p=2$ the situation is more complicated, and I think it would be interesting to know whether $Y$ is unique as a submodule of $M$.

The following example shows one way uniqueness can fail.

Example. Let $F$ be a field and let $A = F[x]/(x^2)$. Then $A$ has a two-dimensional module $\langle v, s \rangle$ on which $x$ acts by $vx = s$, $sx = 0$, and a one-dimensional module $\langle w \rangle$ on which $x$ acts by $wx = 0$. There are distinct direct sum decompositions of $M = \langle v, s, w \rangle$, both with a unique summand containing $\langle s \rangle$, namely

$M = \langle v, s \rangle \oplus \langle w \rangle = \langle v + w, s \rangle \oplus \langle w \rangle.$

Note that $Mx = \langle s \rangle$, but evidently there is no ‘unique submodule theorem’ applicable to the image of the map $x : M \rightarrow M$.

## Jordan—Chevalley Decomposition

March 26, 2013

Let $C = \left( \begin{matrix} 0 & 1 \\ -b & -a \end{matrix} \right)$ be the companion matrix of the irreducible quadratic $f(x) = x^2 + ax+b$ over a field $K$. Let

$P = \left( \begin{matrix} C & 0 \\ I & C \end{matrix} \right)$

where $I$ is a $2 \times 2$ identity matrix and $0$ is the zero matrix. Here is a Question that is alarmingly unobvious: is $P$ semisimple?

Answer: it depends on the field and on the polynomial. For example, take $K = \mathbf{F}_2(t)$ where $t$ is transcendental and let $f(x) = x^2 + t$. Then

$P = \left( \begin{matrix} 0 & 1 & 0 & 0 \\ t & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & t & 0 \end{matrix} \right)$

and an easy calculation shows that $f(P) = 0$. So $P$ has irreducible minimal polynomial and so must be semisimple. (Explicitly, if $e_1,e_2,e_3,e_4$ are the basis vectors corresponding to the rows of $P$ then the $K[M]$-module $\langle e_1,e_2,e_3,e_4 \rangle$ decomposes as

$\langle e_1, e_2 \rangle \oplus \langle e_3, e_1+e_4 \rangle$.)

Compare $P$ with the matrix

$Q = \left( \begin{matrix} 0 & 1 & 0 & 0 \\ t & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & t & 0 \end{matrix} \right)$.

This matrix has minimal polynomial $g(x) = (x^2-t)^2$. One might hope that $Q$ could be put into Jordan Normal Form (with identity blocks replacing the usual $1$s down the subdiagonal), but of course this is false, because $P$ and $Q$ are not conjugate.

In fact if $K$ is any field and $P$ is a matrix with minimal polynomial $f(x)^m$ where $f(x) \in K[x]$ is irreducible, then $P$ can be put into Jordan Normal Form (with identity matrices on the subdiagonal) if and only if $f$ is separable. For a nice proof of this see Theorem 6.21 and Corollary 6.22 in Christopher Norman’s recent book Finitely Generated Abelian Groups and Similarity of Matrices over a Field. (I should add that this is one of its harder results. and that the general clarity and level of detail in the book will make it an excellent source for anyone learning the subjects in its title.)

Related to the example above is the following claim.

Claim $Q$ does not have a Jordan—Chevalley decomposition as a sum $Q = S + N$ where $S$ is semisimple, $N$ is nilpotent and $S$ and $N$ are polynomials in $Q$.

Proof: if $N = h(Q)$ then, since $h(Q)^m = 0$ for some $m$ we see from the minimal polynomial for $Q$ that $h(x)$ is divisible by $f(x) = x^2-t$. But

$f(Q) = \left( \begin{matrix} 0 & 0 \\ I & 0 \end{matrix} \right)$

and $Q - rf(Q)$ is not semisimple for any $r \in K$.

In fact the obstruction to the existence of a Jordan—Chevalley decomposition seen in this example is typical of the inseparable case. (This theorem is surely in the literature, but I do not know a reference.)

Theorem A matrix has a Jordan—Chevalley decomposition, in the sense defined above, if and only if it is conjugate to a matrix in Jordan Normal Form.

Outline proof Suppose that $C$ is an irreducible matrix with minimal polynomial $f(x)$. If $f(x)$ is separable then one can show by solving a system of equations in $f'(x)$ and its other derivatives, that there is a polynomial $k(x) \in K[x]$ such that if

$P = \left( \begin{matrix} C & 0 & 0 & \cdots & 0 & 0 \\ I & C & 0 & \cdots & 0 & 0 \\ 0 & I & C & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & C & 0 \\ 0 & 0 & 0 & \cdots & I & C \end{matrix} \right)$

then $k(P)$ is equal to the block matrix with zero matrices on the diagonal and identity matrices below the diagonal. Now take $N = k(P)$ and $S = P - N$. (This extends to a sum of blocks of this form, and hence by primary decomposition to an arbitrary matrix.)

Conversely, suppose that $P$ is a cyclic matrix with minimum polynomial $f(x)^m$. If $f(x)$ is not separable then we can write $f(x) = h(x^p)+r$ for some polynomial $h(x)$ such that $h(0) = 0$ and some $r \in K$. If $N$ is a nilpotent matrix that is a polynomial in $P$ then $N = f(P)q(P)$ for some $q(x) \in K[x]$. So if $S = P - N$ is semisimple then $f(P - f(P)q(P)) = 0$. The left-hand side is equal to $h(P^p - f(P)^pq(P)^p) + r$, which in turn can be written as $f(P)$ plus the product of $f(P)^pq(P)^p$ and a two variable polynomial in $P$ and $f(P)^pq(P)^p$. It follows that $f(P)$ is equal to $f(P)^p a(P)$ for some polynomial $a(X) \in K[x]$. But $f(P)$ has nilpotency degree $m$, and $f(P)^p$ has nilpotency degree at most $m/p$, so we have a contradiction unless $m=1$ and $P$ is semisimple.