Q is for quantum

October 24, 2021

The purpose of this post (in progress as of late October 2021) is to serve partly as a review of Terry Rudolph’s book Q is for quantum and partly as a record of my thoughts after a few months of reading about quantum computation. The first part of Ruldolph’s book can be freely downloaded from his website.

Part 1: Q-computing

Q is for quantum begins with a version of the Stern–Gerlach experiment. We are asked to imagine that black and white balls are sent through a box called ‘PETE’: the afterword makes clear the name comes from a collaborator of Rudolph. After a black ball goes through PETE, it is observed to be equally likely to be black and white. The same holds for white balls.

Moreover, the black balls that come out are indistinguishable by any physical experiment from the black balls that come in, and the same holds for white balls. For example, if the white balls coming out are discarded, and the black balls passed through a second PETE box, then again the output ball is equally likely to be black and white.

But if two PETE boxes are connected so that a white ball put into the first one passes immediately into the second, the output is always white.

Naturally we are suspicious that, despite our earlier failure to distinguish balls output from PETE boxes from fresh balls, the first PETE box has somehow ‘tagged’ the input ball, so that when it goes into the second box, the second PETE box knows to output a white ball. When we test this suspicion by observing the output of the first PETE box, we find (as expected) that the output of the first box is 50/50, but so is the output of the second. Somehow observation has changed the outcome of the experiment.

This should already seem a little surprising. But if we accept that observed balls coming out of PETE are experimentally indistinguishable from fresh balls then, as Rudolph persuasively argues, we have to make a more radical change to our thinking: there is no way to interpret the state of an unobserved ball midway between two PETE boxes as either black or white. As Rudolph says, ‘We are forced to conclude that the logical notion of “or” has failed us’. This quickly leads to an informal idea of superposition and Rudolph’s central concept of a misty state. For example, the state | \psi\rangle shown above is drawn as the diagram below.

Rudolph then makes the bold step of supposing that when a black ball, represented by | 1\rangle is put through PETE, the output is \frac{1}{\sqrt{2}}( |0\rangle - |1\rangle); of course this is drawn in his book as another misty state, this time with a minus sign attached to the black ball. Does this contradict the experimental indistinguishability of output balls? Not according to Rudolph:

It isn’t a “physically different” type of black ball: if we looked at the ball at this stage we would just see it as randomly either white or black. No matter what we do, we won’t be able to see anything to tell us that when we see it as black it is actually “negative black”

In the mathematical formalism, this is the unobservability of a global phase: something one must accept as an exiom. Rudolph’s explanation of this point is characteristically clear. The behaviour of a white ball sent into two PETE boxes in immediate succession is easily explained using superposition: Rudolph’s ‘misty’ diagram on page 20 becomes the following calculation:

\begin{aligned} |0\rangle &\mapsto \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle) \\ &\mapsto \frac{1}{\sqrt{2}} \bigl( \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle) + \frac{1}{\sqrt{2}}( \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle) \bigr) \\ &= |0\rangle. \end{aligned}

It is an impressive feature of Rudolph’s book that he is able to use the PETE box and his misty state diagrams to illustrate many of the key ideas in quantum computation. It is of course a routine translation to pass to the usual bra/ket formalism. As one might hope, this infusion of mathematics has some advantages. For instance, we can now require that states evolve by unitary transformations, making the ‘bold’ step above intuitive: once we know that H|0\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1 \rangle), unitary dynamics require that H |1 \rangle \mapsto \frac{1}{\sqrt{2}} (|0 \rangle - |1 \rangle). Thus, as my notation in the diagrams above suggests, the PETE box is the Hadamard gate

H = \frac{1}{\sqrt{2}} \left( \begin{matrix} 1 & 1 \\ 1 & -1 \end{matrix} \right).

If Rudolph’s white/black ball thought experiments and the associated formalism have a weakness, I think it may be more physical than mathematical.

For instance, the Stern–Gerlach experiment was first performed with silver atoms: the atomic number of silver is 47 = 2 + 8 + 18 + 18 + 1, and correspondingly there is a single electron in the 4f subshell, whose spin can be detected by passing the atoms through a magnetic field. Thus silver atoms replace balls and magnetic fields replace the PETE boxes. Using further magnetic fields one can separate the spin-up post-PETE atoms from the spin-down post-PETE atoms, and then send the atoms on separate journeys: if at the end they meet up (without observation at any stage), the same interference occurs, and the output is always spin-down. To stretch the analogy this far, Rudolph would have had to introduce some machine that separated the two colours, potentially confusing the story. (Why does observation by the machine not destroy the interference pattern?)

Further effects, for example that spin-up atoms are (essentially by definition) deflected up by up/down magnetic fields, but are equally likely to go either way in a left/right magnetic field, have no analogue at all. But Rudolph has already made a convincing (and entirely honest) case that quantum phenomena cannot be understood by classical physics or conventional probability.

Incidentally, I highly recommend this MIT lecture by Allan Adams for an entertaining treatment of the Stern–Gerlach experiment that has some features in common with Rudolph’s presentation, for instance, the use of colour to replace spin.

Playing the majority game with boolean formulae

September 26, 2021

In the majority game, there are n numbered balls, each either black or white. One knows that black balls are in the majority. At each step, one may ask

‘are balls i and j the same colour’

and get a truthful answer. How many questions does it take to find a black ball? This problem was first solved by Saks and Werman, who proved the elegant result that n-B(n) questions are necessary and sufficient, where B(n) is the number of 1s in the binary form of n.

It is a nice exercise to find a questioning strategy that meets this target: if you discover something fundamentally different to the ‘Binary Knight Hunt’ from Section 2 of this paper, I would like to know about it! (The name of this strategy comes from a reframing of the problem in which black balls become Knights, who always tell the truth, white balls become Knaves, who always lie, and a comparison between balls i and j becomes the question `Person i, is Person j a knight?’.) The harder part of the Saks—Werman result is that n-B(n) questions are necessary. This part was generalised by John Britnell and me to the case where black balls outnumber white balls by a specified excess.

A related problem asks for the minimum number of questions to identify the colour of every ball. After the Binary Knight Hunt finds a black ball, the question graph is a forest: turning it into a tree by connecting the component of the known black ball to all other components identifies all the balls in n-1 questions; the final question graph is a tree. When n is odd, there are 2^{n-1} possible sets of black balls (for each partition \{1,\ldots, n\} = A \cup B one may colour the larger set black), and so we must learn n-1 bits of information. Therefore this strategy is optimal. For even n the same result holds, but requires a bit more work to prove: see Theorem 4 in this later paper of mine.

The majority game for a boolean function

To state the intended generalisation we need some more notation.
Let c_i be the colour of ball i, identifying black with 0 and white with 1. A comparison of ball i with ball j reveals c_i + c_j. This is F(c_i,c_j) where F is the two-variable boolean formula

F(x,y) = x + y.

In this post we present some preliminary results and open problems on generalizations of the majority game in which f is replaced with an arbitrary boolean formula.

Note that, in the original game, we cannot know for sure a ball is of the minority colour unless it has been compared with a ball known to be of majority colour. This observation has no analogue when comparison is replaced with evaluation of an arbitrary boolean formula. For each boolean formula we therefore have three variants of the majority game.

  1. Find a ball of known colour.
  2. Find a ball of the majority colour.
  3. Find the colour of every ball.

For simplicity we shall assume from now on that n=2m+1 is odd.

Linear formulae

The relevant formulae are F_\ell(x_1,\ldots,x_{\ell}) = x_1 + \cdots + x_\ell. We immediately note that if \ell is odd then F_\ell(c_i,\ldots, c_i) = c_i, and so we can determine a ball’s colour in just one evaluation, making (1) trivial. For (2), the first m questions may find white balls, so m+1 evaluations are necessary and sufficient. For (3) the obvious strategy uses n questions.

Problem 1. Is it possible, using F_\ell when \ell is odd and \ell \ge 3, to find the colour of every ball using n-1 evaluations?

An attempt to block this strategy by requiring the arguments of F_\ell to be distinct is easily subverted, provided that n \ge 2\ell - 1. (Equivalently, m \ge \ell-1.) Begin by evaluating F(c_1,\ldots,c_{\ell-1},c_k) for \ell distinct choices of k \in \{\ell, \ldots, n). Order the results so that

\begin{aligned} c_1 + \cdots + c_{\ell-1} + c_{i_1} &= \cdots = c_1 + \cdots + c_{\ell-1} + c_{i_s} = 0 \\ c_1 + \cdots + c_{\ell-1} + c_{j_1} &= \cdots = c_1 + \cdots + c_{\ell-1} + c_{j_t} = 1.\end{aligned}

Since s+t = \ell, exactly one of s and t is odd. If it is s then

c_{i_1} + \cdots + c_{i_{s-1}} + c_{j_1} + \cdots + c_{j_t} = 0;

if it is t then

c_{i_1} + \cdots + c_{i_s} + c_{j_1} + \cdots + c_{j_{t-1}} = 0.

Therefore we obtain \ell-1 balls k_1, \ldots, k_{\ell-1} c_{k_{\ell-1}} such that c_{k_1} + \cdots + c_{k_{\ell-1}} = 0. Since F(c_{k_1}, \ldots, c_{k_{\ell-1}}, c_r) = c_r we are again in the happy position of being able to learn a ball’s colour with a single evaluation. This gives a strategy for game (1) using \ell+1 evaluations. For game (2) we use \ell + m evaluations

F(c_{k_1}, \ldots, c_{k_{\ell-1}}, c_j)

for m+1 distinct j, each not equal to any of k_1, \ldots, k_{\ell-1}; this is possible since m+1 + (\ell-1) \le m+1 + m \le n.

Problem 2. Find optimal strategies for games (1), (2) and (3) using F_\ell when repeated arguments are disallowed.

Problem 3. Show that game (1) cannot be won if n \le 2\ell - 2.

When \ell is even one can use a similar trick to reduce to the case \ell=2 by noting that F(x,\ldots, x, y) = x+y. If distinct arguments are required then one can use \ell-1 questions to find balls k_1, \ldots, k_{\ell-2} such that c_{k_1} + \cdots + c_{k_{\ell-2}} = 0; then

F(c_{k_1}, \ldots, c_{k_{\ell-2}}, c_i, c_j) = c_i + c_j.

Problem 4. Do these reductions give optimal strategies for the three problems?

The results so far suggest that games (1) and (2), with repeated arguments permitted, are the most appealing problems.

Majority vote

An important example of a non-linear boolean function is the 3-way majority vote function

M(x,y,z) = xy + yz + zx.

Since M(x,x,x) = x, we must require distinct arguments to avoid the reduction to Problem 1. With this condition, observe that if

M(x,y,z) \not= M(x,y,w)

then z \not= w, and since swapping z and w affected the majority, we must also have x \not= y; now M(x,y,u) = u for any u. This suggests the following strategy: evaluate M at disjoint triples c_i, c_j, c_k until both values 0 and 1 are seen. Let

M(u,v,w) \not= M(u',v',w').

Now, interpolating between the sets by evaluating M(u,v,w') and M(u,v',w'), one can find x and y with x \not= y. As above, one further question will win game (1) and one can use very similar ideas to win games (2) and (3).

Unfortunately, if there are few white balls, it may take many evaluations to find a triple in which white balls are in the majority. For instance, suppose that n=2^4-1 = 15 and there are just two white balls. We identify \{1,\ldots, 15\} with \mathbb{P}(\mathbb{F}_2^4) and cover the 2-subsets of points of \mathbb{P}(\mathbb{F}_2^4) using the

\displaystyle \binom{4}{2}_2 = \frac{(2^4-1)(2^4-2)}{(2^2-1)(2-1)} = 35

lines in \mathbb{P} (\mathbb{F}_2^4). Since there are \binom{15}{2} = 105 distinct 2-subsets, and each 3-subset contains 3 2-subsets, this is the minimum possible number of 3-subsets. Still the questioning strategy requires 35 evaluations (plus those to find x and y). Generally if n=2^r-1 then there are O(n^2) lines, and so O(n^2) evaluations may be necessary. Thus the attractive feature of the majority game (and many related Knights and Spies problems) that only O(n) questions are needed is lost.

It therefore seems natural to require that there are exactly m white balls and m+1 black balls. Suppose, again for simplicity, that m = 2p+1 is odd. If we evaluate M at p distinct triples (using 3p of the 4p+3 balls) and get 0 every time, then we have used up at least 2p black balls and at most p white balls. Hence any remaining triple must have white balls in the majority. This gives a strategy for games (1) and (2) using n/4 + O(1) questions, and for game (3) using 5n/4 + O(1) questions.

Problem 5. Are these strategies optimal?

Problem 6. Generalize to the majority-vote function with any odd number of variables.

The majority game with values in \mathbb{Z}/q\mathbb{Z}

Another generalization that might be of interest is to allow q different colours, labelled by elements of \mathbb{Z}/q\mathbb{Z}, with strictly more than n/q balls of colour 0. We now interpret + as addition in \mathbb{Z}/q\mathbb{Z}. Note this is not the same as the ‘plurality problem’ considered for three colours in this paper of Martin Aigner, Gianluca De Marco, Manuela Montangero, in which the result of a comparison is the familiar ‘same’ / ‘different’ from the majority game, rather than ‘0 so same’, ‘1 so different’, ‘2 so different’ as here. The authors’ main result is that between 3 \lfloor n/2 \rfloor - 1 and \lfloor 5n/3 \rfloor -2 comparisons are necessary and sufficient to find a black ball.

Problem 7. Since \log_2 q bits of information are learned in each evaluation, we can expect that fewer evaluations are necessary. What are optimal strategies for each game?

Finally one could consider a combined generalization in which an arbitrary function G : (\mathbb{Z}/p\mathbb{Z})^t \rightarrow \mathbb{Z}/p\mathbb{Z} is evaluated.

Engaging students from diverse backgrounds

September 17, 2021

On 14 September 2021 Royal Holloway Mathematics held a meeting Engaging students from diverse backgrounds, organised by Stefanie Gerke and me, and sponsored by the London Mathematical Society. The speakers were invited to give practical advice on the challenges of teaching university level mathematics. I won’t try to summarise all aspects of the four interesting talks, or all the discussion they stimulated, but here at least is a partial record.

Lara Alcock

Lara Alcock first spoke at Royal Holloway back in 2015. I wrote up this meeting here. For many of us, this was the first time we had heard anyone talk about research in mathematical pedagogy; the event spurred several of us to experiment with new teaching approaches, including the ‘flipped classroom’. Lara’s preferred model was clear from her title: Tilting the classroom: engaging students in large classes. As her abstract promised, she gave 18 (a mathematical 18, there were probably more) straightforward practices to improve student engagement in large lectures. Here are some that stood out for me.

  • Put administrative announcements on the board / projector as students come in.
  • Start with simple multiple choice questions that test students on material they are supposed to have prepared. I think something like this is essential: students won’t bother preparing unless they know we are serious about it, and this means investing time in live-sessions.
  • Give students `gappy’ notes or other activities that they complete during the lecture or during pauses in online videos. For instance, the matching exercise shown below, which seems a nice way to demystify the dry-as-dust field axioms.
  • Give students guidance on how to study: Lara mentioned a study of Bjork et al showing there are powerful misconceptions on this issue: one conclusion (see after Figure 3) is that learners typically underestimate the improvement they can make due to studying, but also underestimate the amount they will forget without periodic reinforcement.
  • Self-explanation training: how to think deeply about a proof in a way that goes beyond paraphrasing it.
  • Ask questions that may split the audience, for instance, which of \implies, \Leftarrow, \iff is correct in this gap? `Whatever you think, half the people disagree with you … do you want to change your mind?’.
  • Read and explain a proof to the person next to you (viable even with social distancing).

The context was a first year analysis course, a subject that is a notorious cause of student disengagement. It was clear from Lara’s talk and the comments from students that her approach worked, and that students rose to the challenge of her course, both when delivered using live large lectures and, post-Covid, with online lectures. Lara’s enthusiasm for the large lecture as an effective way to teach students shone through: she ended by remarking `I’d much rather have large lectures back: I miss the atmosphere’. For more on of Lara’s approach to analysis teaching, I highly recommend her book, How to think about analysis.

Barrie Cooper

Barrie Cooper’s talk came second and touched on many themes developed more fully in the other three talks. His title was Competencies, challenge, and co-curation: selected strategies for inclusive assessment in mathematics. He started by asking the audience what the terms ‘diversity’ and ‘inclusion’ meant to them: answers included the students’ race, gender and social capital (a useful notation I think: easy to take it for granted when one has it, and easy to assume others have it when they don’t), and the preparedness of our students for academic study. He then discussed strategies for designing inclusive assessments. One key point was that we should make clear what is expected at each level: what do students need to do to pass, to get a 2:1, and to get a first? Not making these expectations clear may particularly hurt students coming from disadvantaged backgrounds.

Barrie pointed out that designing inclusive assessment meant assessing different skills, and assessing in different ways. For example, his course (on graph theory) had only 30% of credit on the final exam, which was open book. Part of the balance was made up of ‘competency based assessment’: basic tests of understanding on which the required pass mark was 100%. Tools such as Stack, Numbas and, more ambitiously, GitHub Classroom were mentioned.

Barrie finished by going into more detail on ‘co-curation’ in the topical context of epidemic modelling on a dynamic network with probabilistic infection. Students used the R-package epimodel, working in small groups to generate results by experimentation with the model, and sharing whatever resources they found most useful — not always those that Barrie had anticipated. It all sounded like a great exercise to me: both mathematically deep, but also bringing in wider concerns, and open-ended in that (as we’ve seen many times in the last 18 months) there’s no unique right answer.

Matthew Inglis

Matthew Inglis‘ title was Individual differences in students’ use of optional learning resources and concerned two cohorts of students: one doing a mathematics degree, the other studying mathematics as part of an engineering degree. The resources available to them were

  • Traditional live lectures;
  • Mathematics Learning Support Centre (like a continuous workshop with a staff member always on call);
  • Online video lectures and other E-resources.

Usage was tracked throughout the term. It was quickly clear that there were four distinct clusters: the `traditionalists’ who attended live lectures, the `MLSCers’ who made high use of the support centre, the `E-learners’, and (his term) the `slackers’, who did not make much use of anything.

As his graph above shows, the clusters were very pronounced: despite all our efforts with `active blended learning’, it seems students stuck with one learning mode throughout their course. Matthew presented extensive data showing that the traditionalists and MLSCers did best at the exam, the E-learners did significantly worse, and the slackers did very badly indeed. The most significant correlation across all clusters for attainment was with live-lecture attendance, even after normalising the data to take into account the different courses.

Matthew ended by speculating on what would happen if the online lectures were withdrawn? Would the E-learners simply become slackers? Or would they move into another cluster? Since the E-learners did less well than the other two clusters, is producing online videos a good investment of our time? When I lectured cryptography last year, I put a vast effort into making short online videos: even without editing (but some re-recording) I think this took me far longer than delivering the lectures. They were very popular with the students, and attainment was good, but I certainly can’t give an unqualified `yes’ in answer to Matthew’s question. As he said, offering students too much choice makes work for their lecturers.

Maurice Chiodo

After a break for lunch we finished with Maurice Chiodo‘s talk Societal responsibility and ethical conduct in mathematics. He started by challenging the idea that pure mathematics is a universal force for good, or, at worst, morally neutral. Instead, like any powerful tool, it can be used for good or harm, and the harm may arise inadvertently.

A compelling example of this is the use of machine learning algorithms in predictive policing: concentrating police in one area may well find more crime, but at the expense of leaving other crime undetected, and creating a feedback loop that damages community relations. Other examples included the epidemic modelling from Barrie Cooper’s talk, the A-level results fiasco in September 2020, and what we know from the Snowden leaks about mass internet surveillance.

A recurring theme was that models based on unsound implicit assumption could be mathematically sound, but lead to the wrong conclusions. For instance, incredibly enough, the model used to predict the spread of Covid-19 in care homes failed to consider that many staff (often employed by agencies) moved between care homes. Another instance comes from finance: given M independent risky assets each with a normal \mathrm{N}(0,\sigma^2) return, their mean has variance \sigma^2/ M, so mathematically is a safer investment. (This is the basis of the infamous Collatoralized Debt Obligation, peddled enthusiastically by investment banks such as Goldman Sachs who rewarded their clients by then betting against the asset they had just purchased.) As the 2008 Financial Crisis showed, the independence assumption is utter garbage. Yet another worrying example comes from the use of machine learning in sentencing algorithms: the algorithm faithfully learns to perpetuate past racial biases.

Maurice then made the key point that students `fail to hear us say that there are ethical issues’. If they are important (and they are), why don’t we tell them? Successfully introducing ethics into the mathematics curriculum has three requirements:

  • A seminar series or lecture series or course on societal responsibility in mathematics;
  • Mathematical exercises with an ethical component;
  • Students must see that all their lecturers take ethics seriously.

As a step towards this, next time I lecture a cryptography course I will set explicit questions on the ethnical issues, rather than just leave them to asides in lectures or printed notes. I hope Maurice’s talk will encourage others to make similar changes: this seems to be to be a valuable way we can broaden our curriculum and maybe even increase student engagement.

Maurice made many further interesting points. For instance, mathematics students appear to gravitate to a utilitarian perspective on ethics; he mentioned the Trolley Problem as a way to counter this. (Even before the Fat Man makes his appearance one can challenge strictly utilitarian `body-count’ ethics by making the person on the second track a close relative, rather than the typical stranger.) He finished by saying that `He was far more scared of 10 mathematicians than 10 lawyers … so why is it that the lawyers get ethical training but we don’t?’


We were lucky to get four interesting talks by excellent speakers. We had postponed the meeting from May 2020 in the hope of holding it face-to-face, but still, the online format enabled a broader attendance from people around the UK.

All four talks had practical advice, but also surprises to offer: how many of us would have predicted the four sharp clusters in Matthew Inglis’ talk (well perhaps, the existence of the ‘slacker’ cluster was no surprise), or the enthusiasm for live lectures shown by Lara’s students.

The final discussion session also offered much to think about: for instance, my not-so-secret view is that university education by intelligent, well-read, liberally minded people has a civilizing influence and value that goes beyond the subject specific skills and knowledge we hope to impart; but this view can rightly be challenged as patronising to our students. Barrie’s talk in particular made me realise that ‘decolonizing the curriculum’ isn’t just the buzz-phrase de jour: there are real practical changes we should all be making to ensure that the mathematics we love so much remains accessible to our changing student intake.

Comments are very welcome and can be posted below.

The Hamming code, quadratic residue codes and an exceptional isomorphism

July 4, 2021

As a pleasant way to pass a Sunday afternoon — or at least, more pleasant than flat-hunting in Bristol in a wildly inflated property market — let me offer yet another way to prove the exceptional isomorphism \mathrm{GL}_3(\mathbb{F}_2) \cong \mathrm{PSL}_2(\mathbb{F}_7). An earlier post uses modular representation theory to do in about a screenful of WordPress. Here we do it using the well known result that, up to position permutation, there is a unique perfect 1-error correcting linear binary code of length 7.

The Hamming code

We define the [7,4,3]Hamming code C to be the linear code of length 7 and dimension 4 with parity check matrix

H = \left( \begin{matrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{matrix} \right).

Since each non-zero binary word of length 3 appears exactly once as a column of H, the syndromes for the 7 possible one bit errors, namely

H(1,0,0,0,0,0,0)^t, \ldots, H(0,0,0,0,0,0,1)^t

are distinct. Hence C is 1-error correcting, and therefore has minimum distance \ge 3. Since the disjoint Hamming balls of radius 1 about the codewords each contain 1 + \binom{7}{1} = 8 binary words of length 7, and 8 \times 2^4 = 2^7 = |\mathbb{F}_2^7|, these balls partition \mathbb{F}_2^7. Therefore C is a perfect 1-error correcting linear binary code.

Lemma. The automorphism group of C is a subgroup of S_7 containing \mathrm{GL}_3(\mathbb{F}_2).

Proof. Each g \in \mathrm{GL}_3(\mathbb{F}_2) permutes the 7 non-zero elements of \mathbb{F}_2^3. Let \sigma_g \in S_7 be the position permutation of the columns of H induced by g. For example,

g = \left( \begin{matrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{matrix} \right) \implies gH = \left( \begin{matrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 &0 & 1 & 1 \end{matrix} \right)

and so in this case \sigma_g = (1,4,2)(3,5,6). (Since I am hyper-paranoid about such things, I will verify that g \mapsto \sigma_g is a homomorphism by comparing

g(g'H) = g (H . \sigma_{g'}) = (gH) \cdot \sigma_{g'} = H \cdot \sigma_g \sigma_{g'}

with (gg') H = H \cdot \sigma_{gg'}, where \cdot denotes the column permutation action.) Now writing P(\sigma) for the permutation matrix of \sigma (acting on the right as usual when composing permutations left to right, so P(\sigma)_{ij} = [i\sigma = j]), we have

v (gH)^t = v (HP(\sigma_g))^t = v P(\sigma_g)^t H^t = v P(\sigma_g^{-1}) H^t = (v \cdot \sigma_g^{-1}) H^t

for any v \in \mathbb{F}_2^7. If v \in C then the left-hand side is

v H^t g^t = 0 g^t = 0.

Therefore v \cdot \sigma_g^{-1} \in C, and indeed the image of the homomorphism \sigma is a group of position permutations of C. For instance, in the example above, (1,1,1,0,0,0,0)(gH)^t = 0 and correspondingly, (1,1,1,0,0,0,0)\sigma_g = (1,0,0,1,1,0,0) is a codeword in C. Since the group homomorphism g \mapsto \sigma_g is injective, the lemma follows. \Box.

Another nice way to prove the lemma is to calculate the 7 words in C of weight 3. By the definition above, these have non-zero entries in the positions

\{1,2,3\}, \{1,4,5\}, \{2,4,6\}, \{1,6,7\}, \{2,5,7\}, \{3,4,7\}, \{3,5,6\},

as shown below as the lines in the Fano plane \mathbb{P}^2(\mathbb{F}_7^3).

Quadratic residue codes

We may construct \mathbb{F}_{2^3} as \mathbb{F}_2(\alpha) where \alpha has minimal polynomial x^3 + x + 1. The binary quadratic residue code of length 7 is the cyclic code D whose generator polynomial has as its roots the powers of \alpha corresponding to the 3 quadratic residues modulo 7, namely 1, 2 and 4. That is,

g(x) = (x-\alpha)(x-\alpha^2)(x-\alpha^4).

Since these roots are conjugate under the Frobenius automorphism \beta \mapsto \beta^2 of \mathbb{F}_{2^3}, the generator polynomial is the minimum polynomial of \alpha; that is g(x) = x^3 + x + 1. Thinking of D as the ideal \langle x^3+x+1 \rangle of \mathbb{F}_2[x]/\langle x^7+1 \rangle, we have v(x) \in D if and only if v(\alpha) = v(\alpha^2) = v(\alpha^4) = 0. From this it is easy to see that D has minimum weight 3 and so, as before, D is a perfect 1-error correcting binary code. (More generally, in a quadratic residue code of odd prime length n, the minimum distance d satisfies d^2 \ge n.) The same argument holds for the cyclic code E = \langle x^3 + x^2 + 1 \rangle whose generator polynomial is (x-\alpha^3)(x-\alpha^5)(x-\alpha^6) is defined using the 3 non-residues modulo 7.

By the uniqueness result mentioned earlier, the codes C, D and E are all equivalent. Fix a bijection \star : E \rightarrow D.

We now identify the positions of D with \mathbb{F}_7. Since D is cyclic, the position permutation j \mapsto j + 1 is an automorphism of D. This gives one element of \mathrm{PSL}_2(\mathbb{F}_7). To obtain the rest, we need the following two claim.

Claim. If r \in \{1,2,4\} then the position permutation j \mapsto rj preserves D.

Proof. Thinking of D as an ideal, we have v(x) \in D if and only if v(\alpha) = v(\alpha^2) = v(\alpha^4) = 0, so if and only if v(x^2) \in D. This deals with the case r=2, and r=4 is obtained by squaring. \Box.

Claim. If s \in \{3,5,6\} then the position permutation j \mapsto tj sends D to E. The position permutation 0 \mapsto 0 and t \mapsto t^{-1} sends D to E.

Proof. For the first part, use that the generator polynomial for E is defined using the 3 non-residues modulo 7. The second part follows similarly, since \beta is a root of x^3 + x + 1 if and only if \beta^{-1} is a root of x^3 + x^2 + 1. \Box

We may therefore define an action of \mathrm{PSL}_2(\mathbb{F}_7) on D by identifying the positions of codewords with \mathbb{F}_7 and then applying the corresponding position permutation, followed by \star if (according to the second claim above), we end in E rather than in D. Using that \mathrm{PSL}_2(\mathbb{F}_7) is a simple group, we see this homomorphism is injective.


We have shown that the automorphism group, G say, of a perfect 1-error correcting linear binary code contains subgroups S and T isomorphic to the finite simple groups \mathrm{GL}_3(\mathbb{F}_2) and \mathrm{PSL}_2(\mathbb{F}_7) of order 168. Setting H = S \cap T we have |G| \ge 168^2 / |H|. Since G \le S_7, and 168 = 2^3 . 3 .7 and 7! = 2^4.3^2. 5.7, we have |H| \ge 2^2 . 7 = 28. Hence H has index at most 6 in S. The coset action of S on H now gives a homomorphism S \rightarrow S_6; since \mathrm{GL}_3(\mathbb{F}_2) is simple and has elements of order 7, this homomorphism must be trivial. Therefore H = S. Similarly H = T. Hence S = T and we have the required automorphism.

Collisions in Greg Egan’s Orthogonal physics

April 12, 2021

In Greg Egan’s remarkable Orthogonal Trilogy, physics differs from ours in one crucial respect: the Lorentz metric c^2\mathrm{d}t^2 - \mathrm{d}\mathbf{x}^2 is replaced with the Riemannian metric \mathrm{d}t^2 + \mathrm{d}\mathbf{x}^2, putting time on the same footing as space. As one would expect from this author, the consequences are worked out in great detail, and are integral to the plot. This comes at some predictable expense to characterisation and versimilitude: still I’ve found the first two books sufficiently interesting that I can (almost) ignore that much of the exposition consists of the characters giving each other physics tutorials, conducting experiments (which remarkably always work, even if the results often come as a surprise) or listening to each other give lectures and seminars. That said, the physics is leavened by some set-piece ethical dilemmas, and there is also a well-developed biological theme, concentrated on the unique (but not completely implausible) morphology of the Orthogonal race.

The purpose of this post is to do one of the exercises left implicitly to the reader in the second book, The Eternal Flame, in which it is observed that when a photon is deflected by a stationary particle of ordinary matter (called a `luxagen’ in the trilogy), it may glance off at two different angles. Moreover, there is a maximum deflection angle, which is independent of the energy of the photon.

Here I’ll show that this follows from the relation between energy and momentum in Orthogonal physics, and the two conservation laws, and implies that Orthogonal photons are heavier than luxagens. I’ll also show that the same behaviour holds in our universe, whenever the moving particle is heavier than the particle it strikes. My solution uses piles and piles of algebra to arrive at a surprisingly simple answer. Egan’s book hints at a simple trigonometric solution: made more plausible because his characters are of course brilliant at Euclidean geometry, thanks to the double motivation from their physics and our shared mathematics. Or maybe there is a better way to organize the algebra, perhaps starting `We take the Lagrangian …’, or maybe `By working in the reference frame in which the centre of momentum is fixed …’. Either way, I would like to know it.

Much of my interest arises from how Egan (and his characters) arrive at this setup: for more background see the supplementary material on Egan’s website, or read the first book in the trilogy, The Clockwork Rocket. At the time of writing I still have the third book to look forward to.

Space-time and the energy-momentum relation

There is 4-dimensional space-time. In our universe, when we measure a time interval T by the distance cT that light travels in this interval, space-time has the Lorentz metric c^2 \mathrm{d}t^2 - \mathrm{d}x^2 - \mathrm{d}y^2 - \mathrm{d}z^2. For example, a photon may move from (0,0,0,0) to (1,c,0,0) in one unit of time, and, corresponding to the fact that no time passes for the photon, the Lorentz distance between these space-time points is zero.

In the Orthogonal universe space-time has the Riemannian metric k^2 \mathrm{d}t^2 + \mathrm{d}x^2 + \mathrm{d}y^2 + \mathrm{d}z^2, where k is a constant with dimension distance/time; by choosing units appropriately (thus sweeping under the carpet the first half of the first book on Yalda’s experiments on Mount Peerless, the beautiful dual Pythagorean Theorem, and the fact that in Orthogonal physics, the speed of light is not constant), we may assume that the numerical value of k is 1. We take the following (and only the following, for the time being) as further axioms:

  1. Physical laws are invariant under the group of transformations preserving the Lorentz/Riemannian metric.
  2. Every particle has a symmetry invariant tangent vector (kv_t,v_x,v_y,v_z) to its world line called its 4velocity with units distance/time; in our universe k = c and in the Orthogonal universe k = 1. A stationary observer measures the particle’s velocity as the 3-vector (v_x,v_y,v_z)/v_t, as illustrated in the diagram below (click for a larger pdf).
  3. Multiplying the 4-velocity, normalized to have length c (our universe) or 1 (the Orthogonal universe), by the mass M of a particle gives its 4-momentum (E/c, \mathbf{p}), where \mathbf{p} is the 3-vector momentum.

Note that (1) only makes sense because we measure time and distance in compatible units: thus v_t is dimensionless and whenever a time appears, such as t_1 in the diagram above, it is multiplied by c. Therefore the symmetry group of space-time can reasonably mix them up.

The way I imagine (2), half-remembered from when I did special relativity 20 years ago (my reference frame), is that the xt-plane has clocks every metre, connected by rigid rods, which record the time of passing of every particle. The particle is at position x_0 (which conveniently enough happens to have a clock) at time ct_0, and at position x_1 (similarly conveniently placed) at time ct_1. The stationary observer can therefore collect the measurements from the clocks (simply by travelling to each one), and approximate the particle’s velocity as (x_1-x_0)/(t_1-t_0) and so v_x/v_t \approx (x_1-x_0)/(t_1-t_0). Everyday objects in our universe, such as cars, trains and coronavirus, move at a negligible fraction of the speed of light, so c(t_1-t_0) is typically huge compared to x_1-x_0, and corresponding the velocity measured by the observer is a tiny fraction of the speed of light.

(Note this is not the velocity that an observer travelling with the particle would measure: this requires the notion of proper time; this is the usual way to define the 4-vector velocity, but except for the informal diagram and the motivation in the previous paragraph, we’re not going to use it here.)

In general, writing \mathbf{v} for the 3-velocity, and as usual in physics, \mathbf{v}^2 for the square of its 3-vector Euclidean norm

\displaystyle ||v|| = \sqrt{v_x^2+ v_y^2+v_z^2},

axiom (2) implies that (kv_t,v_x,v_y,v_z) = (kv_t, \mathbf{v}v_t). By symmetry invariance, the Lorentzian norm squared of the tangent vector is (c^2 - \mathbf{v}^2) v_t^2, and the Euclidean norm squared is (1+\mathbf{v}^2)v_t^2 (using the assumption that k=1). Now since (c^2 - \mathbf{v}^2)v_t^2 has units distance/time, and we care only about the direction of the tangent vector, not its magnitude, it is most convenient (and usual in physics, although I’ve never seen it clearly explained why) to require, as in axiom (3), that the Lorentzian length is c.

Thus in our universe (c^2-\mathbf{v}^2)v_t^2 = c^2, and so v_t = c/\sqrt{c^2-\mathbf{v}^2} = 1/\sqrt{1-\mathbf{v}^2/c} = \gamma(\mathbf{v}), where \gamma(\mathbf{v}) is the usual dimensionless Lorentz factor. In the Orthogonal universe v_t = 1/\sqrt{1+v^2}; this is again dimensionless because the 1 in the numerator implicitly has units distance/time.

Therefore the 4-vector velocity is

\displaystyle (cv_t,v_x,v_y,v_z) = (c\gamma(\mathbf{v}), \mathbf{v}\gamma(\mathbf{v}) ) = \bigl(\frac{c}{\sqrt{1-\mathbf{v}^2/c^2}}, \frac{\mathbf{v}}{\sqrt{1-\mathbf{v}^2/c^2}}\bigr)

in our universe and

\displaystyle (v_t,v_x,v_y,v_z) =  \bigl(\frac{1}{\sqrt{1+\mathbf{v}^2}}, \frac{\mathbf{v}}{\sqrt{1+\mathbf{v}^2}} \bigr)

in the Orthogonal universe.

According to (3) the familiar 3-vector momentum is obtained by multiplying the spatial component of each of these 4-vector velocities by the rest mass M. In our universe, this agrees with special relativity: a particle moving at 3-vector velocity \mathbf{v} is heavier according to a stationary observer by the factor \gamma(\mathbf{v}). In the Orthogonal universe, we find instead that the 3-vector momentum is M\mathbf{v}/\sqrt{1+\mathbf{v}^2}, so fast moving particles are lighter according to a stationary observer.

Again by (3), the energy E of the particle is given by E/c = Mc \gamma(\mathbf{v}) in our universe, and E = M/\sqrt{1-\mathbf{v}^2} in the Orthogonal universe. Substituting in the 4-vectors, we find that in our universe (E/c, M\gamma(\mathbf{v})\mathbf{v}) has Lorentzian length Mc, and in the Orthogonal universe, (E, M\mathbf{v}/\sqrt{1+\mathbf{v}^2}) has Euclidean length M. Equivalently, energy and momentum are related in our universe by

(E/c)^2 - \mathbf{p}^2 = M^2c^2

and in the Orthogonal universe by

E^2 + \mathbf{p}^2 = M^2.

The former is one statement of the energy-momentum relation between the energy E and the 3-momentum \mathbf{p}. Observe that the Orthogonal energy-momentum relation can be stated very neatly as E = M \cos \Gamma and || \mathbf{p} || = M \sin \Gamma, where \Gamma \in [0, \pi/2]. The analogue for our universe, using the identity \cosh^2 \Gamma - \sinh^2\Gamma = 1, is E = Mc^2 \sinh \Gamma and ||\mathbf{p} || = Mc \cosh \Gamma. We show both by the triangles below, noting that only the Orthogonal triangle is a genuine geometric representation, rather than a helpful aide-memoire.

Collision setup

We now need one final pair of axioms

  1. In any system of particles, the sum of all 3-vector momenta is conserved;
  2. In any system of particles, the sum of all energies (as recorded in the time component of 4-vector momenta) is conserved.

The diagrams below show a stationary particle with mass m hit by a heavy particle with mass M moving in the x direction (and also in time, of course). Again click on the diagram for a larger pdf.

We can suppose that after the collision motion is contained in txy-space (of which the diagram above shows only the xy-plane), so all z-coordinates are zero: let \phi (oriented clockwise) and \theta (oriented anticlockwise) be the deflection angles of the lighter and heavier particle. The 4-momenta conservation equations are, in our universe,

\begin{aligned} mc + Mc \cosh \Gamma  &= mc \cos \delta + Mc \cos \Delta \\ Mc \sinh \Gamma &= mc \sinh \delta \cos \phi + Mc \sinh \Delta \cos \theta \\ 0 &= -mc \sinh \delta \sin \phi  + mc \sinh \Delta \sin \theta. \end{aligned}

Notice that these are homogeneous in c. We can therefore divide through by c and obtain the equations for the Orthogonal universe by replacing hyperbolic functions with trigonometric functions throughout. As a further small simplification we introduce T = m + M\, \mathrm{co}\ \Gamma and observe that for our universe

(T-m)^2/M^2 - 1 = \cosh^2 \Gamma - 1 = \sinh^2 \Gamma

and for the Orthogonal universe,

-(T-m)^2/M^2 + 1 = -\cos^2 \Gamma + 1 = \sin^2 \Gamma.

Therefore a unified system of equations is

\begin{aligned} T &= m\, \mathrm{co}\; \delta + M\, \mathrm{co}\; \Delta  \\ \sqrt{|(T-m)^2 - M^2|}  &= m\, \mathrm{si}\; \delta \cos \phi + M\, \mathrm{si}\; \Delta \cos \theta \\ 0 &= -m\, \mathrm{si}\; \delta \sin \phi + M\, \mathrm{si}\; \Delta \sin \theta \end{aligned}

where \mathrm{co}, \mathrm{si} are \cosh, \sinh in our universe and \cos, \sin in the Orthogonal universe. The expression (T-m)^2 - M^2 is positive in our universe and negative for the Orthogonal universe; it has the three constants m, M and T (all with units of mass) which we regard as determined by the experimental setup. The unknowns are \delta, \Delta, \phi and \theta, appearing only on the right-hand sides.

Numerical solutions

It is routine to solve these equations numerically using NSolve in Mathematica. As an example, we suppose that the heavier particle has three times the mass of the lighter, so m/M = 1/3. The graph below show the post-collision velocity V of the heavier particle for each of the two possible deflection angles \theta, as the energy of the heavier particle increases from least (red) to greatest (violet).

This agrees with the experiment conducted in Orthogonal where the heavier particle is a photon, and the characters observe the deflection angle by visual observation of a system of mirrors. The graph below the has the same axes, comparing the post-collision velocity in the Orthogonal universe (solid) and our universe (dashed), choosing the energies to get comparable shapes. Notice that the maximum deflection angle does not depend on the energy of the heavier particle, or even on which universe we are in: we show below that its sine is the ratio of the masses. Thus in these graphs it is \sin^{-1} 1/3 \approx 0.33984.

The graph below is the equivalent of the first graph showing the final energy of the heavier particle. If you are surprised that red is now at the top, note that in Orthogonal physics, the energy is M/\sqrt{1+V^2}, so is lower for faster moving particles.

Tangent space equation

Differentiating the momentum conservation equations using that \frac{\mathrm{d}}{\mathrm{d}\alpha} \, \mathrm{si}\ \alpha = \mathrm{co}\ \alpha, we obtain

\begin{aligned} 0 &= m\, \mathrm{co}\; \delta \cos \phi\, \mathrm{d}\delta - m\, \mathrm{si}\; \delta \sin \phi \,\mathrm{d}\phi \\ &\qquad\qquad + M\, \mathrm{co}\; \Delta \cos \theta\, \mathrm{d}\Delta - M\, \mathrm{si}\; \Delta \sin \theta\, \mathrm{d} \theta, \\ 0 &= -m\, \mathrm{co}\; \delta \sin \phi \mathrm{d}\delta - m\, \mathrm{si}\; \delta \cos \phi\, \mathrm{d}\phi \\ &\qquad\qquad - M\, \mathrm{co}\; \Delta \sin \theta\, \mathrm{d}\Delta - M\, \mathrm{si}\; \Delta \cos \theta\, \mathrm{d} \theta. \end{aligned}

At an extremum for \theta, we have d \theta = 0. Multiplying the top equation by \cos \phi and the bottom equation by \sin \phi and subtracting, we obtain

\begin{aligned} 0 &= m\, \mathrm{co}\; \delta \cos^2 \phi\, \mathrm{d}\phi - m \, \mathrm{si}\; \delta \sin \phi \cos \phi\, \mathrm{d}\phi \\ & \qquad\qquad\qquad\qquad\qquad\qquad + M\, \mathrm{co}\; \Delta \cos \theta \cos \phi\, \mathrm{d}\Delta \\ &\ + m\, \mathrm{co}\; \delta \sin^2 \phi\, \mathrm{d}\phi + m \, \mathrm{si}\; \delta \cos \phi \sin \phi\, \mathrm{d}\phi \\ &\qquad\qquad\qquad\qquad\qquad\qquad -M\, \mathrm{co}\; \Delta \sin \theta \sin \phi\, \mathrm{d}\Delta \\ & = m\, \mathrm{co}\; \delta (\cos^2 \phi + \sin^2 \phi)\, \mathrm{d}\delta \\ &\qquad\qquad + M\, \mathrm{co}\; \Delta (\cos \theta \sin \phi - \sin \theta \cos \phi)\, \mathrm{d}\Delta \\ &= m\, \mathrm{co}\; \delta\, \mathrm{d}\delta + M \, \mathrm{co}\;\Delta \cos (\theta +\phi) \,\mathrm{d}\Delta. \end{aligned}

Differentiating the equation from energy conservation, using that \frac{\mathrm{d}}{\mathrm{d}\alpha} \cosh \alpha = \sinh \alpha and \frac{\mathrm{d}}{\mathrm{d}\alpha} \cos \alpha = -\sin \alpha, so the same sign appears both times, we get

0 = m \, \mathrm{si} \; \delta \, \mathrm{d}\delta + M \, \mathrm{si}\; \Delta \, \mathrm{d}\Delta.

Comparing these relations between the independent differentials \mathrm{d}\delta and \mathrm{d}\Delta, we see that the coefficients must be in the same ratio: that is

\displaystyle \frac{\mathrm{si}\;\delta}{\mathrm{co}\; \delta} = \frac{\mathrm{si}\;\Delta}{\mathrm{co}\;\Delta \cos (\theta + \phi)},

or equivalently,

\displaystyle \frac{\mathrm{ta}\; \Delta}{\mathrm{ta}\; \delta} = \cos (\theta + \phi)

where \mathrm{ta} is \mathrm{tanh} in our universe and \mathrm{tan} in the Orthogonal universe.

Solution for maximum deflection

First step: solving for \delta and \Delta

The sum of the squares of the two equations for momentum is

\begin{aligned} |(T&-m)^2 - M^2| \\ &= \sqrt{|(T-m)^2 - M^2|}^2 + 0^2 \\ &= (m\, \mathrm{si}\; \delta \cos \phi + M\, \mathrm{si}\; \Delta \cos \theta)^2 \\ &\qquad + (-m\, \mathrm{si}\; \delta \sin \phi + M\, \mathrm{si}\; \Delta \sin \theta)^2 \\ &= m^2\, \mathrm{si}^2\; \delta (\cos^2 \phi + \sin^2\phi) + M^2\, \mathrm{si}^2\; \Delta (\cos^2 \theta + \sin^2\theta) \\ &\qquad + 2mM\, \mathrm{si}\; \delta\, \mathrm{si}\; \Delta (\cos \phi \cos \theta + \sin \phi \sin \theta) \\ &= m^2\, \mathrm{si}^2 \; \delta + M^2 \, \mathrm{si}^2\; \Delta + 2mM\, \mathrm{si}\;\delta \, \mathrm{si}\; \Delta \, \cos(\theta+\phi). \end{aligned}

By the equation from the tangent space, at maximum deflection \cos(\theta + \phi) = \frac{\mathrm{ta}\; \Delta}{\mathrm{ta}\; \delta}. Substituting and using

\mathrm{si}\;\delta \, \mathrm{si}\;\Delta\, \frac{\mathrm{ta}\; \Delta}{\mathrm{ta}\; \delta} = \mathrm{co}\; \delta \, \mathrm{si}\;\Delta\, \mathrm{ta}\; \Delta

we get

|(T-m)^2 - M^2|  = m^2\, \mathrm{si}^2 \: \delta + M^2 \, \mathrm{si}^2\: \Delta + 2mM\, \mathrm{co}\; \delta \, \mathrm{si}\;\Delta\, \mathrm{ta}\; \Delta

in which the only unknowns are \delta and \Delta. The only equation we have not yet used is the energy conservation equation

T = m \, \mathrm{co} \; \delta + M \, \mathrm{co}\;\Delta.

It implies that

m \, \mathrm{co}\;\delta = T - M\, \mathrm{co}\; \Delta

and, by squaring this relation and subtracting m^2, that

\pm m^2 \, \mathrm{si}^2 \; \delta = (T - M\, \mathrm{co}\; \Delta)^2 - m^2

where the sign is + for our universe and - for the Orthogonal universe. Using these two relations to eliminate m\, \mathrm{co}\; \delta and m^2\, \mathrm{si}^2 \; \delta and recalling that (T-m)^2 - M^2 is positive for our universe and negative for the Orthogonal universe, we obtain

\begin{aligned} \pm \bigl((T-m)^2 - M^2\bigr)  &= \pm \bigl( (T - M\, \mathrm{co}\; \Delta)^2 - m^2\bigr) + M^2 \, \mathrm{si}^2\: \Delta \\ & \qquad\qquad\qquad + 2M (T - M\, \mathrm{co}\; \Delta) \, \mathrm{si}\;\Delta\, \mathrm{ta}\; \Delta \end{aligned}

with the same conventions for the universe dependent signs. Now only \mathrm{\Delta} appears. After multiplying through by \pm we can simplify the right-hand side as follows

\begin{aligned} \bigl((T&-m)^2 - M^2\bigr) \\ &= (T - M\, \mathrm{co}\; \Delta)^2 - m^2 \pm M^2 \, \mathrm{si}^2 \; \Delta \\ &= T^2 + M^2 \, \mathrm{co}^2\; \Delta - 2MT\, \mathrm{co}\; \Delta \\ &\qquad - m^2 \pm M^2\, \mathrm{si}^2 \Delta \pm 2MT\, \frac{\mathrm{si}^2\; \Delta}{\mathrm{co}\; \Delta} \mp 2M^2 \, \mathrm{si}^2 \Delta \\ &= T^2 + M^2 (\mathrm{co}^2\;\Delta \mp \mathrm{si}^2\;\Delta) -m^2 - \frac{2MT}{\mathrm{co}\;\Delta} (\mathrm{co}^2\;\Delta \mp \mathrm{si}^2\;\Delta) \\ &= T^2 + M^2 - m^2- \frac{2MT}{\mathrm{co}\; \Delta}. \end{aligned}

It seems somewhat remarkable to me that all the signs cancel, and we are left with such a simple equation. Perhaps this is a sign that there is some more direct derivation that I am missing. Anyway, cancelling the factors of T^2 on each side and rearranging, we get the final equation

\displaystyle M^2 -m^2 + mT = \frac{MT}{\mathrm{co} \; \Delta}

which clearly has unique solution

\mathrm{co}\; \Delta \displaystyle = \frac{MT}{M^2-m^2+mT}.

Using the relation from energy conservation m \, \mathrm{co}\; \delta = T - M\, \mathrm{co}\;\Delta we immediately get

\displaystyle m\,\mathrm{co}\; \delta = T - \frac{M^2T}{M^2-m^2+mT} = \frac{-m^2T + mT^2}{M^2-m^2+ mT}


\mathrm{co}\; \delta = \displaystyle \frac{T(T-m)}{M^2-m^2+mT}.

Final step: finding \phi and \theta

Substituting these expressions for \mathrm{co}\; \delta and \mathrm{co}\; \Delta into the equation from the tangent space equation \cos (\theta+\phi) = \frac{\mathrm{ta}\; \Delta}{\mathrm{ta}\; \delta} we obtain

\begin{aligned} \cos^2 (\theta+\phi) &= \frac{\mathrm{ta}^2\; \Delta}{\mathrm{ta}^2 \; \delta} \\ &= \frac{\mathrm{co}^2 \; \Delta - 1}{\mathrm{co}^2 \; \delta - 1} \frac{\mathrm{co}^2 \; \delta}{\mathrm{co}^2\; \Delta} \\ &= \frac{M^2T^2 - (M^2-m^2+mT)^2}{T^2(T-m)^2 - (M^2-m^2+mT)^2} \frac{T^2(T-m)^2}{M^2T^2}. \end{aligned}

Again there is a surprising simplification: the numerator M^2T^2 - (M^2-m^2+mT)^2 of the left-hand fraction becomes M^2T^2 - (MT)^2 = 0 if m = M, and so vanishes if m = \pm M. (Of course only the specialization m = M is physically meaningful.) The quotient by M^2-m^2 is T^2-M^2+m^2-2mT. Hence we have

\begin{aligned} M^2&T^2 - (M^2- m^2+mT)^2 \\ & \quad = (M-m)(M+m)(T+M-m)(T-M-m).\end{aligned}

Similarly one can show that the denominator factorizes as

\begin{aligned} T^2(T-m)^2 & - (M^2-m^2+mT)^2 \\ &\quad= (T^2+M^2-m^2)(T+M-m)(T-M-m). \end{aligned}

Therefore the expression above for \cos^2 (\theta + \phi) simplifies to

\cos^2 (\theta + \phi) = \displaystyle \frac{(M-m)(M+m)}{T^2+M^2-m^2} \frac{(T-m)^2}{M^2}.

Using the momentum conservation equation

m\,\mathrm{si}\;\delta \sin \phi = M\, \mathrm{si}\; \Delta \sin \theta

and the same factorizations to simplify \frac{\mathrm{co}^2\;\Delta -1}{\mathrm{co}^2 \; \delta -1} we get

\begin{aligned} 1 &= \frac{M^2\, \mathrm{si}^2 \; \Delta \sin^2 \theta}{m^2\,\mathrm{si}^2 \; \delta \sin^2 \phi} \\ &= \frac{M^2(\mathrm{co}^2 \;\Delta - 1)\sin^2 \theta}{m^2(\mathrm{co}^2 \;\delta - 1)\sin^2 \phi} \\ &= \frac{M^2(M-m)(M+m)}{m^2(T^2+M^2-m^2)} \frac{\sin^2\theta}{\sin^2\phi}.\end{aligned}


\displaystyle \sin^2 \phi = \frac{M^2(M^2-m^2)}{m^2(T^2+M^2-m^2)} \sin^2 \theta.

Rewriting the equation for \cos^2 (\theta + \phi) so it uses only sines and then substituting for \sin \phi using the equation above gives

\begin{aligned} &\frac{T-m}{M}\sqrt{\frac{M^2-m^2}{T^2+M^2-m^2}} \\ &= \sqrt{1-\sin^2 \theta}\sqrt{1-\sin^2\phi} - \sin \theta \sin \phi \\ &= \sqrt{1-\sin^2 \theta}\sqrt{1 - \frac{M^2(M^2-m^2)}{m^2(T+M^2-m^2)} \sin^2 \theta} \\ & \qquad\qquad\qquad\qquad\qquad - \sin^2 \theta \frac{M}{m} \sqrt{\frac{M^2-m^2}{T^2+M^2-m^2}}. \end{aligned}

Hence writing S for \sin^2 \theta we get by a rearrangement and squaring the following quadratic equation in S:

\begin{aligned}  (1-S)\bigl(1 - &  \frac{M^2(M^2-m^2)}{m^2(T^2+M^2-m^2)} S \bigr) \\ &\quad = \bigl( \frac{T-m}{M} + \frac{SM}{m}\bigr)^2 \frac{M^2-m^2}{T^2+M^2-m^2}. \end{aligned}

It is routine to check that the difference between the two sides is

\displaystyle \frac{(SM^2 - m^2)(M^2-m^2+mT)^2}{M^2m^2(T^2+M^2-m^2)}.

and so the unique solution is S = m^2/M^2. (This is one of several places where we can see that there is a maximum deflection angle if and only if m \le M.) Since \theta is positive, we have \sin \theta = m/M, and, as claimed some time ago, the sine of the maximum deflection angle is simply the ratio of the masses.

The equation for \sin^2 \phi above now implies that

\displaystyle \sin^2 \phi = \frac{M^2(M^2-m^2)}{m^2(T^2+M^2-m^2)} \frac{m^2}{M^2} = \frac{M^2-m^2}{T^2+M^2-m^2}

and hence

\displaystyle \sin \phi = \sqrt{\frac{M^2-m^2}{T^2+M^2-m^2}}.


A somewhat remarkable feature of the unique solution for the maximum deflection

\begin{aligned} \sin \phi &=\sqrt{\frac{M^2-m^2}{T^2+M^2-m^2}} \\ \sin \theta &= \frac{m}{M} \\ \mathrm{co}\; \delta &=  \frac{T(T-m)}{M^2-m^2+mT} \\ \mathrm{co}\; \Delta &= \frac{TM}{M^2-m^2+mT} \end{aligned}

where T = m + M \, \mathrm{co}\; \Gamma is the total energy (up to the factor c^2 in our universe), is that it depends on which universe we are working in only by a change from the hyperbolic function \cosh (our universe) to the trigonometric function \cos (Orthogonal universe). Moreover, the maximum deflection angle \theta does not depend on the universe at all.

Signed binomial matrices of small order

December 28, 2020

Let B(n) denote the n \times n Pascal’s Triangle matrix defined by B(n)_{xy} = \binom{x}{y}, where, as in the motivating paper, we number rows and columns of matrices from 0. More generally, let B_a(n) be the shifted Pascal’s Triangle matrix defined by B_a(n)_{xy} = \binom{x+a}{y+a}. Let J(n)_{xy} = [x+y=n-1] be the involution reversing the order of rows (or columns) of a matrix, and let

D^\pm(n) = \mathrm{Diag}(1,-1,\ldots, (-1)^{n-1}).

Since n is fixed throughout, we shall write B, B_a, J and D^\pm for readability in proofs.

Claim 1. For any a \in \mathbb{N}, and any n \ge 2, the matrix D^\pm(n)B_a(n) is an involution.

Proof. Since n\ge 2, the matrix is not the identity. Since \bigl( D^\pm B_a \bigr)_{xy} = (-1)^x \binom{x+a}{y+a} we have

\begin{aligned} \bigl( D^\pm B_a D^\pm B_a \bigr)_{xz} &= (-1)^x \sum_y (-1)^y \binom{x+a}{y+a} \binom{y+a}{z+a} \\ &= (-1)^x \sum_y \binom{x+a}{z+a}\binom{x-z}{y-z}(-1)^y \\ &= (-1)^{x+z} \binom{x+a}{z+a} \sum_r \binom{x-z}{r}(-1)^r \\ &= (-1)^{x+z} \binom{x+a}{z+a} [x=z] (-1)^{x+z} \\ &= [x=z] \end{aligned}

as required. \quad\Box

Claim 2. We have B(n)^{-1} J(n) B(n) = D^\pm(n) J(n) B(n) J(n) and B(n) J(n) B(n)^{-1} = J(n) B(n) J(n) D^\pm(n).

Proof. The alternating sum used to prove Claim 1 can be used to show that B(n)^{-1}_{xy} = (-1)^{x+y} \binom{x}{y}. Hence

\begin{aligned} \bigl(B(n)&J(n)B(n)^{-1} \bigr)_{xz} \\ &= \sum_{y} (-1)^{x+y} \binom{x}{y} \binom{n-1-y}{z} \\ &= (-1)^x\sum_y (-1)^y \binom{x}{y} (-1)^{n-1-y-z} \binom{-z-1}{n-1-y-z} \\ &= (-1)^{x+n-1-z} \binom{x-z-1}{n-1-z} \\ &= (-1)^{x+n-1-z} \binom{n-1-x}{n-1-z} (-1)^{n-1-z} \\ &= (-1)^x \binom{n-1-x}{n-1-z} \\ &= \bigl( D^\pm(n)J(n)B(n)J(n) \bigr)_{xz}\end{aligned}

as required. The second identity is proved very similarly. \quad\Box

Claim 3. For any n \ge 2, the matrix D^\pm(n)J(n)B(n) has order 3 if n is odd and order 6 if n is even. \quad\Box

Proof. Since \bigl( D^\pm J B \bigr)_{xy} = (-1)^x \binom{n-1-x}{y} we have

\begin{aligned} \bigl( D^\pm & J B D^\pm J B D^\pm J B \bigr)_{xw} \\ &= (-1)^x \sum_y \sum_z (-1)^{y+z} \binom{n-1-x}{y}\binom{n-1-y}{z} \\ & \hspace*{3in} \binom{n-1-z}{w} \\ &= (-1)^{x+n-1} \sum_z \Bigl( \sum_y \binom{n-1-x}{y} \binom{-z-1}{n-1-y-z} \Bigr) \\& \hspace*{3in} \binom{n-1-z}{w} \\ &= (-1)^{x+n-1} \sum_z \binom{n-2-x-z}{n-1-z}\binom{n-1-z}{w} \\ &= (-1)^{x} \sum_z (-1)^{z}\binom{x}{n-1-z}\binom{n-1-z}{w} \\ &= (-1)^{x+n-1} \sum_r (-1)^r \binom{x}{r} \binom{r}{w} \\&= (-1)^{x+n-1} (-1)^x [x=w] = (-1)^{n-1}. \end{aligned}

It is easily seen that the matrix is not a scalar multiple of the identity, therefore it has the claimed order. \quad\Box

Alternative proof. Another proof uses Claims 1 and 2, as follows

\begin{aligned} (JBD^\pm)^3 &= JBD^\pm JBD^\pm (D^\pm J)B D^\pm \\ &= JB\bigl( D^\pm JBJ\bigr) D^\pm (-1)^{n-1}BD^\pm \\ &= (-1)^{n-1} JB((B^{-1}JB ) JD^\pm BD^\pm \\ &= (-1)^{n-1}BD^\pm BD^\pm \\ &= (-1)^{n-1}I\end{aligned}

with the end as before. \quad\Box

Let X^\circ denote the half-turn rotation of an n \times n matrix X, as defined by X^\circ = J(n)XJ(n). By Claim 3,

B(n)D^\pm(n)B(n)^\circ D^\pm = BD^\pm J B J D^\pm

is conjugate to JD^\pm BD^\pm JB and so to (-1)^{n-1} (D^\pm JB)(D^\pm JB). Hence this matrix has order 3 when n is odd and order 6 when n is even. We state without proof some further identities along these lines.

Claim 4. The matrix B(n)D^\pm(n)B_n(n)^\circ D^\pm(n) has order 3. If n is even then B(n)D^\pm(n)B_1(n)^\circ D^\pm(n) has order 12. If n is odd then B_1(n)D^\pm(n)B_1(n)^\circ D^\pm(n) has order 3.

The second case seems particularly remarkable. There are some obstructions (related to root systems) to integer matrices having finite order. These identities were discovered as a spin-off of a somewhat involved construction with linear algebra; I have no idea how to motivate them in any other way. For instance, how looking at

B(6)D^\pm(6)B_1(6)D^\pm(6) = \left( \begin{matrix} 1 & -6 & 15 & -20 & 15 & -6 \\ 1 & -5 & 10 & -10 & 5 & -1 \\ 1 & -4 & 6 & -4 & 1 & 0 \\ 1 & -3 & 3 & -1 & 0 & 0 \\ 1 & -2 & 1 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 & 0 & 0 \end{matrix} \right)

would one guess that it has order 12? A computer search found many more examples involving more lengthy products of the signed binomial matrices and their rotations, for instance

B(5)D^\pm B_4(5)^\circ D^\pm B(5)D^\pm B_8(5)^\circ D^\pm 

has order 12.

Completely monotone sequences

December 20, 2020

A sequence \lambda_0, \lambda_1, \ldots, \lambda_n is monotone decreasing if \lambda_0 \ge \lambda_1 \ge \lambda_2 \ge \ldots, or equivalently, if \lambda_j - \lambda_{j+1} \ge 0 for all j \in \mathbb{N}_0. We will decide once and for all to deal with decreasing sequences only, and say that such a sequence is completely monotone if all its iterated difference, including the zero differences, are non-negative. That is \lambda_j \ge 0, \lambda_j - \lambda_{j+1} \ge 0, \lambda_j - 2\lambda_{j+1} + \lambda_{j+2} \ge 0, \lambda_j - 3\lambda_{j+1} + 3\lambda_{j+2} - \lambda_{j+3} \ge 0 for all j \in \mathbb{N}_0, and so on. Equivalently,

\displaystyle \sum_{i=0}^k (-1)^i \binom{k}{i}\lambda_{j+i} \ge 0

for all j, k \in \mathbb{N}_0. One family of examples I stumbled across, in what seemed like a completely unrelated context of random walks defined on posets with involution, is the two-parameter family

\displaystyle \lambda^{(a,b)}_j = \frac{\binom{a+b}{a}(a+b+1)}{\binom{a+b+j}{b}(a+b+j+1)}.

For instance, \lambda^{(0,0)}_j = \frac{1}{j+1} for each j \in \mathbb{N}_0. For a direct proof that this sequence is completely monotone, we use the \beta-integral \int_0^1 x^a(1-x)^b \mathrm{d}x = \frac{1}{\binom{a+b}{a}(a+b+1)} to write

\displaystyle\begin{aligned}\sum_{i=0}^k& (-1)^i \binom{k}{i} \frac{\lambda^{(a,b)}_j}{\binom{a+b}{a}(a+b+1)} \\ &= \sum_{i=0}^k (-1)^i \binom{k}{i} \int_0^1 x^{a+j+i}(1-x)^b \mathrm{d}x \\ &= \sum_{i=0}^k \int_0^1 x^{a+j}(1-x)^{b+k} \\ &= \frac{1}{\binom{a+b+j+k}{b+k}(a+b+j+k+1)} \\ &= \frac{\lambda^{(a,b+k)}_j}{\binom{a+b+k}{a}(a+b+k+1)}.\end{aligned}

In the same context, I came across some multivariable polynomials that I conjecture are always positive when evaluated at a completely monotone sequence. These polynomials include

\displaystyle \begin{aligned}f_1(x_0,x_1) &= x_0-x_1 \\ f_2(x_0,x_1,x_2) &= x_0^2 - 2x_0x_1 + 2x_0x_2 - x_1x_2 \\ f_3(x_0,x_1,x_2,x_3) &= x_0^3 - 3 x_0^2 x_1 + 5 x_0^2 x_2 - 3 x_0 x_1 x_2\\ & \quad - 3 x_0^2 x_3 + 5 x_0 x_1 x_3 - 3 x_0 x_2 x_3 + x_1 x_2 x_3. \end{aligned}

Their positivity is obvious for f_1, because f_1 is equal to a difference. For f_2 positivity follows from

\displaystyle f_2(x_0,x_1,x_2) = x_0(x_0-2x_1+x_2) + (x_0-x_1)x_2.

Despite much struggling, I was unable to find a similar expression for f_3 as a sum of products of positive differences. The main purpose of this post is to show that such expressions exist for linear functions, but not in general. I therefore may well have been on a wild-goose chase, hardly for the first time.

Linear polynomials

Fix n \in \mathbb{N}. Let \mathcal{C} \subseteq \mathbb{R}^n be the cone of all completely monotone sequences, as defined by the inequalities at the top of this post. Let \mathcal{D} \subseteq \mathbb{R}^n be the cone spanned by the coefficients in these inequalities.

To make these cones more explicit, the following notation is helpful. Let \Delta^k \lambda_j = \sum_{i=0}^k (-1)^i \binom{k}{i}\lambda_{j+i}. Then \Delta^{k-1} \lambda_j - \Delta^{k-1} \lambda_{j+1} = \Delta^{k} \lambda_j, and so \Delta^k \lambda_{n-1-k} + \Delta^{k-1} \lambda_{n-k} = \Delta^{k-1}\lambda_{n-1-k}. It follows inductively that \mathcal{D} is the set of non-negative linear combinations of the vectors v^{(k)} defined by

v^{(k)}_{n-1-i} = (-1)^i\binom{k}{i}

For instance, if n=5 then v^{(0)}, v^{(1)}, v^{(2)}, v^{(3)}, v^{(4)} are the rows of the 5 \times 5 matrix below

\left( \begin{matrix} \cdot & \cdot & \cdot & \cdot & 1 \\ \cdot & \cdot & \cdot & 1 &  -1 \\ \cdot & \cdot & 1 & -2 & 1 \\ \cdot & 1 & -3 & 3 & -1 \\ 1 & -4 & 6 & -4 & 1 \end{matrix} \right)

Suppose that \sum_{i=0}^{n-1} a_i \lambda_i \ge 0 for all completely monotone sequences. That is, a \cdot \lambda \ge 0 for all \lambda \in \mathcal{C}; we write this as a \cdot \mathcal{C} \ge 0. By Farkas’ Lemma applied to the cone \mathcal{D}, either

  • a \in \mathcal{D}, or
  • there exists \lambda \in \mathbb{R}^n such that \lambda \cdot \mathcal{D} \ge 0 and a \cdot \lambda < 0.

In the `either’ case, since \lambda \cdot \mathcal{D} \ge 0, the sequence \lambda is completely monotone. But then a \cdot \lambda < 0 contradicts the hypothesis on a. Therefore a \in \mathcal{D}. So we have a simple necessary and sufficient condition for a linear polynomial to be non-negative on all completely monotone sequences: this is the case if and only if the coefficients are a linear coefficients of the vectors v^{(0)}, v^{(1)}, \ldots, v^{(n-1)}.

Quadratic example

I had hoped that something similar might work for quadratic and higher degree polynomials based on analogously defined cones coming from the coefficients of products in the differences. Here is an counterexample to this hope.

Take n=3 and order monomials lexicographically x_0^2, x_0x_1, x_0x_2, x_1^2, x_1x_2, x_2^2. The homogeneous quadratics in the difference polynomials x_0, x_1,x_2, x_0-x_1,x_1-x_2,x_0-2x_1+x_2 span the same subspace as the rows of the matrix below.

\left( \begin{matrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & 0 & -2 & 1 \\ 0 & 1 & -1 & -2 & 3 & -1 \\ 1 & -4 & 2 & 4 & -4 & 1 \end{matrix}\right)

For instance, the penultimate row has the coefficients for (\lambda_0-2\lambda_1+\lambda_2)(\lambda_1-\lambda_2). Define
g(x_0,x_1,x_2) = (x_0-x_1)(x_0-x_1-x_2)+x_1^2.

Claim. g(\lambda_0, \lambda_1, \lambda_2) \ge 0 for all completely monotonic sequences (\lambda_0,\lambda_1,\lambda_2).

Proof. Since the polynomial is homogeneous, we may assume that \lambda_0 = 1. Since g is linear in x_2, when evaluated at 1,\lambda_1,\lambda_2, it is negative if and only if

\displaystyle \lambda_2 \ge \frac{(1-\lambda_1)^2+\lambda_1^2}{1-\lambda_1} = \frac{1-2\lambda_1 + 2\lambda_1^2}{1-\lambda_1}

This inequality implies that

\displaystyle \lambda_1 - \lambda_2 \le \frac{-1 + 3\lambda_1 - 3\lambda_1^2}{1-\lambda_1}

but the quadratic 1 - 3y + 3y^2 - 3y = 3(y-1/2)^2 + \frac{1}{4} is always at least \frac{1}{4}. Hence \lambda_1 -\lambda_2 \le -\frac{1}{4}, contradicting complete monotonicity. \qquad\Box

Claim. g is not a positive linear combination of homogeneous quadratics in the differences x_0,x_1,x_2, x_0-x_1,x_1-x_2, x_0-2x_1+x_2.

Proof. It is equivalent to show that the coefficients of g, namely (1, -2, -1, 2, 1, 0) are not in the cone of positive linear combinations of the 6 \times 6 matrix above. Using the computer algebra Magma, one can compute its dual cone, which turns out to be the set of positive linear combinations of the rows of the matrix below.

\left(\begin{matrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 4 & 1 & 0 & 0 & 0 & 0 \\ 2 & 1 & 1 & 0 & 0 & 0 \\ 4 & 2 & 0 & 1 & 0 & 0 \\ 4& 3 & 2 & 2 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{matrix}\right)

In particular, the dual cone contains (2,1,1,0,0,0), and since (1, -2, -1, 2, 1, 0) \cdot (2,1,1,0,0,0) = -1, (1,-2,-1,2,1,0) is not in the dual of the dual cone, as required.\qquad\Box

I suspect that there is no easy way (or maybe even no algorithmic way?) to decide if a homogeneous polynomial in a completely monotone sequence is non-negative, but would be very happy to be proved wrong on this.

Immanants and representations of the infinite symmetric group

October 12, 2020

This is a reminder to the author of a wild question. (I won’t even call it an idea.) Given a n \times n matrix X and a symmetric group character \chi, the immanant d_\chi(X) is defined by

\displaystyle \sum_{\sigma \in \mathrm{Sym}_n} \chi(\sigma) \prod_{i=1}^n X_{(i,\sigma(i))}.

Thus when \chi is the sign character the immanant is the familiar determinant, and when \chi is the trivial character, it is the permanent, of interest partly because of the Permanent dominance conjecture and its starring role in Valiant’s programme to prove the algebraic analogue of the P \not= NP conjecture. Very roughly stated, a more general conjecture is that all immanants, except for the determinant (or multiples of it), are hard to compute.

My wild question is: can one generalize immanants to representations of infinite symmetric groups, and does this give any extra freedom to in some sense `interpolate’ from the trivial to the sign character?

A game on tournaments with multiple edges

August 21, 2020

Here is a quick post to record a combinatorial game that I think might be of interest, but I don’t have time to think about seriously now.

When Person A plays Person B on the online chess server on which I spend far too much time, the server looks to see who has the longest run of consecutive games as black, and chooses that person to be white. Ties are broken randomly. This is a simple but effective rule: for instance it guarantees that in a series of games, the players alternate colours.

Now suppose a team of people want to manipulate the server into giving one of their members a long run of consecutive games as white. (This would of course be strictly against the server policies.) How big must the team be to guarantee that some member gets n consecutive games as white?

A quick look at the small cases reveals the answer. With three people, having arbitrary past histories (this matters below), A plays B and the black player, B say, then plays C. If B is white, then when A plays B again, either A or B establishes a run of two games as white. If B is black (for instance this always happens if C has had a run of two black games), then when A plays C, either A or C gets the same run. Note that in the first case, A never plays C, but instead plays B twice.

With four people, we use the strategy above to give A a run of two white games, and then using B, C, D, give B say, another run of two white games. Now when A and B play, a run of three white games is established. Continuing inductively we see that n people can force n-1 consecutive white games.

One might also ask for the analogous result where multiple games between the same people are not allowed. If the aim is instead to maximize the number of games as white, then we obtain the combinatorial game where the ‘Picker’ picks an edge of the complete graph K_n and the ‘Chooser’ chooses its orientation. Since in the end all edges are picked exactly once, the role of the ‘Picker’ is defunct: the Chooser can choose an n vertex tournament before play begins, and follow its recommendations. By Theorem 1.2 in this article of Alspach and Gavlas, when n is odd, K_n can be decomposed into n(n-1)/2m edge cycles of length m if and only if m divides n. Directing each cycle in an arbitrary way then gives a tournament where every vertex has the same in- and out-degree. A simple solution when n is prime is to take m=n and use the cycles (0,d,2d,\ldots,(n-1)d) for 1 \le d \le (n-1)/2 on the vertex set \{0,1,\ldots, n-1\}, working modulo n. When n is even, a discrepancy of 1 between the in- and out-degrees of every vertex is inevitable, and it’s not hard to use the construction for n odd to see that this can be achieved.

Returning to the original setting where we care about runs of white games, this shows that the Chooser can guarantee the longest run is at most the maximum out-degree, namely \lfloor n/2 \rfloor.

I’ll end with two natural questions suggested by this quick analysis.

Question 1

Let G be a finite graph. In each step the Picker chooses an edge of G, and the Chooser directs it; multiple picks are allowed, and the Chooser may choose a different orientation on different picks. Say that vertex v has a run of length m if in the most recent m picks involving this vertex, it was always the source. What is the maximum run length that the Picker can force somewhere in this graph?

Question 2

As Question 1, but now each edge may only be picked once.

For the first in the case of the complete graph, the analysis above shows that the Picker can force a run of length n-1, and since the chooser can make the next directed edge involving this vertex an in-edge, this is best possible. For the second in the case of the complete graph, the Picker cannot do better than \lfloor n/2 \rfloor, but I do not know if this is always possible.

Flipped classroom and online learning: Personal FAQ

June 7, 2020

I spent most of June 2 and 3 attending two meetings about online learning: Flexible education (organized by Royal Holloway), at which a College wide teaching model was unveiled, and Teaching and Learning Mathematics Online (organized by Michael Grove, Rachel Hilliam and Kevin Houston). A recurring theme was ‘Active Blended Learning’ or the ‘Flipped Classroom’.  The purpose of this post is to record some observations made by speakers at these talks and my reflections on the research literature on these pedagogic models. Please may I emphasis that this post has absolutely no official status.

Does the College model require the flipped classroom?

This isn’t said explicitly, but I think it is implicit throughout. For example in the seven step programme that we are supposed to follow for each ‘teaching week’, we read

  • 3. Engage with learning material. Provides time for students to independently or in groups to engage with learning materials.
  • 4. Learning activity. Enables the practical and critical application of learning through individual or group activities.
  • 5. Learning Check. Allows the student and lecturer(s) to check that the student learned the content and achieved the learning outcome(s) for the week through an activity or formative/summative assessment.

All this seems to sit fairly happily in a framework where students are expected to work off-line and then come together for synchronous problem solving sessions, maybe checking their understanding before/after by an online quiz. It does not, as as I see it, fit with the traditional model of three live lectures per week.

What is the research evidence for the flipped classroom and active blended learning?

Freeman et al Active learning increases student performance in science, engineering, and mathematics, PNAS 111 (2014) 410–8415 is a meta-analysis of 225 studies that compared student performance in courses lectured in the traditional style and student performance courses with ‘at least some active learning’. The conclusion is clear: active learning improved student performance by about 1/2 of a standard deviation.

More strikingly, the average failure rate was 21.8% under active learning compared to 33.8% under traditional lecturing. Measures of student engagement and satisfaction were not considered in this meta-analysis; my own guess is that the reduction in failure rate is correlated with greater engagement by the weaker students.

A survey talk by Robert Talbot, updating the conclusions of his 2017 book Flipped Learning: A Guide for Higher Education, also has clear conclusions. From the linked slides (emphasis preserved):

  • Students in FL courses typically show either greater gains in measures of learning than students in traditional courses or else the differences are not statistically significant (slide 16).
  • Students show higher satisfaction with FL and the active learning techniques once FL is in place (slide 18).
  • Students often highly negative about FL when first introduced, ‘even while acknowledging benefits of increased group work, more instructor attention, and better grades’ (slide 18).

In Jacqueline O’Flaherty and Craig Philips The use of flipped classrooms in higher education, Internet and Higher Education 25 (2015) 85–95 the authors conclude in Section 4.4:

Our review indicates a number of positive student learning outcomes from this pedagogical approach,

but add:

This review found very few studies that actually demonstrated robust evidence to support that the flipped learning approach is more effective than conventional teaching methods. Only one study used empirical validation to show that a structured flipped classroom in comparison to the traditional one could effectively engage students in deep learning [Hung, Flipping the classroom for English language learners to foster active learning]. Whilst some studies referred to a modest improvement of academic performance, through outcomes of increased examination scores or improved student satisfaction, further research is required in this area …

The cited study was a randomised controlled trial comparing two flipped models with the traditional lecture model on the same content on 75 students. Taking this as the minimum standard for a reliable study is clearly a high (but not unreasonable) barrier when collecting evidence.

Finally I’ll mention a study by James McEvoy, Interactive problem-solving sessions in an introductory bioscience course engaged students and gave them feedback, but did not increase their exam scores. James ‘partially-flipped’ a Royal Holloway biology course by replacing one of two weekly lectures with an interactive problem solving class (see page 5 for details of how this was run). After the change there was a significant improvement in student responses to the two questions ‘The teaching style was engaging‘ and ‘I received feedback on my progress during the course‘. However other measures of student engagement, and (as the title makes clear) exam performance were not significantly affected; there was however a reduction in failure rate.

McEvoy’s findings are consistent with the Freeman et al study and my belief that the flipped classroom may benefit weakest students the most, by helping them to get somewhere, rather than nowhere (as alas, is too often the case in mathematics courses).

What is ‘flipped learning’ anyway?

Talbot’s suggested definition (see about half way down) is

Flipped Learning is a pedagogical approach in which first contact with new concepts moves from the group learning space to the individual learning space in the form of structured activity, and the resulting group space is transformed into a dynamic, interactive learning environment where the educator guides students as they apply concepts and engage creatively in the subject matter.

He is clear that flipped learning does not necessarily require videos, and that a methodological flaw in some metastudies is that they exclude teaching models where videos were not used. Neither does flipped learning necessarily require lecturing. Instead a key feature is that the first contact with new concepts comes from a structured activity, done on one’s own. On my reading, the instructors’s duty is to plan this activity, and then enable students to apply the ideas creatively in individual and group work. Reassuringly (see slide 20), he reports

Implementation matters but not as much as simply offloading direct instruction into structured student activity and using the class time for active learning.

How does one make the flipped classroom work?

Here are some points that I jotted down from the various talks. Sue Pawley’s talk at TALMO Is there anyone out there? A guide to interactive activities in the online environment was especially useful.

  • Make all live sessions worth attending. Don’t deliver the lecture (again), but instead get students to work in groups, learning from each other and you. Make live sessions motivating. Plan your teaching. Think context not content.
  • Getting feedback from students is difficult: do not expect many students to speak in front of their peers. They will never speak unless recording is switched off. Quizzes are good and technologically easy. I want a system where students can collaborate in small groups on a ‘digital whiteboard’; at the moment this seems to require high-end tablets.
  • Give students a ‘scaffold’. This was stressed by speakers at both events. Do not just say ‘read this’. Do not bombard students with content. Instead say ‘read this, watch that, do this quiz to check your (basic) understanding’.
  • Do not use peer review since this just adds to the fear of being wrong. But critiquing anonymous wrong answers (we could even fabricate them) can be useful.
  • As an example of what can be done, when everyone involved has a good internet connection and a latest-model iPad Pro, this video has three of the Royal Holloway mathematics staff (two of them pretending to be students) collaborating on a simple geometry problem.
  • One thought from me: in all the online talks, the convener collated questions in the chat and then selected the most useful (or most upvoted) to ask. This seems a clear improvement on the traditional model where the most assertive person gets to set the initial subject and tone of the questions.

What ideas are there for quiz questions?

George Kinnear’s talk at TALMO Using quizzes to deliver a course online had many useful ideas, going beyond the usual multiple choice model.

  • Faded worked examples: leave gaps in a model proof or solution for students to fill in.
  • Invite students to create examples (I think they will need a lot of hand-holding).
  • The ‘test effect’: recalling information from memory is beneficial (in the context of the always-connected internet generation, I think this deserves saying). For instance, before a section on integration, he got students to make a table of important functions (left column) and their derivatives (right column).

I very much hope we will be able to use the STACK computer algebra system (which integrates with Moodle) to set questions. It can validate quite complicated algebraic answers and give immediate feedback. For instance, it was used to mark the integration quiz above, and even suggest important functions the students had not used.

My own view is that quiz questions used to review material should be very basic, while questions in interactive workshops should be much tougher and focus on potential misconceptions and common errors.

What about flipping more advanced courses?

All the talks I’ve been to were on fairly basic courses: this includes Kevin Houston’s talk at an LMS Education Day in 2016. There is little research on flipping advanced courses. I believe it can work, e.g. Kinnear’s idea of the faded worked example can be adapted to the faded proof. And advanced courses offer many opportunities for non-trivial (for students) questions about relaxing hypotheses, or strengthening conclusion, as well as reversing implications, chaining implications, spotting contrapositive restatements that are helpful, and so on.

Is this consistent with academic freedom?

I hope so, but at a recent EPMS meeting, the Head of School admitted some freedom would be lost. Personally I’m keen on moving to a flipped model, but I will defend to the (academic) death the freedom of colleagues to teach as they see fit, even when I firmly believe what they are proposing will be ineffective. As the slide below from James McEvoy’s talk shows, the lecturer’s preference for the flipped/traditional model is correlated with student performance.

Student performance is correlated with lecturer's preference

I’m also concerned about the lack of consultation over the College’s model: I dislike having it patiently explained to me that what they’ve decided is for the best. (Of course consultation is not the same as listening, as we all know well.) By contrast, the Maths model has been extensively consulted on and we are still considering how some parts, e.g. group work, will work in the light of concerns of colleagues.