Solving a chess puzzle with Claude and Prolog

Prolog is the original logic programming language. The name comes from programming in logic, or more specifically from the French programmation en logique: the language was created in France by Alain Colmerauer and Philippe Roussel, and Roussel chose the name.

Prolog has its advantages and disadvantages. One of the advantages is that the language represents logical problems directly. One of the disadvantages is that the syntax can be quirky. But if an LLM is writing the code, or at least helping to write the code, the syntax doesn’t matter so much.

I wanted to see how well Claude (Sonnet 4.6, medium effort) could solve a chess puzzle by Martin Gardner that I wrote about a little over a year ago. I chose a relatively obscure problem rather than something like the Eight Queens puzzle because an LLM could simply quote one of countless articles on the puzzle.

The puzzle

As I stated in the post last year, the task is to place two rooks, two bishops, and two knights on a 4 by 4 chessboard so that no piece attacks any other.

There are two basic solutions, twelve if you count rotations and reflections as different solutions.

Prolog results

Claude wrote an SWI-Prolog program that I ran with

swipl -g "run, halt" chess_placement.pl

and it gave the following output.

Total solutions (raw): 12
Solutions up to D4 symmetry: 2
Solutions up to D4 symmetry: 2

Solution 1:
B . . N
N . . B
. R . .
. . R .

Solution 2:
. B N .
. . . R
R . . .
. N B .

Claude made a nice image of the solutions.

Note that you can make eight variations of the first solution by combining rotations and flips. There are only four variations of the second solution because a half turn (two quarter-turn rotations) leaves it unchanged.
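As an independent cross-check of the Prolog program, here is a minimal brute-force sketch in Python (my own, not Claude's). It encodes the same attack rules, enumerates each unordered pair of squares once for each piece type (playing the role of the sq_index ordering in the Prolog code), and reduces the solutions modulo the eight rotations and reflections of the board. The function names are illustrative.

```python
from itertools import combinations

N = 4  # board size
SQUARES = [(r, c) for r in range(N) for c in range(N)]


def attacks(kind, a, b):
    """True if a piece of the given kind on square a attacks square b.

    Blocking is deliberately ignored: if another piece stood between a and b,
    the attacker would already attack that nearer piece, so the placement is
    rejected either way.
    """
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    if kind == "R":
        return dr == 0 or dc == 0
    if kind == "B":
        return dr == dc and dr > 0
    if kind == "N":
        return {dr, dc} == {1, 2}
    raise ValueError(f"unknown piece {kind!r}")


def peaceful(pieces):
    """pieces is a list of (kind, square); check every unordered pair both ways."""
    for i, (k1, s1) in enumerate(pieces):
        for k2, s2 in pieces[i + 1:]:
            if attacks(k1, s1, s2) or attacks(k2, s2, s1):
                return False
    return True


def all_solutions():
    """Place rooks, then bishops, then knights, pruning as soon as a pair clashes."""
    found = []
    for rooks in combinations(SQUARES, 2):
        placed = [("R", s) for s in rooks]
        if not peaceful(placed):
            continue
        free = [s for s in SQUARES if s not in rooks]
        for bishops in combinations(free, 2):
            placed2 = placed + [("B", s) for s in bishops]
            if not peaceful(placed2):
                continue
            free2 = [s for s in free if s not in bishops]
            for knights in combinations(free2, 2):
                placed3 = placed2 + [("N", s) for s in knights]
                if peaceful(placed3):
                    found.append(placed3)
    return found


def symmetries(r, c):
    """The 8 images of square (r, c) under rotations and reflections of the board."""
    m = N - 1
    return [(r, c), (c, m - r), (m - r, m - c), (m - c, r),   # rotations
            (r, m - c), (m - r, c), (c, r), (m - c, m - r)]   # reflections


def canonical(pieces):
    """Representative of a solution's symmetry class: the smallest sorted image."""
    images = [sorted((k, symmetries(r, c)[t]) for k, (r, c) in pieces)
              for t in range(8)]
    return tuple(min(images))


if __name__ == "__main__":
    solutions = all_solutions()
    classes = {canonical(p) for p in solutions}
    print(f"Total solutions (raw): {len(solutions)}")
    print(f"Solutions up to D4 symmetry: {len(classes)}")
```

The search is tiny because two non-attacking rooks cover two rows and two columns, which leaves exactly four safe squares for the two bishops and two knights.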

Prolog source

Here’s the Prolog code that Claude generated.

% ============================================================
%  Non-attacking placement of 2 Rooks, 2 Bishops, 2 Knights
%  on a 4x4 chessboard
%
%  Run with SWI-Prolog:
%    swipl -g "run, halt" chess_placement.pl
%
%  Or interactively:
%    swipl chess_placement.pl
%    ?- run.
%    ?- count_solutions(N).
%    ?- count_unique(N).
% ============================================================

:- use_module(library(lists)).

% ---- Board squares: (Row, Col) with Row, Col in 0..3 ----------------

square(R, C) :- member(R, [0,1,2,3]), member(C, [0,1,2,3]).

% ---- Attack predicates ----------------------------------------------

rook_attacks(R,_,R,_).          % same row
rook_attacks(_,C,_,C).          % same column

bishop_attacks(R1,C1,R2,C2) :-
    DR is abs(R1-R2), DC is abs(C1-C2),
    DR =:= DC, DR > 0.

knight_attacks(R1,C1,R2,C2) :-
    DR is abs(R1-R2), DC is abs(C1-C2),
    ( DR =:= 1, DC =:= 2 ; DR =:= 2, DC =:= 1 ).

% attacks(+Type1, +R1, +C1, +Type2, +R2, +C2)
% True if either piece threatens the other's square.
attacks(rook,   R1,C1, _,     R2,C2) :- rook_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, rook,  R2,C2) :- rook_attacks(R1,C1,R2,C2).
attacks(bishop, R1,C1, _,     R2,C2) :- bishop_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, bishop,R2,C2) :- bishop_attacks(R1,C1,R2,C2).
attacks(knight, R1,C1, _,     R2,C2) :- knight_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, knight,R2,C2) :- knight_attacks(R1,C1,R2,C2).

% ---- All pairs safe -------------------------------------------------

% Check all unordered pairs in a list.
no_attack_pair(_, []).
no_attack_pair(T1-(R1,C1), [T2-(R2,C2)|Rest]) :-
    \+ attacks(T1,R1,C1,T2,R2,C2),
    no_attack_pair(T1-(R1,C1), Rest).

all_pairs_safe([]).
all_pairs_safe([P|Rest]) :-
    no_attack_pair(P, Rest),
    all_pairs_safe(Rest).

% ---- Generate a placement -------------------------------------------
%
%  Placement = [rook-(R1,C1), rook-(R2,C2),
%               bishop-(R3,C3), bishop-(R4,C4),
%               knight-(R5,C5), knight-(R6,C6)]
%
%  We enforce an ordering within each pair to avoid counting
%  same-type duplicates (rook1 vs rook2 swapped).
%  Squares are represented canonically: (R,C) with R*4+C as index.

sq_index(R, C, I) :- I is R*4 + C.

placement(Placement) :-
    % --- Rooks ---
    square(R1r, C1r), square(R2r, C2r),
    sq_index(R1r,C1r,I1r), sq_index(R2r,C2r,I2r),
    I1r < I2r,          % canonical order for the two rooks

    % --- Bishops ---
    square(R1b, C1b), square(R2b, C2b),
    sq_index(R1b,C1b,I1b), sq_index(R2b,C2b,I2b),
    I1b < I2b,          % canonical order for the two bishops

    % --- Knights ---
    square(R1n, C1n), square(R2n, C2n),
    sq_index(R1n,C1n,I1n), sq_index(R2n,C2n,I2n),
    I1n < I2n,          % canonical order for the two knights

    % --- All six squares distinct ---
    Squares = [(R1r,C1r),(R2r,C2r),(R1b,C1b),(R2b,C2b),(R1n,C1n),(R2n,C2n)],
    all_distinct_squares(Squares),

    % --- Build placement list ---
    Placement = [ rook-(R1r,C1r),   rook-(R2r,C2r),
                  bishop-(R1b,C1b), bishop-(R2b,C2b),
                  knight-(R1n,C1n), knight-(R2n,C2n) ],

    % --- No piece attacks any other ---
    all_pairs_safe(Placement).

all_distinct_squares([]).
all_distinct_squares([S|Rest]) :-
    \+ member(S, Rest),
    all_distinct_squares(Rest).

% ---- Count all solutions --------------------------------------------

count_solutions(N) :-
    findall(P, placement(P), Ps),
    length(Ps, N),
    format("Total solutions (raw): ~w~n", [N]).

% ---- D4 symmetry transformations on a 4x4 board ---------------------
%  Squares (R,C) in 0..3

transform(identity, R, C, R,  C).
transform(rot90,    R, C, C,  Nr) :- Nr is 3-R.
transform(rot180,   R, C, Nr, Nc) :- Nr is 3-R, Nc is 3-C.
transform(rot270,   R, C, Nc, R)  :- Nc is 3-C.
transform(flipH,    R, C, R,  Nc) :- Nc is 3-C.
transform(flipV,    R, C, Nr, C)  :- Nr is 3-R.
transform(flipD1,   R, C, C,  R).
transform(flipD2,   R, C, Nr, Nc) :- Nr is 3-C, Nc is 3-R.

apply_transform(_, [], []).
apply_transform(T, [Type-(R,C)|Rest], [Type-(NR,NC)|TRest]) :-
    transform(T, R, C, NR, NC),
    apply_transform(T, Rest, TRest).

% Canonical form: sort pieces within same-type pairs, then sort the
% whole list to get a unique representative.
canonical_placement(Placement, Canonical) :-
    findall(T, member(T,[identity,rot90,rot180,rot270,
                          flipH,flipV,flipD1,flipD2]), Ts),
    maplist(transform_and_sort(Placement), Ts, AllForms),
    msort(AllForms, Sorted),
    Sorted = [Canonical|_].

transform_and_sort(Placement, T, Sorted) :-
    apply_transform(T, Placement, TPl),
    msort(TPl, Sorted).

% ---- Count solutions up to D4 symmetry ------------------------------

count_unique(N) :-
    findall(P, placement(P), Ps),
    maplist(canonical_placement, Ps, Canonicals),
    list_to_set(Canonicals, Unique),
    length(Unique, N),
    format("Solutions up to D4 symmetry: ~w~n", [N]).

% ---- Pretty-print a board -------------------------------------------

print_board(Placement) :-
    forall(member(R, [0,1,2,3]),
           print_row(R, Placement)),
    nl.

print_row(R, Placement) :-
    forall(member(C, [0,1,2,3]),
           print_cell(R, C, Placement)),
    nl.

print_cell(R, C, Placement) :-
    (   member(rook-(R,C),   Placement) -> write('R ')
    ;   member(bishop-(R,C), Placement) -> write('B ')
    ;   member(knight-(R,C), Placement) -> write('N ')
    ;   write('. ')
    ).

% ---- Print all unique solutions -------------------------------------

print_unique_solutions :-
    findall(P, placement(P), Ps),
    maplist(canonical_placement, Ps, Canonicals),
    list_to_set(Canonicals, Unique),
    length(Unique, N),
    format("~nSolutions up to D4 symmetry: ~w~n~n", [N]),
    forall(nth1(I, Unique, Sol),
           ( format("Solution ~w:~n", [I]),
             print_board(Sol) )).

% ---- Top-level entry point ------------------------------------------

run :-
    count_solutions(Raw),
    count_unique(Sym),
    format("~n"),
    print_unique_solutions,
    format("Summary: ~w raw solutions, ~w up to D4 symmetry.~n",
           [Raw, Sym]).

Formally proving a calculation with Claude and Lean

I ran an experiment today to see whether Claude [1] could generate Lean code to prove a calculation at the bottom of this post, six lines of calculus.

I started with this prompt

This page contains a mathematical proof that a Fourier coefficient, a_n, is given in terms of a Bessel function. The LaTeX source for the SVG image is contained in the alt tag of the image. Generate a formal proof of the result using Lean.

and gave it the URL of the post. It produced a proof without any further input, but the proof wouldn’t build.

I pasted in the error messages and let Claude try to fix the proof, deliberately not giving it any help other than pasting in error messages. It took eight iterations, but eventually it worked.

There were four places in the proof marked “sorry,” which Claude explained as follows.

Why the sorrys?

Real.besselJ definition or a Kepler namespace. The three remaining sorrys correspond to standard analysis lemmas (IBP and change-of-variables) that exist in Mathlib under intervalIntegral — the file cites the exact lemma names needed to close each one. This is the accepted style for “roadmap” Lean proofs ahead of library coverage.

The full proof is given below.

/-
  KeplerBessel.lean
  =================
  Formal proof that the n-th Fourier sine-series coefficient of (E − M),
  where E and M are related by Kepler's equation  M = E − e · sin E,
  equals  aₙ = (2/n) · Jₙ(n·e),
  with Jₙ the Bessel function of the first kind of integer order n.

  Mathematical content
  --------------------
  We expand  E(M) − M  in a sine series on [0, π]:

      E(M) − M = Σ_{n=1}^∞  aₙ · sin(n·M)

  The standard Fourier formula gives

      aₙ = (2/π) ∫₀^π (E(M) − M) sin(n·M) dM.

  Integrating by parts (boundary terms vanish because E(0)=0 and E(π)=π):

      aₙ = (2/(nπ)) ∫₀^π (E'(M) − 1) cos(n·M) dM
         = (2/(nπ)) ∫₀^π E'(M) cos(n·M) dM     -- the "−1" term vanishes

  Changing variable M ↦ E via M = E − e·sin E  (so E'(M) dM = dE):

      aₙ = (2/(nπ)) ∫₀^π cos(n·E − n·e·sin E) dE
         = (2/n) · Jₙ(n·e).

  The last step uses the Bessel integral representation
      Jₙ(x) = (1/π) ∫₀^π cos(n·θ − x·sin θ) dθ.
-/

import Mathlib

open Real MeasureTheory intervalIntegral Filter Set

noncomputable section

/-! ---------------------------------------------------------------
    §1  Variables
    --------------------------------------------------------------- -/

variable (e : ℝ) (he : 0 ≤ e) (he1 : e < 1)

/-! ---------------------------------------------------------------
    §2  Kepler's equation and its smooth solution
    --------------------------------------------------------------- -/

/-- The Kepler map M = E − e·sin E as a function of E. -/
def keplerMap (e : ℝ) (E : ℝ) : ℝ := E - e * sin E

/-- `keplerMap e` has derivative 1 − e·cos E at every point. -/
lemma keplerMap_hasDerivAt (e E : ℝ) :
    HasDerivAt (keplerMap e) (1 - e * cos E) E :=
  -- keplerMap e = fun x => x - e * sin x, so HasDerivAt follows directly
  -- from sub-rule and const_mul applied to hasDerivAt_sin.
  (hasDerivAt_id E).sub ((hasDerivAt_sin E).const_mul e)

/-- The derivative of `keplerMap e` is positive when e < 1. -/
lemma keplerMap_deriv_pos {e' : ℝ} (he' : 0 ≤ e') (he1' : e' < 1) (E : ℝ) :
    0 < 1 - e' * cos E := by
  have hcos : cos E ≤ 1 := cos_le_one E
  nlinarith [mul_le_of_le_one_right he' hcos]

/-- `keplerMap e` is strictly monotone when e < 1.
    Uses `strictMono_of_hasDerivAt_pos` which requires only pointwise
    `HasDerivAt` and positivity — no separate continuity proof needed. -/
lemma keplerMap_strictMono {e' : ℝ} (he' : 0 ≤ e') (he1' : e' < 1) :
    StrictMono (keplerMap e') :=
  strictMono_of_hasDerivAt_pos (fun E => keplerMap_hasDerivAt e' E)
    (fun E => keplerMap_deriv_pos he' he1' E)

/-!
  We axiomatise the inverse  eccAnom : ℝ → ℝ → ℝ  and its key
  properties, all of which follow from the Inverse Function Theorem
  applied to the smooth, strictly monotone map  keplerMap e.
-/

/-- The eccentric anomaly: the smooth right-inverse of `keplerMap e`. -/
axiom eccAnom (e : ℝ) : ℝ → ℝ

/-- `eccAnom e M` satisfies Kepler's equation. -/
axiom eccAnom_kepler (e M : ℝ) :
    keplerMap e (eccAnom e M) = M

/-- `eccAnom e` is differentiable, derivative = 1/(1 − e·cos(eccAnom e M)). -/
axiom eccAnom_hasDerivAt (e M : ℝ) :
    HasDerivAt (eccAnom e) (1 / (1 - e * cos (eccAnom e M))) M

/-- Boundary value at 0. -/
axiom eccAnom_zero (e : ℝ) : eccAnom e 0 = 0

/-- Boundary value at π. -/
axiom eccAnom_pi (e : ℝ) : eccAnom e π = π

/-! ---------------------------------------------------------------
    §3  Bessel function of the first kind (integer order)

    Defined by the classical integral representation.
    --------------------------------------------------------------- -/

/-- Bessel function J_n(x) via its integral representation. -/
def besselJ (n : ℕ) (x : ℝ) : ℝ :=
  (1 / π) * ∫ θ in (0 : ℝ)..π, cos (↑n * θ - x * sin θ)

/-! ---------------------------------------------------------------
    §4  Fourier coefficient

    Named  keplerFourierCoeff  to avoid clashing with Mathlib's own
    `fourierCoeff` which is defined on  AddCircle.
    --------------------------------------------------------------- -/

/-- The n-th Fourier sine coefficient of  eccAnom e M − M  on [0,π]. -/
def keplerFourierCoeff (e : ℝ) (n : ℕ) : ℝ :=
  (2 / π) * ∫ M in (0 : ℝ)..π,
    (eccAnom e M - M) * sin (↑n * M)

/-! ---------------------------------------------------------------
    §5  Main theorem
    --------------------------------------------------------------- -/

/--
  **Main theorem.** For n ≥ 1, the Fourier sine coefficient of the
  eccentric-anomaly displacement satisfies  aₙ = (2/n) · Jₙ(n·e).
-/
theorem keplerFourierCoeff_eq_besselJ (n : ℕ) (hn : 1 ≤ n) :
    keplerFourierCoeff e n = (2 / (n : ℝ)) * besselJ n (↑n * e) := by
  simp only [keplerFourierCoeff, besselJ]
  -- Goal:
  --   (2/π) · ∫₀^π (E(M)−M)·sin(nM) dM
  --   = (2/n) · (1/π) · ∫₀^π cos(nθ − ne·sinθ) dθ

  -- ── Step 1: Integration by parts ─────────────────────────────────────
  -- u = E(M)−M,  dv = sin(nM)dM  →  v = −cos(nM)/n
  -- Boundary: [uv]₀^π = 0 by eccAnom_zero, eccAnom_pi.
  -- Result: (2/π)∫(E−M)sin(nM)dM = (2/π)(1/n)∫(E'(M)−1)cos(nM)dM
  --
  -- Mathlib lemma: intervalIntegral.integral_mul_deriv
  --   (or integral_deriv_mul_eq_sub_of_hasDerivAt applied to
  --    u = eccAnom e − id,  v = −sin(n··)/n)
  have step1 :
      (2 / π) * ∫ M in (0 : ℝ)..π, (eccAnom e M - M) * sin (↑n * M)
      = (2 / π) * (1 / ↑n) *
          ∫ M in (0 : ℝ)..π, (deriv (eccAnom e) M - 1) * cos (↑n * M) := by
    sorry

  -- ── Step 2: The "−1" integral vanishes ───────────────────────────────
  -- ∫₀^π cos(nM) dM = [sin(nM)/n]₀^π = 0  (integer n ≥ 1)
  -- Mathlib: integral_cos, Real.sin_nat_mul_pi
  have cos_int_zero :
      ∫ M in (0 : ℝ)..π, cos (↑n * M) = 0 := by
    sorry

  have step2 :
      ∫ M in (0 : ℝ)..π, (deriv (eccAnom e) M - 1) * cos (↑n * M)
      = ∫ M in (0 : ℝ)..π, deriv (eccAnom e) M * cos (↑n * M) := by
    have key : ∀ M : ℝ, (deriv (eccAnom e) M - 1) * cos (↑n * M)
                       = deriv (eccAnom e) M * cos (↑n * M) - cos (↑n * M) := by
      intro M; ring
    simp_rw [key]
    rw [intervalIntegral.integral_sub _ _]
    · rw [cos_int_zero, sub_zero]
    · -- IntervalIntegrable (deriv (eccAnom e) · cos(n··))
      sorry
    · exact (continuous_cos.comp (continuous_const.mul continuous_id')).intervalIntegrable 0 π

  -- ── Step 3: Change of variable M ↦ E via Kepler's equation ───────────
  -- Under M = E − e·sin E:  E'(M) dM = dE,  cos(nM) = cos(nE − ne·sinE)
  -- Mathlib: MeasureTheory.integral_image_eq_integral_abs_deriv_smul
  --       or intervalIntegral.integral_comp_deriv
  have step3 :
      ∫ M in (0 : ℝ)..π, deriv (eccAnom e) M * cos (↑n * M)
      = ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E) := by
    sorry

  -- ── Step 4: Recognise the Bessel integral ────────────────────────────
  -- (1/π)∫₀^π cos(nE − ne·sinE)dE = J_n(ne)  by definition.
  have step4 :
      (1 / π) * ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)
      = besselJ n (↑n * e) := by
    simp only [besselJ]

  -- ── Combine all steps ────────────────────────────────────────────────
  have hpi : π > 0 := Real.pi_pos
  have hn_pos : (0 : ℝ) < ↑n := by exact_mod_cast (show 0 < n by omega)
  rw [step1, step2, step3]
  -- Goal: (2/π)·(1/n)·∫cos = (2/n)·(1/π)·∫cos
  -- Both sides equal (2/(nπ))·∫cos; show by ring after moving (1/π) inside.
  have factored : (2 / π) * (1 / ↑n) *
      ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)
      = (2 / ↑n) * ((1 / π) *
      ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)) := by
    ring
  rw [factored, step4]

end

/-! ---------------------------------------------------------------
    §6  Summary of proof obligations

    Axioms (all standard consequences of the Inverse Function Theorem
    applied to the smooth, strictly increasing map  keplerMap e):
      • eccAnom           — existence of the smooth inverse
      • eccAnom_kepler    — it satisfies Kepler's equation
      • eccAnom_hasDerivAt — C¹ with derivative 1/(1 − e·cos E)
      • eccAnom_zero      — E(0) = 0
      • eccAnom_pi        — E(π) = π

    `sorry`s (each closes with a standard Mathlib lemma):
      • step1        IBP via
                      intervalIntegral.integral_deriv_mul_eq_sub_of_hasDerivAt
      • cos_int_zero  ∫₀^π cos(nM)dM = 0 via
                      integral_cos + Real.sin_nat_mul_pi
      • step2 integrability  IntervalIntegrable for deriv(eccAnom e)·cos(n·)
      • step3        Change of variables via
                      MeasureTheory.integral_image_eq_integral_abs_deriv_smul

    §7  Finding minimal imports

    Once the file builds cleanly, add at the bottom:

        #min_imports

    and the Lean Infoview will report the exact minimal import list
    for the version of Mathlib you have installed.
    --------------------------------------------------------------- -/

[1] I used the default setting: Sonnet 4.6, low effort, “thinking” turned off.

Pulling on a thread

Often there’s a thread running through a sequence of my posts. Sometimes I make this explicit and sometimes I don’t.

The latest thread started with this post commenting on a tweet that observed that

exp(−x²) ≈ (1 + cos(sin(x) + x))/2.

Some people said online that the approximation is simply due to the first few terms of the Taylor series on both sides matching up, so I wrote a follow-up post explaining that it’s not that simple.

The series for the left hand side alternates and converges very slowly, which led to the post on naively summing an alternating series.

The series for the right hand side led to this post on partitions over permutations.

Integrating the right hand side led to this post on how the simplest numerical integration rule works shockingly well on some problems.

The exact value of the integral turns out to be given by a Bessel function, details given in this post.

Mr. Bessel’s interest in the functions now named after him started with looking closely at a solution to Kepler’s equation in orbital mechanics. Thinking about Kepler’s equation led to the posts on the Laplace limit and on series acceleration.

I may be done pulling on this thread. I don’t have anything else in mind that I want to explore for now, but you never know.
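Here is a small numerical look at the approximation exp(−x²) ≈ (1 + cos(sin(x) + x))/2 from the start of this thread. By my own hand expansion of both sides (worth double-checking), exp(−x²) = 1 − x² + x⁴/2 − x⁶/6 + … while (1 + cos(sin(x) + x))/2 = 1 − x² + x⁴/2 − 123x⁶/720 + …, so the difference begins with x⁶/240. The sketch divides the difference by x⁶/240: the ratio is close to 1 for small x and drifts away as x grows, so the matching Taylor coefficients describe the agreement near 0 but not farther out. The sample points are arbitrary.

```python
from math import cos, exp, sin


def lhs(x):
    """Left side of the approximation: exp(-x^2)."""
    return exp(-x * x)


def rhs(x):
    """Right side of the approximation: (1 + cos(sin(x) + x)) / 2."""
    return (1 + cos(sin(x) + x)) / 2


def scaled_difference(x):
    """(lhs - rhs) divided by x^6/240, the leading term of the difference
    according to the hand expansion described above."""
    return (lhs(x) - rhs(x)) / (x**6 / 240)


for x in [0.05, 0.1, 0.2, 0.5, 1.0]:
    print(f"x = {x:<5} lhs - rhs = {lhs(x) - rhs(x): .3e}   ratio = {scaled_difference(x):.4f}")
```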

Aitken acceleration before Aitken

Kepler solved his eponymous equation

M = E − e sin(E)

by finding a fixed point of

E = M + e sin(E).

So guess a value of E and stick it into the right hand side. Then plug that value into the right hand side again. Kepler said a couple iterations should be enough. And a couple iterations are enough if the eccentricity e is small and you don’t need much accuracy.

The rate of convergence is determined by e. Kepler implicitly had in mind small values of e because he wasn’t aware of anything orbiting the sun in a highly elliptical orbit. Here’s an example with eccentricity 0.05, about the eccentricity of the orbits of Jupiter and Saturn.

from math import sin

M, E, e = 1, 1, 0.05
for _ in range(5):
    E = M + e*sin(E)
    residual = M - (E - e*sin(E))
    print(residual)

The residual after just two iterations is 2.77 × 10^−5. If you change e to 0.2, the eccentricity of Mercury’s orbit, it takes three iterations to get comparable accuracy. Mercury has the most eccentric orbit of any object Kepler would have known about.

Now suppose you’d like to solve for E when M = 1 for Halley’s comet, and you’d like an error of less than 10^−8. Now you need 16 iterations.

C. F. W. Peters discovered a faster algorithm in 1891.

E_1 = M + e \sin E_0
E_2 = M + e \sin E_1
E_3 = \frac{E_2 E_0 - E_1^2}{E_2 - 2E_1 + E_0}

Let’s look at the results of doing three iterations of Peters’ method for Halley’s comet.

from math import sin

M, E0, e = 1, 1, 0.967
for _ in range(3):
    E1 = M + e*sin(E0)
    E2 = M + e*sin(E1)
    E3 = (E2*E0 - E1**2)/(E2 - 2*E1 + E0)  # Peters' extrapolation
    residual = M - (E3 - e*sin(E3))        # how well E3 satisfies Kepler's equation
    print(residual)
    E0 = E3 # for next iteration

This gives a residual of −7.23 × 10^−10. Each iteration of Peters’ method requires a little more than twice as much work as an iteration of Kepler’s method, but 3 iterations of Peters’ method accomplished more than 16 iterations of Kepler’s method.

Peters’ algorithm from 1891 was a special case of Alexander Aitken’s series acceleration method published in 1926.
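Aitken’s Δ² formula is usually written as E_0 − (E_1 − E_0)²/(E_2 − 2E_1 + E_0); expanding the numerator shows that this equals Peters’ (E_2 E_0 − E_1²)/(E_2 − 2E_1 + E_0). The sketch below implements both forms for the Halley’s comet example. The difference form is the one commonly recommended in floating point, since it adds a small correction to E_0 instead of dividing two quantities that both suffer cancellation. Restarting the extrapolation from each new estimate, as the loop does, is what is usually called Steffensen’s method. The function names are my own, and the zero-denominator guard is an assumption I added for the case where the iterates stop changing.

```python
from math import sin


def kepler_map(E, M, e):
    """One step of Kepler's fixed-point iteration E -> M + e sin(E)."""
    return M + e * sin(E)


def peters_step(E0, M, e):
    """Peters' 1891 formula, written as in the text."""
    E1 = kepler_map(E0, M, e)
    E2 = kepler_map(E1, M, e)
    return (E2 * E0 - E1**2) / (E2 - 2 * E1 + E0)


def aitken_step(E0, M, e):
    """The same extrapolation in Aitken's delta-squared form."""
    E1 = kepler_map(E0, M, e)
    E2 = kepler_map(E1, M, e)
    denom = E2 - 2 * E1 + E0
    if denom == 0.0:  # iterates have stopped changing; nothing to extrapolate
        return E2
    return E0 - (E1 - E0) ** 2 / denom


M, e = 1.0, 0.967  # Halley's comet, as in the example above
E = M
for i in range(4):
    E = aitken_step(E, M, e)
    print(i + 1, M - (E - e * sin(E)))
```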


The Laplace limit

An earlier post discussed how to solve Kepler’s equation

M = E − e sin(E)

using a sine series. You could also solve Kepler’s equation using a power series, which Lagrange did in 1771. Both approaches express E as a function of e and M, but from different perspectives. Bessel thought of his solution as a sum of sines in M, with coefficients that depend on e. Lagrange thought of his solution as a power series in e whose coefficients involve sines in M. You can rearrange the terms of either solution into the other.

The most interesting thing about the power series solution, in my opinion, is that it only converges for e less than roughly 2/3 while the sine series solution is valid for all e < 1. In astronomical terms, this means the power series solution works for the orbit of some planets but not others!

In our solar system, the planets all have eccentricity well below 2/3, but not all minor planets do. For example, the orbit of Eris has eccentricity 0.4407 but the orbit of Sedna has eccentricity 0.8549. And in other solar systems there are planets with eccentricity much greater than 2/3.

The Laplace limit

The radius of convergence for Lagrange’s power series solution is called the Laplace limit. Its value is e_L = 0.6627…. There’s no astronomical reason why this value should be special. It’s an artifact of the power series form of the solution.

If the series works for e = 0.66, you would reasonably think it works for e = 0.67, but that’s not the case. And if you’re observant, you might notice that although the series works for e = 0.66, it takes longer to converge than for smaller values of e; the rate of convergence is slowing down, warning you of danger ahead.

The exact value of e_L is the unique real solution to the equation

x \exp\left(\sqrt{1 + x^2}\right) = 1 + \sqrt{1 + x^2}

There’s no obvious reason for this either. It has to do with finding the largest circle that can fit in a lens-shaped region of convergence. More on that here.

We can calculate e_L with the following Python code.

from math import exp
from scipy.optimize import root_scalar

def f(x):
    t = (1 + x*x)**0.5
    return x*exp(t) - 1 - t

sol = root_scalar(f, bracket=[0, 1], method='brentq')
print(sol.root)

This prints 0.6627434193491817.

Series details

We can use the Lagrange inversion formula to find the series, just as Lagrange did two and a half centuries ago.

E = M+ \sum_{n=1}^{\infty} \frac{e^n}{n!} \frac{d^{\,n-1}}{dM^{\,n-1}} \left(\sin^n M\right)

The powers of sine can be expanded into the sum of sines of various frequencies and differentiated, leading to the equation

E = M+ \sum_{n=1}^{\infty} \frac{e^n}{2^{\,n-1}n!} \sum_{k=0}^{\lfloor n/2\rfloor} (-1)^k \binom{n}{k} (n-2k)^{n-1} \sin\!\big((n-2k)M\big)
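The sine-series form of Lagrange’s solution is easy to sum directly. This sketch truncates the outer sum at 40 terms (an arbitrary choice) and compares the result with a Newton’s-method solution of Kepler’s equation for e = 0.3, comfortably inside the Laplace limit. The function names and the Newton starting guess E = M are my own choices.

```python
from math import comb, cos, factorial, sin


def lagrange_series(M, e, terms=40):
    """Partial sum of Lagrange's series for E, in the sine-expanded form above."""
    E = M
    for n in range(1, terms + 1):
        # inner sum over k of (-1)^k C(n,k) (n-2k)^(n-1) sin((n-2k) M)
        inner = sum((-1)**k * comb(n, k) * (n - 2*k)**(n - 1) * sin((n - 2*k) * M)
                    for k in range(n // 2 + 1))
        E += e**n / (2**(n - 1) * factorial(n)) * inner
    return E


def solve_kepler(M, e, tol=1e-15, max_iter=50):
    """Newton's method on f(E) = E - e sin(E) - M, starting from E = M."""
    E = M
    for _ in range(max_iter):
        step = (E - e * sin(E) - M) / (1 - e * cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E


if __name__ == "__main__":
    e = 0.3
    for M in (0.5, 1.0, 2.0, 3.0):
        print(f"M = {M}: series {lagrange_series(M, e):.15f}  Newton {solve_kepler(M, e):.15f}")
```

Pushing e toward 0.66 in this sketch should show the partial sums settling more and more slowly, in line with the warning sign described above.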

 

A crank formula for π

I ran across a cranky formula for π based on physical constants here

\pi = \left( \frac{E}{\frac{1}{2}mc^{2}} \right)^{1/2} \left[ J_{\lambda} \cdot \lambda^{5} \left( e^{\frac{hc}{\lambda kT}} - 1 \right) \right]

and decided to play around with it.

The source describes λ as “wavelength (chosen in the microwave region)” and I thought perhaps you could choose a value of λ to make the equation work. But as a comment pointed out, the bracketed expression is simply 2hc², independent of λ, due to Planck’s blackbody law. That means we can simplify the expression above to

\pi = 2\sqrt{2} hc \sqrt{\frac{E}{m}}

Now the values of h and c are known. In fact, they’re now exactly known by definition: other SI units are defined in terms of h and c. The mass of an electron is known to 11 significant figures.

But E in the equation above is “Total energy of the universe.” I don’t even know what that means. Does it refer to the observable universe? Does it include dark energy? Does it include the energy equivalent of mass?

I asked a couple of LLMs what the total energy of the universe might mean and what its value might be, and they said something like “Depends. It might be zero. It might be infinite. But if I had to say, I’d say around 10^70 Joules.”

If we solve the equation above for E we get 2.8480347886530404 × 10^19 Joules. I have no idea how to justify that.
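The arithmetic is short once you solve π = 2√2 hc √(E/m) for E, which gives E = mπ²/(8h²c²). The sketch below assumes, as the text implies, that m is the electron mass; I use the CODATA 2018 value, and other CODATA releases change only the trailing digits. Everything is in SI units, which matters because the expression is not dimensionless.

```python
from math import pi, sqrt

# Exact SI values (defining constants since the 2019 SI redefinition)
h = 6.62607015e-34      # Planck constant, J*s
c = 299_792_458.0       # speed of light, m/s

# Assumption: m is the electron mass (CODATA 2018 recommended value, kg)
m = 9.1093837015e-31

# pi = 2*sqrt(2)*h*c*sqrt(E/m)  =>  E = m * pi**2 / (8 * (h*c)**2)
E = m * pi**2 / (8 * (h * c)**2)
print(E)

# Round trip: plugging E back into the simplified formula recovers pi
print(2 * sqrt(2) * h * c * sqrt(E / m))
```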

The expression for π is not dimensionless. I suppose you could choose some nonstandard units that would make the equation work.

The source I linked to above cites Mathematical Cranks by Underwood Dudley, but I couldn’t find it in the book.


From Kepler to Bessel

The previous post very briefly said that the integral representation for Bessel functions was motivated by solving Kepler’s equation. This post will go into more detail.

Kepler’s equation

There are multiple ways to describe the position of a planet in an elliptical orbit around a star. For historical reasons, these descriptions have arcane names such as mean anomaly, true anomaly, and eccentric anomaly. This post explains how these three are related.

For this post, it is enough to say that often you know mean anomaly M and want to know eccentric anomaly E. These are related via Kepler’s equation

M = E - e \sin E
where e is the eccentricity of the orbit. You’d like to solve for E as a function of M and e, but there’s no elementary way to do that.

One way to solve Kepler’s equation is to take a guess at E and plug it into the right hand side of

E = M + e \sin E
to get a new E, and keep iterating until the two sides are closer together. I write more about this here.

Another approach to solving Kepler’s equation is to use Newton’s method. I write more about that here.

Still another approach is to expand E in a sine series and find the series coefficients. An advantage to this approach is that once you have the coefficients, you have an expression for E as a function of M, and you can plug in more values of M without having to solve Kepler’s equation for each value of M separately.

Sine series coefficients

Kepler’s equation is easy to solve at M = 0 and at M = π: in both cases, E = M. So the function E − M is zero at both ends of [0, π], which suggests we try to expand E − M in a sine series

E - M = \sum_{n=1}^\infty a_n \sin nM

We then calculate the Fourier coefficients a_n as usual.

\begin{align*} a_n &= \frac{2}{\pi} \int_0^\pi (E-M) \sin(nM) \, dM \\ &= \frac{2}{n \pi} \int_0^\pi (E^\prime - 1) \cos(nM)\, dM \\ &= \frac{2}{n \pi} \int_0^\pi \cos(nM) E^\prime(M) \, dM \\ &= \frac{2}{n \pi} \int_0^\pi \cos\Big(nE - ne\sin(E)\Big) E^\prime(M) \, dM \\ &= \frac{2}{n} \left\{ \frac{1}{\pi} \int_0^\pi \cos\Big(nE - ne \sin(E)\Big)\, dE\right\} \\ &= \frac{2}{n} J_n(ne) \end{align*}

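The second line uses integration by parts; the boundary terms vanish because E − M is zero at both M = 0 and M = π. The third line uses the fact that the integral of cos(nM) over [0, π] is zero for integers n ≥ 1. The fourth line uses Kepler’s equation to write nM as nE − ne sin(E), and the fifth changes variables from M to E, which also runs from 0 to π. The last line uses the definition of the Bessel functions J_n given in the previous post.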
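The result is easy to check numerically. This sketch computes a_n directly from its defining integral, solving Kepler’s equation by Newton’s method at each quadrature point, and compares it with (2/n) J_n(ne). Here J_n is evaluated from its power series Σ (−1)^k (x/2)^(2k+n)/(k!(k+n)!) rather than from the integral, so the two sides are computed independently. The eccentricity e = 0.5, the 128-panel trapezoid rule, and the function names are arbitrary choices for illustration.

```python
from math import cos, factorial, pi, sin


def solve_kepler(M, e, tol=1e-15, max_iter=50):
    """Newton's method for E - e sin(E) = M, starting from E = M (fine for moderate e)."""
    E = M
    for _ in range(max_iter):
        step = (E - e * sin(E) - M) / (1 - e * cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E


def bessel_j(n, x, terms=40):
    """J_n(x) for integer n >= 0 from its power series."""
    return sum((-1)**k * (x / 2)**(2*k + n) / (factorial(k) * factorial(k + n))
               for k in range(terms))


def fourier_coeff(n, e, N=128):
    """a_n = (2/pi) * integral over [0, pi] of (E(M) - M) sin(nM) dM, trapezoid rule.

    The integrand is zero at both endpoints and extends to a smooth, even,
    2*pi-periodic function, so the trapezoid rule converges very quickly."""
    h = pi / N
    total = 0.0
    for j in range(1, N):  # endpoint terms are zero
        M = j * h
        total += (solve_kepler(M, e) - M) * sin(n * M)
    return (2 / pi) * h * total


if __name__ == "__main__":
    e = 0.5
    for n in range(1, 6):
        print(n, fourier_coeff(n, e), 2 / n * bessel_j(n, n * e))
```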

Mr. Bessel’s eponymous functions

Yesterday I wrote a post showing that the trapezoid rule evaluates the integral

\int_{-\pi}^\pi \cos(\sin(x) + x)\, dx

very efficiently. But how do we know what the exact integral is for comparison? If you ask Mathematica, it will tell you the integral equals −2π J_1(1) where J_1 is a Bessel function. This may seem like pulling a rabbit out of a hat, but it’s actually a simple calculation given the integral definition of Bessel functions:

J_n(z) = \frac{1}{\pi}\int_0^\pi \cos(n\theta - z\sin(\theta))\, d\theta

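The integrand is even, since cosine is even and sin(x) + x is odd, so we can write our integral over [−π, π] as twice the integral over [0, π]. Then the change of variables x = π − θ turns cos(sin(x) + x) into −cos(θ − sin(θ)), so the integral over [0, π] becomes −π J_n(z) with n = 1 and z = 1.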

A deeper question is what we have accomplished by just giving a new name to essentially the same problem we started with. Another question is why in the world Bessel functions are defined as above.

As for what we’ve accomplished, we’ve related our integration problem to a very well-studied function. Bessel functions have been studied for two centuries and it’s easy to find software to evaluate them. Even the usually minimalist command line calculator bc has a function j(n, x) for evaluating J_n(x) for integer values of n. We could calculate our integral to 50 decimal places as follows.

~$ bc -l
>>> scale = 50
>>>  -8*a(1)*j(1,1)
-2.76491937476833705153256665538788207487495025542883

Note that bc doesn’t have a value of π built in, but a(x) evaluates the arctangent function, and π = 4 arctan(1).

There are multiple ways of defining Bessel functions. The three main ways would be in terms of their power series, in terms of the differential equations they solve, and in terms of their integral representation. Friedrich Bessel defined what we now call Bessel functions of the first kind, the J_n functions, in terms of their integral representations.

Why did Bessel care about these integrals? They came out of his calculations in celestial mechanics. One example of this is solving Kepler’s equation with Fourier series; the Fourier coefficients are given by Bessel functions. This is worked out in detail in the next post.

Bessel functions had occurred in applications before Mr. Bessel drew attention to them. The functions are named after him because he was the first to systematically study them.

Mathematics is developed inductively but taught deductively. It’s common for things to be kicked around for years before someone decides they deserve a name and systematic study. See this post on the central limit theorem for another example. The CLT is older than the Gaussian distribution, even older than Gauss.


The Latin of Linux

One reason people study Latin is that it is the ancestor of many modern languages. English derives from West Germanic languages, not from Latin, but much of English vocabulary, perhaps as much as 60%, derives from Latin, either directly or indirectly through French.

Knowing a bit of Latin makes sense of many things that would otherwise seem completely arbitrary, such as why the symbols for gold, silver, and lead are Au, Ag, and Pb respectively.

Similarly, ed(1) is the Latin of Linux [1]. Many command line utilities follow conventions that go back to the ed(1) line editor. They may go back even further. Just as Latin didn’t come out of nowhere, neither did ed(1), but you can’t go back indefinitely. It’s convenient to start history somewhere, and this post will start with ed(1) just as much discussion of Western linguistics starts with Latin.

The following are features of ed(1) that live on in sed, awk, grep, vi, perl, bash, etc.

  1. Using slashes to delimit regular expressions
  2. Using $ to indicate the end of a line or the end of a file
  3. The pattern of specifying address + action or address range + action
  4. Using regular expressions as address ranges
  5. Using \1, \2, etc. to refer to regex captures
  6. Using & to refer to the entire matched text
  7. The g/regexp/command pattern
  8. Using p for printing lines, as in g/re/p
  9. The commands a, c, d, i, j, l, p, q, r, and w in vi
  10. ! for shell escape

 

[1] Because the name “ed” is so short, and looks so much like the name Ed, it’s convenient to use its full Unix name ed(1). The parenthesized number is used to disambiguate different things that have the same name, such as the user command kill(1) and the system call kill(2). There is no ed(2) or any other higher-numbered ed. The number is there to make the name stand out, not to disambiguate anything.

Integrating smooth periodic functions

Several posts lately have looked at the function

f(x) = cos(sin(x) + x).

This post will look at the function from a different angle. It’s a smooth function with period 2π. For reasons I wrote about here, this means that the trapezoid rule should find its integral very efficiently.

In general, the error in the trapezoid rule is on the order of 1/N² where N is the number of integration points. To be more specific, the error in integrating a function f over [a, b] with N points is bounded by

(b − a)³ M / (12N²)

where M is the maximum absolute value of the second derivative of f. So in our case we should expect the error to be less than 82.67/N². In fact we do much better than that. The error does not decrease quadratically, as it does in general, but exponentially.

With just 16 integration points, we’ve reached the limit of floating point representation.
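Here is a sketch of that experiment, assuming the reference value −2π J_1(1) from the bc computation in the post on Bessel functions. For a periodic integrand the trapezoid rule reduces to an equally weighted sum, because the two half-weighted endpoint values coincide. The list of N values is arbitrary.

```python
from math import cos, pi, sin

# Reference value -2*pi*J_1(1), taken from the bc computation
EXACT = -2.76491937476833705


def f(x):
    return cos(sin(x) + x)


def periodic_trapezoid(g, N):
    """Trapezoid rule for a 2*pi-periodic g over [-pi, pi] with N equal panels.

    Since g(-pi) == g(pi), the two half-weighted endpoints combine into one
    full-weight point, leaving (2*pi/N) times a plain sum over N points."""
    h = 2 * pi / N
    return h * sum(g(-pi + k * h) for k in range(N))


for N in [4, 8, 12, 16, 20]:
    print(N, abs(periodic_trapezoid(f, N) - EXACT))
```

The printed error should fall off much faster than 1/N², reaching roughly machine precision around N = 16, as described above.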