Perfect numbers

A perfect number is a positive integer equal to the sum of its proper divisors, that is, all of its divisors less than itself. The first three examples are as follows.

6 = 1 + 2 + 3
28 = 1 + 2 + 4 + 7 + 14
496 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248
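
To make the definition concrete, here is a small Python sketch (added here, not part of the original post) that finds perfect numbers by brute force, summing the proper divisors of each candidate.

    def is_perfect(n):
        # n is perfect if the sum of its divisors less than n equals n
        return n == sum(d for d in range(1, n) if n % d == 0)

    print([n for n in range(2, 10_000) if is_perfect(n)])  # [6, 28, 496, 8128]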

Even and odd perfect numbers

Every known perfect number is even. No one has proved that odd perfect numbers don’t exist, but people keep proving properties that an odd perfect number must have, and maybe some day this list of properties will contain a contradiction, proving that such numbers don’t exist. For example, an odd perfect number, if it exists, must have over 1,500 digits.

Mersenne primes

A Mersenne prime is a prime number of the form 2^p − 1. Euclid proved that if M is a Mersenne prime, then M(M + 1)/2 is an even perfect number [1]. Leonhard Euler proved the converse of Euclid’s theorem two millennia later: all even perfect numbers have the form M(M + 1)/2 where M is a Mersenne prime.

There are currently 52 known Mersenne primes. The number of Mersenne primes is conjectured to be infinite, and the Mersenne primes discovered so far have roughly fit the projected distribution of such primes, so there is reason to suspect that there are infinitely many perfect numbers. There are at least 52.

Triangular numbers

By Euler’s theorem, all even perfect numbers have the form M(M + 1)/2, and so all even perfect numbers are triangular numbers:

P = 1 + 2 + 3 + … + M

Binary numbers

Even perfect numbers have the form 2^(p−1)(2^p − 1), and so all even perfect numbers, when written in binary, consist of p ones followed by p − 1 zeros.

For example,

496 = 31 × 32 / 2 = 2^4(2^5 − 1)

and 496 = 111110000 in binary, five ones followed by four zeros.
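
As an illustration (added here), the following sketch builds even perfect numbers from the first few Mersenne prime exponents and checks the binary pattern.

    # For each Mersenne prime exponent p, form 2^(p-1) * (2^p - 1) and check
    # that its binary representation is p ones followed by p - 1 zeros.
    for p in [2, 3, 5, 7, 13]:
        perfect = 2**(p - 1) * (2**p - 1)
        print(perfect, bin(perfect)[2:] == "1"*p + "0"*(p - 1))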

Related posts

[1] Euclid didn’t use the term “Mersenne prime” because he lived nearly two millennia before Marin Mersenne, but he did prove that if 2^p − 1 is prime, then 2^(p−1)(2^p − 1) is perfect.

The mathematics of GPS

The basic idea of GPS is that if you know the distance to several satellites, you can figure out your position. But you don’t actually know, or need to know, the distance to the satellites: you know the time (according to each satellite’s clock) when the signals were sent, and you know the time (according to your clock) when the signals arrived.

The atomic clocks on satellites are synchronized with each other to within a nanosecond, but they’re not synchronized with your clock. There is some offset t between your clock and the satellites’ clocks. Presumably t is small, but it matters.

If you observe m satellites, you have a system of m equations in 4 unknowns:

\| a_i - x \| = t_i - t

where a_i is the known position of the ith satellite in three dimensions, x is the observer’s position in three dimensions, and t_i is the difference between the time when the signal left the ith satellite (according to its clock) and the time when the signal arrived (according to the observer’s clock). This assumes we choose units so that the speed of light is c = 1.
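
To make this concrete, here is a small numerical sketch (not the solution method of [1]) that recovers the observer’s position and clock offset from synthetic data by nonlinear least squares. The satellite positions are hypothetical and units are chosen so that c = 1.

    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(0)
    a = rng.uniform(-10, 10, size=(4, 3))    # hypothetical satellite positions
    x_true = np.array([1.0, 2.0, 3.0])       # observer's true position
    t_true = 0.1                             # observer's clock offset
    t_obs = np.linalg.norm(a - x_true, axis=1) + t_true   # the observed t_i

    def residuals(params):
        # residuals of the equations ||a_i - x|| = t_i - t
        x, t = params[:3], params[3]
        return np.linalg.norm(a - x, axis=1) - (t_obs - t)

    sol = least_squares(residuals, np.zeros(4))
    print(sol.x)    # approximately (1, 2, 3, 0.1)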

So we have a system of m equations in 4 unknowns. It’s plausible there could be a unique solution provided m = 4. However, this is not guaranteed.

Here’s an example to suggest why there may not be a unique solution. Suppose t is known to be 0. Then observing 3 satellites will give us 3 equations in 3 unknowns. Each t_i determines a sphere of radius t_i centered at a_i. Suppose two spheres intersect in a circle, and the third sphere intersects this circle in two points. This means we have two solutions to our system of equations.

In [1] the authors thoroughly study the solution to the GPS system of equations. They allow the satellites and the observer to be anywhere in space and look for conditions under which the system has a unique solution. In practice, GPS satellites are approximately confined to a sphere (more on that here) and the observer is also approximately confined to a sphere, namely the earth’s surface, but the authors do not take advantage of these assumptions.

The authors also assume the problem is formulated in n dimensional space, where n does not necessarily equal 3. It’s complicated to state when the system of equations has a unique solution, but allowing n to vary does not add to the complexity.

I’m curious whether there are practical uses for the GPS problem when n > 3. There are numerous practical problems involving the intersections of spheres in higher dimensions, where the dimensions are not Euclidean spatial dimensions but rather abstract degrees of freedom. But offhand I cannot think of a problem that would involve the time offset that GPS location finding has.

[1] Mireille Boutin, Gregor Kemper. Global positioning: The uniqueness question and a new solution method. Advances in Applied Mathematics 160 (2024).

GPS satellite orbits

GPS satellites all orbit at the same altitude. According to the FAA,

GPS satellites fly in circular orbits at an altitude of 10,900 nautical miles (20,200 km) and with a period of 12 hours.

Why were these orbits chosen?

You can determine your position using satellites that are not in circular orbits, but with circular orbits all the satellites are on the surface of a sphere, and this ensures that certain difficulties don’t occur [1]. More on that in the next post.

For a circular orbit, the velocity is determined by the altitude, and the velocity in turn determines the period. The period T is given by

T = 2\pi \sqrt{\frac{r^3}{\mu}}

where μ is the “standard gravitational parameter” for Earth, which equals the mass of the earth times the gravitational constant G.

The weakest link in calculating T is r. The FAA site says the altitude is 20,200 km, but has that been rounded? Also, we need the distance to the center of the earth, not the altitude above the surface, so we need to add the radius of the earth. But the radius of the earth varies. Using the average radius of the earth I get T = 43,105 seconds.
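
Here is the calculation in a few lines of Python, using the standard value of μ for Earth and a mean Earth radius of 6371 km; the exact result depends on the values you plug in.

    from math import pi, sqrt

    mu = 3.986004418e14        # Earth's standard gravitational parameter, m^3/s^2
    r = 6.371e6 + 2.02e7       # mean Earth radius plus 20,200 km of altitude, in meters

    T = 2 * pi * sqrt(r**3 / mu)
    print(T)                   # about 43,100 seconds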

Note that 12 hours is 43,200 seconds, so the period I calculated is 95 seconds short of 12 hours. Some of the difference is due to calculation inaccuracy, but most of it is real: the orbital period of GPS satellites is less than 12 hours. According to this source, the orbital period is almost precisely 11 hours 58 minutes.

The significance of 11 hours and 58 minutes is that it is half a sidereal day, not half a solar day. I wrote about the difference between a sidereal day and a solar day here. That means each GPS satellite returns to almost the same position twice a day, as seen from the perspective of an observer on the earth. GPS satellites are in a 2:1 resonance with the earth’s rotation.

(But doesn’t the earth rotate on its axis every 24 hours? No, every 23 hours 56 minutes. Those missing four minutes come from the fact that the earth has to rotate a bit more than one rotation on its axis to return to the same position relative to the sun. More on that here.)

Update: See the next post on the mathematics of GPS.

[1] Mireille Boutin, Gregor Kemper. Global positioning: The uniqueness question and a new solution method. Advances in Applied Mathematics 160 (2024).

Ramanujan’s master theorem


A few weeks ago I wrote about the Mellin transform. Mitchell Wheat left a comment saying the transform seems reminiscent of Ramanujan’s master theorem, which motivated this post.

Suppose you have a function f that is nice enough to have a power series.

f(z) = \sum_{k=0}^\infty a_k z^k

Now focus on the coefficients a_k as a function of k. We’ll introduce a function φ that yields the coefficients, with a twist.

f(z) = \sum_{k=0}^\infty \frac{\varphi(k)}{k!} (-z)^k

and so φ(k) = (−1)^k k! a_k. Another way to look at it is that f is the exponential generating function of (−1)^k φ(k).

Then Ramanujan’s master theorem gives a formula for the Mellin transform of f:

\int_0^\infty z^{s-1} f(z) \, dz = \Gamma(s) \varphi(-s)

This equation was the basis of many of Ramanujan’s theorems.
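
As a quick sanity check (added here), take f(z) = exp(−z). Then φ(k) = 1 for all k, so the right side is simply Γ(s), and a numerical Mellin transform agrees.

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import gamma

    s = 2.5
    # Mellin transform of exp(-z) at s, computed numerically
    lhs, _ = quad(lambda z: z**(s - 1) * np.exp(-z), 0, np.inf)
    print(lhs, gamma(s))   # both approximately 1.32934, i.e. 3*sqrt(pi)/4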

Related posts

Maybe Copernicus isn’t coming

Before Copernicus promoted the heliocentric model of the solar system, astronomers added epicycle on top of epicycle, creating ever more complex models of the solar system. The term epicycle is often used derisively to mean something ad hoc and unnecessarily complex.

Copernicus’ model was simpler, but it was less accurate. The increasingly complex models before Copernicus were refinements. They were not ad hoc, nor were they unnecessarily complex, if you must center your coordinate system on Earth.

It’s easy to draw the wrong conclusion from Copernicus, and from any number of other scientists who were able to greatly simplify a previous model. One could be led to believe that whenever something is too complicated, there must be a simpler approach. Sometimes there is, and sometimes there isn’t.

If there isn’t a simpler model, the time spent searching for one is wasted. If there is a simpler model, the time searching for one might still be wasted. Pursuing brute force progress might lead to a simpler model faster than pursuing a simpler model directly.

It all depends. Of course it’s wise to spend at least some time looking for a simple solution. But I think we’re fed too many stories in which the hero comes up with a simpler solution by stepping back from the problem.

Most progress comes from the kind of incremental grind that doesn’t make an inspiring story for children. And when there is a drastic simplification, that simplification usually comes after grinding on a problem, not instead of grinding on it.

3Blue1Brown touches on this in this video. The video follows two hypothetical problem solvers, Alice and Bob, who attack the same problem. Alice is the clever thinker and Bob is the calculating drudge. Alice’s solution of the original problem is certainly more elegant, and more likely to be taught in a classroom. But Bob’s approach generalizes in a way that Alice’s approach, as far as we know, does not.

Related posts

Trigonometric interpolation

Suppose you want to interpolate a set of data points with a combination of sines and cosines.

One way to approach this problem would be to set up a system of equations for the coefficients of the sines and cosines. If you have N data points, you will get a system of N equations in N unknowns. The system will have a unique solution, though this is not obvious a priori.

Another approach would be to use the discrete Fourier transform (DFT). This is the approach that would commonly be used in practice. It’s even further from obvious a priori that this would work, but it does. (The DFT is so often computed using the FFT algorithm that the transform is often referred to by the algorithm name. If you’d like, mentally substitute FFT for DFT in the rest of the post.)

There are multiple ways to motivate the DFT, and the way suggested by the name is to derive the DFT as a discrete approximation to the (continuous) Fourier transform. Why should a discrete approximation to an integral transform also solve an interpolation problem? This doesn’t sound inevitable, or even plausible, but it is the case.

Another way to motivate the DFT is as the least-squares solution to fitting a sum of sines and cosines to a set of points. Since this is phrased as an optimization problem rather than an interpolation problem, it is clear that it will have a solution. It is not clear a priori that the error in the optimal fit will be zero, i.e. that the least-squares fit actually interpolates the data, but it does, and the equation for the coefficients in the solution is the same as the equation for the DFT. You can find a derivation in [1].

Example

Let’s take the vector [3, 1, 4, 1, 5, 9] and find trig functions that pass through these points. We can use the FFT as implemented in Python’s SciPy library to find a set of complex exponentials that pass through the points.

    from scipy.fft import fft
    from numpy import exp, array, pi

    x = array([3, 1, 4, 1, 5, 9])
    y = fft(x)    # DFT of the data

    N = len(x)
    # invert the transform by hand: z[n] is the inverse DFT of y at index n
    z = [sum([exp(2j*pi*k*n/N)*y[k] for k in range(N)])/N for n in range(N)]

Aside from rounding errors on the order of 10^−15, the vector z equals the vector x.

Turning the expression for z above into a mathematical expression, we have

6 f(x) = y_0 + y_1 exp(πix/3) + y_2 exp(2πix/3) + y_3 exp(πix) + y_4 exp(4πix/3) + y_5 exp(5πix/3)

where the y’s come from the FFT above, and f(n) equals the nth data point for n = 0, 1, …, 5.

To find sines and cosines we need to use Euler’s formula

exp(iθ) = cos(θ) + i sin(θ)

Because we started with real data x, there will be symmetries in the FFT components y that simplify the reduction of the complex function f to a real-valued function g written in terms of sines and cosines: some of the components are complex conjugates of each other, and so the imaginary parts cancel out.

6 g(x) = y_0 + (y_1 + y_5) cos(πx/3) + (y_2 + y_4) cos(2πx/3) + y_3 cos(πx)
+ i (y_1 − y_5) sin(πx/3) + i (y_2 − y_4) sin(2πx/3)

and so

g(x) = 3.833 + 0.8333 cos(πx/3) − 1.833 cos(2πx/3) + 0.1666 cos(πx)
− 2.5981 sin(πx/3) − 2.0207 sin(2πx/3)
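
As a numerical check (continuing the Python session above), we can verify that g reproduces the data at the integer points 0 through 5.

    from numpy import cos, sin, real, allclose

    def g(t):
        s = (y[0] + (y[1] + y[5])*cos(pi*t/3) + (y[2] + y[4])*cos(2*pi*t/3)
             + y[3]*cos(pi*t)
             + 1j*(y[1] - y[5])*sin(pi*t/3) + 1j*(y[2] - y[4])*sin(2*pi*t/3))
        return real(s)/6

    print(allclose([g(n) for n in range(N)], x))   # True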

Here’s a plot that verifies that g(x) passes through the specified points.

Related posts

[1] William L. Briggs and Van Emden Henson. The DFT: An Owner’s Manual for the Discrete Fourier Transform. SIAM 1995.

Moments with Laplace

This is a quick note to mention a connection between two recent posts, namely today’s post about moments and a post from a few days ago about the Laplace transform.

Let f(t) be a function on [0, ∞) and F(s) be the Laplace transform of f(t).

F(s) = \int_0^\infty e^{-st} f(t) \,dt

Then the nth moment of f,

m_n = \int_0^\infty t^n \, f(t)\, dt

is equal to the nth derivative of F, evaluated at 0, with an alternating sign:

(-1)^n F^{(n)}(0) = m_n

To see this, differentiate with respect to s inside the integral defining the Laplace transform. Each time you differentiate you pick up a factor of −t, so differentiating n times you pick up a term (−1)^n t^n, and evaluating at s = 0 makes the exponential term go away.
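
Here is a quick symbolic check (added here) using SymPy with f(t) = exp(−t), whose nth moment is n! and whose Laplace transform is 1/(s + 1).

    from sympy import symbols, exp, integrate, diff, oo

    t, s = symbols('t s', positive=True)
    f = exp(-t)
    F = integrate(exp(-s*t) * f, (t, 0, oo))       # Laplace transform: 1/(s + 1)

    for n in range(4):
        moment = integrate(t**n * f, (t, 0, oo))   # nth moment: n!
        print(moment, (-1)**n * diff(F, s, n).subs(s, 0))   # the two columns agree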

Related posts

When do moments determine a function?


The use of the word “moment” in mathematics is related to its use in physics, as in moment arm or moment of inertia. For a non-negative integer n, the nth moment of a function f is the integral of x^n f(x) over the function’s domain.

Uniqueness

If two continuous functions f and g have all the same moments, are they the same function? The answer is yes for functions over a finite interval, but no for functions over an unbounded interval.
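
A classical illustration of non-uniqueness on an unbounded interval (not from the original post) is Heyde’s lognormal example: multiplying the lognormal density by 1 + sin(2π log x) leaves every moment unchanged. Here is a numerical sketch, integrating in t = log x to keep the numerics well behaved.

    import numpy as np
    from scipy.integrate import quad

    def moment(n, perturbed=False):
        # nth moment of the (possibly perturbed) standard lognormal density,
        # written as an integral over t = log x
        def integrand(t):
            w = 1 + np.sin(2*np.pi*t) if perturbed else 1
            return np.exp(n*t - t**2/2) / np.sqrt(2*np.pi) * w
        return quad(integrand, -np.inf, np.inf)[0]

    for n in range(4):
        print(n, moment(n), moment(n, True))   # each pair agrees: 1, e^0.5, e^2, e^4.5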

Existence

Now let’s consider starting with a set of moments rather than starting with a function. Given a set of moments m_0, m_1, m_2, …, is there a function that has these moments? Typically no.

A better question is what conditions on the moments are necessary for there to exist a function with these moments. This question breaks into three questions

  1. The Hausdorff moment problem
  2. The Stieltjes moment problem
  3. The Hamburger moment problem

depending on whether the function domain is a finite interval, a half-bounded interval, or the real line. For each problem there are known conditions that are necessary and sufficient, but the conditions are different for each problem.

Interestingly, each of the three names Hausdorff, Stieltjes, and Hamburger is well known. Felix Hausdorff is best known for his work in topology: Hausdorff spaces, etc. Thomas Stieltjes is best known for the Riemann-Stieltjes integral, and for his work on continued fractions. Hans Ludwig Hamburger is not as well known, though his last name is certainly familiar.

Finite moments

A practical question in probability is how well a finite number of moments determine a probability distribution. They cannot determine the distribution uniquely, but they do establish bounds on how different two distributions with the same moments can be. See this post.

Related posts

Band-limited expansion

The band-limited expansion of the function f(x) is given by

f(x) = \sum_{k=-\infty}^\infty f(kh) \, \text{sinc}\left(\frac{x - kh}{h}\right)

where sinc(x) = sin(πx)/πx. This is also called the sinc expansion, or the Whittaker cardinal series, after its discoverer E. T. Whittaker [1].

This is called the band-limited expansion of f because each term in the infinite sum is band-limited, i.e. its Fourier spectrum is confined to a finite band: the Fourier transform of the sinc function is a step function supported between −1/2 and 1/2. [2]

The band-limited expansion has a lot of nice mathematical properties, leading Whittaker to call it “a function of royal blood in the family of entire functions, whose distinguished properties separate it from its bourgeois brethren.”

We can find a band-limited approximation for f by taking only a finite number of terms in the sum. An advantage of the band-limited approximation over a truncated Fourier series is that the former converges faster, making it useful in numerical algorithms [3]. Here’s an example of approximating the function exp(−x²) by taking h = 1 and using three terms, i.e. k running from −1 to 1.
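
Here is a brief sketch (added here) of that three-term approximation, using NumPy’s sinc, which follows the same convention as above.

    import numpy as np

    def sinc_approx(x, f, h=1.0, N=1):
        # truncated band-limited expansion: k runs from -N to N
        # np.sinc(t) is sin(pi t)/(pi t), matching the definition above
        return sum(f(k*h) * np.sinc((x - k*h)/h) for k in range(-N, N + 1))

    f = lambda x: np.exp(-x**2)
    xs = np.linspace(-2, 2, 9)
    print(np.c_[xs, f(xs), sinc_approx(xs, f)])   # compare f with its approximation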

You can improve the accuracy of the approximation by decreasing the size of h or by increasing the number of terms N, i.e. letting k run from −N to N. This post explains how to pick the trade-off between h and N to minimize approximation error.

Related posts

[1] E. T. Whittaker, On the functions which are represented by the expansions of the interpolation theory, Proc. Roy. Soc. Edinburgh, 35 (1915), pp. 181–194.

[2] You may get a different interval of support if you use a different convention for defining the Fourier transform. Unfortunately there are many conventions.

[3] Frank Stenger. Numerical Methods based on Whittaker Cardinal, or Sinc Functions. SIAM Review, Vol. 23, No. 2 (April 1981), pp. 165–224.

Delay differential equations

Sometimes the future state of a system depends not only on the current state (position, velocity, acceleration, etc.) but also on the previous state. Equations for modeling such systems are known as delay differential equations (DDEs), difference differential equations, retarded equations, etc. In a system with hysteresis, it matters not only where you are but how you got there.

The most basic theory of delay differential equations is fairly simple. Suppose you have an equation like the following.

u′(t) + u(t − ω) = f(t).

To uniquely determine a solution we need an initial condition, but we need more than the value of u(0): we need a function g(t) that gives the value of u on the entire interval [0, ω].

So we initially have the value of u over [0, ω]. Next, over the interval [ω, 2ω], the value of u(t − ω) is known: we can replace that term in the DDE with g(t − ω) and solve the resulting ordinary differential equation. After we’ve solved our equation over [ω, 2ω], we can use the solution to solve the equation over [2ω, 3ω], and so on. This process is called the method of steps.
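
Here is a small numerical sketch of the method of steps (added here) for the equation u′(t) + u(t − 1) = 0 with history u(t) = 1 on [0, 1], solving one interval at a time with SciPy and using each interval’s dense output as the history for the next.

    import numpy as np
    from scipy.integrate import solve_ivp

    omega = 1.0
    segments = [lambda t: np.array([1.0])]     # history: u(t) = 1 on [0, omega]

    for k in range(1, 4):
        prev = segments[-1]                    # u on the previous interval
        # on [k*omega, (k+1)*omega] the delayed term is known, so this is an ODE
        rhs = lambda t, u, prev=prev: -prev(t - omega)
        t0, t1 = k*omega, (k + 1)*omega
        sol = solve_ivp(rhs, (t0, t1), prev(t0), dense_output=True, max_step=0.01)
        segments.append(sol.sol)

    print(segments[1](1.5), segments[2](2.5))  # exact values are 0.5 and -0.375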

Although you can solve DDEs using the method of steps, this might not be the best approach. It might be more computationally efficient, or theoretically convenient, to use another method to solve such equations, such as Laplace transforms. The method of steps might convince you that a solution exists, but it might not, for example, be the best way to determine the limiting behavior of solutions.