GPS satellite orbits

GPS satellites all orbit at the same altitude. According to the FAA,

GPS satellites fly in circular orbits at an altitude of 10,900 nautical miles (20,200 km) and with a period of 12 hours.

Why were these orbits chosen?

You can determine your position using satellites that are not in circular orbits, but with circular orbits all the satellites are on the surface of a sphere, and this ensures that certain difficulties don’t occur [1]. More on that in the next post.

To maintain a circular orbit, the velocity is determined by the altitude, and this in turn determines the period. The period T is given by

T = 2\pi \sqrt{\frac{r^3}{\mu}}

where μ is the “standard gravitational parameter” for Earth, which equals the mass of the earth times the gravitational constant G.

The weakest link in calculating T is r. The FAA site says the altitude is 20,200 km, but has that been rounded? Also, we need the distance to the center of the earth, not the altitude above the surface, so we need to add the radius of the earth. But the radius of the earth varies. Using the average radius of the earth I get T = 43,105 seconds.
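
Here’s that calculation as a quick Python sketch (the value of μ and the mean Earth radius below are standard approximate values, not taken from the FAA page):

    from math import pi, sqrt

    mu = 3.986004418e14   # standard gravitational parameter of Earth, m^3/s^2
    altitude = 20_200e3   # altitude quoted by the FAA, in meters
    R_earth = 6_371e3     # mean radius of the Earth, in meters

    r = R_earth + altitude
    T = 2 * pi * sqrt(r**3 / mu)
    print(T)              # about 43,105 seconds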

Note that 12 hours is 43,200 seconds, so the period I calculated is 95 seconds short of 12 hours. Some of the difference is due to calculation inaccuracy, but most of it is real: the orbital period of GPS satellites is less than 12 hours. According to this source, the orbital period is almost precisely 11 hours 58 minutes.

The significance of 11 hours and 58 minutes is that it is half a sidereal day, not half a solar day. I wrote about the difference between a sidereal day and a solar day here. That means each GPS satellite returns to almost the same position twice a day, as seen from the perspective of an observer on the earth. GPS satellites are in a 2:1 resonance with the earth’s rotation.

(But doesn’t the earth rotate on its axis every 24 hours? No, every 23 hours 56 minutes. Those missing four minutes come from the fact that the earth has to rotate a bit more than one rotation on its axis to return to the same position relative to the sun. More on that here.)

Update: See the next post on the mathematics of GPS.

[1] Mireille Boutin, Gregor Kemper. Global positioning: The uniqueness question and a new solution method. Advances in Applied Mathematics 160 (2024).

Ramanujan’s master theorem

Ramanujan's passport photo from 1913

A few weeks ago I wrote about the Mellin transform. Mitchell Wheat left a comment saying the transform seems reminiscent of Ramanujan’s master theorem, which motivated this post.

Suppose you have a function f that is nice enough to have a power series.

f(z) = \sum_{k=0}^\infty a_k z^k

Now focus on the coefficients aₖ as a function of k. We’ll introduce a function φ that yields the coefficients, with a twist.

f(z) = \sum_{k=0}^\infty \frac{\varphi(k)}{k!} (-z)^k

and so φ(k) = (−1)ᵏ k! aₖ. Another way to look at it is that f is the exponential generating function of (−1)ᵏ φ(k).

Then Ramanujan’s master theorem gives a formula for the Mellin transform of f:

\int_0^\infty z^{s-1} f(z) \, dz = \Gamma(s) \varphi(-s)

This equation was the basis of many of Ramanujan’s theorems.
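
As a quick sanity check (not part of the original argument), take f(z) = exp(−z), for which φ(k) = 1 for every k, so the theorem predicts that the Mellin transform is simply Γ(s). A small numerical sketch using SciPy:

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import gamma

    # f(z) = exp(-z) has phi(k) = 1 for all k, so the theorem predicts
    # that the Mellin transform of f equals Gamma(s) * phi(-s) = Gamma(s).
    s = 2.5
    lhs, _ = quad(lambda z: z**(s - 1) * np.exp(-z), 0, np.inf)
    print(lhs, gamma(s))  # both approximately 1.3293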

Related posts

Linear combination of sine and cosine as phase shift

Here’s a simple calculation that I’ve done often enough that I’d like to save the result for my future reference and for the benefit of anyone searching on this.

A linear combination of sines and cosines

a sin(x) + b cos(x)

can be written as a sine with a phase shift

A sin(x + φ).

Going between {a, b} and {A, φ} is the calculation I’d like to save. For completeness I also include the case

A cos(x + ψ).

Derivation

Define

f(x) = a sin(x) + b cos(x)

and

g(x) = A sin(x + φ).

Both functions satisfy the differential equation

y″ + y = 0

and so f = g if and only if f(0) = g(0) and f′(0) = g′(0).

Setting the values at 0 equal implies

b = A sin(φ)

and setting the derivatives at 0 equal implies

a = A cos(φ).

Taking the ratio of these two equations shows

b/a = tan(φ)

and adding the squares of both equations shows

a² + b² = A².

Equations

First we consider the case

a sin(x) + b cos(x) = A sin(x + φ).

Sine with phase shift

If a and b are given,

A = √(a² + b²)

and

φ = tan⁻¹(b / a).

If A and φ are given,

a = A cos(φ)

and

b = A sin(φ)

from the previous section.

Cosine with phase shift

Now suppose we want

a sin(x) + b cos(x) = A cos(x + ψ)

If a and b are given, then

A = √(a² + b²)

as before and

ψ = − tan⁻¹(a / b).

If A and ψ are given then

a = − A sin(ψ)

and

b = A cos(ψ).
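
Here is a small Python sketch of these conversions. Using arctan2 rather than a bare arctangent picks the angle in the correct quadrant when a or b is negative:

    import numpy as np

    a, b = 2.0, -3.0  # arbitrary values for the check

    # a sin(x) + b cos(x) = A sin(x + phi) = A cos(x + psi)
    A = np.hypot(a, b)       # sqrt(a^2 + b^2)
    phi = np.arctan2(b, a)   # quadrant-aware version of arctan(b/a)
    psi = np.arctan2(-a, b)  # quadrant-aware version of -arctan(a/b)

    x = np.linspace(0, 2 * np.pi, 100)
    print(np.allclose(a * np.sin(x) + b * np.cos(x), A * np.sin(x + phi)))  # True
    print(np.allclose(a * np.sin(x) + b * np.cos(x), A * np.cos(x + psi)))  # True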

Related posts

Resolving a mysterious problem with find

Suppose you want to write a shell script that searches the current directory for files that have a keyword in the name of the file or in its contents. Here’s a first attempt.

find . -name '*.py' -type f -print0 | grep -i "$1"
find . -name '*.py' -type f -print0 | xargs -0 grep -il "$1"

This works well for searching file contents but behaves unexpectedly when searching for file names.

If I have a file named frodo.py in the directory, the script will return

grep: (standard input): binary file matches

Binary file matches?! I wasn’t searching binary files. I was searching files with names consisting entirely of ASCII characters. Where is a binary file coming from?

If we cut off the pipe at the end of the first line of the script and run

find . -name '*.py' -type f -print0

we get something like

./elwing.py./frodo.py./gandalf.py

with no apparent non-ASCII characters. But if we pipe the output through xxd to see a hex dump, we see that there are invisible null characters after each file name.

One way to fix our script would be to add a -a option to the call to grep, telling it to treat the input as ASCII. But this will return the same output as above. The output of find is treated as one long (ASCII) string, which matches the regular expression.

Another possibility would be to add a -o flag to direct grep to return just the match. But this is less than ideal as well. If you were looking for file names containing a Q, for example, you’d get Q as your output, which doesn’t tell you the full file name.

There may be better solutions [1], but my solution was to insert a call to strings in the pipeline:

find . -name '*.py' -type f -print0 | strings | grep -i "$1"

This will extract the ASCII strings out of the input it receives, which has the effect of splitting the string of file names into individual names.

By default the strings command defines an ASCII string to be a string of 4 or more consecutive ASCII characters. A file with anything before the .py extension will necessarily have at least four characters, but the analogous script to search C source files would overlook a file named x.c. You could fix this by using strings -n 3 to find sequences of three or more ASCII characters.
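
For example, the analogous pipeline for C source files might look like this (a sketch, assuming the GNU version of strings):

find . -name '*.c' -type f -print0 | strings -n 3 | grep -i "$1"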

If you don’t have the strings command installed, you could use sed to replace the null characters with newlines.

find . -name '*.py' -type f -print0 | sed 's/\x0/\n/g' | grep -i "$1"

Note that the null character is denoted \x0 rather than simply \0.

Related posts

[1] See the comments for better solutions. I really appreciate your feedback. I’ve learned a lot over the years from reader comments.

The Postage Stamp Problem

I recently stumbled upon the Postage Stamp Problem. Given two relatively prime positive integers a and b, show that for any sufficiently large integer N there exist nonnegative integers x and y such that

ax + by = N.

I initially missed the constraint that x and y must be nonnegative; without that constraint the result is well known (Bézout’s lemma) and there’s no requirement for N to be large. The nonnegativity constraint makes things more interesting.

5 cent and 21 cent stamps

The problem is called the Postage Stamp Problem because it says that given any two stamps whose values are relatively prime, say a 5¢ stamp and a 21¢ stamp, you can make any sufficiently large amount of postage using just those two stamps.

A natural question is how large is “sufficiently large,” and the answer turns out to be all integers larger than

ab − a − b.

So in our example, you cannot make 79¢ postage out of 5¢ and 21¢ stamps, but you can make 80¢ or any higher amount.
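
Here is a quick brute-force check of that claim (a sketch, not from the original post):

    a, b = 5, 21

    def representable(n):
        # Can n cents of postage be made from a-cent and b-cent stamps?
        return any((n - b * y) % a == 0 for y in range(n // b + 1))

    print(representable(79))                               # False
    print(all(representable(n) for n in range(80, 1000)))  # True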

If you’ve been reading this blog for a while, you may recognize this as a special case of the Chicken McNugget problem, which you can think of as the Postage Stamp problem with possibly more than two stamps.

Related posts

Impersonating an Edwardian math professor

I’ve read some math publications from around a century or so ago, and I wondered if I could pull off being a math professor if a time machine dropped me into a math department from the time. I think I’d come across as something of an autistic savant, ignorant of what contemporaries would think of as basic math but fluent in what they’d consider more advanced.

There are two things in particular that were common knowledge at the time that I would be conspicuously ignorant of: interpolation tricks and geometry.

People from previous eras knew interpolation at a deeper level than citing the Lagrange interpolation theorem, out of necessity. They learned time-saving tricks that have since been forgotten.

The biggest gap in my knowledge would be geometry. Mathematicians a century ago had a far deeper knowledge of geometry, particularly synthetic geometry, i.e. geometry in the style of Euclid rather than in the style of Descartes.

Sometimes older math books use notation or terminology that has since changed. I imagine I’d make a few gaffes, not immediately understanding a basic term or using a term that wasn’t coined until later.

If I had to teach a class, I’d choose something like real and complex analysis. Whittaker & Watson’s book on the subject was first published in 1902 and remains a common reference today. The only thing I find jarring about that book is that “show” is spelled “shew.” Makes me think of Ed Sullivan. But I think I’d have a harder time teaching a less advanced class.

Related posts

Maybe Copernicus isn’t coming

Before Copernicus promoted the heliocentric model of the solar system, astronomers added epicycle on top of epicycle, creating ever more complex models of the solar system. The term epicycle is often used derisively to mean something ad hoc and unnecessarily complex.

Copernicus’ model was simpler, but it was less accurate. The increasingly complex models before Copernicus were refinements. They were not ad hoc, nor were they unnecessarily complex, if you must center your coordinate system on Earth.

It’s easy to draw the wrong conclusion from Copernicus, and from any number of other scientists who were able to greatly simplify a previous model. One could be led to believe that whenever something is too complicated, there must be a simpler approach. Sometimes there is, and sometimes there isn’t.

If there isn’t a simpler model, the time spent searching for one is wasted. If there is a simpler model, the time searching for one might still be wasted. Pursuing brute force progress might lead to a simpler model faster than pursuing a simpler model directly.

It all depends. Of course it’s wise to spend at least some time looking for a simple solution. But I think we’re fed too many stories in which the hero comes up with a simpler solution by stepping back from the problem.

Most progress comes from the kind of incremental grind that doesn’t make an inspiring story for children. And when there is a drastic simplification, that simplification usually comes after grinding on a problem, not instead of grinding on it.

3Blue1Brown touches on this in this video. The video follows two hypothetical problem solvers, Alice and Bob, who attack the same problem. Alice is the clever thinker and Bob is the calculating drudge. Alice’s solution of the original problem is certainly more elegant, and more likely to be taught in a classroom. But Bob’s approach generalizes in a way that Alice’s approach, as far as we know, does not.

Related posts

Trigonometric interpolation

Suppose you want to interpolate a set of data points with a combination of sines and cosines.

One way to approach this problem would be to set up a system of equations for the coefficients of the sines and cosines. If you have N data points, you will get a system of N equations in N unknowns. The system will have a unique solution, though this is not obvious a priori.

Another approach would be to use the discrete Fourier transform (DFT). This is the approach that would commonly be used in practice. It’s even further from obvious a priori that this would work, but it does. (The DFT is so often computed using the FFT algorithm that the transform is often referred to by the algorithm name. If you’d like, mentally substitute FFT for DFT in the rest of the post.)

There are multiple ways to motivate the DFT, and the way suggested by the name is to derive the DFT as a discrete approximation to the (continuous) Fourier transform. Why should a discrete approximation to an integral transform also solve an interpolation problem? This doesn’t sound inevitable, or even plausible, but it is the case.

Another way to motivate the DFT is as the least-squares solution to fitting a sum of sines and cosines to a set of points. Since this is phrased as an optimization problem rather than an interpolation problem, it is clear that it will have a solution. However, it is not clear that the error in the optimal fit will in fact be zero. Furthermore, the equation for the coefficients in the solution is the same as the equation for the DFT. You can find a derivation in [1].

Example

Let’s take the vector [3, 1, 4, 1, 5, 9] and find trig functions that pass through these points. We can use the FFT as implemented in Python’s SciPy library to find a set of complex exponentials that pass through the points.

    from scipy.fft import fft
    from numpy import exp, array, pi, round

    x = array([3, 1, 4, 1, 5, 9])
    y = fft(x)

    N = len(x)
    z = [sum([exp(2j*pi*k*n/N)*y[k] for k in range(N)])/N for n in range(N)]

Aside from rounding errors on the order of 10⁻¹⁵, the vector z equals the vector x.

Turning the expression for z above into a mathematical expression, we have

f(x) = (y₀ + y₁ exp(πix/3) + y₂ exp(2πix/3) + y₃ exp(πix) + y₄ exp(4πix/3) + y₅ exp(5πix/3)) / 6

where the y’s come from the FFT above.

To find sines and cosines we need to use Euler’s formula

exp(iθ) = cos(θ) + i sin(θ)

Because we started with real data x, there are symmetries in the FFT components y that simplify the reduction of the complex function f to a real-valued function g written in terms of sines and cosines: the components pair up as complex conjugates (y₅ is the conjugate of y₁, and y₄ is the conjugate of y₂), so the imaginary parts cancel out.

6 g(x) = y₀ + (y₁ + y₅) cos(πx/3) + (y₂ + y₄) cos(2πx/3) + y₃ cos(πx)
+ i (y₁ − y₅) sin(πx/3) + i (y₂ − y₄) sin(2πx/3)

and so

g(x) = 3.833 + 0.8333 cos(πx/3) − 1.833 cos(2πx/3) + 0.1666 cos(πx)
− 2.5981 sin(πx/3) − 2.0207 sin(2πx/3)

Here’s a plot that verifies that g(x) passes through the specified points.
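
As a non-graphical check, evaluating g at the integers 0 through 5 recovers the original data, up to the rounding of the coefficients above. Here is a quick sketch:

    import numpy as np

    def g(x):
        return (3.833 + 0.8333 * np.cos(np.pi * x / 3)
                - 1.833 * np.cos(2 * np.pi * x / 3)
                + 0.1666 * np.cos(np.pi * x)
                - 2.5981 * np.sin(np.pi * x / 3)
                - 2.0207 * np.sin(2 * np.pi * x / 3))

    print(np.round(g(np.arange(6)), 2))  # approximately [3, 1, 4, 1, 5, 9]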

Related posts

[1] William L. Briggs and Van Emden Henson. The DFT: An Owner’s Manual for the Discrete Fourier Transform. SIAM 1995.

Moments with Laplace

This is a quick note to mention a connection between two recent posts, namely today’s post about moments and a post from a few days ago about the Laplace transform.

Let f(t) be a function on [0, ∞) and F(s) be the Laplace transform of f(t).

F(s) = \int_0^\infty e^{-st} f(t) \,dt

Then the nth moment of f,

m_n = \int_0^\infty t^n \, f(t)\, dt

is equal to the nth derivative of F, evaluated at 0, with an alternating sign:

(-1)^n F^{(n)}(0) = m_n

To see this, differentiate with respect to s inside the integral defining the Laplace transform. Each time you differentiate you pick up a factor of −t, so differentiating n times you pick up a term (−1)ⁿ tⁿ, and evaluating at s = 0 makes the exponential term go away.
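
Here is a small symbolic verification with SymPy (a sketch, not from the original post), using f(t) = exp(−t), whose Laplace transform is 1/(s + 1) and whose nth moment is n!:

    import sympy as sp

    t, s = sp.symbols('t s', positive=True)
    f = sp.exp(-t)
    F = sp.integrate(sp.exp(-s * t) * f, (t, 0, sp.oo))  # Laplace transform: 1/(s + 1)

    n = 4
    moment = sp.integrate(t**n * f, (t, 0, sp.oo))       # 4th moment: 24
    derivative = (-1)**n * sp.diff(F, s, n).subs(s, 0)   # (-1)^n F^(n)(0): 24
    print(moment, derivative)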

Related posts

The impossible puzzle

It’s fascinating that there’s such a thing as the World Jigsaw Puzzle Championship. The winning team of the two-person thousand-piece puzzle round can assemble a Ravensburger puzzle in less than an hour—that’s about 3½ seconds per piece.

It makes you wonder, how could you measure the hardness of a jigsaw puzzle? And what would a “hardest puzzle” look like?

You can find some puzzles that are claimed to be “hardest.” A first step to creating a hard puzzle is removing the front image entirely, eliminating visual cues. Some puzzles do this. This can be partly simulated with a normal puzzle by putting it together with the image side face down.

But you can do more: make every piece a square exactly the same size, and make the “tab” and “blank” on each of the four sides of each square exactly the same size, shape and alignment. One can see that this gives you 2⁴ = 16 possible kinds of puzzle piece; however, due to symmetries you actually have only six different kinds of puzzle piece. Now you are left with a bare minimum of shape cues to help you assemble the puzzle.

For the sake of the argument, one can ignore the “edge” pieces: for a rectangular puzzle, the number of edge pieces relative to interior pieces becomes smaller and smaller as you scale up the size. For example, a 36 × 28 = 1008 piece puzzle has 2 × 36 + 2 × 28 − 4 = 124 edge pieces, about 12 percent of the whole. The ratio shrinks as you scale up puzzle size. And there is such a thing as a 42,000-piece jigsaw puzzle.
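
A short calculation (a sketch; the 210 × 200 layout is a hypothetical example) shows how quickly the edge fraction falls off:

    def edge_fraction(w, h):
        # Fraction of pieces on the border of a w-by-h rectangular puzzle.
        return (2 * w + 2 * h - 4) / (w * h)

    print(edge_fraction(36, 28))    # about 0.12
    print(edge_fraction(210, 200))  # a hypothetical 42,000-piece layout: about 0.02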

The problem of assembling such a puzzle is known to be NP-complete. This means that the best known algorithms require compute time that grows exponentially as the number of pieces increases.

Of course there are simpler special cases. Consider a checkerboard configuration, in which “black square” location pieces have four tabs and “red square” locations have four blanks, thus only two different kinds of piece. A competent puzzler could assemble this at one piece per second, or about 17 minutes for a 1000-piece puzzle.

However, as far as we know, the number of “special cases” like this is vanishingly small for large puzzles. An intuition for this can be taken from Kolmogorov complexity theory, using a counting argument to show that only very few long strings can be compressed to a short program.

Most cases are really hard, for example those constructed by making a totally random assignment of the configuration of each piece-to-piece boundary. Solving such a puzzle is hard because of ambiguities: you may have the puzzle largely complete and then discover that a wrong decision made early forces you to backtrack and undo much of your work.

A very weak upper bound on the assembly time can be derived by considering the brute force case. For a 1000-piece puzzle, there are 1000! (factorial) possible ways to (attempt to) assemble the puzzle. Suppose one places every piece and then fully backtracks for each trial—this gives 1000 × 1000! puzzle piece placement steps. Using Stirling’s approximation, this is approximately 10⁴⁴⁶⁸ placement steps.

A human, assuming generously a rate of one step per second, at 31 million seconds/year nonstop, would take on the order of 10⁴⁴⁶⁰ years. Assuming 10 billion simultaneous puzzlers—order 10⁴⁴⁵⁰ years.

Now assume the on-record world’s fastest computer, the ORNL Frontier system (though see here), roughly 10 exaflops (10¹⁹) mixed precision, and assuming one puzzle step per operation (overly optimistic). Assume further that every atom in the known universe (est. 10⁸²) is a Frontier computer system. In this case—solving the puzzle takes order 10⁴³⁵⁹ years. That’s a thousand billion billion … (repeat “billion” 484 times here) years.

As I mentioned, this is a very weak upper bound. One could solve the problem using a SAT solver with sophisticated CDCL backtracking. Or one could use special methods derived from statistical mechanics that handle random SAT instances. However, to the best of our knowledge, the runtime still grows exponentially as a function of puzzle size. The best worst-case computational complexity for SAT solvers currently known is order O(1.306995ⁿ) (see here, here, and here). So, simply by constructing puzzles in the way we’ve described with more and more pieces, you will soon reach a point of having a puzzle that is unfathomably difficult to solve.

It’s conceivable that a yet-undiscovered method could be found for which the solution cost would in fact grow polynomially instead of exponentially. But people have been trying to answer this question one way or the other for over 50 years and it’s still unsolved, with an open million dollar prize for proof or counterexample. So we just don’t know.

What about quantum computing? It can solve some problems believed to be classically intractable, like factoring, in polynomial time. But how about the SAT problem? It is simply not known (see here, here) whether a polynomial-time quantum algorithm exists for arbitrary NP-complete problems. None are known, and it might not be possible at all (though research continues).

So we’re faced with the curious possibility: a jigsaw puzzle could be constructed such that if you were holding the box of pieces in your hand and spilled it on the floor, there would be no single or joint intelligence or computational mechanism, existing or even possible, in the entire universe that could ever reassemble it.