Maybe Copernicus isn’t coming

Before Copernicus promoted the heliocentric model of the solar system, astronomers added epicycle on top of epicycle, creating ever more complex models. The term epicycle is often used derisively to mean something ad hoc and unnecessarily complex.

Copernicus’ model was simpler, but it was less accurate. The increasingly complex models before Copernicus were refinements. They were not ad hoc, nor were they unnecessarily complex, if you must center your coordinate system on Earth.

It’s easy to draw the wrong conclusion from Copernicus, and from any number of other scientists who were able to greatly simplify a previous model. One could be led to believe that whenever something is too complicated, there must be a simpler approach. Sometimes there is, and sometimes there isn’t.

If there isn’t a simpler model, the time spent searching for one is wasted. If there is a simpler model, the time searching for one might still be wasted. Pursuing brute force progress might lead to a simpler model faster than pursuing a simpler model directly.

It all depends. Of course it’s wise to spend at least some time looking for a simple solution. But I think we’re fed too many stories in which the hero comes up with a simpler solution by stepping back from the problem.

Most progress comes from the kind of incremental grind that doesn’t make an inspiring story for children. And when there is a drastic simplification, that simplification usually comes after grinding on a problem, not instead of grinding on it.

3Blue1Brown touches on this in this video. The video follows two hypothetical problem solvers, Alice and Bob, who attack the same problem. Alice is the clever thinker and Bob is the calculating drudge. Alice’s solution of the original problem is certainly more elegant, and more likely to be taught in a classroom. But Bob’s approach generalizes in a way that Alice’s approach, as far as we know, does not.


Trigonometric interpolation

Suppose you want to interpolate a set of data points with a combination of sines and cosines.

One way to approach this problem would be to set up a system of equations for the coefficients of the sines and cosines. If you have N data points, you will get a system of N equations in N unknowns. The system will have a unique solution, though this is not obvious a priori.
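Here's a minimal sketch of that direct approach, using complex exponentials as the basis functions and the data vector from the example below; the code is illustrative, not a quote from any library documentation.

    import numpy as np

    x = np.array([3, 1, 4, 1, 5, 9])
    N = len(x)

    # A[n, k] = exp(2 pi i k n / N): the k-th basis function
    # evaluated at the n-th sample point.
    n, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    A = np.exp(2j*np.pi*k*n/N)

    # Solve the N x N system for the coefficients. The solution
    # agrees with fft(x)/N, computed in the example below.
    c = np.linalg.solve(A, x)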

Another approach would be to use the discrete Fourier transform (DFT). This is the approach that would commonly be used in practice. It’s even further from obvious a priori that this would work, but it does. (The DFT is so often computed using the FFT algorithm that the transform is often referred to by the algorithm name. If you’d like, mentally substitute FFT for DFT in the rest of the post.)

There are multiple ways to motivate the DFT, and the way suggested by the name is to derive the DFT as a discrete approximation to the (continuous) Fourier transform. Why should a discrete approximation to an integral transform also solve an interpolation problem? This doesn’t sound inevitable, or even plausible, but it is the case.

Another way to motivate the DFT is as the least-squares solution to fitting a sum of sines and cosines to a set of points. Since this is phrased as an optimization problem rather than an interpolation problem, it is clear that it will have a solution. However, it is not clear that the error in the optimal fit will in fact be zero. Furthermore, the equation for the coefficients in the solution is the same as the equation for the DFT. You can find a derivation in [1].

Example

Let’s take the vector [3, 1, 4, 1, 5, 9] and find trig functions that pass through these points. We can use the FFT as implemented in Python’s SciPy library to find a set of complex exponentials that pass through the points.

    from scipy.fft import fft
    from numpy import exp, array, pi, round

    x = array([3, 1, 4, 1, 5, 9])
    y = fft(x)

    N = len(x)
    # Inverse DFT by hand: z[n] = (1/N) * sum_k y[k] exp(2 pi i k n / N)
    z = [sum([exp(2j*pi*k*n/N)*y[k] for k in range(N)])/N for n in range(N)]

Aside from rounding errors on the order of 10⁻¹⁵, the vector z equals the vector x.

Writing out the expression for z above in mathematical notation, we have

f(x) = (y0 + y1 exp(πix/3) + y2 exp(2πix/3) + y3 exp(πix) + y4 exp(4πix/3) + y5 exp(5πix/3))/6

where the y's come from the FFT above.

To find sines and cosines we need to use Euler’s formula

exp(iθ) = cos(θ) + i sin(θ)

Because we started with real data x, there will be symmetries in the FFT components y that simplify the reduction of the complex function f to a real-valued function g using sines and cosines: some of the components are complex conjugates of each other, and so the imaginary parts cancel out.

6 g(x) = y0 + (y1 + y5) cos(πx/3) + (y2 + y4) cos(2πx/3) + y3 cos(πx)
+ i (y1 − y5) sin(πx/3) + i (y2 − y4) sin(2πx/3)

and so

g(x) = 3.833 + 0.8333 cos(πx/3) − 1.833 cos(2πx/3) + 0.1667 cos(πx)
− 2.5981 sin(πx/3) − 2.0207 sin(2πx/3)

Here’s a plot that verifies that g(x) passes through the specified points.
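Here's a sketch of how such a plot could be made, reusing x and y from the code above. The real and imaginary parts implement the conjugate symmetry described earlier.

    import numpy as np
    import matplotlib.pyplot as plt

    def g(t):
        return (y[0].real
                + (y[1] + y[5]).real * np.cos(np.pi*t/3)
                + (y[2] + y[4]).real * np.cos(2*np.pi*t/3)
                + y[3].real * np.cos(np.pi*t)
                - (y[1] - y[5]).imag * np.sin(np.pi*t/3)
                - (y[2] - y[4]).imag * np.sin(2*np.pi*t/3)) / 6

    t = np.linspace(0, 5, 301)
    plt.plot(t, g(t))
    plt.plot(range(6), x, 'o')  # g interpolates the data points
    plt.show()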


[1] William L. Briggs and Van Emden Henson. The DFT: An Owner’s Manual for the Discrete Fourier Transform. SIAM 1995.

Moments with Laplace

This is a quick note to mention a connection between two recent posts: today's post about moments and a post from a few days ago about the Laplace transform.

Let f(t) be a function on [0, ∞) and F(s) be the Laplace transform of f(t).

F(s) = \int_0^\infty e^{-st} f(t) \,dt

Then the nth moment of f,

m_n = \int_0^\infty t^n \, f(t)\, dt

is equal to the nth derivative of F, evaluated at 0, with an alternating sign:

(-1)^n F^{(n)}(0) = m_n

To see this, differentiate with respect to s inside the integral defining the Laplace transform. Each time you differentiate you pick up a factor of −t, so differentiating n times you pick up a term (−1)ⁿtⁿ, and evaluating at s = 0 makes the exponential term go away.
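As a quick sanity check, here's a SymPy sketch using the example f(t) = e⁻ᵗ, whose nth moment is n! and whose transform is 1/(s + 1).

    from sympy import symbols, exp, integrate, diff, oo

    t, s = symbols('t s', positive=True)
    f = exp(-t)
    F = integrate(exp(-s*t)*f, (t, 0, oo))  # Laplace transform, 1/(s + 1)

    for n in range(4):
        m_n = integrate(t**n * f, (t, 0, oo))           # nth moment, n!
        print(m_n, (-1)**n * diff(F, s, n).subs(s, 0))  # the two agree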


When do moments determine a function?

[Image: two girls playing on a seesaw]

The use of the word “moment” in mathematics is related to its use in physics, as in moment arm or moment of inertia. For a non-negative integer n, the nth moment of a function f is the integral of xⁿ f(x) over the function’s domain.

Uniqueness

If two continuous functions f and g have all the same moments, are they the same function? The answer is yes for functions over a finite interval, but no for functions over an unbounded interval. The classic counterexample involves the lognormal distribution: there are infinitely many other probability distributions with exactly the same moments.

Existence

Now let’s consider starting with a set of moments rather than starting with a function. Given a set of moments m₀, m₁, m₂, …, is there a function that has these moments? Typically no.

A better question is what conditions on the moments are necessary for there to exist a function with these moments. This question breaks into three questions:

  1. The Hausdorff moment problem
  2. The Stieltjes moment problem
  3. The Hamburger moment problem

depending on whether the function domain is a finite interval, a half-bounded interval, or the real line. For each problem there are known conditions that are necessary and sufficient, but the conditions are different for each problem.
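To give a flavor of these conditions: for the Hamburger problem, a sequence is the moment sequence of some positive measure on the real line if and only if the Hankel matrices formed from it are positive semidefinite. Here's a sketch of checking that numerically, using the moments of the standard normal distribution as a test case.

    import numpy as np
    from scipy.linalg import hankel

    # Moments of the standard normal: m_n = 0 for odd n, (n-1)!! for even n.
    m = [1, 0, 1, 0, 3, 0, 15, 0, 105]

    # H[i, j] = m[i + j], a Hankel matrix built from the moments.
    H = hankel(m[:5], m[4:])
    print(np.linalg.eigvalsh(H))  # all eigenvalues are nonnegative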

Interestingly, each of the three names Hausdorff, Stieltjes, and Hamburger is well known. Felix Hausdorff is best known for his work in topology: Hausdorff spaces, etc. Thomas Stieltjes is best known for the Riemann-Stieltjes integral, and for his work on continued fractions. Hans Ludwig Hamburger is not as well known, though his last name is certainly familiar.

Finite moments

A practical question in probability is how well a finite number of moments determines a probability distribution. A finite number of moments cannot uniquely determine the distribution, but they do establish bounds on how different two distributions with the same moments can be. See this post.


Band-limited expansion

The band-limited expansion of the function f(x) is given by

f(x) = \sum_{k=-\infty}^\infty f(kh) \, \text{sinc}\left(\frac{x - kh}{h}\right)

where sinc(x) = sin(πx)/πx. This is also called the sinc expansion, or the Whittaker cardinal function, after its discoverer E. T. Whittaker [1].

This is called the band-limited expansion of f because each term in the infinite sum is band-limited, i.e. has Fourier spectrum only within a finite band: the Fourier transform of the sinc function is a step function supported between −1/2 and 1/2 [2].

The band-limited expansion has a lot of nice mathematical properties, leading Whittaker to call it “a function of royal blood in the family of entire functions, whose distinguished properties separate it from its bourgeois brethren.”

We can find a band-limited approximation for f by taking only a finite number of terms in the sum. An advantage of the band-limited approximation over a truncated Fourier series is that the former converges faster, making it useful in numerical algorithms [3]. Here’s an example of approximating the function exp(−x²) by taking h = 1 and using three terms, i.e. k running from −1 to 1.
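Here's a sketch of that computation; note that np.sinc uses the same sin(πx)/πx convention as above.

    import numpy as np

    def f(x):
        return np.exp(-x**2)

    def sinc_approx(x, h=1.0, N=1):
        # Truncated expansion: sum over k = -N, ..., N of f(kh) sinc((x - kh)/h)
        return sum(f(k*h) * np.sinc((x - k*h)/h) for k in range(-N, N + 1))

    x = np.linspace(-3, 3, 601)
    print(np.max(np.abs(f(x) - sinc_approx(x))))  # maximum approximation error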

You can improve the accuracy of the approximation by decreasing the step size h or by increasing the number of terms, i.e. letting k run from −N to N for a larger N. This post explains how to pick the trade-off between h and N to minimize approximation error.


[1] E. T. Whittaker, On the functions which are represented by the expansions of the interpolation theory, Proc. Roy. Soc. Edinburgh, 35 (1915), pp. 181–194.

[2] You may get a different interval of support if you use a different convention for defining the Fourier transform. Unfortunately there are many conventions.

[3] Frank Stenger. Numerical Methods Based on Whittaker Cardinal, or Sinc Functions. SIAM Review, Vol. 23, No. 2 (1981), pp. 165–224.

Delay differential equations

Sometimes the future state of a system depends not only on the current state (position, velocity, acceleration, etc.) but also on the previous state. Equations for modeling such systems are known as delay differential equations (DDEs), difference differential equations, retarded equations, etc. In a system with hysteresis, it matters not only where you are but how you got there.

The most basic theory of delay differential equations is fairly simple. Suppose you have an equation like the following.

u′(t) + u(t − ω) = f(t).

To uniquely determine a solution, we need an initial condition, and we need more than the value of u(0): we need a function g(t) that gives the value of u on the entire interval [0, ω].

So we initially have the value of u over [0, ω]. Next, over the interval [ω, 2ω], the value of u(t − ω) is known: we can replace that term in the DDE with g(t − ω) and solve the resulting ordinary differential equation. After we’ve solved our equation over [ω, 2ω], we can use the solution to solve the equation over [2ω, 3ω], and so forth. This process is called the method of steps.
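Here's a minimal sketch of the method of steps in SciPy for the equation above, assuming for illustration the history g(t) = 1 on [0, ω] and forcing f(t) = 0.

    from scipy.integrate import solve_ivp

    w = 1.0                # the delay ω
    g = lambda t: 1.0      # assumed initial function on [0, ω]
    f = lambda t: 0.0      # assumed forcing term

    history = g            # solution on the previous interval
    u0 = g(w)              # continuity at the end of the history
    pieces = []

    for j in range(1, 4):  # solve over [ω, 2ω], [2ω, 3ω], [3ω, 4ω]
        a, b = j*w, (j + 1)*w
        # On [a, b] the delayed term u(t - ω) is known from the last step.
        rhs = lambda t, u, h=history: [f(t) - h(t - w)]
        sol = solve_ivp(rhs, (a, b), [u0], dense_output=True)
        pieces.append(sol)
        history = lambda t, s=sol: s.sol(t)[0]  # history for the next step
        u0 = sol.y[0, -1]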

Although you can solve DDEs using the method of steps, this might not be the best approach. It might be more computationally efficient, or theoretically convenient, to use another method to solve such equations, such as Laplace transforms. The method of steps might convince you that a solution exists, but it might not, for example, be the best way to determine the limiting behavior of solutions.

Laplace transform inversion theorems

Laplace transforms, as presented in a typical differential equations course, are not very useful. Laplace transforms are useful, just not as usually presented.

The use of Laplace transforms is typically presented as follows:

  1. Transform your differential equation into an algebraic equation.
  2. Solve the algebraic equation.
  3. Invert the transform to obtain your solution.

This is correct, but step 3 is typically presented in a misleading way. For pedagogical reasons, students are only given problems for which the last step is easy: they’re given a table with functions on the left and transforms on the right, and they compute an inverse transform by recognizing the result of step 2 in the right column.

Because of this limitation, Laplace transforms, as presented in an introductory course, can only solve problems that could just as easily be solved by other methods taught in the same course.

What good is it, in an undergraduate classroom setting, if you reduce a problem to inverting a Laplace transform but the inverse problem doesn’t have a simple solution?

Of course in practice, rather than in a classroom, it might be very useful to reduce a complicated problem to the problem of inverting a Laplace transform. The latter problem may not be trivial, but it’s a standard problem. You could ask someone to solve the inversion problem who does not understand where the transform of the solution came from.

Laplace inversion theorem

The most well-known Laplace inversion theorem states that if f is a function and F is the Laplace transform of f, then you can recover f from F via the following integral.

f(x) = \frac{1}{2\pi i} \int_{c -\infty i}^{c + \infty i} \exp(sx) \, F(s)\, ds

It’s understandable that you wouldn’t want to present this to most differential equation students. It’s not even clear what the right hand side means, much less how you would calculate it. As for what it means, it says you can calculate the integral along any line parallel to the imaginary axis. In practice, the integral may be evaluated using contour integration, in particular using the so-called Bromwich contour.
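If you only need numbers, mpmath can carry out the inversion for you; its Talbot method evaluates the integral along a deformed Bromwich contour. A quick sketch, recovering f(t) = e⁻ᵗ from F(s) = 1/(s + 1):

    from mpmath import invertlaplace, exp

    F = lambda s: 1/(s + 1)
    print(invertlaplace(F, 1.0, method='talbot'))  # approximately exp(-1)
    print(exp(-1))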

It might be difficult to invert the Laplace transform, either numerically or analytically, but at least this is a separate problem from whatever problem led to it. Maybe the original problem was more difficult, such as a complicated delay differential equation.

Post-Widder theorem

There is a lesser-known theorem for inverting a Laplace transform, the Post-Widder theorem. It says

f(x) = \lim_{n\to\infty} \frac{(-1)^n}{n!} \left( \frac{n}{x} \right)^{n+1} F^{(n)}\left( \frac{n}{x} \right)

where F⁽ⁿ⁾ is the nth derivative of F. This may not be an improvement—it might be much worse than evaluating the integral above—but it’s an option. It doesn’t involve functions of a complex variable, so in that sense it is more elementary [1].
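Here's a sketch of the Post-Widder formula in SymPy, again using the example F(s) = 1/(s + 1), whose inverse transform is e⁻ˣ. The convergence is visibly slow.

    from sympy import symbols, diff, factorial, Rational, N

    s = symbols('s', positive=True)
    F = 1/(s + 1)  # Laplace transform of exp(-t)

    def post_widder(n, x):
        a = Rational(n, 1)/x                    # evaluation point s = n/x
        Fn = diff(F, s, n)                      # nth derivative of F
        return N((-1)**n/factorial(n) * a**(n + 1) * Fn.subs(s, a))

    for n in [5, 20, 80]:
        print(post_widder(n, 1))  # slowly approaches exp(-1) = 0.3679...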


[1] The use of the word elementary in mathematics can be puzzling. Particularly in the context of number theory, elementary essentially means “without using complex variables.” An elementary proof may be far more difficult to follow than a proof using complex variables.

Mellin transform and Riemann zeta

The Mellin transform of a function f is defined as

{\cal M}(f)(s) = \int_0^\infty x^{s-1} f(x)\, dx

For example, it follows directly from the definition that the gamma function Γ(s) is the Mellin transform of the function e⁻ˣ.

I ran across an exercise that states an impressive-looking theorem about the Mellin transform, namely that

{\cal M}\left( \sum_{k=1}^\infty f(kx) \right) = \zeta(s) F(s)

where F(s) denotes the Mellin transform of f(x).

I suppose the theorem looks impressive, at least in part, because it involves the Riemann zeta function. But although the zeta function holds deep mysteries—a conjecture about the location of its zeros may be the most famous open question in mathematics—not everything involving the zeta function is deep. The zeta function has, for example, come up here many times in the context of fairly simple probability problems.

I don’t know whether the theorem above is important. It may be. It resembles the sampling theorem, an important theorem in signal processing, and maybe it has analogous applications. But in any case it is not difficult to prove. A simple change of variables applied to the definition shows that

{\cal M}\left(f(ax)\right) = a^{-s} F(s)

The proof using the equation above is simply this:

{\cal M}\left( \sum_{k=1}^\infty f(kx) \right) = \sum_{k=1}^\infty {\cal M}\left(f(kx)\right) = \sum_{k=1}^\infty k^{-s} F(s) = \zeta(s) F(s)
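As a numerical sanity check, take f(x) = e⁻ˣ. Then F(s) = Γ(s), the sum Σ f(kx) equals 1/(eˣ − 1), and the theorem says the Mellin transform of 1/(eˣ − 1) is ζ(s)Γ(s):

    from mpmath import mp, quad, exp, gamma, zeta, inf

    mp.dps = 30
    s = 3.5
    lhs = quad(lambda x: x**(s - 1)/(exp(x) - 1), [0, inf])
    print(lhs)                # the two values agree
    print(zeta(s)*gamma(s))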


Sawtooth waves

I woke up around 3:00 this morning to some sort of alarm outside. It did not sound like a car alarm; it sounded like a sawtooth wave. The pattern was like a few Morse code O’s. Not SOS, or I would have gotten up to see if anyone needed help. Just O’s.

A sawtooth wave takes its name from the shape of its waveform: it looks like the edge of a saw. It also sounds a little jagged.

Sawtooth waves have come up several times here. For one thing, they have rich harmonics. Because the waveform is discontinuous, the Fourier coefficients decay to zero slowly. I wrote about that here. The post is about square waves and triangular waves, but sawtooth waves are very similar.
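For example, on (−π, π) the sawtooth t equals its Fourier series 2 Σ (−1)^(n+1) sin(nt)/n, whose coefficients decay like 1/n. Here's a sketch of a partial sum:

    import numpy as np

    t = np.linspace(-np.pi, np.pi, 1001)

    def sawtooth_partial(t, N):
        # Partial sum of 2 * sum_{n=1}^N (-1)**(n+1) * sin(n t)/n,
        # which converges (slowly) to t on (-pi, pi).
        return 2*sum((-1)**(n + 1)*np.sin(n*t)/n for n in range(1, N + 1))

    err = np.max(np.abs(t[1:-1] - sawtooth_partial(t[1:-1], 50)))
    print(err)  # still large near the jump (Gibbs phenomenon)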

Here’s a post on oscillators with a sawtooth forcing function.

I took sawtooth functions in a different direction in this post that started with an exercise from Knuth’s TAOCP. This led me down a rabbit hole on replicative functions and multiplication theorems in different contexts.

If I remember correctly, the sound used for red alerts in Star Trek TOS started with a sawtooth wave. Early synthesizers had sawtooth generators because, as mentioned above, these waves are rich in overtones and can be manipulated to create interesting sounds such as the red alert sound.

Pioneering work is ugly

“A mathematician’s reputation rests on the number of bad proofs he has given. (Pioneer work is clumsy.)” — A. S. Besicovitch

I’m sure I’ve written about this quote somewhere, but I can’t find where. The quote comes from A Mathematician’s Miscellany by J. E. Littlewood, citing Besicovitch.

I’ve more often seen the quote concluding with “Pioneering work is ugly.” Maybe that’s what Besicovitch actually said, but I suspect it came from someone misremembering/improving Littlewood’s citation. Since the words are in parentheses, perhaps Besicovitch didn’t say them at all but Littlewood added them as commentary.

One way of interpreting the quote is to say it takes more creativity to produce a rough draft than to edit it.

The quote came to mind when I was talking to a colleague about the value of ugly code, code that is either used once or that serves as a prototype for something more lasting.

This is nearly the opposite of the attitude I had as a software developer and as a software team manager. But it all depends on context. Software techniques need to scale down as well as scale up. It doesn’t make sense to apply the same formality to disposable code and to enterprise software.

Yes, supposedly disposable code can become permanent. And as soon as someone realizes that disposable code isn’t being disposed of, it should be tidied up. But to write every little one-liner as if it were going to be preserved for posterity is absurd.