Simple error function approximation

I recently ran across the fact that

\int_0^x \exp(-t^2)\, dt \approx \sin(\sin(x))

is a remarkably good approximation for −1 ≤ x ≤ 1.

Since the integral above defines the error function erf(x), modulo a constant, this says we have a good approximation for the error function

\text{erf}(x) \approx \frac{2}{\sqrt{\pi}} \sin( \sin(x) )

again provided −1 ≤ x ≤ 1.

The error function is closely related to the Gaussian integral, i.e. the normal probability distribution CDF Φ. The relation between erf and Φ is simple but error-prone. I wrote up a page of notes for myself a few years ago so I wouldn’t make a mistake again when moving between these functions and their inverses.

You can derive the approximation by writing out the power series for exp(t), substituting −t² for t, and integrating term-by-term from 0 to x. The result is x − x³/3 + x⁵/10 − ⋯, which agrees with the power series for sin(sin(x)) through the x⁵ term, so the error is on the order of x⁷. Here’s a plot of the error.

The error is extremely small near 0, which is what you’d expect since the error is on the order of x⁷.
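Here’s a quick numerical check of the approximation, using the erf function in Python’s standard math module.

    from math import erf, sin, sqrt, pi

    # Compare erf(x) with (2/sqrt(pi)) * sin(sin(x)) on a grid over [-1, 1].
    worst = 0.0
    for i in range(-100, 101):
        x = i / 100
        worst = max(worst, abs(erf(x) - 2/sqrt(pi)*sin(sin(x))))
    print(worst)  # roughly 0.0013, the worst error, attained at the endpoints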

Pressing the cosine key over and over

No matter what number you start with, if you press the cos key on a calculator repeatedly, the numbers eventually quit changing. This fact has been rediscovered by countless children playing with calculators.

If you start with 0, which is likely the default when you turn on a calculator, you’ll hit the final value after 90 steps [1]. You could verify this with the following Python code.

    from math import cos

    x = 0
    for i in range(100):
        x = cos(x)
        print(i, x) 

Starting with iteration 90, the code prints 0.7390851332151607 every time.

Visualizing convergence

Let’s visualize the sequence of values using a cobweb plot. Here’s an image I made for a post four years ago, starting with x = 1.

cobweb plot for iterating cosine

To read the graph, start at x = 1 on the horizontal axis. The solid black line is a plot of the cosine function, and so the point above 1 where the blue line starts is y = cos(1) = 0.54.

Since we’re going to stick our output back into cosine as input, we need to turn our y into an x. We do this by sliding the point over to the dotted line representing y = x. This creates the horizontal blue line that is the first segment of our spiral. This takes us to the point (cos(1), cos(1)).

When we take the cosine again, we get cos(cos(1)) = 0.86, and so we move from our previous value of y = 0.54 up to our new value of y = 0.86. This gives the next segment of our spiral. Again we need to turn our y into an x, so we slide over to the line y = x as before, only this time we’re approaching from the left side rather than from the right.

We quickly get to where we can no longer see the convergence. The plot above used 20 iterations, and we end up with a blue blob around the point of convergence.
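For what it’s worth, here is a sketch of how such a cobweb diagram could be drawn with matplotlib. This is not the code behind the image above, just a minimal reconstruction of the construction described in the last few paragraphs.

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.linspace(0.4, 1.05, 200)
    plt.plot(t, np.cos(t), "k")   # the cosine curve
    plt.plot(t, t, "k:")          # the dotted line y = x

    x = 1.0
    for _ in range(20):
        y = np.cos(x)
        plt.plot([x, y], [y, y], "b")          # slide over to the line y = x
        plt.plot([y, y], [y, np.cos(y)], "b")  # apply cosine again
        x = y

    plt.gca().set_aspect("equal")
    plt.show()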

Proving convergence

We said that the iteration converges for any starting point. Why is that?

For one thing, we might as well assume x is between 0 and 1; if it isn’t at the start, it will be after an iteration or two.

The mean value theorem says for any pair x₁ and x₂ in [0, 1],

cos(x₁) − cos(x₂) = −sin(c) (x₁ − x₂)

for some c in [0, 1] because the derivative of cos(x) is − sin(x). It follows that

| cos(x₁) − cos(x₂) | ≤ sin(1) | x₁ − x₂ |

because the maximum value of sin(x) on [0, 1] is sin(1). Since sin(1) = 0.84… is less than 1, this says that cosine is a contraction mapping on [0, 1], and so by the Banach fixed point theorem it has a unique fixed point p with cos(p) = p, and the iterates converge to it.

Rate of convergence

We could use this to calculate an upper bound on how far x is from the fixed point p after k iterations:

| x − p | ≤ sin(1)^k

and so if we want to be within a distance ε of the fixed point, we need

k ≥ log(ε) / log(sin(1)).

This says that to get within floating point precision (about 10⁻¹⁶) of the fixed point, 214 iterations will do. This is true, but it’s a pessimistic estimate because it’s based on a pessimistic estimate of sin(c).

Once we get close to the fixed point p, the rate of convergence is more like sin(p)^k than sin(1)^k. This suggests we should be within floating point precision of the fixed point after about 90 iterations, which is what we saw at the top of the post.
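Here’s a quick check of both estimates.

    from math import sin, log, ceil

    eps = 1e-16
    p = 0.7390851332151607   # the fixed point found above

    print(ceil(log(eps) / log(sin(1))))  # 214, the pessimistic bound
    print(ceil(log(eps) / log(sin(p))))  # 94, close to the ~90 iterations observed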


[1] This is true if your calculator is using 64-bit floating point numbers. Your mileage may vary if you have a calculator that retains more or less precision.

Perpetual Calendars

The previous post explained why the Gregorian calendar is the way it is, and that 400 years of the calendar contain a whole number of weeks. It follows that the Gregorian calendar repeats itself every 400 years. For example, the calendar for 2025 will be exactly the same as the calendar for 1625 and 2425.

There are only 14 possible printed calendars, if you don’t print the year on the calendar. There are seven possibilities for the day of the week for New Year’s Day, and there are two possibilities for whether the year is a leap year.

A perpetual calendar is a set of the 14 possible calendars, along with some index that tells which possible calendar is appropriate in a given year.

Are the 14 calendars all equally frequent? No, because only 97 out of every 400 calendars include a leap day.

Are the non-leap year calendars equally frequent among themselves? No, because there are 303 non-leap years per cycle, and 303 is not divisible by 7. What about the leap year calendars? Again no, because 97 is not divisible by 7 either.

Now we might expect that within ordinary years, or within leap years, each potential calendar is used about the same number of times. But is that true? Here’s some Python code to answer that question.

    ordinary_counts = [0]*7
    leap_counts     = [0]*7

    def leap(y):
        return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

    # year 0 of the cycle is a leap year beginning on day 0
    leap_counts[0] = 1

    # days from January 1 of year 0 to January 1 of year y
    d = 0

    for y in range(1, 400):
        d += 365
        # one more day if the previous year was a leap year
        if leap(y - 1):
            d += 1
        if leap(y):
            leap_counts[d % 7] += 1
        else:
            ordinary_counts[d % 7] += 1

    print(ordinary_counts)
    print(leap_counts)

Here’s the output:

    [43, 43, 43, 44, 43, 44, 43]
    [13, 15, 13, 14, 14, 13, 15]

The calendar types are nearly, but not exactly, evenly distributed. Each ordinary-year calendar occurs 43 or 44 times per 400-year cycle, and each leap-year calendar occurs 13, 14, or 15 times, so within leap years the most common calendar type is about 15% more common than the least common.
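As a sanity check, here’s an independent count using Python’s datetime module, taking the cycle to run from 2000 through 2399 so that index 0 corresponds to Saturday as above.

    from datetime import date

    ordinary, leap_years = [0]*7, [0]*7
    for year in range(2000, 2400):
        # date.weekday() gives Monday = 0, so shift to make Saturday = 0
        idx = (date(year, 1, 1).weekday() - 5) % 7
        if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
            leap_years[idx] += 1
        else:
            ordinary[idx] += 1

    print(ordinary)    # [43, 43, 43, 44, 43, 44, 43]
    print(leap_years)  # [13, 15, 13, 14, 14, 13, 15]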

Here are bar charts of the frequencies. The charts start with Saturday because January 1, 2000 was a Saturday.
First, ordinary years:

Then, leap years:


Gregorian Calendar and Number Theory

The time it takes for the earth to orbit the sun is not an integer multiple of the time it takes for the earth to rotate on its axis, nor is it a rational number with a small denominator. Why should it be? Much of the complexity of our calendar can be explained by rational approximations to an irrational number.

Rational approximation

The ratio is of course approximately 365. A better approximation is 365.25, but that’s not right either. A still better approximation would be 365.2422.

A slightly less accurate, but more convenient, approximation is 365.2425. Why is that more convenient? Because 0.2425 = 97/400, and 400 is a convenient number to work with.

A calendar based on a year consisting of an average of 365.2425 days would have 365 days most years, with 97 out of 400 years having 366 days.

In order to spread 97 longer years over the cycle of 400 years, you could insert an extra day every four years, but make three exceptions, such as the years divisible by 100 but not by 400. That’s the Gregorian calendar that we use.

Its predecessor, the Julian calendar, had an average year of 365.25 days, which was good enough for a while, but the errors began to accumulate to the point that the seasons were drifting noticeably with respect to the calendar.

Not much room for improvement

It would be possible to create a calendar with an even more accurate average year length, but at the cost of more complexity. Even so, such a calendar wouldn’t be much more accurate. After all, even the number we’ve been trying to approximate, 365.2422, isn’t entirely accurate.

The ratio of the time of the earth’s orbit to the time of its rotation isn’t even entirely constant. The Gregorian calendar is off by about 1 day in 3030 years, but the length of the year varies by about 1 day in 7700 years.

I don’t know how accurately the length of the solar year was known when the Gregorian calendar was designed over four centuries ago. Maybe the error in the calendar was less than the uncertainty in the length of the solar year.

Days of the week

Four centuries of the Gregorian calendar contain 146097 days, which is a multiple of 7. This seems to be a happy coincidence. There was no discussion of weeks in the derivation above.
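To spell out the arithmetic:

400 \times 365 + 97 = 146097 = 7 \times 20871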

The implicit optimization criteria in the design of the calendar were minimizing the discrepancy between the lengths of the average calendar year and the solar year, minimizing the length of the calendar cycle, and using a cycle length that is a round number. It’s plausible that there was no design goal of making the calendar cycle an integer number of weeks.


Golden hospital gowns

Here’s something I posted on X a couple days ago:

There’s no direct connection between AI and cryptocurrency, but they have a similar vibe.

They both leave you wondering whether the emperor is sumptuously clothed, naked, or a mix of both.

Maybe he’s wearing a hospital gown with gold threads.

In case you’re unfamiliar with the story, this is an allusion to The Emperor’s New Clothes, one of the most important stories in literature.

I propose “golden hospital gown” as a metaphor for things that are a fascinating mixture of good and bad, things that have large numbers of haters and fanboys, both with valid points. There’s continual improvement and a lot of work to be done sorting out what works well and what does not.

I tried to get Grok to create an image of what I had in mind by a golden hospital gown. The results were not what I wanted, but passable, which is kinda the point of the comment above. It’s amazing that AI can produce anything remotely resembling a desired image starting from a text description. But there is a very strong temptation to settle for mediocre and vaguely creepy images that aren’t what we really want.

Man in a yellow hospital gown with hairy back and legs exposed


LLMs and regular expressions

Yesterday I needed to write a regular expression as part of a client report. Later I was curious whether an LLM could have generated an equivalent expression.

When I started writing the prompt, I realized it wasn’t trivial to tell the LLM what I wanted. I needed some way to describe the pattern that the expression should match.

“Hmm, what’s the easiest way to describe a text pattern? I know: use a regular expression! Oh wait, …”

Prompt engineering and results

I described the pattern in words, which was more difficult than writing the regular expression itself. The LLM came up with a valid regular expression, along with sample code demonstrating its use, but the expression wasn’t quite right. After a couple more nudges I managed to get it to produce a correct regex.

I had asked for a Perl regular expression, and the LLM did generate syntactically correct Perl [1], both for the regex and the sample code. When I asked it to convert the regex to POSIX form it did so, and when I asked it to convert the regex to Python it did that as well, replete with valid test code.

I repeated my experiment using three different LLMs and got similar results. In all cases, the hardest part was specifying what I wanted. Sometimes the output was correct given what I asked for but not what I intended, a common experience since the dawn of computers. Translating a regex from one syntax flavor to another was easier than generating a correct regex from scratch, easier for both me and the computer: the prompt was easier for me to write, and the LLM did a better job.

Quality assurance for LLMs

Regular expressions and LLMs are complementary. The downside of regular expressions is that you have to specify exactly what you want. The upside is that you can specify exactly what you want. We’ve had several projects lately in which we tested the output of a client’s model using regular expressions and found problems. Sometimes it takes a low-tech tool to find problems with a high-tech tool.
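As a schematic illustration (the format being checked here is made up, not from any client project): suppose each model response is supposed to contain a date in ISO form. A regular expression makes the check trivial.

    import re

    pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO-style date

    responses = [
        "The report was filed on 2024-11-03.",
        "The report was filed on November 3rd.",  # should be flagged
    ]

    for i, response in enumerate(responses):
        if not pattern.search(response):
            print(f"response {i} failed the format check: {response!r}")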

We’ve also tested LLMs using a different LLM. That has been useful because there’s some degree of independence. But we’ve gotten better results using regular expressions since there is a greater degree of independence.


[1] Admittedly that’s a low bar. There’s an old joke that Perl was created by banging on a keyboard then hacking on the compiler until the input compiled.

Coiled logarithmic graph

A logarithmic scale is very useful when you need to plot data over an extremely wide range. However, sometimes even a logarithmic scale may not reduce the visual range enough.

I recently saw a timeline-like graph that was coiled into a spiral, packing more information into a limited visual window [1].

I got to thinking about when this could be useful. This raises two questions.

  1. When might you want to visualize something that grows faster than exponentially?
  2. How would this compare to the radial growth of a spiral as a function of arc length?

Let’s address the second question first. What exactly do we mean by spiral? Archimedean spirals are a class of spirals that include what many people think of as a spiral. These spirals have polar equation

r = b θ^(1/n)

where n is a constant. The choice n = 1 corresponds to spirals with evenly spaced arms, such as a roll of carpet.

When n = 1, the length of the spiral for θ running from 0 to T is on the order of T² when T is large, as shown in this post. For general n, the length is on the order of T^(1 + 1/n).

If n = 1, the distance of the spiral from the origin grows linearly as a function of θ, and the arc length grows quadratically. If the logarithm of the plotted quantity grows quadratically, the quantity itself must be something like exp(kθ²).

So now back to the first question. What is something that grows super exponentially that we might want to plot? The first thing that came to my mind was factorials. Stirling’s approximation shows that the logarithm of factorial grows faster than linearly but slower than any power with exponent larger than 1.

log(x!) = x log x − x + O(log x).

and so if we plot x! on a coiled logarithmic scale, the distance from the image of a point to the origin will grow less than linearly, even if we allow the spiral parameter n to be larger than 1. But for a limited range of x, a coiled logarithmic plot works well. Here’s a polar plot of log(x!)/10.
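Here is a sketch of how such a plot could be produced. This is my reconstruction, not the code behind the original figure: I’m guessing that log(x!)/10 serves as the polar angle and that the points sit on the unit-pitch spiral r = θ.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.special import gammaln   # gammaln(x + 1) = log(x!)

    x = np.arange(1, 101)
    theta = gammaln(x + 1) / 10   # log(x!) / 10 as the angle
    r = theta                     # points on the spiral r = theta

    ax = plt.subplot(projection="polar")
    ax.plot(theta, r, "o-", markersize=3)
    plt.show()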

I don’t know of a natural example of something that grows like exp(kθ²). Of course you could always construct something to have this growth, but I can’t think of a common algorithm, for example, whose run time is the exponential of a quadratic. If you know a good example, please leave a comment.

[1] I can trace the provenance of the image to here, but that page doesn’t say where the image came from.

Solution to a problem of Erdős

How many ways can you select six points in the plane so that every subset of three points forms the vertices of an isosceles triangle? This is a question asked by Erdős and recently resolved.

One solution is to choose the five vertices of a regular pentagon and the center.

Erdos pentagon

It’s easy to verify that this is a solution. The much harder part is to show that this is the only solution.

A uniqueness proof was published on arXiv last Friday: A note on Erdős’s mysterious remark by Zoltán Kovács.


Multiple angles and Osborn’s rule

This post was motivated by an exercise in [1] that says

Prove that for the hyperbolic functions … formulas hold similar to those in Section 2.3 with all the minuses replaced by pluses.

My first thought was that this sounds like Osborn’s rule, a heuristic for translating between (circular) trig identities and hyperbolic trig identities. As explained in that post, Osborn’s rule is an easy consequence of Euler’s identity.

Now what are the formulas the exercise refers to?

Sine to hyperbolic sine

Here’s the identity for sine.

\sin n\theta = \binom{n}{1} \cos^{n-1}\theta \sin\theta - \binom{n}{3} \cos^{n-3}\theta \sin^3\theta + \binom{n}{5} \cos^{n-5}\theta \sin^5\theta -\cdots

Osborn’s rule says to change sin to sinh and cos to cosh, and flip signs whenever two sinh terms are multiplied together. The term with sin³ θ loses its minus sign because two sines are multiplied together. The term with sin⁵ θ changes sign twice, and so the net result is that it doesn’t change sign. So we have the following.

\sinh n\theta = \binom{n}{1} \cosh^{n-1}\theta \sinh\theta + \binom{n}{3} \cosh^{n-3}\theta \sinh^3\theta + \binom{n}{5} \cosh^{n-5}\theta \sinh^5\theta + \cdots

Cosine to hyperbolic cosine

The cosine identity

\cos n\theta = \cos^n \theta - \binom{n}{2} \cos^{n-2}\theta \sin^2 \theta + \binom{n}{4} \cos^{n-4} \theta \sin^4\theta - \cdots

becomes

\cosh n\theta = \cosh^n \theta + \binom{n}{2} \cosh^{n-2}\theta \sinh^2 \theta + \binom{n}{4} \cosh^{n-4} \theta \sinh^4\theta + \cdots

by similar reasoning.

Tangent to hyperbolic tangent

Osborn’s rule applies to tan and tanh as well, if you imagine each tangent as sin/cos.

Thus

\tan n\theta = \frac{\binom{n}{1} \tan \theta - \binom{n}{3} \tan^3 \theta + \binom{n}{5} \tan^5 \theta - \cdots}{1 - \binom{n}{2} \tan^2 \theta + \binom{n}{4} \tan^4\theta - \cdots}

becomes

\tanh n\theta = \frac{\binom{n}{1} \tanh \theta + \binom{n}{3} \tanh^3 \theta + \binom{n}{5} \tanh^5 \theta + \cdots}{1 + \binom{n}{2} \tanh^2 \theta + \binom{n}{4} \tanh^4\theta + \cdots}
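As a quick check, setting n = 2 in these formulas recovers the familiar double-argument identities

\sinh 2\theta = 2 \sinh\theta \cosh\theta, \qquad \cosh 2\theta = \cosh^2\theta + \sinh^2\theta, \qquad \tanh 2\theta = \frac{2\tanh\theta}{1 + \tanh^2\theta}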


[1] Dmitry Fuchs and Serge Tabachnikov. Mathematical Omnibus: Thirty Lectures on Classical Mathematics.

The vis-viva equation

The vis-viva equation greatly simplifies some calculations in orbital mechanics. It is reminiscent of how conservation of energy can sometimes trivialize what appears to be a complicated problem. In fact, the vis-viva equation is derived from conservation of energy, but the derivation is not trivial. Which is good: the effort required in the derivation implies the equation is a shortcut to a place that might take longer to arrive at starting from first principles.

The term vis viva is Latin for “life force” and was applied to mechanics by Leibniz around 350 years ago. The vis-viva equation is also known as the vis-viva law or the vis-viva integral.

The vis-viva equation says that for an object orbiting another object in a Keplerian orbit

v^2 = \mu\left(\frac{2}{r} - \frac{1}{a}\right)

Here v is the relative velocity of the two bodies, r is the distance between their centers of mass, and a is the semi-major axis of the orbit. The constant μ is the standard gravitational parameter. It equals the product of the gravitational constant G and the combined mass M of the two bodies.

In practice it is often accurate enough to let M be the mass of the larger object; this works for GPS satellites circling the earth but would not be adequate to describe Charon orbiting Pluto.

For a circular orbit, r = a and so v² = μ/r. Then r is constant and v is constant.

More generally a is constant but r is continually changing, and the vis-viva equation says how r and v relate at any given instant. This shows, for example, that velocity is smallest when distance is greatest.

For an object to escape the orbit of another, the ellipse of its orbit has to become infinitely large, i.e. a → ∞. This says that if v is escape velocity, v² = 2μ/r. For a rocket on the surface of a planet, r is the radius of the planet. But for an object already in a high orbit, escape velocity is less because r is larger.
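As a rough numerical illustration, here is a small calculation using approximate values for μ and the radii: the speed of a GPS satellite in its roughly circular orbit, and escape velocity from the earth’s surface versus from GPS altitude.

    from math import sqrt

    mu = 3.986e14      # Earth's standard gravitational parameter, m^3/s^2 (approx.)
    r_earth = 6.371e6  # mean radius of the earth, m (approx.)
    r_gps = 2.656e7    # radius of a GPS orbit, m (approx.)

    v_gps = sqrt(mu / r_gps)                 # circular orbit: v^2 = mu / r
    v_esc_surface = sqrt(2 * mu / r_earth)   # escape velocity: v^2 = 2 mu / r
    v_esc_gps = sqrt(2 * mu / r_gps)

    print(f"GPS orbital speed:            {v_gps/1000:.2f} km/s")         # about 3.9
    print(f"escape velocity from surface: {v_esc_surface/1000:.2f} km/s") # about 11.2
    print(f"escape velocity from GPS alt: {v_esc_gps/1000:.2f} km/s")     # about 5.5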
