Simple error function approximation

I recently ran across the fact that

\int_0^x \exp(-t^2)\, dt \approx \sin(\sin(x))

is a remarkably good approximation for −1 ≤ x ≤ 1.

Since the integral above defines the error function erf(x), modulo a constant, this says we have a good approximation for the error function

\text{erf}(x) \approx \frac{2}{\sqrt{\pi}} \sin( \sin(x) )

again provided −1 ≤ x ≤ 1.
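
A quick numerical check of the approximation, as a minimal sketch using SciPy's erf:

    from numpy import sin, sqrt, pi, linspace
    from scipy.special import erf

    # Compare erf(x) with (2/sqrt(pi)) sin(sin(x)) on [-1, 1].
    x = linspace(-1, 1, 201)
    print(abs(erf(x) - 2/sqrt(pi)*sin(sin(x))).max())

The maximum error over the interval is on the order of 10⁻³, occurring at the endpoints.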

The error function is closely related to the Gaussian integral, i.e. the normal probability distribution CDF Φ. The relation between erf and Φ is simple but error-prone. I wrote up a page of notes for myself a few years ago so I wouldn't make a mistake again moving between these functions and their inverses.

Update: This post makes the connection to probability explicit.

You can derive the approximation by writing out the power series for exp(t), substituting −t² for t, and integrating term-by-term from 0 to x. Comparing the result with the power series for sin(sin(x)), you'll see that the two series agree through the x⁵ term, so the error is on the order of x⁷. Here's a plot of the error.

The error is extremely small near 0, which is what you'd expect since the error is on the order of x⁷.
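
If you don't want to grind out the series by hand, a computer algebra system can confirm the order of the error. Here's a sketch using SymPy:

    from sympy import symbols, sin, exp, integrate, series

    x, t = symbols('x t')
    # The integral evaluates to sqrt(pi)/2 * erf(x).
    lhs = integrate(exp(-t**2), (t, 0, x))
    # Expand the error in powers of x; the leading term should be x**7/630.
    print(series(lhs - sin(sin(x)), x, 0, 9))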

A better integral for the normal distribution

For a standard normal random variable Z, the probability that Z exceeds some cutoff z is given by

\text{Prob}(Z \geq z) = Q(z) = \frac{1}{\sqrt{2\pi}} \int_z^\infty \exp(-x^2/2)\, dx

If you wanted to compute this probability, the obvious thing would be to evaluate the defining integral numerically. But as is often the case in numerical analysis, the most obvious approach is not the best one: the range of integration is unbounded, and it changes with the argument z.
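
For reference, here's what the obvious approach might look like, as a sketch using SciPy's adaptive quadrature (which accepts an infinite upper limit):

    import numpy as np
    from scipy import integrate

    def q_naive(z):
        # Integrate the Gaussian density from z to infinity.
        val, _ = integrate.quad(lambda x: np.exp(-x**2/2), z, np.inf)
        return val / np.sqrt(2*np.pi)

    print(q_naive(2))  # roughly 0.02275

This works for moderate arguments, but every call integrates over a different unbounded interval.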

J. W. Craig [1] came up with a better integral representation, better from the perspective of numerical integration. The integration is always over the same finite interval, with the argument appearing inside the integrand. The integrand is smooth and bounded, well suited to numerical integration.

For positive z, Craig's integral representation is

Q(z) = \frac{1}{\pi} \int_0^{\pi/2} \exp\left( -\frac{z^2}{2\sin^2 \theta} \right) \, d\theta

Illustration

To show that Craig's integral is easy to evaluate numerically, we'll compute it using Gaussian quadrature with only 10 integration points.

    from numpy import sin, exp, pi
    from scipy import integrate
    from scipy.stats import norm

    # Evaluate Craig's integral with 10-point Gaussian quadrature
    # and compare with SciPy's survival function norm.sf.
    for x in [0.5, 2, 5]:
        q, _ = integrate.fixed_quad(
            lambda t: exp(-x**2 / (2*sin(t)**2))/pi,
            0.0, pi/2, n=10)
        print(q, norm.sf(x))

(SciPy uses sf (“survival function”) for the CCDF. More on that here.)

The code above produces the following.

    0.30858301 0.30853754
    0.02274966 0.02275013
    2.86638437e-07 2.86651572e-07

So with 10 integration points, we get four correct figures. And the accuracy seems to be consistent for small, medium, and large values of x. (Five standard deviations is pretty far out in the tail of a normal distribution, as evidenced by the small value of the integral.)

[1] J. W. Craig. A new, simple and exact result for calculating the probability of error for two-dimensional signal constellations. IEEE MILCOM '91 Conf. Rec., Boston, MA (1991), pp. 25.5.1–25.5.5.

Too clever Monte Carlo

One way to find the volume of a sphere would be to imagine the sphere in a box, randomly select points in the box, and count how many of these points fall inside the sphere. In principle this would work in any dimension.

The problem with naive Monte Carlo

We could write a program to estimate the volume of a high-dimensional sphere this way. But there's a problem: very few random samples will fall in the sphere. The ratio of the volume of a sphere to the volume of the box it fits in goes to zero as the dimension increases. We might take a large number of samples and have none of them fall inside the sphere. In that case we'd estimate the volume as zero. This estimate would have small absolute error, but 100% relative error.
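
Here's a sketch of what such a naive program might look like:

    import numpy as np

    rng = np.random.default_rng(42)

    def mc_ball_volume(n, samples=100_000):
        # Sample uniformly from the cube [-1, 1]^n and count the
        # fraction of points landing inside the unit sphere.
        pts = rng.uniform(-1.0, 1.0, size=(samples, n))
        inside = (pts**2).sum(axis=1) <= 1.0
        return inside.mean() * 2**n  # fraction times cube volume

    for n in [2, 5, 10, 20]:
        print(n, mc_ball_volume(n))

By n = 20 the sphere occupies so little of the cube that 100,000 samples will typically produce no hits at all, and the estimate is exactly zero.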

A more clever approach

So instead of actually writing a program to randomly sample a high dimensional cube, let’s imagine that we did. Instead of doing a big Monte Carlo study, we could be clever and use theory.

Let n be our dimension. We want to draw uniform random samples from [−1, 1]ⁿ and see whether they land inside the unit sphere. So we'd draw n random samples from [−1, 1] and see whether the sum of their squares is less than or equal to 1.

Let Xᵢ be a uniform random variable on [−1, 1]. We want to know the probability that

X₁² + X₂² + X₃² + … + Xₙ² ≤ 1.

This would be an ugly calculation, but since we're primarily interested in the case of large n, we can approximate the sum using the central limit theorem (CLT). We can show, using the transformation theorem, that each Xᵢ² has mean 1/3 and variance 4/45. The CLT says that the sum has approximately the distribution of a normal random variable with mean n/3 and variance 4n/45.
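
To see where this leads, here's a sketch that turns the CLT approximation into a volume estimate and compares it with the exact volume π^(n/2)/Γ(n/2 + 1) of the unit sphere:

    from math import gamma, pi, sqrt
    from scipy.stats import norm

    def clt_volume(n):
        # P(X1^2 + ... + Xn^2 <= 1) under Normal(n/3, 4n/45),
        # times the volume 2^n of the cube [-1, 1]^n
        p = norm.cdf((1 - n/3) / sqrt(4*n/45))
        return 2**n * p

    def exact_volume(n):
        return pi**(n/2) / gamma(n/2 + 1)

    for n in [5, 10, 20]:
        print(n, clt_volume(n), exact_volume(n))

The two columns diverge quickly: the estimate is off by roughly a factor of 3 at n = 10 and by orders of magnitude at n = 20.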

Too clever by half

The approach above turns out to be a bad idea, though it’s not obvious why.

The CLT does provide a good approximation of the sum above near its mean. But the sum has mean n/3, with n large, and we're asking for the probability that it is less than 1. In other words, we're asking for a probability far out in the tail, where the relative error of the CLT approximation is large. More on this here.

This post turned out not to be about what I thought it would be about. I thought it would lead to an asymptotic approximation for the volume of an n-dimensional sphere; I would compare the approximation to the exact value and see how well it did. Except it did terribly. So instead, this post is a cautionary tale about remembering how convergence works in the CLT.

Pairs in poker

An article by Y. L. Cheung [1] explains why poker is usually played with five cards. The author gives several reasons; here I'll just look at one: pairs don't behave the way you might expect if you have more than five cards.

In five-card poker, the more pairs the better. Better here means less likely. One pair is better than no pair, and two pairs is better than one pair. But in six-card or seven-card poker, a hand with no pair is less likely than a hand with one pair.

For a five-card hand, the probabilities of 0, 1, or 2 pair are 0.5012, 0.4226, and 0.0475 respectively.

For a six-card hand, the same probabilities are 0.3431, 0.4855, and 0.1214.

For a seven-card hand, the probabilities are 0.2091, 0.4728, and 0.2216.
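
The one-pair and two-pair figures are easy to check by counting rank patterns. Here's a sketch (the no-pair figures involve further exclusions, such as straights and flushes, which this doesn't attempt):

    from math import comb

    def one_pair(k):
        # one rank appears exactly twice, the other ranks are distinct
        return 13 * comb(4, 2) * comb(12, k - 2) * 4**(k - 2) / comb(52, k)

    def two_pair(k):
        # two ranks appear exactly twice, the other ranks are distinct
        return comb(13, 2) * comb(4, 2)**2 * comb(11, k - 4) * 4**(k - 4) / comb(52, k)

    for k in [5, 6, 7]:
        print(k, round(one_pair(k), 4), round(two_pair(k), 4))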

[1] Y. L. Cheung. Why Poker Is Played with Five Cards. The Mathematical Gazette, Vol. 73, No. 466 (December 1989), pp. 313–315.

Probability resources

Each Wednesday I post a list of notes on some topic. This week it’s probability.

See also posts tagged probability and statistics.

Last week: Python resources

Next week: Regular expression resources