Discrete Taylor series

Posted on 14 January 2025 by John

Newton’s interpolation formula looks awfully complicated until you introduce the right notation. With the right notation, it looks like a Taylor series. Not only is this notation simpler and more memorable, it also suggests extensions.

The notation we need comes in two parts. First, we need the forward difference operator Δ defined by

$\Delta f(x) = f(x+1) - f(x)$

and its extension Δ^k defined by applying Δ to a function k times.

The other piece of notation we need is falling powers, denoted with a little bar under the exponent:

$x^{\underline{k}} = x(x-1)(x-2) \cdots (x - k + 1)$

The Δ operator is meant to resemble the differential operator D and falling powers are meant to resemble ordinary powers. With this notation, Newton’s interpolation formula based at a

$f(x) = \sum_{k=0}^\infty \frac{\Delta^k f(a)}{k!} (x - a)^{\underline{k}}$

looks analogous to Taylor’s formula for a power series based at a

$f(x) = \sum_{k=0}^\infty \frac{D^k f(a)}{k!} (x - a)^k.$

Newton’s formula applies to polynomials, and the infinite sum is actually a finite sum because Δ^k f(a) is 0 for all k beyond some point.

Newton’s formula is a discrete analog of Taylor’s formula because it only uses the values of f at discrete points, i.e. at the integers (shifted by a) and only involves finite operations: finite differences do not involve limits.

Convergence

As I mentioned on @AlgebraFact this morning,

It’s often useful to write a finite series as an infinite series with a finite number of non-zero coefficients.

This eliminates the need to explicitly track the number of terms, and may suggest what to do next.

Writing Newton’s formula as an infinite series keeps us from having to write down one version for linear interpolation, another version for quadratic interpolation, another for cubic interpolation, etc. (It’s a good exercise to write out these special cases when you’re new to the topic, but then remember the infinite series version going forward.)

As for suggesting what to do next, it’s natural to explore what happens if the infinite series really is infinite, i.e. if f is not a polynomial. Under what circumstances does the series converge? If it does converge to something, does it necessarily converge to f(x) at each x?

The example f(x) = sin(πx) shows that Newton’s theorem can’t always hold, because for this function, with a = 0, the series on right hand side of Newton’s theorem is identically zero because all the Δ^k terms are zero. But Carlson’s theorem [1] essentially says that for an entire function that grows slower than sin(πx) along the imaginary axis the series in Newton’s theorem converges to f.

Saying that a function is “entire” means that it is analytic in the entire complex plane. This means that Taylor’s series above has to converge everywhere in order for Newton’s series to converge [2].

[1] Carlson with no e. Not to be confused with Carleson’s theorem on the convergence of Fourier series.

[2] Carlson’s original theorem requires f to be entire. Later refinements show that it’s sufficient for f to be analytic in the open right half plane and continuous on the closed right half plane.

Interpolation and the cotanc function

Posted on 19 August 2024 by John

This weekend I wrote three posts related to interpolation:

The first post looks at reducing the size of mathematical tables by switching for linear to quadratic interpolation. The immediate application is obsolete, but the principles apply to contemporary problems.

The second post looks at alternatives to Lagrange interpolation that are much better suited to hand calculation. The final post is a tangent off the middle post.

Tau and sigma functions

In the process of writing the posts above, I looked at Chambers Six Figure Mathematical Tables from 1964. There I saw a couple curious functions I hadn’t run into before, functions the author called τ and σ.

τ(x) = x cot x
σ(x) = x csc x

So why introduce these two functions? The cotangent and cosecant functions have a singularity at 0, and so its difficult to tabulate and interpolate these functions. I touched on something similar in my recent post on interpolating the gamma function: because the function grows rapidly, linear interpolation gives bad results. Interpolating the log of the gamma function gives much better results.

Chambers tabulates τ(x) and σ(x) because these functions are easy to interpolate.

The cotanc function

I’ll refer to Chambers’ τ function as the cotanc function. This is a whimsical choice, not a name used anywhere else as far as I know. The reason for the name is as follows. The sinc function

sinc(x) = sin(x)/x

comes up frequently in signal processing, largely because it’s the Fourier transform of the indicator function of an interval. There are a few other functions that tack a c onto the end of a function to indicate it has been divided by x, such as the jinc function.

The function tan(x)/x is sometimes called the tanc function, though this name is far less common than sinc. The cotangent function is the reciprocal of the tangent function, so I’m calling the reciprocal of the tanc function the cotanc function. Maybe it should be called the cotank function just for fun. For category theorists this brings up images of a tank that fires backward.

Practicality of the cotanc function

As noted above, the cotangent function is ill-behaved near 0, but the cotanc function is very nicely behaved near 0. The cotanc function has singularities at non-zero multiples of π but multiplying by x removes the singularity at 0.

As noted here, interpolation error depends on the size of the derivatives of the function being interpolated. Since the cotanc function is flat and smooth, it has small derivatives and thus small interpolation error.

Binomial coefficients with non-integer arguments

Posted on 18 August 2024 by John

When n and r are positive integers, with n ≥ r, there is an intuitive interpretation of the binomial coefficient C(n, r), namely the number of ways to select r things from a set of n things. For this reason C(n, r) is usually pronounced “n choose r.”

But what might something like C(4.3, 2)? The number of ways to choose two giraffes out of a set of 4.3 giraffes?! There is no combinatorial interpretation for binomial coefficients like these, though they regularly come up in applications.

It is possible to define binomial coefficients when n and r are real or even complex numbers. These more general binomial coefficients are in this liminal zone of topics that come up regularly, but not so regularly that they’re widely known. I wrote an article about this a decade ago, and I’ve had numerous occasions to link to it ever since.

The previous post implicitly includes an application of general binomial coefficients. The post alludes to coefficients that come up in Bessel’s interpolation formula but doesn’t explicitly say what they are. These coefficients B_k can be defined in terms of the Gaussian interpolation coefficients, which are in turn defined by binomial coefficients with non-integer arguments.

$\begin{eqnarray*} G_{2n} &=& {p + n - 1 \choose 2n} \\ G_{2n+1} &=& {p + n \choose 2n + 1} \\ B_{2n} &=& \frac{1}{2}G_{2n} \\ B_{2n+1} &=& G_{2n+1} - \frac{1}{2} G_{2n} \end{eqnarray*}$

Note that 0 < p < 1.

The coefficients in Everett’s interpolation formula can also be expressed simply in terms of the Gauss coefficients.

$\begin{eqnarray*} E_{2n} &=& G_{2n} - G_{2n+1} \\ F_{2n} &=& G_{2n+1} \\ \end{eqnarray*}$

Bessel, Everett, and Lagrange interpolation

Posted on 18 August 2024 by John

I never heard of Bessel or Everett interpolation until long after college. I saw Lagrange interpolation several times. Why Lagrange and not Bessel or Everett?

First of all, Bessel interpolation and Everett interpolation are not different kinds of interpolation; they are different algorithms for carrying out the same interpolation as Lagrange. There is a unique polynomial of degree n fitting a function at n + 1 points, and all three methods evaluate this same polynomial.

The advantages of Bessel’s approach or Everett’s approach to interpolation are practical, not theoretical. In particular, these algorithms are practical when interpolating functions from tables, by hand. This was a lost art by the time I went to college. Presumably some of the older faculty had spent hours computing with tables, but they never said anything about it.

Bessel interpolation and Everett interpolation are minor variations on the same theme. This post will describe both at such a high level that there’s no difference between them. This post is high-level because that’s exactly what seems to be missing in the literature. You can easily find older books that go into great detail, but I believe you’ll have a harder time finding a survey presentation like this.

Suppose you have a function f(x) that you want to evaluate at some value p. Without loss of generality we can assume our function has been shifted and scaled so that we have tabulated f at integers our value p lies between 0 and 1.

Everett’s formula (and Bessel’s) cleanly separates the parts of the interpolation formula that depend on f and the parts that depend on p.

The values of the function f enter through the differences of the values of f at consecutive integers, and differences of these differences, etc. These differences are easy to calculate by hand, and sometimes were provided by the table publisher.

The functions of p are independent of the function being interpolated, so these functions, the coefficients in Bessel’s formula and Everett’s formula, could be tabulated as well.

If the function differences are tabulated, and the functions that depend on p are tabulated, you could apply polynomial interpolation without ever having to explicitly evaluate a polynomial. You’d just need to evaluate a sort of dot product, a sum of numbers that depend on f multiplied by numbers that depend on p.

Another advantage of Bessel’s and Everett’s interpolation formulas is that they can be seen as a sequence of refinements. First you obtain the linear interpolation result, then refine it to get the quadratic interpolation result, then add terms to get the cubic interpolation result, etc.

This has several advantages. First, you have the satisfaction of seeing progress along the way; Lagrange interpolation may give you nothing useful until you’re done. Related to this is a kind of error checking: you have a sense of where the calculations are going, and intermediate results that violate your expectations are likely to be errors. Finally, you can carry out the calculations for the smallest terms in with less precision. You can use fewer and fewer digits in your hand calculations as you compute successive refinements to your result.

Compression and interpolation

Posted on 17 August 2024 by John

Data compression is everywhere. We’re unaware of it when it is done well. We only become aware of it when it is pushed too far, such as when a photo looks grainy or fuzzy because it was compressed too much.

The basic idea of data compression is to not transmit the raw data but to transmit some of the data along with instructions for how to approximately reconstruct the rest [1].

Fifty years ago scientists were concerned with a different application of compression: reducing the size of mathematical tables. Books of tabulated functions are obsolete now, but the principles used in producing these tables are still very much relevant. We use compression and interpolation far more often now, though it’s almost always invisibly executed by software.

Compressing tables

In this post I want to expand on comments by Forman Acton from his book Numerical Methods That Work on compression.

Many persons are unaware of the considerable compression in a table that even the use of quadratic interpolation permits. A table of sin x covering the first quadrant, for example, requires 541 pages if it is to be linearly interpolable to eight decimal places. If quadratic interpolation is used, the same table takes only one page having entries at one-degree intervals with functions of the first and second differences being recorded together with the sine itself.

Acton goes on to mention the advantage of condensing shelf space by a factor of 500. We no longer care about saving shelf space, but we may care very much about saving memory in an embedded device.

Quadratic interpolation does allow more compression than linear interpolation, but not by a factor of 500. I admire Acton’s numerical methods book, but I’m afraid he got this one wrong.

Interpolation error bound

In order to test Acton’s claim we will need the following theorem on interpolation error [2].

Let f be a function so that f⁽ⁿ⁺¹⁾ is continuous on [a, b] and satisfies |f⁽ⁿ⁺¹⁾ (x)| ≤ M. Let p be the polynomial of degree ≤ n that interpolates f at n + 1 equally spaced nodes in [a, b], including the end points. Then on [a, b],

$|f(x) - p(x)| \leq \frac{1}{4(n+1)} M \left(\frac{b-a}{n}\right)^{n+1}$

Quadratic interpolation error

Acton claims that quadratic interpolation at intervals of one degree is adequate to produce eight decimal places of accuracy. Quadratic interpolation means n = 2.

We have our function tabulated at evenly spaced points a distance h = π/180 radians apart. Quadratic interpolation requires function values at three points, so b − a = 2h = π/90. The third derivative of sine is negative cosine, so M = 1.

This gives an error bound of 4.43 × 10⁻⁷, so this would give slightly better than six decimal place accuracy, not eight.

Linear interpolation error

Suppose we wanted to create a table of sine values so that linear interpolation would give results accurate to eight decimal places.
In the interpolation error formula we have M = 1 as before, and now n = 1. We would need to tabulate sine at enough points that h = b − a is small enough that the error is less than 5 × 10⁻⁹. It follows that h = 0.0002 radians. Covering a range of π/2 radians in increments of 0.0002 radians would require 7854 function values. Acton implicitly assumes 90 values to a page, so this would take about 87 pages.

Abramowitz and Stegun devotes 32 pages to tabulating sine and cosine at increments of 0.001 radian. This does not always guarantee eight decimal place accuracy using linear interpolation, but it does guarantee at least seven places (more on that here), which is better than a table at one degree increments would deliver using quadratic interpolation. So it would have been more accurate for Acton to say quadratic interpolation reduces the number of pages by a factor of 30 rather than 500.

Cubic interpolation error

If we have a table of sine values at one degree increments, how much accuracy could we get using cubic interpolation? In that case we’d apply the interpolation error theorem with n = 3 and b − a = 3(π/180) = π/60. Then the error bound is 5.8 × 10⁻⁹. This would usually give you eight decimal place accuracy, so perhaps Acton carried out the calculation for cubic interpolation rather than quadratic interpolation.

[1] This is what’s known as lossy compression; some information is lost in the compression process. Lossless compression also replaces the original data with a description that can be used to reproduce the data, but in this case the reconstruction process is perfect.

[2] Ward Cheney and David Kincaid. Numerical Methods and Computation. Third edition.

Using a table of logarithms

Posted on 3 June 2024 by John

My favorite quote from Richard Feynman is his remark that “nearly everything is really interesting if you go into it deeply enough.” This post will look at something that seems utterly trivial—looking up numbers in a table—and show that there’s much more to it when you dig a little deeper.

More than just looking up numbers

Before calculators were common, function values would be looked up in a table. For example, here is a piece of a table of logarithms from Abramowitz and Stegun, affectionately known as A&S.

But you wouldn’t just “look up” logarithm values. If you needed to know the value of a logarithm at a point where it is explicitly tabulated, then yes, you’d simply look it up. If you wanted to know the log of 1.754, then there it is in the table. But what if, for example, you wanted to know the log of 1.7543?

Notice that function values are given to 15 significant figures but input values are only given to four significant figures. If you wanted 15 sig figs in your output, presumably you’d want to specify your input to 15 sig figs as well. Or maybe you only needed 10 figures of precision, in which case you could ignore the rightmost column of decimal places in the table, but you still can’t directly specify input values to 10 figures.

Lagrange interpolation

If you go to the bottom of the column of A&S in the image above, you see this:

What’s the meaning of the mysterious square bracket expression? It’s telling you that for the input values in the range of this column, i.e. between 1.750 and 1.800, the error using linear interpolation will be less than 4 × 10⁻⁸, and that if you want full precision, i.e. 15 sig figs, then you’ll need to use Lagrange interpolation with 5 points.

So going back to the example of wanting to know the value of log(1,7543), we could calculate it using

0.7 × log(1.754) + 0.3 × log(1.755)

and expect the error to be less than 4 × 10⁻⁸.

We can confirm this with a little Python code.

>>> from math import log
>>> exact = log(1.7543)
>>> approx = 0.7*log(1.754) + 0.3*log(1.755)
>>> exact - approx
3.411265947494968e-08

Python uses double precision arithmetic, which is accurate to between 15 and 16 figures—more on that here—and so the function calls above are essentially the same as the tabulated values.

Now suppose we want the value of x = 1.75430123456789. The hint in square brackets says we should use Lagrange interpolation at five points, centered at the nearest tabulated value to x. That is, we’ll use the values of log at 1.752, 1.753, 1.754, 1.755, and 1.756 to compute the value of log(x).

Here’s the Lagrange interpolation formula, given in A&S as equation 25.2.15.

We illustrate this with the following Python code.

def interpolate(fs, p, h):
    s = (p**2 - 1)*p*(p-2)*fs[0]/24
    s -= (p - 1)*p*(p**2 - 4)*fs[1]/6
    s += (p**2 - 1)*(p**2 - 4)*fs[2]/4
    s -= (p + 1)*p*(p**2 - 4)*fs[3]/6
    s += (p**2 - 1)*p*(p + 2)*fs[4]/24
    return s

xs = np.linspace(1.752, 1.756, 5)
fs = np.log(xs)
h = 0.001
x = 1.75430123456789
p = (x - 1.754)/h

print(interpolate(fs, p, h))
print(np.log(x))

This prints

0.5620706206909348
0.5620706206909349

confirming that the interpolated value is indeed accurate to 15 figures.

Lagrange interpolation takes a lot of work to carry out by hand, and so sometimes you might use other techniques, such as transforming your calculation into one for which a Taylor series approximation converges quickly. In any case, sophisticated use of numerical tables was not simply a matter of looking things up.

Contemporary applications

A book of numerical tables enables you to do calculations without a computer. More than that, understanding how to do calculations without a computer helps you program calculations with a computer. Computers have to evaluate functions somehow, and one way is interpolating tabulated values.

For example, you could think of a digital image as a numerical table, the values of some ideal analog image sampled at discrete points. The screenshots above are interpolated: the HTML specifies the width to be less than that of the original screenshots,. You’re not seeing the original image; you’re seeing a new image that your computer has created for you using interpolation.

Interpolation is a kind of compression. A&S would be 100 billion times larger if it tabulated functions at 15 figure inputs. Instead, it tabulated functions for 4 figure inputs and gives you a recipe (Lagrange interpolation) for evaluating the functions at 15 figure inputs if you desire. This is a very common pattern. An SVG image, for example, does not tell you pixel values, but gives you equations for calculating pixel values at whatever scale is needed.

Best line to fit three points

Posted on 7 July 2023 by John

Suppose you want to fit a line to three data points. If no line passes exactly through your points, what’s the best compromise you could make?

minmax line for three points

Chebyshev suggested the best thing to do is find the minmax line, the line that minimizes the maximum error. That is, for each candidate line, find the vertical distance from each point to the line, and take the largest of these three distances as the measure of how well the line fits. The best line is the that minimizes this measure.

Note that this is not the same line you would get if you did a linear regression. Customary linear regression minimizes the average squared vertical distance from each point to the line. There are reasons this is the standard approach when you have at least a moderate amount of data, but when you only have three data points, it makes sense to use the minmax approach.

We can say several things about the minmax line. For one thing, there exists such a line and it is unique. Also, the vertical distance from each data point to the line will be the same. Either two points will be over the line and one under, or two will be under and one over.

Suppose your three points are (x₁, y₁), (x₂, y₂), and (x₃, y₃), with x₁ < x₂ < x₃. Then the slope will be

$m = \frac{y_3 - y_1}{x_3 - x_1}$

and the intercept will be

$b = \frac{y_1 (x_2+x_3)+y_2 (x_3-x_1) -y_3 (x_1+x_2)}{2 (x_3-x_1)}$

I made an online calculator to find the best line for three points.

How well does a spline fit a function?

Posted on 18 February 2023 by John

Suppose you’re going to fit a spline s to a function f by interpolating f at a number of points. What can you know a priori about how well s will approximate f?

This question was thoroughly resolved five decades ago [1], but the result is a bit complicated, so we’ll incrementally work our way up to the final result.

What kind of spline

First of all we’ll need to be specific about what we mean by a spline. We’re going to look at cubic splines over a finite interval [a, b]. A cubic spline approximation to f over [a, b] is constructed by splitting the interval into N subintervals and fitting a cubic polynomial to f over each subinterval. The piecewise polynomials much match the value of f at the ends of the subintervals, and the values over contiguous intervals much match their first and second derivatives.

We’re not going to specify the endpoint conditions of the spline. The spline described above is not unique; it has two degrees of freedom which are typically specified by boundary conditions, but the theorem we’re building up to doesn’t require us to say anything about boundary conditions. In particular, we do not require our spline to be a natural cubic spline.

How to measure fit

How should we measure how well our spline s fits our function f? Obviously we’d like have a bound on

|s(x) − f(x)|

at each point. We’d also like to have a bound on the difference in derivatives at each point. Since our spline has two continuous derivatives, we’re interested in the first and second derivatives. In symbols, we’d like to bound

|s^(r)(x) − f^(r)(x)|

for r = 0, 1, or 2.

Smoothness of f

We have to say something about the smoothness of the function we’re interpolating. Since we’re approximating f by a function with two continuous derivatives, it only makes sense to consider functions f with at least two derivatives. Typically f will be smoother than s, so we will also look at the case of f having three or four continuous derivatives. We will measure the smoothness of f by the maximum value of its derivatives over the interval [a, b]. That is, by

|f ^(m)|_∞

where m = 2, 3, or 4.

Mesh size

Obviously the accuracy of our approximation will depend on the number of interpolation points N+1. We will not require the interpolation points to be evenly spaced, and so we will quantify the fineness of our mesh as the length h of the longest subinterval of [a, b].

Shape of our theorem

We now have the elements we need to state our theorem. The left side will be

|s^(r)(x) − f^(r)(x)|

and the right side will be some expression involving r, n, h, and the values of

|f ^(m)|_∞

for varying values of m.

We can make the bound on the left side tighter by taking into account which subinterval j = 0, 1, 2, … N-1 our x belongs to. In general, spline interpolation is more accurate toward the middle of [a, b] than toward the edges. Interestingly, our theorem will only depend on j and not on the width of the jth interval.

The theorem

And now for the theorem we’ve been working our way up to:

|s^(r)(x) − f^(r)(x)| ≤ ε_mr |f^(m)|_∞h^m–r+ K_mβ_r(2^1-j + 2^1-N+j) h.

The statement of the theorem is complicated because the author is carefully quantifying the error. You could make the theorem much simpler by just making a big-O statement. Instead, the author gives numerical values of each of the constants.

The theorem is even a little more complicated than it looks at first because the constants K_m depend on |f^(m)|_∞h. The values of the constants in the theorem are given in the table below. You can click on the image to get a larger image that is easier to read.

Error estimates for splines with more boundary conditions

Posted on 13 July 2021 by John

Yesterday I wrote about rates of convergence for natural cubic splines. This brief post reports similar results for more boundary conditions.

As explained in the earlier post, a cubic spline that interpolates a function f at n+1 points satisfies 4n − 2 equations in 4n variables. Two more equations are necessary to uniquely determine the interpolating cubic spline.

A natural cubic spline S over [a, b] satisfies

S”(a) = 0
S”(b) = 0.

There are other ways of coming up with two more equations. You might match first derivatives at the end points

S‘(a) = f‘(a)
S‘(b) = f‘(b)

or second derivatives

S”(a) = f”(a)
S”(b) = f”(b).

Or if f is periodic you could require first and second derivatives of the spline to match at a and b.

S‘(a) = S‘(b)
S”(a) = S”(b).

In each case, you get the same order of convergence as we reported earlier for the natural cubic spline, assuming f is 4 times continuously differentiable over [a, b]. The error goes down like O(1/n⁴).

Source: D. Kershaw. A Note on the Convergence of Interpolatory Cubic Splines. SIAM Journal on Numerical Analysis, Mar., 1971, Vol. 8, No. 1 pp. 67-74.

Example

When we use natural cubic splines to fit exp(x) over [0, 2π] in the earlier post, the error was fairly large. We get better results when we specify the derivatives at the two ends of the interval. Here’s what we get for 11 points:

The error is about 7 times smaller than with natural cubic splines.

And here’s what we get for 21 points:

In this case the error is about 8x smaller than with natural cubic splines.

When we go from 11 to 21 points, specifying the derivatives on each end, the error drops by about a factor of 15, close value of 16 we’d expect in the limit.

Incidentally, to specify the boundary conditions of the spline in Python we need to add an option parameter bc_type in the call to CubicSpline. Otherwise the code is the same as before. In our example

    bc_type=((1, 1), (1, np.exp(2*np.pi)))

We set bc_type equal to a pair of pairs. The first pair gives a boundary condition on the left end and the second gives a boundary condition on the right. The first argument of the inner pair in both cases is 1, because we’re specifying 1st derivatives. Use a 2 to specify 2nd derivatives. The second argument to the pair the value of the derivative: exp(0) = 1 on the left, and exp(2π) on the right.

Rate of natural cubic spline convergence

Posted on 12 July 2021 by John

Suppose you want to approximate a function with a polynomial by interpolating it at evenly spaced points. You might reasonably expect that the more points you use, the better the approximation will be. That might be true, but it might not. As explained here, for some functions the maximum approximation error actually increases as the number of points increases. This is called the Runge phenomenon.

What about cubic splines? Instead of fitting one high-degree polynomial, you fit a different cubic polynomial on each sub-interval, with the constraint that the pieces fit together smoothly. Specifically, the spline must be twice differentiable. This means that the first and second derivatives from both sides match at the end of each interval.

Say our function f is defined over an interval [a, b] and we break the interval into n sub-intervals. This means we interpolate f at n + 1 evenly spaced points. We have 4n degrees of freedom: four polynomial coefficients over each interval. Interpolation adds 2n constraints: the value of the polynomial is specified at the ends of each sub-interval. Matching first and second derivatives at each of the n − 1 interior nodes gives 2(n − 1) constrains. This is a total of 4n − 2 equations for 4n unknowns.

We get the two more equations we need by specifying something about the derivatives at a and b. A natural cubic spline sets the second derivatives equal to zero at a and b.

So if we interpolate f at n + 1 evenly spaced points using a natural cubic spline, does the splines converge uniformly to f as we increase n? Indeed they do. Nothing like Runge phenomenon can happen. If f is four times continuously differentiable over [a, b], then convergence is uniform on every compact subinterval of (a, b) and the rate of convergence is O(1/n⁴). If the second derivative of f is zero at a and b, then the rate of convergence is O(1/n⁴) over the whole interval [a, b].

Source: Kendall E. Atkinson. On the Order of Convergence of Natural Cubic Spline Interpolation. SIAM Journal on Numerical Analysis, Vol. 5, No. 1 (Mar., 1968), pp. 89-101.

Update: See this post for results for other kinds of cubic splines, i.e. cubic splines with different boundary conditions.

Experiment

We will look at sin(x) and exp(x) on the interval [0, 2π]. Note that the second derivatives of former at zero at both ends of the interval, but this is not true for the latter.

We will look at the error when interpolating our functions at 11 points and 21 points (i.e. 10 sub-intervals and 20 sub-intervals) with natural cubic splines. For the sine function we expect the error to go down by about a factor of 16 everywhere. For the exponential function, we expect the error to go down by about a factor of 16 in the interior of [0, 2π]. We’ll see what happens when we zoom in on the interval [2, 4].

We’re basing our expectations on a theorem about what happens as the number of nodes n goes to infinity, and yet we’re using fairly small values of n. So we shouldn’t expect too close of an agreement with theory, but we’ll see how close our prediction gets.

Computation

The following Python code will plot the error in the interpolation by natural cubic splines.

    import numpy as np
    from scipy.interpolate import CubicSpline
    import matplotlib.pyplot as plt

    for n in [11, 21]:
        for f in [np.sin, np.exp]:
            knots = np.linspace(0, 2*np.pi, n)
            cs = CubicSpline(knots, f(knots))
            x = np.linspace(0, 2*np.pi, 200)
        
            plt.plot(x, f(x) - cs(x))
            plt.xlabel("$x$")
            plt.ylabel("Error")
            plt.title(f"Interpolating {f.__name__} at {n} points")
            plt.savefig(f"{f.__name__}_spline_error_{n}.png")
            plt.close()

Notice an interesting feature of NumPy: we call the __name__ method on sin and exp to get their names as strings to use in the plot title and in the graphic file.

Here’s what we get for the sine function.

First for 10 intervals:

interpolation error for 11 points

Then for 20 intervals:

interpolation error for 11 points

When we doubled the number of intervals, the maximum error went down by about 30x, better than the theoretical upper bound on error.

Here’s what we get for the exponential function.

First for 10 intervals:

interpolation error for 11 points

Then for 20 intervals:

interpolation error for 11 points

The error in approximating the exponential function by splines is much greater than the error in approximating the sine function. That’s because the former acts less like a polynomial than the latter. Polynomial approximation, and piecewise polynomial approximation, works better on things that behave like polynomials.

When we doubled the number of intervals, the maximum error went down by a factor of 12. We shouldn’t be surprised that the error to go down by a factor of less than 16 since the hypotheses for the theorem that gives a factor of 16 aren’t satisfied.

Here’s what we get when we zoom in on the interval [2, 4].

For 10 subintervals:

interpolation error for 11 points on subinterval

And for 20 subintervals:

interpolation error for 21 points on subinterval

The error is much smaller over the interior interval. And when we double the number of interpolation points, the error over [2, 4] goes down by about a factor of 3 just as it did for sine.

Note that in these last two plots, we’re still interpolating over the entire interval [0, 2π], but we’re looking at the error over [2, 4]. We’re zooming in on part of the error plot, not allocating our interpolation points specifically to the smaller interval.

Interpolation

Discrete Taylor series

Convergence

Related posts

Interpolation and the cotanc function

Tau and sigma functions

The cotanc function

Practicality of the cotanc function

Binomial coefficients with non-integer arguments

Bessel, Everett, and Lagrange interpolation

Related posts

Compression and interpolation

Compressing tables

Interpolation error bound

Quadratic interpolation error

Linear interpolation error

Cubic interpolation error

Related posts

Using a table of logarithms

More than just looking up numbers

Lagrange interpolation

Contemporary applications

Related posts

Best line to fit three points

How well does a spline fit a function?

What kind of spline

How to measure fit

Smoothness of f

Mesh size

Shape of our theorem

The theorem

More spline posts

Error estimates for splines with more boundary conditions

Example

Rate of natural cubic spline convergence

Experiment

Computation

Related posts