Rényi’s parking constant

Parallel parking photo by IFCAR, Public Domain

Imagine parallel parking is available along the shoulder of a road, but no parking spaces are marked.

The first person to park picks a spot on the shoulder at random. Then another car also chooses a spot along the shoulder at random, with the constraint that the second car can’t overlap the first. This process continues until no more cars can park. How many people can park this way?

Assume all cars have the same length, and we choose our units so that cars have length 1. The expected number of cars that can park is a function M(x) of the length of the parking strip x. Clearly if x < 1 then M(x) = 0. Alfréd Rényi [1] found that for x ≥ 1, M(x) satisfies the equation

M(x) = 1 + \frac{2}{x-1}\int_0^{x-1} M(t)\, dt

This is an unusual equation, difficult to work with because it defined M only implicitly. It’s not even clear that the equation has a solution. But it does, and the ratio of M(x) to x approaches a constant as x increases.

m = \lim_{x\to\infty} \frac{M(x)}{x} = 0.74759\ldots

The number m is known as Rényi’s parking constant.

This say for a long parking strip, parking sequentially at random will allow about 3/4 as many cars to park as if you were to pack the cars end-to-end.

More posts on Rényi’s work

[1] Steven R. Finch. Mathematical Constants. Cambridge University Press, 2003.

The Clausen function

I ran across the Clausen function the other day, and when I saw a plot of the function my first thought was that it looks sorta like a sawtooth wave.

Plot of Clausen function Cl_2

I wondered whether it also sounds like a sawtooth wave, and indeed it does. More on that shortly.

The Clausen function can be defined in terms of its Fourier series:

\text{Cl}_2(x) = \sum_{k=1}^\infty \frac{\sin(kx)}{k^2}

The function commonly known as the Clausen function is one of a family of functions, hence the subscript 2. The Clausen functions for all non-negative integers n are defined by replacing 2 with n on both sides of the defining equation.

The Fourier coefficients decay quadratically, as do those of a triangle wave or sawtooth wave, as discussed here. This implies the function Cl2(x) cannot have a continuous derivative. In fact, the derivative of Cl2(x) is infinite at 0. This follows quickly from the integral representation of the function.

\text{Cl}_2(x)=-\int_0^x\log \left|2\sin\frac{t}{2} \right|\, dt

The fundamental theorem of calculus shows that the derivative

\text{Cl}'_2(x)=-\log \left|2\sin\frac{x}{2} \right|

blows up at 0.

Now suppose we create an audio clip of Cl2(440x). This creates a sound with pitch A 440, but rather than a sinewave it has an unpleasant buzzing sound, much like a sawtooth wave.

clausen2.wav

 

The harshness of the sound is due to the slow decay of the Fourier coefficients; the Fourier coefficients of more pleasant musical sounds decay much faster than quadratically.

Related posts

Limit of a doodle

Suppose you’re in a boring meeting and you start doodling. You draw a circle, and then you draw a triangle on the outside of that circle.

Next you draw a circle through the vertices of the triangle, and draw a square outside that.

Then you draw a circle through the vertices of the square, and draw a pentagon outside that.

Then you think “Will this ever stop?!”, meaning the meeting, but you could ask a similar question about your doodling: does your sequence of doodles converge to a circle of finite radius, or do they grow without bound?

An n-gon circumscribed on the outside of a circle of radius r is inscribed in a circle of radius

\frac{r}{\cos(\pi/n)}

So if you start with a unit circle, the radius of the circle through the vertices of the N-gon is

\prod_{n=3}^N \frac{1}{\cos(\pi/n)}

and the limit as N → ∞ exists. The limit is known as the polygon circumscribing constant, and it equals 8.7000366252….

We can visualize the limit by making a plot with large N. The plot is less cluttered if we leave out the circles and just show the polygons. N = 30 in the plot below.

The reciprocal of the polygon circumscribing constant is known as the Kepler-Bouwkamp constant. The Kepler-Bouwkamp constant is the limiting radius if you were to reverse the process above, inscribing polygons at each step rather than circumscribing them. It would make sense to call the Kepler-Bouwkamp constant the polygon inscribing constant, but for historical reasons it is named after Johannes Kepler and Christoffel Bouwkamp.

Computing Γ(z) for complex z with tables

In the previous post I mentioned that the general strategy for computing a mathematical function using tables is to first reduce the function argument to be within the range of the tabulated values, and then to use interpolation to compute the function at values that are not directly tabulated.

The second step is always the same, applying Lagrange interpolation with enough points to get the accuracy you need. But the first step, range reduction, depends on the function being evaluated. And as the previous post concluded, evaluating more advanced functions generally requires more advanced range reduction.

Real argument

For the gamma function, the identity

\Gamma(x + 1) = x\, \Gamma(x)

can be used to reduce the problem of computing Γ(x) for any real x to the problem of computing Γ(x) for x in the interval [1, 2]. If x is large, the identity will have to be applied many times and so this would be a lot of work. However, the larger x is, the more accurate Stirling’s approximation becomes.

Complex argument

Computing Γ(x + iy) is more complex, pardon the pun. We can still use the identity above to reduce the real part x of the argument to lie in the interval [1, 2], but what about the complex part y?

Abramowitz and Stegun, Table 6.7, tabulates the principal branch of log Γ(x + iy) for x from 1 to 2 and for y from 0 to 10, both in increments of 0.1. Generally the logarithm of the gamma function is more useful in computation than the gamma function itself. It is also easier to interpolate, which I imagine is the main reason A&S tabulate it rather than the gamma function per se. A note below Table 6.7 says that linear interpolation gives about 3 significant figures, and eight-point interpolation gives about 8 figures.

By the Schwarz reflection principle,

\Gamma(\bar{z}) = \overline{\Gamma(z)}

and with this we can extend our range on y to [−10, 10].

Large imaginary part

What about larger y? We have two options: the duplication formula and Stirling’s approximation.

The duplication formula

\Gamma(2z) = \frac{1}{\sqrt{2\pi}}2^{2z - \frac{1}{2}} \Gamma(z) \Gamma\left(z + \tfrac{1}{2} \right )

lets us compute Γ(2z) if we can compute Γ(z) and Γ(z + ½).

Stirling’s approximation for Γ(z) is accurate when |z| is large, and |x + iy| is large when |y| is large.

More posts on using tables

Calculating trig functions from tables

It takes some skill to use tables of mathematical functions; it’s not quite as simple as it may seem. Although it’s no longer necessary to use tables, it’s interesting to look into the details of how it is done.

For example, the Handbook of Mathematical Functions edited by Abramowitz and Stegun tabulates sines and cosines in increments of one tenth of a degree, from 0 degrees to 45 degrees. What if your angle was outside the range 0° to 45° or if you needed to specify your angle more precisely than 1/10 of a degree? What if you wanted, for example, to calculate cos 203.147°?

The high-level answer is that you would use range reduction and interpolation. You’d first use range reduction to reduce the problem of working with any angle to the problem of working with an angle between 0° and 45°, then you’d use interpolation to get the necessary accuracy for a value within this range.

OK, but how exactly do you do the range reduction and how exactly do you to the interpolation? This isn’t deep, but it’s not trivial either.

Range reduction

Since sine and cosine have a period of 360°, you can add or subtract some multiple of 360° to obtain an angle between −180° and 180°.

Next, you can use parity to reduce the range further. That is, since sin(−x) = −sin(x) and cos(−x) = cos(x) you can reduce the problem to computing the sine or cosine of an angle between 0 and 180°.

The identities sin(180° − x) = sin(x) and cos(180° −x) = −cos(x) let you reduce the range further to between 0 and 90°.

Finally, the identities cos(x) = sin(90° − x) and sin(x) = cos(90° − x) can reduce the range to 0° to 45°.

Interpolation

You can fill in between the tabulated angles using interpolation, but how accurate will your result be? How many interpolation points will you need to use in order to get single precision, e.g. an error on the order of 10−7?

The tables tell you. As explained in this post on using a table of logarithms, the tables have a notation at the bottom of the table that tells you how many Lagrange interpolation points to use and what kind of accuracy you’ll get. Five interpolation points will give you roughly single precision accuracy, and the notation gives you a little more accurate error bound. The post on using log tables also explains how Lagrange interpolation works.

Beyond trig functions

I intend to write more posts on using tables. The general pattern is always range reduction and interpolation, but it takes more advanced math to reduce the range of more advanced functions.

Update: The next post shows how to use tables to compute the gamma function for complex arguments.

Rapidly convergent series for ellipse perimeter

The previous post looked at two simple approximations for the perimeter of an ellipse. Approximations are necessary since the perimeter of an ellipse cannot be expressed as an elementary function of the axes.

Kepler noted in 1609 that you could approximate the perimeter of an ellipse as the perimeter of a circle whose radius is the mean of the semi-axes of the ellipse, where the mean could be either the arithmetic mean or the geometric mean. The previous post showed that the arithmetic mean is more accurate, and that it under-estimates the perimeter. This post will explain both of these facts.

There are several series for calculating the perimeter of an ellipse. In 1798 James Ivory came up with a series that converges more rapidly than previously discovered series. Ivory’s series is

P = \pi (a+b) \left(1 + \sum_{n=1}^\infty \left(\frac{(2n-1)!!}{2^n n!}\right)^2 \frac{h^n}{(2n-1)^2}\right)

where

h = \frac{(a-b)^2}{(a+b)^2}

If you’re not familiar with the !! notation, see this post on multifactorials.

The 0th order approximation using Ivory’s series, dropping all the infinite series terms, corresponds to Kepler’s approximation using the arithmetic mean of the semi-axes a and b. By convention the semi-major axis is labeled a and the semi-minor axis b, but the distinction is unnecessary here since Ivory’s series is symmetric in a and b.

Note that h ≥ 0 and h = 0 only if the ellipse is a circle. So the terms in the series are positive, which explains why Kepler’s approximation under-estimates the perimeter.

Using just one term in Ivory’s series gives a very good approximation

P \approx \pi(a + b)\left(1 + \frac{1}{4} \frac{(a-b)^2}{(a+b)^2}\right)

The approximation error increases as the ratio of a to b increases, but even for a highly eccentric ellipse like the orbit of the Mars Orbital Mission, the ratio of a to b is only 2, and the relative approximation error is about 1/500, about 12 times more accurate than Kepler’s approximation.

Kepler’s ellipse perimeter approximations

In 1609, Kepler remarked that the perimeter of an ellipse with semiaxes a and b could be approximated either as

P ≈ 2π(ab)½

or

P ≈ π(a + b).

In other words, you can approximate the perimeter of an ellipse by the circumference of a circle of radius r where r is either the geometric mean or arithmetic mean of the semi-major and semi-minor axes.

Comparing ellipse with circles of approximately the same perimeter

How good are these approximations, particularly when a and b are roughly equal? Which one is better?

When can choose our unit of measurement so that the semi-minor axis b equals 1, then plot the error in the two approximations as a increases.

Exact and approximation ellipse perimeter

We see from this plot that both approximations give lower bounds, and that arithmetic mean is more accurate than geometric mean.

Incidentally, if we used the geometric mean of the semi-axes as the radius of a circle when approximating the area then the results would be exactly correct. But for perimeter, the arithmetic mean is better.

Ellipse approximation relative error

Next, if we just consider ellipses in which the semi-major axis is no more than twice as long as the semi-minor axis, the arithmetic approximation is within 2% of the exact value and the geometric approximation is within 8%. Both approximations are good when ab.

The next post goes into more mathematical detail, explaining why Kepler’s approximation behaves as it does and giving ways to improve on it.

More ellipse posts

Power method and centrality

A few days ago I wrote about eigenvector centrality, a way of computing which nodes in a network are most important. Rather than simply looking for the most highly connected nodes, it looks for nodes that are highly connected to nodes that are highly connected. It’s the idea behind Google’s PageRank algorithm.

Adjacency matrices

One way to capture the structure of a graph is its adjacency matrix A. Each element aij of this matrix equals 1 if there is an edge between the ith and jth node and a 0 otherwise. If you square the matrix A, the (i, j) entry of the result tells you how many paths of length 2 are between the ith and jth nodes. In general, the (i, j) entry of An tells you how many paths of length n there are between the corresponding nodes.

Power method

Calculating eigenvector centrality requires finding an eigenvector associated with the largest eigenvalue of A. One way to find such an eigenvector is the power method. You start with a random initial vector and repeatedly multiply it by A. This produces a sequence of vectors that converges to the eigenvector we’re after.

Conceptually this is the same as computing An first and multiplying it by the random initial vector. So not only does the matrix An count paths of length n, for large n it helps us compute eigenvector centrality.

Theoretical details

Now for a little fine print. The power method will converge for any starting vector that has some component in the direction of the eigenvector you’re trying to find. This is almost certainly the case if you start with a vector chosen at random. The power method also requires that the matrix has a single eigenvector of largest magnitude and that its associated eigenspace have dimension 1. The post on eigenvector centrality stated that these conditions hold, provided the network is connected.

Computational practicalities

In principle, you could use calculate eigenvector centrality by computing An for some large n. In practice, you’d never do that. For a square matrix of size N, a matrix-vector product takes O(N²) operations, whereas squaring A requires O(N³) operations. So you would repeatedly apply A to a vector rather than computing powers of A.

Also, you wouldn’t use the power method unless A is sparse, making it relatively efficient to multiply by A. If most of the entries of A are zeros, and there is an exploitable pattern to where the non-zero elements are located, you can multiply A by a vector with far less than N² operations.

Eigenvector centrality

A basic question to ask about a network is which nodes are most important. A simple way of answering this question would be to say the most well-connected nodes, i.e. the nodes with the most edges. This approach is known as degree centrality. It’s not a bad place to start. It’s easy to understand and easy to compute. It may be adequate for some purposes, but it’s often helpful to take a more sophisticated approach known as eigenvector centrality.

Examples

Let’s look at a few motivating examples before we get into the details.

Social network marketing

If you wanted to advertise something, say a book you’ve written, and you’re hoping people will promote it on Twitter. Would you rather get a shout out from someone with more followers or someone with less followers? All else being equal, more followers is better. But even better would be a shout out from someone whose followers have a lot of followers.

Deans and flight attendants

Suppose you’re at a graduation ceremony. Your mind starts to wander after the first few hundred people walk across the stage, and you start to think about how a cold might spread through the crowd. The dean handing out diplomas could be a superspreader because he’s shaking hands with everyone as they receive their diplomas. But an inconspicuous parent in the audience may also be a superspreader because she’s a flight attendant and will be in a different city every day for the next few days. And not only is she a traveler, she’s in contact with travelers.

Search engines

Ranking web pages according to the number of inbound links they have was a good idea because this takes advantage of revealed preferences: instead of asking people what pages they recommend, you can observe what pages they implicitly recommend by linking to them. An even better idea was Page Rank, weighing inbound links by how many links the linking pages have.

Eigenvector centrality

The idea of eigenvector centrality is to give each node a rank proportional to the sum of the ranks of the adjacent nodes. This may seem circular, and it kinda is.

To know the rank of a node, you have to know the ranks of the nodes linking to it. But to know their ranks, you have to know the ranks of the nodes linking to them, etc. There is no sequential solution to the problem because you’d end up in an infinite regress, even for a finite network. But there is a simultaneous solution, considering all pages at once.

You want to assign to the ith node a rank xi proportional to the sum of the ranks of all adjacent nodes. A more convenient way to express this to compute the weighted sum of the ranks of all nodes, with weight 1 for adjacent nodes and weight 0 for non-adjacent nodes. That is, you want to compute

x_i = \frac{1}{\lambda} \sum a_{ij} x_j

where the a‘s are the components of the adjacency matrix A. Here aij equals 1 if there is an edge between nodes i and j and it equals 0 otherwise.

This is equivalent to looking for solutions to the system of equations

Ax = \lambda x

where A is the adjacency matrix and x is the vector of node ranks. If there are N nodes, then A is an N × N matrix and x is a column vector of length N.

In linear algebra terminology, x is an eigenvector of the adjacency matrix A, hence the name eigenvector centrality.

Mathematical questions

There are several mathematical details to be concerned about. Does the eigenvalue problem defining x have a solution? Is the solution unique (up to scalar multiples)? What about the eigenvalue λ? Does the adjacency matrix have multiple eigenvalues, which would mean there are multiple eigenvectors?

If A is the adjacency matrix for a connected graph, then there is a unique eigenvalue λ of largest magnitude, there is a unique corresponding eigenvector x, and all the components of x are non-negative. It is often said that this is a result of the Perron–Frobenius theorem, but there’s a little more to it than that because you need the hypothesis that the graph is connected.

The matrix A is non-negative, but not necessarily positive: some entries may be zero. But if the graph is connected, there’s a path between any node i and another node j, which means for some power of A, the ij entry of A is not zero. So although A is not necessarily positive, some power of A is positive. This puts us in the positive case of the Perron–Frobenius theorem, which is nicer than the non-negative case.

This section has discussed graphs, but Page Rank applies to directed graphs because links have a direction. But similar theorems hold for directed graphs.

Related posts

Solutions to tan(x) = x

I read something recently that said in passing that the solutions to the equation tan x = x are the zeros of the Bessel function J3/2. That brought two questions to mind. First, where have I seen the equation tan x = x before? And second, why should its solutions be the roots of a Bessel function.

The answer to the first question is that I wrote about the local maxima of the sinc function three years ago. That post shows that the derivative of the sinc function sin(x)/x is zero if and only if x is a fixed point of the tangent function.

As for why that should be connected to zeros a Bessel function, that one’s pretty easy. In general, Bessel functions cannot be expressed in terms of elementary functions. But the Bessel functions whose order is an integer plus ½ can.

For integer n,

J_{n+{\frac{1}{2}}}(x)= (-1)^n \left(\frac{2}{\pi}\right)^{\frac{1}{2}} x^{n+{\frac{1}{2}}} \left(\frac{1}{x}\frac{d}{dx}\right)^n\left(\frac{\sin x}{x}\right)

So when n = 1, we’ve got the derivative of sinc right there in the definition.