Starlink configurations

My nephew recently told me about being on a camping trip and seeing a long line of lights in the sky. The lights turned out to be Starlink satellites. It’s fairly common for people to report seeing lines of these satellites.

Four lights in the sky in a line

Why would the satellites be in a line? Wouldn’t it be much more efficient to spread them out? They do spread out, but they’re launched in groups. Satellites released into orbit at the same time initially orbit in a line close together.

It would seem the optimal strategy would be to spread communication satellites out evenly in a sphere. There are several reasons why that is neither desirable nor possible. It is not desirable because human population is far from evenly distributed. It’s very nice to have some coverage over the least-populated places on earth, such as Antarctica, but there is far more demand for service over the middle latitudes.

It is not possible to distribute more than 20 points on a sphere perfectly evenly (the 20 vertices of a regular dodecahedron are the largest such configuration), and so it would not be possible to spread out thousands of satellites perfectly evenly. However, there are ways to distribute arbitrarily many points somewhat evenly, such as in a Fibonacci lattice.
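As an illustration of the idea (nothing Starlink-specific), here is a common construction of a Fibonacci lattice on the unit sphere: slice the sphere into bands of equal area and rotate each successive point by the golden angle. The function name and constants are one standard choice, not a canonical definition.

```python
from math import cos, sin, sqrt, pi

def fibonacci_lattice(n):
    """Return n roughly evenly spaced points (x, y, z) on the unit sphere."""
    golden_angle = pi * (3 - sqrt(5))  # about 137.5 degrees
    points = []
    for i in range(n):
        z = 1 - (2*i + 1)/n        # equal-area bands in height
        r = sqrt(1 - z*z)          # radius of the circle at height z
        theta = golden_angle * i   # rotate by the golden angle each step
        points.append((r*cos(theta), r*sin(theta), z))
    return points
```

Successive points never stack up in a line because the golden angle is, in a precise sense, as far from a rational multiple of a full turn as possible.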

It’s also not possible to distribute satellites in a static configuration. Unless a satellite is in geostationary orbit, it will constantly move relative to the earth. One problem with geostationary orbit is that it is at an altitude of about 36,000 km. Starlink satellites are in low earth orbit (LEO), between 300 km and 600 km altitude. It is less expensive to put satellites into LEO, and there is less latency bouncing signals off satellites closer to the ground.

Satellites orbit at different altitudes, and altitude and velocity are tightly linked. If you want satellites orbiting at different altitudes to avoid collisions, then they will necessarily orbit at different velocities. Even if you wanted all satellites to orbit at the same altitude, this would require constant maintenance due to various real-world departures from ideal Keplerian conditions. Satellites are going to move around relative to each other whether you want them to or not.
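For a circular orbit, speed is determined by altitude through v = √(μ/r). Here is a quick sketch comparing the two ends of the LEO range mentioned above, using the standard textbook values for Earth’s gravitational parameter and mean radius:

```python
from math import sqrt, pi

MU = 398600.4418       # Earth's gravitational parameter, km^3/s^2
EARTH_RADIUS = 6371.0  # mean Earth radius, km

def circular_orbit(altitude_km):
    """Return (speed in km/s, period in minutes) for a circular orbit."""
    r = EARTH_RADIUS + altitude_km
    v = sqrt(MU / r)              # orbital speed
    period = 2 * pi * r / v / 60  # orbital period in minutes
    return v, period

for alt in (300, 600):
    v, T = circular_orbit(alt)
    print(f"{alt} km: {v:.2f} km/s, period {T:.1f} min")
```

A satellite at 300 km moves roughly 0.17 km/s faster than one at 600 km, so satellites at different altitudes inevitably drift apart.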

Related posts

Putting a face on a faceless account

I’ve been playing around with Grok today, logging into some of my X accounts and trying out the prompt “Draw an image of me based on my posts.” [1] In most cases Grok returned a graphic, but sometimes it would respond with a text description. In the latter case asking for a photorealistic image made it produce a graphic.

Here’s what I get for @AlgebraFact:

The icons for all my accounts are cerulean blue dots with a symbol in the middle. Usually Grok picks up on the color, as above. With @AnalysisFact, it dropped a big blue piece of a circle on the image.

For @UnixToolTip it kept the & from the &> in the icon. Generative AI typically does weird things with text in images, but it picked up “awk” correctly.

Here’s @ProbFact. Grok seems to think it’s a baseball statistics account.

Last but not least, here’s @DataSciFact.

I wrote a popular post about how to put Santa hats on top of symbols in LaTeX, and that post must have had an outsized influence on the image Grok created.

[1] Apparently if you’re logged into account A and ask it to draw B, the image will be heavily influenced by A’s posts, not B’s. You have to log into B and ask in the first person.

Can AI models reason: Just a stochastic parrot?

OpenAI has just released its full o1 model—a new kind of model that is more capable of multi-step reasoning than previous models. Anthropic, Google and others are no doubt working on similar products. At the same time, it’s hotly debated in many quarters whether AI models actually “reason” in a way similar to humans.

Emily Bender and her colleagues famously described large language models as nothing more than “stochastic parrots”—systems that simply repeat their training data blindly, based on a statistical model, with no real understanding (reminiscent of the Chinese Room thought experiment). Others have made similar comments, describing LLMs as “n-gram models on steroids” or a “fancy extrapolation algorithm.”

There is of course some truth to this. AI models sometimes generate remarkable results, yet they lack basic aspects of understanding, which is why they sometimes generate nonsensical results. More to the point of “parroting” the training data, recent work from Yejin Choi’s group has shown how LLMs will at times cut and paste snippets from their training documents, almost verbatim, to formulate their outputs.

Are LLMs (just) glorified information retrieval tools?

The implication of these concerns is that an LLM can “only” repeat back what it was taught (albeit with errors). However, this view does not align with the evidence. LLM training is a compression process in which new connections between pieces of information are formed that were not present in the original data. This is evidenced both mathematically and anecdotally. In my own experience, I’ve gotten valid answers to such obscure and detailed technical questions that it is hard for me to believe the answers existed in any training data in exactly that form. Whether you would call this “reasoning” might be open to debate, but whatever you call it, it is something more than the unadorned information retrieval of a “stochastic parrot.”

What is your experience? Let us know in the comments.

Interval arithmetic and fixed points

A couple days ago I analyzed the observation that repeatedly pressing the cosine key on a calculator leads to a fixed point. After about 90 iterations the number no longer changes. This post will analyze the same phenomenon a different way.

Interval arithmetic

Interval arithmetic is a way to get exact results of a sort from floating point arithmetic.

Suppose you start with a number x that cannot be represented exactly as a floating point number, and you want to compute f(x) for some function f. You can’t represent x exactly, but unless x is too large you can represent a pair of numbers a and b such that x is certainly in the interval [a, b]. Then f(x) is in the set f( [a, b] ).

Maybe you can represent f( [a, b] ) exactly. If not, you can enlarge the interval a bit to exactly represent an interval that contains f(x). After applying several calculations, you have an interval, hopefully one that’s not too big, containing the exact result.

(I said above that interval arithmetic gives you exact results of a sort because even though you don’t generally get an exact number at the end, you do get an exact interval containing the result.)

Cosine iteration

In this post we will use interval arithmetic, not to compensate for the limitations of computer arithmetic, but to illustrate the convergence of iterated cosines.

The cosine of any real number lies in the interval [−1, 1]. To put it another way,

cos( [−∞, ∞] ) = [−1, 1].

Because cosine is an even function,

cos( [−1, 1] ) = cos( [0, 1] )

and so we can limit our attention to the interval [0, 1].

Now cosine is a monotone decreasing function on [0, π], and so it’s monotone on [0, 1]. For any two points with 0 ≤ a ≤ b ≤ π we have

cos( [a, b] ) = [cos(b), cos(a)].

Note that the order of a and b reverses on the right hand side of the equation because cosine is decreasing. When we apply cosine again we get back the original order.

cos(cos( [a, b] )) = [cos(cos(a)), cos(cos(b))].

Incidentally, this flip-flop explains why the cobweb plot from the previous post looks like a spiral rather than a staircase.

Now define a0 = 0, b0 = 1, and

[an+1, bn+1] = cos( [an, bn] ) = [cos(bn), cos(an)].

We could implement this in Python with a pair of mutually recursive functions.

    from math import cos

    a = lambda n: 0 if n == 0 else cos(b(n-1))
    b = lambda n: 1 if n == 0 else cos(a(n-1))

Here’s a plot of the image of [0, 1] after n iterations.

Note that odd iterations increase the lower bound and even iterations decrease the upper bound.
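The mutually recursive functions above recompute the whole chain of earlier values on every call, so for generating plot data it is more efficient to update the interval in place. Here is an equivalent iterative sketch:

```python
from math import cos

a, b = 0.0, 1.0  # start with the interval [0, 1]
for n in range(1, 101):
    a, b = cos(b), cos(a)  # cos([a, b]) = [cos(b), cos(a)]
    print(n, b - a)        # width of the interval after n iterations
```

The widths shrink geometrically, and after roughly 90 iterations the interval has collapsed to the fixed point to within floating point precision.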

Numerical interval arithmetic

This post introduced interval arithmetic as a numerical technique, then proceeded to do pure math. Now let’s think about computing again.

The image of [0, 1] under cosine is [cos(1), cos(0)] = [cos(1), 1]. A computer can represent 1 exactly but not cos(1). Suppose we compute

cos(1) = 0.5403023058681398

and assume each digit in the result is correct. Maybe the exact value of cos(1) was slightly smaller and was rounded to this value, but we know for sure that

cos( [0, 1] ) ⊂ [0.5403023058681397, 1]

So in this case we don’t know the image of [0, 1], but we know an interval that contains the image, hence the subset symbol.

We could iterate this process, next computing an interval that contains

cos( [0.5403023058681397, 1] )

and so forth. At each step we would round the left endpoint down to the nearest representable lower bound and round the right endpoint up to the nearest representable upper bound. In practice we’d be concerned with machine representable numbers rather than decimal representable numbers, but the principle is the same.
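Python’s math.nextafter function (available since Python 3.9) returns the neighboring machine number in a given direction, so one step of this outward rounding can be sketched as follows. The helper name cos_interval is mine, not a standard function.

```python
from math import cos, inf, nextafter

def cos_interval(a, b):
    """One outward-rounded step of cos on [a, b], assuming [a, b] is in [0, pi].

    cos is decreasing there, so the image is [cos(b), cos(a)]. Each endpoint
    is nudged one machine number outward so that the returned interval is
    guaranteed to contain the true image.
    """
    lo = nextafter(cos(b), -inf)  # round the lower endpoint down
    hi = nextafter(cos(a), inf)   # round the upper endpoint up
    return lo, hi

lo, hi = cos_interval(0.0, 1.0)
print(lo, hi)  # an interval certain to contain cos([0, 1])
```

Iterating cos_interval carries out exactly the process described above, with the two extra machine numbers per step playing the role of the enlargement.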

The potential pitfall of interval arithmetic in practice is that intervals may grow so large that the final result is not useful. But that’s not the case here. The rounding error at each step is tiny, and contraction maps reduce errors at each step rather than magnifying them. In a more complicated calculation, we might have to resort to looser estimates and not have such tight intervals at each step.

Related posts

Normal probability approximation

The previous post presented an approximation

\int_0^x \exp(-t^2)\, dt \approx \sin(\sin(x))

for −1 ≤ x ≤ 1 and said that it was related to a probability function. This post will make the connection explicit.

Let X be a normally distributed random variable with mean μ and variance σ². Then the CDF of X is

P(X \leq x) = \frac{1}{\sqrt{2\pi} \sigma} \int_{-\infty}^x \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right)\, dt

So if μ = 0 and σ² = 1/2 then we have the following.

P(0 \leq X \leq x) = \frac{1}{\sqrt{\pi}} \int_0^x \exp(-t^2)\, dt \approx \frac{1}{\sqrt{\pi}} \sin(\sin(x))

Here’s Python code to show how good the approximation is.

    from numpy import sin, linspace, sqrt, pi
    from scipy.stats import norm
    import matplotlib.pyplot as plt

    x = linspace(-1, 1, 200)
    X = norm(0, sqrt(0.5))
    plt.plot(x, X.cdf(x) - X.cdf(0))
    plt.plot(x, sin(sin(x))/sqrt(pi))
    plt.legend(["exact", "approx"])
    plt.xlabel("$x$")
    plt.ylabel(r"$P(X \leq x)$")

Here’s the resulting plot:

The orange curve for the plot of the approximation completely covers the blue curve of the exact value. The two curves are the same to within the resolution of the plot. See the previous post for a detailed error analysis.
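With μ = 0 and σ² = 1/2, the probability above equals erf(x)/2, so the size of the error can also be measured with nothing but the standard library, using math.erf:

```python
from math import erf, sin, sqrt, pi

# P(0 <= X <= x) = erf(x)/2 for X ~ N(0, 1/2)
xs = [k/1000 for k in range(-1000, 1001)]
max_err = max(abs(erf(x)/2 - sin(sin(x))/sqrt(pi)) for x in xs)
print(max_err)  # less than 0.001 over [-1, 1]
```

The maximum error occurs at the endpoints x = ±1 and is under one part in a thousand, which is why the two curves are indistinguishable in the plot.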

Simple error function approximation

I recently ran across the fact that

\int_0^x \exp(-t^2)\, dt \approx \sin(\sin(x))

is a remarkably good approximation for −1 ≤ x ≤ 1.

Since the integral above defines the error function erf(x), modulo a constant, this says we have a good approximation for the error function

\text{erf}(x) \approx \frac{2}{\sqrt{\pi}} \sin( \sin(x) )

again provided −1 ≤ x ≤ 1.

The error function is closely related to the Gaussian integral, i.e. the normal probability distribution CDF Φ. The relation between erf and Φ is simple but error-prone. I wrote up a page of notes for myself a few years ago so I wouldn’t make a mistake again moving between these functions and their inverses.

Update: This post makes the connection to probability explicit.

You can derive the approximation by writing out the power series for exp(t), substituting −t² for t, and integrating term by term from 0 to x:

\int_0^x \exp(-t^2)\, dt = x - \frac{x^3}{3} + \frac{x^5}{10} - \frac{x^7}{42} + \cdots

This agrees with the power series for sin(sin(x)),

\sin(\sin(x)) = x - \frac{x^3}{3} + \frac{x^5}{10} - \frac{8x^7}{315} + \cdots

through the x⁵ term, so the error is on the order of x⁷. Here’s a plot of the error.

The error is extremely small near 0, which is what you’d expect since the error is on the order of x⁷.
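Since the integral equals (√π/2) erf(x), the error is easy to evaluate numerically with math.erf:

```python
from math import erf, sin, sqrt, pi

def err(x):
    """Error of the sin(sin(x)) approximation to the integral."""
    return sqrt(pi)/2 * erf(x) - sin(sin(x))

print(err(0.1))  # negligible near 0
print(err(1.0))  # about 0.0012 at the endpoint
```

The error grows steeply toward the endpoints, as a seventh-power term should, but never exceeds a couple parts in a thousand on [−1, 1].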

Pressing the cosine key over and over

No matter what number you start with, if you press the cos key on a calculator repeatedly, the numbers eventually quit changing. This fact has been rediscovered by countless children playing with calculators.

If you start with 0, which is likely the default when you turn on a calculator, you’ll hit the final value after 90 steps [1]. You could verify this with the following Python code.

    from math import cos

    x = 0
    for i in range(100):
        x = cos(x)
        print(i, x) 

Starting with iteration 90, the code prints 0.7390851332151607 every time.

Visualizing convergence

Let’s visualize the sequence of values using a cobweb plot. Here’s an image I made for a post four years ago, starting with x = 1.

cobweb plot for iterating cosine

To read the graph, start at x = 1 on the horizontal axis. The solid black line is a plot of the cosine function, and so the point above 1 where the blue line starts is y = cos(1) = 0.54.

Since we’re going to stick our output back into cosine as input, we need to turn our y into an x. We do this by sliding the point over to the dotted line representing y = x. This creates the horizontal blue line that is the first segment of our spiral. This takes us to the point (cos(1), cos(1)).

Now when we take the cosine again, we get cos(cos(1)) = 0.86. Now we move from our previous value of y = 0.54 to our new value of y = 0.86. This gives the next segment of our spiral. Again we need to turn our y into an x, so we slide over to the line y = x as before, only this time we’re approaching from the left side rather than from the right.

We quickly get to where we can no longer see the convergence. The plot above used 20 iterations, and we end up with a blue blob around the point of convergence.

Proving convergence

We said that the iteration converges for any starting point. Why is that?

For one thing, we might as well assume x is between 0 and 1; if not, it will be after one iteration.

The mean value theorem says for any pair x1 and x2 in [0, 1],

cos(x1) − cos(x2) = − sin(c) (x1 − x2)

for some c in [0, 1] because the derivative of cos(x) is − sin(x). It follows that

| cos(x1) − cos(x2) | ≤ sin(1) | x1 − x2 |

because the maximum value of sin(x) on [0, 1] is sin(1). Since sin(1) = 0.84… is less than 1, this says that cosine is a contraction mapping on [0, 1] and so there is a fixed point p such that cos(p) = p.

Rate of convergence

We could use this to calculate an upper bound on how far x is from the fixed point p after k iterations:

| xp | ≤ sin(1)k

and so if we want to be within a distance ε of the fixed point, we need

k ≥ log(ε) / log(sin(1)).

This says that to get within floating point precision (about 10⁻¹⁶) of the fixed point, 214 iterations will do. This is true, but it’s a pessimistic estimate because it’s based on a pessimistic estimate of sin(c).

Once we get close to the fixed point p, the rate of convergence is more like sin(p)^k than sin(1)^k. This suggests we should be within floating point precision of the fixed point after about 90 iterations, which is what we saw at the top of the post.
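Both estimates are quick to compute. The pessimistic one uses sin(1) as the contraction constant; the local one uses sin(p) at the fixed point:

```python
from math import log, sin

eps = 1e-16             # roughly floating point precision
p = 0.7390851332151607  # the fixed point found earlier

k_worst = log(eps) / log(sin(1))  # about 213.5, so round up to 214
k_local = log(eps) / log(sin(p))  # about 93

print(k_worst, k_local)
```

The local estimate of roughly 93 iterations matches the observed 90 much better than the worst-case bound of 214.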

More fixed point posts

[1] This is true if your calculator is using 64-bit floating point numbers. Your mileage may vary if you have a calculator that retains more or less precision.

Perpetual Calendars

The previous post explained why the Gregorian calendar is the way it is, and that it consists of a whole number of weeks. It follows that the Gregorian calendar repeats itself every 400 years. For example, the calendar for 2025 will be exactly the same as the calendar for 1625 and 2425.

There are only 14 possible printed calendars, if you don’t print the year on the calendar. There are seven possibilities for the day of the week of New Year’s Day, and two possibilities for whether the year is a leap year.

A perpetual calendar is a set of the 14 possible calendars, along with some index that tells which possible calendar is appropriate in a given year.

Is each of the 14 calendars equally frequent? No, because only 97 out of 400 years include a leap day.

Is each of the non-leap year calendars equally frequent? No, because there are 303 non-leap years, and 303 is not divisible by 7. What about the leap year calendars? Again no, because 97 is not divisible by 7 either.

Now we might expect that within ordinary years, or within leap years, each potential calendar is used about the same number of times. But is that true? Here’s some Python code to answer that question.

    ordinary_counts = [0]*7
    leap_counts     = [0]*7

    def leap(y):
        return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

    # d tracks the day of the week of January 1, with d = 0 for
    # year 0 of the cycle (2000, when January 1 was a Saturday)
    d = 0
    leap_counts[0] = 1  # year 0 of the cycle is a leap year

    for y in range(1, 400):
        d += 365         # a common year advances the weekday by one
        if leap(y - 1):  # a leap year advances it by two
            d += 1
        if leap(y):
            leap_counts[d % 7] += 1
        else:
            ordinary_counts[d % 7] += 1

    print(ordinary_counts)
    print(leap_counts)

Here’s the output:

[43, 43, 43, 44, 43, 44, 43]
[13, 15, 13, 14, 14, 13, 15]

The calendar types are nearly, but not exactly, evenly distributed. Each ordinary-year calendar occurs 43 or 44 times per 400-year cycle, and each leap-year calendar occurs 13 to 15 times.

Here are bar charts of the frequencies. The charts start with Saturday because January 1, 2000 was a Saturday.
First, ordinary years:

Then, leap years:

Related posts

Gregorian Calendar and Number Theory

The time it takes for the earth to orbit the sun is not an integer multiple of the time it takes for the earth to rotate on its axis, nor is it a rational number with a small denominator. Why should it be? Much of the complexity of our calendar can be explained by rational approximations to an irrational number.

Rational approximation

The ratio is of course approximately 365. A better approximation is 365.25, but that’s not right either. A still better approximation would be 365.2422.

A slightly less accurate, but more convenient, approximation is 365.2425. Why is that more convenient? Because 0.2425 = 97/400, and 400 is a convenient number to work with.

A calendar based on a year consisting of an average of 365.2425 days would have 365 days most years, with 97 out of 400 years having 366 days.

In order to spread 97 longer years among the cycle of 400 years, you could insert an extra day every four years, but make three exceptions, namely in years that are divisible by 100 but not by 400. That’s the Gregorian calendar that we use.
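Here’s a quick check of that rule. The leap function below is a direct translation of “divisible by 4, except centuries not divisible by 400”:

```python
def leap(y):
    """Gregorian leap year rule."""
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

n_leap = sum(leap(y) for y in range(400))
print(n_leap)            # 97 leap years per 400-year cycle
print(365 + n_leap/400)  # average year length of 365.2425 days
```

The three exceptions in each cycle are the years congruent to 100, 200, and 300 mod 400, which turns 100 candidate leap years into 97.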

Its predecessor, the Julian calendar, had an average year of 365.25 days, which was good enough for a while, but the errors began to accumulate to the point that the seasons were drifting noticeably with respect to the calendar.

Not much room for improvement

It would be possible to create a calendar with an even more accurate average year length, but at the cost of more complexity. Even so, such a calendar wouldn’t be much more accurate. After all, even the number we’ve been trying to approximate, 365.2422, isn’t entirely accurate.

The ratio of the time of the earth’s orbit to the time of its rotation isn’t even entirely constant. The Gregorian calendar is off by about 1 day in 3030 years, but the length of the year varies by about 1 day in 7700 years.

I don’t know how accurately the length of the solar year was known when the Gregorian calendar was designed over four centuries ago. Maybe the error in the calendar was less than the uncertainty in the length of the solar year.

Days of the week

Four centuries of the Gregorian calendar contain 146097 days, which is a multiple of 7. This seems to be a happy coincidence. There was no discussion of weeks in the derivation above.
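The arithmetic behind that count is simple:

```python
days = 400*365 + 97  # days in one 400-year Gregorian cycle
weeks, remainder = divmod(days, 7)
print(days, weeks, remainder)  # 146097 days = 20871 weeks exactly
```

Since the remainder is 0, every 400-year cycle begins on the same day of the week.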

The implicit optimization criteria in the design of the calendar were minimizing the discrepancy between the lengths of the average calendar year and the solar year, minimizing the length of the calendar cycle, and using a cycle length that is a round number. It’s plausible that there was no design goal of making the calendar cycle an integer number of weeks.

Related posts

Golden hospital gowns

Here’s something I posted on X a couple days ago:

There’s no direct connection between AI and cryptocurrency, but they have a similar vibe.

They both leave you wondering whether the emperor is sumptuously clothed, naked, or a mix of both.

Maybe he’s wearing a hospital gown with gold threads.

In case you’re unfamiliar with the story, this is an allusion to The Emperor’s New Clothes, one of the most important stories in literature.

I propose golden hospital gown as a metaphor for things that are a fascinating mixture of good and bad, things that have large numbers of haters and fanboys, both with valid points. There’s continual improvement and a lot of work to be done sorting out what works well and what does not.

I tried to get Grok to create an image of what I had in mind by a golden hospital gown. The results were not what I wanted, but passable, which is kinda the point of the comment above. It’s amazing that AI can produce anything remotely resembling a desired image starting from a text description. But there is a very strong temptation to settle for mediocre and vaguely creepy images that aren’t what we really want.

Man in a yellow hospital gown with hairy back and legs exposed

Related posts