1000 most common words

Last week I wrote about a hypothetical radio station that plays the top 100 songs in some genre, with songs being chosen randomly according to Zipf’s law. The nth most popular song is played with probability proportional to 1/n.

This post is a variation on that post, looking at text consisting of the 1,000 most common words in a language, where word frequencies follow Zipf’s law.

How many words of text would you expect to read until you’ve seen all 1000 words at least once? The math is the same as in the radio station post. The simulation code is the same too: I just changed a parameter from 100 to 1,000.

The result of a thousand simulation runs was an average of 41,246 words with a standard deviation of 8,417.

This has pedagogical implications. Say you were learning a foreign language by studying naturally occurring text with a relatively small vocabulary, such as newspaper articles. You might have to read a lot of text before you’ve seen all of the thousand most common words.

On the one hand, it’s satisfying to read natural text. And it’s good to have the most common words reinforced the most. But it might be more effective to have slightly engineered text, text that has been subtly edited to make sure common words have not been left out. Ideally this would be done with such a light touch that it isn’t noticeable, unlike heavy-handed textbook dialogs.

Miscellaneous mathematical symbols

As longtime readers of this blog have probably noticed, I like to poke around in Unicode occasionally. It’s an endless system of rabbit holes to explore.

This morning I was looking at the Miscellaneous Mathematical Symbols block. These are mostly obscure symbols, though I’m sure for each symbol that I think is obscure, there is someone out there who uses it routinely.

Perpendicular

The only common symbol in this block is ⟂ (U+27C2) for perpendicular. Even so, this symbol is a variation on ⊥ (U+22A5). The distinction is semantic rather than visual: U+22A5 is used for the Boolean value “false.”

In addition to using ⟂ to denote perpendicular lines, some (e.g. Donald Knuth) use the symbol to denote that two integers are relatively prime.

Geometric algebra

The block contains ⟑ (U+27D1), which is used in geometric algebra for the geometric product, a.k.a. the dot-wedge product. The block also contains the symbol for the dual operator ⟇ (U+27C7), the geometric antiproduct. Incidentally, Eric Lengyel’s Projective Geometric Algebra site officially sponsors these two Unicode symbols.

I’m sure these symbols predate Eric Lengyel’s use of them, but I can only recall seeing them used in his work.

Database joins

Unicode has four symbols for database joins. The bowtie symbol ⨝ (U+2A1D), used for inner (natural) joins, is in another block. The Miscellaneous Mathematical Symbols block has three other symbols for outer joins: left, right, and full. I posted a table of these on @CompSciFact this morning.

Angle brackets

The Miscellaneous Mathematical Symbols block also has angle brackets: ⟨ (U+27E8) and ⟩ (U+27E9). These correspond to \langle and \rangle in LaTeX. I’ve used the LaTeX commands, but I wasn’t sure whether I’d ever used the Unicode characters. I searched this blog and found that I did indeed use the characters in my post on the Gram matrix.
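For example, the LaTeX source

    \[ \langle x, y \rangle \]

produces angle brackets corresponding to the Unicode characters ⟨ and ⟩.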


Erdős-Mordell triangle theorem

If any field of mathematics has been thoroughly combed over, it’s Euclidean geometry. But once in a while someone will come up with a new theorem that seems like it should have been discovered centuries ago.

Here’s a theorem conjectured by Paul Erdős in 1935 and proved by Louis Mordell later the same year.

If from a point O inside a given triangle ABC perpendiculars OD, OE, OF are drawn to its sides, then

OA + OB + OC ≥ 2(OD + OE + OF).

Equality holds if and only if triangle ABC is equilateral.

To put it more succinctly,

From any interior point, the distances to the vertices are at least twice the distances to the sides.

Here’s an illustration. The dashed blue lines run from an interior point to the vertices, and the solid red lines are the perpendiculars to the sides. The theorem says the dashed blue lines together are more than twice as long as the solid red lines.

In the units I used in drawing the figure above, the blue lines have combined length 9.5 and the red lines have combined length 4.7.
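Here’s a small Python sketch that checks the inequality numerically. The triangle and interior point are arbitrary choices for illustration, not the ones in the figure.

    import numpy as np

    def dist_to_line(p, a, b):
        # perpendicular distance from point p to the line through a and b
        d = b - a
        return abs(d[0]*(p - a)[1] - d[1]*(p - a)[0]) / np.linalg.norm(d)

    A, B, C = np.array([0., 0.]), np.array([4., 0.]), np.array([1., 3.])
    O = np.array([1.5, 1.0]) # a point inside triangle ABC

    R = sum(np.linalg.norm(O - V) for V in (A, B, C)) # distances to vertices
    r = sum(dist_to_line(O, P, Q) for P, Q in [(B, C), (C, A), (A, B)])
    print(R >= 2*r) # True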

Hojoo Lee gave an elementary proof of the Erdős-Mordell theorem in 2001 that takes about one printed page [1].

In my opinion the Erdős-Mordell theorem feels like a theorem an ancient geometer could have discovered and proved. Here’s a generalization of the theorem that feels much more contemporary [2].

Let Rᵢ be the distance from an interior point O to the ith vertex and rᵢ the distance to the side opposite the ith vertex. Let λ₁, λ₂, and λ₃ be any three positive real numbers. Then

\sum_{i=1}^3 \lambda_iR_i \geq 2\sqrt{\lambda_1\lambda_2\lambda_3} \sum_{i=1}^3 \frac{r_i}{\sqrt{\lambda_i}}

When all the λs equal 1, we get the original Erdős-Mordell theorem.

You could say the weighted distances to the vertices are at least twice the weighted distances to the sides, but you have to say more about the weights, since in general the weights enter differently on the two sides of the inequality.

[1] Hojoo Lee. Another Proof of the Erdős-Mordell Theorem. Forum Geometricorum, Volume 1 (2001), pp. 7–8.

[2] Seannie Dar, Shay Gueron. A Weighted Erdős-Mordell Inequality. The American Mathematical Monthly, Vol. 108, No. 2 (February 2001).

Logarithmic sawtooth

Here’s a strange integral I ran across recently [1].

\int_0^1 (\log x \bmod 2)\, dx = \frac{1}{\tanh 1}

It’s a little surprising that the integral even exists, and more surprising that its value has a simple expression.
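Here’s a sketch of where that value comes from. On the interval (e^{−2(k+1)}, e^{−2k}) the integrand equals log x + 2(k + 1), and a little calculus shows each such interval contributes e^{−2k} + e^{−2k−2} to the integral. Summing the geometric series,

\int_0^1 (\log x \bmod 2)\, dx = \sum_{k=0}^\infty \left(e^{-2k} + e^{-2k-2}\right) = \frac{1 + e^{-2}}{1 - e^{-2}} = \frac{1}{\tanh 1}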

Here’s a plot of the integrand.

The plot doesn’t do justice to all the activity on the left end. There are infinitely many increasingly steep segments piled up on the left end as x goes to 0.

If you were to plot the integrand on a log scale, you’d get a sawtooth wave, stretching back to negative infinity.

I would expect the integral to be difficult to evaluate numerically without some special handling, but SciPy’s quad function does a decent job by default.

    from numpy import log, tanh
    from scipy.integrate import quad

    # log(x) % 2 is the integrand; Python's % returns a value in [0, 2)
    print(quad(lambda x: log(x) % 2, 0, 1))
    print(1/tanh(1))

This evaluates the integral as 1.313042… while the exact value is 1.313035…

To the integrator’s credit, it warns of difficulties before producing the result:

    IntegrationWarning: The maximum number of subdivisions (50) has been
    achieved. If increasing the limit yields no improvement it is advised to
    analyze the integrand in order to determine the difficulties. If the
    position of a local difficulty can be determined (singularity,
    discontinuity) one will probably gain from splitting up the interval and
    calling the integrator on the subranges. Perhaps a special-purpose
    integrator should be used.
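Taking the warning’s advice is a one-line experiment. The limit of 500 subdivisions below is an arbitrary choice; splitting the interval at a few of the breakpoints e^{−2k} would be another option.

    from numpy import log
    from scipy.integrate import quad

    # allow more adaptive subdivisions than the default 50
    print(quad(lambda x: log(x) % 2, 0, 1, limit=500))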


[1] Matthew A. Niemiro. The Natural Logarithm Modulo 2. The American Mathematical Monthly, Vol. 128, No. 2 (February 2021), p. 177.

When Benford’s law is exact

Eight years ago I wrote a blog post on the Leading digits of powers of 2. In that post I said that Benford’s law gives remarkably good predictions for the distribution of leading digits of 2ⁿ. In this post I’ll make that statement stronger.

A sequence obeys Benford’s law, in base 10, if the number of the first N elements of the sequence that begin with the digit d is asymptotically

N log₁₀(1 + 1/d).

Benford’s law applies to powers of 2, but also to factorials, for example. In general it applies only in the limit as N goes to infinity. For a large but finite value of N it gives a good approximation, but there’s no guarantee on how good the approximation is.  Except sometimes there is.

You could use Benford’s law to estimate how many times n! begins with a 1, but it can tell you exactly how many times 2ⁿ begins with a 1. The exact number is the estimate given by Benford’s law with d = 1, rounded down to the nearest integer, i.e. the floor of the expression above.

⌊N log₁₀(2)⌋

See [1] for a proof. The following Python code demonstrates this.

    import decimal
    from math import log10, floor

    def exact(N):
        # count how many of 2^1, ..., 2^N begin with the digit 1
        c = 0
        a = decimal.Decimal(1)
        for i in range(1, N+1):
            a = a*2
            if int(str(a)[0]) == 1:
                c += 1
        return c

    def estimated(N):
        # Benford's law estimate for d = 1
        return N*log10(2)

    for i in range(1, 10000):
        assert(exact(i) == int(floor(estimated(i))))

The code uses the decimal class for efficiency. See Andrew Dalke’s comment on the original post. [2]

Benford’s law predicts exactly how often the sequence 2ⁿ begins with a 1. It also predicts exactly how often it begins with a 4, but it is not exact for other digits.


[1] Zhaodong Cai, Matthew Faust, A. J. Hildebrand, Junxian Li, Yuan Zhang. The Surprising Accuracy of Benford’s Law in Mathematics. The American Mathematical Monthly, Vol. 127, No. 3 (March 2020), pp. 217–237.

[2] The function exact is efficient for individual values of N, but there is a lot of redundant calculation when you use it to test a range of values of N, as we do in the loop with the assert statement. It wouldn’t be hard to rewrite the code to check a whole range more efficiently.
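Here’s one way that rewrite might look, a sketch rather than code from the post: carry the running power of 2 and the running count through the whole range in a single pass.

    import decimal
    from math import floor, log10

    def check_range(limit):
        c = 0
        a = decimal.Decimal(1)
        for N in range(1, limit+1):
            a = a*2 # a = 2^N, to 28 significant digits
            if str(a)[0] == '1':
                c += 1 # number of powers up through 2^N starting with 1
            assert c == floor(N*log10(2))

    check_range(10000)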

Lots of flat sides

The exponential sum of the day page draws a new figure each day by plotting the partial sums of

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

and drawing a line between consecutive terms. Here m, d, and y are the month, day, and last two digits of the year.

Today’s sum has an unusual number of lines that are parallel to the horizontal and vertical axes.

Typically the images on the exponential sum of the day page are much more complicated, except around the first of the month.

Horizontal lines occur when consecutive partial sums differ by a real number, i.e. when the latest term in the sum is real. Similarly, vertical lines occur when the latest term is purely imaginary. This happens when n is a multiple of 10.

Also, nearly horizontal or vertical lines occur when the latest term is nearly real or nearly imaginary. This happens a few times as well.
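Here’s a minimal sketch of how such a figure can be drawn. The date and the number of terms below are arbitrary stand-ins, not today’s values.

    import numpy as np
    import matplotlib.pyplot as plt

    m, d, y = 10, 10, 25 # month, day, last two digits of year (assumed)
    n = np.arange(0, 101) # number of terms, also an assumption
    z = np.cumsum(np.exp(2j*np.pi*(n/m + n**2/d + n**3/y)))
    plt.plot(z.real, z.imag)
    plt.gca().set_aspect("equal")
    plt.show()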

Top 100 radio

Suppose you want to hear the top 100 songs in some music genre, so you listen to a radio station that specializes in that kind of music. How long would it take to hear each of the songs at least once?

This is a variation on the coupon collector problem. If a radio station plays the top N songs in some genre, selected at random, the expected number of songs you would need to listen to before hearing each one at least once is

N H_N

where H_N is the Nth harmonic number. For large N this is approximately equal to

N(log N + γ)

where γ is the Euler-Mascheroni constant. When N = 100, this comes to approximately 518.
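These numbers are easy to check:

    from math import log

    N = 100
    H = sum(1/k for k in range(1, N+1)) # harmonic number H_N
    gamma = 0.5772156649 # Euler-Mascheroni constant
    print(N*H, N*(log(N) + gamma)) # 518.73..., 518.23...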

But radio stations do not choose songs at random. Even a station that only plays the 100 most popular songs will presumably play the most popular songs more often than the less popular ones. They probably also have some rules about waiting a while before playing the same song again, but we’ll ignore that for this post.

Billboard for radio station ZIPF

Suppose we’re listening to ZIPF, a radio station that plays the top 100 songs according to Zipf’s law: the probability of playing the nth most popular song is proportional to 1/n. How long before you’ve heard all 100 songs?

We know it’ll take more than 518 songs on average. The unequal frequency of the songs will cause us to wait longer to hear the least popular song. The 100th most popular song, for example, will only play with probability 1/518 rather than probability 1/100.

Does the number 518 look familiar? It’s exactly the same number as above:

100 H₁₀₀

That’s because the probability of hearing the 100th most popular song is proportional to 1/100, and the proportionality constant is the reciprocal of

1 + 1/2 + 1/3 + … + 1/100 = H₁₀₀ = 5.1873…

So in general, the expected number of draws from a set of N equally likely items until you’ve seen everything at least once equals the reciprocal of the probability of selecting the least likely item when the items are drawn according to Zipf’s law.

I don’t know how hard it would be to work out the exact expected value for our problem of hearing all 100 songs on ZIPF, but we can estimate it by simulation.
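Here’s a minimal simulation sketch, my reconstruction rather than necessarily the code the post used:

    import numpy as np

    rng = np.random.default_rng()

    def songs_until_all_heard(N=100):
        p = 1/np.arange(1, N+1)
        p /= p.sum() # Zipf probabilities for the top N songs
        heard, count = set(), 0
        while len(heard) < N:
            heard.add(rng.choice(N, p=p)) # draw one song
            count += 1
        return count

    reps = [songs_until_all_heard() for _ in range(1000)]
    print(np.mean(reps), np.std(reps))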

When I ran it, the average of 1000 simulation reps was 1900 with a standard deviation of 595.

So it would take roughly 20x longer to hear every song when songs are selected according to Zipf’s law than it would to simply play through the list of 100 songs once.


Estimating satellite altitude from apparent motion

If you see a light moving in the sky, how could you tell whether it’s an airplane or a satellite? If it is a satellite, could you tell how high of an orbit it’s in?

For an object in a circular orbit,

v² = μ/r

where r is the orbit radius and μ is the standard gravitational parameter. For a satellite orbiting the earth,

μ = 5.1658 × 10¹² km³/hr².

The orbit radius r is the earth’s radius R plus the altitude a above the earth’s surface.

r = R + a

where the mean radius of the earth is R = 6,371 km.

Angular velocity is

ω = v/r = √(μ/r³)

This velocity is relative to an observer hovering far above the earth. An observer on the surface of the earth would need to subtract the earth’s angular velocity to get the apparent velocity [1].

    from numpy import pi, sqrt

    def angular_v(altitude): # in radians/hour
        R = 6371 # mean earth radius in km
        r = altitude + R
        mu = 5.1658e12 # km^3/hr^2
        return sqrt(mu/r**3)

    def apparent_angular_v(altitude):
        earth_angular_v = 2*pi/(23 + 56/60 + 4/3600) # sidereal day in hours
        return angular_v(altitude) - earth_angular_v

Here’s a plot of apparent angular velocity for altitudes between 350 km (initial orbit of a Starlink satellite) and geostationary orbit (35,786 km).

Radians per hour is a little hard to conceptualize; degrees per minute would be easier. But fortunately, the two are nearly the same. One radian per hour is 3/π = 0.955 degrees per minute.

The apparent size of the moon is about 0.5°, so you could estimate angular velocity by seeing how long it takes a satellite to cross the moon (or a region of space the width of the moon). It would only take an object in low earth orbit (LEO) a few seconds to cross the moon.
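Continuing with the functions defined above, here’s that crossing time for an assumed LEO altitude of 400 km:

    from numpy import degrees

    w = apparent_angular_v(400) # radians per hour
    deg_per_sec = degrees(w)/3600
    print(0.5/deg_per_sec) # about 8 seconds to cross 0.5 degrees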

Can you see objects in medium earth orbit (MEO) or high earth orbit (HEO)? Yes. Although most satellites are in LEO, objects in higher orbits may be larger, and if they’re reflective enough they may be visible.

Airplanes move much slower than LEO satellites. An airliner at altitude 10 km moving 1000 km/hr would have an apparent angular velocity of 0.16 radians per hour. An object appearing to move that slowly is either an airplane or an object in HEO, and very likely the former.

[1] The earth completes a full rotation (2π radians) in 23 hours 56 minutes and 4 seconds, so its angular velocity is 0.2625 radians per hour. Why not 24 hours? That’s the length of a rotation of the earth relative to the sun. Since the earth moves in its orbit relative to the sun while it rotates, it has to rotate a little longer (i.e. for about 4 more minutes) to reach the same position relative to the sun.

Linear KdV dispersion

The Korteweg–De Vries (KdV) equation

u_t - 6 u\, u_x + u_{xxx} = 0

is a nonlinear PDE used to model shallow water waves. The linear counterpart omits the nonlinear term in the middle.

u_t + u_{xxx} = 0

This variant is useful in itself, but also for understanding the nonlinear KdV equation.

Solitons

Solutions to the linear KdV equation spread out over time. The nonlinear term in the KdV equation counterbalances the dispersion term uₓₓₓ so that solutions to the nonlinear PDE behave in some ways like linear solutions.

Solutions to a nonlinear equation don’t add, and yet they sorta add. Here’s a description from [1].

At the heart of these observations is the discovery that these nonlinear waves can interact strongly and then continue thereafter almost as if there had been no interaction at all. This persistence of the wave led Zabusky and Kruskal to coin the term ‘soliton’ (after photon, proton, etc.) to emphasize the particle-like character of these waves which retain their identities in a collision.

I added the emphasis on almost because many descriptions leave out this important qualifier.

Solution to linear KdV

There is a compact expression for the solution to the linear KdV equation if u, uₓ, and uₓₓ all go to 0 as |x| → ∞. If u(x, 0) = f(x), then the solution is

u(x, t) = (3t)^{-1/3} \int_{-\infty}^\infty f(y) \text{ Ai}\!\left( \frac{x-y}{(3t)^{1/3}} \right) \,dy

Here Ai(z) is the Airy function. This function has come up several times here. For example, I’ve written about the Airy function in the context of triple factorials and in connection with Bessel functions.
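Here’s a numerical sketch of this solution formula, with an assumed Gaussian initial profile f(x) = exp(−x²):

    import numpy as np
    from scipy.special import airy
    from scipy.integrate import quad

    def f(y):
        return np.exp(-y**2) # assumed initial condition u(x, 0)

    def u(x, t):
        k = (3*t)**(-1/3)
        # airy returns (Ai, Ai', Bi, Bi'); we only need Ai
        integrand = lambda y: f(y)*airy(k*(x - y))[0]
        return k*quad(integrand, -np.inf, np.inf)[0]

    print(u(1.0, 0.5))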

Aside on third order equations

Third order differential equations are uncommon. Third order linear ODEs are quite rare, and the third order equations that do come up are usually nonlinear PDEs, like the KdV equation. The linear KdV equation is an example of a linear third order PDE that arises in applications.


[1] P. G. Drazin and R. S. Johnson. Solitons: an introduction. Cambridge University Press. 1989.

Closed-form minimal surface solutions

Differential equations, especially nonlinear differential equations, rarely have a closed-form solution, but sometimes it happens. As I wrote about a year ago

It is unusual for a nonlinear PDE to have a closed-form solution, but it is not unheard of. There are numerous examples of nonlinear PDEs, equations with important physical applications, that have closed-form solutions.

This post will present some closed-form solutions of the minimal surface equation

(1 + u_x^2)\, u_{yy} - 2 u_x u_y u_{xy} + (1 + u_y^2)\, u_{xx} = 0

One trivial class of closed-form solutions is planes:

u(x, y) = ax + by + c.

There are three non-trivial classes of solutions as far as I know. Jean Baptiste Marie Meusnier discovered two of these in 1776, namely the helicoid

u(x, y) = tan⁻¹(y/x)

and the catenoid

u(x, y) = cosh⁻¹(a√(x² + y²)) / a

Heinrich Scherk discovered another closed-form solution in 1830:

u(x, y) = log( cos(ay) / cos(ax) ) / a
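It’s easy to verify symbolically that this satisfies the minimal surface equation. Here’s a quick SymPy sketch:

    from sympy import symbols, log, cos, diff, simplify

    x, y, a = symbols('x y a', positive=True)
    u = log(cos(a*y)/cos(a*x))/a

    ux, uy = diff(u, x), diff(u, y)
    uxx, uyy, uxy = diff(u, x, 2), diff(u, y, 2), diff(u, x, y)

    residual = (1 + ux**2)*uyy - 2*ux*uy*uxy + (1 + uy**2)*uxx
    print(simplify(residual)) # prints 0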

Here’s a plot.

The surface formed by the graph of the solution is known as Scherk’s surface. You could imagine that if the edges of this surface were made of wire and the wire was dipped in soapy water, it would form a bubble like Scherk’s surface.

Note that the closed-form solutions satisfy the minimal surface PDE itself, but do not satisfy any given boundary conditions, unless the boundary values you’d like to specify happen to be exactly the values this function has.