Triangles to Triangles

Functions of the form

f(z) = (az + b)/(cz + d)

with ad ≠ bc are called bilinear transformations or Möbius transformations. These functions have three degrees of freedom—there are four parameters, but multiplying all parameters by a constant defines the same function—and so you can uniquely determine such a function by picking three points and specifying where they go.

Here’s an explicit formula for the Möbius transformation that takes z1, z2, and z3 to w1, w2, and w3.

\begin{vmatrix} 1 & z & w & zw \\ 1 & z_1 & w_1 & z_1w_1 \\ 1 & z_2 & w_2 & z_2w_2 \\ 1 & z_3 & w_3 & z_3w_3 \\ \end{vmatrix} = 0

To see that this is correct, or at least possible, note that if you set z = zi and w = wi for some i then two rows of the matrix are equal and so the determinant is zero.
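Here's a quick sanity check in Python, using sympy to solve the determinant equation for w. The three sample points are arbitrary choices for illustration.

    from sympy import Matrix, symbols, simplify, solve

    z, w = symbols("z w")
    zs = [0, 1, 2]  # arbitrary sample points z1, z2, z3
    ws = [1, 2, 4]  # and their images w1, w2, w3

    rows = [[1, z, w, z*w]] + [[1, zi, wi, zi*wi] for zi, wi in zip(zs, ws)]
    f = solve(Matrix(rows).det(), w)[0]  # the determinant is linear in w
    print(simplify(f))                   # the Möbius transformation w = f(z)
    for zi, wi in zip(zs, ws):
        assert simplify(f.subs(z, zi) - wi) == 0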

Triangles, lines, and circles

You can pick three points in one complex plane, the z-plane, and three points in another complex plane, the w-plane, and find a Möbius transformation w = f(z) from the z-plane to the w-plane sending the specified z's to the specified w's.

If you view the three points as vertices of a triangle, you’re specifying that one triangle gets mapped to another triangle. However, the sides of your triangle may or may not be straight lines.

Möbius transformations map circles and lines to circles and lines, but a circle might become a line or vice versa. So the straight lines of our original triangle may map to straight lines or they may become circular arcs. How can you tell whether the image of a side of a triangle will be straight or curved?

When does a line map to a line?

It’ll be easier if we add a point ∞ to the complex plane and think of lines as infinitely big circles, circles that pass through ∞.

The Möbius transformation (az + b)/(cz + d) takes ∞ to a/c and it takes −d/c to ∞.

The sides of a triangle are line segments. If we look at the entire line, not just the segment, then this line is mapped to a circle. If this line contains the point that gets mapped to ∞ then the image of the line is an infinite circle (i.e. a line). Otherwise the image of the line is a finite circle.

The line between z1 and z2 can be parameterized by

z1 + t(z2 − z1)

where t is real. So the image of this line will be a line if and only if

z1 + t(z2 − z1) = −d/c

for some real t. Solving gives t = (−d/c − z1)/(z2 − z1), so compute this value and see whether it is real.
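In code, the test might look like the following sketch. (The function name and tolerance are my own.)

    def side_maps_to_line(z1, z2, c, d, tol=1e-12):
        """Does (az + b)/(cz + d) map the line through z1 and z2 to a line?
        True exactly when the pole -d/c lies on that line, i.e. when t is real."""
        if c == 0:
            return True  # no pole: the map is affine and lines stay lines
        t = (-d/c - z1) / (z2 - z1)
        return abs(t.imag) < tol

To test whether the pole lies in the segment itself, not just on the line, also check that 0 ≤ t ≤ 1.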

Note that if the point that is mapped to ∞ lies inside the line segment, not just on the line, then the image of that side of the triangle is infinitely long.

Examples

To keep things as simple as possible without being trivial, we’ll use the Möbius transformation f(z) = 1/z. Clearly the origin is the point that is mapped to ∞. The side of a triangle is mapped to a straight line if and only if the side is part of a line through the origin.

First let’s look at the triangle with vertices at (1, 1), (1, 4), and (5, 1). None of the sides lies on a line through the origin, so all sides map to circular arcs.

Next let’s move the second point from (1, 4) to (4, 4). The line running between (1, 1) and (4, 4) goes through the origin, and so the segment along that line maps to a straight line.
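Here's a sketch checking the sides of both triangles, using the fact that the pole of 1/z is the origin.

    def pole_on_side_line(z1, z2, tol=1e-12):
        """True if the origin, the pole of 1/z, is on the line through z1 and z2."""
        t = (0 - z1) / (z2 - z1)
        return abs(t.imag) < tol

    tri1 = [1 + 1j, 1 + 4j, 5 + 1j]  # all sides map to circular arcs
    tri2 = [1 + 1j, 4 + 4j, 5 + 1j]  # the side from 1+1j to 4+4j maps to a line

    for tri in (tri1, tri2):
        sides = zip(tri, tri[1:] + tri[:1])
        print([pole_on_side_line(p, q) for p, q in sides])
    # prints [False, False, False], then [True, False, False]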

Golden ellipse

A golden ellipse is an ellipse whose axes are in golden proportion. That is, the ratio of the major axis length to the minor axis length is the golden ratio φ = (1 + √5)/2.

Draw a golden ellipse and its inscribed and circumscribed circles. In other words, draw the largest circle that fits inside the ellipse and the smallest circle that contains it.

Then the area of the ellipse equals the area of the annulus bounded by the two circles. That is, in the figure the area of the green region equals the area of the orange region.

The proof is straightforward. Let a be the semimajor axis and b the semiminor axis, with a = φb.

Then the area of the annulus is

π(a² − b²) = πb²(φ² − 1).

The area of the ellipse is

πab = πφb².

The result follows because the golden ratio satisfies

φ² − 1 = φ.
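Here's a quick numerical check, with b = 1 as an arbitrary choice.

    from math import pi, sqrt

    phi = (1 + sqrt(5)) / 2
    b = 1.0      # semiminor axis; any positive value works
    a = phi * b  # golden ellipse: a/b = phi

    annulus = pi * (a**2 - b**2)  # area between the two circles
    ellipse = pi * a * b          # area of the ellipse
    print(abs(annulus - ellipse) < 1e-12)  # True, since phi**2 - 1 == phi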

Areal coordinates and ellipse area

Barycentric coordinates are sometimes called area coordinates or areal coordinates in the context of triangle geometry. This is because the barycentric coordinates of a point P inside a triangle ABC correspond to areas of the three triangles PBC, PCA and PAB.

(This assumes ABC has unit area. Otherwise divide the area of each of the three triangles by the area of ABC. We will assume for the rest of this post that the triangle ABC has unit area.)

Areal coordinates take three numbers to describe a point in two-dimensional space. Why would you do that? It’s often useful to use an overdetermined coordinate system. The benefit of adding one more coordinate is that you get a coordinate system matched to the geometry of the triangle. For example, the vertices of the triangle have coordinates (1, 0, 0), (0, 1, 0), and (0, 0, 1), regardless of the shape of the triangle.

Here is an example of a theorem [1] that is convenient to state in terms of areal coordinates but that would be more complicated in Cartesian coordinates.

First we need to define the midpoint triangle, also called the medial triangle. This is the triangle whose vertices are the midpoints of each side of ABC. In terms of areal coordinates, the vertices of this triangle are (0, ½, ½), (½, 0, ½), and (½, ½, 0).

Now let P be any point inside the midpoint triangle of ABC. Then there is a unique ellipse E inscribed in ABC and centered at P.

Let (α, β, γ) be the areal coordinates of P. Then the area of E is

\pi \sqrt{(1 - 2\alpha)(1 - 2\beta)(1 - 2\gamma)}

Because P is inside the medial triangle, each of the areal coordinates is less than ½ and so the quantity under the square root is positive.

Finding the equation of the inscribed ellipse is a bit complicated, but that’s not necessary in order to find its area.
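Here's a minimal sketch of the area computation. (The function name is mine.)

    from math import pi, sqrt

    def inellipse_area(alpha, beta, gamma):
        """Area of the ellipse inscribed in a unit-area triangle and centered
        at the point with areal coordinates (alpha, beta, gamma)."""
        assert abs(alpha + beta + gamma - 1) < 1e-12
        assert max(alpha, beta, gamma) < 0.5  # P inside the medial triangle
        return pi * sqrt((1 - 2*alpha) * (1 - 2*beta) * (1 - 2*gamma))

    # The centroid gives the Steiner inellipse, with area pi/(3*sqrt(3)).
    print(inellipse_area(1/3, 1/3, 1/3))  # about 0.6046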

[1] Ross Honsberger. Mathematical Plums. 1979.

Average number of divisors

Let d(n) be the number of divisors of an integer n. For example, d(12) = 6 because 12 is divisible by 1, 2, 3, 4, 6, and 12.

The function d varies erratically as the following plot shows.

But if you take the running average of d

f(n) = (d(1) + d(2) + d(3) + … + d(n)) / n

then the result is remarkably smooth.

Not only that, the function f(n) is asymptotically equal to log(n).

Computing

Incidentally, directly computing f(n) for n = 1, 2, 3, …, N would be inefficient because most of the work in computing f(m) would be duplicated in computing f(m + 1). The inefficiency isn’t noticeable for small N but matters more as N increases. It would be more efficient to iteratively compute f(n) by

f(n + 1) = (n f(n) + d(n + 1)) / (n + 1).
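Here's a sketch of the incremental computation, counting divisors by naive trial division. (Function names are mine.)

    def divisor_count(n):
        """Count the divisors of n by trial division up to sqrt(n)."""
        count, i = 0, 1
        while i * i <= n:
            if n % i == 0:
                count += 1 if i * i == n else 2  # i and n // i
            i += 1
        return count

    def running_averages(N):
        """f(1), ..., f(N) via f(n + 1) = (n f(n) + d(n + 1)) / (n + 1)."""
        f, fs = 1.0, [1.0]  # f(1) = d(1) = 1
        for n in range(1, N):
            f = (n * f + divisor_count(n + 1)) / (n + 1)
            fs.append(f)
        return fs

    print(running_averages(12)[-1])  # f(12) = 35/12 ≈ 2.9167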

Asymptotics and citations

Several people have asked for a reference for the results above. I didn’t know of one when I wrote the post, but thanks to reader input I now know that the result is due to Dirichlet. He proved that

f(n) = log(n) + 2γ − 1 + o(1).

Here γ is the Euler-Mascheroni constant.
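As a quick numerical check (a sketch assuming sympy for the divisor counts and for γ):

    from math import log
    from sympy import divisor_count, EulerGamma

    N = 10_000
    f = sum(divisor_count(n) for n in range(1, N + 1)) / N
    print(f)                                 # empirical average
    print(log(N) + 2*float(EulerGamma) - 1)  # Dirichlet: about 9.3648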

You can find Dirichlet’s 1849 paper (in German) here. You can also find the result in Tom Apostol’s book Introduction to Analytic Number Theory.

Lucas numbers and Lucas pseudoprimes

Lucas numbers [1] are sometimes called the companions to the Fibonacci numbers. This sequence of numbers satisfies the same recurrence relation as the Fibonacci numbers,

Ln+2 = Ln + Ln+1

but with different initial conditions: L0 = 2 and L1 = 1.

Lucas numbers are analogous to Fibonacci numbers in many ways, but are also in some ways complementary. Many sources refer to Lucas numbers as the complement to the Fibonacci numbers, but I have not seen a source that explains why they are not simply a complement to the Fibonacci numbers. That is, I have not seen anyone explain why Édouard Lucas chose the particular initial conditions that he did.

Any initial conditions linearly independent from the Fibonacci conditions would produce a sequence complementary to the Fibonacci numbers. Just as a second order linear differential equation has two linearly independent solutions, so a second order linear difference equation has two linearly independent solutions.

Fibonacci and Lucas numbers are analogous to sines and cosines because the former form a basis for solutions to a difference equation and the latter form a basis for solutions to a differential equation. The analogy goes deeper than that [2], but that’s a topic for another post.

As with Fibonacci numbers, the ratio of consecutive Lucas numbers converges to the golden ratio. An interesting property of the Lucas numbers, one with no Fibonacci analogy, is that if n is prime, then

Ln ≡ 1 mod n.

For example, the 7th Lucas number is 29, which is congruent to 1 mod 7. Maybe this property had something to do with how Lucas chose his starting values, or maybe he chose the starting values and discovered this later. If you know the history here, please let me know.

The converse of the theorem above is false. That is, the condition

Ln ≡ 1 mod n

does not imply that n is prime. Composite numbers satisfying this condition are called Lucas pseudoprimes. The smallest Lucas pseudoprime is 705. The 705th Lucas number

2169133972532938006110694904080729167368737086736963884235248637362562310877666927155150078519441454973795318130267004238028943442676926535761270636

leaves a remainder of 1 when divided by 705.
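Here's a minimal sketch verifying both examples by computing Lucas numbers modulo n.

    def lucas_mod(n, m):
        """The nth Lucas number modulo m, iterating L(k+2) = L(k) + L(k+1)."""
        a, b = 2 % m, 1 % m  # L0, L1
        for _ in range(n):
            a, b = b, (a + b) % m
        return a

    print(lucas_mod(7, 7))      # 1, since 7 is prime (L7 = 29)
    print(lucas_mod(705, 705))  # 1, though 705 = 3 * 5 * 47 is composite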

If φ is the golden ratio, then Ln is the nearest integer to φⁿ for n ≥ 2. Perhaps this motivated Lucas’ definition.

[1] The “s” in Lucas is silent because Monsieur Lucas was French.

[2] Barry Lewis. Trigonometry and Fibonacci Numbers. The Mathematical Gazette, Vol. 91 (July 2007), pp. 216–226.

Identifying hash algorithms

Given a hash value, can you determine what algorithm produced it? Or what algorithm probably produced it?

Obviously if a hash value is 128 bits long, then a 128-bit algorithm produced it. Such a hash value might have been produced by MD5, but not by SHA-1, because the former produces 128-bit hashes and the latter a 160-bit hash.

The more interesting question is this: given an n-bit hash, can you tell which n-bit hash algorithm it probably came from? You will find contradictory answers online. Some people confidently say no, that a hash value is for practical purposes a random selection from its range. Others say yes, and point to software that reports which algorithm the hash probably came from. Which is it?

Some hashing software has a custom output format, such as sticking a colon at a fixed position inside the hash value. Software such as hashid uses regular expressions to spot these custom formats.

But assuming you have a hash value that is simply a number, you cannot know which hash algorithm produced it, other than narrowing it to a list of known algorithms that produce a number of that length. If you could know, this would indicate a flaw in the hashing algorithm.

So, for example, a 160-bit hash value could come from SHA-1, or it could come from RIPEMD-160, Haval-160, Tiger-160, or any other hash function that produces 160-bit output.
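To illustrate, here's a sketch using Python's hashlib. It assumes your Python/OpenSSL build exposes ripemd160 to hashlib, which not every build does.

    import hashlib

    msg = b"hello world"
    sha1 = hashlib.sha1(msg).hexdigest()
    ripemd = hashlib.new("ripemd160", msg).hexdigest()  # needs OpenSSL support

    # Both digests are 40 hex characters, i.e. 160 bits. Nothing about the
    # values themselves says which algorithm produced which.
    print(len(sha1) * 4, sha1)
    print(len(ripemd) * 4, ripemd)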

To say which algorithm probably produced the hash, you need context, some sort of modeling assumption. In general SHA-1 is the most popular 160-bit hash algorithm, so if you have nothing else to go on, your best guess for the origin of a 160-bit hash value would be SHA-1. But RIPEMD is part of the Bitcoin standard, and so if you find a 160-bit hash value in the context of Bitcoin, it’s more likely to be RIPEMD. There must be contexts in which Haval-160 and Tiger-160 are more likely, though I don’t know what those contexts would be.

Barring telltale formatting, software that tells you which hash functions most likely produced a hash value is simply reporting the most popular hash functions for the given length.

For example, I produced a 160-bit hash of “hello world” using RIPEMD-160

    echo -n "hello world" | openssl dgst -rmd160

then asked hashid where it came from.

    hashid '98c615784ccb5fe5936fbc0cbe9dfdb408d92f0f'
    Analyzing '98c615784ccb5fe5936fbc0cbe9dfdb408d92f0f'
    [+] SHA-1
    [+] Double SHA-1
    [+] RIPEMD-160
    [+] Haval-160
    [+] Tiger-160
    [+] HAS-160
    [+] LinkedIn
    [+] Skein-256(160)
    [+] Skein-512(160)

I got exactly the same output when I hashed “Gran Torino” and “Henry V” because the output doesn’t depend on the hashes per se, only their length.

Whether software can tell you where a hash probably came from depends on your notion of “probably.” If you find a 160-bit hash in the wild, it’s more likely to have come from SHA-1 than RIPEMD-160. But if you wrote a program that generated random text and hashed it with either SHA-1 or RIPEMD-160, chosen at random, software trying to identify the algorithm from the hash values alone would likely fail badly.

Testing random number generators

Random number generators are subtle. Unless the generator is some physical device, a random number generator (RNG) is technically a pseudorandom number generator (PRNG), a deterministic algorithm designed to mimic randomness.

Suppose you have a PRNG that produces the digits 0 through 9. How might you test the output to see whether it (acts like it) is random? An obvious test would be to see how often each digit is produced. As the number of samples n increases, you’d expect the frequency of each digit to approach n/10.

Starting with χ²

If your “random” number generator simply cycles through the digits 0 to 9 in order, the frequencies will match expectations. In fact, they will match too well. A two-sided χ² test will catch this problem: the χ² statistic will be too small, indicating a suspiciously even distribution.

Nick Lord [1] gives a construction that has a much more subtle pattern. In his example, the frequencies also converge to the expected values, but the χ² statistic diverges to ∞ as n increases. So rather than producing too small a χ² value, his example produces too large a value. This shows that the χ² test is a stronger test than simply looking at frequencies, but it’s only a start.
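As a concrete sketch, here's how you might run a χ² test on digit frequencies, assuming scipy.

    from collections import Counter
    from scipy.stats import chisquare

    def digit_chi2(digits):
        """Chi-square test of the hypothesis that decimal digits are uniform."""
        counts = [Counter(digits)[d] for d in range(10)]
        return chisquare(counts)  # uniform expected frequencies by default

    # Cycling through 0-9 gives perfectly even counts: chi2 = 0, p-value = 1.
    # A two-sided test flags this as too good to be true.
    print(digit_chi2([n % 10 for n in range(10_000)]))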

RNG testing service

There are more sophisticated tests, and standard suites of tests: DIEHARDER, NIST STS, TestU01, etc. We’ve run these test suites for several clients. More on that here.

It’s curious how this work comes in spurts. We had a run of clients wanting RNG testing, then nothing for a long time, and now we have a new RNG testing project in the queue.

[1] A Chi-Square Nightmare. The Mathematical Gazette, Vol. 76, No. 476 (July 1992), p. 274.

Limitations on Venn diagrams

Why do Venn diagrams almost always show the intersections of three sets and not more? Can Venn diagrams be generalized to show all intersections of more sets?

That depends on the rules you give yourself for generalization. If you require that your diagram consist of circles, then three is the limit. As John Venn put it in the original paper on Venn diagrams [1]:

Beyond three terms circles fail us, since we cannot draw a fourth circle which shall intersect three others in the way required.

But Mr. Venn noted that you could create what we now call a Venn diagram using four ellipses and included the following illustration.

Venn diagram with four ellipses by John Venn

(It’s confusing that there is an X inside the diagram. Venn meant that to be an asterisk and not the same symbol as the X outside. He says in the paper “Thus the one which is asterisked is instantly seen to be ‘X that is Y and Z, but is not W’.” Maybe someone else, like the publisher, drew the diagram for him.)

So the answer to whether, or how far, it is possible to generalize the classic Venn diagram depends on permissible generalizations of a circle. If you replace circles with arbitrary closed curves then Venn diagrams exist for all orders. If you demand the curves have some sort of symmetry, there are fewer options. It’s possible to make a Venn diagram from five ellipses, and that may be the limit for ellipses.

A Venn diagram is a visualization device, and so an important question is how far Venn diagrams can go while remaining an effective visualization technique. This is an aesthetic / pedagogical question, and not one with an objective answer, but in my mind the answer is four. Venn’s diagram made from four ellipses is practical for visualization, though it takes more effort to understand than the typical three-circle diagram.

Although my upper bound of four is admittedly subjective, it may be possible to make it objective post hoc. A Venn diagram made from n curves divides the plane into 2ⁿ regions [2]. In order to use more than four curves, you either have to gerrymander the curves or else tolerate some regions being much smaller than others. The former makes the diagram hard to understand, and the latter makes it hard to label the regions.

I suspect that if you make precise what it means for a curve to be simple [3], such as using ellipses or convex symmetric curves, and specify a maximum ratio between the largest and smallest bounded regions, then four curves will be the maximum.

***

[1] John Venn. On the Diagrammatic and Mechanical Representation of Propositions and Reasonings. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. July 1880.

[2] This includes the outside of the diagram representing the empty set. The diagram shows the intersection of 0, 1, 2, …, n sets, and the intersection of no sets is the empty set. This last statement might seem like an arbitrary definition, but it can be justified using category theory.

[3] Simple in the colloquial sense, which is more restrictive than the technical mathematical sense of a simple curve.

Edit distance

I was just talking to a colleague about edit distance because it came up in a project we’re working on.

Technically, we were discussing Levenshtein distance. It sounds more impressive to say Levenshtein distance, but it’s basically how much editing effort it would take to turn one block of text into another.
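Here's a minimal sketch of the textbook dynamic programming algorithm with the usual unit costs, not necessarily what any particular site runs.

    def levenshtein(s, t):
        """Minimum number of single-character insertions, deletions, and
        substitutions needed to turn string s into string t."""
        prev = list(range(len(t) + 1))  # distances from "" to prefixes of t
        for i, cs in enumerate(s, start=1):
            curr = [i]
            for j, ct in enumerate(t, start=1):
                cost = 0 if cs == ct else 1
                curr.append(min(prev[j] + 1,          # delete cs
                                curr[j - 1] + 1,      # insert ct
                                prev[j - 1] + cost))  # substitute cs -> ct
            prev = curr
        return prev[-1]

    print(levenshtein("kitten", "sitting"))  # 3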

Edit distance is a fairly simple idea, and very useful. For example, the Morse code practice site LCWO [1] reports the Levenshtein distance between what you transcribe and the correct response. This better matches your intuitive idea of how well you did than some more direct ways of measuring accuracy. For example, transposing two characters is less of an error than typing two unrelated characters.

The LCWO site is unusual in that it explicitly reports Levenshtein distance. It’s far more common for the algorithm to be used behind the scenes and never mentioned by name.

[1] Why “LCWO”? It stands for “learn CW online.” In the ham radio community, Morse code is usually called CW for historical reasons. CW stands for “continuous wave.” In principle you’re always transmitting at the same frequency, turning a carrier wave on and off, not modulating it. The details are a bit more interesting, as I write about here.

Birthday problem approximation

The birthday problem is a party trick with serious practical applications. It’s well known to people who have studied probability, but the general public is often amazed by it.

If you have a group of 23 people, there’s a 50-50 chance that at least two people have the same birthday. With a larger group, say 30, it’s quite likely two birthdays are the same. Not only is this a theoretical result, based on certain modeling assumptions, it actually works in practice, essentially as predicted.

Variations of the birthday problem come up routinely in applications. For example, in cryptography it’s important to know the probability of secure hash collisions. Hash functions are deterministic, but for practical purposes they act random. If you are hashing into a space of N possible hash values, you can expect to compute about √N items before two items have the same hash value.

The square root rule of thumb is very useful. For example, if you’re computing 128-bit hash values, there’s about a 50-50 chance of seeing two duplicate hash values after hashing about 2⁶⁴ items.

The square root heuristic works well for large N, but gives mediocre results for N as small as 365. When applied to the original birthday problem, it predicts even odds for seeing a pair of equal birthdays in a group of 19 people. That’s a little low, but not too far off.

As useful as the square root rule is, it is only good for finding when the probability of duplication is 1/2. What if you’d like to know when the probability of a collision is, say, 0.01?

Let N be the number of possible options and let r be the number of items chosen independently from the set of N options. Let P(N, r) be the probability that all r choices are distinct. Then

P(N, r) ≈ exp(−r²/2N).

This approximation [1] is valid when N is large and r is small relative to N. We could be more precise about the error bounds, but suffice it to say that bigger N is better.

When N = 365 and r = 23, the approximation above computes the probability that all 23 choices are distinct as 0.48, matching the canonical birthday problem and showing an improvement over the square root heuristic.
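Here's a sketch comparing the approximation with the exact product formula.

    from math import exp, log, prod, sqrt

    def p_distinct_exact(N, r):
        """Exact probability that r uniform draws from N values are all distinct."""
        return prod((N - k) / N for k in range(r))

    def p_distinct_approx(N, r):
        """The approximation exp(-r^2 / 2N)."""
        return exp(-r*r / (2*N))

    print(p_distinct_exact(365, 23))   # about 0.493
    print(p_distinct_approx(365, 23))  # about 0.48

    # Inverting: collision probability p takes about sqrt(2 N log(1/(1-p))) draws.
    print(sqrt(2 * 365 * log(1 / 0.99)))  # about 2.7, for p = 0.01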

[1] Anthony C. Robin. The Birthday Distribution: Some More Approximations. The Mathematical Gazette, Vol. 68, No. 445 (October, 1984), pp. 204–206