Integral representations of means

The average of two numbers, a and b, can be written as the average of x over the interval [a, b]. This is easily verified as follows.

\frac{1}{b-a}\int_a^b x\, dx = \frac{1}{b-a} \left( \frac{b^2}{2} - \frac{a^2}{2}\right) = \frac{a+b}{2}

The average is the arithmetic mean. We can represent other means as above if we generalize the pattern to be

\varphi^{-1}\left(\,\text{average of } \varphi(x) \text{ over } [a, b] \,\right )

For the arithmetic mean, φ(x) = x.

Logarithmic mean

If we set φ(x) = 1/x we have

\left(\frac{1}{b-a} \int_a^b x^{-1}\, dx \right)^{-1} = \left(\frac{\log b - \log a}{b - a} \right)^{-1} = \frac{b - a}{\log b - \log a}

and the last expression is known as the logarithmic mean of a and b.

Geometric mean

If we set φ(x) = 1/x² we have

\left(\frac{1}{b-a} \int_a^b x^{-2}\, dx \right)^{-1/2} = \left(\frac{1}{b-a}\left(\frac{1}{a} - \frac{1}{b} \right )\right)^{-1/2} = \sqrt{ab}

which gives the geometric mean of a and b.

Identric mean

In light of the means above, it’s reasonable to ask what happens if we set φ(x) = log x. When we do, we get a more arcane mean, known as the identric mean.

The integral representation of the identric mean seems natural, but when we compute the integral we get something that looks arbitrary.

\begin{align*} \exp\left( \frac{1}{b-a} \int_a^b \log x\, dx \right) &= \exp\left( \left.\frac{1}{b-a} (x \log x - x)\right|_a^b \right) \\ &= \exp\left( \frac{b \log b - a \log a - b + a}{b-a} \right) \\ &= \frac{1}{e} \left( \frac{b^b}{a^a} \right)^{1/(b-a)} \end{align*}

The initial expression looks like something that might come up in application. The final expression looks artificial.

Because the latter is more compact, you’re likely to see the identric mean defined by this expression, then later you might see the integral representation. This is unfortunate since the integral representation makes more sense.

Order of means

It is well known that the geometric mean is no greater than the arithmetic mean. The logarithmic and identric means squeeze in between the geometric and arithmetic means.

If we denote the geometric, logarithmic, identric, and arithmetic means of a and b by G, L, I, and A respectively,

G \leq L \leq I \leq A
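
These means are easy to compare numerically. Here’s a sketch (the helper phi_mean and the sample values are mine) that computes each mean from its integral representation and checks the ordering:

import numpy as np
from scipy.integrate import quad

def phi_mean(phi, phi_inv, a, b):
    # average phi(x) over [a, b], then map back through the inverse of phi
    avg, _ = quad(phi, a, b)
    return phi_inv(avg / (b - a))

a, b = 2.0, 5.0
G = phi_mean(lambda x: x**-2, lambda y: y**-0.5, a, b)  # geometric
L = phi_mean(lambda x: 1/x,   lambda y: 1/y,     a, b)  # logarithmic
I = phi_mean(np.log,          np.exp,            a, b)  # identric
A = phi_mean(lambda x: x,     lambda y: y,       a, b)  # arithmetic

assert G <= L <= I <= A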


Sierpiński’s inequality

Let A_n, G_n, and H_n be the arithmetic mean, geometric mean, and harmonic mean of a set of n numbers.

When n = 2, the arithmetic mean times the harmonic mean is the geometric mean squared. The proof is simple:

A_2(x, y) H_2(x, y) = \left(\frac{x + y}{2}\right)\left(\frac{2}{\frac{1}{x} + \frac{1}{y}} \right ) = xy = G_2(x,y)^2

When n > 2 we no longer have equality. However, W. Sierpiński, perhaps best known for the Sierpiński triangle, proved that an inequality holds for all n [1]. Given

x = (x_1, x_2, \ldots, x_n)

we have the inequality

H_n(x)^{n-1}\, A_n(x) \leq G_n(x)^n \leq A_n(x)^{n-1}\, H_n(x)
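
Here’s a quick numerical check of the inequality (a sketch; the helper functions and sample values are mine):

import numpy as np

def A(x): return np.mean(x)                            # arithmetic mean
def G(x): return np.exp(np.mean(np.log(x)))            # geometric mean
def H(x): return len(x) / np.sum(1 / np.asarray(x))    # harmonic mean

x = [1.0, 2.0, 7.0, 9.0]
n = len(x)
assert H(x)**(n - 1) * A(x) <= G(x)**n <= A(x)**(n - 1) * H(x)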


[1] W. Sierpiński. Sur une inégalité pour la moyenne arithmétique, géométrique, et harmonique. Warsch. Sitzungsber. 2 (1909), pp. 354–357.

The Five Safes data privacy framework


The Five Safes decision framework was created a couple decades ago by Felix Ritchie at the UK Office for National Statistics. It is a framework for evaluating the safe use of confidential data, particularly by government agencies. You can find a description of the Five Safes, for example, in NIST SP 800-188.

The Five Safes are

  1. Safe projects
  2. Safe people
  3. Safe settings
  4. Safe data
  5. Safe outputs

Safe projects asks whether the use of the data is appropriate. It doesn’t matter how safe the access controls and so forth are if the project itself is inappropriate.

Safe people asks whether the users can be trusted to use the data in an appropriate manner. For health care data, for example, one could ask whether users have had HIPAA training.

Safe settings applies to physical access. Does the facility hosting the data limit unauthorized access?

Safe data asks about statistical disclosure control, whether the data itself poses a disclosure risk. For example, have the data been adequately deidentified?

Safe outputs asks whether the output of the project poses a privacy risk.

Various approaches to data privacy have different trade-offs between the Five Safes. Differential privacy focuses on safe outputs. There are mathematical guarantees that the outputs satisfy a certain definition of privacy. The data itself is regarded as unsafe, and so it is important that the people and settings are safe.

HIPAA expert determination focuses on safe data. Often there is a sort of firewall with data considered safe on one side for one set of reasons (patient consent, a BAA contract, etc.) and considered safe on the other side of the wall because the data itself is safe, i.e. properly deidentified.

Safe Harbor is unrelated to the Five Safes. Safe Harbor is a provision under the HIPAA Privacy Rule for deeming certain data safe. Whether the Safe Harbor rules actually result in safe data depends on context. Data may comply with the letter of the law by appealing to Safe Harbor, and yet not protect individuals in the data from being identified.

If you would like help evaluating the privacy aspects of a data analysis project, let’s talk.

A curious pattern in January exponential sums

The exponential sum page on this site draws a new image every day based on plugging the month, day, and year into a formula. Some of these images are visually appealing; I’ve had many people ask if they could use the images in publications or on coffee mugs etc.

The images generally look very different from one day to the next. One reason I include the date numbers in the order I do, using the American convention, is that this increases the variety from one day to the next.

If you first became aware of the page on New Year’s Day this year, you might think the page is broken because there was no apparent change between January 1 and January 2. Yesterday’s image was different, but then today, January 4, the image looks just like the images for January 1st and 2nd. They all look like the image below.

The plots produced on each day are distinct, but they are geometrically congruent.

The exponential sum page displays a plot connecting the partial sums of a certain series given here. The axes are turned off and so only the shape of the plot is displayed. If one plot is a translation or dilation of another, the images shown on the page will be the same.
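
The page computes the exact series for each date; as a sketch of the idea, the code below draws this kind of plot. The specific exponent, f(n) = n/m + n²/d + n³/y for month m, day d, and two-digit year y, is my reconstruction for illustration, not a quote of the page’s formula.

import numpy as np
import matplotlib.pyplot as plt

m, d, y = 1, 4, 24              # January 4, (20)24: assumed date encoding
N = np.lcm.reduce([m, d, y])    # each term of f has period dividing lcm(m, d, y)
n = np.arange(N + 1)
z = np.cumsum(np.exp(2j * np.pi * (n/m + n**2/d + n**3/y)))

plt.plot(z.real, z.imag)
plt.axis("off")                 # only the shape is shown, as on the page
plt.gca().set_aspect("equal")
plt.show()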

Here’s a plot of the images for January 1, 2, and 4, plotted red, green, and blue respectively.

This shows that the images are not the same, but are apparently translations of each other.

There’s another difference between the images. Connecting consecutive partial sums draws an image clockwise on January 1, but counterclockwise on January 2 and 4. You can see this by clicking on the “animate” link on each page.


The Million Dollar Matrix Multiply

The following post is by Wayne Joubert, the newest member of our consulting team. Wayne recently retired from his position as a Senior Computational Scientist at Oak Ridge National Laboratory. — John

Training large language models like GPT-4 costs many millions of dollars in server expenses. These costs are expected to trend to billions of dollars over the next few years [1]. One of the biggest computational expenses of LLM training is multiplying matrices. These are simple operations of the form C = AB. Matrix multiplies are common not only in AI model training but also many high performance computing applications from diverse science domains.

Eking out more speed from matrix multiplies could reduce AI model training costs by millions of dollars. More routinely, such improvements could reduce training runtime by hours on a single GPU-powered workstation or cut down cloud service provider expenses significantly.

What is less well known is that matrix multiplies run on the graphics processing units (GPUs) typically used for model training exhibit many exotic performance behaviors that can reduce matrix multiply efficiency by a wide margin.

Two recent works [2], [3] examine these phenomena in considerable depth. Factors such as matrix size, alignment of data in memory, power throttling, math library versions, chip-level manufacturing variability, and even the values of the matrix entries can significantly affect performance. At the same time, much of this variability can be modeled by machine learning methods such as decision trees and random forests [2].

Use of these methods can be the first step toward implementing autotuning techniques to minimize costs. Using such methods or carefully applying rules of thumb for performance optimization can make a huge performance difference for matrix multiply-heavy GPU software.
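
As a small illustration of this variability, here’s a CPU sketch using NumPy; the sizes are arbitrary, and a GPU measurement would go through a library such as CuPy instead. It times C = AB for nearby matrix sizes and converts the time to an effective flop rate.

import time
import numpy as np

# Time C = A B for nearby sizes; the effective flop rate can differ
# noticeably even between adjacent matrix dimensions.
for n in (4095, 4096, 4097):
    A = np.random.rand(n, n).astype(np.float32)
    B = np.random.rand(n, n).astype(np.float32)
    t0 = time.perf_counter()
    C = A @ B
    dt = time.perf_counter() - t0
    print(f"n = {n}: {2 * n**3 / dt / 1e9:.1f} GFLOP/s")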

Related posts

[1] What large models cost you—there is no free AI lunch

[2] Wayne Joubert, Eric Palmer and Verónica G. Melesse Vergara, “Matrix Multiply Performance of GPUs on Exascale-class HPE/Cray Systems,” Proceedings of the Cray User Group Meeting (CUG) 2022, https://www.osti.gov/biblio/2224210.

[3] P. Sinha, A. Guliani, R. Jain, B. Tran, M. D. Sinclair and S. Venkataraman, “Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems,” SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA, 2022, pp. 1–15, doi: 10.1109/SC41404.2022.00070.

Is there an elliptic curve with 2024 points?

On New Year’s Day I posted about groups of order 2024. Are there elliptic curves of order 2024?

The Hasse-Weil theorem relates the number of points on an elliptic curve over a finite field to the number of elements of the field. Namely, an elliptic curve E over a field with q elements must have cardinality

q + 1 − t

where

|t| ≤ 2√q.

So if there is an elliptic curve with 2024 points, the curve must be over a field with roughly 2024 elements.

The condition on t above is necessary for the existence of an elliptic curve of a certain size, but is it sufficient? Sorta.

The order of a finite field must be a prime power, i.e. q = p^d for some prime p. There is a theorem ([1], Theorem 13.30) that there exists a curve of the size indicated in the Hasse-Weil theorem if t ≠ 0 mod p. The theorem also lists a couple more sufficient conditions that are more complicated.

So, for example, we could take q = p = 2027 and t = 4.

Now that we know the search isn’t futile, we can search for an elliptic curve over the integers mod 2027 that has 2024 points. After a brief brute force search I found

y² = x³ + 4x + 28

over the field with 2027 elements is such a curve.
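
Such a search is easy to reproduce. Here’s a sketch (the function name is mine) that counts the points on y² = x³ + ax + b over GF(p), using Euler’s criterion to test whether the right side is a nonzero square:

def curve_order(a, b, p):
    # points on y^2 = x^3 + ax + b over GF(p), plus the point at infinity
    count = 1
    for x in range(p):
        rhs = (x*x*x + a*x + b) % p
        if rhs == 0:
            count += 1                          # one solution, y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:    # Euler's criterion
            count += 2                          # two solutions, ±y
    return count

print(curve_order(4, 28, 2027))   # prints 2024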


[1] Henri Cohen and Gerhard Frey. Handbook of Elliptic and Hyperelliptic Curve Cryptography. Chapman & Hall/CRC. 2006.

Computing inverse factorial

I needed the inverse factorial function for my previous post. I was sure I’d written a post on computing the inverse factorial, and intended to reuse the code from that earlier post. But when I searched I couldn’t find anything, so I’m posting the code here for my future reference and for anyone else who may need it.

Given a positive number x, how can you find a number n such that n! = x, or such that n! ≈ x?

Mathematical software tends to work with the gamma function rather than factorials because Γ(x) extends x! to real numbers, with the relation Γ(x+1) = x!. The left side is often taken as a definition of the right side when x is not a positive integer.

Not only do we prefer to work with the gamma function, it’s easier to work with the logarithm of the gamma function to avoid overflow.

The Python code below finds the inverse of the log of the gamma function using the bisection method. This method requires a lower and an upper bound on the root. Since log Γ(1) = 0, we can take 1 as a lower bound whenever the argument is positive. We can find an upper bound by doubling until the log gamma function exceeds the argument.

from scipy.special import gammaln
from scipy.optimize import bisect

def inverse_log_gamma(logarg):
    # solve gammaln(x) = logarg for x by bisection
    assert logarg > 0
    a = 1                           # gammaln(1) = 0, below any positive logarg
    b = 2
    while gammaln(b) < logarg:      # double until gammaln(b) exceeds logarg
        b = b*2
    return bisect(lambda x: gammaln(x) - logarg, a, b)

def inverse_factorial(logarg):
    # n! = Γ(n + 1), so shift by 1 and round to the nearest integer
    g = inverse_log_gamma(logarg)
    return round(g) - 1

Note that the inverse_factorial function takes the logarithm of the value whose inverse factorial you want to compute.

For example,

from math import log, factorial
inverse_factorial(log(factorial(42)))

returns 42.

Working on the log scale lets us work with much larger numbers. The factorial of 171 is larger than the largest IEEE double precision number, and so inverse_factorial could not return any value larger than 171 if we passed in x rather than log x. But by taking log x as the argument, we can calculate inverse factorials of numbers so large that the factorial overflows.

For example, suppose we want to find the value of m such that m! is the closest factorial to √(2024!), as we did in the previous post. We can use the code

inverse_factorial(gammaln(2025)/2)

to find m = 1112 even though 1112! is on the order of 10^2906, far larger than the largest representable floating point number, which is on the order of 10^308.

When n! = x does not have an exact solution, there is some choice over what value of n to return as the inverse factorial of x. The code above minimizes the distance from n! to x on a log scale. You might want to modify the code to minimize the distance on a linear scale, or to find the largest n with n! < x, or the smallest n with n! > x depending on your application.
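
For example, here’s a sketch of the last variant, finding the largest n with n! ≤ x, again taking log x as the argument (the function name is mine):

from math import lgamma   # lgamma(n + 1) = log(n!)

def inverse_factorial_floor(logarg):
    n = inverse_factorial(logarg)
    while lgamma(n + 2) <= logarg:   # (n + 1)! is still <= x, so move up
        n += 1
    while lgamma(n + 1) > logarg:    # n! is > x, so move down
        n -= 1
    return n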


Square root factorial

What factorial is closest to the square root of 2024 factorial?

A good guess would be 1012, based on the idea that √(n!) might be near (n/2)!.

This isn’t correct—the actual answer is 1112—but it’s not wildly off.

Could it be that (2n)! is asymptotically (n!)²?

No, Gauss’ duplication formula

\Gamma(2z) = \frac{1}{\sqrt{2\pi}} 2^{2z - 1/2} \Gamma(z) \Gamma\left(z + \frac{1}{2}\right)

shows that the ratio of (2n)! to (n!)² grows exponentially as a function of n.

However, the ratio only grows exponentially, and factorials grow faster than exponentially. I believe that the value of m minimizing

| √(n!) − m! |

asymptotically approaches n/2. In other words, the inverse factorial of √(n!) approaches n/2. And more generally the inverse factorial of (n!)^(1/k) asymptotically approaches n/k. I haven’t written out a proof, but the plot below shows numerical evidence.
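
The numerical evidence is easy to generate using the inverse_factorial function from the next post: compute the inverse factorial of (n!)^(1/k) on the log scale and watch the ratio m/n approach 1/k.

from scipy.special import gammaln

# m is the inverse factorial of (n!)^(1/k); gammaln(n + 1) = log(n!).
# Uses inverse_factorial from the next post.
k = 2
for n in (100, 1000, 10000, 100000):
    m = inverse_factorial(gammaln(n + 1) / k)
    print(n, m / n)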

So √(n!) is not asymptotically (n/2)!, but the inverse factorial of √(n!) is asymptotically n/2.

See the next post for a way to compute inverse factorials.

Computing square root floor

Given an integer n, suppose you want to compute ⌊√n⌋, the greatest integer less than or equal to the square root of n. One reason you may want to compute ⌊√n⌋ is to know when you can stop trial division when factoring n. Similarly, ⌊√n⌋ gives you a starting point for Fermat’s factorization algorithm.

With floating point arithmetic

The obvious approach to computing ⌊√n⌋ would be to compute the square root of n as a real number and then round the result down.

Suppose we go down that route. How would we compute √n as real number? The usual way is to use Newton’s square root method. First you make an initial guess at the square root of n, say a. Then repeatedly update your guess to be the average of your previous guess a and n/a:

a ← (a + n/a)/2.

This is an ancient algorithm, going back to at least 1000 BC in Babylon. We call it Newton’s method because it is a special case of Newton’s root-finding method, applied to f(x) = x² − n.

Newton’s square root algorithm is implemented in computer hardware to this day. In some sense, we still compute square roots the same way the ancient Babylonians did.

Without floating point arithmetic

Our original problem was to find an integer, the largest integer no greater than the square root of n. Can we take advantage of the fact that we’re working with integers?

A general approach to solving problems over the integers is to solve the same problem over the real numbers, then convert the result to an integer. That’s what we did above, but what if we incorporate the “integerification” into Newton’s algorithm itself so that we never work with real numbers?

This is a reasonable idea, but we don’t know a priori that it will work. Sometimes integer problems can be solved by integer analogs of real algorithms, but sometimes not. Integer optimization problems are harder than real number optimization problems because the optimal integer solution cannot in general be found by first finding the optimal real solution and rounding. And even when this approach works, it may not be the most efficient approach.

It turns out that the integer analog of Newton’s method does work. Bressoud [1] gives pseudocode for an algorithm which I write in Python below.

def sqrt_floor(n):
    a = n
    b = (n + 1) // 2
    while b < a:
        a = b
        b = (a*a + n) // (2*a)   # Newton update: floor of (a + n/a)/2
    return a

Note that in Python the // operator carries out integer division, i.e. a // b computes ⌊a/b⌋.

The somewhat opaque line updating b inside the loop is just Newton’s method, averaging a and n/a, rounded down.

This algorithm could run on hardware with no floating point capability, only integer arithmetic. It will also work on extended precision integers where floating point would fail. For example, the following Python code

from math import factorial, sqrt
sqrt(factorial(1000))

produces an error “OverflowError: int too large to convert to float.” But

sqrt_floor(factorial(1000))

works just fine.
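
By the way, Python 3.8 and later include math.isqrt, an exact integer square root for arbitrary-size integers, which we can use to check the function above:

from math import factorial, isqrt
assert sqrt_floor(factorial(1000)) == isqrt(factorial(1000))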

Update: FIPS 186-5

After writing this post, I was looking through the NIST FIPS 186-5 Digital Signature Standard (DSS). In Appendix B.4 the standard gives an algorithm for determining whether a large integer is a square. The algorithm in the appendix is essentially the algorithm above. The line that carries out the Newton method update is not explained, but this post explains where this otherwise mysterious line comes from.

[1] David M. Bressoud. Factorization and Primality Testing. Springer, 1989.

Groups of order 2024

This time last year I wrote about groups of order 2023 and now I’d like to do the same for 2024.

There are three Abelian groups of order 2024, and they’re not hard to find.

We can factor

2024 = 8 × 11 × 23

and so the Abelian groups of order 2024 are of the form

G ⊕ ℤ₁₁ ⊕ ℤ₂₃

where G is a group of order 8, and there are three possibilities for G:

  • ℤ₈,
  • ℤ₄ ⊕ ℤ₂, and
  • ℤ₂ ⊕ ℤ₂ ⊕ ℤ₂.
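
This count is a special case of a general fact: the number of Abelian groups of order n is the product of p(e) over the exponents e in the prime factorization of n, where p is the partition function. Here’s a quick check with SymPy (the function name is mine):

from math import prod
from sympy import factorint, npartitions

def num_abelian_groups(n):
    # product of partition numbers p(e) over prime-power factors q^e of n
    return prod(npartitions(e) for e in factorint(n).values())

print(num_abelian_groups(2024))   # prints 3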

How many non-Abelian groups of order 2024 are there? Conway’s estimate would be a total of 52 groups, Abelian and non-Abelian, but that turns out to be a bit high.

There is a formula for the number of groups of order n, but it only applies to square-free numbers. The number 2024 is divisible by 4, so it’s not square-free.

There are 46 groups of order 2024, so 43 of these are non-Abelian. When n is divisible by a square, finding the number of groups of order n is hard, but the results have been tabulated for small n. I’ve seen a table going up to 2048 and no doubt there are tables that go further.