Falling power analog of binomial theorem

Yesterday I wrote about how the right notation could make Newton’s interpolation theorem much easier to remember, revealing it as an analog of Taylor series. This post will do something similar for the binomial theorem.

Let’s start with the following identity.

(x + y)(x + y - 1)(x + y - 2) =

x(x - 1)(x - 2) + 3x(x - 1)y + 3xy(y - 1) + y(y - 1)(y - 2)

It’s not clear that this is true, or how one might generalize it. But if we rewrite the equation using falling power notation we have

(x + y)^{\underline{3}} = x^{\underline{3}} + 3x^{\underline{2}} y^{\underline{1}} + 3x^{\underline{1}}y^{\underline{2}} + y^{\underline{3}}

which looks a lot like the binomial theorem. In fact it is the case n = 3 of the Chu-Vandermonde theorem, which says

(x + y)^{\underline{n}} = \sum_{k=0}^n \binom{n}{k} x^{\underline{n-k}} y^{\underline{k}}

Viewed purely visually, this is the binomial theorem with little lines under each exponent.

Incidentally, the analogous theorem holds for rising powers. Just change all the lines under the exponents to lines on top of the exponents.
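
If you want to check the identity numerically, here’s a quick sketch in Python. The helper function falling is my own, written for illustration.

from math import comb

def falling(x, n):
    # falling power: x(x-1)(x-2)...(x-n+1)
    out = 1
    for j in range(n):
        out *= x - j
    return out

x, y, n = 7, 5, 3
lhs = falling(x + y, n)
rhs = sum(comb(n, k) * falling(x, n - k) * falling(y, k) for k in range(n + 1))
print(lhs == rhs)  # True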

Why eliminate trusted third parties?

Suppose one company would like to buy another company’s client list, but only if the lists don’t overlap too much. Neither company wants to hand over their list to the other before a sale takes place. What can they do?

A low-tech solution would be for both parties to provide their client lists to a trusted third party who will report back how much the lists overlap. That may be the best thing to do.

But it is possible to solve this problem without a trusted third party. With homomorphic encryption, the companies can exchange encrypted versions of their client lists that will allow both to calculate the amount of overlap without revealing any further information.
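
To make the idea concrete, here is a toy sketch of one way to compute the overlap without either party revealing its list. It uses commutative blinding (a Diffie-Hellman-style private set intersection) rather than the homomorphic encryption described above, and the modulus is far too small for real security; everything here is for illustration only.

import hashlib
import secrets

# Toy private set intersection cardinality via commutative blinding.
# Real protocols use elliptic curves and vetted libraries.
P = 2**127 - 1  # a Mersenne prime; fine for a toy, too small for security

def h(item):
    # Hash an item to a group element mod P.
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, secret):
    # Raise each hashed item to a secret exponent mod P.
    return {pow(h(x), secret, P) for x in items}

def reblind(values, secret):
    # Apply a second secret exponent to already-blinded values.
    return {pow(v, secret, P) for v in values}

a = secrets.randbelow(P - 2) + 1  # company A's secret
b = secrets.randbelow(P - 2) + 1  # company B's secret

list_a = {"alice@example.com", "bob@example.com", "carol@example.com"}
list_b = {"bob@example.com", "carol@example.com", "dave@example.com"}

# Each company blinds its own list and sends it to the other, who blinds
# it again. Exponentiation commutes, so shared clients produce identical
# double-blinded values, and only the count of matches is learned.
double_a = reblind(blind(list_a, a), b)
double_b = reblind(blind(list_b, b), a)

print("overlap:", len(double_a & double_b))  # 2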

But why go to the effort? Many peer-to-peer technologies raise this question. So you’ve eliminated a third party; what’s so great about that? If you can send someone cryptocurrency, for example, without going through an intermediary like a bank or credit card company, what good is that if the transaction fees are no lower?

It’s often not worth using sophisticated technology to eliminate a trusted third party, but sometimes it is. Here are some reasons the technology might be worth using.

Transaction speed

The two companies hiring a third party to compare their lists have to wait on the third party, and the amount of time they need to wait is beyond their control. Maybe that’s acceptable for a one-time transaction, but not for repeated transactions. With homomorphic encryption, transactions could be automated and the only delay would be computation time.

Reproducibility

Sharing limited information via encryption reduces legal risk. If either party later accuses the other of providing incorrect information, the accused party can demonstrate that the software applied to the data gives the reported result.

Trust

To paraphrase Bob Dylan, you gotta trust somebody. Some technologies are labeled “zero-trust” or “trust no one,” but these terms need to be understood in context. When a company asks you to run a particular piece of software on your proprietary data and share the results, you have to trust that the software is not malicious or buggy.

Instead of trusting that a third party holding your data is honest and competent, you trust that a third party developing software is honest and competent. You have to decide that the software product is trustworthy. You might test the software on some sample data. Maybe you inspect the source code if it’s available. But at some point you have to trust the software and the context it runs in.


Discrete Taylor series

Newton’s interpolation formula looks awfully complicated until you introduce the right notation. With the right notation, it looks like a Taylor series. Not only is this notation simpler and more memorable, it also suggests extensions.

The notation we need comes in two parts. First, we need the forward difference operator Δ defined by

\Delta f(x) = f(x+1) - f(x)

and its extension Δ^k defined by applying Δ to a function k times.

The other piece of notation we need is falling powers, denoted with a little bar under the exponent:

 x^{\underline{k}} = x(x-1)(x-2) \cdots (x - k + 1)

The Δ operator is meant to resemble the differential operator D and falling powers are meant to resemble ordinary powers. With this notation, Newton’s interpolation formula based at a

f(x) = \sum_{k=0}^\infty \frac{\Delta^k f(a)}{k!} (x - a)^{\underline{k}}

looks analogous to Taylor’s formula for a power series based at a

f(x) = \sum_{k=0}^\infty \frac{D^k f(a)}{k!} (x - a)^k.

Newton’s formula applies to polynomials, and the infinite sum is actually a finite sum because Δ^k f(a) is 0 for all k beyond some point.

Newton’s formula is a discrete analog of Taylor’s formula because it only uses the values of f at discrete points, i.e. at the integers (shifted by a) and only involves finite operations: finite differences do not involve limits.
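
Here’s a minimal sketch in Python checking the formula on a cubic. The function names are mine; the code just translates the definitions above.

from math import factorial

def falling_power(x, k):
    # x(x-1)(x-2)...(x-k+1)
    out = 1
    for j in range(k):
        out *= x - j
    return out

def forward_diffs(f, a, n):
    # Return [f(a), Δf(a), ..., Δ^n f(a)] using Δf(x) = f(x+1) - f(x).
    vals = [f(a + i) for i in range(n + 1)]
    diffs = [vals[0]]
    for _ in range(n):
        vals = [vals[i + 1] - vals[i] for i in range(len(vals) - 1)]
        diffs.append(vals[0])
    return diffs

f = lambda x: 2*x**3 - x + 5  # a cubic, so Δ^k f = 0 for k > 3
a = 0
d = forward_diffs(f, a, 3)

def newton(x):
    return sum(d[k] * falling_power(x - a, k) / factorial(k) for k in range(4))

print(all(newton(x) == f(x) for x in range(-5, 6)))  # True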

Convergence

As I mentioned on @AlgebraFact this morning,

It’s often useful to write a finite series as an infinite series with a finite number of non-zero coefficients.

This eliminates the need to explicitly track the number of terms, and may suggest what to do next.

Writing Newton’s formula as an infinite series keeps us from having to write down one version for linear interpolation, another version for quadratic interpolation, another for cubic interpolation, etc. (It’s a good exercise to write out these special cases when you’re new to the topic, but then remember the infinite series version going forward.)

As for suggesting what to do next, it’s natural to explore what happens if the infinite series really is infinite, i.e. if f is not a polynomial. Under what circumstances does the series converge? If it does converge to something, does it necessarily converge to f(x) at each x?

The example f(x) = sin(πx) shows that Newton’s theorem can’t always hold, because for this function, with a = 0, the series on the right-hand side of Newton’s formula is identically zero: sin(πx) vanishes at every integer, so all the Δ^k terms are zero. But Carlson’s theorem [1] essentially says that for an entire function that grows more slowly than sin(πx) along the imaginary axis, the series in Newton’s formula converges to f.

Saying that a function is “entire” means that it is analytic in the entire complex plane. This means that Taylor’s series above has to converge everywhere in order for Newton’s series to converge [2].


[1] Carlson with no e. Not to be confused with Carleson’s theorem on the convergence of Fourier series.

[2] Carlson’s original theorem requires f to be entire. Later refinements show that it’s sufficient for f to be analytic in the open right half plane and continuous on the closed right half plane.

Podcast feed

The previous post was an AI-generated podcast that a friend made by crawling my web site. I decided to create an actual podcast for posting occasional audio files. I expect to post very sporadically. I’ve posted two audio files, and I have one more in mind to post some day. Maybe that’ll be the end of it, or maybe I’ll post more.

The first file I posted was the one from the previous post. The second was an interview I did with the late Sir Michael Atiyah.

Here’s the RSS feed for the podcast. You can also find the podcast via my Substack newsletter.

RSA security in light of news

A recent article reported on the successful factoring of a 512-bit RSA key. The process took $8 worth of rented computing power. What does that say about the security of RSA encryption?

The following Python function estimates the computation steps required to factor a number b bits long using the best known algorithm. We’re going to take a ratio of two such estimates, so proportionality constants will cancel.

from math import exp, log

def f(b):
    # number field sieve heuristic; note (8/3)^(2/3) = (64/9)^(1/3)
    logn = b*log(2)
    return exp((8/3)**(2/3) * logn**(1/3) * (log(logn))**(2/3))

The minimum recommended RSA key size is 2048 bits. The cost ratio for factoring a 2048-bit key versus a 512-bit key is f(2048) / f(512), which is on the order of 10^16. So factoring a 2048-bit key would take on the order of 80 quadrillion dollars.
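
As a sanity check on that exponent, using the function f above:

from math import log10

print(log10(f(2048) / f(512)))  # about 16, i.e. a ratio on the order of 10^16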

This is sort of a back-of-the-envelope estimate. There are things it doesn’t take into account. If a sophisticated and highly determined entity wanted to factor a 2048-bit number, they could probably do so for less than 80 quadrillion dollars. But it’s safe to say that the people who factored the 512-bit key are unlikely to factor a 2048-bit key by the same approach.

Can AI models reason like a human?

We’re awaiting the release of OpenAI’s o3 model later this month. Its performance is impressive on very hard benchmarks like SWE-bench Verified, FrontierMath, and the ARC AGI benchmark (discussed previously in this blog).

And yet at the same time some behaviors of the frontier AI models are very concerning.

Their performance on assorted math exams is outstanding, but they make mistakes doing simple arithmetic, like wrongly multiplying numbers that are a few digits long. Performance of the o1 preview model on the difficult Putnam math exam is excellent but drops precipitously under simple changes like renaming constants and variables in the problem statement.

Similarly, when o1 is applied to a planning benchmark expressed in standardized language, it performs well, but accuracy falls apart when it is applied to a mathematically equivalent planning problem in a different domain. Also, a given AI model applied to the simple ROT13 cipher can have wildly different performance depending on the value of the cipher key, suggesting the models don’t really understand the algorithm.

It was the best of times, it was the worst of times, . . .

What is going on here?

For years now, some have made claims of “human-level performance” for various deep learning algorithms. And as soon as one party starts making claims like this, it’s hard for the others to resist doing the same.

The confusion is that, from a certain point of view, the claim of “human-level” is true—but the definition of “human-level” is fraught.

Here, “human-level” is taken to mean achievement of some high score on a benchmark set, ostensibly exceeding some human performance measure on the same benchmark. However, a single AI model can vary wildly in capability across behaviors—“smart” compared to humans in some ways, “dumb” in others.

For humans, test-taking is a proxy for measuring a range of skills and abilities. And even for humans it is not always an accurate proxy. A person can perform very well on academic tests and very poorly on the job, or vice versa.

And the capability ratios for AI models are very different still, in ways we don’t fully understand. So, outscoring humans on a software engineering benchmark doesn’t mean the AI has the whole panoply of coding skills, decision-making abilities, software architecture design savvy, etc., needed to be a competent software engineer.

It’s no surprise that recent articles (below) show a growing perception of the limitations of AI benchmarks as currently conceived.

Ways forward

Perhaps we should consider developing requirements like the following before claiming human-level reasoning performance of an AI model:

  • It should be able to “explain its work” at any level of detail to another human (just like a human can), in a way that that human can understand.
  • It should be able to give answers without “hallucinating” or “confabulating” (yes, humans can hallucinate too, but most occupations would not be well-served by an employee who hallucinates on the job).
  • It should be able to reliably and consistently (100% of the time) do things that we routinely expect a human or computer to do accurately (like add or multiply two numbers accurately, for things like filling out tax returns or doing engineering calculations to build an airplane).
  • It should be frank and honest in assessing its level of certainty about an answer it gives (no gaslighting).
  • It should be able to solve a trivial perturbation of a given problem with the same ease as the original problem (to the same extent that a human can).
  • As someone has said, it should be able to do, without specific training, what a 5-year-old can do without specific training.
  • This one sounds good, from Emmett Shear: “AGI [artificial general intelligence] is the ability to generalize [without special training by a human] to an adversarially chosen new benchmark.”

AI models are fantastic and amazing tools—and best used when one has eyes wide open about their limitations.

Have you had problems with AI model performance? If so, please share in the comments.

References

Rethinking AI benchmarks: A new paper challenges the status quo of evaluating artificial intelligence, https://venturebeat.com/ai/rethinking-ai-benchmarks-a-new-paper-challenges-the-status-quo-of-evaluating-artificial-intelligence/

Rethink reporting of evaluation results in AI, https://www.science.org/doi/10.1126/science.adf6369, https://eprints.whiterose.ac.uk/198211/

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence, https://arxiv.org/abs/2402.09880

Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless, https://themarkup.org/artificial-intelligence/2024/07/17/everyone-is-judging-ai-by-these-tests-but-experts-say-theyre-close-to-meaningless

Why we must rethink AI benchmarks, https://bdtechtalks.com/2021/12/06/ai-benchmarks-limitations/

AI and the Everything in the Whole Wide World Benchmark, https://arxiv.org/abs/2111.15366

BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices, https://arxiv.org/abs/2411.12990

Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy.

Quick change directory

One difficulty when working at a command line is navigating between directories, particularly between locations with long paths. There are several ways to mitigate this. One of the simplest is using cd - to return to the previous directory. Another is to use pushd and popd. Still another is to set the CDPATH variable.

qcd function

This post presents another approach that could be used instead of or in addition to the ones mentioned above. The book Efficient Linux at the Command Line by Daniel Barrett contains a function called qcd (quick change directory) that will cd to any of a list of commonly used directories. The function is essentially a big case statement taking a key and going to a corresponding directory.

qcd () {
  case "$1" in
    work)
      cd "$HOME/Work/Projects/Web/src/include"
      ;;
    recipes)
      cd "$HOME/Family/Cooking/Recipes"
      ;;
    …
  esac
  pwd
}

So, for example, qcd work will take you to the directory ~/…/include.

Barrett adds one more line after the definition of the qcd function:

complete -W "work recipes …" qcd

This turns on tab completion for the function, using the bash builtin complete. You use this builtin implicitly when you use tab completion with shell utilities, and you can call it as above to add the same command completion to your own functions. So, for example, with the code above, a user could type

qcd w TAB

instead of typing out qcd work.

Improvements

The book says “Store the function in a shell configuration file such as $HOME/.bashrc … source it, and it’s ready to run.” I’d like to make two comments about this.

First, it’s important that qcd is a function and not a script. Scripts run in a subshell, so running a cd command in a script changes your working directory while the script is running. But when the script finishes, the subshell exits, and the working directory is just as it was before running the script.

Second, if you use this function you’ll edit it frequently as you think of new directories to add. I’d rather not put it in my .bashrc file for that reason. Also, maybe I’d like to use it from a bash shell on a Linux box and from zsh on a Mac. So instead of putting the definition of qcd in my .bashrc file, I put it in a file qcd.sh and source that file from my .bashrc file.

When you add a new key and directory to the qcd script, you need to also add the key to the call to complete, otherwise you’ll be in the awkward situation of tab completion working for some directories but not others. It would be possible to write a fancier shell script that would fix this problem.

Generating qcd

My knowledge of shell scripting is minimal, and I’d like to keep it that way. If I need to do something complicated, I don’t do it in a shell script. So I wrote a Python script to generate the qcd.sh file from a dictionary of keys and directories. Someone fluent in shell scripting would find this unnecessarily complicated. To each his own.
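
Here’s a minimal sketch of what such a generator might look like, assuming a dictionary of keys and directories like the one below; the entries and output file name are placeholders. Note that it writes the complete line from the same dictionary, which sidesteps the synchronization problem mentioned earlier.

# Sketch of a generator for qcd.sh; dictionary entries are placeholders.
dirs = {
    "work":    "$HOME/Work/Projects/Web/src/include",
    "recipes": "$HOME/Family/Cooking/Recipes",
}

with open("qcd.sh", "w") as f:
    f.write('qcd () {\n  case "$1" in\n')
    for key, path in dirs.items():
        f.write(f'    {key})\n      cd "{path}"\n      ;;\n')
    f.write("  esac\n  pwd\n}\n")
    # Generate the complete call from the same keys as the case statement.
    f.write(f'complete -W "{" ".join(dirs)}" qcd\n')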

By the way, if you’re going to write a Python script, why not just write a Python script and be done with it rather than write a Python script to generate a shell script? For the same reason qcd is a function: cd inside a Python script will only change the working directory while the script is running. There is probably some way around this, but I didn’t want to take the time to figure it out.


Converse of RSA

The security of RSA encryption depends on the difficulty of factoring the product of two large primes. If you can factor large numbers efficiently, you can break RSA. But if you can break RSA, can you factor large numbers?

Sorta. It’s conceivable that there is a way to break RSA encryption without having to recover the private key. But if you can recover the private key, then you can factor efficiently. That is, if you can start with a public key (N, e), where N is the modulus and e is the encryption key, and recover the private key d, then you can efficiently factor N. See “Fact 1” in [1].

Let n = log₂ N. Then the algorithm alluded to above can be run in O(n³) time. But the best known factoring algorithms take more than O(exp(n^(1/3))) time.
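
Here’s a sketch of that reduction with a toy key. The idea is standard: ed − 1 is a multiple of φ(N), so write ed − 1 = 2^s t with t odd; for a random a, the sequence a^t, a^{2t}, a^{4t}, … ends at 1, and with probability at least 1/2 it passes through a square root of 1 other than ±1, which splits N.

import math, random

def factor_with_key(N, e, d):
    # Given a valid RSA pair (e, d) for modulus N, recover N's factors.
    k = e*d - 1          # a multiple of φ(N), so a^k = 1 mod N when gcd(a, N) = 1
    s, t = 0, k
    while t % 2 == 0:
        s, t = s + 1, t // 2
    while True:
        a = random.randrange(2, N - 1)
        g = math.gcd(a, N)
        if g > 1:        # extremely lucky: a shares a factor with N
            return g, N // g
        x = pow(a, t, N)
        for _ in range(s):
            y = pow(x, 2, N)
            if y == 1 and x not in (1, N - 1):
                p = math.gcd(x - 1, N)  # nontrivial square root of 1 splits N
                return p, N // p
            x = y

# Toy key, far too small to be secure
p, q = 61, 53
N, e = p*q, 17
d = pow(e, -1, (p - 1)*(q - 1))
print(factor_with_key(N, e, d))  # (61, 53) or (53, 61)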


[1] Dan Boneh. Twenty Years of Attacks on the RSA Cryptosystem. Available here.

Unicode Steganography

Steganography attempts to prevent messages from being read by unintended recipients by hiding the messages rather than (or in addition to) encrypting them. Steganography is used when you not only want to keep your communication private, you want to hide the fact that you’ve communicated at all.

Fun fact: The words steganography and stegosaurus are related [1].

Famous example

A famous example of steganography was a secret message sent by Jeremiah Denton during the Vietnam War. While a prisoner of war, Denton was forced to participate in a Vietnamese propaganda video. He sent the word torture by blinking the Morse code for the letters in the word. You can find the video here.

[Video clip: Jeremiah Denton blinking Morse code in the propaganda video]

Famous non-example

Some alleged examples of steganography have turned out to be apophenia, seeing patterns where none exist. The book The Woman Who Smashed Codes details Elizebeth Smith’s introduction to cryptography: she was tasked with finding messages hidden in minor variations in Shakespeare’s handwriting, messages that were not there. The book goes on to describe her cryptographic work during WWII, deciphering messages that most certainly did exist.

Incidentally, Elizebeth Smith [2] married fellow cryptographer William F. Friedman. I wrote about Friedman’s index of coincidence a while back.

Enter Unicode

Randall Munroe said “I am endlessly delighted by the hopeless task that the Unicode Consortium has created for themselves.” One of the things that makes their task delightful and hopeless is trying to distinguish semantics from appearance.

For example, the capital letters at the beginning of the Roman and Greek alphabets have different Unicode values even though they look alike. A (U+0041) is a Roman letter and Α (U+0391) is a Greek letter, and so they’re not the same. Also, the Roman letter M (U+004D) is semantically different from the Roman numeral Ⅿ (U+216F) that represents 1,000.

But it quickly becomes impossible to consistently make such distinctions, and so Unicode is full of compromises. Should the letter i and the imaginary unit i have different code points? What about the symbol i for current and the unit basis vector i? You can’t have a different code point for every use of a symbol.

Because Unicode has numerous pairs of characters with identical appearance, it’s possible to hide binary data in Unicode text by using one member of a pair to represent a 0 and the other to represent a 1. So maybe d (U+0064 Latin Small Letter D) represents a 0 and ԁ (U+0501 Cyrillic Small Letter Komi De) represents a 1.
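
Here’s a toy sketch using exactly that pair of characters. Only the d’s in the cover text carry data, so the cover needs at least as many d’s as there are bits to hide. The cover sentence and function names are mine.

LATIN_D, KOMI_DE = "\u0064", "\u0501"  # d and ԁ

def hide(cover, bits):
    # Replace the i-th Latin d with Cyrillic komi de when bit i is 1.
    out, i = [], 0
    for ch in cover:
        if ch == LATIN_D and i < len(bits):
            out.append(KOMI_DE if bits[i] == "1" else LATIN_D)
            i += 1
        else:
            out.append(ch)
    if i < len(bits):
        raise ValueError("cover text has too few d's")
    return "".join(out)

def reveal(stego):
    # Read one bit from each d-like character.
    return "".join("1" if ch == KOMI_DE else "0"
                   for ch in stego if ch in (LATIN_D, KOMI_DE))

stego = hide("dads add daffodils daily", "1011010")  # seven d's, seven bits
print(stego)          # looks like the original to the eye
print(reveal(stego))  # 1011010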

There is a potential problem with this scheme. Unicode does not dictate appearance, and it’s entirely possible a typographer might create a font that has distinct glyphs for characters that are not distinct in other fonts.

Security

Look-alike characters are often used to create malicious URLs. For instance, someone might take “Microsoft.com” and substitute the Roman numeral Ⅿ for the first letter, or substitute a Greek omicron for one of the o’s.

Text that is expected to be ASCII should be converted to ASCII to prevent mistakes or malice, or else the user should be warned: “Do you really want to visit this URL that contains nine Roman letters and one Cyrillic letter?”
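
A crude version of such a check might just flag every non-ASCII character by name; the function below is hypothetical, for illustration.

import unicodedata

def audit(s):
    # Flag every non-ASCII character in a string expected to be ASCII.
    for ch in s:
        if ord(ch) > 127:
            print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')} in {s!r}")

audit("Ⅿicrosoft.com")  # U+216F ROMAN NUMERAL ONE THOUSAND in 'Ⅿicrosoft.com'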

When I’m reading, I want fonts with broad Unicode support. No missing symbols, no jarring change in font for foreign words. But when I’m debugging, it would be nice to have the opposite, a xenophobic font that displays non-ASCII characters in some ugly way that makes them jump out. I imagine someone has developed such a font, but it’s hard to find because most people are looking for better Unicode support, not worse.


[1] Both derive from the Greek word for ‘cover’. Steganographic writing is covered in the sense of being hidden. A stegosaurus has armored plates that look like roof tiles, i.e. like the covering of a house.

[2] That’s not a typo. She spelled her name with ‘e’ as the fifth letter rather than the more common ‘a’.