Entering Russian characters in Vim with digraphs

The purpose of this post is to expand on the following sentence [1]:

Russian letters are created by entering [Ctrl-k followed by] a corresponding Latin letter followed by an equals sign =, or, in a few places, a percent sign %.

The Russian alphabet has 33 letters, so there couldn’t be a Latin letter for every Russian letter. Also, there are Latin letters that don’t have a Russian counterpart and vice versa. So the mapping can’t be simple. But still, the above summary is nice: try control-k followed by the English analog and the equal sign. If that doesn’t work, try a percent sign instead.

Which Latin letters does Vim choose as corresponding to Russian letters? Does it go by sound or appearance? For example, the Russian letter Н looks like a Latin H but it sounds like a Latin N. Vim goes by sound. You would enter the Russian letter Н by typing Ctrl-k N =.

For full details, see the Vim documentation :h digraph-table. I give a simplified excerpt from the documentation below. I just look at capital letters because the lower case letters are analogous. All the official Unicode names begin with CYRILLIC CAPITAL LETTER and so I cut that part out.

char    digraph hex     official name 
А       A=      0410    A
Б       B=      0411    BE
В       V=      0412    VE
Г       G=      0413    GHE
Д       D=      0414    DE
Е       E=      0415    IE
Ё       IO      0401    IO
Ж       Z%      0416    ZHE
З       Z=      0417    ZE
И       I=      0418    I
Й       J=      0419    SHORT I
К       K=      041A    KA
Л       L=      041B    EL
М       M=      041C    EM
Н       N=      041D    EN
О       O=      041E    O
П       P=      041F    PE
Р       R=      0420    ER
С       S=      0421    ES
Т       T=      0422    TE
У       U=      0423    U
Ф       F=      0424    EF
Х       H=      0425    HA
Ц       C=      0426    TSE
Ч       C%      0427    CHE
Ш       S%      0428    SHA
Щ       Sc      0429    SHCHA
Ъ       ="      042A    HARD SIGN
Ы       Y=      042B    YERU
Ь       %"      042C    SOFT SIGN
Э       JE      042D    E
Ю       JU      042E    YU
Я       JA      042F    YA

Note that the end of the alphabet is more complicated than simply using a Latin letter and either an equal or percent sign. Also, the table is in alphabetical order, which doesn’t quite correspond to Unicode numerical order because of a quirk with the letter Ё (U+0401) explained here.
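For example, to type the word МИР (peace) in insert mode, you would press Ctrl-k before each letter’s digraph from the table above:

Ctrl-k M =   →  М
Ctrl-k I =   →  И
Ctrl-k R =   →  Р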

[1] Arnold Robbins and Elbert Hannah. Learning the vi & Vim Editors, 8th edition

Why eliminate trusted third parties?

Suppose one company would like to buy another company’s client list, but only if the lists don’t overlap too much. Neither company wants to hand over their list to the other before a sale takes place. What can they do?

A low-tech solution would be for both parties to provide their client lists to a trusted third party who will report back how much the lists overlap. That may be the best thing to do.

But it is possible to solve this problem without a trusted third party. With homomorphic encryption, the companies can exchange encrypted versions of their client lists that will allow both to calculate the amount of overlap without revealing any further information.

But why go to the effort? Many peer-to-peer technologies raise this question. So you’ve eliminated a third party; what’s so great about that? If you can send someone cryptocurrency, for example, without going through an intermediary like a bank or credit card company, what good is that if the transaction fees are no lower?

It’s often not worth using sophisticated technology to eliminate a trusted third party, but sometimes it is. Here are some reasons the technology might be worth using.

Transaction speed

The two companies hiring a third party to compare their lists have to wait on the third party, and the amount of time they need to wait is beyond their control. Maybe that’s acceptable for a one-time transaction, but not for repeated transactions. With homomorphic encryption, transactions could be automated and the only delay would be computation time.

Reproducibility

Sharing limited information via encryption reduces legal risk. If either party later accuses the other of providing incorrect information, the accused party can demonstrate that the software applied to the data gives the reported result.

Trust

To paraphrase Bob Dylan, you gotta trust somebody. Some technologies are labeled “zero-trust” or “trust no one,” but these terms need to be understood in context. When a company asks you to run a particular piece of software on your proprietary data and share the results, you have to trust that the software is not malicious or buggy.

Instead of trusting that a third party holding your data is honest and competent, you trust that a third party developing software is honest and competent. You have to decide that the software product is trustworthy. You might test the software on some sample data. Maybe you inspect the source code if it’s available. But at some point you have to trust the software and the context it runs in.


RSA security in light of news

A recent article reported on the successful factoring of a 512-bit RSA key. The process took $8 worth of rented computing power. What does that say about the security of RSA encryption?

The following Python function estimates the computation steps required to factor a number b bits long using the best known algorithm. We’re going to take a ratio of two such estimates, so proportionality constants will cancel.

from math import exp, log

def f(b):
    # work to factor a b-bit number, up to a proportionality constant
    logn = b*log(2)
    return exp((8/3)**(2/3) * logn**(1/3) * (log(logn))**(2/3))
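
The ratio quoted below comes from evaluating this function at the two key sizes:

ratio = f(2048) / f(512)   # on the order of 10^16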

The minimum recommended RSA key size is 2048 bits. The cost ratio for factoring a 2048-bit key versus a 512-bit key is f(2048) / f(512), which is on the order of 10^16. So factoring a 2048-bit key would take about 80 quadrillion dollars.

This is sort of a back-of-the-envelope estimate. There are things it doesn’t take into account. If a sophisticated and highly determined entity wanted to factor a 2048-bit number, they could probably do so for less than 80 quadrillion dollars. But it’s safe to say that the people who factored the 512-bit key are unlikely to factor a 2048-bit key by the same approach.

Can AI models reason like a human?

We’re awaiting the release of OpenAI’s o3 model later this month. Its performance is impressive on very hard benchmarks like SWE-bench Verified, FrontierMath and the ARC-AGI benchmark (discussed previously in this blog).

And yet at the same time some behaviors of the frontier AI models are very concerning.

Their performance on assorted math exams is outstanding, but they make mistakes doing simple arithmetic, like wrongly multiplying numbers that are a few digits long. Performance of the o1-preview model on the difficult Putnam math exam is excellent but drops precipitously under simple changes like renaming constants and variables in the problem statement.

Similarly, when o1 is applied to a planning benchmark expressed in standardized language, it performs well, but accuracy falls apart when it is applied to a mathematically equivalent planning problem in a different domain. Likewise, a given AI model applied to simple rotation ciphers like ROT13 can have wildly different performance depending on the value of the cipher key, suggesting the models don’t really understand the algorithm.

It was the best of times, it was the worst of times, . . .

What is going on here?

For years now, some have made claims of “human-level performance” for various deep learning algorithms. And as soon as one party starts making claims like this, it’s hard for the others to resist doing the same.

The confusion is that, from a certain point of view, the claim of “human-level” is true—but the definition of “human-level” is fraught.

Here, “human-level” is taken to mean achievement of some high score on a benchmark set, ostensibly exceeding some human performance measure on the same benchmark. However, a single AI model can vary wildly in capability across behaviors—“smart” compared to humans in some ways, “dumb” in others.

For humans, test-taking is a proxy for measuring a range of skills and abilities. And even for humans it is not always an accurate proxy. A person can perform very well on academic tests and very poorly on the job, or vice versa.

And the capability ratios for AI models are very different still, in ways we don’t fully understand. So, outscoring humans on a software engineering benchmark doesn’t mean the AI has the whole panoply of coding skills, decision-making abilities, software architecture design savvy, etc., needed to be a competent software engineer.

It’s no surprise that recent articles (below) show a growing perception of the limitations of AI benchmarks as currently conceived.

Ways forward

Perhaps we should consider developing requirements like the following before claiming human-level reasoning performance of an AI model:

  • It should be able to “explain its work” at any level of detail to another human (just like a human can), in a way that that human can understand.
  • It should be able to give answers without “hallucinating” or “confabulating” (yes, humans can hallucinate too, but most occupations would not be well-served by an employee who hallucinates on the job).
  • It should be able to reliably and consistently (100% of the time) do things that we routinely expect a human or computer to do accurately (like add or multiply two numbers accurately, for things like filling out tax returns or doing engineering calculations to build an airplane).
  • It should be frank and honest in assessing its level of certainty about an answer it gives (no gaslighting).
  • It should be able to solve a trivial perturbation of a given problem with the same ease as the original problem (to the same extent that a human can).
  • As someone has said, it should be able to do, without specific training, what a 5-year-old can do without specific training.
  • This one sounds good, from Emmett Shear: “AGI [artificial general intelligence] is the ability to generalize [without special training by a human] to an adversarially chosen new benchmark.”

AI models are fantastic and amazing tools—and best used when one has eyes wide open about their limitations.

Have you had problems with AI model performance? If so, please share in the comments.

References

Rethinking AI benchmarks: A new paper challenges the status quo of evaluating artificial intelligence, https://venturebeat.com/ai/rethinking-ai-benchmarks-a-new-paper-challenges-the-status-quo-of-evaluating-artificial-intelligence/

Rethink reporting of evaluation results in AI, https://www.science.org/doi/10.1126/science.adf6369, https://eprints.whiterose.ac.uk/198211/

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence, https://arxiv.org/abs/2402.09880

Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless, https://themarkup.org/artificial-intelligence/2024/07/17/everyone-is-judging-ai-by-these-tests-but-experts-say-theyre-close-to-meaningless

Why we must rethink AI benchmarks, https://bdtechtalks.com/2021/12/06/ai-benchmarks-limitations/

AI and the Everything in the Whole Wide World Benchmark, https://arxiv.org/abs/2111.15366

BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices, https://arxiv.org/abs/2411.12990

Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy.

Quick change directory

One difficulty when working at a command line is navigating between directories, particularly between locations with long paths. There are several ways to mitigate this. One of the simplest is using cd - to return to the previous directory. Another is to use pushd and popd. Still another is to set the CDPATH variable.

qcd function

This post presents another approach that could be used instead of or in addition to the ones mentioned above. The book Efficient Linux at the Command Line by Daniel Barrett contains a function called qcd (quick change directory) that will cd to any of a list of commonly used directories. The function is essentially a big case statement taking a key and going to a corresponding directory.

qcd () {
  case "$1" in
    work)
      cd "$HOME/Work/Projects/Web/src/include"
      ;;
    recipes)
      cd "$HOME/Family/Cooking/Recipes"
      ;;
    …
  esac
  pwd
}

So, for example, qcd work will take you to the directory ~/…/include.

Barrett adds one more line after the definition of the qcd function:

complete -W "work recipes …" qcd

This turns on tab completion for qcd using the bash builtin complete. You use this builtin implicitly when you use tab completion with shell utilities. You can call it as above to add the same command completion to your own functions. So, for example, with the code above, a user could type

qcd w TAB

instead of typing out qcd work.

Improvements

The book says “Store the function in a shell configuration file such as $HOME/.bashrc … source it, and it’s ready to run.” I’d like to make two comments about this.

First, it’s important that qcd is a function and not a script. Scripts run in a subshell, so running a cd command in a script changes your working directory while the script is running. But when the script finishes, the subshell exits, and the working directory is just as it was before running the script.

Second, if you use this function you’ll edit it frequently as you think of new directories to add. I’d rather not put it in my .bashrc file for that reason. Also, maybe I’d like to use it from a bash shell on a Linux box and from zshell on a Mac. So instead of putting the definition of qcd in my .bashrc file, I put it in a file qcd.sh and source that file from my .bashrc file.

When you add a new key and directory to the qcd function, you also need to add the key to the call to complete; otherwise you’ll be in the awkward situation of tab completion working for some directories but not others. It would be possible to write a fancier shell script that fixes this problem.

Generating qcd

My knowledge of shell scripting is minimal, and I’d like to keep it that way. If I need to do something complicated, I don’t do it in a shell script. So I wrote a Python script to generate the qcd.sh file from a dictionary of keys and directories. Someone fluent in shell scripting would find this unnecessarily complicated. To each his own.
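
Here’s a minimal sketch of what such a generator might look like, assuming the keys and directories live in a Python dictionary (the entries below are placeholders). Regenerating qcd.sh keeps the case statement and the complete line in sync automatically.

# generate qcd.sh from a dictionary of keys and directories
dirs = {
    "work": "$HOME/Work/Projects/Web/src/include",
    "recipes": "$HOME/Family/Cooking/Recipes",
}

lines = ["qcd () {", '  case "$1" in']
for key, path in dirs.items():
    lines += [f"    {key})", f'      cd "{path}"', "      ;;"]
lines += ["  esac", "  pwd", "}", f'complete -W "{" ".join(dirs)}" qcd']

with open("qcd.sh", "w") as f:
    f.write("\n".join(lines) + "\n")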

By the way, if you’re going to write a Python script, why not just write a Python script and be done with it rather than write a Python script to generate a shell script? For the same reason qcd is a function: cd inside a Python script will only change the working directory while the script is running. There is probably some way around this, but I didn’t want to take the time to figure it out.


Converse of RSA

The security of RSA encryption depends on the difficulty of factoring the product of two large primes. If you can factor large numbers efficiently, you can break RSA. But if you can break RSA, can you factor large numbers?

Sorta. It’s conceivable that there is a way to break RSA encryption without having to recover the private key. But if you can recover the private key, then you can factor efficiently. That is, if you can start with a public key (N, e), where N is the modulus and e is the encryption key, and recover the private key d, then you can efficiently factor N. See “Fact 1” in [1].

Let n = log₂ N. Then the algorithm alluded to above can be run in O(n³) time. But the best known factoring algorithms take more than O(exp(n^(1/3))) time.
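
Here is a minimal sketch of one standard way to carry out that reduction: write ed − 1 = 2^t r with r odd, raise random bases to the power r mod N, and square repeatedly until you find a nontrivial square root of 1. Each random base succeeds with probability at least 1/2, so only a few tries are needed. This is an illustration, not production code.

import random
from math import gcd

def factor_from_private_key(N, e, d):
    # given a matching RSA key pair (e, d) for modulus N, recover the factors
    k = e * d - 1                 # a multiple of lcm(p-1, q-1), hence even
    t, r = 0, k
    while r % 2 == 0:
        r //= 2
        t += 1
    while True:
        g = random.randrange(2, N - 1)
        x = pow(g, r, N)
        for _ in range(t):
            y = pow(x, 2, N)
            if y == 1 and x not in (1, N - 1):
                # x is a nontrivial square root of 1, so gcd(x - 1, N) is a factor
                p = gcd(x - 1, N)
                return p, N // p
            x = y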


[1] Dan Boneh. Twenty Years of Attacks on the RSA Cryptosystem. Available here.

Details of generating primes for cryptography

RSA public key cryptography begins by finding a couple large primes. You essentially do this by testing random numbers until you find primes, but not quite.

Filippo Valsorda just posted a good article on this.

Suppose you’re looking for a 1024-bit prime number. You generate random 1024-bit numbers and then test until you find one that’s prime. You can immediately make this process twice as fast by setting the last bit to 1: it would be wasteful to generate a new number every time you happened to draw an even number.

A little less obvious is that it’s common to set the top bit as well. When you’re generating a number between 0 and 2^1024 − 1, it’s possible that you could generate a small number. It’s possible that you generate a very small number, like 59, but extremely unlikely. But it’s not so unlikely that you’d generate a number on the order of 2^1020, for example. By setting the top bit, you know you’re generating a number between 2^1023 and 2^1024.

Most composite numbers have small factors, so you check for divisibility by 3, 5, 7 etc. before running more time-consuming tests. Probabilistic tests are far more efficient than deterministic tests, so in practice everyone uses probable primes in RSA. For details of how you apply these tests, and how many tests to run, see Filippo Valsorda’s article.
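
Here’s a minimal sketch of the procedure described above, using trial division by small primes followed by a basic Miller-Rabin test. A real implementation (see Valsorda’s article) would be more careful about the number of test rounds and about performance.

import random
import secrets

small_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def probably_prime(n, rounds=40):
    # Miller-Rabin probabilistic primality test
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits=1024):
    while True:
        n = secrets.randbits(bits)
        n |= (1 << (bits - 1)) | 1            # set the top and bottom bits
        if any(n % p == 0 for p in small_primes):
            continue                          # cheap trial division first
        if probably_prime(n):
            return n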

Should you be concerned about setting the top bit of prime candidates? There are naive and sophisticated reasons not to worry, and an intermediate reason to at least think about it.

The naive response is that you’re just losing one bit of randomness. How much could that hurt? But in other contexts, such as losing one bit of randomness in an AES key, people do worry about such losses.

The bits in the prime factors of an RSA modulus do not correspond directly to bits of security. A 2048-bit modulus, the product of two 1024-bit primes, has about 112 bits of security. (See NIST SP 800-57.) You could set several bits in your prime factors before losing a bit of security. If this bothers you, move up to using a 3072-bit modulus rather than worrying that your 2048-bit modulus is in a sense a 2046-bit modulus.


Unix Time and a Modest Proposal

The time it takes earth to orbit the sun is not a simple multiple of the time it takes earth to rotate on its axis. The ratio isn’t even constant. The time it takes earth to circle the sun wobbles a bit, and the rotation of the earth is slowing down slightly.

The ratio is around 365.2422. Calling it 365 is too crude. Julius Caesar said we should call it 365 1/4, and that was good enough for a while. Then Pope Gregory said we really should use 365 97/400, and that’s basically good enough, but not quite. More on that here.

Leap seconds

In 1972 we started adding leap seconds in order to synchronize the day and the year more precisely. Unlike leap days, leap seconds don’t occur on a strict schedule. A leap second is inserted when a committee of astronomers decides one should be inserted, about every two years.

An international standards body has decided to stop adding leap seconds by 2035. They cause so much confusion that it was decided that letting the year drift by a few seconds was preferable.

Unix time

Unix time is the number of seconds since the “epoch,” i.e. January 1, 1970, sorta.

If you were to naively calculate Unix time for this coming New Year’s Day, you’d get the right result.

New Year’s Day 2025

When New Year’s Day 2025 begins in England, the Unix time will be

(55 × 365 + 14) × 24 × 60 × 60 = 1735689600

This is because there are 55 years between 1970 and 2025, 14 of which were leap years.
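
You can check this with Python’s datetime module, which follows the same convention of ignoring leap seconds:

from datetime import datetime, timezone

t = datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp()
print(int(t))   # 1735689600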

However, that moment will be 1735689627 seconds after the epoch.

Non-leap seconds

Unix time is the number of non-leap seconds since 1970-01-01 00:00:00 UTC. There have been 27 leap seconds since 1970, and so Unix time is 27 seconds behind elapsed time.

Leap year analogy

You could think of a second in Unix time as being 1/86400th of a day. Every day has 86400 non-leap seconds, but some days have had 86401 seconds. A leap second could potentially be negative, though this has not happened yet. A day containing a negative leap second would have 86399 seconds.

The analogous situation for days would be to insist that every year have 365 days. February 28 would be the 59th day of the year, and March 1 would be the 60th day, even in a leap year.

International Atomic Time

What if you wanted a time system based on the actual number of elapsed seconds since the epoch? This is almost what International Atomic Time is.

International Atomic Time (abbreviated TAI, from the French temps atomique international) is ahead of UTC [1] by 37 seconds, not 27 seconds as you might expect. Although there have been 27 leap seconds since 1972, TAI dates back to 1958, and UTC was already 10 seconds behind TAI when leap seconds were introduced in 1972.

So New Year’s Day will start in England at 2025-01-01 00:00:37 TAI.

A Modest Proposal

It seems our choices are to add leap seconds and endure the resulting confusion, or not add leap seconds and allow the year to drift with respect to the day. There is a third way: adjust the position of the earth periodically to keep the solar year equal to an average Gregorian calendar year. I believe this modest proposal [2] deserves consideration.

Kepler’s third law says the square of a planet’s orbital period is proportional to the cube of the semi-major axis of its orbit. This means that increasing earth’s orbital period by 1 second would only require moving earth about 3.16 km further from the sun.
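
A quick sanity check of that figure: differentiating T² ∝ a³ gives da = (2/3) a (dT/T).

a = 1.496e8                    # semi-major axis of earth's orbit, km
T = 365.2422 * 24 * 60 * 60    # length of a year in seconds
dT = 1                         # desired increase in the year, seconds
da = (2/3) * a * dT / T
print(da)                      # about 3.16 km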

***

[1] UTC stands for Universal Coordinated Time. From an earlier post,

The abbreviation UTC is an odd compromise. The French wanted to use the abbreviation TUC (temps universel coordonné) and the English wanted to use CUT (coordinated universal time). The compromise was UTC, which doesn’t actually abbreviate anything.

[2] In case you’re not familiar with the term “modest proposal,” it comes from the title of a satirical essay by Jonathan Swift. A modest proposal has come to mean an absurdly simple solution to a complex problem presented satirically.

Can AI models reason: Just a stochastic parrot?

OpenAI has just released its full o1 model—a new kind of model that is more capable of multi-step reasoning than previous models. Anthropic, Google and others are no doubt working on similar products. At the same time, it’s hotly debated in many quarters whether AI models actually “reason” in a way similar to humans.

Emily Bender and her colleagues famously described large language models as nothing more than “stochastic parrots”—systems that simply repeat their training data blindly, based on a statistical model, with no real understanding (reminiscent of the Chinese Room thought experiment). Others have made similar comments, describing LLMs as “n-gram models on steroids” or a “fancy extrapolation algorithm.”

There is of course some truth to this. AI models sometimes generate remarkable results and yet lack certain basic aspects of understanding, the sort of understanding that would keep them from sometimes generating nonsensical results. More to the point of “parroting” the training data, recent work from Yejin Choi’s group has shown how LLMs at times will cut and paste snippets from their training documents, almost verbatim, to formulate their outputs.

Are LLMs (just) glorified information retrieval tools?

The implication of these concerns is that an LLM can “only” repeat back what it was taught (albeit with errors). However, this view does not align with the evidence. LLM training is a compression process in which new connections between pieces of information are formed that were not present in the original data. This is evidenced both mathematically and anecdotally. In my own experience, I’ve gotten valid answers to such obscure and detailed technical questions that it is hard for me to believe the answers would exist in any training data in exactly that form. Whether you would call this “reasoning” or not might be open to debate, but regardless of what you call it, it is something more than just unadorned information retrieval like a “stochastic parrot.”

What is your experience? Let us know in the comments.

Interval arithmetic and fixed points

A couple days ago I analyzed the observation that repeatedly pressing the cosine key on a calculator leads to a fixed point. After about 90 iterations the number no longer changes. This post will analyze the same phenomenon a different way.

Interval arithmetic

Interval arithmetic is a way to get exact results of a sort from floating point arithmetic.

Suppose you start with a number x that cannot be represented exactly as a floating point number, and you want to compute f(x) for some function f. You can’t represent x exactly, but unless x is too large you can represent a pair of numbers a and b such that x is certainly in the interval [a, b]. Then f(x) is in the set f( [a, b] ).

Maybe you can represent f( [a, b] ) exactly. If not, you can enlarge the interval a bit to exactly represent an interval that contains f(x). After applying several calculations, you have an interval, hopefully one that’s not too big, containing the exact result.

(I said above that interval arithmetic gives you exact results of a sort because even though you don’t generally get an exact number at the end, you do get an exact interval containing the result.)

Cosine iteration

In this post we will use interval arithmetic, not to compensate for the limitations of computer arithmetic, but to illustrate the convergence of iterated cosines.

The cosine of any real number lies in the interval [−1, 1]. To put it another way,

cos( [−∞, ∞] ) = [−1, 1].

Because cosine is an even function,

cos( [−1, 1] ) = cos( [0, 1] )

and so we can limit our attention to the interval [0, 1].

Now the cosine is a monotone decreasing function from 0 to π, and so it’s monotone on [0, 1]. For any two points with 0 ≤ a ≤ b ≤ π we have

cos( [a, b] ) = [cos(b), cos(a)].

Note that the order of a and b reverses on the right hand side of the equation because cosine is decreasing. When we apply cosine again we get back the original order.

cos(cos( [a, b] )) = [cos(cos(a)), cos(cos(b))].

Incidentally, this flip-flop explains why the cobweb plot from the previous post looks like a spiral rather than a staircase.

Now define a₀ = 0, b₀ = 1, and

[aₙ₊₁, bₙ₊₁] = cos( [aₙ, bₙ] ) = [cos(bₙ), cos(aₙ)].

We could implement this in Python with a pair of mutually recursive functions.

    from math import cos

    a = lambda n: 0 if n == 0 else cos(b(n-1))
    b = lambda n: 1 if n == 0 else cos(a(n-1))
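
For example, printing the interval after a few iterations shows both endpoints closing in on the fixed point 0.739085…

    for n in [1, 2, 5, 10, 40]:
        print(n, a(n), b(n))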

Here’s a plot of the image of [0, 1] after n iterations.

Note that odd iterations increase the lower bound and even iterations decrease the upper bound.

Numerical interval arithmetic

This post introduced interval arithmetic as a numerical technique, then proceeded to do pure math. Now let’s think about computing again.

The image of [0, 1] under cosine is [cos(1), cos(0)] = [cos(1), 1]. A computer can represent 1 exactly but not cos(1). Suppose we compute

cos(1) = 0.5403023058681398

and assume each digit in the result is correct. Maybe the exact value of cos(1) was slightly smaller and was rounded to this value, but we know for sure that

cos( [0, 1] ) ⊂ [0.5403023058681397, 1]

So in this case we don’t know the image of [0, 1], but we know an interval that contains the image, hence the subset symbol.

We could iterate this process, next computing an interval that contains

cos( [0.5403023058681397, 1] )

and so forth. At each step we would round the left endpoint down to the nearest representable lower bound and round the right endpoint up to the nearest representable upper bound. In practice we’d be concerned with machine representable numbers rather than decimal representable numbers, but the principle is the same.
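
Here is a minimal sketch of that iteration, using math.nextafter (Python 3.9 and later) to round each endpoint outward by one ulp. This assumes the library cosine is accurate to within about one ulp, which is true of typical implementations.

from math import cos, inf, nextafter

def cos_interval(lo, hi):
    # cosine is decreasing on [0, 1], so the image of [lo, hi] is
    # [cos(hi), cos(lo)]; widen each endpoint by one ulp to be safe
    return nextafter(cos(hi), -inf), nextafter(cos(lo), inf)

lo, hi = 0.0, 1.0
for _ in range(90):
    lo, hi = cos_interval(lo, hi)

print(lo, hi)   # a tiny interval containing the fixed point 0.739085…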

The potential pitfall of interval arithmetic in practice is that intervals may grow so large that the final result is not useful. But that’s not the case here. The rounding error at each step is tiny, and contraction maps reduce errors at each step rather than magnifying them. In a more complicated calculation, we might have to resort to loose estimates and not have such tight intervals at each step.
