Tricks for radix conversion by hand

The simplest trick for converting from one base to another is grouping. To convert between base b and base b^k, group digits in sets of k and convert one group at a time. To convert from binary to octal, for instance, group bits in sets of three, starting from the right end, and convert each group to octal.

11110010_two → (11)(110)(010) → 362_eight

For an example going the other direction, let’s convert 476 in base nine to base three.

476_nine → (11)(21)(20) → 112120_three
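Here is a minimal Python sketch of the grouping trick, reproducing the two examples above. The function names `regroup` and `ungroup` are my own, and the sketch assumes the larger base is at most 36 so that each digit fits in a single character.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def regroup(digits: str, b: int, k: int) -> str:
    """Convert a base-b digit string to base b**k by grouping k digits at a time."""
    # Pad on the left so the groups line up starting from the right end.
    pad = (-len(digits)) % k
    digits = "0" * pad + digits
    groups = [digits[i:i + k] for i in range(0, len(digits), k)]
    # Each group is a small base-b number, i.e. a single base-b**k digit.
    return "".join(DIGITS[int(g, b)] for g in groups)

def ungroup(digits: str, b: int, k: int) -> str:
    """Convert a base-b**k digit string to base b, expanding each digit to k digits."""
    out = []
    for d in digits:
        value = DIGITS.index(d)
        group = ""
        for _ in range(k):
            group = DIGITS[value % b] + group
            value //= b
        out.append(group)
    return "".join(out).lstrip("0") or "0"

print(regroup("11110010", 2, 3))  # binary to octal
print(ungroup("476", 3, 2))       # base nine to base three
```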

In general, conversion between bases is too tedious to do by hand, but one important case where it’s a little easier than it could be is converting between decimal and octal. In combination with the grouping trick above, this means you could, for example, convert between decimal and binary by first converting decimal to octal. Then the conversion from octal to binary is trivial.

The key to converting between decimal and octal is to exploit the fact that 10 = 8 + 2, so powers of 10 become powers of (8 + 2), or powers of 8 become powers of (10 − 2). These tricks are easier to carry out than to explain. You can find descriptions and examples in Knuth’s TAOCP, volume 2, section 4.4.

Knuth cites a note by Walter Soden from 1953 as the first description of the trick for converting octal to decimal.

The trick for moving between base 9 and base 10 (or by grouping, between base 3 and base 10) is simpler and left as an exercise by Knuth. (Problem 12 in section 4.4, with a solution given at the back of the volume.)


Double rounding

The previous post started by saying that rounding has a surprising amount of detail. An example of this is double rounding: if you round a number twice, you might not get the same result as if you rounded directly to the final precision.

For example, let’s say we’ll round numbers ending in 0, 1, 2, 3, or 4 down, and numbers ending in 5, 6, 7, 8, or 9 up. Then if we have a number like 123.45 and round it to one decimal place we have 123.5, and if we round that to an integer we have 124. But if we had rounded 123.45 directly to an integer we would have gotten 123. This is not a mere curiosity; it comes up fairly often and has been an issue in lawsuits.
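The example can be reproduced with Python's `decimal` module. This is only a sketch: the helper name `round_half_up` is mine, and it uses `ROUND_HALF_UP` to match the rounding rule described above.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(x: Decimal, places: int) -> Decimal:
    """Round x to the given number of decimal places, with ties going up."""
    return x.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

x = Decimal("123.45")
twice = round_half_up(round_half_up(x, 1), 0)  # 123.45 -> 123.5 -> 124
once = round_half_up(x, 0)                     # 123.45 -> 123
print(twice, once)
```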

The double rounding problem cannot happen in odd bases. So, for example, if you have some fraction represented in base 7 and you round it first from three figures past the radix point to two, then from two to one, you’ll get the same result as if you directly rounded from three figures to one. Say we start with 4231.243_seven. If we round it to two places we get 4231.24_seven, and if we round again to one place we get 4231.3_seven, the same result we would get by rounding directly from three places to one.

The reason this works is that you cannot represent ½ by a finite expression in an odd base.
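A brute-force check makes the claim concrete. The sketch below treats the digits after the radix point as an integer and rounds to nearest, with ties going up when they can occur. The function names are my own, and the check covers only three-digit fractions.

```python
def round_digits(n: int, base: int, drop: int) -> int:
    """Round the integer n, read as digits in the given base, by dropping
    `drop` trailing digits. Rounds to nearest, with ties going up."""
    scale = base ** drop
    return (2 * n + scale) // (2 * scale)

def double_rounding_mismatches(base: int) -> int:
    """Count three-digit fractions where rounding 3 -> 2 -> 1 digits
    differs from rounding 3 -> 1 digits directly."""
    count = 0
    for n in range(base ** 3):
        stepwise = round_digits(round_digits(n, base, 1), base, 1)
        direct = round_digits(n, base, 2)
        if stepwise != direct:
            count += 1
    return count

print(double_rounding_mismatches(7))   # odd base
print(double_rounding_mismatches(10))  # even base
```

The count is zero for base 7 and positive for base 10, where fractions such as .45 round differently along the two routes.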

A magical land where rounding equals truncation

Rounding numbers has a surprising amount of detail. It may seem trivial but, as with most things, there is a lot more to consider than is immediately obvious. I expect there have been hundreds if not thousands of pages devoted to rounding in IEEE journals.

An example of the complexity of rounding is what William Kahan called The Tablemaker’s Dilemma: there is no way in general to know in advance how accurately you’ll need to compute a number in order to round it correctly.

Rounding can be subtle in any number system, but there is an alternative number system in which it is a little simpler than in base 10. It’s base 3, but with a twist. Instead of using 0, 1, and 2 as “digits”, we use −1, 0, and 1. This is known as the balanced ternary system: ternary because of base 3, and balanced because the digits are symmetrical about 0.

We need a symbol for −1. A common and convenient choice is to use T. Think of moving the minus sign from in front of a 1 to on top of it. Now we could denote the number of hours in a day as 10T0 because

1 \times 3^3 + 0 \times 3^2 + (-1)\times 3 + 0 = 24

A more formal way of describing the balanced ternary representation of a number x is as a set of coefficients t_k such that

x = \sum_{k=-\infty}^\infty t_k 3^k

with the restriction that each t_k is in the set {−1, 0, 1}.
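For integers, the coefficients can be found by repeated division by 3, turning a remainder of 2 into the digit −1 plus a carry. The following is a small sketch with function names of my own choosing.

```python
def to_balanced_ternary(n: int) -> str:
    """Return the balanced ternary digits of an integer, writing -1 as 'T'."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        r = n % 3          # Python's % gives 0, 1, or 2 even for negative n
        if r == 2:         # a remainder of 2 becomes digit -1 with a carry
            digits.append("T")
            n = (n + 1) // 3
        else:
            digits.append(str(r))
            n //= 3
    return "".join(reversed(digits))

def from_balanced_ternary(s: str) -> int:
    """Evaluate a balanced ternary digit string by Horner's rule."""
    value = 0
    for ch in s:
        value = 3 * value + (-1 if ch == "T" else int(ch))
    return value

print(to_balanced_ternary(24), to_balanced_ternary(-24))
```

Negating a number just swaps 1s and Ts, which is why no minus sign is ever needed.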

Balanced ternary representation has many interesting properties. For example, positive and negative numbers can all be represented without a minus sign. See, for example, Brian Hayes’ excellent article Third Base. The property we’re interested in here is that to round a balanced ternary number to the nearest integer, you simply lop off the fractional part. Rounding is the same as truncation. To see this, note that the largest possible fractional part is a sequence of all 1s, which represents ½:

\frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \cdots = \frac{1}{2}

Similarly, the most negative possible fractional part is a sequence of all Ts, which represents −½. So unless the fractional part is exactly equal to ½, truncating the fractional part rounds to the nearest integer. If the fractional part is exactly ½ then there is no nearest integer but two integers that are equally near.
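As a sanity check, the sketch below enumerates every fractional part with five balanced ternary digits and confirms that its absolute value stays strictly below ½. The helper name and the choice of five digits are arbitrary.

```python
from fractions import Fraction
from itertools import product

def fractional_value(digits) -> Fraction:
    """Value of balanced ternary digits placed after the radix point."""
    return sum(Fraction(d, 3 ** k) for k, d in enumerate(digits, start=1))

# Largest magnitude among all five-digit fractional parts.
worst = max(abs(fractional_value(ds)) for ds in product((-1, 0, 1), repeat=5))
print(worst, worst < Fraction(1, 2))
```

The extreme case is all 1s (or all Ts), a partial sum of the geometric series above, so it falls short of ½ by exactly half of the last place value.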


Duplicating Hankel plot from A&S

Abramowitz and Stegun has quite a few intriguing plots. This post will focus on the following plot, Figure 9.4, available here.

A&S figure 9.4

We will explain what the plot is and approximately reproduce it.

The plot comes from the chapter on Bessel functions, but the caption says it is a plot of the Hankel function H_0^{(1)}. Why a plot of a Hankel function and not a Bessel function? The Hankel functions are linear combinations of the Bessel functions of the first and second kind:

H_0^{(1)} = J_0 + i Y_0

More on Hankel functions and their relation to Bessel functions here.

The plot is the overlay of two kinds of contour plots: one for lines of constant magnitude and one for lines of constant phase. That is, if the function values are written in the form r e^{iθ} then one plot shows lines of constant r and one plot shows lines of constant θ.

We can roughly reproduce the plot of magnitude contours with the following Mathematica command:

ContourPlot[Abs[HankelH1[0, x + I y]], {x, -4, 2 }, {y, -1.5 , 1.5 }, 
 Contours -> 20, ContourShading -> None, AspectRatio -> 1/2]

This produces the following plot.

Absolute value contour

Similarly, we can replace Abs with Arg in the Mathematica command and increase Contours to 30 to obtain the following phase contour plot.

Phase contour

Finally, we can stack the two plots on top of each other using Mathematica’s Show command.

Magnitude and phase contours

By the way, you can clearly see the branch cut in the middle. The Hankel function is continuous (even analytic) as you move clockwise from the second quadrant around to the third, but it is discontinuous across the negative real axis because of the branch cut.


Area of a quadrilateral from the lengths of its sides

Last week Heron’s formula came up in the post An Unexpected Triangle. Given the lengths of the sides of a triangle, there is a simple expression for the area of the triangle.

A = \sqrt{s(s-a)(s-b)(s-c)}

where the sides are a, b, and c and s is the semiperimeter, half the perimeter.

Is there an analogous formula for the area of a quadrilateral? Yes and no. If the quadrilateral is cyclic, meaning there exists a circle going through all four of its vertices, then Brahmagupta’s formula for the area of a quadrilateral is a direct generalization of Heron’s formula for the area of a triangle. If the sides of the cyclic quadrilateral are a, b, c, and d, then the area of the quadrilateral is

A = \sqrt{(s-a)(s-b)(s-c)(s-d)}

where again s is the semiperimeter.

But in general, the area of a quadrilateral is not determined by the lengths of its sides alone. There is a more general expression, Bretschneider’s formula, that expresses the area of a general quadrilateral in terms of the lengths of its sides and the sum of two opposite angles. (Either pair of opposite angles leads to the same value.)

A = \sqrt {(s-a)(s-b)(s-c)(s-d) - abcd \, \cos^2 \left(\frac{\alpha + \gamma}{2}\right)}

In a cyclic quadrilateral, the opposite angles α and γ add up to π, and so the cosine term drops out.
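Both formulas fit in a few lines of Python. This is a sketch, with function names of my own, in which Brahmagupta's formula is simply Bretschneider's formula with the opposite angles summing to π.

```python
from math import cos, pi, radians, sin, sqrt

def bretschneider(a, b, c, d, angle_sum):
    """Area of a quadrilateral with sides a, b, c, d, where angle_sum is
    the sum (in radians) of two opposite angles."""
    s = (a + b + c + d) / 2
    return sqrt((s - a) * (s - b) * (s - c) * (s - d)
                - a * b * c * d * cos(angle_sum / 2) ** 2)

def brahmagupta(a, b, c, d):
    """Area of a cyclic quadrilateral: opposite angles sum to pi."""
    return bretschneider(a, b, c, d, pi)

# A 3 x 4 rectangle is cyclic, so Brahmagupta's formula applies.
print(brahmagupta(3, 4, 3, 4))
# A unit rhombus with 60 degree angles at two opposite vertices is not cyclic.
print(bretschneider(1, 1, 1, 1, radians(120)), sin(radians(60)))
```

The rhombus shows why side lengths alone are not enough: it has the same sides as the unit square but a smaller area.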

The contrast between the triangle and the quadrilateral touches on an area of math called distance geometry. At first this term may sound redundant. Isn’t geometry all about distances? Well, no. It is also about angles. Distance geometry seeks results, like Heron’s theorem, that only depend on distances.


Entering Russian characters in Vim with digraphs

The purpose of this post is to expand on the following sentence [1]:

Russian letters are created by entering [Ctrl-k followed by] a corresponding Latin letter followed by an equals sign =, or, in a few places, a percent sign %.

The Russian alphabet has 33 letters, so there couldn’t be a Latin letter for every Russian letter. Also, there are Latin letters that don’t have a Russian counterpart and vice versa. So the mapping can’t be simple. But still, the above summary is nice: try control-k followed by the English analog and the equal sign. If that doesn’t work, try a percent sign instead.

Which Latin letters does Vim choose as corresponding to Russian letters? Does it go by sound or appearance? For example, the Russian letter Н looks like a Latin H but it sounds like a Latin N. Vim goes by sound. You would enter the Russian letter Н by typing Ctrl-k N =.

For full details, see the Vim documentation :h digraph-table. I give a simplified excerpt from the documentation below. I just look at capital letters because the lower case letters are analogous. All the official Unicode names begin with CYRILLIC CAPITAL LETTER and so I cut that part out.

char    digraph hex     official name 
А       A=      0410    A
Б       B=      0411    BE
В       V=      0412    VE
Г       G=      0413    GHE
Д       D=      0414    DE
Е       E=      0415    IE
Ё       IO      0401    IO
Ж       Z%      0416    ZHE
З       Z=      0417    ZE
И       I=      0418    I
Й       J=      0419    SHORT I
К       K=      041A    KA
Л       L=      041B    EL
М       M=      041C    EM
Н       N=      041D    EN
О       O=      041E    O
П       P=      041F    PE
Р       R=      0420    ER
С       S=      0421    ES
Т       T=      0422    TE
У       U=      0423    U
Ф       F=      0424    EF
Х       H=      0425    HA
Ц       C=      0426    TSE
Ч       C%      0427    CHE
Ш       S%      0428    SHA
Щ       Sc      0429    SHCHA
Ъ       ="      042A    HARD SIGN
Ы       Y=      042B    YERU
Ь       %"      042C    SOFT SIGN
Э       JE      042D    E
Ю       JU      042E    YU
Я       JA      042F    YA

Note that the end of the alphabet is more complicated than simply using a Latin letter and either an equal or percent sign. Also, the table is in alphabetical order, which doesn’t quite correspond to Unicode numerical order because of a quirk with the letter Ё (U+0401) explained here.
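The hex and name columns of the table can be regenerated with Python's standard `unicodedata` module. The sketch below does not know anything about Vim's digraphs; it only lists code points and official names, with Ё inserted by hand after Е to put the letters in alphabetical order.

```python
import unicodedata

PREFIX = "CYRILLIC CAPITAL LETTER "

# U+0410..U+042F holds 32 of the capital letters in alphabetical order.
letters = [chr(cp) for cp in range(0x0410, 0x0430)]
# Ё (U+0401) sits outside that range; alphabetically it follows Е (U+0415).
alphabet = letters[:6] + [chr(0x0401)] + letters[6:]

for ch in alphabet:
    name = unicodedata.name(ch)
    # Drop the common prefix, as in the table above.
    print(ch, f"{ord(ch):04X}", name[len(PREFIX):])
```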

[1] Arnold Robbins and Elbert Hannah. Learning the vi & Vim Editors, 8th edition

Chebyshev and Russian transliteration

It’s not simple to transliterate Russian names to English. Sometimes there is a unique mapping, or at least a standard mapping, of a particular name, but often there is not.

An example that comes up frequently in mathematics is Pafnuty Lvovich Chebyshev (1821–1894). This Russian mathematician’s name Пафну́тий Льво́вич Чебышёв has been transliterated as Tchebichef, Tchebychev, Tchebycheff, Tschebyschev, Tschebyschef, Tschebyscheff, Čebyčev, Čebyšev, Chebysheff, Chebychov, Chebyshov, etc.

The American Mathematical Society has settled on “Chebyshev” as its standard, and this is now common in English mathematical writing. But things named after Chebyshev, such as Chebyshev polynomials, are often denoted with a T because the French prefer “Tchebyshev.”

There is an ISO standard, ISO 9, for transliterating Cyrillic characters into Latin characters. Under this standard, Чебышёв becomes Čebyšëv. This maps Cyrillic into Latin characters with diacritical marks but not into ASCII. The AMS realized that the vast majority of Americans would not type Čebyšëv into a search bar, for example, and chose Chebyshev instead.
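To illustrate, here is a letter-by-letter mapping for this one name. The table is deliberately partial: it contains only the seven letters in Чебышёв, and the names `ISO9_PARTIAL` and `transliterate` are my own, not part of any library.

```python
import unicodedata

def nfc(s: str) -> str:
    """Normalize so that letters with diacritics are single code points."""
    return unicodedata.normalize("NFC", s)

# Partial ISO 9 table: only the letters that occur in this one name.
ISO9_PARTIAL = {nfc(k): nfc(v) for k, v in {
    "Ч": "Č", "е": "e", "б": "b", "ы": "y",
    "ш": "š", "ё": "ë", "в": "v",
}.items()}

def transliterate(text: str) -> str:
    """Map Cyrillic to Latin letter by letter; unknown characters pass through."""
    return "".join(ISO9_PARTIAL.get(ch, ch) for ch in nfc(text))

result = transliterate("Чебышёв")
print(result, result.isascii())
```

The result is Latin but not ASCII, which is the practical obstacle mentioned above.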


An unexpected triangle

Let J(x) be the function plotted below.

This is the Bessel function J_1, but we drop the subscript because it’s the only Bessel function we’re interested in for this post. You can think of J as a sort of damped sine.

We can create versions of J with different frequencies by multiplying the argument x by different constants. Suppose we create versions with three different frequencies — J(ax), J(bx), and J(cx) — and integrate their product. This defines a function f of the frequencies.

f(a, b, c) = \int_0^\infty J(ax) \, J(bx) \, J(cx)\, dx

We can evaluate the integral defining f using Sonine’s formula [1]

f(a, b, c) = \frac{2\Delta(a, b, c)}{\pi abc}

where Δ(a, b, c) is the area of a triangle with sides a, b, c, if such a triangle exists, and zero otherwise.

It’s amazing that this formula takes three parameters with no intrinsic geometric meaning and out pops the area of a triangle with such sides.

Numerical (mis)application

It would be ill-advised, but possible, to invert Sonine’s formula and use it to find the area of a triangle; Heron’s formula would be a far better choice. But just for fun, we’ll take the ill-advised route.

from numpy import pi, sqrt, inf
from scipy.special import j1
from scipy.integrate import quad

def heron(a, b, c):
    s = (a + b + c)/2
    return sqrt(s*(s-a)*(s-b)*(s-c))

def g(a, b, c):
    integrand = lambda x: j1(a*x) * j1(b*x) * j1(c*x)
    i, _ = quad(integrand, 0, inf, limit=500)
    return i

def area(a, b, c):
    return g(a, b, c)*pi*a*b*c/2

print(area(3,4,5), heron(3,4,5))

SciPy’s quad function has difficulty with the integration, and rightfully issues a warning. The code increases the limit parameter from the default value of 50 to 500, improving the accuracy but not eliminating the warning. The area function computes the area of a 3-4-5 triangle to be 5.9984 and the heron function computes the exact value 6.

Update: I tried the numerical integration above in Mathematica and it correctly computed the integral to 10 decimal places with no help. I suspect it is detecting the oscillatory nature of the integral and using Levin’s integration method; when I explicitly set the integration to be Levin’s method, I get the same result.

Impossible triangles

You could calculate the area of a triangle from Heron’s theorem or from Sonine’s theorem. The results are identical when the parameters a, b, and c form a valid triangle. But the two approaches diverge when a, b, and c do not form a triangle. If you pass in parameters like (3, 1, 1) then Heron’s theorem gives a complex number and Sonine’s theorem yields zero.
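The difference is easy to see in code. In this sketch Heron's formula is evaluated with a complex square root from `cmath`, and `sonine_area` is a hypothetical stand-in that returns what Sonine's formula defines Δ to be, without doing any integration.

```python
import cmath

def heron_complex(a, b, c):
    """Heron's formula evaluated with a complex square root."""
    s = (a + b + c) / 2
    return cmath.sqrt(s * (s - a) * (s - b) * (s - c))

def sonine_area(a, b, c):
    """Delta(a, b, c): the triangle's area if a triangle exists, else zero."""
    if a + b < c or b + c < a or c + a < b:
        return 0.0
    return heron_complex(a, b, c).real

print(heron_complex(3, 1, 1), sonine_area(3, 1, 1))
print(heron_complex(3, 4, 5), sonine_area(3, 4, 5))
```

For (3, 1, 1) the quantity under the square root is negative, so Heron's formula gives a purely imaginary number.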


[1] Discovered by Nikolay Yakovlevich Sonin (1849–1915). The formula is usually referred to as Sonine’s formula rather than Sonin’s formula, as far as I’ve seen. This variation is minor compared to what transliteration does to other Russian names.

Sonine’s formula is more general, extending to Bessel functions J_ν with Re(ν) > ½. I chose ν = 1 for this post because Sonine’s formula is simplest in this case.

Can AI Models Reason: Is Data All You Need?

Many are voicing concern that the world is running out of data and that this will be a blocker to progress toward smarter AI models. One paper in fact projects timelines for when we will run out.

AI researchers are looking for ways to adapt.  Nvidia has trained a specific model to generate synthetic data for training other models. Some use this approach, though using AI-generated data to train AI is not without risk.

Others have asked a bigger question, namely, is something fundamentally missing in our approach that relies so heavily on data. Certainly the bitter lesson thesis and the position long advocated by Geoffrey Hinton argue for a data-first approach with “as few” prior assumptions as possible (though every model has a bias).

But it’s currently simply unknown whether just adding more data and compute will do the trick for achieving general intelligence or whether something else is needed. Neurosymbolic approaches are being experimented with, in various forms. But it’s unclear whether these can scale up to the level needed. And the frontier labs, laser-focused on the current paradigm, may not have adequate time or resources to investigate high-risk/high-reward alternatives.

From a theoretical standpoint, sometimes more data is simply not enough. As discussed in a previous post, some problems in mathematics and engineering require an exponentially large amount of data to train neural network models. Exponentials can work in your favor, but also can work against you (think of the Tower of Hanoi problem or the Wheat and Chessboard problem). Some problems on certain models cannot be solved by any amount of data available in the entire universe.

The requirements for solving these problems can grow much more quickly than expected. The strength of neural networks, their flexibility, their universal approximation property, can also be a weakness. It can take a great deal of data to nail down all the parameters so that the model is completely error free. Thankfully, many other problems that people want to solve (such as human language modeling) are fundamentally lower dimensional and thus less vulnerable to this problem.

We just don’t know whether the current data-hungry approach will be enough—or whether we’ll need to learn another bitter lesson.

Dimensional analysis for gamma function values

Sometimes it’s useful to apply dimensional analysis where it doesn’t belong, to imagine things having physical dimension when they don’t. This post will look at artificially injecting dimensions into equations involving factorials and related functions.

Factorials

The factorial of n is defined as the product of n terms. If each of these terms had units of length, the factorial would have units of n-dimensional volume. It occurred to me this morning that a lot of identities involving factorials make sense dimensionally if you pretend the terms in the identities have units. The expressions on both sides of an equation have the same dimension, and only terms of the same dimension are added together. This isn’t always the case, so caveat emptor.

We could also think of an nth rising power or nth falling power as an n-dimensional volume. If we do, then the dimensions in a Newton series cancel out, for example.

Dimensional analysis for factorials and related functions could make it easier to remember identities, and easier to spot errors. And it could suggest the correct form of a result before the details of the result are filled in. In other words, artificial dimensional analysis can provide the same benefits as physically meaningful dimensional analysis.

Gamma function

For integers n, Γ(n) = (n − 1)!, and so we could assign dimension n − 1 to Γ(n), and more generally assign dimension z − 1 to Γ(z) for any complex z.

Now for some examples to show this isn’t as crazy as it sounds. For starters, take the identity Γ(z + 1) = z Γ(z). We imagine the left hand side is a z-dimensional volume, and the right hand side is the product of a term with units of length and a term that represents a (z − 1)-dimensional volume, so the units check out.

Next let’s take something more complicated, the Legendre duplication formula:

\Gamma (z)\Gamma \left(z+\frac{1}{2}\right)=2^{1-2z}\sqrt{\pi} \; \Gamma (2z)

The left hand side has dimension (z − 1) + (z − ½) = 2z − 3/2. The right hand side has dimension 2z − 1, apparently contradicting our dimensional analysis scheme. But √π = Γ(½), which has dimension −½, and if we rewrite √π as Γ(½) in the equation above, both sides have dimension 2z − 3/2. The dimensions in other identities, like the reflection formula, also balance when you replace π with Γ(½)².
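The bookkeeping for the duplication formula can be checked mechanically. In the sketch below, `dim_gamma` encodes the convention that Γ(z) has dimension z − 1, the factor 2^(1 − 2z) is treated as dimensionless, and √π is counted as Γ(½). The function names are mine, and the last lines verify the identity itself numerically at one arbitrary point.

```python
from fractions import Fraction
from math import gamma, isclose, pi, sqrt

def dim_gamma(z):
    """Artificial dimension assigned to Gamma(z) in this scheme: z - 1."""
    return z - 1

half = Fraction(1, 2)

def duplication_dims(z):
    """Dimensions of the two sides of the duplication formula, with
    sqrt(pi) counted as Gamma(1/2) and 2^(1 - 2z) as dimensionless."""
    lhs = dim_gamma(z) + dim_gamma(z + half)
    rhs = dim_gamma(2 * z) + dim_gamma(half)
    return lhs, rhs

for z in (Fraction(1, 3), Fraction(5, 2), Fraction(7)):
    print(z, duplication_dims(z))

# The identity itself, checked numerically at an arbitrary point.
x = 2.3
print(isclose(gamma(x) * gamma(x + 0.5),
              2 ** (1 - 2 * x) * sqrt(pi) * gamma(2 * x)))
```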

Hypergeometric functions

The hypergeometric function F(a, b; c; z) is defined by its power series representation. If we assign dimensions to the coefficients in the series as we’ve done here, then the numerator and denominator of each term have the same dimensions, and so the hypergeometric function should be dimensionless in our sense. This implies we should expect identities for hypergeometric functions to be dimensionless as well. And indeed they are. I’ll give two examples.

First, let’s look at Gauss’ summation identity

F (a,b;c;1)= \frac{\Gamma(c)\Gamma(c-a-b)}{\Gamma(c-a)\Gamma(c-b)}

provided the real part of c is greater than the real part of a + b. Notice that the numerator and denominator both have dimension 2c − a − b − 2.

Next, let’s look at Kummer’s identity

F (a,b;1+a-b;-1)= \frac{\Gamma(1+a-b)\Gamma(1+\tfrac12a)}{\Gamma(1+a)\Gamma(1+\tfrac12a-b)}

Both the numerator and denominator have dimension 3a/2 − b.
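Both dimension counts can be confirmed with exact arithmetic. This sketch uses a helper `dim`, my own name, that adds up (argument − 1) over a product of gamma functions, and evaluates it at arbitrary rational values of a, b, and c.

```python
from fractions import Fraction as F

def dim(*gamma_args):
    """Total artificial dimension of a product of gamma functions,
    counting Gamma(z) as dimension z - 1."""
    return sum(arg - 1 for arg in gamma_args)

a, b, c = F(1, 3), F(2, 5), F(7, 2)   # arbitrary test values

# Gauss: Gamma(c) Gamma(c-a-b) / (Gamma(c-a) Gamma(c-b))
gauss_num = dim(c, c - a - b)
gauss_den = dim(c - a, c - b)

# Kummer: Gamma(1+a-b) Gamma(1+a/2) / (Gamma(1+a) Gamma(1+a/2-b))
kummer_num = dim(1 + a - b, 1 + a / 2)
kummer_den = dim(1 + a, 1 + a / 2 - b)

print(gauss_num == gauss_den, kummer_num == kummer_den)
```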

Finally, let’s look at a more complicated formula.

\begin{align*} F(a, b; c; z) &= \frac{\Gamma(c) \Gamma(b-a)}{\Gamma(b) \Gamma(c-a)} \,(-z)^{-a\phantom{b}} F\left(a, 1-c+a; 1-b+a; \frac{1}{z}\right) \\ &+ \frac{\Gamma(c) \Gamma(a-b)}{\Gamma(a) \Gamma(c-b)} \,(-z)^{-b\phantom{a}} F\left(\,b, 1-c+b; 1-a+b; \,\frac{1}{z}\right) \\ \end{align*}

In both the terms involving gamma functions, the dimensions of the numerator and denominator cancel out.