Mendeleev’s inequality

Dmitri Mendeleev is best known for creating the first periodic table of chemical elements. He also discovered an interesting mathematical theorem. Empirical research led him to a question about interpolation, which in turn led him to a theorem about polynomials and their derivatives.

I ran across Mendeleev’s theorem via a paper by Boas [1]. The opening paragraph describes what Mendeleev was working on.

Some years after the chemist Mendeleev invented the periodic table of the elements he made a study of the specific gravity of a solution as a function of the percentage of the dissolved substance. This function is of some practical importance: for example, it is used in testing beer and wine for alcoholic content, and in testing the cooling system of an automobile for concentration of anti-freeze; but present-day physical chemists do not seem to find it as interesting as Mendeleev did.

Mendeleev fit his data by patching together quadratic polynomials, i.e. he used quadratic splines. A question about the slopes of these splines led to the following.

Theorem (Mendeleev): Let P(x) be a quadratic polynomial on [−1, 1] such that |P(x)| ≤ 1. Then |P′(x)| ≤ 4.

Mendeleev showed his result to mathematician Andrey Markov who generalized it to the following.

Theorem (Markov): If P(x) is a real polynomial of degree n, and |P(x)| ≤ 1 on [−1, 1] then |P′(x)| ≤ n² on [−1, 1].

Both inequalities are sharp with equality if and only if P(x) = ±Tₙ(x), the nth Chebyshev polynomial. In the special case of Mendeleev’s inequality, equality holds for

T₂(x) = 2x² − 1.
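
You can see the sharpness numerically. Here’s a short Python sketch, assuming NumPy is available, that evaluates Tₙ and its derivative on a grid: the maximum of |Tₙ| is 1 while the maximum of |Tₙ′| is n², attained at the endpoints.

    import numpy as np
    from numpy.polynomial.chebyshev import Chebyshev

    # Markov's inequality is sharp for Chebyshev polynomials:
    # max |T_n| = 1 on [-1, 1] while max |T_n'| = n^2, attained at x = +-1.
    x = np.linspace(-1, 1, 2001)
    for n in range(2, 6):
        T = Chebyshev.basis(n)
        print(n, np.max(np.abs(T(x))), np.max(np.abs(T.deriv()(x))), n**2)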

Andrey Markov’s brother Vladimir proved an extension of Andrey’s theorem to higher derivatives: under the same hypotheses, the kth derivative of P is bounded on [−1, 1] by the kth derivative of Tₙ evaluated at 1.


[1] R. P. Boas, Jr. Inequalities for the Derivatives of Polynomials. Mathematics Magazine, Vol. 42, No. 4 (Sep., 1969), pp. 165–174.

A lesser-known characterization of the gamma function

The gamma function Γ(z) extends the factorial function from integers to complex numbers. (Technically, Γ(z + 1) extends factorial.) There are other ways to extend the factorial function, so what makes the gamma function the right choice?

The most common answer is the Bohr-Mollerup theorem. This theorem says that if f: (0, ∞) → (0, ∞) satisfies

  1. f(x + 1) = x f(x)
  2. f(1) = 1
  3. log f is convex

then f(x) = Γ(x). The theorem applies on the positive real axis, and there is a unique analytic continuation of this function to the rest of the complex plane, apart from poles at the non-positive integers.
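
Condition 3 is easy to spot check numerically. Here’s a minimal sketch, assuming SciPy is available, that tests midpoint convexity of log Γ at random points (a sanity check, not a proof):

    import numpy as np
    from scipy.special import gammaln  # log of the gamma function

    # Midpoint convexity: log G((x+y)/2) <= (log G(x) + log G(y)) / 2.
    rng = np.random.default_rng(0)
    for _ in range(5):
        x, y = rng.uniform(0.01, 20, size=2)
        assert gammaln((x + y)/2) <= (gammaln(x) + gammaln(y))/2
    print("midpoint log-convexity holds at the sampled points")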

But the Bohr-Mollerup theorem is not the only theorem characterizing the gamma function. Another characterization is due to Helmut Wielandt. His theorem says that if f is holomorphic in the right half-plane and

  1. f(z + 1) = z f(z)
  2. f(1) = 1
  3. f(z) is bounded for {z: 1 ≤ Re z ≤ 2}

then f(z) = Γ(z). In short, Wielandt replaces log-convexity on the positive reals with the requirement that f be bounded on a strip in the complex plane.

You might wonder what bound is alluded to in Wielandt’s theorem. You can show from the integral definition of Γ(z) that

|Γ(z)| ≤ |Γ(Re z)|

for z in the right half-plane. So the bound on the complex strip {z: 1 ≤ Re z ≤ 2} equals the bound on the real interval [1, 2], which is 1.
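
Here’s a numerical spot check of this inequality and the resulting bound, assuming SciPy is available (scipy.special.gamma accepts complex arguments):

    import numpy as np
    from scipy.special import gamma

    # |Gamma(z)| <= Gamma(Re z) <= 1 on the strip 1 <= Re z <= 2.
    rng = np.random.default_rng(0)
    for _ in range(5):
        z = complex(rng.uniform(1, 2), rng.uniform(-20, 20))
        assert abs(gamma(z)) <= gamma(z.real) <= 1
    print("bound verified at the sampled points")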

Tighter bounds on alternating series remainder

The alternating series test is part of the standard calculus curriculum. It says that if you truncate an alternating series, the remainder is bounded by the first term that was left out. This fact goes by in a blur for most students, but it becomes useful later if you need to do numerical computing.

To be more precise, assume we have a series of the form

  \sum_{i=1}^\infty (-1)^i a_i

where the aᵢ are positive and converge monotonically to zero. Then the tail of the series is bounded by its first term:

\left|R_n\right| = \left| \sum_{i=n+1}^\infty (-1)^i a_i \right| \leq a_{n+1}

The more we can say about the behavior of the aᵢ, the more we can say about the remainder. So far we’ve assumed that these terms go monotonically to zero. If their differences

\Delta a_i = a_i - a_{i+1}

also go monotonically to zero, then we have an upper and lower bound on the truncation error:

\frac{a_{n+1}}{2} \leq |R_n| \leq \frac{a_n}{2}

If the differences of the differences,

\Delta^2 a_i = \Delta (\Delta a_i)

also converge monotonically to zero, we can get a larger lower bound and a smaller upper bound on the remainder. In general, if the differences of the aᵢ up to order k go to zero monotonically, then the remainder term can be bounded as follows.

\frac{a_{n+1}}{2}
+\frac{\Delta a_{n+1}}{2^2}
+\cdots+
\frac{\Delta^k a_{n+1}}{2^{k+1}}
< \left|R_n\right| <
\frac{a_n}{2}
-\left\{
\frac{\Delta a_n}{2^2}
+\cdots+
\frac{\Delta^k a_n}{2^{k+1}}
\right\}.

Source: Mark B. Villarino. The Error in an Alternating Series. American Mathematical Monthly, April 2018, pp. 360–364.
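
To see the bounds in action, here’s a sketch in Python using the alternating harmonic series, ∑ (−1)ⁱ/i = −log 2, whose terms and forward differences of every order decrease monotonically to zero:

    from math import log

    # Check the order-k bounds on the alternating harmonic series.

    def a(i):
        return 1/i

    def delta(f):
        return lambda i: f(i) - f(i + 1)

    n, k = 10, 3
    partial = sum((-1)**i * a(i) for i in range(1, n + 1))
    R = abs(-log(2) - partial)           # exact remainder |R_n|

    d = [a]                              # d[j] is the jth difference of a
    for _ in range(k):
        d.append(delta(d[-1]))

    lower = sum(d[j](n + 1) / 2**(j + 1) for j in range(k + 1))
    upper = d[0](n)/2 - sum(d[j](n) / 2**(j + 1) for j in range(1, k + 1))
    print(lower < R < upper)             # True: the bounds bracket |R_n|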


Powers don’t clear fractions

If a number has a finite but nonzero fractional part, so do all its powers. I recently ran across a proof in [1] that is shorter than I expected.

Theorem: Suppose r is a real number that is not an integer, and the decimal part of r terminates. Then rᵏ is not an integer for any positive integer k.

Proof: The number r can be written as a reduced fraction a/10ᵐ for some positive integer m. If s = rᵏ were an integer, then

10ᵐᵏ s = aᵏ.

Now the left side of this equation is divisible by 10, but the right side is not: since the fraction a/10ᵐ is reduced, a is divisible by neither 2 nor 5, and neither is aᵏ. So s = rᵏ cannot be an integer. QED.
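
Here’s a quick sanity check with Python’s exact rational arithmetic, using the truncated decimal expansion of √2 from the title of [1]:

    from fractions import Fraction

    # r = 1.41421356237 has a terminating, nonzero decimal part, so by the
    # theorem no power of r is an integer.
    r = Fraction(141421356237, 10**11)
    for k in range(1, 8):
        assert (r**k).denominator > 1   # the fractional part never clears
    print("no power of", r, "is an integer (checked k = 1 through 7)")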

The only thing special about base 10 is that we most easily think in terms of base 10, but you could replace 10 with any other base.

What about repeating decimals, like 1/7 = 0.142857142857…? They’re only repeating decimals in our chosen base. Pick the right base, i.e. 7 in this case, and they terminate. So the theorem above extends to repeating decimals.

[1] Eli Leher. √2 is Not 1.41421356237 or Anything of the Sort. The American Mathematical Monthly, Vol. 125, No. 4 (APRIL 2018), page 346.

Inverse cosine

In the previous two posts, we looked at why Mathematica and SymPy did not simplify sinh(arccosh(x)) to √(x² − 1) as one might expect. After understanding why sinh(arccosh(x)) doesn’t simplify nicely, it’s natural to ask why sin(arccos(x)) does simplify nicely.

In this post I sketched a proof of several identities including

sin(arccos(x)) = √(1 − x²)

saying the identities could be proved geometrically. Let x be between 0 and 1. Construct a right triangle with hypotenuse of length 1 and one leg of length x. Call the acute angle between the hypotenuse and that leg θ. Then cos θ = x, and so arccos(x) = θ, and sin θ = √(1 − x²). This proves the identity above, but only for 0 < x < 1.

If we make branch cuts along (−∞, −1] and [1, ∞) we can extend arccos(z) uniquely by analytic continuation. We can extend the definition to the branch cuts by continuity, but from one direction. We either have to choose the extension to be continuous from above the branch cuts or from below; we have to choose one or the other because the two limits are not equal. As far as I know, everyone chooses continuity from above, i.e. continuity with quadrant II, by convention.

In any case, we can define arccos(z) for any complex number z, and the result is a number whose cosine is z. Therefore the square of its cosine is z², and the square of its sine is 1 − z². So we have

sin²(arccos(z)) = 1 − z².

But does that mean

sin(arccos(z)) = √(1 − z²)?

Certainly it does if we’re working with real numbers, but does it hold for complex numbers?

Recall what square root means for complex numbers: it is the analytic function with branch cut (−∞, 0] that agrees with the real square root function on the positive real axis, and is defined along the branch cut to be continuous with quadrant II. Since

sin(arccos(z)) = √(1 − z²)

for real −1 < z < 1, the two sides of the equation are equal on a set with a limit point, and so as analytic functions they are equal on their common domain.

The only remaining detail is whether the two functions are equal along the branch cut (−∞, −1] where we’ve extended the function by continuity. Since we’ve defined both the arccos and square root functions by continuous extension from the second quadrant, equality on the branch cut also follows by continuity.
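
Here’s a numerical spot check using Python’s cmath module, which appears to follow the same branch conventions; a real input with +0j imaginary part lands on the upper side of the cut:

    import cmath

    # Spot-check sin(arccos(z)) == sqrt(1 - z^2) at assorted complex points,
    # including points on the branch cut (-inf, -1].
    for z in [0.5+0j, -2+0j, -5+0j, 1j, 2+3j]:
        lhs = cmath.sin(cmath.acos(z))
        rhs = cmath.sqrt(1 - z*z)
        print(z, abs(lhs - rhs) < 1e-12)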


Simplifying expressions in SymPy

The previous post looked at why Mathematica does not simplify the expression Sinh[ArcCosh[x]] the way you might think it should. This post will be a sort of Python analog of the previous post.

SymPy is a Python library that among other things will simplify mathematical expressions. As before, we seek to verify the entries in the table below, this time using SymPy.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sinh^{-1} & \cosh^{-1} & \tanh^{-1} \\ \hline \sinh & x & \sqrt{x^{2}-1} & \dfrac{x}{\sqrt{1-x^2}} \\ \hline \cosh & \sqrt{x^{2} + 1} & x & \dfrac{1}{\sqrt{1 - x^2}} \\ \hline \tanh & \dfrac{x}{\sqrt{x^{2}+1}} & \dfrac{\sqrt{x^{2}-1}}{x} & x \\ \end{array}

Here’s the code:

from sympy import *

x = symbols('x')

print( simplify(sinh(asinh(x))) )
print( simplify(sinh(acosh(x))) )
print( simplify(sinh(atanh(x))) )
print( simplify(cosh(asinh(x))) )
print( simplify(cosh(acosh(x))) )
print( simplify(cosh(atanh(x))) )
print( simplify(tanh(asinh(x))) )
print( simplify(tanh(acosh(x))) )
print( simplify(tanh(atanh(x))) )

As before, the results are mostly as we’d expect:

x
sqrt(x - 1)*sqrt(x + 1)
x/sqrt(1 - x**2)
sqrt(x**2 + 1)
x
1/sqrt(1 - x**2)
x/sqrt(x**2 + 1)
sqrt(x - 1)*sqrt(x + 1)/x
x

Also as before, sinh(acosh(x)) and tanh(acosh(x)) return more complicated expressions than in the table above. Why doesn’t

√(x − 1) √(x + 1)

simplify to

√(x² − 1)

as you’d expect? Because the equation

√(x − 1) √(x + 1) = √(x² − 1)

does not hold for all x. See the previous post for the subtleties of defining arccosh and sqrt for complex numbers. The equation above does not hold, for example, when x = −2.
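
You can confirm the counterexample with SymPy itself, which uses the principal branch of the square root:

    from sympy import sqrt

    # At x = -2 the two expressions differ by a sign.
    print(sqrt(-2 - 1)*sqrt(-2 + 1))   # -sqrt(3): sqrt(-3)*sqrt(-1) = (sqrt(3)*I)*I
    print(sqrt((-2)**2 - 1))           #  sqrt(3)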

As in Mathematica, you can specify the range of variables in SymPy. If we specify that x ≥ 0 we get the result we expect. The code

x = symbols('x', real=True, nonnegative=True)
print( simplify(sinh(acosh(x))) )

prints

sqrt(x**2 - 1)

as expected.

sinh( arccosh(x) )

I’ve written several posts about applying trig functions to inverse trig functions. I intended to write two posts, one about the three basic trig functions and one about their hyperbolic counterparts. But there’s more to explore here than I thought at first. For example, the mistakes that I made in the first post led to a couple more posts discussing error detection and proofs.

I was curious about how Mathematica would handle these identities. Sometimes it doesn’t simplify expressions the way you expect, and for interesting reasons. It handled the circular functions as you might expect.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sin^{-1} & \cos^{-1} & \tan^{-1} \\ \hline \sin & x & \sqrt{1-x^{2}} & \dfrac{x}{\sqrt{1+x^2}} \\ \hline \cos & \sqrt{1-x^{2}} & x & \dfrac{1}{\sqrt{1 + x^2}} \\ \hline \tan & \dfrac{x}{\sqrt{1-x^{2}}} & \dfrac{\sqrt{1-x^{2}}}{x} & x \\ \end{array}

So, for example, if you enter Sin[ArcCos[x]] it returns √(1 − x²) as in the table above. Then I added an h on the end of all the function names to see whether it would reproduce the table of hyperbolic compositions.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sinh^{-1} & \cosh^{-1} & \tanh^{-1} \\ \hline \sinh & x & \sqrt{x^{2}-1} & \dfrac{x}{\sqrt{1-x^2}} \\ \hline \cosh & \sqrt{x^{2} + 1} & x & \dfrac{1}{\sqrt{1 - x^2}} \\ \hline \tanh & \dfrac{x}{\sqrt{x^{2}+1}} & \dfrac{\sqrt{x^{2}-1}}{x} & x \\ \end{array}

For the most part it did, but not entirely. The results were as expected except when applying sinh or tanh to arccosh. Sinh[ArcCosh[x]] returns

\sqrt{\frac{x-1}{x+1}} (x+1)

and Tanh[ArcCosh[x]] returns

\frac{\sqrt{\frac{x-1}{x+1}} (x+1)}{x}

Why doesn’t Mathematica simplify as expected?

Why didn’t Sinh[ ArcCosh[x] ] just return √(x² − 1)? The expression it returned is equivalent to this: just square the (x + 1) term, bring it inside the radical, and simplify. That line of reasoning is correct for some values of x but not for others. For example, Sinh[ArcCosh[-2]] returns −√3 but √((−2)² − 1) = √3. The expression Mathematica returns for Sinh[ArcCosh[x]] correctly evaluates to −√3 at x = −2.

Defining ArcCosh

To understand what’s going on, we have to look closer at what arccosh(x) means. You might say it is the function that returns the number whose hyperbolic cosine equals x. But cosh is an even function: cosh(−x) = cosh(x), so there is no unique such number. OK, so we define arccosh(x) to be the positive number whose hyperbolic cosine equals x. That works for real values of x that are at least 1. But what do we mean by, for example, arccosh(1/2)? There is no real number y such that cosh(y) = 1/2.

To rigorously define inverse hyperbolic cosine, we need to make a branch cut. We cannot define arccosh as an analytic function over the entire complex plane. But if we remove (−∞, 1], we can. We define arccosh(x) for real x > 1 to be the positive real number y such that cosh(y) = x, and define it for the rest of the complex plane (with our branch cut (−∞, 1] removed) by analytic continuation.

If we look up ArcCosh in Mathematica’s documentation, it says “ArcCosh[z] has a branch cut discontinuity in the complex z plane running from −∞ to +1.” But what about values of x that lie on the branch cut? For example, we looked at ArcCosh[-2] above. We can extend arccosh to the entire complex plane, but we cannot extend it as an analytic function.

So how do we define arccosh(x) for x in (−∞, 1]? We could define it to be the limit of arccosh(z) as z approaches x for values of z not on the branch cut. But we have to make a choice: do we approach x from above or from below? That is, we can define arccosh(x) for real x ≤ 1 by

\text{arccosh}(x) = \lim_{\varepsilon \to 0^+} \text{arccosh}(x + \varepsilon i)

or by

\text{arccosh}(x) = \lim_{\varepsilon \to 0^-} \text{arccosh}(x + \varepsilon i)

but we have to make a choice because the two limits are not the same. For example, ArcCosh[-2 + 0.001 I] returns 1.31696 + 3.14102 I but ArcCosh[-2 - 0.001 I] returns 1.31696 - 3.14102 I. By convention, we choose the limit from above.
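
Python’s cmath module makes the same choice, so you can reproduce the two limits outside Mathematica:

    import cmath

    # Approaching the cut at -2 from above and below gives conjugate values;
    # the convention takes the limit from above.
    print(cmath.acosh(-2 + 0.001j))   # approximately 1.31696 + 3.14102j
    print(cmath.acosh(-2 - 0.001j))   # approximately 1.31696 - 3.14102j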

Defining square root

Where did we go wrong when we assumed Mathematica’s expression for sinh(arccosh(x))

\sqrt{\frac{x-1}{x+1}} (x+1)

could be simplified to √(x² − 1)? We implicitly assumed √((x + 1)²) = x + 1. And that’s true if x ≥ −1, but not for smaller x. Just as we have to be careful about how we define arccosh, we have to be careful about how we define square root.

The process of defining the square root function for all complex numbers is analogous to the process of defining arccosh. First, we define square root to be what we expect for positive real numbers. Then we make a branch cut, in this case (−∞, 0]. Then we define it by analytic continuation for all values not on the cut. Then finally, we define it along the cut by continuity, taking the limit from above.

Once we’ve defined arccosh and square root carefully, we can see that the expressions Mathematica returns for sinh(arccosh(x)) and tanh(arccosh(x)) are correct for all complex inputs, while the simpler expressions in the table above implicitly assume we’re working with values of x for which arccosh(x) is real.
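
To make this concrete, here’s a sketch using Python’s cmath, which follows the branch conventions described above, comparing Mathematica’s expression, sinh(arccosh(x)) itself, and the naive simplification:

    import cmath

    def mathematica_form(x):
        # sqrt((x-1)/(x+1)) * (x+1), the expression Mathematica returns
        return cmath.sqrt((x - 1)/(x + 1)) * (x + 1)

    for x in [2, -2, 0.5, -0.5]:
        exact = cmath.sinh(cmath.acosh(x))
        naive = cmath.sqrt(x*x - 1)
        print(x, mathematica_form(x), exact, naive)
    # At x = -2 the first two give -sqrt(3) but the naive form gives +sqrt(3).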

Making assumptions explicit

If we are only concerned with values of x ≥ −1 we can tell Mathematica this, and it will simplify expressions accordingly. If we ask it for

    Simplify[Sinh[ArcCosh[x]], Assumptions -> {x >= -1}]

it will return √(x² − 1).


Trig composition table

I’ve written a couple posts that reference the table below.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sin^{-1} & \cos^{-1} & \tan^{-1} \\ \hline \sin & x & \sqrt{1-x^{2}} & \dfrac{x}{\sqrt{1+x^2}} \\ \hline \cos & \sqrt{1-x^{2}} & x & \dfrac{1}{\sqrt{1 + x^2}} \\ \hline \tan & \dfrac{x}{\sqrt{1-x^{2}}} & \dfrac{\sqrt{1-x^{2}}}{x} & x \\ \end{array}

You could make a larger table, 6 × 6, by including sec, csc, cot, and their inverses, as Baker did in his article [1].

Note that rows 4, 5, and 6 are the reciprocals of rows 1, 2, and 3.

Returning to the theme of the previous post, how could we verify that the expressions in the table are correct? Each expression is one of 14 forms for reasons we’ll explain shortly. To prove that the expression in each cell is the correct one, it is sufficient to check equality at just one random point.

Every identity can be proved by referring to a right triangle with one side of length x, one side of length 1, and the remaining side of whatever length Pythagoras dictates, just as in the first post [2]. Define the sets A, B, and C by

A = {1}
B = {x}
C = {√(1 − x²), √(x² − 1), √(1 + x²)}

Every expression is the ratio of an element from one of these sets and an element of another of these sets. You can check that this can be done 14 ways.

Some of the 14 functions are defined for |x| ≤ 1, some for |x| ≥ 1, and some for all x. This is because sin and cos have range [−1, 1], sec and csc have range (−∞, −1] ∪ [1, ∞), and tan and cot have range (−∞, ∞). No two of the 14 functions are defined and have the same value at more than a point or two.

The following code verifies the identities at a random point. Note that we had to define a few functions that are not built into Python’s math module.

    from math import *
    import numpy as np

    def compare(x, y):
        print(abs(x - y) < 1e-12)

    sec  = lambda x: 1/cos(x)    
    csc  = lambda x: 1/sin(x)
    cot  = lambda x: 1/tan(x)
    asec = lambda x: atan(sqrt(x**2 - 1))
    acsc = lambda x: atan(1/sqrt(x**2 - 1))
    acot = lambda x: pi/2 - atan(x)

    x = np.random.random()
    compare(sin(acos(x)), sqrt(1 - x**2))
    compare(sin(atan(x)), x/sqrt(1 + x**2))
    compare(sin(acot(x)), 1/sqrt(x**2 + 1))
    compare(cos(asin(x)), sqrt(1 - x**2))
    compare(cos(atan(x)), 1/sqrt(1 + x**2))
    compare(cos(acot(x)), x/sqrt(1 + x**2))
    compare(tan(asin(x)), x/sqrt(1 - x**2))
    compare(tan(acos(x)), sqrt(1 - x**2)/x)
    compare(tan(acot(x)), 1/x)
    
    x = 1/np.random.random()
    compare(sin(asec(x)), sqrt(x**2 - 1)/x)
    compare(cos(acsc(x)), sqrt(x**2 - 1)/x)    
    compare(sin(acsc(x)), 1/x)
    compare(cos(asec(x)), 1/x)
    compare(tan(acsc(x)), 1/sqrt(x**2 - 1))
    compare(tan(asec(x)), sqrt(x**2 - 1))

This verifies the first three rows; the last three rows are reciprocals of the first three rows.


[1] G. A. Baker. Multiplication Tables for Trigonometric Operators. The American Mathematical Monthly, Vol. 64, No. 7 (Aug. – Sep., 1957), pp. 502–503.

[2] These geometric proofs only prove identities for real-valued inputs and outputs and only over limited ranges, and yet they can be bootstrapped to prove much more. If two holomorphic functions are equal on a set of points with a limit point, such as an interval of the real line, then they are equal over their entire common domain. So the geometrically proven identities extend to the complex plane.

How much certainty is worthwhile?

A couple weeks ago I wrote a post on a composition table, analogous to a multiplication table, for trig functions and inverse trig functions.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sin^{-1} & \cos^{-1} & \tan^{-1} \\ \hline \sin & x & \sqrt{1-x^{2}} & \dfrac{x}{\sqrt{1+x^2}} \\ \hline \cos & \sqrt{1-x^{2}} & x & \dfrac{1}{\sqrt{1 + x^2}} \\ \hline \tan & \dfrac{x}{\sqrt{1-x^{2}}} & \dfrac{\sqrt{1-x^{2}}}{x} & x \\ \end{array}

Making mistakes and doing better

My initial version of the table above had some errors that have been corrected. When I wrote a followup post on the hyperbolic counterparts of these functions I was more careful. I wrote a little Python code to verify the identities at a few points.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sinh^{-1} & \cosh^{-1} & \tanh^{-1} \\ \hline \sinh & x & \sqrt{x^{2}-1} & \dfrac{x}{\sqrt{1-x^2}} \\ \hline \cosh & \sqrt{x^{2} + 1} & x & \dfrac{1}{\sqrt{1 - x^2}} \\ \hline \tanh & \dfrac{x}{\sqrt{x^{2}+1}} & \dfrac{\sqrt{x^{2}-1}}{x} & x \\ \end{array}

Checking a few points

Of course checking an identity at a few points is not a proof. On the other hand, if you know the general form of the answer is right, then checking a few points is remarkably powerful. All the expressions above are simple combinations of a handful of functions: squaring, taking square roots, adding or subtracting 1, and taking ratios. What are the chances that a couple such combinations agree at a few points but are not identical? Very small; zero if you formalize the problem correctly. More on that in the next post.

In the case of polynomials, checking a few points may be sufficient. If two polynomials in one variable agree at enough points, they agree everywhere. This can be applied even when it’s not immediately obvious that an identity involves polynomials, such as when proving theorems about binomial coefficients.

The Schwartz-Zippel lemma is a more sophisticated version of this idea that is used in zero knowledge proofs (ZKP). Statements to be proved are formulated as multivariate polynomials over finite fields. The Schwartz-Zippel lemma quantifies the probability that the polynomials could be equal at a few random points but not be equal everywhere. You can prove that a statement is correct with high probability by only checking a small number of points.
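
Here’s a toy version of the idea in Python, a sketch rather than a real ZKP: evaluate both sides at random points modulo a large prime; a false identity of degree d survives one trial with probability at most d/p.

    import random

    # Toy polynomial identity test over the integers mod p.
    p = 2**61 - 1                      # a large prime modulus

    def agree(f, g, trials=3):
        for _ in range(trials):
            x = random.randrange(p)
            if f(x) % p != g(x) % p:
                return False
        return True

    print(agree(lambda x: (x + 1)**3, lambda x: x**3 + 3*x**2 + 3*x + 1))  # True
    print(agree(lambda x: (x + 1)**3, lambda x: x**3 + 1))                 # False (w.h.p.)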

Achilles heel

The first post mentioned above included geometric proofs of the identities, but also had typos in the table. This is an important point: formally verified systems can and do contain bugs, because there is inevitably some gap between what is formally verified and what is not. I could have formally verified the identities represented in the table, say using Lean, and still introduced errors when I manually transcribed the results into LaTeX to make the diagram.

It’s naive to say “Well then don’t leave anything out. Formally verify everything.” It’s not possible to verify “everything.” And things that could in principle be verified may require too much effort to do so.

There are always parts of a system that are not formally verified, and these parts are where you need to look first for errors. If I had formally verified my identities in Lean, it would be more likely that I made a transcription error in typing LaTeX than that the Lean software had a bug that allowed a false statement to slip through.

Economics

The appropriate degree of testing or formal verification depends on the context. In the case of the two blog posts above, I didn’t do enough testing for the first but did do enough for the second: checking identities at a few random points was the right level of effort. Software that controls a pacemaker or a nuclear power plant requires a higher degree of confidence than a blog post.

Rigorously proving identities

Suppose you want to rigorously prove the identities in the tables above. You first have to specify your domains. Are the values of x real numbers or complex numbers? Extending to the complex numbers doesn’t make things harder; it might make them easier by making some problems more explicit.

The circular and hyperbolic functions are easy to define for all complex numbers, but the inverse functions, including the square root function, require more care. It’s more work than you might expect, but you can find an outline of a full development here. Once you have all the functions carefully defined, the identities can be verified by hand or by a CAS such as Mathematica. Or even better, by both.


Differential equation with a small delay

In grad school I specialized in differential equations, but I never worked with delay-differential equations, equations in which the derivative of a solution depends not only on its current state but also on its state at a previous time. The first time I worked with a delay-differential equation came a couple decades later, when I did some modeling work for a pharmaceutical company.

Large delays can change the qualitative behavior of a differential equation, but it seems plausible that sufficiently small delays should not. This is correct, and we will show just how small “sufficiently small” is in a simple special case. We’ll look at the equation

x′(t) = a x(t) + b x(t − τ)

where the coefficients a and b are non-zero real constants and the delay τ is a positive constant. Then [1] proves that the equation above has the same qualitative behavior as the same equation with the delay removed, i.e. with τ = 0, provided τ is small enough. Here “small enough” means

−1/e < bτ exp(−aτ) < e

and

aτ < 1.

There is a further hypothesis for the theorem cited above: a technical condition on the initial data that fails only on a nowhere dense set. The solution to a first order delay-differential equation like the one we’re looking at here is not determined by an initial condition x(0) = x₀ alone. We have to specify the solution over the whole interval [−τ, 0]. This can be nearly any function of t; the technical condition rules out only a nowhere dense set of initial functions. See [1] for details.

Example

Let’s look at a specific example,

x′(t) = −3 x(t) + 2 x(t − τ)

with the initial condition x(1) = 1. If there were no delay term τ, the solution would be x(t) = exp(1 − t). In this case the solution monotonically decays to zero.

The theorem above says we should expect the same behavior as long as

−1/e < 2τ exp(3τ) < e

which holds as long as τ < 0.404218.
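
The threshold is where 2τ exp(3τ) = e. Here’s a quick check, assuming SciPy is available:

    from math import e, exp
    from scipy.optimize import brentq

    # Solve 2 tau exp(3 tau) = e; the bracket [0, 1] contains the root.
    print(brentq(lambda tau: 2*tau*exp(3*tau) - e, 0, 1))  # approx 0.404218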

Let’s solve our equation for the case τ = 0.4 using Mathematica.

tau = 0.4
solution = NDSolveValue[
    {x'[t] == -3 x[t] + 2 x[t - tau], x[t /; t <= 1] == t }, 
    x, {t, 0, 10}]
Plot[solution[t], {t, 0, 10}, PlotRange -> All]

This produces the following plot.

The solution initially ramps up to 1, because that’s what we specified, but it seems that eventually the solution monotonically decays to 0, just as when τ = 0.
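
If you don’t have Mathematica at hand, here’s a rough Python analog: a fixed-step Euler integrator that reads the delayed value off the stored history by interpolation. A sketch for exploration, not production code.

    import numpy as np

    def solve_dde(tau, t_max=10.0, h=1e-3):
        # x'(t) = -3 x(t) + 2 x(t - tau) with history x(t) = t for t <= 1.
        t = np.arange(0.0, t_max + h, h)
        x = t.copy()                     # seeds the history; later values get overwritten
        start = np.searchsorted(t, 1.0)  # begin integrating at t = 1
        for i in range(start, len(t) - 1):
            delayed = np.interp(t[i] - tau, t, x)   # x(t - tau) from stored values
            x[i + 1] = x[i] + h*(-3*x[i] + 2*delayed)
        return t, x

    t, x = solve_dde(0.4)
    print(x[-1])   # near zero: the solution decays when tau < 0.404218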

When we change the delay to τ = 3 and rerun the code we get oscillations.

[1] R. D. Driver, D. W. Sasser, M. L. Slater. The Equation x′(t) = ax(t) + bx(t − τ) with “Small” Delay. The American Mathematical Monthly, Vol. 80, No. 9 (Nov., 1973), pp. 990–995.