National Provider Identifier (NPI) and its checksum

Healthcare providers in the United States are required to have an ID number known as the NPI (National Provider Identifier). This is a 10-digit unique identifier which serves as the primary key in a publicly available database. You can use the NPI number to look up a provider’s name, credentials, practice location, etc. The use of NPI numbers was required by HIPAA.

The specification for the NPI number format says that the first digit must be either 1 or 2. Currently every NPI in the database starts with 1. There are about 8.4 million NPIs currently, so it’ll be a while before they’ll need to roll the first digit over to 2.

The last digit of the NPI is a check digit. It is computed with the Luhn algorithm, the same checksum algorithm used for credit cards and other kinds of identifiers. The Luhn algorithm was developed in 1954 and was designed to be easy to implement by hand. It’s kind of a quirky algorithm, but it will catch all single-digit errors and nearly all transposition errors.

The Luhn algorithm is not applied to the NPI itself; instead, the prefix 80840 is first prepended to the first nine digits of the NPI.

For example, let’s look at 1993999998. This is not (currently) anyone’s NPI, but it has a valid NPI format because the Luhn checksum of 80840199399999 is 8. We will verify this with the code below.

Python code for Luhn checksum

The following code computes the Luhn checksum.

    def checksum(payload):
        # Work through the digits from right to left.
        digits = [int(c) for c in reversed(str(payload))]
        s = 0
        for i, d in enumerate(digits):
            if i % 2 == 0:
                # Double every other digit, starting with the rightmost,
                # and subtract 9 if the result has two digits.
                t = 2*d
                if t > 9:
                    t -= 9
                s += t
            else:
                s += d
        # The check digit is the amount needed to bring s up to a multiple of 10.
        return (s*9) % 10

And the following checks whether the last digit of a number is the checksum of the previous digits.

    def verify(fullnumber):
        payload = fullnumber // 10
        return checksum(payload) == fullnumber % 10

And finally, the following validates an NPI number.

    def verify_npi(npi):
        return verify(int("80840" + str(npi)))

Here we apply the code above to the hypothetical NPI number mentioned above.

    assert(checksum(80840199399999) == 8)
    assert(verify(808401993999998))
    assert(verify_npi(1993999998))
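
As a quick illustration of the claim that the Luhn algorithm catches transposition errors (this extra check is mine, not from the original post), swapping the last two digits of the example NPI changes the required check digit, so verification fails:

    assert(not verify_npi(1993999989))  # 1993999998 with its last two digits swapped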


Getting some (algorithmic) SAT-isfaction

How can you possibly solve a mission-critical problem with millions of variables—when the worst-case computational complexity of every known algorithm for that problem is exponential in the number of variables?

SAT (Satisfiability) solvers have seen dramatic orders-of-magnitude performance gains for many problems through algorithmic improvements over the last couple of decades or so. The SAT problem—finding an assignment of Boolean variables that makes a given Boolean expression true—represents the archetypal NP-complete problem and in the general case is intractable.
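
To make the problem statement concrete, here is a minimal sketch (not part of the original post) that asks the Z3 solver, the same tool used in the toy example at the end of this post, for a satisfying assignment of a small Boolean formula:

import z3

# A small Boolean formula: (x or y or z) and (not x or y) and (not z).
x, y, z = z3.Bools('x y z')
solver = z3.Solver()
solver.add(z3.Or(x, y, z), z3.Or(z3.Not(x), y), z3.Not(z))

if solver.check() == z3.sat:
    print(solver.model())   # e.g. [y = True, x = False, z = False]
else:
    print('unsatisfiable')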

However, for many practical problems, solutions can be found very efficiently by use of modern methods. This “killer app” of computer science, as described by Donald Knuth, has applications to many areas, including software verification, electronic design automation, artificial intelligence, bioinformatics, and planning and scheduling.

Its uses are surprising and diverse, from running billion dollar auctions to solving graph coloring problems to computing solutions to Sudoku puzzles. As an example, I’ve included a toy code below that uses SMT, a relative of SAT, to find the English language suffix rule for regular past tense verbs (“-ed”) from data.

When used as a machine learning method, SAT solvers are quite different from other methods such as neural networks. SAT solvers can for some problems have long or unpredictable runtimes (though MAXSAT can sometimes relax this restriction), whereas neural networks have essentially fixed inference cost (though looping agent-based models do not).

On the other hand, answers from SAT solvers are always guaranteed correct, and the process is interpretable; this is currently not so for neural network-based large language models.

To understand better how to think about this difference in method capabilities, we can take a lesson from the computational science community. There, it is common to have a well-stocked computational toolbox of both slow, accurate methods and fast, approximate methods.

In computational chemistry, ab initio methods can give highly accurate results by solving Schrödinger’s equation directly, but only scale to limited numbers of atoms. Molecular dynamics (MD), however, relies more on approximations, but scales efficiently to many more atoms. Both are useful in different contexts. In fact, the two methodologies can cross-pollinate, for example when ab initio calculations are used to devise force fields for MD simulations.

The lesson to take from this is that it is paramount to find the best tool for the given problem, using any and all means at one’s disposal.

The following are some of my favorite general references on SAT solvers:

It would seem that unless P = NP, commonly suspected to be false, the solution of these kinds of problems for any possible input is hopelessly beyond reach of even the world’s fastest computers. Thankfully, many of the problems we care about have an internal structure that makes them much more solvable (and likewise for neural networks). Continued improvement of SAT/SMT methods, in theory and implementation, will greatly benefit the effective solution of these problems.

A toy example: find the English past tense suffix rule using Z3

import csv
import z3

def char2int(c): return ord(c) - ord('a')

def int2char(i): return chr(i + ord('a'))

# Access the language data from the file.
with open('eng_cols.txt', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    table = [row for row in reader]

nrow, ncol = len(table), len(table[0])

# Identify which columns of input table have stem and targeted word form.
stem_col, form_col = 0, 1

# Calculate word lengths.
nstem = [len(table[i][stem_col]) for i in range(nrow)]
nform = [len(table[i][form_col]) for i in range(nrow)]

# Length of suffix being sought.
ns = 2

# Initialize optimizer.
solver = z3.Optimize()

# Define variables to identify the characters of suffix; add constraints.
var_suf = [z3.Int(f'var_suf_{i}') for i in range(ns)]

for i in range(ns):
    solver.add(z3.And(var_suf[i] >= 0, var_suf[i] < 26))

# Define variables to indicate whether the given word matches the rule.
var_m = [z3.Bool(f'var_m_{i}') for i in range(nrow)]

# Loop over words.
for i in range(nrow):

    # Constraint on number of characters.
    constraints = [nform[i] == nstem[i] + ns]

    # Constraint that the form contains the stem.
    for j in range(nstem[i]):
        constraints.append(
            table[i][stem_col][j] == table[i][form_col][j]
                if j < nform[i] else False)

    # Constraint that the end of the word form matches the suffix. 
    for j in range(ns):
        constraints.append(
            char2int(table[i][form_col][nform[i]-1-j]) == var_suf[j]
                if j < nform[i] else False)

    # var_m[i] is the "and" of all these constraints.
    solver.add(var_m[i] == z3.And(constraints))

# Seek suffix that maximizes number of matches.
count = z3.Sum([z3.If(var_m[i], 1, 0) for i in range(nrow)])
solver.maximize(count)

# Run solver, output results.
if solver.check() == z3.sat:
    model = solver.model()
    suf = [model[var_suf[i]] for i in range(ns)]
    print('Suffix identified: ' +
          ''.join(list([int2char(suf[i].as_long())
                        for i in range(ns)]))[::-1])
    print('Number of matches: ' + str(model.evaluate(count)) +
          ' out of ' + str(nrow) + '.')

    var_m_values = [model[var_m[i]] for i in range(nrow)]

    print('Matches:')
    for i in range(nrow):
        if var_m_values[i]:
            print(table[i][stem_col], table[i][form_col])

Calculating trig functions from tables

It takes some skill to use tables of mathematical functions; it’s not quite as simple as it may seem. Although it’s no longer necessary to use tables, it’s interesting to look into the details of how it is done.

For example, the Handbook of Mathematical Functions edited by Abramowitz and Stegun tabulates sines and cosines in increments of one tenth of a degree, from 0 degrees to 45 degrees. What if your angle was outside the range 0° to 45° or if you needed to specify your angle more precisely than 1/10 of a degree? What if you wanted, for example, to calculate cos 203.147°?

The high-level answer is that you would use range reduction and interpolation. You’d first use range reduction to reduce the problem of working with any angle to the problem of working with an angle between 0° and 45°, then you’d use interpolation to get the necessary accuracy for a value within this range.

OK, but how exactly do you do the range reduction and how exactly do you do the interpolation? This isn’t deep, but it’s not trivial either.

Range reduction

Since sine and cosine have a period of 360°, you can add or subtract some multiple of 360° to obtain an angle between −180° and 180°.

Next, you can use parity to reduce the range further. That is, since sin(−x) = −sin(x) and cos(−x) = cos(x), you can reduce the problem to computing the sine or cosine of an angle between 0° and 180°.

The identities sin(180° − x) = sin(x) and cos(180° − x) = −cos(x) let you reduce the range further to between 0° and 90°.

Finally, the identities cos(x) = sin(90° − x) and sin(x) = cos(90° − x) can reduce the range to 0° to 45°.
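
Here is a small Python sketch (not from the original post; the function name and return convention are my own) that carries out these reduction steps for the cosine, applied to the cos 203.147° example above:

from math import fmod

def reduce_cosine(degrees):
    """Express cos(degrees) as sign * (sin or cos) of an angle in [0°, 45°]."""
    # Periodicity: bring the angle into (-180°, 180°].
    x = fmod(degrees, 360)
    if x > 180:
        x -= 360
    elif x <= -180:
        x += 360
    # Parity: cos(-x) = cos(x).
    x = abs(x)
    # Supplement: cos(180° - x) = -cos(x) reduces the range to [0°, 90°].
    sign = 1
    if x > 90:
        x, sign = 180 - x, -1
    # Cofunction: cos(x) = sin(90° - x) reduces the range to [0°, 45°].
    if x > 45:
        return sign, 'sin', 90 - x
    return sign, 'cos', x

print(reduce_cosine(203.147))  # roughly (-1, 'cos', 23.147): cos 203.147° = -cos 23.147°

The table lookup and interpolation then only have to handle the reduced angle.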

Interpolation

You can fill in between the tabulated angles using interpolation, but how accurate will your result be? How many interpolation points will you need to use in order to get single precision, e.g. an error on the order of 10⁻⁷?

The tables tell you. As explained in this post on using a table of logarithms, there is a notation at the bottom of the table that tells you how many Lagrange interpolation points to use and what kind of accuracy you’ll get. Five interpolation points will give you roughly single precision accuracy, and the notation gives you a slightly sharper error bound. The post on using log tables also explains how Lagrange interpolation works.

Beyond trig functions

I intend to write more posts on using tables. The general pattern is always range reduction and interpolation, but it takes more advanced math to reduce the range of more advanced functions.

Update: The next post shows how to use tables to compute the gamma function for complex arguments.

Using a table of logarithms

My favorite quote from Richard Feynman is his remark that “nearly everything is really interesting if you go into it deeply enough.” This post will look at something that seems utterly trivial—looking up numbers in a table—and show that there’s much more to it when you dig a little deeper.

More than just looking up numbers

Before calculators were common, function values would be looked up in a table. For example, here is a piece of a table of logarithms from Abramowitz and Stegun, affectionately known as A&S.

But you wouldn’t just “look up” logarithm values. If you needed to know the value of a logarithm at a point where it is explicitly tabulated, then yes, you’d simply look it up. If you wanted to know the log of 1.754, then there it is in the table. But what if, for example, you wanted to know the log of 1.7543?

Notice that function values are given to 15 significant figures but input values are only given to four significant figures. If you wanted 15 sig figs in your output, presumably you’d want to specify your input to 15 sig figs as well. Or maybe you only needed 10 figures of precision, in which case you could ignore the rightmost column of decimal places in the table, but you still can’t directly specify input values to 10 figures.

Lagrange interpolation

If you go to the bottom of the column of A&S in the image above, you see this:

What’s the meaning of the mysterious square bracket expression? It’s telling you that for the input values in the range of this column, i.e. between 1.750 and 1.800, the error using linear interpolation will be less than 4 × 10⁻⁸, and that if you want full precision, i.e. 15 sig figs, then you’ll need to use Lagrange interpolation with 5 points.

So going back to the example of wanting to know the value of log(1.7543), we could calculate it using

0.7 × log(1.754) + 0.3 × log(1.755)

and expect the error to be less than 4 × 10⁻⁸.

We can confirm this with a little Python code.

>>> from math import log
>>> exact = log(1.7543)
>>> approx = 0.7*log(1.754) + 0.3*log(1.755)
>>> exact - approx
3.411265947494968e-08

Python uses double precision arithmetic, which is accurate to between 15 and 16 figures—more on that here—and so the function calls above are essentially the same as the tabulated values.

Now suppose we want the value of x = 1.75430123456789. The hint in square brackets says we should use Lagrange interpolation at five points, centered at the nearest tabulated value to x. That is, we’ll use the values of log at 1.752, 1.753, 1.754, 1.755, and 1.756 to compute the value of log(x).

Here’s the Lagrange interpolation formula, given in A&S as equation 25.2.15.
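
In the notation of the code below, with f_k denoting the tabulated value f(x_0 + kh) and p = (x − x_0)/h, the five-point formula reads

f(x_0 + ph) \approx \frac{(p^2-1)p(p-2)}{24}\,f_{-2} - \frac{(p-1)p(p^2-4)}{6}\,f_{-1} + \frac{(p^2-1)(p^2-4)}{4}\,f_0 - \frac{(p+1)p(p^2-4)}{6}\,f_1 + \frac{(p^2-1)p(p+2)}{24}\,f_2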

We illustrate this with the following Python code.

import numpy as np

# Five-point Lagrange interpolation: fs = [f(x0-2h), ..., f(x0+2h)], p = (x - x0)/h.
def interpolate(fs, p, h):
    s = (p**2 - 1)*p*(p-2)*fs[0]/24
    s -= (p - 1)*p*(p**2 - 4)*fs[1]/6
    s += (p**2 - 1)*(p**2 - 4)*fs[2]/4
    s -= (p + 1)*p*(p**2 - 4)*fs[3]/6
    s += (p**2 - 1)*p*(p + 2)*fs[4]/24
    return s

xs = np.linspace(1.752, 1.756, 5)
fs = np.log(xs)
h = 0.001
x = 1.75430123456789
p = (x - 1.754)/h

print(interpolate(fs, p, h))
print(np.log(x))

This prints

0.5620706206909348
0.5620706206909349

confirming that the interpolated value is indeed accurate to 15 figures.

Lagrange interpolation takes a lot of work to carry out by hand, and so sometimes you might use other techniques, such as transforming your calculation into one for which a Taylor series approximation converges quickly. In any case, sophisticated use of numerical tables was not simply a matter of looking things up.

Contemporary applications

A book of numerical tables enables you to do calculations without a computer. More than that, understanding how to do calculations without a computer helps you program calculations with a computer. Computers have to evaluate functions somehow, and one way is interpolating tabulated values.

For example, you could think of a digital image as a numerical table, the values of some ideal analog image sampled at discrete points. The screenshots above are interpolated: the HTML specifies a width less than that of the original screenshots. You’re not seeing the original image; you’re seeing a new image that your computer has created for you using interpolation.

Interpolation is a kind of compression. A&S would be 100 billion times larger if it tabulated functions at 15-figure inputs. Instead, it tabulates functions at 4-figure inputs and gives you a recipe (Lagrange interpolation) for evaluating the functions at 15-figure inputs if you desire. This is a very common pattern. An SVG image, for example, does not tell you pixel values, but gives you equations for calculating pixel values at whatever scale is needed.


Duct tape value creation

Excerpt from John Carmack’s review of the book Bullshit Jobs.

He talks about how software developers bemoan duct taping systems together, and would rather work on core technologies. He thinks it is some tragic failure, that if only wise system design was employed, you wouldn’t be doing all the duct taping.

Wrong.

Every expansion in capabilities opens up the opportunity to duct tape it to new areas, and this is where a lot of value creation happens. Eventually, when a sufficient amount of duct tape is found in an area, it is an opportunity for systemic redesigns, but you don’t wait for that before grabbing newly visible low hanging fruit!

The realistic alternative to duct tape and other aesthetically disappointing code is often no code.


One-liner to troubleshoot LaTeX references

In LaTeX, sections are labeled with commands like \label{foo} and referenced like \ref{foo}. Referring to sections by labels rather than hard-coded numbers allows references to automatically update when sections are inserted, deleted, or rearranged.

For every reference there ought to be a label. A label without a corresponding reference is fine, though it might be a mistake. If you have a reference with no corresponding label, and one label without a reference, there’s a good chance the reference is a typo variation on the unreferenced label.

We’ll build up a one-liner for comparing labels and references. We’ll use grep to find patterns that look like labels by searching for label{ followed by any string of characters up to but not including a closing brace. We don’t want the label{ part, just what follows it, so we’ll use look-behind syntax to exclude it from the match.

Here’s our regular expression:

    (?<=label{)[^}]+

We’re using Perl-style look-behind syntax, so we’ll need to give grep the -P option. Also, we only want the match itself, not the matching lines, so we’ll also use the -o option. This will print all the labels:

    grep -oP '(?<=label{)[^}]+' foo.tex

The regex for finding references is the same with label replaced with ref.

To compare the list of labels and the list of references, we’ll use the comm command. For more on comm, see Set theory at the command line.

We could save the labels to a file, save the references to a file, and run comm on the two files. But we’re more interested in the differences between the two lists than in the lists themselves, so we can pass both as streams to comm using the <(...) syntax. Finally, comm assumes its inputs are sorted, so we pipe the output of both grep commands to sort.

Here’s our one-liner:

    comm <(grep -oP '(?<=label{)[^}]+' foo.tex | sort) \
         <(grep -oP '(?<=ref{)[^}]+' foo.tex | sort)

This will produce three sections of output: labels which are not references, references which are not labels, and labels that are also references.

If you just want to see references that don’t refer to a label, give comm the option -13. This suppresses the first and third sections of output, leaving only the second section, references that are not labels.

You can also add a -u option (u for unique) to the calls to sort to suppress multiple instances of the same label or same reference.

Choosing a Computer Language for a Project

Julia. Scala. Lua. TypeScript. Haskell. Go. Dart. Various computer languages new and old are sometimes proposed as better alternatives to mainstream languages. But compared to mainstream choices like Python, C, C++ and Java (cf. Tiobe Index)—are they worth using?

Certainly it depends a lot on the planned use: is it a one-off small project, or a large industrial-scale software application?

Yet even a one-off project can quickly grow to production-scale, with accompanying growing pains. Startups sometimes face a growth crisis when the nascent code base becomes unwieldy and must be refactored or fully rewritten (or you could do what Facebook/Meta did and just write a new compiler to make your existing code base run better).

The scope of different types of software projects and their requirements is so incredibly diverse that any single viewpoint from experience runs a risk of being myopic and thus inaccurate for other kinds of projects. With this caveat, I’ll share some of my own experience from observing projects for many dozens of production-scale software applications written for leadership-scale high performance computing systems. These are generally on a scale of 20,000 to 500,000 lines of code and often require support of mathematical and scientific libraries and middleware for build support, parallelism, visualization, I/O, data management and machine learning.

Here are some of the main issues regarding choice of programming languages and compilers for these codes:

1. Language and compiler sustainability. While the lifetime of computing systems is measured in years, the lifetime of an application code base can sometimes be measured in decades. Is the language it is written in likely to survive and be well-supported long into the future? For example, Fortran, though still used and frequently supported, is a less common language, thus requiring special effort from vendors, with fewer developer resources than more popular languages. Is there a diversity of multiple compilers from different providers to mitigate risk? A single provider means a single point of failure, a high risk; what happens if the supplier loses funding? Are the language and compilers likely to be adaptable to future computer hardware trends (though sometimes this is hard to predict)? Is there a large customer base to help ensure future support? Similarly, is there an adequate pool of available programmers deeply skilled in the language? Does the language have a well-featured standard library ecosystem and good support for third-party libraries and frameworks? Is there good tool support (debuggers, profilers, build tools)?

2. Related to this is the question of language governance. How are decisions about future updates to the language made? Is there broad participation from the user community and responsiveness to their needs? I’ve known members of the C++ language committee; from my limited experience they seem very reasonable and thoughtful about future directions of the language. On the other hand, some standards have introduced features that scarcely anyone ever uses—a waste of time and more clutter for the standard.

3. Productivity. It is said that programmer productivity is limited by the ability of a few lines of code to express high level abstractions that can do a lot with minimal syntax. Does the language permit this? Does the language standard make sense (coherent, cohesive) and follow the principle of least surprise? At the same time, the language should not engulf what might better be handled modularly by a library. For example, a matrix-matrix product that is bound up with the language might be highly productive for simple cases but have difficulty supporting the many variants of matrix-matrix product provided for example by the NVIDIA CUTLASS library. Also, in-language support for CUDA GPU operations, for example, would make it hard for the language not to lag behind in support of the frequent new releases of CUDA.

4. Strategic advantage. The 10X improvement rule states that an innovation is only worth adopting if it offers a 10X improvement compared to existing practice according to some metric. If switching to a given new language doesn’t bring significant improvement, it may not be worth doing. This is particularly true if there is an existing code base of some size. A related question is whether the new language offers an incremental transition path for an existing code to the new language (in many cases this is difficult or impossible).

5. Performance (execution speed). Does the language allow one to get down to bare-metal performance rather than going through costly abstractions or an interpreter layer? Are the features of the underlying hardware exposed for the user to access? Is performance predictable? Can one get a sense of the performance of each line of code just by inspection, or is this occluded by abstractions or a complex compilation process? Is the use of just-in-time compilation or garbage collection unpredictable, which could be a problem for parallel computing wherein unexpected “hangs” can be caused by one process unexpectedly performing one of these operations? Do the compiler developers provide good support for effective and accurate code optimization options? Have results from standardized non-cherry-picked benchmarks been published (kernel benchmarks, proxy apps, full applications)?

Early adopters provide a vibrant “early alert” system for new language and tool developments that are useful for small projects and may be broadly impactful. Python was recognized early in the scientific computing community for its potential complementary use with standard languages for large scientific computations. When it comes to planning large-scale software projects, however, a range of factors based on project requirements must be considered to ensure highest likelihood of success.


More is less

When I first started using Unix, I used a program called “more” to read files. The name makes sense because each time you press the space bar, more will show you more of your file, one screen at a time.

Now everyone uses less, and more is all but forgotten.

Daniel Halbert wrote more in 1978. Mark Nudelman wrote a similar program with more functionality in 1984, which he named less. The name was a pun on the aphorism “less is more” [1]. Soon less completely replaced more.

I’m curious why I ever used more, since less had taken over before I touched Unix. One possibility is that someone who was accustomed to more showed me that command. Another possibility is that I learned it from reading The Unix Programming Environment which came out in November 1983. It includes more but not less.

My laptop contains executables for more and less in /usr/bin. The command

    diff less more

returns nothing, indicating that the binaries are identical: less literally is more.

My desktop has distinct binaries for less and more. The more binary is much smaller, and so I assume it is limited to the original functionality of more, more or less.


[1] I don’t know who coined the phrase “less is more,” but it is associated with architect Ludwig Mies van der Rohe (1886–1969) who often quoted it. He did not apply the principle to his own name, however. He was born Ludwig Mies and later appended van der Rohe.

Distance from a point to a line

Eric Lengyel’s new book Projective Geometric Algebra Illuminated arrived yesterday and I’m enjoying reading it. Imagine if someone started with ideas like dot products, cross products, and determinants that you might see in your first year of college, then thought deeply about those things for years. That’s kinda what the book is.

Early in the book is the example of finding the distance from a point q to a line of the form p + tv.

If you define u = q − p then a straightforward derivation shows that the distance d from q to the line is given by

d = \sqrt{{\bold u}^2 - \frac{({\bold u}\cdot {\bold v})^2}{{\bold v}^2}}

But as the author explains, it is better to calculate d by

d = \sqrt{\frac{({\bold u}\times {\bold v})^2}{{\bold v}^2}}

Why is that? The two expressions are algebraically equal, but the latter is better suited for numerical calculation.

The cardinal rule of numerical calculation is to avoid subtracting nearly equal floating point numbers. If two numbers agree to b bits, you may lose up to b bits of significance when computing their difference.

If u and v are vectors with large magnitude, but q is close to the line, then the first equation subtracts two large, nearly equal numbers under the square root.

The second equation involves subtraction too, but it’s less obvious. This is a common theme in numerical computing. Imagine this dialog.

[Student produces first equation.]

Mentor: Avoid subtracting nearly equal numbers.

[Student produces second equation.]

Student: OK, did it.

Mentor: That’s much better, though it could still have problems.

Where is there a subtraction in the second equation? We started with a subtraction in defining u. More subtly, the definition of cross product involves subtractions. But these subtractions involve smaller numbers than the first formula, because the first formula subtracts squared values. Eric Lengyel points this out in his book.
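
Here is a small numerical illustration (my own made-up example, not from the book): the point q is about 1.4 × 10⁻³ away from the line, but p and v have large magnitude, so the first formula loses everything to cancellation while the second does not.

import numpy as np

# Hypothetical data: a line through a far-away point with a large direction vector.
p = np.array([1e8, 1e8, 1e8])
v = np.array([1e8, 1e8, 1e8])
q = p + 0.25*v + np.array([0.0, 1e-3, -1e-3])   # true distance is sqrt(2)*1e-3

u = q - p

# First formula: subtracts two large, nearly equal numbers under the square root.
d1 = np.sqrt(u @ u - (u @ v)**2 / (v @ v))

# Second formula: the subtractions are hidden inside the cross product.
w = np.cross(u, v)
d2 = np.sqrt(w @ w / (v @ v))

print(d1, d2)   # d1 is garbage (it may even be nan); d2 is accurate to several figures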

None of this may matter in practice, until it does matter, which is a common pattern in numerical computing. You implement something like the first formula, something that can be derived directly. You implicitly have in mind vectors whose magnitude is comparable to d and this guides your choice of unit tests, which all pass.

Some time goes by and a colleague tells you your code is failing. Impossible! You checked your derivation by hand and in Mathematica. Your unit tests all pass. Must be your colleague’s fault. But it’s not. Your code would be correct in infinite precision, but in an actual computer it fails on inputs that violate your implicit assumptions.

This can be frustrating, but it can also be fun. Implementing equations from a freshman textbook accurately, efficiently, and robustly is not a freshman-level exercise.


Experiences with Thread Programming in Microsoft Windows

Lately I’ve been helping a colleague to add worker threads to his GUI-based Windows application.

Thread programming can be tricky. Here are a few things I’ve learned along the way.

Performance. This app does compute-intensive work. Offloading that work to a worker thread frees the main thread to service GUI requests more responsively.

Thread libraries. Windows has multiple thread libraries, for example Microsoft Foundation Class (MFC) library threads and C++ standard library threads. It is hazardous to use different thread libraries in the same app. In the extreme case, different threading runtimes, such as GOMP and LOMP used respectively by the GCC and LLVM compiler families, keep track of threads in different ways, and mixing them in the same code can cause hazardous silent errors.

Memory fences are a thing. Different threads can run on different processor cores and hold variables in their own respective L1 caches that are not flushed (this is done to maintain high performance). An update to a variable by one thread is not guaranteed to be visible to other threads without special measures. For example, one could safely transfer information using ::PostMessage coupled with a handler function on the receiving thread. Or one could send a signal using an MFC CEvent on one thread and read its Lock on the other. Also, a thread launch implicitly performs a memory fence, so that, at that point at least, the new thread is guaranteed to see the state of all memory locations correctly.

GUI access should be done from the master thread only, not from a worker thread. Accessing the GUI from a worker thread can result in deadlock. A worker thread can instead ::PostMessage to ask the master thread to perform a GUI action.

Thread launch. By default AfxBeginThread returns a thread handle which MFC takes care of deleting when no longer needed. If you want to manage the life cycle of the handle yourself, you can do something like:

// Create the thread suspended so its settings can be adjusted before it runs.
myWorkerThreadHandle = AfxBeginThread(myFunc, myParams,
  THREAD_PRIORITY_NORMAL, 0, CREATE_SUSPENDED);
// Keep the CWinThread object alive after the thread exits; we delete it ourselves.
myWorkerThreadHandle->m_bAutoDelete = false;
myWorkerThreadHandle->ResumeThread();

Joint use of a shared library like the DAO database library has hazards. One should beware of using the library to allocate something in one thread and deallocate it in another, as this will likely allocate in a thread-local heap or stack instead of a shared thread-safe heap, resulting in a crash.

Initialization. One should call CoInitializeEx(NULL, COINIT_APARTMENTTHREADED) and AfxDaoInit() (if using DAO) at thread initialization on both master and worker threads, and correspondingly CoUninitialize() and AfxDaoTerm() at completion of the thread.

Monitoring of thread state can be done with
WaitForSingleObject(myWorkerThreadHandle->m_hThread, 0) to determine if the thread has completed or WaitForSingleObject(myWorkerThreadHandle->m_hThread, INFINITE) for a blocking wait until completion.

Race conditions are always a risk but can be avoided by careful reasoning about execution. Someone once said up to 90% of code errors can be found by desk checking [1]. Race conditions are notoriously hard to debug, partly because they can occur nondeterministically. There are tools for trying to find race condition errors, though I’ve never tried them.

So far I find no rigorous specification of the MFC threading model online that touches on all these concerns. Hopefully this post is useful to someone else working through these issues.

References

[1] Dasso, Aristides, and Ana Funes. Verification, Validation and Testing in Software Engineering. United Kingdom: Idea Group Pub., 2007, p. 163.