One-liner to troubleshoot LaTeX references

In LaTeX, sections are labeled with commands like \label{foo} and referenced like \ref{foo}. Referring to sections by labels rather than hard-coded numbers allows references to automatically update when sections are inserted, deleted, or rearranged.

For every reference there ought to be a label. A label without a corresponding reference is fine, though it might be a mistake. If you have a reference with no corresponding label, and one label without a reference, there’s a good chance the reference is a typo variation on the unreferenced label.

We’ll build up a one-liner for comparing labels and references. We’ll use grep to find patterns that look like labels by searching for label{ followed by any string of letters up to but not including a closing brace. We don’t want the label{ part, just what follows it, so we’ll use look-behind syntax, to exclude it from the match.

Here’s our regular expression:

    (?<=label{)[^}]+

We’re using Perl-style look-behind syntax, so we’ll need to give grep the -P option. Also, we only want the match itself, not matching lines, so we’ll also using the -o option. This will print all the labels:

    grep -oP '(?<=label{)[^}]+' foo.tex

The regex for finding references is the same with label replaced with ref.

To compare the list of labels and the list of references, we’ll use the comm command. For more on comm, see Set theory at the command line.

We could save the labels to a file, save the references to a file, and run comm on the two files. But we’re more interested in the differences between the two lists than the two lists, so we could pass both as streams to comm using the <(...) syntax. Finally, comm assumes its inputs are sorted so we pipe the output of both grep commands to sort.

Here’s our one-liner

    comm -12 <(grep -oP '(?<=label{)[^}]+' foo.tex | sort) 
             <(grep -oP '(?<=ref{)[^}]+' foo.tex | sort)

This will produce three sections of output: labels which are not references, references which not labels, and labels that are also references.

If you just want to see references that don’t refer to a label, give comm the option -13. This suppresses the first and third sections of output, leaving only the second section, references that are not labels.

You can also add a -u option (u for unique) to the calls to sort to suppress multiple instances of the same label or same reference.

Earth : Jupiter :: Jupiter : Sun

The size of Jupiter is approximately the geometric mean of the sizes of Sun and Earth.

In terms of radii,

\frac{R_\Sun}{R_{\text{\Jupiter}}} \approx \frac{R_\Jupiter}{R_\Earth}

The ratio on the left equals 9.95 and the ratio on the left equals 10.98.

The subscripts are the astronomical symbols for the Sun (☉, U+2609), Jupiter (♃, U+2643), and Earth (, U+1F728). I produced them in LaTeX using the mathabx package and the commands \Sun, Jupiter, and Earth.

The mathabx symbol for Jupiter is a little unusual. It looks italicized, but that’s not because the symbol is being used in math mode. Notice that the vertical bar in the symbol for Earth is vertical, i.e. not italicized.

 

Do comments in a LaTeX file change the output?

When you add a comment to a LaTeX file, it makes no visible change to the output. The comment is ignored as far as the appearance of the file. But is that comment somehow included in the file anyway?

If you compile a LaTeX file to PDF, then edit it by throwing in a comment, and compile again, your two files will differ. As I wrote about earlier, the time that a file is created is embedded in a PDF. That time stamp is also included in two or three hashes, so the files will differ by more than just the bits in the time stamp.

But even if you compile two files at the same time (within the resolution of the time stamp, which is one second), the PDF files will still differ. Apparently some kind of hash of the source file is included in the PDF.

So suppose you have two files. The content of foo.tex is

    \documentclass{article}
    \begin{document}
    Hello world.
    \end{document}

and the content of bar.tex is

    \documentclass{article}
    \begin{document}
    Hello world. % comment
    \end{document}

then the output of running pdflatex on both files will look the same.

Suppose you compile the files at the same time so that the time stamps are the same.

    pdflatex foo.tex && pdflatex bar.tex

It’s possible that the two time stamps could be different, one file compiling a little before the tick of a new second and one compiling a little after. But if your computer is fast enough and you don’t get unlucky, the time stamps will be the same.

Then you can compare hex dumps of the two PDF files with

    diff  <(xxd foo.pdf) <(xxd bar.pdf)

This produces the following

    < ...  ./ID [<F12AF1442
    < ...  E03CC6B3AB64A5D9
    < ... 8DEE2FE> <F12AF1
    < ...  442E03CC6B3AB64A
    < ... 5D98DEE2FE>]./Le
    --
    > ...  ./ID [<4FAA0E9F1
    > ...  CC6EFCC5068F481E
    > ...  0419AD6> <4FAA0E
    > ...  9F1CC6EFCC5068F4
    > ...  81E0419AD6>]./Le

You can’t recover the comment from the binary dump, but you can tell that the files differ.

I don’t know what hash is being used. My first guess was MD5, but that’s not it. It’s a 128-bit hash, so that rules out newer hashes like SHA256. I tried searching for it but didn’t find anything. If you know what hash pdflatex uses, please let me know.

LaTeX will also let you add text at the end of the file, after the \end{document} command. This also will change the hash code but will not change the appearance of the output.

Related posts

Navigating a LaTeX file

I like generating long LaTeX documents from org-mode because, for one thing, org-mode has nice section folding. But not everyone I work with uses Emacs, so its better to work in LaTeX directly rather than have Emacs generate LaTeX.

AUCTeX has section folding for LaTeX documents, though so far I’ve only has limited success at getting it to work. However, RefTeX worked right out of the box.

If you enter reftex-mode ctrlc = then RefTeX will open a table of contents window.

RefTeX screen shot

Scrolling through the table of contents window scrolls through the body of the document. This isn’t exactly section folding, but it serves a similar purpose.

RefTeX ships with Emacs, so there’s probably no need to install it, but the mode is not enabled by default.

Lehman’s inequality, circuits, and LaTeX

Let A, B, C, and D be positive numbers. Then Lehman’s inequality says

\frac{(A+B)(C+D)}{A+B+C+D} \geq \frac{AC}{A+C} + \frac{BD}{B+D}

Proof by circuit

This inequality can be proved analytically, but Lehman’s proof is interesting because he uses electrical circuits [1].

Let A, B, C, and D be the resistances of resistors arranges as in the circuit on the left.

Resistors R1 and R2 in series have a combined resistance of

R = R_1 + R_2

and the same resistors in parallel have a combined resistance of

R = \frac{R_1 R_2}{R_1 + R_2}

This means the circuit on the left has combined resistance

\frac{(A+B)(C+D)}{A+B+C+D}

The resistance of the circuit on the right is

\frac{AC}{A+C} + \frac{BD}{B+D}

Adding a short cannot increase resistance, so the resistance of the circuit on the right must be the same or lower than the resistance of the one on the left. Therefore

\frac{(A+B)(C+D)}{A+B+C+D} \geq \frac{AC}{A+C} + \frac{BD}{B+D}

Drawing circuits in LaTeX

I drew the circuits above using the circuitikz package in LaTeX. Here’s the code to draw the circuit on the left.

    \documentclass{article}
    
    \usepackage{tikz}
    \usepackage{circuitikz}
    
    \begin{document}
    
    \begin{figure}[h!]
      \begin{center}
        \begin{circuitikz}
          \draw (0,0)
          to[R=$B$] (0,2) 
          to[R=$A$] (0,4)
          to[short] (2,4) 
          to[R=$C$] (2,2)
          to[R=$D$] (2,0)
          to[short] (0,0);
          \draw (1,4)
          to[short] (1, 5);
          \draw (1, 0)
          to[short] (1, -1);
          %\draw (0, 2)
          %to[short] (2, 2);
        \end{circuitikz}
      \end{center}
    \end{figure}
    
    \end{document}

The code to draw the second circuit removes the %’s commenting out the code that draws the short between the two parallel arms of the circuit.

[1] Alfred Lehman proves a more general result using circuits in SIAM Review, Vol. 4, No. 2 (April 1962) pp. 150–151. Fazlollah Reza gives a purely mathematical proof on pages 151 and 152 of the same journal.

Convert LaTeX to Microsoft Word

I create nearly all my documents in LaTeX, even documents that might be easier to create in Word. The reason is that even if a particular document would be easier to write in Word, my workflow is more efficient if everything is in LaTeX. LaTeX makes small, plain text files that work well with version control and searching, and I can edit them with the same editor I use for writing code and everything else I do.

Usually I send read-only documents to clients. They don’t know or care what program created the PDF I sent them. The fact that they cannot edit my reports is a feature, not a bug: if I’m going to sign off on something, I need to be sure that it doesn’t include any changes that someone else made that I’m unaware of.

But occasionally I do need to send clients a file they can edit, and this usually means Microsoft Word. Lawyers particularly want Word documents.

It’s possible to create a PDF using LaTeX and copy-and-paste the content into a Word document. This works, but you’ll have to redo all your formatting.

A better approach is to use Pandoc. The command

    pandoc foo.tex -o -s foo.docx

will convert the LaTeX file foo.tex directly to the Word document foo.docx. You may have to touch up the Word document a little, but it will retain more of the original formatting than if you when from LaTeX to Word via PDF.

You could wrap this in a script for convenience and so you don’t have to remember the pandoc syntax.

    #!/opt/local/bin/perl

    $tex = $ARGV[0];
    ($doc = $tex) =~ s/\.tex$/.docx/;
    exec "pandoc $tex -o $doc";

You could save this to tex2doc and run

    tex2doc foo.tex

to produce foo.docx.

Update: The syntax when I wrote this post did not work when I revisited this today (2023-11-30) but instead gave several warnings.  What worked today was

    pandoc foo.tex --from latex --to docx > foo.docx

Unfortunately I don’t have the version number that I used when I first wrote this post. Today I was using pandoc version 2.9.2.1.

Matching delimiters and chiastic patterns

When I first started programming I’d occasionally get an error because a delimiter wasn’t matched. Maybe I had a { without a matching }, for example. Or if I were writing Pascal, which I did back in the day, I might have had a BEGIN statement without a corresponding END statement.

At some point I saw someone type an opening delimiter, then the closing delimiter, then back up and insert the content in between. That seemed strange for about five seconds, then I realized it was brilliant and have adopted that practice ever since.

I’ve rarely seen a mismatched delimiter since then, but I ran into one this morning. In order to explain what happened, I’ll need to talk about LaTeX syntax.

LaTeX delimiters

LaTeX has two modes for math symbols: inline and display. Inline symbols appear in the middle of a sentence, while displayed symbols are set apart on their own line and centered. This is analogous to inline elements and block elements in HTML.

When I learned LaTeX, everyone used one dollar sign for inline mode and two dollar signs for display mode. So, for example, you would write $x$ for an x in the context of prose, and $$x = 5$$ to display an equation stating that x equals 5.

Now the preferred syntax is to use \[ to begin display mode and \] to end it. This has several advantages. In particular it makes it easier to debug mismatched delimiter errors since you can count the number of open and closed delimiters; you couldn’t tell without context whether $$ begins or ends a displayed equation.

It’s still common to use dollar signs for inline math mode, but there is an alternate notation that uses \( to begin and \) to end. This notation is not commonly used as far as I know. Escaped parentheses have all the theoretical advantages of escaped brackets, but in practice the scope of inline math content is very small and so it’s not a problem that the opening and closing delimiters are the same.

Org-mode

The only time I use \( and \) is when I’m not directly writing LaTeX but writing an org-mode file that generates LaTeX. I chose to write a client’s repoort in org-mode rather than directly in LaTeX because the document is very long, and so org-mode’s outlining is convenient. I can hide or expand parts of the tree by pressing a tab, for example. Also, the document has a few tables, and tables are easier to write in org-mode’s markdown than in LaTeX.

Unfortunately org-mode sometimes understands expressions like $x$ and sometimes it doesn’t. It would be safer to always write \(x\) but the former works often enough that I tend to use it out of habit. The root of my problem this morning was I had used dollar sign delimiters in a way that confused org-mode.

All else being equal, it’s best when opening and closing delimiters are different. I might advise someone learning LaTeX today to use the \(x\) notation, even though I continue to use $x$.

Chiastic patterns

When I was trying to debug my unmatched delimiter problem, I searched for all \begin and \end statements in my (org-mode generated) LaTeX file by searching on the regular expression \\begin\|\\end. Here’s an example of what I got back (with indentation added).

    \begin{quote}
    \end{quote}
    \begin{table}[htbp]
        \begin{tabular}{ll}
        \end{tabular}
    \end{table}
    \begin{algorithm}
        \begin{algorithmic}[1]
            \begin{enumerate}
            \end{enumerate}
        \end{algorithmic}
    \end{algorithm}

Sometimes opening and closing delimiters are consecutive, such as the beginning and end of the quote. But sometimes you’ll find a chiasmus pattern, named for the Greek letter χ. Notice the table above has an ABBA pattern and the algorithm has an ABCCBA pattern.

This kind of pattern is extremely common in the Bible, though it’s easy for modern readers to miss it. One reason this pattern matters is that the context of a sentence is not only the sentences above and below, but also the parallel sentence in the chiastic pattern. Furthermore, the most important point in a chiasmus is often at the middle.

While chiastic patterns were common in ancient eastern prose, they are far less common in modern western prose. However, chiastic patters are common in programming, so common that no one ever comments on them.

You can see the chiastic pattern in the indentation of source code. The context of a delimiter is its matching delimiter, which may many lines away. And the most important line of code for optimization is the code in the innermost loop, at the focus of a chiasmus.

I became aware of chiastic patterns in the Bible by reading Paul Through Mediterranean Eyes. This book has many outline illustrations that look a lot like source code such as nested for-loops.

Quartal melody: Star Trek fanfare

Intervals of a fourth, such as the interval from C to F, are common in western music, but consecutive intervals of this size are not. Quartal harmony is based on intervals of fourths, and quartal melodies use a lot of fourths, particularly consecutive fourths.

Maybe the most famous quartal melody is the opening fanfare to Star Trek (original series). Here’s a transcription of the opening line:

And here is the same music with the intervals of a fourth circled.

The theme opens with two consecutive fourths, there’s an augmented fourth in the middle, then two more consecutive fourths. There are two major thirds in the phrase above, which you could call diminished fourths.

Incidentally, there are four bell tones before the melody above begins, and the interval between the first two tones is a fourth.

Making the sheet music

Here’s the Lilypond source code I used to create the images above.

    \begin{lilypond}

    \score {
        \relative e'{
        \time 4/4
        \partial 2 a4. d8 |
        \tuplet 3/2 {g4~ 4 ges4} \tuplet 3/2 {d4 b4 e4} |
        a2 ~ 4 ~ 8 8 |
        des1
    }
    \end{lilypond}

This uses a few Lilypond features I hadn’t used before.

  • The \partial command for the two pickup notes.
  • The \tuple command for the triples.
  • The shortcut of not repeating the names repeated notes.

The last point applies twice, writing g4 4 rather than g4 g4 and writing a2 4 8 8 rather than a2 a4 a8 a8.

Related posts

Morse code in musical notation

Maybe this has been done before, but I haven’t seen it: Morse code in musical notation.

Here’s the Morse code alphabet, one letter per measure; in practice there would be less space between letters [1]. A dash is supposed to be three times as long as a dot, so a dot is a sixteenth note and a dash is a dotted eighth note.

Morse code is often at a frequency between 600 and 800 Hz. I picked the E above middle C (660 Hz) because it’s in that range.

Rhythm

Officially a dash is three times as long as a dot. But there’s also a space equal to the length of a dot between parts of a letter. So the sheet music above would be more accurate if you imagined all the sixteenth notes are staccato and the dotted eighth notes are really eighth notes followed by a sixteenth rest.

This doesn’t make much difference because individual operators have varying “fists,” styles of sending Morse code, and won’t exactly follow the official length and spacing rules.

You could rewrite the music above as follows, but it’s all an approximation.

Tempo

According to Wikipedia, “the dit length at 20 words per minute is 50 milliseconds.” So if a sixteenth note has a duration of 50 milliseconds, this would mean five quarter notes per second, or 300 beats per minute. But according to this video, the shortest duration people can distinguish is about 50 milliseconds.

That would imply that copying Morse code at 20 wpm is pushing the limits of human hearing. But copying at 20 wpm is common. Some people can copy Morse code at more than 50 words per minute or more, but at that speed they’re not hearing individual dits and dahs. An H, for example, four dits in a row, sounds like a single rough sound. In fact, they’re not really hearing letters at all but recognizing the shape of words.

How the image was made

I made the image above with LaTeX and Lilypond.

Adding the letters above each measure was kind of a hack. I used rehearsal markings to label the measures, but there was one problem: the software skips from letter H to letter J. That meant that the labels I and all subsequent letters were one ahead of what they should be, and the final letter Z was labeled AA. I tried several tricks, and Lilypond steadfastly refused to label a measure with ‘I’ even though I’ve seen such a label in the documentation.

My way around this was to make it label two consecutive measures with H, then in image editing software I turned the second H into an I. No doubt there’s a better way, but this worked.

I may play around with this and try to improve it a bit. If you have any suggestions, particularly related to Lilypond, please let me know.

Related posts

[1] You could think of the musical score above as a sort of transcription of the Farnsworth method of teaching Morse code. Students learn the letters at full speed, but with extra space between the letters at first. The faster speed discourages consciously counting the dits and dahs, forcing the student to listen to the overall rhythm of the letters.

LaTeX and Lawyers

Lawyers write Word documents and mathematicians write LaTeX documents. Of course makes collaboration awkward, but there are ways to make it better.

One solution is to simply use Word. People who use LaTeX probably know how to use Word, even if they’d rather not, and asking someone else to learn LaTeX is a non-starter. So if I’m coauthoring a document with a lawyer, I’ll use Word.

If I’m writing a report that a lawyer needs to review, I’ll use LaTeX. Using different programs actually helps because it makes a clear distinction between copy editing feedback and authorial responsibility.

This post will give a couple tips for writing reports in LaTeX to be delivered to a lawyer, one trivial and one not quite trivial.

The trivial tip is that \S produces the section sign § (U+00A7) common in legal documents but not so common elsewhere. The not so trivial tip is that the enumitem package lets you change the default labels that LaTeX uses with enumerated items.

Changing enumerated item labels

LaTeX was designed under the assumption that the user wants to focus on logical structure and leave the formatting up to the typesetting program. Consistent with this design philosophy, nested enumerated lists simply wrapped with \begin{enumerate} and \end{enumerate} and individual list items are marked with \item, regardless of the level of nesting. LaTeX takes responsibility for displaying different labels at different levels: Arabic numerals for top-level lists, Roman letters for the next level of list, etc.

When you’re quoting legal documents, however, you don’t want to simply preserve the logical structure of (nested) lists; you want to preserve the labels as well.

Suppose you have the following nested list.

    \begin{enumerate} 
    \item First top-level item
    \item Second top-level item
      \begin{enumerate}
      \item A sub-item
      \item Another sub-item
        \begin{enumerate}
        \item A third-level item
        \item Another third-level item
          \begin{enumerate}
            \item Four levels in
            \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

By default, LaTeX will format this as follows.

But suppose in order to match another document you need the labels to progress as (a), (1), (A), and (i). The following LaTeX code will accomplish this.

    \begin{enumerate} [label={(\alph*)}]
    \item First top-level item
    \item Second top-level item
      \begin{enumerate} [label={(\arabic*)}]
      \item A sub-item
      \item Another sub-item
        \begin{enumerate} [label={(\Alph*)}]
        \item A third-level item
        \item Another third-level item
          \begin{enumerate} [label={(\roman*)}]
            \item Four levels in
            \end{enumerate}      
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

This produces the following.

Note the parentheses in the labels above. You can replace remove one or both, replace them with square brackets, add periods, etc. as the following example shows.

    \begin{enumerate} [label={\alph*)}]
    \item First top-level item
    \item Second top-level item
      \begin{enumerate} [label={\arabic*.}]
      \item A sub-item
      \item Another sub-item
        \begin{enumerate} [label={(\Alph*)}]
        \item A third-level item
        \item Another third-level item
          \begin{enumerate} [label={[\roman*]}]
            \item Four levels in
            \end{enumerate}      
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

Here’s what this looks like when compiled.

Related posts