In LaTeX, sections are labeled with commands like \label{foo} and referenced like \ref{foo}. Referring to sections by labels rather than hard-coded numbers allows references to update automatically when sections are inserted, deleted, or rearranged.
For every reference there ought to be a label. A label without a corresponding reference is fine, though it might be a mistake. If you have a reference with no corresponding label, and one label without a reference, there’s a good chance the reference is a typo variation on the unreferenced label.
We’ll build up a one-liner for comparing labels and references. We’ll use grep to find patterns that look like labels by searching for label{ followed by any string of characters up to but not including a closing brace. We don’t want the label{ part, just what follows it, so we’ll use look-behind syntax to exclude it from the match.
Here’s our regular expression:
(?<=label{)[^}]+
We’re using Perl-style look-behind syntax, so we’ll need to give grep the -P option. Also, we only want the match itself, not the matching lines, so we’ll also use the -o option. This will print all the labels:
grep -oP '(?<=label{)[^}]+' foo.tex
The regex for finding references is the same with label replaced with ref.
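As a minimal sketch of both searches side by side, the snippet below runs them against a small test file. The file name sample.tex and its contents are made up for illustration, and the -P option assumes a grep built with PCRE support, such as GNU grep.

```shell
# Hypothetical sample file, used only for illustration.
cat > sample.tex <<'EOF'
\section{Introduction}\label{sec:intro}
See Section~\ref{sec:methods} and Section~\ref{sec:intro}.
\section{Methods}\label{sec:methods}
EOF

# Labels: whatever follows "label{" up to the closing brace.
labels=$(grep -oP '(?<=label{)[^}]+' sample.tex)

# References: the same pattern with "label" replaced by "ref".
refs=$(grep -oP '(?<=ref{)[^}]+' sample.tex)

printf 'labels:\n%s\n' "$labels"
printf 'references:\n%s\n' "$refs"
```

Because the look-behind only requires that ref{ come immediately before the match, this pattern also picks up the arguments of commands whose names end in ref, such as \eqref{...} and \pageref{...}.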
To compare the list of labels and the list of references, we’ll use the comm command. For more on comm, see Set theory at the command line.
We could save the labels to a file, save the references to a file, and run comm on the two files. But we’re more interested in the differences between the two lists than in the lists themselves, so we can pass both as streams to comm using the <(...) syntax. Finally, comm assumes its inputs are sorted, so we pipe the output of both grep commands to sort.
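For comparison, the file-based approach mentioned above might look like the sketch below. The names sample.tex, labels.txt, and refs.txt are hypothetical, and the sample deliberately contains a misspelled reference and an unused label.

```shell
# Hypothetical sample file: \ref{sec:method} is a typo for sec:methods,
# and sec:unused is never referenced.
cat > sample.tex <<'EOF'
\section{Introduction}\label{sec:intro}
See Section~\ref{sec:intro} and Section~\ref{sec:method}.
\section{Methods}\label{sec:methods}
\section{Appendix}\label{sec:unused}
EOF

# Save the sorted labels and sorted references to (hypothetical) files.
grep -oP '(?<=label{)[^}]+' sample.tex | sort > labels.txt
grep -oP '(?<=ref{)[^}]+' sample.tex | sort > refs.txt

# Column 1: only in labels.txt; column 2: only in refs.txt; column 3: in both.
out=$(comm labels.txt refs.txt)
printf '%s\n' "$out"
```

With this sample, sec:intro lands in the third column, sec:method in the second, and sec:methods and sec:unused in the first. The stream-based version gives the same comparison without leaving the two intermediate files behind.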
Here’s our one-liner:
comm <(grep -oP '(?<=label{)[^}]+' foo.tex | sort) <(grep -oP '(?<=ref{)[^}]+' foo.tex | sort)
This will produce three sections of output: labels that are not references, references that are not labels, and labels that are also references.
If you just want to see references that don’t refer to a label, give comm the option -13. This suppresses the first and third sections of output, leaving only the second section, references that are not labels.
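Here is a sketch of that filtered comparison on a made-up file, sample.tex, that contains one misspelled reference. It also uses -23, which keeps only the first section, to list labels that are never referenced, so a suspicious reference can be checked against the unused labels. The <(...) syntax needs a shell that supports process substitution, such as bash.

```shell
# Hypothetical sample file: \ref{sec:method} is a typo for sec:methods.
cat > sample.tex <<'EOF'
\section{Introduction}\label{sec:intro}
See Section~\ref{sec:intro} and Section~\ref{sec:method}.
\section{Methods}\label{sec:methods}
EOF

# Second section only: references with no matching label.
dangling=$(comm -13 <(grep -oP '(?<=label{)[^}]+' sample.tex | sort) \
                    <(grep -oP '(?<=ref{)[^}]+' sample.tex | sort))

# First section only: labels that are never referenced.
unused=$(comm -23 <(grep -oP '(?<=label{)[^}]+' sample.tex | sort) \
                  <(grep -oP '(?<=ref{)[^}]+' sample.tex | sort))

echo "references without labels: $dangling"
echo "labels without references: $unused"
```

Here the dangling reference sec:method sits next to the unreferenced label sec:methods, which is the typo pattern described earlier.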
You can also add a -u option (u for unique) to the calls to sort to suppress multiple instances of the same label or same reference.
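To illustrate the effect of -u, the sketch below uses a made-up file, sample.tex, in which the same missing label is referenced twice. As before, <(...) assumes a shell with process substitution, such as bash.

```shell
# Hypothetical sample file: sec:results is referenced twice but never defined.
cat > sample.tex <<'EOF'
\section{Introduction}\label{sec:intro}
See Section~\ref{sec:results}, and again Section~\ref{sec:results}.
EOF

# Plain sort: the repeated reference is reported once per occurrence.
plain=$(comm -13 <(grep -oP '(?<=label{)[^}]+' sample.tex | sort) \
                 <(grep -oP '(?<=ref{)[^}]+' sample.tex | sort))

# sort -u: each distinct reference is reported once.
unique=$(comm -13 <(grep -oP '(?<=label{)[^}]+' sample.tex | sort -u) \
                  <(grep -oP '(?<=ref{)[^}]+' sample.tex | sort -u))

printf 'without -u:\n%s\n' "$plain"
printf 'with -u:\n%s\n' "$unique"
```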