How TeX macros actually work: Part 6

Introduction and overview: The story so far

Over the previous 5 parts of this series we have seen:

how TeX reads the characters within an input file and uses category codes to recognize different “classes” of character and subsequently convert them to character tokens and command tokens;
that a macro is, in effect, comprised of four sections:

<TeX macro primitive><macro name><parameter text>{<replacement text>}

where:

<TeX macro primitive> = one of \def, \edef, \gdef or \xdef;
<macro name> = the name of your macro, such as \foo;
<parameter text> can be “null” (not present) or it can be a string of delimiter tokens and macro parameter tokens;
<replacement text> is the actual body of your macro: the section that is “executed” (expanded) when you call the macro.

how the <parameter text> section can contain a wide range of tokens and that TeX uses this section as a “token template” to match a macro call to its original definition and work out the arguments used with the macro—and how TeX expects your use of a macro to match its original definition;
that, inside TeX, a macro definition is stored as a continuous sequence of tokens representing the <parameter text> and <replacement text> sections.

When you use a macro command TeX will first check to see if it takes any parameters. If so, TeX then has to identify the actual arguments being used in your macro call. TeX has to test your macro call against the “token template” definition it has stored in memory. Specifically, TeX uses its internal (stored) definition of your macro’s <parameter text> section as the template through which it can pick out tokens that are the actual arguments, and which tokens are just there to act as delimiters.
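As a small illustration of the “token template” idea, the following sketch defines a macro whose <parameter text> mixes delimiter tokens and parameter tokens. The macro name \pair is hypothetical and is used here only for illustration; it does not appear elsewhere in this series.

```latex
\documentclass{article}
\begin{document}
% \pair is a hypothetical macro used only for illustration.
% Parameter text: (#1,#2)  -- the parentheses and the comma are delimiters.
\def\pair(#1,#2){first=#1, second=#2}
% This call matches the template: #1 is "a" and #2 is "b".
\pair(a,b)
\end{document}
```

Here the call \pair(a,b) should typeset first=a, second=b, because TeX uses the stored template to decide that ( , and ) are delimiters and that a and b are the arguments.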

The meaning of macro expansion

We are now, finally, ready to move on to the most important topic: how TeX processes macro arguments and actually executes the macro: a process TeX refers to as macro expansion.

But first, a short example: something odd?

To “set the scene” for explaining the mechanism by which TeX processes macros and their arguments, we’ll use a short example to indicate the issues we need to consider.

Arguments are first converted to tokens

The following example is based on one discussed on pages 114–115 of The Advanced TeXbook by David Salomon. It was chosen because it encapsulates the central ideas within a very short TeX macro.

During normal TeX/LaTeX operations, the $ sign has category code 3 (“math shift”), which switches TeX into or out of inline math mode ($...$) or display math mode ($$...$$); of course, LaTeX uses $...$ and \[...\] for the same purposes.

Suppose we want a macro that changes the category code of a $ sign to, say, 11 so that we can typeset it like any regular character. We can use the TeX primitive command \catcode and our first attempt at such a macro, \docat, might be

\def\docat #1{\catcode`\$=11 #1}

However, when we try to use it, like this

\begin{document}
\def\docat #1{\catcode`\$=11 #1}
I paid \docat{$90} for that book.
\end{document}

we expect TeX to typeset I paid $90 for that book. but it fails with an error message:

! Missing $ inserted.
<inserted text> 
                $
<to be read again> 
                   \par 
l.7

From the error it seems that the $ used in our macro’s argument is still triggering TeX to typeset maths; clearly, TeX did not change the category code of the $ used in our macro’s argument ($90). The question is why didn’t TeX change the category code of $ to 11 and typeset it as a regular character? The short answer is that TeX first converts macro arguments to tokens before it feeds them into the token list of the <replacement text>—but we’ll look at the underlying mechanisms in much more detail.

What we need to remember is that our notion of TeX using text/characters is only relevant to the content of the file that TeX is reading: as soon as TeX has read in any characters, we are in the world of tokens. TeX macro calls work with tokens, not the written/text representation of TeX/LaTeX commands; this will become clearer as we work through the example.
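The idea that a category code is attached when a character is read, and stays attached afterwards, can be seen in a minimal sketch such as the following. The macro name \saved is hypothetical and is used only for illustration.

```latex
\documentclass{article}
\begin{document}
% \saved is a hypothetical macro name used only for illustration.
% The two $ characters are tokenized here, with category code 3.
\def\saved{$x+y$}
\begingroup
% From here on, newly read $ characters get category code 11.
\catcode`\$=11
% The stored tokens still carry category code 3, so this is set as math.
\saved
\endgroup
\end{document}
```

Although the category code of $ has been changed by the time \saved is used, the $ tokens stored in its definition were created earlier, so they still switch TeX into and out of math mode.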

Initially, we might think that our use of the \docat macro in I paid \docat{$90} for that book. is the same as directly writing the equivalent TeX (or LaTeX) code—such as the following, which does work:

\begin{document}
I paid \catcode`\$=11 $90 for that book.
\end{document}


However, as we saw above, the way that TeX processes macro arguments produces a result (! Missing $ inserted.) that is quite different to writing out the TeX code: we’ll now explore why that happens.

Macros and arguments as token lists

To fully understand the behaviour of the \docat macro, and its argument ($90), and why it fails, we again need to visualize the definition of the \docat macro and any arguments used (when \docat is called) as lists of tokens, not as a sequence of characters.

As TeX scans your input text, it recognizes \docat as a macro command; it then checks to see if the macro takes any parameters. How TeX does that is explained in the following section for readers interested in the finer details.

For those who like the details...

After the macro is called, TeX checks to see if the very first token (in the macro’s stored definition token list) is the end match token: if so, TeX can be certain that the macro does not take any parameters.

An example

The following node-list diagrams compare the token lists for two macros:

\def\foo A#1B{#1}: this has <parameter text> of A#1B, consequently the end match token is not the first token so TeX would proceed to look for parameters;
\def\foo{X}: this does not have a <parameter text> section, consequently the end match token is the first one in the token list and TeX knows not to look for any parameters.

How TeX checks if a macro takes parameters
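One way to see the difference between these two kinds of definition, without looking at TeX’s internal token lists, is to ask TeX for the meaning of each macro. The sketch below uses the hypothetical names \withparam and \noparam (rather than defining \foo twice) and the LaTeX command \typeout to write the result to the terminal and log file.

```latex
\documentclass{article}
\begin{document}
% \withparam and \noparam are hypothetical names used only for illustration.
\def\withparam A#1B{#1}
\def\noparam{X}
% Writes  macro:A#1B->#1  (the parameter text appears before "->").
\typeout{\meaning\withparam}
% Writes  macro:->X  (nothing before "->": no parameter text).
\typeout{\meaning\noparam}
\end{document}
```

In the output of \meaning, everything between macro: and -> is the <parameter text>; when that part is empty, the macro takes no parameters.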

Toward the “Grand Finale”: expansion

Let’s remind ourselves of the question: why didn’t the following macro work; i.e., why doesn’t TeX change the category code of any $ signs used in the argument of the \docat macro, such as \docat{$90}?

\begin{document}
\def\docat #1{\catcode`\$=11 #1}
I paid \docat{$90} for that book.
\end{document}

As explained above, when TeX scans your input and recognizes a macro command—at a time when TeX is going to execute it—TeX first checks to see if that macro takes any parameters. If so, TeX will need to further scan the input file to identify the actual arguments the user has provided for this specific macro call: TeX has to do this before it can call the actual macro code. Clearly, TeX needs to determine the data that the user wants to provide to the macro.

To identify the arguments present in the input (the user’s macro call), TeX will be guided by the internally stored definition of that macro: specifically, the <parameter text> section of the stored macro definition (token list)—that provides a sort of “token template”. Using that “token template” TeX has to determine which tokens in the user’s macro call are just delimiters (essentially “punctuation”) and which tokens form part of an argument. It is when TeX encounters a match parameter token in the stored macro definition <parameter text> section (“token template”) that it knows to start forming a list of tokens for that particular argument.

As soon as TeX recognizes the need to identify the user’s argument, TeX will scan the input to generate tokens and carefully check them, token by token, against the stored macro definition. TeX carries on gathering tokens for an argument until it detects either a delimiter token or the end match token: in either case, TeX then knows it is time to stop looking for tokens that form part of that argument.
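The effect of delimiters on argument gathering can be sketched with two small macros. The names \firsttoken and \uptodot are hypothetical and are used only for illustration.

```latex
\documentclass{article}
\begin{document}
% \firsttoken and \uptodot are hypothetical macros used only for illustration.
% \firsttoken has one undelimited parameter.
\def\firsttoken#1{[#1]}
% \uptodot has one parameter delimited by a full stop.
\def\uptodot#1.{[#1]}

% The argument is the single token "a", so this typesets [a]bc
\firsttoken abc

% The argument is "abc" and the delimiter "." is discarded: typesets [abc]
\uptodot abc.
\end{document}
```

In the first call no delimiter follows #1 in the template, so TeX takes just one token (or one brace-delimited group) as the argument; in the second, TeX keeps gathering tokens until it finds the delimiter given in the definition.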

Why the `\docat` macro failed

As noted, before TeX can actually call a macro it has to identify and prepare any arguments that are to be used with that macro. However, to identify the argument(s), ready for feeding into the macro, TeX has to generate each argument as a list of tokens: and that’s the reason for \docat’s failure.

In our example, we provided \docat with an argument of $90 but that argument is first converted to a list of tokens as TeX scans the macro call—the argument is converted to tokens before the macro is actually called. Here, for the argument $90, TeX will generate three character tokens: one token for each of $, 9 and 0.

The following graphic shows the token list generated for the argument $90, prior to being fed into the body of the \docat macro:

TeX token list generated for a macro argument

In the above graphic we can clearly see that the argument token list contains the $ as a character token based on a category code of 3.

As we saw in Parts 1 to 3, character tokens are created using the category code values in operation at the time the character is read in, i.e., at the time the argument is turned into a token list. At the time the argument is being tokenized, the \docat macro has not yet been executed, so the category-code change we put in the macro’s <replacement text> (\catcode`\$=11) does not affect the category codes used to generate the argument’s tokens.

Once TeX has generated a token list representing the argument $90, those three character tokens are fed into the macro’s <replacement text>. However, that results in the $ being fed in as a character token created using category code 3 (“math shift”), and we’ve seen that once a character token is formed, the attached category code is permanent. The $ is not fed into the macro as a character, but as a character token based on the $ having category code 3.
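The permanence of the category code carried by an argument token can be checked with TeX’s \ifcat primitive, which compares the category codes of two tokens. The following is a minimal sketch; the helper \probe is hypothetical and is not part of the original example.

```latex
\documentclass{article}
\begin{document}
% \probe is a hypothetical helper used only for illustration.
% It changes the category code of $ (inside a group) and then compares the
% first token of its argument with a $ of category code 3: the $ written
% in this definition was itself tokenized with category code 3.
\def\probe#1{\begingroup
  \catcode`\$=11
  \ifcat\noexpand#1$still category 3\else no longer category 3\fi
  \endgroup}
% The argument is tokenized before \probe is expanded.
\probe{$}
\end{document}
```

This should typeset still category 3: the \catcode assignment inside \probe is executed before the test, but the argument token was created earlier and keeps the category code it was given.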

Running `\docat`: macro expansion

TeX refers to the process of “executing” a macro as macro expansion; a term which, in this author’s opinion, is a little confusing but it’s the accepted terminology so we’ll continue to use it.

The real meaning of macro expansion

After TeX detects the \docat command in the user’s input, it scans the input and generates a token list for the macro’s argument ($90). To execute (expand) the macro, TeX switches its gaze away from the user’s input file and starts to read the tokens contained in \docat’s <replacement text> token list stored in TeX’s memory.

As TeX processes \docat’s definition it will then see, and execute, the series of tokens originally used to define the macro (catcode, `, \$, =, 1, 1, space, #1).

The following graphic shows the process of expanding the \docat macro: TeX stops getting tokens from the input file and starts to read tokens from the <replacement text> section of the \docat macro definition stored in memory. TeX proceeds to execute these pre-prepared tokens until it sees an output parameter token which instructs TeX to read (“inject”) and “execute” the argument tokens at this point. In our example, that is three character tokens representing $90 and that results in an error because the pre-prepared character token for $ has category code 3. Because we are dealing with character tokens, not characters, the $ is unaffected by the previous category code change caused by the tokens in \catcode`\$=11.

Showing the process of expanding the \docat macro

After TeX has processed the tokens representing \catcode`\$=11, the category code change for $ will be in effect. TeX then encounters the “special token” called output parameter, which tells TeX to insert the token list for the argument. However, that token list is three character tokens, the first of which is a token for a $ with category code 3 (“math shift”) assigned to it: the previous category code change within the macro cannot affect this character token, so TeX treats that token as a signal to begin math processing, which causes the macro to fail.
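TeX can be asked to record each macro expansion, and the arguments it found, by setting the primitive parameter \tracingmacros. The sketch below is a variation on the example, not the original: the argument is changed to $90$ (an assumption made here so that the math is properly closed and the run finishes without an error), and the trace is confined to a group.

```latex
\documentclass{article}
\begin{document}
\def\docat #1{\catcode`\$=11 #1}
\begingroup
% Ask TeX to log each macro expansion together with its arguments.
\tracingmacros=1
% Both $ tokens in the argument carry category code 3, so "90" is set in math.
I paid \docat{$90$} for that book.
\endgroup
\end{document}
```

Among other entries, the log file should contain lines of roughly this form, showing the stored definition followed by the argument that was substituted for #1:

```latex
\docat #1->\catcode `\$=11 #1
#1<-$90$
```

The typeset result contains 90 in math mode rather than a visible $ sign, which confirms that the argument’s $ tokens were unaffected by the category code change made inside the macro.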

Can the `\docat` macro be fixed?

From the discussions above, it is clear that any characters appearing in macro arguments are tokenized using the category codes in operation at the time that tokenization takes place—which, in our example, is always before the <replacement text> of the \docat macro is actually executed. So, how can we ensure that a macro’s arguments have their category codes changed?

One way is to modify \docat to be a parameterless macro which only makes the category code change—it does not have any arguments to tokenize. We then use a second macro, \getarg, which takes a single parameter, and arrange for that macro to have its argument tokenized when the appropriate category code for $ is operational.

\begin{document}
\def\docat{\catcode`\$=11 \getarg} % No parameters, calls a second macro \getarg
\def\getarg#1{#1} % 1 parameter whose argument will be tokenized
Now you can run it like this and it will work:

I paid \docat{$90} for that book.
\end{document}

When we use our new version of \docat (like this: \docat{$90}) it appears as if the $90 is still being used as an argument for the \docat macro. However, as discussed above, when TeX detects \docat in the input it checks to see if it takes any parameters: now it doesn’t, so TeX proceeds to execute (expand) it. The expansion of \docat is the sequence of tokens catcode, `, \$, =, 1, 1, space, getarg and this takes place before TeX starts to read (tokenize) the next characters contained in the input file, i.e., the group {$90}. Remember that when TeX expands a macro it gets its next input by reading the tokens contained in the token list of that macro’s definition; i.e., from its <replacement text> section stored in memory.

TeX will process, and execute, the expansion of \docat and detect the token getarg, recognizing it as a token which represents a command that takes parameters. At this point, TeX will scan the input file for getarg’s argument: the characters {$90}. As usual, these are tokenized but, because TeX has already read and processed the preceding tokens in the expansion of \docat, the characters $90 are tokenized after the category code of $ has been changed to 11. The definition (<replacement text>) of \getarg is simply #1, which means typeset the argument supplied, and that is what happens, resulting in a $ with category code 11 being generated and safely typeset.
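Note that, as written, the category code change made by \docat stays in force for the rest of the document, so later uses of $ for math would no longer work. One possible variation, shown here as a sketch rather than as part of the original example, is to open a group in \docat and close it in \getarg so that the change is undone once the argument has been typeset.

```latex
\documentclass{article}
\begin{document}
% Variation on the two-macro approach: the group is an addition made here.
% \docat opens a group, changes the category code, then hands over to \getarg.
\def\docat{\begingroup\catcode`\$=11 \getarg}
% \getarg typesets its argument, then closes the group to restore category code 3.
\def\getarg#1{#1\endgroup}
I paid \docat{$90} for that book.

Math still works afterwards: $x+y$.
\end{document}
```

The argument {$90} is still tokenized while $ has category code 11, so the first paragraph typesets as before; the \endgroup then restores the previous category code, which is why the $x+y$ in the second paragraph is read as math.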

Concluding remarks: the story in nodes

The sequence of events arising from re-writing \docat to use the macro \getarg is contained in the following annotated node-list diagram which shows the expansion process for the macro \docat. Readers wishing to carefully study this diagram can download the graphic as a PDF or SVG file for offline use.

Showing the process of expanding the modified \docat macro and the \getarg macro