NOTA BENE: This is a reproduction of Jukka Korpela's page via the wayback machine. Links on this page go to the associated archived pages on the wayback machine, except for this one which returns to the Thoughtpiece Including Surprises on The Moonspeaker.
The Web Authoring FAQ contains an answer to the question How do I include one file in another?, mentioning the C preprocessor as one possible technique. This document gives some details about that approach, which lets you do some common tasks like file inclusion and simple macros easily, assuming only that you have a C compiler at your disposal.
The definition of the
C programming language specifies, in addition to the
statements of the language, a set of
preprocessing directives. For example,
#include "foo.c"
instructs the compiler to fetch the
content of the file foo.c
and behave as if that content
were in place of the directive.
A typical C compiler can be instructed to execute such directives
only, without processing the resulting data as a C program.
This means that one can
use a C compiler as a general-purpose preprocessor
for files other than C source programs, too.
The options (switches) you need to give in such a case depend on the C compiler. The following instructions apply to the GNU C compiler (gcc). For other compilers, the options could be similar, but please check the applicable manuals.
option | effect |
---|---|
-E | preprocessing only |
-x c | interpret files as C source files (instead of treating them as object files); this option is given to make the compiler preprocess them |
-P | don't generate #line directives (which would of course mess things up in HTML documents!) |
-C | do not ignore comments (since an HTML document might contain data which would be a comment in C) |
When these options are used, gcc writes the preprocessed data
(e.g. with #include
directives replaced by the contents of
the files referred to) to standard output. Thus, assuming you have
a document
demo.htm
which is an HTML document except for the use of #include
directives, you can generate an HTML document demo.html
from it
with the command
gcc -E -x c -P -C demo.htm >demo.html
Do you find the use of the extension .htm
for such files confusing? Well, you can use any extension you like,
of course. I use .htm
because then the
Emacs
editor automatically enters a mode suitable for editing HTML documents
when it opens such a file.
The following simple document contains two #include
directives which refer to start.html
(containing, in this
case, just a DOCTYPE
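declaration) and to tail.html
which contains some simple "trailer" data. The trailer also uses
the __DATE__
macro which gives the
date of preprocessing as a string. Note that it could be misleading
to use it in a statement about last update, since the date changes whenever the document is preprocessed again, whether or not its content was edited.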
#include "start.html"
<title>Demo</title>
<p>This is just a demonstration.</p>
#include "tail.html"
Processed the way described above, we get the following HTML document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<title>Demo</title>
<p>This is just a demonstration.</p>
<hr title="Information about this document">
<p>Last update: "May 24 1999".</p>
<address>
<a href="../">Jukka Korpela</a>,
<a href="mailto:jkorpela@malibutelecom.com">jkorpela@malibutelecom.com</a>
</address>
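The quotation marks around the date in the output above come from the fact that __DATE__ expands to a C string literal. As a minimal sketch (the function name preprocessing_date is only an illustrative choice, not something from the original page), the same macro can be inspected from a C source file:

```c
/* __DATE__ expands to a string literal of the form "Mmm dd yyyy".
   The value is fixed when the file is preprocessed; it says nothing
   about when the content was last edited. */
const char *preprocessing_date(void)
{
    return __DATE__;
}
```

The string is always 11 characters long, with a space padding the day when it is less than 10.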
You might get error messages like the following from the preprocessor:
foo.htm:8: unterminated character constant
The reason is that the preprocessor parses its input
according to certain rules, recognizing things like quoted strings.
For example, if your document contains the word
Don't
then the preprocessor will take the apostrophe as a starting
single quote and look for a closing single quote on the same line;
when it does not find one, it issues the error message.
It is advisable to take a look at the lines reported in such
messages. They might contain real typos, even unclosed attribute
values in HTML. But in cases like the one mentioned as an example above,
you can just ignore the messages. However, if they bother you, you
can prevent them by
presenting the "homeless" apostrophes and quotation marks as
numeric character references, namely &#39;
for the apostrophe (')
and &#34; (or &quot;) for the quotation mark (").
This cannot be done, however, for quotation marks and apostrophes
used to delimit attribute values. (On the other hand,
it is usually advisable not to split attribute values across
lines.)
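If there are many such characters, the replacement could be automated. The following is only a sketch under assumptions: escape_stray_quotes is a hypothetical helper, and it is meant to be applied to running text only, since it would also rewrite quotes that delimit attribute values if it were given whole tags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `text` into `out`, writing &#39; for each
   apostrophe and &#34; for each quotation mark.
   Returns 0 on success, -1 if `out` (of size `outsize`) is too small. */
int escape_stray_quotes(const char *text, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    for (; *text != '\0'; text++) {
        const char *piece = text;   /* default: copy the character as is */
        size_t len = 1;

        if (*text == '\'') {
            piece = "&#39;";
            len = 5;
        } else if (*text == '"') {
            piece = "&#34;";
            len = 5;
        }
        if (used + len + 1 > outsize)   /* keep room for the final '\0' */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Given the text Don't, the helper produces Don&#39;t, which the preprocessor passes through without complaint.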
You can also define and use macros of your own. The simplest use
is for defining constants such as short names
for long words and phrases. Example:
#define i18n internationalization
You can then use i18n
wherever you like in the document
and it will be expanded by the preprocessor. Note that this
is a simple, case-sensitive, exact-match textual substitution.
Moreover, the preprocessor inserts a space character
after the expansion,
but in HTML this usually does not matter (except within
PRE
elements).
Remember that in C, a newline terminates a macro definition.
Use a backslash (\) at the end of a line in order to suppress
that, i.e. to continue a macro definition over two or more lines.
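A continued definition might look like the following sketch. The macro name TRAILER, its content, and the helpers STR, XSTR and trailer_markup are all made up for illustration; the stringification helpers are there only to make the expansion visible in a C source file.

```c
#define STR(x) #x
#define XSTR(x) STR(x)

/* The backslash joins the two physical lines into one definition.
   Nothing may follow the backslash on its line. */
#define TRAILER <hr> \
<address>Example Author</address>

/* Result: "<hr> <address>Example Author</address>" */
const char *trailer_markup(void)
{
    return XSTR(TRAILER);
}
```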
A macro can have arguments too. An example:
#define TITLE(thetitle) <title>thetitle</title><h2>thetitle</h2>
You could put that definition into your generic start.html
(or whatever you'd call it), and then you could begin your documents
in the following style:
#include "start.html"
TITLE(Simple demo)
Some extra spaces might be inserted by the preprocessor, but normally
they don't matter in HTML. The macro invocation
TITLE(Simple demo)
would expand to
<title> Simple demo </title><h2> Simple demo </h2>
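The argument substitution can also be checked from a C source file. In this sketch, STR, XSTR and demo_heading are hypothetical names added for inspection; only the TITLE definition comes from the text above. Stringification normalizes whitespace by its own rules, so the spacing of the string need not match what gcc -E prints for an HTML file.

```c
#define STR(x) #x
#define XSTR(x) STR(x)

#define TITLE(thetitle) <title>thetitle</title><h2>thetitle</h2>

/* Both occurrences of the parameter are replaced by the argument. */
const char *demo_heading(void)
{
    return XSTR(TITLE(Simple demo));
}
```

One limitation worth knowing: a comma in the argument, as in TITLE(Hello, world), is taken as an argument separator, so such an invocation is rejected because TITLE takes only one argument.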
Care must be taken when a macro invocation appears between quotation marks. You need to write a directive like
#define Q(string) # string
for such cases. The usage is probably best illustrated by an example which I used when generating a page for testing the effect of some CSS rules. I needed to write lots of things like
<span style="font-size: xx-small ;font-family:sans-serif">
with different size specifiers appearing in place of xx-small.
So I defined
#define Q(string) # string
#define STY(siz) <span style=Q(font-size: siz ;font-family:sans-serif)>
and used STY(xx-small), STY(x-small), etc.
Actually, it was a somewhat more complicated macro, but this hopefully
illustrates the technique. Due to HTML syntax, the quotation marks need to
be there (in the HTML document generated by the preprocessor) but if I wrote
them directly, it wouldn't work, since a C preprocessor does not
recognize macro invocations within quoted strings.
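The difference between quoting directly and quoting through a stringifying macro can be sketched in C as follows. The names STYLE_VALUE, DIRECT, quoted_via_macro and quoted_directly are assumptions made for this illustration; Q is the macro defined above, and the parameter is spelled siz both in the parameter list and in the body, which is required for the substitution to happen.

```c
#define Q(string) # string

/* The parameter is substituted first; Q then turns the whole
   token sequence into one quoted string. */
#define STYLE_VALUE(siz) Q(font-size: siz ;font-family:sans-serif)

/* Writing the quotes directly: the preprocessor does not look for
   parameters or macros inside a string literal, so siz stays as is. */
#define DIRECT(siz) "font-size: siz ;font-family:sans-serif"

const char *quoted_via_macro(void)
{
    return STYLE_VALUE(xx-small);
}

const char *quoted_directly(void)
{
    return DIRECT(xx-small);
}
```

The first function yields a string containing xx-small, while the second still contains the literal text siz.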
Having written this document, I found the document Using the C Preprocessor to Maintain HTML Code by Dr. George F. Corliss. It gives a nice overview of what else you might do with a C preprocessor, and a real-life illustration.
And I found it via Micodocs, specifically via the nice, descriptive list HTML Preprocessors. Consider taking a look before deciding which preprocessor you'd like to use. It depends on the complexity of the desired preprocessing; the usefulness of a C preprocessor for HTML documents is relatively limited. In particular, note that the GTML preprocessor lets you use a syntax similar to the one discussed here, and is easy to use; you can start with simple features, then proceed to more advanced issues if needed.
Date of last update: 2000-06-07
Jukka Korpela