An xmlformat configuration file specifies formatting options to be associated with particular elements in XML documents. For example, you can format <itemizedlist> elements differently than <orderedlist> elements. (However, you cannot format <listitem> elements differently depending on the type of list in which they occur.) You can also specify options for a "pseudo-element" named *DEFAULT. These options are applied to any element for which the options are not specified explicitly.
The following sections describe the general syntax of configuration files, then discuss the allowable formatting options that can be assigned to elements.
A configuration file consists of sections. Each section begins with a line that names one or more elements. (Element names do not include the "<" and ">" angle brackets.) The element line is followed by option lines that each name a formatting option and its value. Each option is applied to every element named on its preceding element line.
Element lines and option lines are distinguished based on leading whitespace (space or tab characters):
Element lines have no leading whitespace.
Option lines begin with at least one whitespace character.
On element lines that name multiple elements, the names should be separated by spaces or commas. These are legal element lines:
para title
para,title
para, title
On option lines, the option name and value should be separated by whitespace and/or an equal sign. These are legal option lines:
  normalize yes
  normalize=yes
  normalize = yes
Blank lines are ignored.
Lines whose first non-whitespace character is "#" are taken as comments and ignored. A comment beginning with "#" may also follow the last element name on an element line or the option value on an option line.
Example configuration file:
para
  format block
  entry-break 1
  exit-break 1
  normalize yes
  wrap-length 72
literal replaceable userinput command option emphasis
  format inline
programlisting
  format verbatim
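The syntax rules above can be sketched as a small parser. This is a minimal illustration in Python, not xmlformat's own code: the function name `parse_config` is hypothetical, continuation lines are not handled, and the sketch assumes that "#" never occurs inside an element name or option value.

```python
import re

def parse_config(text):
    """Parse configuration text into {element_name: {option: value}}."""
    config = {}
    current = []  # element names from the most recent element line
    for raw in text.splitlines():
        # Assumption: any "#" starts a comment, wherever it appears.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue  # blank line or comment-only line
        if line[0] in " \t":
            # Option line: name and value separated by whitespace and/or "=".
            m = re.match(r"\s+([^\s=]+)(?:\s*=\s*|\s+)(\S+)$", line)
            if m is None or not current:
                raise ValueError(f"malformed option line: {raw!r}")
            name, value = m.groups()
            for element in current:
                config.setdefault(element, {})[name] = value
        else:
            # Element line: names separated by spaces or commas.
            current = [n for n in re.split(r"[\s,]+", line) if n]
    return config

sample = "para, title  # two elements\n  format block\n  normalize = yes\ntitle\n  wrap-length=50\n"
print(parse_config(sample)["title"]["wrap-length"])  # prints 50
```

Because each option is stored under every name on the preceding element line, both para and title receive the format and normalize values here, and only title receives wrap-length.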
It is not necessary to specify all of an element's options at the same time. Thus, this configuration file:
para, title
  format block
  normalize yes
title
  wrap-length 50
para
  wrap-length 72
is equivalent to this configuration file:
para
  format block
  normalize yes
  wrap-length 72
title
  format block
  normalize yes
  wrap-length 50
If an option is specified multiple times for an element, the last value is used. For the following configuration file, para ends up with a wrap-length value of 68:
para
  format block
  wrap-length 60
  wrap-length 72
para
  wrap-length 68
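The "last value wins" rule amounts to applying assignments in file order. The following Python sketch shows the idea under the assumption that the file has already been split into sections; `merge_sections` and the data layout are hypothetical names chosen for illustration.

```python
def merge_sections(sections):
    """Fold (element_names, option_pairs) sections, in file order, into one dict per element."""
    merged = {}
    for names, option_pairs in sections:
        for name in names:
            element = merged.setdefault(name, {})
            for option, value in option_pairs:
                element[option] = value  # a later assignment replaces an earlier one
    return merged

sections = [
    (["para"], [("format", "block"), ("wrap-length", "60"), ("wrap-length", "72")]),
    (["para"], [("wrap-length", "68")]),
]
print(merge_sections(sections)["para"]["wrap-length"])  # prints 68
```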
To continue an element line onto the next line, end it with a backslash character. xmlformat will interpret the next line as containing more element names for the current section:
chapter appendix article \
section simplesection \
sect1 sect2 sect3 \
sect4 sect5
  format block
  entry-break 1
  element-break 2
  exit-break 1
  normalize no
  subindent 0
Continuation can be useful when you want to apply a set of formatting options to a large number of elements. Continuation lines may begin with whitespace, although a casual reader might then mistake them for option lines.
Continuation is not allowed for option lines.
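One way to picture continuation is as a preprocessing pass that joins each backslash-terminated element line with the line that follows it. The Python sketch below is an illustration only; `join_element_continuations` is a hypothetical name, and leaving an option line that ends in a backslash untouched is an assumption, since the text says only that continuation is not allowed there.

```python
def join_element_continuations(lines):
    """Merge element lines ending in a backslash with the line(s) that follow."""
    joined = []
    pending = None  # accumulated text of an element line that is being continued
    for line in lines:
        if pending is not None:
            # A continuation line may begin with whitespace; it is still element names.
            line = pending + " " + line.strip()
            pending = None
        elif line[:1] in (" ", "\t"):
            joined.append(line)  # option line: passed through unchanged
            continue
        if line.rstrip().endswith("\\"):
            pending = line.rstrip()[:-1].rstrip()
        else:
            joined.append(line)
    if pending is not None:
        joined.append(pending)  # file ended in the middle of a continuation
    return joined

lines = [
    "chapter appendix article \\",
    "section simplesection \\",
    "  sect1 sect2 sect3 \\",
    "sect4 sect5",
    "  format block",
]
print(join_element_continuations(lines)[0])
# prints chapter appendix article section simplesection sect1 sect2 sect3 sect4 sect5
```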
A configuration file may contain options for two special "pseudo-element" names: *DOCUMENT and *DEFAULT. (The names begin with a "*" character so as not to conflict with valid element names.)
*DEFAULT options apply to any element that appears in the input document but that was not configured explicitly in the configuration file.
*DOCUMENT options are used primarily to control line breaking between top-level nodes of the document, such as the XML declaration, the DOCTYPE declaration, the root element, and any comments or processing instructions that occur outside the root element.
It's common to supply *DEFAULT options in a configuration file to override the built-in values. However, it's normally best to leave the *DOCUMENT options alone, except possibly to change the element-break value.
Before reading the input document, xmlformat sets up formatting options as follows:
It initializes the built-in *DOCUMENT and *DEFAULT options.
It reads the contents of the configuration file, assigning formatting options to elements as listed in the file.
Note that although *DOCUMENT and *DEFAULT have built-in default values, the defaults may be overridden in the configuration file.
After reading the configuration file, any missing formatting options for each element are filled in using the options from the *DEFAULT pseudo-element. For example, if para is defined as a block element but no subindent value is defined, para "inherits" the subindent value from the *DEFAULT settings.
Missing options are filled in from the *DEFAULT options only after reading the entire configuration file. For the settings below, *DEFAULT has a subindent value of 2 (not 0) after the file has been read. Thus, para also is assigned a subindent value of 2.
*DEFAULT
  subindent 0
para
  format block
  normalize yes
*DEFAULT
  subindent 2
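The two-phase behavior (read everything first, fill in defaults afterward) can be sketched in Python as follows. The function name `resolve_options` is hypothetical, the built-in default values are omitted because they are not listed here, and leaving the pseudo-elements themselves untouched during fill-in is an assumption.

```python
def resolve_options(config):
    """Fill each element's missing options from the final *DEFAULT settings."""
    defaults = config.get("*DEFAULT", {})
    resolved = {}
    for element, options in config.items():
        if element in ("*DEFAULT", "*DOCUMENT"):
            resolved[element] = dict(options)  # assumption: pseudo-elements are not filled in
        else:
            resolved[element] = {**defaults, **options}  # explicit options take precedence
    return resolved

# Phase 1: apply every section in file order.
sections = [
    ("*DEFAULT", {"subindent": "0"}),
    ("para", {"format": "block", "normalize": "yes"}),
    ("*DEFAULT", {"subindent": "2"}),
]
config = {}
for name, options in sections:
    config.setdefault(name, {}).update(options)

# Phase 2: only now are the gaps filled in.
print(resolve_options(config)["para"]["subindent"])  # prints 2
```

Filling in defaults at the moment para is read would instead have given it a subindent of 0, which is the outcome the text rules out.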
The allowable formatting options are as follows:
format {block | inline | verbatim}
entry-break n
element-break n
exit-break n
subindent n
normalize {no | yes}
wrap-length n
A value list shown as { value1 | value2 | ... } indicates that the option must take one of the values in the list. A value shown as n indicates that the option must have a numeric value.
Details for each of the formatting options follow.
format {block | inline | verbatim}
This option is the most important, because it determines the general way in which the element is formatted, and it determines whether the other formatting options are used or ignored:
For block elements, all other formatting options are significant.
For inline elements, all other formatting options are ignored. Inline elements are normalized, wrapped, and indented according to the formatting options of the enclosing block element.
For verbatim elements, all other formatting options are ignored. The element content is written out verbatim (literally), without change, even if it contains other sub-elements. This means no normalization of the contents, no indenting, and no line-wrapping. Nor are any breaks added within the element.
A configuration file may specify any option for elements of any type, but xmlformat will ignore inapplicable options. One reason for this is to allow you to experiment with changing an element's format type without having to disable other options.
If you use the --show-config command-line option to see the configuration that xmlformat will use for processing a document, it displays only the applicable options for each element.
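The effect of the format type on the other options can be summarized in a few lines of Python. This is a sketch of the rule as stated above, not of how --show-config is implemented; `applicable_options` is a hypothetical name, and the input is assumed to be a complete option set that includes format.

```python
def applicable_options(options):
    """Return the subset of an element's options that its format type actually uses."""
    if options["format"] == "block":
        return dict(options)  # block elements: every option is significant
    # inline and verbatim elements: everything except the format itself is ignored
    return {"format": options["format"]}

print(applicable_options({"format": "inline", "wrap-length": "72"}))  # prints {'format': 'inline'}
```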
entry-break n
element-break n
exit-break n
These options indicate the number of newlines (line breaks) to write after the element opening tag, between child sub-elements, and before the element closing tag. They apply only to block elements.
A value of 0 means "no break". A value of 1 means one newline, which causes the next thing to appear on the next line with no intervening blank line. A value n greater than 1 produces n-1 intervening blank lines. Some examples:
An entry-break value of 0 means the next token will appear on the same line immediately after the opening tag.
An exit-break value of 0 means the closing tag will appear on the same line immediately after the preceding token.
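The relationship between a break value and the resulting output can be shown with a short Python sketch. Both function names are hypothetical, and indentation is deliberately ignored so that only the newlines are visible.

```python
def break_text(n):
    """Return the newlines written for a break value of n."""
    if n < 0:
        raise ValueError("break values must be 0 or greater")
    return "\n" * n

def render_block(name, content, entry_break, exit_break):
    """Write one block element around already-formatted content, ignoring indentation."""
    return f"<{name}>{break_text(entry_break)}{content}{break_text(exit_break)}</{name}>"

print(repr(render_block("para", "text", 0, 0)))  # prints '<para>text</para>'
print(repr(render_block("para", "text", 1, 1)))  # prints '<para>\ntext\n</para>'
print(repr(render_block("para", "text", 2, 1)))  # prints '<para>\n\ntext\n</para>'
```

In the last call, an entry-break of 2 writes two newlines, which leaves one blank line (n-1) between the opening tag and the text.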
subindent n
This option indicates the number of spaces by which to indent child sub-elements, relative to the indent of the enclosing parent. It applies only to block elements. The value may be 0 to suppress indenting, or a number n greater than 0 to produce indenting.
This option does not affect the indenting of the element itself. That is determined by the subindent value of the element's own parent.
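Because an element's indent comes from its ancestors rather than from its own setting, indents accumulate down the tree. The Python sketch below illustrates this for a single path of nested block elements; the function name and the subindent values are hypothetical, and the sketch assumes the outermost element starts at indent 0.

```python
def indents_along_path(path):
    """Given (element_name, subindent) pairs from the root downward, return each element's indent."""
    indents = []
    indent = 0  # assumption: the outermost element is written with no indent
    for name, subindent in path:
        indents.append((name, indent))
        indent += subindent  # this element's subindent affects only its children
    return indents

path = [("chapter", 0), ("section", 2), ("para", 4)]
print(indents_along_path(path))  # prints [('chapter', 0), ('section', 0), ('para', 2)]
```

Here para is indented by 2 spaces because of the subindent of section; its own subindent of 4 would matter only to elements nested inside it.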
Note: subindent does not apply to text nodes in non-normalized blocks, which are written as is without reformatting. subindent also does not apply to verbatim elements or to the following non-element constructs, all of which are written with no indent:
Processing instructions
Comments
DOCTYPE declarations
CDATA sections
normalize {no | yes}
This option indicates whether or not to perform whitespace normalization in text. This option is used for block elements, but it also affects inline elements because their content is normalized the same way as their enclosing block element.
If the value is no, whitespace-only text nodes are not considered significant and are discarded, possibly to be replaced with line breaks and indentation.
If the value is yes, normalization causes removal of leading and trailing whitespace within the element, and conversion of runs of whitespace characters (including line-ending characters) to single spaces.
Text normalization is discussed in more detail in Section 3.3, "Text Handling".
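For a single run of text, the effect of normalize yes can be approximated in Python as shown below. This is a simplified sketch: `normalize_text` is a hypothetical name, whitespace is assumed to mean the four XML whitespace characters (space, tab, carriage return, line feed), and the handling of text that is split across inline sub-elements is not modeled.

```python
import re

XML_WHITESPACE = " \t\r\n"  # assumption: only these characters count as whitespace

def normalize_text(text):
    """Collapse each whitespace run to one space and trim both ends."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    return collapsed.strip(XML_WHITESPACE)

print(repr(normalize_text("  The quick\n   brown\tfox  ")))  # prints 'The quick brown fox'
```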
wrap-length n
Line-wrapping length. This option is used only for block elements and line-wrapping occurs only if normalization is enabled. The option affects inline elements because they are line-wrapped the same way as their enclosing block element.
Setting the wrap-length option to 0 disables wrapping. Setting it to a value n greater than 0 enables wrapping to lines at most n characters long. (Exception: If a word longer than n characters occurs in text to be wrapped, it is placed on a line by itself. A word will never be broken into pieces.)
The line length is adjusted by the current indent when wrapping is performed to keep the right margin of wrapped text constant. For example, if the wrap-length value is 60 and the current indent is 10, lines are wrapped to a maximum of 50 characters.
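The wrapping rules can be illustrated with a simple greedy algorithm in Python. This is a sketch under stated assumptions rather than xmlformat's implementation: `wrap_text` is a hypothetical name, the text is assumed to be normalized already, and the returned lines do not include the indent itself.

```python
def wrap_text(text, wrap_length, indent=0):
    """Wrap normalized text so that indent plus line length stays within wrap_length."""
    words = text.split()
    if not words:
        return []
    if wrap_length <= 0:
        return [" ".join(words)]  # a wrap-length of 0 disables wrapping
    width = wrap_length - indent  # keeps the right margin constant
    lines, current = [], ""
    for word in words:
        if not current:
            current = word  # a word is never broken, even if longer than width
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    lines.append(current)
    return lines

print(wrap_text("aaa bbb ccc", 10, indent=3))        # prints ['aaa bbb', 'ccc']
print(wrap_text("a supercalifragilistic b", 10))     # prints ['a', 'supercalifragilistic', 'b']
```

In the first call the usable width is 10 - 3 = 7, so "aaa bbb" fits exactly and "ccc" moves to the next line. In the second call the overlong word ends up on a line by itself.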
Any prevailing indent is added to the beginning of each line, unless the text will be written immediately following a tag on the same line. This can occur if the text occurs after the opening tag of the block and the entry-break is 0, or the text occurs after the closing tag of a sub-element and the element-break is 0.