pp - preprocessor for text files
pp [ options ] [ input files ... ]
pp reads one or more input files and produces an output stream. Input is
copied to the output stream unchanged until the special character ``@'' is
encountered. Input processing then takes place, such as macro expansion,
file inclusion, or conditional output, as described below.
pp can preprocess only text files, because it cannot handle zero (NUL) bytes.
If input files are specified on the command line, they are read as input.
If no files are given as arguments, standard input is read. If a single
``-'' character is given as the file name, standard input is read at that
point.
Most arguments are scanned and executed in the order in which they appear
on the command line.
- -I directory
-
Append directory to the search path used to locate files with the
@include command. This argument can be used several times. The current directory is
appended to the search path after all options have been read. See the
description of the @include command.
- -I -
-
Flush the directory search path.
- -D symbol[=value]
-
Immediately define a symbol, with or without a value. This argument can
appear several times.
- -U symbol
-
Immediately undefine a symbol. It is not an error to undefine a symbol that
is not defined. This argument can appear several times.
- -o outputfile
-
Write output to outputfile rather than to the standard output.
- -n
-
Suppress output.
- -N
-
Do not read from the standard input when no filenames are present on the
command line.
- -i filename
-
Immediately read filename as input without producing output. Definitions in the file will take
effect. This argument can appear several times. The current directory is
not part of the include path at this point.
- -c string
-
Use string as input when all other options have been parsed, but before reading any
input files. This argument can appear several times.
- -s N
-
Turn SCCS exception mode on or off depending on the numeric value of N. A zero value turns SCCS recognition off; a non-zero
value turns it on. Enabling exception mode lets the string ``@(#)''
pass through unaffected. Exception mode is enabled by default.
- -S type
-
After all input has been read, dump the symbol table to the standard
output (regardless of the -n or -o flags) in the format specified by the type argument.
Functions are not dumped. Symbol names that are not legal identifiers in
the target language are silently discarded.
Type
should be one of the following:
- pp
-
Dump in a format suitable as input to
pp.
- pp,fun
-
Dump in a format suitable as input to pp and dump function definitions.
- cpp
-
Dump in a format suitable as input to the C preprocessor. Symbols that look
like integers are defined without quotes; symbols with empty values are
given no value. All other symbols are defined as C strings.
- tcl
-
Dump Tcl variable definitions.
- sh
-
Dump Bourne shell variable definitions.
- -r N
-
Limit the number of recursive calls to the same function to N. Since pp supports almost no arithmetic operations or list
manipulation, any deep recursion is probably unintentional. By default,
execution aborts at the 100th call to a defined function. A value of
zero or less removes this limitation. Built-in functions are not
affected. Recursive inclusion of files is limited by the number of files
that a program may keep open simultaneously.
- -x c
-
Modify the behavior when executing commands with @syscmd(), described
below. If the character c is ``v,'' commands are printed to stderr
before execution; if it is ``q,'' commands are not printed to stderr
(this is the default). If c is ``n,'' command execution is suppressed.
If c is ``f,'' pp continues even when a command executed with @syscmd
returns a non-zero exit status.
- -d level
-
Set debug level.
- -l
-
Accept only ``legal'' identifiers. See the Input Syntax section.
- -E
-
Make references to undefined symbols import their values from the
environment, if variables of the same names exist there.
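The cpp dump rules listed under -S can be summarized in a short Python sketch. It is not part of pp: the helper name dump_cpp, the reading of ``looks like an integer'' as an optional sign followed by digits, and the escaping of backslashes, double quotes, and Newlines are assumptions made for illustration.

```python
import re

# Assumption: "looks like an integer" = optional sign followed by digits.
INT_RE = re.compile(r"[+-]?[0-9]+")
# Names that are not legal C identifiers are silently skipped, as with -S.
C_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def dump_cpp(symbols):
    """Render a {name: value} dict as #define lines (hypothetical helper)."""
    lines = []
    for name, value in symbols.items():
        if not C_IDENT_RE.fullmatch(name):
            continue                                  # not a legal C identifier
        if value == "":
            lines.append(f"#define {name}")           # empty value: no value
        elif INT_RE.fullmatch(value):
            lines.append(f"#define {name} {value}")   # integer-like: no quotes
        else:
            # Everything else becomes a C string literal (escaping assumed).
            escaped = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            lines.append(f'#define {name} "{escaped}"')
    return "\n".join(lines)
```

For example, dump_cpp({"N": "42", "E": "", "S": 'say "hi"'}) yields an unquoted N, a valueless E, and S as a quoted C string.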
Any occurrence of the character ``@'' triggers input processing. Symbol
names are made up of characters from the set [a-zA-Z_0-9] and must not begin with a digit.
Symbol names consisting entirely of digits are reserved for numbered function
parameters.
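The naming rules above can be expressed as a small Python check. classify_name is a hypothetical helper, not part of pp, and it covers only the plain @symbol form; names written inside @(...) are not restricted unless -l is used.

```python
import re

# Characters allowed in a plain symbol name.
NAME_RE = re.compile(r"[a-zA-Z_0-9]+")


def classify_name(name):
    """Return "symbol", "parameter" or "invalid" (hypothetical helper)."""
    if not NAME_RE.fullmatch(name):
        return "invalid"      # empty, or contains a character outside the set
    if name.isdigit():
        return "parameter"    # all digits: reserved for numbered parameters
    if name[0].isdigit():
        return "invalid"      # must not begin with a digit
    return "symbol"
```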
All balanced constructs, such as quoted input or a symbol definition, must
be located in the same input source (file or function).
- @@
-
Outputs a single ``@'' character.
- @ newline
-
Removes the ``@'' and Newline from the input, thus causing line
continuation.
- @# comment ...
-
All characters before the next Newline are removed from the input.
- @*
-
Expands to the list of variable arguments (but not the named ones; see
@function below), separated by spaces. If an argument in the argument list of a
defined function or of the @for construct consists only of ``@*'', the variable
arguments are passed on as several separate arguments.
- @< ... >@
-
Quoted input. All characters between ``@<'' and the next occurrence of ``>@'' will be copied to the output with no further processing. Quotes do not
nest.
- @symbol
-
Macro expansion. The value of symbol is written to the output. Produces an error if the symbol is undefined.
- @function(argument list)
-
Call a defined or built-in function.
- @?symbol
-
Expands to ``1'' if symbol is defined and to ``0'' if it is not. This is intended primarily for use in
expressions, as described under the
@if construct below.
- @+symbol
-
Expands to the value of symbol if it is defined; otherwise, expands to an empty string. This is equivalent
to @ifdef(symbol, @symbol).
- @;
-
Skips any horizontal or vertical whitespace that follows.
- @.
-
Stops macro processing. The rest of the current input file is copied
unchanged.
To delimit a symbol name from the surrounding input, enclose it in
parentheses, for example, @foo can be written as
@(foo). The string inside the parentheses is subject to expansion. Unless the -l option has been used to enforce symbol name checking, the string inside the
parentheses can contain any characters.
Lines starting with a ``@'' are handled in a special way. Any number of
horizontal whitespace (spaces and tabs) can be included between the ``@''
and any characters that follow it; the whitespace is ignored. Also, if the
line produces no output, the terminating Newline character is removed.
Note that the definition of ``no output'' may not be what you expect. For
example, the expansion of a symbol, the value of which is a zero-length
string, counts as output.
An argument list starts with a ``('' character and ends with the next matching ``)''
character. Nested parentheses in the list are allowed. Arguments are
separated by commas ``,'' outside of any nested parentheses. Leading
whitespace (vertical or horizontal) is stripped from the arguments; any
other characters, including non-leading whitespace, are considered part of
each argument. Horizontal whitespace can separate the argument list from
the function name. Argument lists are evaluated (in some cases only partially;
see the description of @if below).
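The argument splitting rules above can be sketched in Python. split_args is a hypothetical helper that receives the text between the outer parentheses; it ignores quoted input and @-expansion, which pp itself would also have to handle.

```python
def split_args(text):
    """Split an argument list at commas that are outside nested parentheses.

    Leading whitespace (spaces, tabs, Newlines) is stripped from each
    argument; trailing and inner whitespace is kept.
    """
    args, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).lstrip())   # end of one argument
            current = []
        else:
            current.append(ch)
    args.append("".join(current).lstrip())           # last argument
    return args
```

Note that split_args("") returns one empty argument, which mirrors the limitation that a call without arguments cannot be distinguished from a call with a single empty argument.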
- @include([options,]file)
-
Read input from file. Files are located using the search path defined on the command line. The
current directory is searched last although it may have been included in
the search path on the command line. File names that begin with ``./'',
``../'', or ``/'' bypass the search path. The following options may precede
file; they are separated from each other and from file by commas:
- -noerror
-
Suppress the error if the file cannot be found or opened.
- -nooutput
-
Include the file in no-output mode. Only definitions in the file will take
effect and no text will be included. This is the same as the -i command line option.
- -verbatim
-
Include the file unchanged and without evaluation.
- -string, string
-
If the file cannot be opened, the command expands to string, the argument
following this option, instead.
- -or
-
Any remaining arguments after this option will be used as filenames; the
first file that can be opened will be used for inclusion.
- -path
-
Immediately append any remaining arguments to the file search path. No file
is included. If a path is specified as ``-'', the entire search path will
be flushed. This is equivalent to using the -I command line option.
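The file lookup order described for @include can be sketched in Python. resolve_include is a hypothetical helper: search_path stands for the -I directories in command-line order, the exists parameter is injectable so the logic can be tried without real files, and the -noerror, -or, and -string options are not modeled.

```python
import os


def resolve_include(name, search_path, exists=os.path.isfile):
    """Return the path that would be opened for @include(name), or None."""
    # Names starting with ./, ../ or / bypass the search path entirely.
    if name.startswith(("./", "../", "/")):
        return name if exists(name) else None
    # -I directories in order, then the current directory last.
    for directory in list(search_path) + ["."]:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```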
- @if
-
Conditional input. A maximum of one branch is read as input. The syntax of
an if construct is as follows:
@if(expression)
any characters ...
[ @elseif(expression) ]
any characters ...
[ @else ]
any characters ...
@endif
You can specify any number of @elseif parts and a maximum of one
@else.
Expressions can contain the binary operators &&, ||, ==, >=, <=, >, and <,
and the unary operator !, with the same meaning and order of precedence as in C.
Parentheses are used for grouping. Evaluation is lazy.
Values that look like integers compare as integers, other values compare as
strings. To force a value to be handled as a string, enclose it in double
quotes (``"''). This is also necessary if the value contains whitespace.
A numeric value of zero is interpreted as false, any non-zero value is
interpreted as true. Expressions evaluate to ``1'' or ``0''. The unary
negation operator will not operate on a string value.
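The comparison rules for @if expressions can be illustrated with a Python sketch. The helper names are hypothetical; ``looks like an integer'' is assumed to mean an optional sign followed by digits, and comparing an integer with a string is assumed to fall back to string comparison, which the text above does not specify.

```python
import operator
import re

INT_RE = re.compile(r"[+-]?[0-9]+")
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}


def operand(token):
    """Classify one operand as an int or a str."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]        # double quotes force string handling
    if INT_RE.fullmatch(token):
        return int(token)         # integer-like: numeric comparison
    return token


def compare(left, op, right):
    """Evaluate `left op right` and return "1" or "0" like a pp expression."""
    a, b = operand(left), operand(right)
    if type(a) is not type(b):
        a, b = str(a), str(b)     # assumption: mixed operands compare as strings
    return "1" if OPS[op](a, b) else "0"
```

With these rules, 10 > 9 is true, while "10" > "9" is false because the quoted values compare as strings.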
- @if(expression, true-part [,false-part])
-
Conditional input. Depending on expression, only one of the second argument
and the optional third argument is read as input.
- @ifdef(symbol) and @ifndef(symbol)
-
These are equivalent to @if(@?symbol) and @if(!@?symbol)
respectively. They can be used anywhere that
@if
can be used, but they take a single symbol name instead of an expression,
and choose a branch depending on whether or not the symbol is defined.
- @define(symbol)
-
Define a symbol. The syntax is as follows:
@define(symbol)
any characters ...
@enddef
Any characters between @define and @enddef are evaluated and the result is stored as the new value of symbol. Previously undefined symbols are always stored in the global symbol
table.
- @define(symbol, value)
-
Shorter form of symbol definition.
- @undef([symbol, ...])
-
Undefine symbols from the local or the global symbol table. Built-in
functions cannot be undefined. It is not an error to undefine symbols that
are not defined.
- @function(name [,arg ...][,...])
-
Define a function. The syntax is as follows:
@function(name [,arg ...][,...])
any characters ...
@endfun
Characters between @function and @endfun are read without being evaluated
and are stored as the new definition of function name.
If named arguments appear in the definition, the argument values supplied
in a call are assigned to those names by position. The number of arguments
used when calling the function must agree with the function's definition.
The parameters are locally defined during evaluation of the function and
shadow any global symbols with the same names.
If the last (or only) argument is ``...'', the function takes a
variable number of additional arguments. These can be referred to as
@1 through @n, and the number of additional arguments, n, is stored in @0.
A reference to an undefined extra argument always returns an empty
string, even though a test for its existence returns false.
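The argument binding described above can be sketched in Python. bind_args is a hypothetical helper, and the numbering of the extra arguments (named parameters first, then @1 through @n for the remainder) is an assumption based on this description.

```python
def bind_args(params, actual):
    """Build the local symbol table for one call of a defined function.

    params: names after the function name in @function(...), possibly
            ending in "...".
    actual: argument strings supplied by the caller.
    """
    variadic = bool(params) and params[-1] == "..."
    named = params[:-1] if variadic else params
    if len(actual) < len(named) or (not variadic and len(actual) != len(named)):
        raise ValueError("argument count does not match the definition")
    local = dict(zip(named, actual))                # named args, by position
    if variadic:
        extra = actual[len(named):]
        local["0"] = str(len(extra))                # @0 holds n
        for i, value in enumerate(extra, start=1):  # @1 .. @n
            local[str(i)] = value
    return local
```

A lookup such as local.get("5", "") then models the rule that an undefined extra argument expands to an empty string.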
- @local([symbol ...])
-
Introduces a number of new symbols in the local symbol table of the current
function. Further references or new definitions of these symbols will be
local. If used in the global scope, @local has no effect. Local variables disappear when a function returns.
- @for(name [, arg ...])
-
Loop construct. The syntax is as follows:
@for(name [, arg ...])
any characters ...
@endfor
The characters up to @endfor are evaluated once for each argument, with symbol name set to the value of the current argument.
When used inside a function, name is defined locally; otherwise, name
is defined globally, overwriting any previous definition.
- @splitfor(name, string [, splitchars])
-
Like @for, but splits its second argument at whitespace or at occurrences of one or
more characters from
splitchars, then iterates over the resulting list, setting symbol name to each value in turn.
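The list that @splitfor iterates over can be approximated in Python. split_for is a hypothetical helper; it assumes that whitespace is used only when splitchars is absent and that empty fields are dropped, neither of which is stated explicitly above.

```python
import re


def split_for(string, splitchars=None):
    """Return the values @splitfor would assign to name, in order (sketch)."""
    if not splitchars:
        return string.split()                  # assumption: default is whitespace
    # Split at runs of one or more of the given characters.
    pattern = "[" + re.escape(splitchars) + "]+"
    return [field for field in re.split(pattern, string) if field]
```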
- @sum(number [, number ...])
-
Returns the sum of its arguments, interpreted as signed integers.
Non-numeric arguments are silently ignored.
- @length(string)
-
Expands to the length of its argument.
- @substr(start, length, string)
-
Returns the substring of string that starts at index start and is length characters long. The first character of the string has index 0.
If either start or length begins with a ``-'' character, it refers to a position
relative to the end of the string.
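One reading of the negative-value rule can be sketched in Python. substr is a hypothetical helper that takes integers rather than strings; it assumes a negative start counts back from the end and a negative length ends the substring that many characters before the end. Out-of-range values are clipped, which is also an assumption.

```python
def substr(start, length, string):
    """Sketch of @substr(start, length, string) under the stated reading."""
    n = len(string)
    begin = n + start if start < 0 else start
    end = n + length if length < 0 else begin + length
    begin = max(0, min(begin, n))       # clip to the string
    end = max(begin, min(end, n))
    return string[begin:end]
```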
- @index(bigstring, substring [, index])
-
Returns the position, counted from 0, of the indexth occurrence of substring
within bigstring.
If no index is specified, the first occurrence is located. A negative index is
added to the number of occurrences of substring, so that -1 selects the last
occurrence. If no match is found, an empty string is returned.
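The index arithmetic can be sketched in Python. pp_index is a hypothetical helper; it assumes occurrences are numbered from 0 (consistent with -1 selecting the last one) and that overlapping occurrences count, which the text does not specify.

```python
def pp_index(bigstring, substring, index=0):
    """Position of occurrence number `index` of substring, or "" if none."""
    positions = []
    pos = bigstring.find(substring)
    while pos != -1:                      # collect every (overlapping) match
        positions.append(pos)
        pos = bigstring.find(substring, pos + 1)
    if index < 0:
        index += len(positions)           # -1 selects the last occurrence
    if 0 <= index < len(positions):
        return str(positions[index])
    return ""                             # no such occurrence
```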
- @shift(value)
-
Right-shifts the list of variable function arguments by value
positions. Does not affect named arguments.
- @syscmd([flags,]command)
-
Execute command with the command interpreter and expand to the output from
the command. The standard implementation of @syscmd uses popen(3) and is thus
limited by the behavior of that function. The optional, comma-separated flags
can be a combination of the following:
- nl
-
Ensure the output is terminated with a Newline. A Newline is added if
necessary.
- -nl
-
Remove the last Newline from the output, if the last character is a Newline.
- number
-
Read at most number lines of output from the command.
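The effect of the @syscmd flags on captured output can be sketched in Python. apply_syscmd_flags is a hypothetical helper that works on output already read from the command; applying the line limit before the Newline adjustment is an assumption.

```python
def apply_syscmd_flags(output, flags):
    """Post-process command output according to @syscmd flags (sketch).

    flags is a list such as ["nl"], ["-nl"] or ["2", "nl"].
    """
    for flag in flags:
        if flag.isdigit():                       # number: line limit
            lines = output.splitlines(keepends=True)
            output = "".join(lines[:int(flag)])
    if "nl" in flags and not output.endswith("\n"):
        output += "\n"                           # ensure a final Newline
    if "-nl" in flags and output.endswith("\n"):
        output = output[:-1]                     # drop one final Newline
    return output
```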
- @tr(fromchars, tochars, string)
-
Translate characters in string. Any character occurring in
fromchars will be mapped to its positional counterpart in
tochars.
If tochars is shorter than fromchars, characters with no mapping are removed.
Both strings can contain ranges of characters in ASCII order (for example,
a-z or 0-9); ranges are expanded before translation. To include a ``-''
character in a string, specify it first. If the last character in tochars is
followed by an asterisk ``*'', tochars is filled out to the length of
fromchars with the character that precedes the asterisk.
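The translation rules can be sketched in Python. expand_ranges and tr are hypothetical helpers, not pp's implementation; when a character appears more than once in fromchars, this sketch uses its first position, which is an assumption.

```python
def expand_ranges(chars):
    """Expand ranges such as a-z; a leading "-" is taken literally."""
    out, i = [], 0
    while i < len(chars):
        if i + 2 < len(chars) and chars[i + 1] == "-":
            first, last = ord(chars[i]), ord(chars[i + 2])
            out.extend(chr(c) for c in range(first, last + 1))
            i += 3
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)


def tr(fromchars, tochars, string):
    """Sketch of @tr(fromchars, tochars, string)."""
    src = expand_ranges(fromchars)
    if len(tochars) >= 2 and tochars.endswith("*"):
        dst = expand_ranges(tochars[:-1])
        dst += dst[-1] * max(0, len(src) - len(dst))   # pad with last char
    else:
        dst = expand_ranges(tochars)
    out = []
    for ch in string:
        i = src.find(ch)
        if i < 0:
            out.append(ch)          # not in fromchars: unchanged
        elif i < len(dst):
            out.append(dst[i])      # mapped to its positional counterpart
        # otherwise there is no mapping and the character is removed
    return "".join(out)
```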
- @message([filename,] string)
-
Print string to the standard error output or, if filename is given, append it
to that file. If filename is ``-'', the message is written to the standard output.
- @exit(code [, message [, lines]])
-
Exit immediately from pp with exit code code. If called with a
message, print message as an error message, followed by the usual error
location trace output or, if lines is given, by at most lines lines of trace
output. If code is non-numeric, it is handled as 1.
The following variables are predefined in the global symbol table. Variable
names that begin with pp followed by an uppercase letter are reserved for use in future versions of pp. Built-in variables cannot be redefined or
undefined.
- @ppPid
-
The process id of the current pp process.
- @ppFile
-
The name of the current input file.
- @ppLine
-
The number of the current line in the current input file.
- @ppVersion
-
The version of pp.
- @ppGlobals
-
A space-separated list of the names of all user-defined global variables.
- @ppFunctions
-
A space-separated list of the names of all user-defined functions.
Line numbers reported in error messages may be offset by one.
The syntax cannot distinguish between a function called without arguments
and a function called with a single argument that is an empty string. This
means that functions that take a variable number of arguments will always
receive at least one.
The semantics of @* are less than strict.
Some of the syntactic idiosyncrasies are intended to be practical when
dealing with input that is structured in some way, for example, source code
for other programs.
The -S option accepts any value for type and silently treats any value it does not recognize as pp, thus producing pp symbol definitions as output.
cpp(1), m4(1), popen(3)