NAME

pp - preprocessor for text files

SYNOPSIS

pp [ options ] input files...

DESCRIPTION

pp reads one or more input files and produces an output stream. Input is copied to the output stream unchanged until the special character ``@'' is encountered. Input processing then takes place, such as macro expansion, file inclusion, or conditional output, as described below.

pp can preprocess only text files, because it cannot handle null (zero-valued) bytes.

OPTIONS

If input files are specified on the command line, they are read as input. If no files are given as arguments, standard input is read. If a single ``-'' character is given as the file name, standard input is read at that point.

Most arguments are scanned and executed in the order in which they appear on the command line.

-I directory
Append directory to the search path used to locate files with the @include command. This argument can be used several times. The current directory is appended to the search path after all options have been read. See the description of the @include command.

-I -
Flush the directory search path.

-D symbol[=value]
Immediately define a symbol, with or without a value. This argument can appear several times.

-U symbol
Immediately undefine a symbol. It is not an error to undefine a symbol that is not defined. This argument can appear several times.

-o outputfile
Write output to outputfile rather than to the standard output.

-n
Suppress output.

-N
Do not read from the standard input when no filenames are present on the command line.

-i filename
Immediately read filename as input without producing output. Definitions in the file will take effect. This argument can appear several times. The current directory is not part of the include path at this point.

-c string
Use string as input when all other options have been parsed, but before reading any input files. This argument can appear several times.

-s N
Turn sccs exception mode on or off depending on the numeric value of N. A zero value turns sccs recognition off; a non-zero value turns it on. Enabling exception mode lets the string ``@(#)'' pass through unaffected. Exception mode is enabled by default.

-S type
After all input has been read, dump the symbol table onto the standard output (regardless of the -n or -o flags) in the format specified by the type argument.

Functions are not dumped unless the pp,fun type is used. Symbol names that are not legal identifiers in the target language are silently discarded. Type should be one of the following:

pp
Dump in a format suitable as input to pp.

pp,fun
Dump in a format suitable as input to pp and dump function definitions.

cpp
Dump in a format suitable as input to the C preprocessor. Symbols that look like integers are defined without quotes; symbols with empty values are given no value. All other symbols are defined as C strings.

tcl
Dump Tcl variable definitions.

sh
Dump Bourne shell variable definitions.
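
As an illustration of the cpp rules above, the following Python sketch models how a symbol table might be turned into #define lines. It is not the pp implementation; the function name dump_cpp, the integer pattern, and the string escaping are assumptions made for this example.

```python
import re

def dump_cpp(symbols):
    """Return #define lines for a dict mapping symbol names to string values."""
    lines = []
    for name, value in symbols.items():
        # Assumption: a "legal identifier" for cpp is a C identifier.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z_0-9]*", name):
            continue  # silently discarded, as the text describes
        if value == "":
            # Symbols with empty values are given no value.
            lines.append(f"#define {name}")
        elif re.fullmatch(r"[+-]?[0-9]+", value):
            # Assumption: "looks like an integer" means optional sign plus digits.
            lines.append(f"#define {name} {value}")
        else:
            # Everything else becomes a C string (escaping is an assumption).
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'#define {name} "{escaped}"')
    return lines

for line in dump_cpp({"DEBUG": "", "LEVEL": "3", "NAME": "a b", "bad-name": "x"}):
    print(line)
```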

-r N
Limit the number of recursive calls to the same function to N. Since pp supports almost no arithmetic operations or list manipulation, any deep recursion is probably unintentional. By default, execution will abort at the 100th call to a defined function. A value equal to or less than zero removes this limitation. Built-in functions are not affected. Recursive inclusion of files is limited by the number of files that a program may keep open simultaneously.

-x c
Modify the behavior when executing commands with @syscmd(), described below. If the single character c is ``v,'' commands will be printed to stderr before execution; if the single character is ``q,'' commands will not be printed to stderr (this is the default).

If c is ``n,'' command execution will be suppressed. The ``f'' flag forces pp to continue even when commands executed with @syscmd return a non-zero exit status.

-d level
Set debug level.

-l
Accept only ``legal'' identifiers. See the Input Syntax section.

-E
Causes references to undefined symbols to import values from the environment, if definitions exist there.

INPUT SYNTAX

Any occurrence of the character ``@'' triggers input processing. Symbol names are made up of characters from the set [a-zA-Z_0-9] and must begin with a non-numeric. Symbol names starting with (and consisting entirely of) numerics are reserved for numbered function parameters.

All balanced constructs, such as quoted input or a symbol definition, must be located in the same input source (file or function).

@@
Outputs a single ``@'' character.

@ newline
Removes the ``@'' and Newline from the input, thus causing line continuation.

@# comment ...
All characters before the next Newline are removed from the input.

@*
Expands to the list of variable arguments (but not those named, see @function below), separated by spaces. If used as an argument (consisting only of ``@*'') in an argument list to a defined function or to the @for construct, this symbol propagates the list of variable arguments as several arguments.

@< ... >@
Quoted input. All characters between ``@<'' and the next occurrence of ``>@'' will be copied to the output with no further processing. Quotes do not nest.

@symbol
Macro expansion. The value of symbol is written to the output. Produces an error if the symbol is undefined.

@function(argument list)
Call a defined or built-in function.

@?symbol
Expands to ``1'' if symbol is defined and to ``0'' if it is not. This is intended primarily for use in expressions, as described under the @if construct below.

@+symbol
Expands to the value of symbol if it is defined, otherwise expands to an empty string. This is equivalent to @ifdef(symbol, @symbol).

@;
Skips any horizontal or vertical whitespace that follows.

@.
Stops macro processing. The rest of the current input file is copied unchanged.

To delimit a symbol name from the surrounding input, enclose it in parentheses; for example, @foo can be written as @(foo). The string inside the parentheses is subject to expansion. Unless the -l option has been used to enforce symbol name checking, the string inside the parentheses can contain any characters.

Lines starting with a ``@'' are handled in a special way. Any amount of horizontal whitespace (spaces and tabs) can be included between the ``@'' and any characters that follow it; the whitespace is ignored. Also, if the line produces no output, the terminating Newline character is removed. Observe that the definition of ``no output'' may not be as expected. For example, the expansion of a symbol whose value is a zero-length string counts as output.

An argument list starts with a ``('' character and ends with the matching ``)'' character. Nested parentheses in the list are allowed. Arguments are separated by commas ``,'' outside of any nested parentheses. Leading whitespace (vertical or horizontal) is stripped from the arguments; any other characters, including non-leading whitespace, are considered part of each argument. Horizontal whitespace can separate the argument list from the function name. Argument lists are evaluated, at least partially (see the description of @if below).
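
The splitting rule can be modeled with the following Python sketch. It is an illustration only: the function name split_arguments is hypothetical, the input is assumed to be the text between the outer parentheses, and the evaluation of ``@'' constructs and quoted input is not modeled.

```python
def split_arguments(text):
    """Split the text between the outer parentheses into arguments."""
    args, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside nested parentheses ends the current argument.
            args.append("".join(current))
            current = []
        else:
            current.append(ch)
    args.append("".join(current))
    # Only leading whitespace is stripped; trailing whitespace is kept.
    return [a.lstrip(" \t\r\n") for a in args]

print(split_arguments("a, f(b, c) ,  d"))
```

Under this model an empty argument list yields a single empty argument, which matches the limitation noted in the BUGS section.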

BUILT-IN FUNCTIONS AND CONSTRUCTS

@include([options,]file)
Read input from file. Files are located using the search path defined on the command line. The current directory is searched last, even if it has also been included in the search path on the command line. File names that begin with ``./'', ``../'', or ``/'' bypass the search path. The options, which are separated from each other by commas, are:

-noerror
Suppress the error if the file cannot be found or opened.

-nooutput
Include the file in no-output mode. Only definitions in the file will take effect and no text will be included. This is the same as the -i command line option.

-verbatim
Include the file unchanged and without evaluation.

-string, string
If the file cannot be opened, the command expands to string instead. string is the argument after the option.

-or
Any remaining arguments after this option will be used as filenames; the first file that can be opened will be used for inclusion.

-path
Immediately append any remaining arguments to the file search path. No file is included. If a path is specified as ``-'', the entire search path will be flushed. This is equivalent to using the -I command line option.

@if
Conditional input. A maximum of one branch is read as input. The syntax of an if construct is as follows:

@if(expression)

any characters ...

[@elseif(expression)]

any characters ...

[ @else ]

any characters ...

@endif

You can specify any number of @elseif parts and a maximum of one @else.

Expressions can contain the binary operators &&, ||, ==, >=, <=, >, <, or a unary !, with the meaning and order of precedence usually used, for example, in C. Parentheses are used for grouping. Evaluation is lazy.

Values that look like integers compare as integers; other values compare as strings. To force a value to be handled as a string, enclose it in double quotes ``"''. This is also necessary if the value contains whitespace.

A numeric value of zero is interpreted as false, any non-zero value is interpreted as true. Expressions evaluate to ``1'' or ``0''. The unary negation operator will not operate on a string value.
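
The difference between integer and string comparison can be illustrated with a small Python model. This is a sketch under stated assumptions, not pp's evaluator: the names operand and less_than are hypothetical, "looks like an integer" is assumed to mean an optional sign followed by digits, and a comparison between an integer and a string is assumed to fall back to string comparison.

```python
import re

def operand(token):
    """Return an int for integer-looking tokens, otherwise a str."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # quoted values are always handled as strings
    if re.fullmatch(r"[+-]?[0-9]+", token):
        return int(token)
    return token

def less_than(left, right):
    """Model the < operator; the result is "1" or "0"."""
    x, y = operand(left), operand(right)
    if isinstance(x, int) and isinstance(y, int):
        return "1" if x < y else "0"
    # Assumption: any non-integer operand forces a string comparison.
    return "1" if str(x) < str(y) else "0"

print(less_than("9", "10"))      # integers: 9 < 10
print(less_than('"9"', '"10"'))  # strings: "9" sorts after "10"
```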

@if(expression, true-part [,false-part])
Conditional input. At most one of the second argument and the (optional) third argument is read as input.

@ifdef(symbol) and @ifndef(symbol)
These are equivalent to @if(@?symbol) and @if(!@?symbol) respectively. They can be used anywhere that @if can be used, but they take a single symbol name instead of an expression, and choose a branch depending on whether or not the symbol is defined.

@define(symbol)
Define a symbol. The syntax is as follows:

@define(symbol)

any characters ...

@enddef

Any characters between @define and @enddef are evaluated and the result is stored as the new value of symbol. Previously undefined symbols are always stored in the global symbol table.

@define(symbol, value)
Shorter form of symbol definition.

@undef([symbol, ...])
Undefine symbols from the local or the global symbol table. Built-in functions cannot be undefined. It is not an error to undefine symbols that are not defined.

@function(name [,arg ...][,...])
Define a function. The syntax is as follows:

@function(name [,arg ...][,...])

any characters ...

@endfun

Characters between @function and @endfun are read, but not evaluated, and are stored as the new definition of the function name.

If named arguments appear in the definition, argument values will be assigned to those names by position upon invocation. The number of arguments used when calling the function must agree with the function's definition.

The parameters are locally defined during evaluation of the function and shadow any global symbols with the same names.

If the last (or only) argument is ``...'', the function will take a variable number of additional arguments. These can be referred to as @1 through @n, with the number of additional arguments, n, stored as @0. A reference to an undefined extra argument always returns an empty string, even though a test for its existence returns false.

@local([symbol ...])
Introduces a number of new symbols in the local symbol table of the current function. Further references or new definitions of these symbols will be local. If used in the global scope, @local has no effect. Local variables disappear when a function returns.

@for(name [, arg ...])
Loop construct. The syntax is as follows:

@for(name [, arg ...])

any characters ...

@endfor

Parses any characters up to @endfor once for each argument, letting symbol name contain the value of the current argument. When used inside a function, name will be defined locally; otherwise, name is defined globally, overwriting any previous definition.

@splitfor(name, string [, splitchars])
Operates like @for, but splits its second argument at whitespace or at occurrences of one or more characters from splitchars, then iterates over the resulting list, setting symbol name to each of the resulting values in turn.

@sum(number [, number ...])
Returns the sum of its arguments, interpreted as signed integers. Non-numeric arguments are silently ignored.
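
The behavior described for @sum corresponds roughly to the following Python sketch. The name pp_sum is hypothetical, and the exact test for a numeric argument (optional sign, digits, surrounding whitespace allowed) is an assumption.

```python
import re

def pp_sum(*args):
    """Add up integer-looking arguments and ignore the rest."""
    total = 0
    for arg in args:
        # Assumption: a numeric argument is an optionally signed run of digits.
        if re.fullmatch(r"\s*[+-]?[0-9]+\s*", arg):
            total += int(arg)
    return str(total)

print(pp_sum("1", "2", "x", "-4"))
```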

@length(string)
Expands to the length of its argument.

@substr(start, length, string)
Returns the substring of string that starts at index start and continues for length characters. The first character of the string has index 0.

If either the start or the length parameter begins with a ``-'' character, it refers to a position that is relative to the end of the string.
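
A Python sketch of the indexing rule follows. It is one possible reading, not pp's implementation: the name pp_substr is hypothetical, a negative start is assumed to count back from the end of the string, and the negative-length case is left out because the text does not pin down its exact meaning.

```python
def pp_substr(start, length, string):
    """Model @substr for a non-negative length."""
    if start < 0:
        # Assumption: -1 refers to the last character, -2 to the one before it.
        start = max(len(string) + start, 0)
    return string[start:start + length]

print(pp_substr(3, 7, "preprocessor"))
print(pp_substr(-3, 3, "preprocessor"))
```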

@index(bigstring, substring [, index])
Returns the index of the indexth occurrence of substring within bigstring, starting from 0.

If no index is specified, the first occurrence is located. Negative indices are subtracted from the number of occurrences of substring, so that -1 returns the last occurrence. If no match is found, an empty string is returned.
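
The occurrence counting can be modeled in Python as shown below. The sketch assumes that both the occurrence number and the returned position are zero-based, and that occurrences do not overlap; neither detail is stated explicitly above. The name pp_index is hypothetical.

```python
def pp_index(bigstring, substring, index=0):
    """Return the position of the chosen occurrence as a string, or ""."""
    if not substring:
        return ""  # assumption: an empty substring never matches
    positions = []
    pos = bigstring.find(substring)
    while pos != -1:
        positions.append(pos)
        # Assumption: the search resumes after the previous match.
        pos = bigstring.find(substring, pos + len(substring))
    if index < 0:
        # Negative indices are subtracted from the number of occurrences.
        index += len(positions)
    if 0 <= index < len(positions):
        return str(positions[index])
    return ""

print(pp_index("a.b.c", "."))
print(pp_index("a.b.c", ".", -1))
```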

@shift(value)
Right-shifts the list of variable function arguments by value positions. Does not affect named arguments.

@syscmd([flags,]command)
Execute command in the command interpreter. The standard implementation of @syscmd uses popen and is thus limited by the behavior of that function. Expands to the output from the command. The optional, comma-separated flags can be a combination of the following:

nl
Ensure the output is terminated with a Newline. A Newline is added if necessary.

-nl
Remove the last Newline from the output, if the last character is a Newline.

number
Specify the largest number of lines to be read from the command.

@tr(fromchars, tochars, string)
Translate characters in string. Any character occurring in fromchars will be mapped to its positional counterpart in tochars.

If tochars is shorter than fromchars, characters with no mapping will be removed. Both strings can contain ranges (for example, a-z or 0-9) of arbitrary characters in ASCII order; the ranges are expanded before translation. To include a ``-'' character in a string, specify it first. If the last character in tochars is followed by an asterisk (``*''), tochars is filled out with the character that precedes the asterisk.
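
The translation rules can be sketched in Python as follows. This is a model built from the description only: the names expand_ranges and pp_tr are hypothetical, the first mapping is assumed to win when a character is repeated in fromchars, and the asterisk is assumed to pad tochars to the length of the expanded fromchars.

```python
def expand_ranges(spec):
    """Expand ranges such as a-z; a leading or trailing '-' is literal."""
    out, i = [], 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            out.extend(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)

def pp_tr(fromchars, tochars, string):
    """Model @tr: map, delete unmapped characters, honor a trailing '*'."""
    src = expand_ranges(fromchars)
    if len(tochars) >= 2 and tochars.endswith("*"):
        dst = expand_ranges(tochars[:-1])
        dst += dst[-1] * (len(src) - len(dst))  # pad with the preceding character
    else:
        dst = expand_ranges(tochars)
    table = {}
    for pos, ch in enumerate(src):
        if ch not in table:
            # None marks a character that has no counterpart and is removed.
            table[ch] = dst[pos] if pos < len(dst) else None
    result = []
    for ch in string:
        mapped = table.get(ch, ch)
        if mapped is not None:
            result.append(mapped)
    return "".join(result)

print(pp_tr("a-z", "A-Z", "hello"))
print(pp_tr("0-9", "#*", "tel 555"))
```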

@message([filename,] string)
Print string to the standard error output or append to a file. If the filename is specified as ``-'', the message is sent to the standard output.

@exit(code [, message [, lines]])
Exit immediately from pp with exit code code. If called with a message, print the message as an error message together with the usual error location trace output, limited to a maximum of lines lines of trace output if lines is given. If the code is non-numeric, it is handled as 1.

BUILT-IN VARIABLES

The following variables are predefined in the global symbol table. Variable names that begin with pp and an uppercase letter should be considered to be reserved by pp for use in future versions. Built-in variables cannot be redefined or undefined.

@ppPid
The process id of the current pp process.

@ppFile
The name of the current input file.

@ppLine
The number of the current line in the current input file.

@ppVersion
The version of pp.

@ppGlobals
A space-separated list of the names of all user-defined global variables.

@ppFunctions
A space-separated list of the names of all user-defined functions.

BUGS

Line numbers reported in error messages may be offset by one.

The syntax cannot distinguish between a function called without arguments and a function called with a single argument that is an empty string. This means that functions that take a variable number of arguments will always receive at least one.

The semantics of @* are less than strict.

Some of the syntactic idiosyncrasies are intended to be practical when dealing with input that is structured in some way, for example, source code for other programs.

The -S option accepts any value for type and silently treats any value that it does not recognize as pp, thus producing pp symbols as output.

SEE ALSO

cpp, m4, popen