bibclean(1) prettyprint and syntax check BibTeX and Scribe bibliography data base files

SYNOPSIS

bibclean [ -author ] [ -error-log filename ] [ -help ] [ -? ] [ -init-file filename ] [ -long-field fieldname ] [ -max-width nnn ] [ -[no-]align-equals ] [ -[no-]check-values ] [ -[no-]delete-empty-values ] [ -[no-]file-position ] [ -[no-]fix-font-changes ] [ -[no-]fix-initials ] [ -[no-]fix-names ] [ -[no-]German-style ] [ -[no-]keep-linebreaks ] [ -[no-]keep-parbreaks ] [ -[no-]keep-preamble-spaces ] [ -[no-]keep-spaces ] [ -[no-]keep-string-spaces ] [ -[no-]parbreaks ] [ -[no-]prettyprint ] [ -[no-]print-patterns ] [ -[no-]read-init-files ] [ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ] [ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ] ( <infile | bibfile1 bibfile2 bibfile3 ... ) >outfile

All options can be abbreviated to a unique leading prefix.

An explicit file name of "-" represents standard input; it is assumed if no input files are specified.

DESCRIPTION

bibclean prettyprints input BibTeX files to stdout, and checks the brace balance and bibliography entry syntax as well. It can be used to detect problems in BibTeX files that sometimes confuse even BibTeX itself, and importantly, can be used to normalize the appearance of collections of BibTeX files.

Here is a summary of the formatting actions:

• BibTeX items are formatted into a consistent structure with one field = "value" pair per line, and the initial @ and trailing right brace in column 1.
• Tabs are expanded into blank strings; their use is discouraged because they inhibit portability, and can suffer corruption in electronic mail.
• Long string values are split at a blank and continued onto the next line with leading indentation.
• A single blank line separates adjacent bibliography entries.
• Text outside BibTeX entries is passed through verbatim.
• Outer parentheses around entries are converted to braces.
• Personal names in author and editor field values are normalized to the form "P. D. Q. Bach", from "P.D.Q. Bach" and "Bach, P.D.Q.".
• Hyphen sequences in page numbers are converted to en-dashes.
• Month values are converted to standard BibTeX string abbreviations.
• In titles, sequences of upper-case characters at brace level zero are braced to protect them from being converted to lower-case letters by some bibliography styles.
• CODEN, ISBN (International Standard Book Number) and ISSN (International Standard Serial Number) entry values are examined to verify the checksums of each listed number, and correct ISBN hyphenation is automatically supplied.
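The checksum verification mentioned in the last item can be illustrated with the standard weighted modulo-11 rule used by ten-digit ISBNs and by ISSNs. The following is a minimal sketch in Python, not bibclean's own code; the function names are hypothetical, and CODEN checks, 13-digit ISBNs, and hyphenation repair are not covered.

```python
DIGITS = "0123456789"


def mod11_checksum_ok(number, length):
    """Return True if `number` passes the weighted modulo-11 check.

    Hyphens and spaces are ignored; a final 'X' stands for the value 10.
    """
    chars = [c for c in number.upper() if c not in "- "]
    if len(chars) != length:
        return False
    total = 0
    for position, char in enumerate(chars):
        weight = length - position            # weights run length, ..., 2, 1
        if char in DIGITS:
            value = int(char)
        elif char == "X" and position == length - 1:
            value = 10                        # 'X' is legal only as the check character
        else:
            return False
        total += weight * value
    return total % 11 == 0


def isbn10_ok(isbn):
    """Check a ten-digit ISBN (hypothetical helper name)."""
    return mod11_checksum_ok(isbn, 10)


def issn_ok(issn):
    """Check an eight-digit ISSN (hypothetical helper name)."""
    return mod11_checksum_ok(issn, 8)
```

For instance, isbn10_ok("0-306-40615-2") and issn_ok("0378-5955") both hold, while changing the final digit of either number makes the check fail.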

The standardized format of the output of bibclean facilitates the later application of simple filters, such as bibcheck(1), bibdup(1), bibextract(1), bibindex(1), bibjoin(1), biblabel(1), biblook(1), biborder(1), bibsort(1), citefind(1), and citetags(1), to process the text, and also is the one expected by the GNU Emacs BibTeX support functions.

OPTIONS

Command-line switches may be abbreviated to a unique leading prefix, and letter case is not significant. All options are parsed before any input bibliography files are read, no matter what their order on the command line. Options that correspond to a yes/no setting of a flag have a form with a prefix "no-" to set the flag to no. For such options, the last setting determines the flag value used. This is significant when options are also specified in initialization files (see the INITIALIZATION FILES manual section).

The leading hyphen that distinguishes an option from a filename may be doubled, for compatibility with GNU and POSIX conventions. Thus, -author and --author are equivalent.

To avoid confusion with options, if a filename begins with a hyphen, it must be disguised by a leading absolute or relative directory path, e.g., /tmp/-foo.bib or ./-foo.bib.
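The abbreviation rules above (unique leading prefix, case-insensitive matching, optional doubled hyphen) can be sketched as follows. This is an illustration in Python under stated assumptions, not the program's actual parser: the function name resolve_option is hypothetical, only a handful of option names from the synopsis are listed, and the handling of "no-" forms and of the -? alias is left out.

```python
# A few option names taken from the synopsis; the real program knows many more.
OPTIONS = [
    "author", "error-log", "help", "init-file", "max-width",
    "fix-font-changes", "fix-initials", "fix-names",
    "German-style", "version",
]


def resolve_option(arg, options=OPTIONS):
    """Map a command-line word such as '-vers' to a full option name."""
    if not arg.startswith("-") or arg in ("-", "--"):
        raise ValueError(f"not an option: {arg!r}")
    # A doubled leading hyphen is treated the same as a single one.
    typed = arg[2:] if arg.startswith("--") else arg[1:]
    typed = typed.lower()                      # letter case is not significant
    matches = [name for name in options if name.lower().startswith(typed)]
    exact = [name for name in matches if name.lower() == typed]
    if exact:
        return exact[0]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: {arg!r}")
    raise ValueError(f"ambiguous option {arg!r}: could be {', '.join(matches)}")
```

With this table, -vers and --AUTHOR resolve to version and author, while -fix is rejected as ambiguous because three option names begin with that prefix.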

-author
Display an author credit on the standard error unit, stderr, and then exit with a success return code. Sometimes an executable program is separated from its documentation and source code; this option provides a way to recover from that.
-error-log filename
Redirect stderr to the indicated file, which will then contain all of the error and warning messages. This option is provided for those systems that have difficulty redirecting stderr.
-help or -?
Display a help message on stderr, giving a usage description, similar to this section of the manual pages, and then exit with a success return code.
-init-file filename
Provide an explicit value pattern initialization file. It will be processed after any system-wide and job-wide initialization files, and may override them. It in turn may be overridden by a subsequent file-specific initialization file. For further details, see the INITIALIZATION FILES manual section.
-long-field fieldname
Suppress warnings that fields named fieldname have lengths exceeding the standard BibTeX limits. NB: This is a Debian-specific extension.
-max-width nnn
bibclean normally limits output line widths to 72 characters, and in the interests of consistency, that value should not be changed. Occasionally, special-purpose applications may require different maximum line widths, so this option provides that capability. The number following the option name can be specified in decimal, octal (starting with 0), or hexadecimal (starting with 0x). A zero or negative value is interpreted to mean unlimited, so -max-width 0 can be used to ensure that each field/value pair appears on a single line.
When -no-prettyprint requests bibclean to act as a lexical analyzer, the default line width is unlimited, unless overridden by this option.
When bibclean is prettyprinting, line wrapping will be done only at a space. Consequently, a long non-blank character sequence may result in the output exceeding the requested line width.
When bibclean is lexing, line wrapping is done by inserting a backslash-newline pair when the specified maximum is reached, so no line length will ever exceed the maximum.
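As an illustration of the -max-width rules, the sketch below (Python, hypothetical function names, not bibclean's source) parses a width given in decimal, octal, or hexadecimal, treats zero and negative values as unlimited, and wraps text only at spaces. Continuation-line indentation and the backslash-newline wrapping used in lexing mode are not modeled.

```python
def parse_max_width(text):
    """Return the width as a positive int, or None for 'unlimited'."""
    s = text.strip()
    negative = False
    if s.startswith(("+", "-")):
        negative = s[0] == "-"
        s = s[1:]
    lower = s.lower()
    if lower.startswith("0x"):
        value = int(lower[2:], 16)      # hexadecimal, e.g. 0x48
    elif lower.startswith("0") and len(lower) > 1:
        value = int(lower, 8)           # octal, e.g. 0110
    else:
        value = int(lower, 10)          # decimal, e.g. 72
    if negative:
        value = -value
    return value if value > 0 else None  # zero or negative means unlimited


def wrap_at_spaces(text, width):
    """Greedy wrap that breaks only at spaces; width=None disables wrapping."""
    if width is None:
        return [text]
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and len(candidate) > width:
            lines.append(current)       # close the current line before it overflows
            current = word              # a long word may still exceed the width
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

Here "72", "0110", and "0x48" all parse to the same width, and a single word longer than the limit ends up on a line of its own that exceeds the requested width, as described above.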
-[no-]align-equals
With the positive form, align the equals sign in key/value assignments at the same column, separated by a single space from the value string. Otherwise, the equals sign follows the key, separated by a single space. Default: no.
-[no-]check-values
With the positive form, apply heuristic pattern matching to field values in order to detect possible errors (e.g., year = "192" instead of year = "1992"), and issue warnings when unexpected patterns are found.
This checking is usually beneficial, but if it produces too many bogus warnings for a particular bibliography file, you can disable it with the negative form of this option. Default: yes.
-[no-]delete-empty-values
With the positive form, remove all field/value pairs for which the value is an empty string. This is helpful in cleaning up bibliographies generated from text editor templates. Compare this option with -[no-]remove-OPT-prefixes described below. Default: no.
-[no-]file-position
With the positive form, give detailed file position information in warning and error messages. Default: no.
-[no-]fix-font-changes
With the positive form, supply an additional brace level around font changes in titles to protect against downcasing by some BibTeX styles. Font changes that already have more than one level of braces are not modified.
For example, if a title contains the Latin phrase {\em Dictyostelium Discoideum} or {\em {D}ictyostelium {D}iscoideum}, then downcasing will incorrectly convert the phrase to lower-case letters. Most BibTeX users are surprised that bracing the initial letters does not prevent the downcase action. The correct coding is {{\em Dictyostelium Discoideum}}. However, there are also legitimate cases where an extra level of bracing wrongly protects from downcasing. Consequently, bibclean will normally not supply an extra level of braces, but if you have a bibliography where the extra braces are routinely missing, you can use this option to supply them.
If you think that you need this option, it is strongly recommended that you apply bibclean to your bibliography file with and without -fix-font-changes, then compare the two output files to ensure that extra braces are not being supplied in titles where they should not be present. You will have to decide which of the two output files is the better choice, then repair the incorrect title bracing by hand.
Since font changes in titles are uncommon, except for cases of the type which this option is designed to correct, it should do more good than harm. Default: no.
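The effect of -fix-font-changes can be sketched as a scan that tracks brace depth and wraps only those groups that open at the outermost level and begin with a font command. This Python fragment is an assumption-laden illustration, not the program's code: the list of font commands and the function names are hypothetical, and escaped braces are not handled.

```python
# Assumed set of old-style font switches; the real program's list may differ.
FONT_COMMANDS = ("\\em", "\\it", "\\bf", "\\sl", "\\rm", "\\sc", "\\tt")


def starts_font_change(title, pos):
    """True if a font command (not merely a longer macro name) starts at pos."""
    for cmd in FONT_COMMANDS:
        end = pos + len(cmd)
        if title.startswith(cmd, pos) and not title[end:end + 1].isalpha():
            return True
    return False


def fix_font_changes(title):
    """Add one brace level around outermost groups that start with a font change."""
    out, depth, i = [], 0, 0
    while i < len(title):
        ch = title[i]
        if ch == "{" and depth == 0 and starts_font_change(title, i + 1):
            level, j = 0, i
            while j < len(title):           # locate the matching close brace
                if title[j] == "{":
                    level += 1
                elif title[j] == "}":
                    level -= 1
                    if level == 0:
                        break
                j += 1
            if j < len(title):              # balanced group found: wrap it
                out.append("{" + title[i:j + 1] + "}")
                i = j + 1
                continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        out.append(ch)
        i += 1
    return "".join(out)
```

A group that is already doubly braced, such as {{\em Dictyostelium Discoideum}}, passes through unchanged because its font change does not sit at the outermost brace level.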
-[no-]fix-initials
With the positive form, insert a space after a period following author initials. Default: yes.
-[no-]fix-names
With the positive form, reorder author and editor name lists to remove commas at brace level zero, placing first names or initials before last names. Default: yes.
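Taken together, -fix-initials and -fix-names turn "Bach, P.D.Q." into "P. D. Q. Bach". A simplified Python sketch of that normalization follows; the function names are hypothetical, and names containing braces or more than one comma (for example, those with a "Jr." part) are deliberately left untouched because the source does not say how they are treated.

```python
import re


def fix_initials(name):
    """Insert a space after a period that is directly followed by a letter."""
    return re.sub(r"\.(?=[^\W\d_])", ". ", name)


def fix_name(name):
    """Turn 'Last, First' into 'First Last' when there is exactly one comma."""
    if name.count(",") == 1 and "{" not in name:
        last, first = (part.strip() for part in name.split(","))
        return f"{first} {last}"
    return name                      # braced or multi-comma names: unchanged


def normalize_name_list(value):
    """Normalize each name in a BibTeX 'and'-separated author or editor list."""
    names = re.split(r"\s+and\s+", value.strip())
    return " and ".join(fix_initials(fix_name(n)) for n in names)
```

Both normalize_name_list("Bach, P.D.Q.") and normalize_name_list("P.D.Q. Bach") yield "P. D. Q. Bach".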
-[no-]German-style
With the positive form, interpret quote characters ["] inside braced value strings at brace level 1 according to the conventions of the TeX style file german.sty, which overloads quote to simplify input and representation of German umlaut accents, sharp-s (es-zet), ligature separators, invisible hyphens, raised/lowered quotes, French guillemets, and discretionary hyphens. Recognized character combinations will be braced to prevent BibTeX from interpreting the quote as a string delimiter.
Quoted strings receive no special handling from this option, and since German nouns in titles must anyway be protected from the downcasing operation of most BibTeX bibliography styles, German value strings that use the overloaded quote character can always be entered in the form "{...}", without the need to specify this option at all.
Default: no.
-[no-]keep-linebreaks
Normally, line breaks inside value strings are collapsed into a single space, so that long value strings can later be broken to provide lines of reasonable length.
With the positive form, linebreaks are preserved in value strings. If -max-width is set to zero, this preserves the original line breaks. Spacing outside value strings remains under bibclean's control, and is not affected by this option.
Default: no.
-[no-]keep-parbreaks
With the positive form, preserve paragraph breaks (either formfeeds, or lines containing only spaces) in value strings. Normally, paragraph breaks are collapsed into a single space. Spacing outside value strings remains under bibclean's control, and is not affected by this option. Default: no.
-[no-]keep-preamble-spaces
With the positive form, preserve all whitespace in @Preamble{...} entries. Default: no.
-[no-]keep-spaces
With the positive form, preserve all spaces in value strings. Normally, multiple spaces are collapsed into a single space. This option can be used together with -keep-linebreaks, -keep-parbreaks, and -max-width 0 to preserve the form of value strings while still providing syntax and value checking. Spacing outside value strings remains under bibclean's control, and is not affected by this option. Default: no.
-[no-]keep-string-spaces
With the positive form, preserve all whitespace in @String{...} entries. Default: no.
-[no-]parbreaks
With the negative form, a paragraph break (either a formfeed, or a line containing only spaces) is not permitted in value strings, or between field/value pairs. This may be useful to quickly trap runaway strings arising from mismatched delimiters. Default: yes.
-[no-]prettyprint
Normally, bibclean functions as a prettyprinter. However, with the negative form of this option, it acts as a lexical analyzer instead, producing a stream of lexical tokens. See the LEXICAL ANALYSIS manual section for further details. Default: yes.
-[no-]print-patterns
With the positive form, print the value patterns read from initialization files as they are added to internal tables. Use this option to check newly-added patterns, or to see what patterns are being used.
These patterns are the ones that will be used in checking value strings for valid syntax, and all of them are specified in initialization files, rather than hard-coded into the program. For further details, see the INITIALIZATION FILES manual section. Default: no.
-[no-]read-init-files
With the negative form, suppress loading of system-, user-, and file-specific initialization files. Initializations will come only from those files explicitly given by -init-file filename options. Default: yes.
-[no-]remove-OPT-prefixes
With the positive form, remove the "OPT" prefix from each field name where the corresponding value is not an empty string. The prefix "OPT" must be entirely in upper-case to be recognized.
This option is for bibliographies generated with the help of the GNU Emacs BibTeX editing support, which generates templates with optional fields identified by the "OPT" prefix. Although the function M-x bibtex-remove-OPT, normally bound to the keystrokes C-c C-o, does the job, users often forget, with the result that BibTeX does not recognize the field name, and ignores the value string. Compare this option with -[no-]delete-empty-values described above. Default: no.
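The interplay of -remove-OPT-prefixes and -delete-empty-values can be pictured with a small Python sketch operating on (name, value) pairs. The function and parameter names are hypothetical, and the sketch makes no claim about how the program treats a field whose real name happens to begin with OPT.

```python
def clean_fields(fields, remove_opt_prefixes=False, delete_empty_values=False):
    """Apply the two template clean-up options to a list of (name, value) pairs."""
    result = []
    for name, value in fields:
        if delete_empty_values and value == "":
            continue                              # drop fields left unfilled
        if (remove_opt_prefixes and value != ""
                and name.startswith("OPT") and len(name) > 3):
            name = name[3:]                       # upper-case OPT only
        result.append((name, value))
    return result
```

Given a template entry with OPTvolume = "12" and OPTnote = "", enabling both options leaves a single field, volume = "12".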
-[no-]scribe
With the positive form, accept input syntax conforming to the Scribe document system. The output will be converted to conform to BibTeX syntax. See the SCRIBE BIBLIOGRAPHY FORMAT manual section for further details. Default: no.
-[no-]trace-file-opening
With the positive form, record in the error log file the names of all files which bibclean attempts to open. Use this option to identify where initialization files are located. Default: no.
-[no-]warnings
With the positive form, allow all warning messages. The negative form is not recommended since it may mask problems that should be repaired. Default: yes.
-version
Display the program version number on stderr, and then exit with a success return code. This will also include an indication of who compiled the program, the host name on which it was compiled, the time of compilation, and the type of string-value matching code selected, when that information is available to the compiler.

ERROR RECOVERY AND WARNINGS

When bibclean detects an error, it issues an error message to both stderr and stdout. That way, the user is clearly notified, and the output bibliography also contains the message at the point of error.

Error messages begin with a distinctive pair of queries, ??, beginning in column 1, followed by the input file name and line number. If the -file-position option was specified, they also contain the input and output positions of the current file, entry, and value. Each position includes the file byte number, the line number, and the column number. In the event of a runaway string argument, the entry and value positions should precisely pinpoint the erroneous bibliography entry, and the file positions will indicate where it was detected, which may be rather later in the files.

Warning messages identify possible problems, and are therefore sent only to stderr, and not to stdout, so they never appear in the output file. They are identified by a distinctive pair of percents, %%, beginning in column 1, and as with error messages, may be followed by file position messages if the -file-position option was specified.

For convenience, the first line of each error and warning message sent to stderr is formatted according to the expectations of the GNU Emacs next-error command. You can invoke bibclean with the Emacs M-x compile<RET>bibclean filename.bib >filename.new command, then use the next-error command, normally bound to C-x ` (that's a grave, or back, accent), to move to the location of the error in the input file.

If error messages are ignored, and left in the output bibliography file, they will precipitate an error when the bibliography is next processed with BibTeX.
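Because error messages always start with ?? in column 1, a cleaned file can be checked for leftover messages before it is handed to BibTeX. The Python helper below is a hypothetical sketch that relies only on that prefix; it assumes nothing about the rest of the message text.

```python
def find_error_lines(lines):
    """Return (line_number, text) for every line that starts with '??' in column 1."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if line.startswith("??")
    ]
```

A typical use would be to open the output file, pass the file object to find_error_lines, and refuse to continue if the returned list is not empty.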

After issuing an error message, bibclean then resynchronizes its input by copying it verbatim to stdout until a new bibliography entry is recognized on a line in which the first non-blank character is an at-sign (@). This ensures that nothing is lost from the input file(s), allowing corrections to be made in either the input or the output files. However, if bibclean detects an internal error in its data structures, it will terminate abruptly without further input or output processing; this kind of error should never happen, and if it does, it should be reported immediately to the author of the program. Errors in initialization files, and running out of dynamic memory, will also immediately terminate bibclean.
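The resynchronization step described above amounts to skipping forward to the next line whose first non-blank character is an at-sign, while keeping the skipped lines so that they can be copied to the output unchanged. A minimal Python sketch, with a hypothetical function name, is:

```python
def resynchronize(lines, start):
    """Skip from index `start` to the next line that begins a new entry.

    Returns (skipped_lines, next_index); next_index is len(lines) when no
    further line has '@' as its first non-blank character.
    """
    index = start
    while index < len(lines) and not lines[index].lstrip().startswith("@"):
        index += 1
    return lines[start:index], index
```

The caller would write the skipped lines verbatim to stdout and resume normal parsing at next_index.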

INITIALIZATION FILES

bibclean can be compiled with one of three different types of pattern matching; the choice is made by the installer at compile time:
• The original version uses explicit hand-coded tests of value-string syntax.
• The second version uses regular-expression pattern-matching host library routines together with regular-expression patterns that come entirely from initialization files.
• The third version uses special patterns that come entirely from initialization files.

This Debianized version of bibclean uses the third version. However, command-line options can also be specified in initialization files, no matter which pattern matching choice was selected.