bib2ris-utf8(1) converts BibTeX bibliographic data to the RIS format

Other Alias

bib2ris

SYNOPSIS

bib2ris [-e log-destination] [-h] [-j] [-l log-level] [-L log-file] [-q] [-s separator] [-v] [-y confdir] file
bib2ris-utf8 [-e log-destination] [-h] [-j] [-l log-level] [-L log-file] [-q] [-s separator] [-v] [-y confdir] file

DESCRIPTION

bib2ris converts BibTeX bibliography files into RIS files. LaTeX commands, including non-ASCII characters written as commands, are preserved in the output. Importing the output of bib2ris directly into RefDB is therefore useful only if you use the data exclusively for LaTeX.

bib2ris-utf8 is a variant that converts foreign characters to UTF-8 and strips all other LaTeX commands by means of the refdb_latex2utf8txt(1) tool. Its output is the preferred format for import into RefDB because it is suitable for both LaTeX and SGML/XML bibliographies.

Unfortunately, the concepts underlying BibTeX and RIS bibliographic data are quite different, so BibTeX data do not lend themselves to a clean conversion to the RIS format. This is not meant as an excuse for providing a bad filter, but you should be aware that a few compile-time assumptions have to be made in order to get reasonable results. Because the data models differ considerably, a loss-free round-trip conversion between the two formats is not possible: if you convert a BibTeX bibliography file to RIS and then back, the result will differ considerably from your input.

The following considerations apply to the data import into RefDB and the data export from RefDB:

1. BibTeX input data that are not written in UTF-8, that use formatting commands like font name, weight, or posture specifications, or that use LaTeX commands to write foreign and special characters should always be converted with bib2ris-utf8.

2. BibTeX output data will have the LaTeX command characters properly escaped. The data will use the default encoding of your reference database unless you specifically request a different encoding with the getref command or with the refdbib tool. Keep in mind that recent LaTeX installations can work with UTF-8 data if the preamble contains the following line, which is the easiest way to support all kinds of foreign characters:

\usepackage[utf8]{inputenc}
        

OPTIONS

-e log-destination

log-destination can have the values 0, 1, or 2, or the equivalent strings stderr, syslog, or file. This value specifies where the log information goes. 0 (zero) sends the messages to stderr. They are immediately available on the screen but may interfere with command output. 1 sends the messages to the syslog facility. Keep in mind that syslog must be configured to accept log messages from user programs; see the syslog(8) man page for further information. Unix-like systems usually save these messages in /var/log/user.log. 2 sends the messages to a custom log file, which can be specified with the -L option.

-h

Displays help and usage screen, then exits.

-j

Force bib2ris to use JO RIS fields in all cases. If this option is not used, bib2ris tries to infer whether a journal name is an abbreviation or not. If the string contains at least one period, JO will be used, otherwise JF will be used.

-l log-level

Specify the priority up to which events are logged. This is either a number between 0 and 7 or the corresponding string emerg, alert, crit, err, warning, notice, info, or debug (see also Log level definitions). -1 disables logging completely. A low log level like 0 means that only the most critical messages are logged. A higher log level means that less critical events are logged as well. 7 includes debug messages. These can be verbose and abundant, so avoid this log level unless you need to track down problems.

-L log-file

Specify the full path to a log file that will receive the log messages. Typically this would be /var/log/bib2ris.log.

-q

Start without reading the configuration files. The client will use the compile-time defaults for all values that you do not set with command-line switches.

-s separator

Specify the delimiter which separates individual keywords in a non-standard keyword field. Use the string spc for whitespace-separated lists (spaces and tabs).

-v

Prints version and copyright information, then exits.

-y confdir

Specify the directory where the global configuration files are located. Note: By default, all RefDB applications look for their configuration files in a directory that is specified during the configure step when building the package. That is, you don't need the -y option unless you use precompiled binaries in unusual locations, e.g. by relocating an rpm package.

file

If used, this parameter denotes the names of one or more BibTeX files. If no file is specified, bib2ris tries to read the data from stdin. Output is always sent to stdout.
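
The switches above can be combined freely. The following is a minimal sketch of a Python wrapper that assembles a command line from them; the function names build_argv and convert are hypothetical and not part of RefDB, and convert assumes that the bib2ris binaries are installed and on the PATH.

```python
import subprocess

# Names accepted by -e and -l, as listed in the option descriptions above.
LOG_DESTINATIONS = ("stderr", "syslog", "file")
LOG_LEVELS = ("emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug")


def build_argv(files, utf8=True, log_dest=None, log_level=None,
               log_file=None, force_jo=False, skip_config=False,
               separator=None, confdir=None):
    """Assemble an argument list using only the documented switches."""
    argv = ["bib2ris-utf8" if utf8 else "bib2ris"]
    if log_dest is not None:
        if log_dest not in LOG_DESTINATIONS:
            raise ValueError(f"unknown log destination: {log_dest}")
        argv += ["-e", log_dest]
    if log_level is not None:
        if log_level not in LOG_LEVELS:
            raise ValueError(f"unknown log level: {log_level}")
        argv += ["-l", log_level]
    if log_file is not None:
        argv += ["-L", log_file]
    if force_jo:
        argv.append("-j")
    if skip_config:
        argv.append("-q")
    if separator is not None:
        argv += ["-s", separator]
    if confdir is not None:
        argv += ["-y", confdir]
    # With no files, the converter reads from stdin instead.
    argv += list(files)
    return argv


def convert(files, **options):
    """Run the converter and return (exit_code, ris_text)."""
    result = subprocess.run(build_argv(files, **options),
                            capture_output=True, text=True)
    return result.returncode, result.stdout
```

For example, build_argv(["refs.bib"], log_dest="stderr", force_jo=True) yields the argument list for bib2ris-utf8 -e stderr -j refs.bib, where refs.bib is a placeholder file name.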

DIAGNOSTICS

The exit code of bib2ris indicates what went wrong in general (the details can be found in the log output). The code is the sum of the following error values:

1

general error; includes out of memory situations and invalid command-line options

2

incomplete entry (at least one essential field in an entry was missing)

4

unknown field name

8

unknown publication type

16

invalid BibTeX->RIS type mapping

32

parse error; includes file access errors
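
Because the exit code is a sum of distinct powers of two, each error value can be recovered with a bitwise test. The sketch below illustrates this; the function name decode_exit_code is hypothetical and the messages are shortened versions of the descriptions above.

```python
# Error values from the list above, in ascending order.
EXIT_FLAGS = {
    1: "general error",
    2: "incomplete entry",
    4: "unknown field name",
    8: "unknown publication type",
    16: "invalid BibTeX->RIS type mapping",
    32: "parse error",
}


def decode_exit_code(code):
    """Return the descriptions of all error values contained in code."""
    return [message for flag, message in EXIT_FLAGS.items() if code & flag]


print(decode_exit_code(34))  # -> ['incomplete entry', 'parse error']
```

An exit code of 0 yields an empty list, i.e. none of the listed errors occurred.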

CONFIGURATION

bib2ris evaluates the file bib2risrc to initialize itself.


Table 1. bib2risrc

Variable          Default               Comment
logfile           /var/log/bib2ris.log  The full path of a custom log file. This is used only if logdest is set appropriately.
logdest           1                     The destination of the log information. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile. The latter needs a proper setting of logfile.
loglevel          6                     The log level up to which messages will be sent. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages. -1 means nothing will be logged.
abbrevfirst       t                     If this option is set to "t", the first names of all authors and editors will be abbreviated to the initials. If set to "f", the first names will be used as they are found in the BibTeX bibliography file.
listsep           ;                     This is the delimiter which separates individual keywords in a non-standard keyword field. Use the string "spc" for whitespace-separated lists (spaces and tabs).
forcejabbrev      f                     If this is set to "t", journal names will be wrapped in RIS "JO" entries. If it is set to "f", bib2ris will use "JO" entries only if the journal name contains at least one period, otherwise it will use "JF".
maparticle        JOUR                  Map the BibTeX article publication type to a RIS type.
mapbook           BOOK                  Map the BibTeX book publication type to a RIS type.
mapbooklet        PAMP                  Map the BibTeX booklet publication type to a RIS type.
mapconference     CHAP                  Map the BibTeX conference publication type to a RIS type.
mapinbook         CHAP                  Map the BibTeX inbook publication type to a RIS type.
mapincollection   CHAP                  Map the BibTeX incollection publication type to a RIS type.
mapinproceedings  CHAP                  Map the BibTeX inproceedings publication type to a RIS type.
mapmanual         BOOK                  Map the BibTeX manual publication type to a RIS type.
mapmastersthesis  THES                  Map the BibTeX mastersthesis publication type to a RIS type.
mapmisc           GEN                   Map the BibTeX misc publication type to a RIS type.
mapphdthesis      THES                  Map the BibTeX phdthesis publication type to a RIS type.
mapproceedings    CONF                  Map the BibTeX proceedings publication type to a RIS type.
maptechreport     RPRT                  Map the BibTeX techreport publication type to a RIS type.
mapunpublished    UNPB                  Map the BibTeX unpublished publication type to a RIS type.
nsf_xyz           (none)                You can specify an unlimited number of these entries to map non-standard BibTeX fields to RIS tags. The BibTeX field name in this variable has to be in lowercase, regardless of the case in your input data (bib2ris treats field names as case-insensitive). The two-letter RIS tag has to be in uppercase. E.g. to map your BibTeX "Abstract" field to the RIS "N2" tag, the entry would read: "nsf_abstract N2".
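
To illustrate how the variables in Table 1 interact, here is a rough Python sketch that layers user settings over the defaults and rejects invalid type mappings. It is not the parser used by bib2ris. It assumes one whitespace-separated "variable value" pair per line (inferred from the "nsf_abstract N2" example) and "#" comment lines (an assumption). The VALID_RIS_TYPES set follows the commonly published RIS type list and may differ from the list bib2ris checks against. The name load_config is hypothetical.

```python
# Defaults copied from Table 1.
DEFAULTS = {
    "logfile": "/var/log/bib2ris.log", "logdest": "1", "loglevel": "6",
    "abbrevfirst": "t", "listsep": ";", "forcejabbrev": "f",
    "maparticle": "JOUR", "mapbook": "BOOK", "mapbooklet": "PAMP",
    "mapconference": "CHAP", "mapinbook": "CHAP", "mapincollection": "CHAP",
    "mapinproceedings": "CHAP", "mapmanual": "BOOK",
    "mapmastersthesis": "THES", "mapmisc": "GEN", "mapphdthesis": "THES",
    "mapproceedings": "CONF", "maptechreport": "RPRT",
    "mapunpublished": "UNPB",
}

# Assumption: the commonly published RIS reference types.
VALID_RIS_TYPES = {
    "ABST", "ADVS", "ART", "BILL", "BOOK", "CASE", "CHAP", "COMP", "CONF",
    "CTLG", "DATA", "ELEC", "GEN", "HEAR", "ICOMM", "INPR", "JFULL", "JOUR",
    "MAP", "MGZN", "MPCT", "MUSIC", "NEWS", "PAMP", "PAT", "PCOMM", "RPRT",
    "SER", "SLIDE", "SOUND", "STAT", "THES", "UNBILL", "UNPB", "VIDEO",
}


def load_config(text):
    """Return (settings, field_map, errors) for bib2risrc-style text."""
    settings = dict(DEFAULTS)
    field_map = {}  # non-standard BibTeX field name -> RIS tag
    errors = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            errors.append(f"missing value: {line}")
            continue
        name, value = parts[0], parts[1].strip()
        if name.startswith("nsf_"):
            field_map[name[len("nsf_"):]] = value
        elif name.startswith("map") and value not in VALID_RIS_TYPES:
            # Mirrors the documented behaviour: report an error and
            # keep the default mapping.
            errors.append(f"invalid RIS type {value!r} for {name}")
        else:
            settings[name] = value
    return settings, field_map, errors
```

A line such as "mapmisc XYZ" leaves mapmisc at GEN and adds one error message, while "nsf_abstract N2" adds the pair abstract/N2 to the field map.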

DATA PROCESSING

This section provides a few hints about the data conversion itself and the BibTeX format requirements.

• The parsing of the input data is done by the btparse library. All limitations of that library apply to bib2ris as well. This applies very specifically to two hardcoded settings in btparse which, simply put, limit the size and complexity (in terms of macros) of an input file that btparse can handle. If you run into this kind of problem (I had to pull a 2 MB BibTeX bibliography from the net in order to verify this limit) you should increase the values of NUM_MACROS and STRING_SIZE in the source file macros.c and recompile the btparse library.

• All entry names and field names in the BibTeX input file are treated as case-insensitive, i.e. "BoOk" is the same as "book" and "AUTHOR" is the same as "aUthoR".

• The entries are checked for completeness. An error is generated if an entry lacks fields which are considered essential for the particular publication type.

• Non-standard fields can be imported in addition to the predefined BibTeX fields. In your bib2ris configuration file, create an nsf_ entry for each non-standard BibTeX field name that your input data use. The data are handled differently depending on the RIS field they are imported to. If the data are imported to the RIS fields AD, N1, or N2, which are essentially unlimited in size, all occurrences of these fields are concatenated into a single AD, N1, or N2 tag line, respectively. If the data are mapped to the RIS KW field, the string is tokenized using the list separator specified in the listsep configuration variable, and each token is written as a separate KW tag line. A special case is the RIS pseudo-field "PY.day": data imported to this tag are integrated as the day part of the publication date tag line "PY" (year and month, but not day, are standard BibTeX fields and are recognized by default). All other fields are printed with their requested RIS tag. It is at the discretion of the RIS importing application to decide what to do with duplicate tag lines. Multiple occurrences are allowed for author tags (AU, A2, A3) and the keyword tag (KW). For tags that do not allow multiple occurrences, RefDB uses the last occurrence of the tag line.

• Abbreviated journal names are detected only if they use periods. E.g. "J. Biol. Chem." will be mapped to a "JO" RIS element whereas "J Biol Chem" will be (incorrectly) mapped to a "JF" element ("Journal of Biological Chemistry" would correctly end up here too). Spaces after periods are optional. To capture "J Biol Chem" in a "JO" element, use the -j command line option or the "forcejabbrev" configuration file variable.

• The mapping of BibTeX publication types (book, inproceedings...) to RIS types as specified in the configuration file is checked for valid RIS types. If an invalid RIS type is specified, an error is generated and the compile-time default is used instead.

• The abbrevfirst configuration variable controls whether the first and middle names of authors and editors are abbreviated to initials or used as they are found in the BibTeX bibliography file.
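
Some of the field-level rules in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the bib2ris implementation: the function names are hypothetical, joining concatenated values with a single space is an assumption, trimming whitespace around keyword tokens is an assumption, and the "PY.day" special case is left out.

```python
def journal_tag(name, force_abbrev=False):
    """Pick JO if forced or if the name contains a period, else JF."""
    return "JO" if force_abbrev or "." in name else "JF"


def split_keywords(value, listsep=";"):
    """Tokenize a keyword string; "spc" means any run of spaces or tabs."""
    tokens = value.split() if listsep == "spc" else value.split(listsep)
    return [token.strip() for token in tokens if token.strip()]


# Tags whose occurrences are merged into a single tag line.
CONCATENATED_TAGS = ("AD", "N1", "N2")


def emit_nonstandard(fields, field_map, listsep=";"):
    """Map (bibtex_field, value) pairs to a list of (ris_tag, value)."""
    lines = []
    merged = {}
    for name, value in fields:
        # Field names are case-insensitive; the map uses lowercase keys.
        tag = field_map.get(name.lower())
        if tag is None:
            continue
        if tag in CONCATENATED_TAGS:
            merged.setdefault(tag, []).append(value)
        elif tag == "KW":
            lines += [("KW", kw) for kw in split_keywords(value, listsep)]
        else:
            lines.append((tag, value))
    # Assumption: concatenated values are joined with one space.
    lines += [(tag, " ".join(values)) for tag, values in merged.items()]
    return lines
```

With these definitions, journal_tag("J. Biol. Chem.") gives "JO" and journal_tag("J Biol Chem") gives "JF" unless force_abbrev is set, which corresponds to the -j option or the forcejabbrev variable.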

FILES

PREFIX/etc/refdb/bib2risrc

The global configuration file of bib2ris.

$HOME/.bib2risrc

The user configuration file of bib2ris.

AUTHOR

bib2ris was written by Markus Hoenicka.