frog(1) Dutch morpho-syntactic analyzer, IOB chunker and dependency parser

SYNOPSIS

frog [options]

frog -t test-file

DESCRIPTION

frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. The current version of frog tokenizes, tags, lemmatizes, and morphologically segments word tokens in Dutch text files, adds IOB chunks, and assigns a dependency graph to each sentence.

OPTIONS

-c <configfile>

set the configuration using 'configfile'

--debug=<module><level>,...

set the debug level per module. Modules are Tokenizer (t), Lemmatizer (l), Morphological Analyzer (a), Chunker (c), Multi-Word Units (m), Named Entity Recognition (n), and Parser (p).

(e.g. --debug=l5,n3 sets the level for the Lemmatizer to 5 and for the NER to 3)

-d <level>

set the global debug level (for all modules).

-e <encoding>

set input encoding. (default UTF8)

-h

give some help

--keep-parser-files=[yes|no]

keep the intermediate files from the parser. Last sentence only!

-n

assume the input file holds one sentence per line.

Very useful when running interactively; otherwise an empty line is needed to signal the end of input.
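
A minimal sketch of the -n option, assuming a hypothetical input file sentences.txt with illustrative Dutch sample text. It uses only the -n, -t and -o options described on this page, and calls frog only if it is found on the PATH.

```shell
# Write two sample sentences, one per line (file name and text are illustrative).
printf '%s\n' 'Dit is een zin.' 'Dit is nog een zin.' > sentences.txt

# -n tells frog that every input line is a complete sentence.
# The call is guarded so the sketch does nothing when frog is not installed.
if command -v frog >/dev/null 2>&1; then
    frog -n -t sentences.txt -o sentences.out
fi
```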

-o <file>

send output to 'file' instead of stdout. Defaults to the name of the inputfile with '.out' appended.

--outputdir <dir>

send all output to 'dir' instead of stdout. Creates filenames from the inputfilename(s) with '.out' appended.

--skip=[aclmnpt]

skip parts of the process: Tokenizer (t), Chunker (c), Lemmatizer (l), Morphological Analyzer (a), Multi-Word Units (m), Named Entity Recognizer (n) or Parser (p)
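
As an illustration, the sketch below skips the Multi-Word Units module and the Parser. It assumes that several module letters can be written together, as the [aclmnpt] notation suggests; input.txt is a hypothetical file name.

```shell
# Module letters to skip: m = Multi-Word Units, p = Parser.
# Concatenating letters is an assumption based on the [aclmnpt] notation.
skip_modules="mp"

# Guarded so the sketch does nothing when frog is not installed.
if command -v frog >/dev/null 2>&1; then
    frog --skip="${skip_modules}" -t input.txt
fi
```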

-Q

Enable quote detection in the tokenizer. May wreak havoc!

-S <port>

Run a server on 'port'

-t <file>

process 'file'.

When -t is omitted, Frog will run in interactive mode.

-x <xmlfile>

process 'xmlfile', which is expected to be in FoLiA format. If 'xmlfile' is empty and --testdir=<dir> is provided, all '.xml' files in 'dir' will be processed as FoLiA XML.

--textclass=<cls>

When -x is given, use 'cls' to find text in the FoLiA document(s).

--testdir=<dir>

process all files in 'dir'. When the input mode is XML, only '.xml' files are taken from 'dir'. See also --outputdir.

--tmpdir=<dir>

location to store intermediate files. Default /tmp.

--threads=<n>

use a maximum of 'n' threads. The default is to take whatever is needed. In server mode frog always runs on 1 thread.

-V or --version

show version info

--xmldir=<dir>

generate FoLiA XML output and send it to 'dir'. Creates filenames from the input filename with '.xml' appended, unless it already ends with '.xml'.
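
A possible batch run, sketched under the assumption that --testdir and --xmldir can be combined in one call. The directory names corpus and folia are hypothetical.

```shell
# Hypothetical directories: plain-text input in corpus/, FoLiA XML output in folia/.
input_dir="corpus"
xml_dir="folia"
mkdir -p "$input_dir" "$xml_dir"

# Process every file in the input directory and write one FoLiA file per input.
# Guarded so the sketch does nothing when frog is not installed.
if command -v frog >/dev/null 2>&1; then
    frog --testdir="$input_dir" --xmldir="$xml_dir"
fi
```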

-X <file>

generate FoLiA XML output and send it to 'file'. Defaults to the name of the input file(s) with '.xml' appended, unless it already ends with '.xml'.

--id=<id>

When -X is given, use 'id' as the ID of the FoLiA document.

BUGS

likely

AUTHORS

Maarten van Gompel [email protected]

Ko van der Sloot [email protected]

Antal van den Bosch [email protected]