man cuneiform (1): multi-language OCR system

SYNOPSIS

cuneiform [--dotmatrix] [--fax] [--singlecolumn] [-f format] [-l language] [-o output] input

DESCRIPTION

Cuneiform is an OCR system. In addition to recognizing text, it performs layout analysis and text format recognition. Cuneiform supports several languages.

OPTIONS

--dotmatrix

Use recognition mode optimized for text printed with a dot matrix printer.

--fax

Use recognition mode optimized for text that has been faxed.

--singlecolumn

Disable page layout analysis and assume that the image consists of only one column of text.

-f format

Select output format. The following formats are available: html (HTML format), hocr (hOCR HTML format), native (native Cuneiform 2000), rtf (RTF format), smarttext (plain text with TeX paragraphs), text (plain text). The default is plain text.

-l language

By default Cuneiform recognizes English text. To change the language use the command line switch -l followed by a language code (typically an ISO 639-2 three-letter code). The following languages are supported:

bul	Bulgarian
cze	Czech
dan	Danish
dut	Dutch
eng	English
est	Estonian
fra	French
ger	German
hrv	Croatian
hun	Hungarian
ita	Italian
lav	Latvian
lit	Lithuanian
pol	Polish
por	Portuguese
rum	Romanian
rus	Russian
ruseng	mixed Russian/English
slv	Slovenian
spa	Spanish
srp	Serbian
swe	Swedish
tur	Turkish
ukr	Ukrainian

-o output

If you do not specify an output file with the -o switch, Cuneiform writes the result to a file named 'cuneiform-out.format', where the file extension depends on the selected output format.
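
As a rough illustration of how the -l, -f and -o options combine, the following sketch wraps a single invocation in a shell function. The function name ocr_page, the RUN variable and the file names page.png and page.hocr are assumptions made for this example and are not part of cuneiform; only the switches come from the synopsis above. By default the function prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of cuneiform): ocr_page LANG FORMAT INPUT OUTPUT
# Prints the cuneiform command line; runs it only when RUN=1 is set.
ocr_page() {
    lang=$1 format=$2 input=$3 output=$4
    # Rebuild the positional parameters as the full command line.
    set -- cuneiform -l "$lang" -f "$format" -o "$output" "$input"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"                    # actually invoke cuneiform
    else
        printf '%s\n' "$*"      # dry run: show what would be executed
    fi
}

# Dry run: German text, hOCR output (file names are placeholders).
ocr_page ger hocr page.png page.hocr
```

Setting RUN=1 in the environment before calling ocr_page should execute the command for real, provided cuneiform is installed and the input image exists.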

INPUT FORMAT

Cuneiform can process any single-page image that GraphicsMagick knows how to open. Please consult the gm(1) manual page for the comprehensive list of supported image formats.
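
Because each invocation handles one single-page image, a multi-page job is usually a loop over image files. The sketch below shows one possible way to do that, assuming a directory of PNG files. The function name ocr_dir, the RUN variable and the choice of a .txt output name next to each image are assumptions of this example, not behavior defined by cuneiform.

```shell
#!/bin/sh
# Hypothetical helper (not part of cuneiform): ocr_dir DIR
# For every *.png in DIR, print one cuneiform call, or run it when RUN=1.
ocr_dir() {
    dir=$1
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue      # skip when the glob matched nothing
        out="${img%.png}.txt"          # output name chosen by this sketch
        if [ "${RUN:-0}" = "1" ]; then
            cuneiform -f text -o "$out" "$img"
        else
            printf '%s\n' "cuneiform -f text -o $out $img"
        fi
    done
}

# Dry run over a placeholder directory; prints nothing if it has no PNG files.
ocr_dir ./scans
```

Images in other formats that GraphicsMagick can open could be handled the same way by changing the glob pattern and the suffix that is stripped.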

HOMEPAGE

More information about cuneiform can be found at <http://launchpad.net/cuneiform-linux/>.

AUTHOR

cuneiform was written by Cognitive Technologies and Jussi Pakkanen <[email protected]>.