ocrmypdf(1): add an OCR text layer to PDF files

DESCRIPTION

usage: ocrmypdf [-h] [--verbose [VERBOSE]] [--version] [-n] [--flowchart FILE] [-l LANGUAGE] [-j N] [--title TITLE] [--author AUTHOR] [--subject SUBJECT] [--keywords KEYWORDS] [-r] [-d] [-c] [-i] [--oversample DPI] [-f] [-s] [--skip-big MPixels] [--tesseract-config CFG] [--tesseract-pagesegmode PSM] [--pdf-renderer {auto,tesseract,hocr}] [--tesseract-timeout SECONDS] [--rotate-pages-threshold CONFIDENCE] [-k] [-g] input_file output_file

Generate a searchable PDF file from an image-only PDF file.
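
As a minimal sketch, a basic invocation is just the two positional arguments, optionally preceded by -l LANGUAGE, which a Python script could pass to subprocess.run. The helper names build_ocr_command and run_ocr and the file names are hypothetical; "eng" is assumed to be an installed Tesseract language pack.

```python
import subprocess
from typing import List, Optional


def build_ocr_command(input_file: str, output_file: str,
                      language: Optional[str] = None) -> List[str]:
    """Assemble an ocrmypdf command line (hypothetical helper, not part of ocrmypdf)."""
    cmd = ["ocrmypdf"]
    if language is not None:
        cmd += ["-l", language]  # Tesseract language code, e.g. "eng" (assumed installed)
    # Positional arguments always come last: input_file, then output_file.
    cmd += [input_file, output_file]
    return cmd


def run_ocr(cmd: List[str]) -> None:
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


# Example (run_ocr is not called here, so nothing is executed):
#   run_ocr(build_ocr_command("scan.pdf", "scan_ocr.pdf", language="eng"))
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.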

--verbose [VERBOSE], -v [VERBOSE]: Print more verbose messages for each additional verbose level.
--version: show program's version number and exit

-n, --just_print: Don't actually run any commands; just print the pipeline.
--flowchart FILE: Don't run any commands; just write the pipeline as a flowchart to FILE.
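
A sketch of how a wrapper script might request a dry run with either option above. The helper dry_run_command and the file name pipeline.svg are hypothetical, and which flowchart formats FILE may use is not stated in this page.

```python
from typing import List


def dry_run_command(input_file: str, output_file: str,
                    flowchart_file: str = "") -> List[str]:
    """Build a command that inspects the pipeline instead of processing the PDF.

    With flowchart_file set, --flowchart FILE is used; otherwise -n (--just_print).
    """
    if flowchart_file:
        opts = ["--flowchart", flowchart_file]
    else:
        opts = ["-n"]
    return ["ocrmypdf", *opts, input_file, output_file]
```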

Improve OCR quality and final image:
-r, --rotate-pages: automatically rotate pages based on detected text orientation
-d, --deskew: deskew each page before performing OCR
-c, --clean: clean scanning artifacts from pages before performing OCR
-i, --clean-final: incorporate the cleaned image in the final PDF file
--oversample DPI: oversample images to at least the specified DPI, to improve OCR results slightly
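
The preprocessing flags above combine freely. The hypothetical helper below turns a set of choices into those short flags. As its own design choice (an assumption, not documented behavior), it adds -c whenever -i is requested, since -i incorporates the cleaned image. The 300 DPI value in the test is only an illustration.

```python
from typing import List, Optional


def quality_options(rotate: bool = False, deskew: bool = False,
                    clean: bool = False, clean_final: bool = False,
                    oversample_dpi: Optional[int] = None) -> List[str]:
    """Translate preprocessing choices into ocrmypdf flags (hypothetical helper)."""
    opts: List[str] = []
    if rotate:
        opts.append("-r")  # --rotate-pages
    if deskew:
        opts.append("-d")  # --deskew
    # Assumption of this helper: -i needs a cleaned image, so it also enables -c.
    if clean or clean_final:
        opts.append("-c")  # --clean
    if clean_final:
        opts.append("-i")  # --clean-final
    if oversample_dpi is not None:
        if oversample_dpi <= 0:
            raise ValueError("oversample_dpi must be positive")
        opts += ["--oversample", str(oversample_dpi)]
    return opts
```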

Control how OCR is applied:
-f, --force-ocr: rasterize any fonts or vector images on each page and apply OCR
-s, --skip-text: skip OCR on any pages that already contain text, but include the page in final output
--skip-big MPixels: skip OCR on pages larger than the specified number of megapixels, but include skipped pages in final output
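
To choose a --skip-big threshold, it helps to estimate a page's size in megapixels. The sketch below assumes the size is width × height in pixels at the page's rendering resolution, divided by 10^6. For example, a US Letter page (8.5 × 11 in) at 300 DPI is 2550 × 3300 px, or about 8.4 MP. The helper names are hypothetical. Because -f re-OCRs pages that already contain text while -s skips them, the helper refuses to combine the two.

```python
from typing import List, Optional


def page_megapixels(width_in: float, height_in: float, dpi: float) -> float:
    """Pixel count of a page rendered at `dpi`, in megapixels (1 MP = 1e6 px)."""
    return (width_in * dpi) * (height_in * dpi) / 1e6


def ocr_control_options(force_ocr: bool = False, skip_text: bool = False,
                        skip_big_mpixels: Optional[float] = None) -> List[str]:
    """Build the OCR-control flags (hypothetical helper)."""
    # -f rasterizes and re-OCRs text pages, -s skips them: the descriptions
    # contradict each other, so this helper rejects the combination.
    if force_ocr and skip_text:
        raise ValueError("choose either force_ocr or skip_text, not both")
    opts: List[str] = []
    if force_ocr:
        opts.append("-f")  # --force-ocr
    if skip_text:
        opts.append("-s")  # --skip-text
    if skip_big_mpixels is not None:
        opts += ["--skip-big", str(skip_big_mpixels)]
    return opts
```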

Advanced options for power users:
--tesseract-config CFG: additional Tesseract configuration files
--tesseract-pagesegmode PSM: set Tesseract page segmentation mode (see tesseract --help)
--pdf-renderer {auto,tesseract,hocr}: choose OCR PDF renderer
--tesseract-timeout SECONDS: give up on OCR after the timeout, but copy the preprocessed page into the final output
--rotate-pages-threshold CONFIDENCE: only rotate pages when confidence is above this value (arbitrary units reported by tesseract)
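
A sketch of assembling the advanced options. The helper advanced_options is hypothetical. It validates --pdf-renderer against the three choices listed above but leaves the PSM number unchecked, because valid modes depend on the installed Tesseract (see tesseract --help). The example values (PSM 6, a 180-second timeout) are illustrative only.

```python
from typing import List, Optional

RENDERERS = ("auto", "tesseract", "hocr")  # choices listed for --pdf-renderer


def advanced_options(pdf_renderer: Optional[str] = None,
                     pagesegmode: Optional[int] = None,
                     timeout_seconds: Optional[float] = None,
                     rotate_threshold: Optional[float] = None) -> List[str]:
    """Build advanced Tesseract/renderer flags (hypothetical helper)."""
    opts: List[str] = []
    if pdf_renderer is not None:
        if pdf_renderer not in RENDERERS:
            raise ValueError(f"unknown renderer {pdf_renderer!r}; expected one of {RENDERERS}")
        opts += ["--pdf-renderer", pdf_renderer]
    if pagesegmode is not None:
        # Not range-checked: valid PSM numbers depend on the Tesseract version.
        opts += ["--tesseract-pagesegmode", str(pagesegmode)]
    if timeout_seconds is not None:
        opts += ["--tesseract-timeout", str(timeout_seconds)]
    if rotate_threshold is not None:
        # Presumably only relevant together with -r/--rotate-pages.
        opts += ["--rotate-pages-threshold", str(rotate_threshold)]
    return opts
```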

Arguments to help with troubleshooting and debugging:
-k, --keep-temporary-files: keep temporary files (helpful for debugging)
-g, --debug-rendering: render each page twice with debug information on the second page
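
A sketch of a troubleshooting run that combines these flags with --verbose. The helper debug_command is hypothetical. It always passes an explicit VERBOSE level: because the value is optional, a bare -v placed directly before input_file could make the argument parser consume the input file name as the verbosity level.

```python
from typing import List


def debug_command(input_file: str, output_file: str, verbose_level: int = 1,
                  keep_temp: bool = True, debug_rendering: bool = False) -> List[str]:
    """Build a troubleshooting command line (hypothetical helper)."""
    opts: List[str] = []
    if verbose_level > 0:
        # Explicit level so the optional VERBOSE value never swallows input_file.
        opts += ["--verbose", str(verbose_level)]
    if keep_temp:
        opts.append("-k")  # --keep-temporary-files
    if debug_rendering:
        opts.append("-g")  # --debug-rendering
    return ["ocrmypdf", *opts, input_file, output_file]
```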