SYNOPSIS
simstring [OPTIONS]
DESCRIPTION
This utility reads query strings from STDIN and retrieves the strings in the database (DB) whose similarity to each query, under the similarity measure (SIM), is no smaller than the threshold (TH). When the -b (--build) option is specified, it instead builds a database (DB) from strings read from STDIN.
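The two modes described above (build, then query) can be driven from a script. The following is a minimal sketch, not part of simstring itself: the helper names `build_args`, `query_args` and `run` are hypothetical, only options listed under OPTIONS are used, and it assumes one string per line on STDIN. The output is returned unparsed because its layout is not specified here.

```python
import shutil
import subprocess


def build_args(db_path, ngram=3, mark=False):
    """Argument list for build mode: strings to index are read from STDIN."""
    args = ["simstring", "--build", f"--database={db_path}", f"--ngram={ngram}"]
    if mark:
        args.append("--mark")
    return args


def query_args(db_path, similarity="cosine", threshold=0.7):
    """Argument list for query mode: query strings are read from STDIN."""
    return [
        "simstring",
        f"--database={db_path}",
        f"--similarity={similarity}",
        f"--threshold={threshold}",
    ]


def run(args, lines):
    """Feed `lines` to the command on STDIN (assumption: one string per
    line) and return whatever it writes to STDOUT, unparsed."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} was not found on PATH")
    result = subprocess.run(
        args,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A typical session would call `run(build_args("names.db"), names)` once, then `run(query_args("names.db", threshold=0.8), queries)` as often as needed; `names.db` is a placeholder file name.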
OPTIONS
This program follows the usual GNU command line syntax, with long options starting with two dashes (`--'). A summary of options is included below. For a complete description, see the Info files.
- -b, --build
- build a database for strings read from STDIN
- -d, --database=DB
- specify a database file
- -u, --unicode
- use Unicode (wchar_t) for representing characters
- -n, --ngram=N
- specify the unit (length N, in characters) of n-grams (DEFAULT=3)
- -m, --mark
- include marks for the beginnings and ends of strings
- -s, --similarity=SIM
- specify a similarity measure (DEFAULT='cosine'):
    exact     exact match
    dice      dice coefficient
    cosine    cosine coefficient
    jaccard   jaccard coefficient
    overlap   overlap coefficient
- -t, --threshold=TH
- specify the threshold (DEFAULT=0.7)
- -e, --echo-back
- echo back query strings to the output
- -q, --quiet
- suppress supplemental information from the output
- -p, --benchmark
- show benchmark result (retrieved strings are suppressed)
- -v, --version
- show version information and exit
- -h, --help
- show summary of options and exit
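To make the -n, -m, -s and -t options concrete, here is a rough sketch of the usual textbook definitions of the listed measures over character n-gram sets. It is an illustration under assumptions, not simstring's implementation: the names `ngrams`, `similarity` and `search` are hypothetical, the mark character and the handling of strings shorter than N are guesses, repeated n-grams are collapsed into a set, and a linear scan stands in for the database lookup.

```python
import math


def ngrams(text, n=3, mark=False, pad="\x01"):
    """Return the set of character n-grams of `text`.

    With mark=True the string is padded with n-1 copies of `pad` on each
    side (an assumed convention), so n-grams at the beginning and end of
    the string differ from the same characters in the middle.
    """
    if mark:
        text = pad * (n - 1) + text + pad * (n - 1)
    if len(text) < n:
        # Assumption: a string shorter than n yields itself as one feature.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarity(x, y, measure="cosine"):
    """Similarity of two n-gram sets under the named measure."""
    if measure == "exact":
        return 1.0 if x == y else 0.0
    if not x or not y:
        return 0.0
    common = len(x & y)
    if measure == "dice":
        return 2.0 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError(f"unknown similarity measure: {measure}")


def search(query, strings, n=3, mark=False, measure="cosine", threshold=0.7):
    """Linear scan: keep strings whose similarity to `query` is no smaller
    than `threshold`."""
    q = ngrams(query, n, mark)
    return [
        s for s in strings
        if similarity(q, ngrams(s, n, mark), measure) >= threshold
    ]
```

For example, "apple" and "apply" share two of their three trigrams, so their cosine similarity is 2/3: `search("apple", ["apple", "apply", "maple"])` keeps only "apple" at the default threshold of 0.7, while `threshold=0.6` also keeps "apply".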