simstring(1) build database and find similar words


simstring [OPTIONS]


This utility reads query strings from STDIN and, for each query, finds the strings in the database (DB) whose similarity to the query, under the similarity measure (SIM), is no smaller than the threshold (TH). When the -b (--build) option is specified, the utility instead builds a database (DB) from the strings read from STDIN.
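The query semantics described above can be illustrated with a brute-force sketch. This is not how the utility works internally; it only shows what "similarity no smaller than the threshold" means. The names `ngrams`, `cosine`, and `search` are hypothetical, n-grams are treated as plain sets of character trigrams without begin/end marks, and the database is an in-memory list rather than a DB file.

```python
import math

def ngrams(s, n=3):
    """Return the set of character n-grams of s (assumption: no begin/end marks)."""
    # Strings shorter than n yield a single feature: the string itself.
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def cosine(x, y):
    """Cosine coefficient between two n-gram sets."""
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))

def search(db, query, threshold=0.7, n=3):
    """Return every string in db whose similarity to query is >= threshold."""
    q = ngrams(query, n)
    return [s for s in db if cosine(ngrams(s, n), q) >= threshold]

# Hypothetical in-memory "database" standing in for the strings given to --build.
db = ["similar", "simulate", "string"]

print(search(db, "similer", threshold=0.5))  # ['similar']
print(search(db, "similer"))                 # [] (similarity 0.6 is below the default 0.7)
```

Here "similer" and "similar" share 3 of their 5 trigrams each, giving a cosine coefficient of 0.6, so the match appears only when the threshold is lowered.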


This program follows the usual GNU command line syntax, with long options starting with two dashes (`--'). A summary of options is included below. For a complete description, see the Info files.
-b, --build
build a database for strings read from STDIN
-d, --database=DB
specify a database file
-u, --unicode
use Unicode (wchar_t) for representing characters
-n, --ngram=N
specify the unit of n-grams (DEFAULT=3)
-m, --mark
include marks for the beginnings and ends of strings
-s, --similarity=SIM
specify a similarity measure (DEFAULT='cosine'):

exact    exact match
dice     Dice coefficient
cosine   cosine coefficient
jaccard  Jaccard coefficient
overlap  overlap coefficient
-t, --threshold=TH
specify the threshold (DEFAULT=0.7)
-e, --echo-back
echo back query strings to the output
-q, --quiet
suppress supplemental information from the output
-p, --benchmark
show benchmark result (retrieved strings are suppressed)
-v, --version
show version information and exit
-h, --help
show summary of options and exit
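
The similarity measures accepted by -s and the effect of -m can be sketched with the standard set-based definitions of these coefficients. This is an illustration under stated assumptions, not the utility's own code: `MARK`, `ngrams`, and `similarity` are hypothetical names, `$` is an arbitrary choice of mark character, n-grams are treated as sets (repeated n-grams count once), and `exact` is modelled as equality of the two n-gram sets.

```python
import math

MARK = "$"  # assumption: an arbitrary begin/end mark character

def ngrams(s, n=3, mark=False):
    """Return the set of character n-grams of s, optionally padded with marks."""
    if mark:
        pad = MARK * (n - 1)
        s = pad + s + pad
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def similarity(x, y, measure="cosine"):
    """Similarity between n-gram sets x and y under the named measure."""
    if measure == "exact":
        return 1.0 if x == y else 0.0
    if not x or not y:
        return 0.0
    common = len(x & y)
    if measure == "dice":
        return 2 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError(f"unknown similarity measure: {measure}")

x = ngrams("abcd")    # {'abc', 'bcd'}
y = ngrams("abcde")   # {'abc', 'bcd', 'cde'}
for name in ("exact", "dice", "cosine", "jaccard", "overlap"):
    print(name, round(similarity(x, y, name), 4))

# With marks, even a two-character string yields four trigrams.
print(sorted(ngrams("ab", mark=True)))
```

For these two strings the measures disagree noticeably (overlap is 1.0 while Jaccard is about 0.67), which is why a threshold such as the default 0.7 should be chosen with the measure in mind.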