simstring(1)
build database and find similar words
SYNOPSIS
simstring
[OPTIONS]
DESCRIPTION
This utility reads query strings from STDIN and, for each query, finds the
strings in the database (DB) whose similarity to the query, under the
similarity measure (SIM), is no smaller than the threshold (TH). When the
-b (--build) option is specified, the utility instead builds a database (DB)
from the strings read from STDIN.
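
The retrieval condition described above ("similarity no smaller than the threshold") can be illustrated with a minimal Python sketch. It is not how simstring is implemented: it assumes character trigrams treated as sets, uses the cosine coefficient, and scans an in-memory list instead of a database file. The names trigrams, cosine and search are hypothetical helpers introduced only for this illustration.

```python
import math


def trigrams(s):
    # Set of character 3-grams of s (assumption: repeated n-grams collapse).
    return {s[i:i + 3] for i in range(len(s) - 2)}


def cosine(x, y):
    # Set-based cosine coefficient; empty sets are treated as similarity 0.
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))


def search(db, query, threshold=0.7):
    # Keep every string whose similarity to the query is >= threshold.
    q = trigrams(query)
    return [s for s in db if cosine(trigrams(s), q) >= threshold]


if __name__ == "__main__":
    db = ["similar", "similarity", "simulation"]
    for hit in search(db, "similar", 0.7):
        print(hit)
```

With the toy list above, "similarity" shares all five trigrams of "similar" and passes the 0.7 threshold, while "simulation" shares only one and is rejected.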
 
OPTIONS
This program follows the usual GNU command line syntax, with long
options starting with two dashes (`--').
A summary of options is included below.
For a complete description, see the Info files.
- -b, --build
- 
build a database for strings read from STDIN
- -d, --database=DB
- 
specify a database file
- -u, --unicode
- 
use Unicode (wchar_t) for representing characters
- -n, --ngram=N
- 
specify the unit of n-grams (DEFAULT=3)
- -m, --mark
- 
include marks for begins and ends of strings
- -s, --similarity=SIM
- 
specify a similarity measure (DEFAULT='cosine'):

| exact | exact match |
| dice | dice coefficient |
| cosine | cosine coefficient |
| jaccard | jaccard coefficient |
| overlap | overlap coefficient |
- -t, --threshold=TH
- 
specify the threshold (DEFAULT=0.7)
- -e, --echo-back
- 
echo back query strings to the output
- -q, --quiet
- 
suppress supplemental information from the output
- -p, --benchmark
- 
show benchmark result (retrieved strings are suppressed)
- -v, --version
- 
show this version information and exit
- -h, --help
- 
show summary of options and exit
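
To make the --ngram, --mark, --similarity and --threshold options more concrete, the following Python sketch shows one plausible reading of them. It assumes the textbook set-based definitions of the dice, cosine, jaccard and overlap coefficients over character n-grams, and assumes that marks are added by padding both ends of the string. The pad character "$" and the function names ngrams and similarity are placeholders for this illustration, not part of simstring.

```python
import math


def ngrams(s, n=3, mark=False, pad="$"):
    # Set of character n-grams of s. With mark=True, n-1 pad characters are
    # added at both ends so that n-grams at the beginning and end of the
    # string are distinguishable (assumed behaviour; "$" is a placeholder).
    if mark:
        s = pad * (n - 1) + s + pad * (n - 1)
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(x, y, measure="cosine"):
    # Standard set-based coefficients between two n-gram sets x and y.
    if measure not in ("exact", "dice", "cosine", "jaccard", "overlap"):
        raise ValueError("unknown similarity measure: " + measure)
    if not x or not y:
        return 0.0
    common = len(x & y)
    if measure == "exact":
        return 1.0 if x == y else 0.0
    if measure == "dice":
        return 2.0 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "jaccard":
        return common / len(x | y)
    return common / min(len(x), len(y))  # overlap


if __name__ == "__main__":
    x = ngrams("abcd", n=3)
    y = ngrams("abcx", n=3)
    for name in ("exact", "dice", "cosine", "jaccard", "overlap"):
        print(name, similarity(x, y, name))
```

A candidate string would then be reported when similarity(...) is no smaller than the value given with --threshold. Under this reading, a string shorter than N yields no n-grams unless marks are enabled, which is one reason to turn marks on for short strings.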