langident [OPTIONS] file1 [file2 ...]

DESCRIPTION
Identifies the language in which files are written, using the Perl module Lingua::Identify.

OPTIONS
-a
    Show all results (not just the most probable language).

-c
    Show the confidence level for the most probable language. It is printed as the first value right after that language.

-d
    Debug (development only).

-E ENCODING
    Select an input encoding. Defaults to UTF-8.

        # use ISO-8859-1 (latin1)
        langident -E ISO-8859-1 file

-e METHODS
    Select the method(s) to use. There are three ways of doing this:

        # simply using a method
        langident -e ngrams3 file

        # using several methods (separate them with a comma)
        langident -e prefixes3,suffixes3

        # using several methods and assigning different weights to each of them
        langident -e smallwords=2,prefixes=1,ngrams3=1.3

    The available methods are: smallwords, prefixes1, prefixes2, prefixes3, prefixes4, suffixes1, suffixes2, suffixes3, suffixes4, ngrams1, ngrams2, ngrams3 and ngrams4.
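
The method names and the comma/weight syntax above can be checked before a run. The following is a minimal sketch only: check_methods is a hypothetical helper, not part of langident, and it does nothing beyond comparing each name in a -e METHODS string against the list of available methods given above.

```sh
#!/bin/sh
# Hypothetical helper, not part of langident: checks that every method named
# in a -e METHODS string appears in the list of available methods above.

VALID_METHODS="smallwords prefixes1 prefixes2 prefixes3 prefixes4 suffixes1 suffixes2 suffixes3 suffixes4 ngrams1 ngrams2 ngrams3 ngrams4"

check_methods() {
    spec=$1
    # Walk the comma-separated entries one at a time.
    while [ -n "$spec" ]; do
        entry=${spec%%,*}             # text before the first comma
        name=${entry%%=*}             # drop an optional "=weight" suffix
        case " $VALID_METHODS " in
            *" $name "*) ;;           # known method, keep going
            *) echo "unknown method: $name" >&2; return 1 ;;
        esac
        case $spec in
            *,*) spec=${spec#*,} ;;   # continue with the remaining entries
            *)   spec= ;;             # that was the last entry
        esac
    done
    return 0
}
```

One possible use is to guard the call, for example: check_methods "ngrams2=2,ngrams1" && langident -e "ngrams2=2,ngrams1" file. The weights themselves are not validated here.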

-h
    Display help message and exit.

-l
    List all available languages and exit.

-m NUMBER
    Set the maximum number of results (languages) to display: the NUMBER most probable languages, in descending order of probability. Overrides the -a switch.

-o LANGUAGES
    Only work with the specified languages.

        # identify between Portuguese and English only
        langident -o pt,en *

-p
    Also show percentages.

-s SIZE
    Maximum size to examine.

-v
    Show version and exit.

EXAMPLES
Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight of ngrams1 (-e switch). The output includes the three most probable languages (-m switch) with their percentages (-p switch), as well as the confidence level (-c switch) of the first result.

    $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
    README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
    $
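
When only the most probable language is needed in a script, it can be cut out of a result line. This is a sketch under assumptions: top_language is a hypothetical helper, not part of langident, and it assumes each file's result is a single line shaped like the README example above (the file name, a colon, then space-separated fields). File names that themselves contain a colon would break it.

```sh
#!/bin/sh
# Hypothetical helper, not part of langident. Assumes one result line per
# file, shaped like the example above: "FILE:LANG VALUE VALUE LANG VALUE ...".

top_language() {
    line=$1
    results=${line#*:}                # drop the leading "FILE:" prefix
    printf '%s\n' "${results%% *}"    # keep only the first space-separated field
}

# Sample line copied from the example above.
sample='README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505'
top_language "$sample"    # prints: en
```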

TO DO
- Add a switch to ignore HTML tags (and maybe other formats too)

AUTHOR
Jose Alves de Castro

COPYRIGHT AND LICENSE
Copyright 2004 by Jose Alves de Castro
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.