sgml-spell-checker(1) SGML spell checker


nsgmls -l yourdoc.sgml | sgml-spell-checker [option] ...


sgml-spell-checker is a tool for automatically spell-checking your SGML documents. One of its advantages over some other SGML-aware spell checkers is that it scans your documents in the form in which the SGML parser actually sees them: it is not line-based, system entities are resolved, marked sections are treated appropriately, and so on.

Also, this tool can be made aware of particular DTDs, in the sense that it knows not to spell-check the content of elements that do not represent human-language text, such as <programlisting> in DocBook. An exclusion list for the DocBook DTD is included; others can be added trivially.

The input to sgml-spell-checker is the text representation of your SGML document's Element Structure Information Set as generated by nsgmls (from SP or OpenSP; sometimes installed under the name onsgmls). In other words, you need to pipe the output of nsgmls into sgml-spell-checker as shown in the synopsis. Pass nsgmls whatever options it needs, such as -c to search more catalogs or -i to include a marked section, and list any additional source files. Do not forget the -l option, or you won't get any file or line references for the misspellings.
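
Because the parser may be installed as either nsgmls or onsgmls, a wrapper script can pick whichever one is present. The following is only a minimal sketch and not part of sgml-spell-checker; the function name first_available is made up for illustration.

```shell
# Hypothetical helper: print the first of the given command names found on PATH.
# Returns non-zero if none of them is installed.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# Usage (assumes one of the two parser names is installed):
#   parser=$(first_available nsgmls onsgmls) || { echo "no SGML parser found" >&2; exit 1; }
#   "$parser" -l yourdoc.sgml | sgml-spell-checker
```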

The second part of the pipe takes a couple of options; see below. Note that if the language of the document does not match your system's locale settings, you need to use the --language option.

The output of sgml-spell-checker is a list of the words that are misspelled (in the opinion of aspell), together with file name and line number. Note that the line number designates where the element that contains the word started, not where the word actually is. So most likely you will have to search a few lines below the indicated location.
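
Since the reported line is where the enclosing element starts, it can help to print a few lines from that point on and look for the word there. The helper below is a sketch under that assumption; show_context is a hypothetical name, and the default of 10 lines is an arbitrary choice.

```shell
# Hypothetical helper: print COUNT lines (default 10) of FILE, starting at line START.
show_context() {
    file=$1
    start=$2
    count=${3:-10}
    end=$((start + count - 1))
    # -n suppresses sed's default output; p prints only the selected range.
    sed -n "${start},${end}p" "$file"
}
```

For example, if a misspelling is reported at line 120 of mydoc.sgml, `show_context mydoc.sgml 120` prints lines 120 through 129 of that file.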


Debug mode. Generates lots of output not of interest to the normal user.
--language: Sets the language of the document. (The format depends on the aspell installation, but something like en or en_US should work.) By default the language is taken from the system locale settings.
Shows correction suggestions for misspelled words.
--dictionary: Uses an additional aspell dictionary file. This option may be used multiple times.
--dtd: Uses the exclusion list for the specified DTD (e.g., docbook).
Shows a brief help, then exits.


nsgmls -l -D . mydoc.sgml | \
sgml-spell-checker --language=en --dtd=docbook \
   --dictionary=mydict1.aspell --dictionary=mydict2.aspell

(You can enter this command all on one line without the backslashes, or on several lines with the backslashes.)


Read the aspell documentation about how to set up the appropriate dictionaries. In case you're having trouble interpreting the aspell documentation, here's how to make an aspell dictionary file from a flat word list:

rm -f mydict1.aspell    # aspell won't overwrite existing files
aspell --language-tag=xx create master ./mydict1.aspell < mywordlist.txt

Watch the slashes. aspell wants to see a slash in the file name; without one, it searches some default location instead of the current directory.
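
In a script, the slash rule can be enforced before the name ever reaches aspell. This is a small sketch, not something aspell or sgml-spell-checker provides; with_slash is a hypothetical helper name.

```shell
# Hypothetical helper: make sure a dictionary file name contains a slash.
with_slash() {
    case $1 in
        */*) printf '%s\n' "$1" ;;      # already has a directory part: keep as is
        *)   printf './%s\n' "$1" ;;    # bare name: anchor it to the current directory
    esac
}

# Usage, mirroring the commands above (xx stands for the language tag):
#   dict=$(with_slash mydict1.aspell)
#   rm -f "$dict"
#   aspell --language-tag=xx create master "$dict" < mywordlist.txt
```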


This program should be able to identify the language from the document (e.g., <book lang="de">), but aspell doesn't handle changing the language on the fly.


Peter Eisentraut