parse_index(3) - functions for searching the WordNet database

Other Alias

findtheinfo, findtheinfo_ds, is_defined, in_wn, index_lookup, getindex, read_synset, parse_synset, free_syns, free_synset, free_index, traceptrs_ds, do_trace


#include "wn.h"

char *findtheinfo(char *searchstr, int pos, int ptr_type, int sense_num);

SynsetPtr findtheinfo_ds(char *searchstr, int pos, int ptr_type, int sense_num );

unsigned int is_defined(char *searchstr, int pos);

unsigned int in_wn(char *searchstr, int pos);

IndexPtr index_lookup(char *searchstr, int pos);

IndexPtr parse_index(long offset, int dbase, char *line);

IndexPtr getindex(char *searchstr, int pos);

SynsetPtr read_synset(int pos, long synset_offset, char *searchstr);

SynsetPtr parse_synset(FILE *fp, int pos, char *searchstr);

void free_syns(SynsetPtr synptr);

void free_synset(SynsetPtr synptr);

void free_index(IndexPtr idx);

SynsetPtr traceptrs_ds(SynsetPtr synptr, int ptr_type, int pos, int depth);

char *do_trace(SynsetPtr synptr, int ptr_type, int pos, int depth);


These functions are used for searching the WordNet database. They generally fall into several categories: functions for reading and parsing index file entries; functions for reading and parsing synsets in data files; functions for tracing pointers and hierarchies; functions for freeing space occupied by data structures allocated with malloc(3).

In the following function descriptions, pos is one of the following:

1       NOUN
2       VERB
3       ADJECTIVE
4       ADVERB

findtheinfo() is the primary search algorithm for use with database interface applications. Search results are automatically formatted, and a pointer to the text buffer is returned. All searches listed in WNHOME/include/wn.h can be done by findtheinfo(). findtheinfo_ds() can be used to perform most of the searches, with results returned in a linked list data structure. This is for use with applications that need to analyze the search results rather than just display them.

Both functions are passed the same arguments: searchstr is the word or collocation to search for; pos indicates the syntactic category to search in; ptr_type is one of the valid search types for searchstr in pos. (Available searches can be obtained by calling is_defined() described below.) sense_num should be ALLSENSES if the search is to be done on all senses of searchstr in pos, or a positive integer indicating which sense to search.

findtheinfo_ds() returns a linked list of data structures representing synsets. Senses are linked through the nextss field of a Synset data structure. For each sense, synsets that match the search specified with ptr_type are linked through the ptrlist field. See "Synset Navigation", below, for detailed information on the linked lists returned.

is_defined() sets a bit for each search type that is valid for searchstr in pos, and returns the resulting unsigned integer. Each bit number corresponds to a pointer type constant defined in WNHOME/include/wn.h. For example, if bit 2 is set, the HYPERPTR search is valid for searchstr. There are 29 possible searches.
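To read an is_defined() result, a caller can test one bit per search type. The sketch below assumes, as the HYPERPTR example suggests, that the bit for search type n is `1u << n`. `list_valid_searches` is a hypothetical helper, not part of the library; in a real program the mask would come from an is_defined() call.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: store every ptr_type whose bit is set in `mask`
 * (an is_defined() result) into `out`, and return how many were stored.
 * Assumes bit n corresponds to ptr_type n, i.e. the test is mask & (1u << n). */
size_t list_valid_searches(unsigned int mask, int *out, size_t max_out)
{
    size_t count = 0;
    const int nbits = (int)(sizeof(unsigned int) * CHAR_BIT);

    /* Bit 0 does not correspond to a search type, so start at 1. */
    for (int n = 1; n < nbits && count < max_out; n++) {
        if (mask & (1u << n))
            out[count++] = n;
    }
    return count;
}
```

A mask with bits 2 and 6 set would yield the ptr_type values 2 (HYPERPTR) and 6 (ISMEMBERPTR in the table below).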

in_wn() is used to find the syntactic categories in the WordNet database that contain one or more senses of searchstr. If pos is ALL_POS, all syntactic categories are checked. Otherwise, only the part of speech passed is checked. An unsigned integer is returned with a bit set corresponding to each syntactic category containing searchstr. The bit number matches the number for the part of speech. 0 is returned if searchstr is not present in pos.
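The value returned by in_wn() can be decoded the same way. This sketch assumes the bit for part of speech p is `1u << p`, with the numbers 1 to 4 used for the categories (adjective being 3, as in wn.h). `describe_pos_mask` is a hypothetical helper that only formats the mask; it does not call the library.

```c
#include <string.h>

/* Hypothetical helper: write a space-separated list of the syntactic
 * categories whose bit is set in an in_wn() result into buf.
 * Assumes part of speech p maps to bit (1u << p), for p = 1..4. */
void describe_pos_mask(unsigned int mask, char *buf, size_t bufsize)
{
    static const char *const names[] = { NULL, "noun", "verb", "adjective", "adverb" };

    buf[0] = '\0';
    for (int pos = 1; pos <= 4; pos++) {
        if (mask & (1u << pos)) {
            if (buf[0] != '\0')
                strncat(buf, " ", bufsize - strlen(buf) - 1);
            strncat(buf, names[pos], bufsize - strlen(buf) - 1);
        }
    }
}
```

An empty result corresponds to the documented return value of 0, meaning searchstr was not found in any category checked.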

index_lookup() finds searchstr in the index file for pos and returns a pointer to the parsed entry in an Index data structure. searchstr must exactly match the form of the word (lower case only, hyphens and underscores in the same places) in the index file. NULL is returned if a match is not found.

parse_index() parses an entry from an index file and returns a pointer to the parsed entry in an Index data structure. Given the byte offset and syntactic category, it reads the index entry at that location in the corresponding file. If line is passed (non-NULL), it must contain an index file entry, and the database index file is not consulted. However, offset and dbase should still be passed so that this information can be stored in the Index structure.

getindex() is a "smart" search for searchstr in the index file corresponding to pos. It applies to searchstr an algorithm that replaces underscores with hyphens, hyphens with underscores, removes hyphens and underscores, and removes periods in an attempt to find a form of the string that is an exact match for an entry in the index file corresponding to pos. index_lookup() is called on each transformed string until a match is found or all the different strings have been tried. It returns a pointer to the parsed Index data structure for searchstr, or NULL if a match is not found.
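The sequence of transformations can be sketched as follows. This is not the library's getindex(): it assumes the unmodified string is tried first, it does not fold case, and it uses a hypothetical `mock_lookup` over a three-word list in place of index_lookup().

```c
#include <string.h>

#define SKETCH_MAXLEN 256
#define SKETCH_NFORMS 5

/* Copy src into dst, replacing every `from` character with `to`. */
static void subst_char(char *dst, const char *src, char from, char to)
{
    for (; *src; src++, dst++)
        *dst = (*src == from) ? to : *src;
    *dst = '\0';
}

/* Copy src into dst, dropping every character listed in `drop`. */
static void drop_chars(char *dst, const char *src, const char *drop)
{
    for (; *src; src++)
        if (strchr(drop, *src) == NULL)
            *dst++ = *src;
    *dst = '\0';
}

/* Candidate forms: as given, '_' -> '-', '-' -> '_',
 * without '-' and '_', and without '.'. */
static void make_forms(const char *s, char forms[SKETCH_NFORMS][SKETCH_MAXLEN])
{
    strcpy(forms[0], s);
    subst_char(forms[1], s, '_', '-');
    subst_char(forms[2], s, '-', '_');
    drop_chars(forms[3], s, "-_");
    drop_chars(forms[4], s, ".");
}

/* Hypothetical stand-in for index_lookup(): exact match on a tiny list. */
static const char *mock_lookup(const char *word)
{
    static const char *const index_words[] = { "ice_cream", "e-mail", "dr" };
    for (size_t i = 0; i < sizeof index_words / sizeof index_words[0]; i++)
        if (strcmp(index_words[i], word) == 0)
            return index_words[i];
    return NULL;
}

/* Try each candidate form in turn; return the first match or NULL. */
const char *sketch_getindex(const char *searchstr)
{
    char forms[SKETCH_NFORMS][SKETCH_MAXLEN];

    if (strlen(searchstr) >= SKETCH_MAXLEN)
        return NULL;                 /* too long for the fixed buffers */
    make_forms(searchstr, forms);
    for (int i = 0; i < SKETCH_NFORMS; i++) {
        const char *hit = mock_lookup(forms[i]);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

With this mock list, "ice-cream" matches through the hyphen-to-underscore form and "dr." through the period-removal form.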

read_synset() is used to read a synset from a byte offset in a data file. It performs an fseek(3) to synset_offset in the data file corresponding to pos, and calls parse_synset() to read and parse the synset. A pointer to the Synset data structure containing the parsed synset is returned.
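Because records are addressed by byte offset, reading one comes down to a seek followed by a line read. The hypothetical `read_record_at` below sketches only that step with fseek(3) and fgets(3); it assumes one record per line and leaves parsing to a function such as parse_synset().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: seek to a byte offset in an open file and read
 * the line that starts there, without its trailing newline.
 * Returns buf on success, NULL if the seek or the read fails. */
char *read_record_at(FILE *fp, long offset, char *buf, int bufsize)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return NULL;
    if (fgets(buf, bufsize, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```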

parse_synset() reads the synset at the current offset in the file indicated by fp. pos is the syntactic category, and searchstr, if not NULL, indicates the word in the synset that the caller is interested in. An attempt is made to match searchstr to one of the words in the synset. If an exact match is found, the whichword field in the Synset structure is set to that word's number in the synset (beginning to count from 1).
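The whichword rule can be sketched on a plain array of words. `find_whichword` is hypothetical; it compares strings exactly, with no hyphen or underscore translation, counts from 1, and returns 0 when nothing matches, which is an assumption about the no-match value.

```c
#include <string.h>

/* Hypothetical helper: return the 1-based position of searchstr among
 * the synset's words using an exact comparison, or 0 if there is no
 * match or searchstr is NULL. */
int find_whichword(const char *const words[], int wcount, const char *searchstr)
{
    if (searchstr == NULL)
        return 0;
    for (int i = 0; i < wcount; i++)
        if (strcmp(words[i], searchstr) == 0)
            return i + 1;   /* words are counted from 1 */
    return 0;
}
```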

free_syns() is used to free a linked list of Synset structures allocated by findtheinfo_ds(). synptr is a pointer to the list to free.

free_synset() frees the Synset structure pointed to by synptr.

free_index() frees the Index structure pointed to by idx.
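Since a result list branches through nextss, ptrlist and nextform, releasing it means visiting every branch. The sketch below uses a hypothetical `MockSynset` with only those three link fields and assumes no node is shared between branches. It illustrates one recursive way to free such a tree; it is not the code of free_syns() or free_synset(), which a real program should call instead.

```c
#include <stdlib.h>

/* Hypothetical stand-in for Synset: only the three link fields. */
typedef struct MockSynset {
    struct MockSynset *nextss;    /* next sense, or next pointer at this level */
    struct MockSynset *ptrlist;   /* search results / next level */
    struct MockSynset *nextform;  /* results for another base form */
} MockSynset;

static int mock_freed = 0;   /* number of nodes released, for checking */

MockSynset *mock_new(void)
{
    return calloc(1, sizeof(MockSynset));
}

/* Free s, everything reachable from it, and the rest of its nextss chain. */
void free_mock_tree(MockSynset *s)
{
    while (s != NULL) {
        MockSynset *next = s->nextss;   /* save before freeing s */
        free_mock_tree(s->ptrlist);
        free_mock_tree(s->nextform);
        free(s);
        mock_freed++;
        s = next;
    }
}
```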

traceptrs_ds() is a recursive search algorithm that traces pointers matching ptr_type, starting with the synset pointed to by synptr. Setting depth to 1 when traceptrs_ds() is called requests a recursive search; 0 requests a non-recursive one. synptr points to the data structure representing the synset to search for pointers of type ptr_type. When a matching pointer is found, the synset it points to is read and linked onto the nextss chain. Levels of the tree generated by a recursive search are linked through the ptrlist field until NULL is reached, indicating the top (or bottom) of the tree. This function is usually called from findtheinfo_ds() for each sense of the word. See "Synset Navigation", below, for detailed information on the linked lists returned.
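The linking scheme can be modelled with two small hypothetical types: `DbNode` stands in for a synset in the data files, holding at most two pointers of the traced type, and `Result` carries the nextss and ptrlist links. `trace_sketch` illustrates the described behavior under the assumption of an acyclic hierarchy; it is not the real traceptrs_ds().

```c
#include <stdlib.h>

/* Hypothetical database synset with up to two pointers of the traced type. */
typedef struct DbNode {
    const char *name;
    const struct DbNode *targets[2];
} DbNode;

/* Hypothetical result node with the two link fields described above. */
typedef struct Result {
    const DbNode *synset;
    struct Result *nextss;   /* another pointer of the same type, same level */
    struct Result *ptrlist;  /* next level of the hierarchy */
} Result;

/* Each target of `from` becomes a Result on the nextss chain; when depth
 * is non-zero, each Result's ptrlist holds the trace from that target. */
Result *trace_sketch(const DbNode *from, int depth)
{
    Result *head = NULL;
    Result **tail = &head;   /* where the next Result gets linked */

    for (int i = 0; i < 2 && from->targets[i] != NULL; i++) {
        Result *r = calloc(1, sizeof *r);
        if (r == NULL)
            break;                       /* out of memory: return what we have */
        r->synset = from->targets[i];
        if (depth)                       /* recursive search: next level */
            r->ptrlist = trace_sketch(r->synset, depth);
        *tail = r;
        tail = &r->nextss;
    }
    return head;
}
```

With depth 0 only the first level is linked; with depth 1 each result's ptrlist continues until a synset with no further pointers yields NULL.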

do_trace() performs the search indicated by ptr_type on synset synptr in syntactic category pos. depth is defined as above. do_trace() returns the search results formatted in a text buffer.

Synset Navigation

Since the Synset structure is used to represent the synsets for both word senses and pointers, the ptrlist and nextss fields have different meanings depending on whether the structure is a word sense or pointer. This can make navigation through the lists returned by findtheinfo_ds() confusing.

Navigation through the returned list involves the following:

Following the nextss chain from the synset returned moves through the various senses of searchstr. NULL indicates the end of the chain of senses.

Following the ptrlist chain from a Synset structure representing a sense traces the hierarchy of the search results for that sense. Subsequent links in the ptrlist chain indicate the next level (up or down, depending on the search) in the hierarchy. NULL indicates the end of the chain of search result synsets.

If a synset pointed to by ptrlist has a value in the nextss field, it represents another pointer of the same type at that level in the hierarchy. For example, some noun synsets have two hypernyms. Following this nextss pointer, and then the ptrlist chain from the Synset structure pointed to, traces another, parallel, hierarchy, until the end is indicated by NULL on that ptrlist chain. So, a synset representing a pointer (versus a sense of searchstr) having a non-NULL value in nextss has another chain of search results linked through the ptrlist chain of the synset pointed to by nextss.

If searchstr contains more than one base form in WordNet (as in the noun axes, which has base forms axe and axis), synsets representing the search results for each base form are linked through the nextform pointer of the Synset structure.
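These navigation rules reduce to a short recursion: walk a nextss chain, and for each synset descend one level through ptrlist. The sketch below uses a hypothetical `NavSynset` with only the three link fields and records each visited label with its level; with a real list, the same traversal would run over the Synset structures returned by findtheinfo_ds().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Synset with only the link fields. */
typedef struct NavSynset {
    const char *label;
    struct NavSynset *nextss;
    struct NavSynset *ptrlist;
    struct NavSynset *nextform;
} NavSynset;

/* Append "label/level " for s, its nextss siblings, and everything
 * below them through ptrlist. */
static void walk_level(const NavSynset *s, int level, char *out, size_t outsize)
{
    for (; s != NULL; s = s->nextss) {
        size_t used = strlen(out);
        snprintf(out + used, outsize - used, "%s/%d ", s->label, level);
        walk_level(s->ptrlist, level + 1, out, outsize);
    }
}

/* Walk the results for each base form, following nextform from head. */
void walk_results(const NavSynset *head, char *out, size_t outsize)
{
    out[0] = '\0';
    for (const NavSynset *form = head; form != NULL; form = form->nextform)
        walk_level(form, 0, out, outsize);
}
```

Senses appear at level 0, and a parallel hierarchy stays at the level of the pointer it branches from, because following nextss does not increase the level.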

WordNet Searches

This page does not describe in detail what each search type does or the results it returns. Using the WordNet interface, examining the source code, and reading wndb(5WN) are the best ways to see which searches are available and the data returned for each.

Listed below are the valid searches that can be passed as ptr_type to findtheinfo(). Passing a negative value (when applicable) causes a recursive, hierarchical search by setting depth to 1 when traceptrs() is called.
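The sign convention amounts to the split below, sketched with a hypothetical helper `split_ptr_type`. The library performs this step itself, so an application only passes the negated constant (for example, -HYPERPTR, assuming that search supports a recursive form).

```c
#include <stdlib.h>

/* Hypothetical helper: a negative ptr_type requests a recursive search,
 * so return its absolute value and set *depth to 1; otherwise set
 * *depth to 0 and return ptr_type unchanged. */
int split_ptr_type(int ptr_type, int *depth)
{
    *depth = (ptr_type < 0) ? 1 : 0;
    return abs(ptr_type);
}
```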


ptr_type          Value  Pointer Symbol  Search
ISMEMBERPTR       6      #m              Member meronym
ISSTUFFPTR        7      #s              Substance meronym
ISPARTPTR         8      #p              Part meronym
HASMEMBERPTR      9      %m              Member holonym
HASSTUFFPTR       10     %s              Substance holonym
HASPARTPTR        11     %p              Part holonym
MERONYM           12     %               All meronyms
HOLONYM           13     #               All holonyms
PPLPTR            15     <               Participle of verb
PERTPTR           17     \               Pertains to noun or derived from adjective
VERBGROUP         19     $               Verb group
DERIVATION        20     +               Derivationally related form
CLASSIFICATION    21     ;               Domain of synset
CLASS             22     -               Member of this domain
SYNS              23     n/a             Find synonyms
FRAMES            25     n/a             Verb example sentences and generic frames
COORDS            26     n/a             Noun coordinates
RELATIVES         27     n/a             Group related senses
HMERONYM          28     n/a             Hierarchical meronym search
HHOLONYM          29     n/a             Hierarchical holonym search
WNGREP            30     n/a             Find keywords by substring
OVERVIEW          31     n/a             Show all synsets for word
CLASSIF_CATEGORY  32     ;c              Show domain topic
CLASSIF_USAGE     33     ;u              Show domain usage
CLASSIF_REGIONAL  34     ;r              Show domain region
CLASS_CATEGORY    35     -c              Show domain terms for topic
CLASS_USAGE       36     -u              Show domain terms for usage
CLASS_REGIONAL    37     -r              Show domain terms for region
INSTANCE          38     @i              Instance of
INSTANCES         39     ~i              Show instances

findtheinfo_ds() cannot perform the following searches:



Applications that use WordNet and/or the morphological functions must call wninit() at the start of the program. See wnutil(3WN) for more information.

In all function calls, searchstr may be either a word or a collocation formed by joining individual words with underscore characters (_).

The SearchResults structure defines fields in the wnresults global variable that are set by the various search functions. This is a way to get additional information, such as the number of senses the word has, from the search functions. The searchds field is set by findtheinfo_ds().

The pos passed to traceptrs_ds() is not used.


parse_synset() must find an exact match between the searchstr passed and a word in the synset to set whichword. No attempt is made to translate hyphens and underscores, as is done in getindex().

The WordNet database and exception list files must be opened with wninit() before any of the searching functions are used.

A large search may cause findtheinfo() to run out of buffer space. The maximum buffer size depends on the platform. If the buffer size is exceeded, the following message is printed in the output buffer: "Search too large. Narrow search and try again...".

Passing an invalid pos will probably result in a core dump.
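Because the search functions do not validate pos, a caller can check it first. A minimal guard, assuming the valid range is 1 (NOUN) to 4 (ADVERB) as in the pos list; `pos_is_valid` is hypothetical, and ALL_POS, which in_wn() accepts, is deliberately not handled here.

```c
/* Hypothetical guard: nonzero only for pos values 1 (NOUN) .. 4 (ADVERB). */
int pos_is_valid(int pos)
{
    return pos >= 1 && pos <= 4;
}
```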