filterfile(5) filter configuration file for fetchnews(8)

DESCRIPTION

Leafnode is a USENET package intended for small sites, where there are few users and little disk space, but where a large number of groups is desired.

Fetchnews (formerly called "fetch") is the program which submits and retrieves new articles to or from the upstream NNTP server. It has a filtering facility which makes it possible to select for or against certain articles, based on what is stored in the article headers. The headers are matched against patterns stored in a file which Fetchnews determines from the filterfile entry in the main leafnode configuration file.

The filterfile stores a number of entries in the following format.

newsgroups = [!]PCRE
pattern = [!]PCRE
action = {kill|select|score}

The right hand sides of the filterfile follow the same quoting conventions as the main configuration file does. See the CONFIGURATION section of the leafnode(8) manual page for details.

Based on these entries, fetchnews decides whether to retrieve an article from the upstream server or not. The entries are considered from the top down.

The [!]newsgroups parameter determines which newsgroup(s) the following patterns should be applied to. You can use PCRE expressions as explained in the pcrepattern(3) manual page.

When prefixed with an exclamation point, the match result is inverted, i. e. the rule applies if the newsgroup name is not matched by the PCRE, useful for "for all newsgroups except... do"-style filters. The newsgroup rule is memorized and can be used for more than one pattern/action pair. It is valid until the next newsgroups parameter turns up.

The pattern to recognize can be one of the following:

pattern = [!]PCRE
Filter on any pattern that may occur in a header. Leafnode uses Perl-compatible regular expressions (PCRE) which are explained in a separate manual page (pcrepattern(3)).

Note that the patterns are case sensitive. If you want a caseless match, you can use (?i) as PCRE option inside the pattern - the PCRE manual will tell you more. Specifically, leafnode compiles the PCRE patterns with just the PCRE_MULTILINE option set.

Note that the right hand side follows the same quoting conventions as leafnode's main configuration file - the double quote (") is special!

If you want to negate a part of the pattern, use a negative assertion, try (untested):


     pattern = ^Subject:(?!.*bad weather.*)$

Be very careful when using the | operator. Make sure you use parentheses to tell PCRE which part you want matched. "^Subject: Jens|Jan" is different from "^Subject: (Jens|Jan)" -- the former matches either "^Subject: Jens" or "Jan", and because these are applied to any header, the former PCRE would also match on all articles posted in January, because the "^Subject: " part is missing from the second alternative!

maxage = [days]
Filter on age of articles in days. Articles that are older are selected for the consecutive action.
minlines = [n]
Filter on the length of articles in lines. Articles that have less than n lines are selected for the consecutive action.
maxlines = [n]
Filter on the length of articles in lines. Articles that have more than n lines are selected for the consecutive action.
maxbytes = [n]
Filter on the length of articles in bytes. Articles that have more than n bytes are selected for the consecutive action.
maxcrosspost = [n]
Filter on the number of groups the article is posted to. Articles that are posted to more than n groups are selected for the consecutive action.

The action parameter finally determines what fetchnews should do when encountering a pattern. Since each action is dependent on a pattern, both of them must be defined. There are three actions to choose from:

kill
The article in question is not fetched if the pattern has been matched. No further patterns are considered.

select
The article in question is fetched if the pattern has been matched. No further patterns are considered.

[ score ]
score is a positive or negative integer value. The value is added to the score of the article if the pattern has been matched. Further patterns are considered. If, after consideration of all patterns, the score is below zero, the article is not fetched; otherwise it is fetched.

EFFICIENCY

For efficient filtering, you should prefer upstream newsservers that support the "XOVER" command, and only use filter criteria that can be satisfied from overview data. This means only filter on these headers with "pattern": Subject, From, Date, Message-ID, References, Bytes, Lines.

Filtering maxage, minlines, maxlines, maxbytes is fine. Maxcrosspost filtering, due to a lack of Newsgroups: headers in overview data, resorts to checking the Xref: header and will therefore count how many of the groups the article was crossposted into are offered by the upstream server. This means that maxcrosspost filtering can be delayed until after the whole article has been downloaded.

LOGGING

Articles that are rejected by filters are currently logged at "news.info" facility/priority. They take one of the forms

article N <M> rejected by filter (XOVER)

store: article <M> rejected by filter

where N is the article number in the current group and M is the Message-ID of the filtered article.

You can also add "filter debugging" if you need more detailed logging, but it is copious and should not be used unattended. Try running fetchnews or applyfilter with the "-D33 -e" options to see such logging.

EXAMPLES

The first example shows how to avoid "fast money"-type postings in all newsgroups:

newsgroups = .*
pattern = ^Subject:.*fast.?money
action = kill

This pattern is applied to all newsgroups. Only the Subject:-line is matched (if the "^" were omitted, other headers containing "Subject:.*fast.?money" would be matched as well). The amount of characters between the header name and the "fast" may be arbitrary ("." means "any char" and the "*" designates that an arbitrary number, including zero, of these characters may be there) while between "fast" and "money" there must be at least one other character (designated by the "?"). For further explanations of Perl-compatible regular expressions, see pcrepattern(3).

The second example shows how to avoid articles in a specific newsgroup unless they deal with a certain subject:

newsgroups = ^alt.hippopotamus
pattern = ^Subject:.*[Hh]ippo
action = select
pattern = ^Subject:.*
action = kill

Working from the top down, fetchnews will immediately select all articles in alt.hippopotamus that contain "Hippo" or "hippo" in the subject. If this expression is not matched, all articles will match the second expression and therefore be not fetched.

Finally, the third example shows another way to select certain articles, using scoring.

newsgroups = ^alt.hippopotamus.*
pattern = ^Subject:.*[Hh]ippo
action = 5
pattern = ^Subject:.*telekinesis
action = -3

We are interested in hippopotamus but not in telekinesis. This filter will kill all postings to the alt.hippopotamus hierarchy which deal with the latter, except if they report telekinesis in connection with hippos. (Of course, there may be other ways to achieve the same result.)

While in these examples filtering was always done on the Subject: line, there is no reason not to apply filters to arbitrary headers, e.g. certain posters (From:), MIME-encoded postings or whatever you can imagine. Just remember what was written in the EFFICIENCY section above if bandwidth is precious.

Leafnode modifies the wildmat format somewhat: patterns that do not specify a range of characters are matched in a case-insensitive manner. This is necessary since newsgroups names are de facto not matched in a case-sensitive manner. If you want to enforce case-sensitive pattern matching, include a range somewhere in your pattern.

BUGS

Neither the newsgroups name nor the pattern may contain a hash mark (#).

AUTHOR

This manual page was written by Cornelius Krasel <[email protected]> with some inspiration from Lloyd Zusman <[email protected]>. It was later revised by Matthias Andree.