logrep(1) A handy tool for sophisticated, ad-hoc analysis of webserver logs.

DESCRIPTION

wtop is like "top" for your webserver. How many searches or signups are happening per second? What is the response time histogram for your static files? wtop shows you at a glance.

OPTIONS

logrep [--mode MODE] [--include | --exclude CLASSES] [-H | -R]
       [--output FIELDS] [--filter FILTERS] [--last LAST_N]
       [--sort LIM:FIELDS:DIRECTION] [--config CFG_FILE] [--quiet] [LOG_FILE]
-m MODE, --mode MODE
There are three modes:

      - "grep" parses an entire log file (default).
      - "tail" reads from the end of the file.
      - "top"  shows running performance stats.
-i CLASSES, --include CLASSES
-e CLASSES, --exclude CLASSES
Include or exclude the given URL "classes". You can configure logrep to classify URLs by a set of regular expressions. See the installation docs and /etc/wtop.cfg for how to configure your own classes. --include and --exclude are mutually exclusive.
Examples:
--include "home,search,wiki"
--exclude "img,xml,js"
-f FILTERS, --filter FILTERS
Filters act on named fields. Strings and numbers are supported, with the operators greater than (>), less than (<), equals (=), not-equals (!=), and regular expression match (~ and !~).
For example, to select successful requests over 10 kB in size that do not have 'example.com' in the Referer field:
-f "status=200,bytes>10000,refdom!~example.com"
AVAILABLE FIELDS:

     msec       Response time in milliseconds
     ip         The IP address of the client
     url        The path of the request, eg '/home'
     ref        'Referer' header
     refdom     Domain part of the 'Referer' header
     bytes      Bytes sent
     ua         User-agent header
     uas        First 30 characters of ua
     class      URL class, configurable in wtop.cfg
     status     HTTP status code, eg 200, 301, 404
     proto      Protocol version, eg 'HTTP/1.1'
     method     HTTP method, eg 'GET', 'POST'
     bot        Is a robot? 1 or 0. Only a guess.
     botname    eg 'Googlebot', 'Nutch', 'Slurp', etc
     ts         Unix timestamp of the request
     year
     month
     day
     hour
     minute
     country    Country name (see GEOCODING, below)
     cc         ISO 3166 country code (see GEOCODING, below)
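The filter syntax described under -f can be illustrated with a short Python sketch. This is not logrep's actual implementation; the names parse_filters and matches are hypothetical, and the sketch assumes that clauses are separated by commas and that a record must pass every clause, as the example above suggests. Because it splits on commas, a regular expression containing a comma would not work here.

```python
import re

# Two-character operators come first so "!=" is not mistaken for "=".
_CLAUSE = re.compile(r"^(\w+)(!=|!~|>|<|=|~)(.*)$")


def parse_filters(spec):
    """Split 'status=200,bytes>10000' into (field, op, value) tuples."""
    clauses = []
    for part in spec.split(","):
        m = _CLAUSE.match(part.strip())
        if m is None:
            raise ValueError("bad filter clause: %r" % part)
        clauses.append(m.groups())
    return clauses


def _as_number(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def matches(record, clauses):
    """Return True if record (a dict of field values) passes every clause."""
    for field, op, value in clauses:
        actual = record[field]
        if op in ("~", "!~"):
            found = re.search(value, str(actual)) is not None
            ok = found if op == "~" else not found
        else:
            # Compare numerically when both sides are numbers, else as strings.
            a, b = _as_number(actual), _as_number(value)
            if a is None or b is None:
                a, b = str(actual), str(value)
            if op == "=":
                ok = a == b
            elif op == "!=":
                ok = a != b
            elif op == ">":
                ok = a > b
            else:
                ok = a < b
        if not ok:
            return False
    return True
```

With the clauses from the example, a record such as {"status": 200, "bytes": 20480, "refdom": "other.org"} passes, while the same record with refdom set to "www.example.com" is rejected.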
-H, -R
Shorthand for a useful but incomplete filter of robot user-agents. -H (humans) is equivalent to --filter 'bot=0'; -R (robots) is equivalent to --filter 'bot=1'.
-o FIELDS, --output FIELDS
Output only the given fields, tab-delimited. All of the fields listed for --filter are available.
AGGREGATE FUNCTIONS: In -m grep mode you can use aggregate functions on numeric fields such as bytes and msec. Any non-aggregate fields in the list will be used to group records together.

     count(*)      number of records
     avg(FIELD)    mean average
     min(FIELD)    lowest seen value
     max(FIELD)    highest seen value
     sum(FIELD)    summation of all values
     var(FIELD)    population variance
     dev(FIELD)    standard deviation (square root of variance)
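To make the grouping behaviour concrete, here is a minimal Python sketch of what an output list such as 'class,avg(msec)' computes. It is an illustration only, not logrep's code: the aggregate function and the AGGREGATES table are hypothetical names, and records are assumed to be plain dicts keyed by field name.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance

# Hypothetical mapping from the aggregate names above to Python functions.
AGGREGATES = {
    "count": len,
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "var": pvariance,  # population variance
    "dev": pstdev,     # square root of the population variance
}


def aggregate(records, group_field, func, value_field):
    """Group records by group_field and apply func to value_field per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_field]].append(rec[value_field])
    return {key: AGGREGATES[func](values) for key, values in groups.items()}


records = [
    {"class": "home", "msec": 100},
    {"class": "home", "msec": 300},
    {"class": "img", "msec": 10},
]
averages = aggregate(records, "class", "avg", "msec")
```

Here the two "home" records are averaged together and the "img" record forms its own group, mirroring how a non-aggregate field in the list becomes the grouping key.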
-s LIM:FIELDS:DIRECTION, --sort LIM:FIELDS:DIRECTION
Use this option to sort & limit aggregate records. LIM is the number of records to return, FIELDS is a comma-delimited list of column positions starting with 1, and DIRECTION is either 'descending' (default) or 'ascending'.
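The LIM:FIELDS:DIRECTION specification can be read as in the following Python sketch. The functions parse_sort and sort_rows are hypothetical, and the sketch assumes that DIRECTION may be omitted or abbreviated to its first letter, as the '10:2' and '3600:1,2:a' examples in this page suggest; logrep's real parsing may differ.

```python
def parse_sort(spec):
    """Parse 'LIM:FIELDS:DIRECTION', e.g. '10:2' or '3600:1,2:a'."""
    parts = spec.split(":")
    limit = int(parts[0])
    columns = [int(c) for c in parts[1].split(",")]
    # DIRECTION is optional here; descending is the documented default.
    direction = parts[2] if len(parts) > 2 else "d"
    descending = not direction.lower().startswith("a")
    return limit, columns, descending


def sort_rows(rows, spec):
    """Sort rows (tuples of column values) and keep the first LIM of them."""
    limit, columns, descending = parse_sort(spec)

    # Column positions start at 1, so subtract 1 to index Python tuples.
    def key(row):
        return tuple(row[c - 1] for c in columns)

    return sorted(rows, key=key, reverse=descending)[:limit]


rows = [("/a", 5), ("/b", 9), ("/c", 1)]
top_two = sort_rows(rows, "2:2")
```

In this sketch, '2:2' sorts on the second column in descending order and keeps two rows, so the rows with the largest counts come first.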
-l LAST_N, --last LAST_N
(grep mode) Only read the last N log lines.
-c CFG_FILE, --config CFG_FILE
Feed logrep a custom config file. By default it will use:
/etc/wtop.cfg (Linux, BSD, OSX, etc)
Python sys.prefix (Windows)
-q, --quiet
Quiet mode. Does not print warnings to stderr.
LOG_FILE
The path to a log file. By default logrep will read from the file path specified in wtop.cfg. If you specify '-', logrep will read from STDIN.

CONFIGURATION:

Configuring Apache
Please Note: By default Apache LogFormats do not have the %D (microsecond response time) directive. You must have at least %s, %r, %t and %D in your LogFormat in order to use wtop. You can use logrep without %D, but you will not be able to use the msec field.
Example: before

     LogFormat "%h %l %u %t      CustomLog logs/access_log common
Example: after

     LogFormat "%h %l %u %t      CustomLog logs/access_log common
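As a quick sanity check, the requirement that a LogFormat contain %s, %r, %t and %D could be tested with a small Python sketch like the one below. The function missing_directives is hypothetical and not part of wtop. The sample string is the Common Log Format as given in the Apache documentation, used here only as test input; it is not necessarily the format shown in the truncated example above.

```python
import re

# Directives the text above says must be present in the LogFormat.
REQUIRED = ("s", "r", "t", "D")


def missing_directives(log_format):
    """Return the required directives that do not appear in log_format."""
    missing = []
    for letter in REQUIRED:
        # Allow an optional '<' or '>' modifier (as in "%>s") and an
        # optional "{...}" argument (as in "%{format}t").
        pattern = r"%[<>]?(?:\{[^}]*\})?" + letter
        if not re.search(pattern, log_format):
            missing.append("%" + letter)
    return missing


# Apache's documented Common Log Format, used only as sample input.
common = r'%h %l %u %t \"%r\" %>s %b'
before = missing_directives(common)
after = missing_directives(common + " %D")
```

In this sketch the plain common format is reported as lacking %D, and appending %D clears the list. The match is case-sensitive, so %T (seconds) is not mistaken for %t.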

GEOCODING:

logrep will use the MaxMind GeoIP library if it is installed. This will enable two extra fields for filtering and output: country (eg "United Kingdom"), and cc (ISO 3166 country code, eg "GB"). These are a *guess* at the country the HTTP client is from.

KNOWN BUG:

Some installations of Apache have HostnameLookups defaulted to On. This means that the %h field will contain the fully-qualified domain name of the client (xdsl456.foo.example.com) instead of the IP address (123.1.2.3). Geocoding will work but will require a DNS lookup to resolve the IP address. Using the 'cc' or 'country' field in this case will generate a *LOT* of DNS traffic and can hang the program. It is recommended to explicitly set HostnameLookups Off in your Apache configuration.
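One way to find out whether a log is affected is to check whether the client field holds IP addresses or hostnames. The Python sketch below uses the standard ipaddress module; the function names are hypothetical, and it assumes that %h is the first whitespace-delimited field on each line, which depends on your LogFormat.

```python
import ipaddress


def is_ip_address(host_field):
    """Return True if host_field parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host_field)
        return True
    except ValueError:
        return False


def count_hostnames(lines):
    """Count non-empty lines whose first field is not an IP address.

    Assumption: the client (%h) is the first field of each log line.
    """
    return sum(
        1
        for line in lines
        if line.strip() and not is_ip_address(line.split()[0])
    )


sample = [
    "123.1.2.3 - - rest of the log line",
    "xdsl456.foo.example.com - - rest of the log line",
]
hostname_lines = count_hostnames(sample)
```

A non-zero count suggests that HostnameLookups is on, in which case the 'cc' and 'country' fields are best avoided until it is switched off.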

EXAMPLES

"wtop" for all human traffic:
$ logrep -m top -f 'bot=0' access.log
Status code & response times for all Googlebot homepage hits:
$ logrep -f 'botname=Googlebot' -i home -o status,msec
Tail requests for pages about Angelina Jolie or Brad Pitt referred from example.com:
$ logrep -m tail -f 'url~jolie|pitt,ref~example.com' access.log
Get maximum response size and average response time for requests grouped by URL class:
$ logrep -o 'class,max(bytes),avg(msec)' access.log
Request count and average response time, grouped by status:
$ logrep -o 'status,count(*),avg(msec)'
Output the country code, response time and URL of each request:
$ logrep -o 'cc,msec,url'
Total bytes sent, by hour & minute:
$ logrep -o 'hour,minute,sum(bytes)' -s'3600:1,2:a'
The 10 most popular URLs:
$ logrep -o 'url,count(*)' -s '10:2'