XMLTV::Grab_XML(3) Perl extension to fetch raw XMLTV data from a site


package Grab_XML_rur;
use base 'XMLTV::Grab_XML';
sub urls_by_date( $ ) { my $pkg = shift; ... }
sub country( $ ) { my $pkg = shift; return 'Ruritania' }
# Maybe override a couple of other methods as described below...


This module helps to write grabbers which fetch pages in XMLTV format from some website and output the data. It is not used for grabbers which scrape human-readable sites.

It consists of several class methods (package methods). To use it, subclass it and override some of them.


date_init()
Called at the start of the program to set up Date::Manip. You might want to override this with a method that sets the timezone.

urls_by_date()
Returns a hash mapping YYYYMMDD dates to the URL where listings for that date can be downloaded. This method is abstract; you must override it.

Arguments: the command line options for --config-file and --quiet.
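
As a rough illustration of the return value, here is a minimal sketch of such an override. The base URL, the one-file-per-day layout and the fixed list of dates are all made up for the example, and the sketch assumes the hash is handed back as a flat list; a real grabber would work out the dates and URLs from the site and from the options it is given. The base-class line is commented out only so that the sketch runs without XMLTV installed.

```perl
use strict;
use warnings;

package Grab_XML_rur;
# use base 'XMLTV::Grab_XML';   # needed in a real grabber

# Hypothetical site layout: one XMLTV file per day, named after the date.
my $BASE_URL = 'http://listings.example/xmltv';

sub urls_by_date {
    # The two options are accepted but unused in this sketch.
    my ($pkg, $opt_config_file, $opt_quiet) = @_;
    my @dates = qw(20240101 20240102 20240103);   # fixed dates, for illustration
    my %urls  = map { $_ => "$BASE_URL/$_.xml" } @dates;
    return %urls;
}

package main;

my %urls = Grab_XML_rur->urls_by_date(undef, 0);
print "$_ => $urls{$_}\n" for sort keys %urls;
```

The keys are plain YYYYMMDD strings, so sorting them as strings also sorts them by date.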

xml_from_data()
Given page data for a particular day, turn it into XML. The default implementation just returns the data unchanged, but you might override it if you need to decompress the data or patch it up.
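
For instance, if a site served gzip-compressed files, an override might look roughly like the sketch below. It assumes the method is called as a class method with the page data as its only argument, and it uses the core IO::Uncompress::Gunzip module; the check for the gzip magic bytes is an extra precaution added for the example, not something the base class is known to do.

```perl
use strict;
use warnings;

package Grab_XML_rur;
# use base 'XMLTV::Grab_XML';   # needed in a real grabber
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

sub xml_from_data {
    my ($pkg, $data) = @_;
    # Gzip streams start with the bytes 0x1f 0x8b; pass anything else through.
    return $data unless $data =~ /\A\x1f\x8b/;
    my $xml;
    gunzip(\$data => \$xml) or die "gunzip failed: $GunzipError\n";
    return $xml;
}

package main;
use IO::Compress::Gzip qw(gzip $GzipError);

# Build a small compressed page so the sketch can be run on its own.
my $plain = qq{<?xml version="1.0"?>\n<tv></tv>\n};
my $packed;
gzip(\$plain => \$packed) or die "gzip failed: $GzipError\n";

my $from_gzip  = Grab_XML_rur->xml_from_data($packed);
my $from_plain = Grab_XML_rur->xml_from_data($plain);
print $from_gzip eq $plain ? "round trip ok\n" : "round trip failed\n";
```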

configure()
Configure the grabber if needed. The arguments are the --config-file option (or undef) and the --quiet flag (or undef).

This method is not provided in the base class; if you don't provide it, then running the grabber with --configure will print a message that configuration is not necessary.

nextday()
Bump a YYYYMMDD date by one day. You probably shouldn't override this.
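
To make the expected behaviour concrete, the sketch below bumps a date using only the core Time::Local and POSIX modules. It is not the module's own implementation (the base class sets up Date::Manip), and it assumes the method is called as a class method with the date as its only argument.

```perl
use strict;
use warnings;

package Grab_XML_rur;
use Time::Local qw(timegm);
use POSIX qw(strftime);

sub nextday {
    my ($pkg, $day) = @_;
    my ($y, $m, $d) = $day =~ /\A(\d{4})(\d{2})(\d{2})\z/
        or die "expected YYYYMMDD, got '$day'\n";
    # Work at noon UTC so that adding 24 hours always lands on the next day.
    my $noon = timegm(0, 0, 12, $d, $m - 1, $y);
    return strftime('%Y%m%d', gmtime($noon + 24 * 60 * 60));
}

package main;

print Grab_XML_rur->nextday('20240228'), "\n";   # leap year: 20240229
print Grab_XML_rur->nextday('20231231'), "\n";   # year rollover: 20240101
```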

country()
Return the name of the country you're grabbing for, used in usage messages. Abstract.

usage_msg()
Return a command-line usage message. This calls "country()", so you probably need to override only that method.

get()
Given a URL, fetch the content at that URL. The default implementation calls XMLTV::Get_nice::get_nice(), but you might want to override it if you need to do wacky things with HTTP requests, like cookies.

Note that while this method fetches a page, "xml_from_data()" does any further processing of the result to turn it into XML.

go()
The main program. Parse command line options, fetch and write data.

Most of the options are fairly self-explanatory, but this routine also calls the XMLTV::Memoize module to look for a --cache argument. The functions memoized are those given by the "cachables()" method.

cachables()
Returns a list of names of functions which could reasonably be memoized between runs. This will normally be whatever function fetches the web pages - you memoize that to save on repeated downloads. A subclass might want to add things to this list if it has its own way of fetching web pages.

remove_early_stop_times()
Checks each stop time and removes it if it's before the start time.

Argument: the XML to correct.

Returns: the corrected XML.
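
The effect can be illustrated with a small regex-based sketch. This is not the module's own code: it assumes start and stop are attributes of each programme tag, that both timestamps have the same number of digits and the same UTC offset (so the digit strings can be compared directly), and the helper name _fix_tag is invented for the example.

```perl
use strict;
use warnings;

package Grab_XML_rur;

sub remove_early_stop_times {
    my ($pkg, $xml) = @_;
    # Rewrite each opening <programme ...> tag in place.
    $xml =~ s{(<programme\b[^>]*>)}{ _fix_tag($1) }ge;
    return $xml;
}

# Hypothetical helper: drop the stop attribute if it is earlier than start.
sub _fix_tag {
    my ($tag) = @_;
    my ($start) = $tag =~ /\bstart="(\d+)/;
    my ($stop)  = $tag =~ /\bstop="(\d+)/;
    if (defined $start && defined $stop && $stop lt $start) {
        $tag =~ s/\s+stop="[^"]*"//;
    }
    return $tag;
}

package main;

my $bad  = qq{<programme start="20240101180000 +0000" stop="20240101170000 +0000" channel="c1">};
my $good = qq{<programme start="20240101180000 +0000" stop="20240101190000 +0000" channel="c1">};
my $fixed_bad  = Grab_XML_rur->remove_early_stop_times($bad);
my $fixed_good = Grab_XML_rur->remove_early_stop_times($good);
print "$fixed_bad\n$fixed_good\n";
```

A real implementation would parse the timestamps properly so that differing time zones are compared correctly.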


Ed Avis, [email protected]