Scrappy::Scraper::Control(3)

NAME

Scrappy::Scraper::Control - Scrappy HTTP Request Constraints System

VERSION

version 0.94112090

SYNOPSIS


    #!/usr/bin/perl
    use strict;
    use warnings;
    use Scrappy::Scraper::Control;

    my $control = Scrappy::Scraper::Control->new;

    $control->allow('http://search.cpan.org');
    $control->allow('http://search.cpan.org', if => {
        content_type => ['text/html', 'application/x-tar']
    });

    $control->restrict('http://www.cpan.org');

    if ($control->is_allowed('http://search.cpan.org/')) {
        ...
    }

    # constraints are only checked when the is_allowed method
    # is passed an HTTP::Response object.

DESCRIPTION

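Scrappy::Scraper::Control provides HTTP request access control for the Scrappy framework. It keeps a list of allowed domains and a list of restricted domains, each of which may carry constraints such as acceptable content types, and reports whether a given URL may be requested.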

ATTRIBUTES

The following is a list of object attributes available with every Scrappy::Scraper::Control instance.

allowed

The allowed attribute holds a hashref of allowed domains, each mapped to its constraints.

    my $control = Scrappy::Scraper::Control->new;
    $control->allowed;

    # e.g.
    {
        'www.foobar.com' => {
            methods      => [qw/GET POST PUSH PUT DELETE/],
            content_type => ['text/html']
        }
    }
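Because the keys of this hashref are bare host names such as www.foobar.com, a URL has to be reduced to its host before it can be looked up. The sketch below shows one way to do that with core Perl only. The helper name host_of is hypothetical, and its regex is a simplification (it ignores user-info and IPv6 literals); real code would more likely use URI->new($url)->host from the URI distribution.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce an absolute URL to a lower-cased host name,
# the form used as keys in the allowed/restricted hashrefs.
# Simplified: no user-info ("user@host") and no IPv6 literal support.
sub host_of {
    my ($url) = @_;
    return undef unless defined $url;
    # scheme://host[:port][/path...]
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i;
    return defined $host ? lc $host : undef;
}

# Illustrative data in the shape shown above.
my $allowed = {
    'www.foobar.com' => {
        methods      => [qw/GET POST/],
        content_type => ['text/html'],
    },
};

my $host = host_of('http://www.FooBar.com:8080/index.html');
print "constraints found for $host\n" if exists $allowed->{$host};
```

Lower-casing the host matters because host names are case-insensitive, while Perl hash keys are not.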

restricted

The restricted attribute holds a hashref of restricted domains, each mapped to its constraints.

    my $control = Scrappy::Scraper::Control->new;
    $control->restricted;

    # e.g.
    {
        'www.foobar.com' => {
            methods => [qw/GET POST PUSH PUT DELETE/]
        }
    }

METHODS

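allow

The allow method adds a domain to the allowed list. An optional if hashref attaches constraints, such as acceptable content types, to that domain.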

    my $control = Scrappy::Scraper::Control->new;
    $control->allow('http://www.perl.org');
    $control->allow('http://search.cpan.org', if => {
        content_type => ['text/html', 'application/x-tar']
    });

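restrict

The restrict method adds a domain to the restricted list. Constraints can be attached with an optional if hashref, in the same way as for allow.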

    my $control = Scrappy::Scraper::Control->new;
    $control->restrict('http://www.perl.org');
    $control->restrict('http://search.cpan.org', if => {
        content_type => ['text/html', 'application/x-tar']
    });
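The allow and restrict methods both take a URL plus an optional if hashref of constraints. The following is a minimal sketch of how such calls could populate hashrefs shaped like the allowed and restricted attributes; it is not the module's actual implementation. The package name MiniControl is hypothetical. Merging the constraints of repeated calls for the same host, as the SYNOPSIS does with search.cpan.org, is an assumption about the intended behavior.

```perl
use strict;
use warnings;

package MiniControl;    # hypothetical stand-in, NOT Scrappy::Scraper::Control

sub new { my $class = shift; return bless { allowed => {}, restricted => {} }, $class }

sub allowed    { $_[0]{allowed} }
sub restricted { $_[0]{restricted} }

# Reduce a URL to its lower-cased host (simplified parser).
sub _host {
    my ($url) = @_;
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i
        or die "cannot parse host from '$url'\n";
    return lc $host;
}

# Shared logic: record the host under the given list and merge any
# "if => {...}" constraints into what an earlier call stored (assumed behavior).
sub _add {
    my ($self, $list, $url, %opts) = @_;
    my $entry = $self->{$list}{ _host($url) } ||= {};
    my $if    = $opts{if} || {};
    $entry->{$_} = $if->{$_} for keys %$if;
    return $self;
}

sub allow    { my $self = shift; return $self->_add(allowed    => @_) }
sub restrict { my $self = shift; return $self->_add(restricted => @_) }

package main;

my $control = MiniControl->new;
$control->allow('http://search.cpan.org');
$control->allow('http://search.cpan.org', if => {
    content_type => ['text/html', 'application/x-tar'],
});
$control->restrict('http://www.perl.org');
```

Routing both methods through one _add helper keeps the two lists structurally identical, which matches the attribute examples above.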

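is_allowed

The is_allowed method reports whether a URL may be requested under the current allow and restrict rules. Constraints are checked only when it is passed an HTTP::Response object.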

    my $control = Scrappy::Scraper::Control->new;
    $control->allow('http://search.cpan.org');
    $control->restrict('http://www.perl.org');

    if (! $control->is_allowed('http://perl.org')) {
        die "Can't get to Perl.org";
    }
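The example above expects http://perl.org to be refused even though only www.perl.org was restricted. That outcome is consistent with a policy in which, once any host has been allowed, every host missing from the allowed list is refused. The sketch below implements that policy over plain hashrefs shaped like the allowed and restricted attributes. The precedence rules, the function name is_allowed_sketch, and the StubResponse class are all assumptions made for illustration, not documented behavior. The optional last argument stands in for an HTTP::Response and only needs a content_type method.

```perl
use strict;
use warnings;

# Simplified host extraction (no user-info or IPv6 support).
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i;
    return defined $host ? lc $host : undef;
}

# Hypothetical policy, assumed for illustration:
#   1. no rules at all                          -> allowed
#   2. host is restricted                       -> refused (restricted wins)
#   3. allow list non-empty and host not on it  -> refused
#   4. response given and host has content_type constraints
#      -> allowed only if the response's type is listed
sub is_allowed_sketch {
    my ($allowed, $restricted, $url, $response) = @_;
    my $host = host_of($url);
    return 0 unless defined $host;

    return 1 unless %$allowed || %$restricted;
    return 0 if exists $restricted->{$host};
    return 0 if %$allowed && !exists $allowed->{$host};

    # Constraints are only consulted when a response object is supplied.
    my $types = $allowed->{$host} && $allowed->{$host}{content_type};
    if ($response && $types) {
        my $type = lc($response->content_type || '');
        my $hits = grep { lc($_) eq $type } @$types;
        return $hits ? 1 : 0;
    }
    return 1;
}

# Hypothetical minimal stand-in for an HTTP::Response.
package StubResponse;
sub new          { my ($class, $type) = @_; return bless { type => $type }, $class }
sub content_type { $_[0]{type} }

package main;

my %allowed    = ('search.cpan.org' => { content_type => ['text/html'] });
my %restricted = ('www.perl.org'    => {});

# perl.org is neither restricted nor allowed, so rule 3 refuses it.
print is_allowed_sketch(\%allowed, \%restricted, 'http://perl.org') ? "yes\n" : "no\n";   # prints "no"
```

Checking the restricted list before the allowed list means a host present in both is refused, which is the conservative choice for a scraper.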

AUTHOR

Al Newkirk <[email protected]>

COPYRIGHT AND LICENSE

This software is copyright (c) 2010 by awncorp.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.