plucker-build(1) generate a document (e-book) in Plucker format

SYNOPSIS

plucker-build [--alt-maxheight=pixel-height] [--alt-maxwidth=pixel-width] [--author=string] [--backup] [--beamable] [--bpp=image-depth] [--category=default-category-name] [--charset=charset-indicator] [--compression=compression-type] [--depth-first] [--doc-file=name-prefix] [--doc-name=document-name] [--doc-compression] [--exclusion-list=filename] [--extra-section=section-name] [--help] [--home-url=base-URL] [--icon=image-filename] [--launchable] [--maxdepth=depth] [--maxheight=pixel-height] [--maxwidth=pixel-width] [--no-backup] [--noimages] [--not-beamable] [--not-launchable] [--no-urlinfo] [--owner-id=name] [--pluckerdir=output-directory] [--pluckerhome=plucker-home-directory] [--quiet] [--referrer=string] [--status-file=filename] [--staybelow=url-prefix] [--stayondomain] [--stayonhost] [--title=string] [--update-cache] [--url-pattern=pattern] [--user-agent=string] [--verbosity=verbosity-level] [--zlib-compression] [HOME-URL]

DESCRIPTION

plucker-build creates a Plucker binary document, which is a kind of e-book, from a URL. This document is formatted for the Plucker viewer program, which currently runs on Palm devices. The normal mode of operation is to take a home URL and 'pluck' it to produce a Plucker document, either to stdout, or to a file if --doc-file is specified. Alternatively, specifying the option --update-cache will update a cache of Plucker records (though it's not clear what this is good for). The Plucker document format is specified at http://www.plkr.org/index.pl/cvs/docs/DBFormat.html?rev=HEAD.

OPTIONS

Many options are also available as parameters in the configuration file $HOME/.pluckerrc, or in the default configuration file. Where applicable, the name of the configuration file parameter is shown in square brackets after the documentation on the option. An option given on the command line will override any configuration file parameter. For more on configuration files, see CONFIGURATION FILES below.
--alt-maxheight=pixel-height
Specifies the maximum height, in pixels, of the alternate rendition of an image. (When inline images are too large to be included full-size, they are converted into smaller versions, with sizes governed by the MAXHEIGHT and MAXWIDTH parameters, and are linked to larger renditions of the images, called the alternate rendition.) [alt_maxheight]
--alt-maxwidth=pixel-width
Specifies the maximum width, in pixels, of the alternate rendition of an image. (When inline images are too large to be included full-size, they are converted into smaller versions, with sizes governed by the MAXHEIGHT and MAXWIDTH parameters, and are linked to larger renditions of the images, called the alternate rendition.) [alt_maxwidth]
--author=string
Sets the author of the document to string, which is assumed to be in the charset of the document (see --charset), or ASCII if no charset is specified. [author_md]
--backup
Sets the bit in the output file that causes the document to be backed up on Palm HotSync. By default, the document is backed up. [backup_bit]
--beamable
Sets the bit in the output file that allows the document to be beamed. By default, the document is beamable. [copyprevention_bit]
--bpp=image-depth
Specifies the number of bits-per-pixel to be used for images. Valid values as of Plucker 1.1 are 0, 1 (the default), 2, 4, or 8. If 0 is specified, no images will be included in the document. See also --noimages. [bpp]
--category=default-category-name
Specifies a default Plucker category or categories to include in the document. If more than one category is specified, the category names should be separated by semicolons. [category]
--charset=charset-indicator
Specifies the default character set encoding used in the text of the documents being plucked. charset-indicator is either a charset name (from a small list; see src/parser/python/PyPlucker/__init__.py.in for a list of valid names), or a decimal integer indicating the charset's MIBenum value, as shown in the table at http://www.iana.org/assignments/character-sets. [default_charset]
--compression=compression-type
Specifies the type of compression to use in the document. There are two possible values for compression-type: doc or zlib. The default is doc, which is the same compression system used in Palm DOC-format documents. zlib compression usually results in smaller documents. See also --zlib-compression and --doc-compression. [compression]
--depth-first
Specifies a depth-first traversal of the web graph, rather than the default breadth-first traversal. This often works better on bushy acyclic graph structures than the breadth-first traversal. [depth_first]
--doc-file=name-prefix (or -f name-prefix)
Specifies the name of the document output file, without the directory (specified with --pluckerdir) or extension (always .pdb). If not specified, and if stdout is not a tty, the document will be written to stdout. [doc_file]
--doc-name=document-name (or -N document-name)
Specifies the short name by which the document will be identified in the viewer. Defaults to the value of --doc-file. If --doc-file is not specified, the document name defaults to the home URL. This name should be limited to 26 characters. [doc_name]
--doc-compression
Specifies that Doc compression, the compression scheme developed for the Palm DOC format, should be used for the parts of this document. This is the default. See also --zlib-compression and --compression.
--exclusion-list=filename (or -E filename)
Used to add additional files to the exclusion list, a list of files containing information on URLs to exclude from the document. See the User's Guide for more information on exclusion lists. [exclusion_lists]
--extra-section=section-name (or -s section-name)
Used to add additional sections to the list of sections searched in the configuration files. A section is a named set of configuration information. By default, the DEFAULT section will be searched, then any operating-system-specific sections, then any sections specified on the command line.
--help (or -h)
Outputs help on command-line parameters.
--home-url=base-URL (or -H base-URL)
Specifies the URL from which the document is to be constructed. This may also be specified as a single argument on the command line. If a home URL is not specified, it will default to file:/$HOME/.plucker/home.html. This default may be changed in your .pluckerrc file. Note that this value must be a valid absolute URL. A special URL scheme is supported, plucker:. This specifies files on the Plucker search path, which consists of PluckerDir (the Plucker current working directory) followed by PluckerHome (the Plucker home directory). [home_url]
--icon=image-filename
If the output file is launchable, this switch can be used to specify the large icon shown in the launcher for the document. If not specified, a default icon is used. If the output file is not launchable, this switch has no effect. See also --launchable. [big_icon]
--launchable
Specifies that the output document should be shown as an icon in the system launcher. Clicking on the icon will start Plucker and select this document. By default, documents are not launchable. [launchable_bit]
--maxdepth=depth (or -M depth)
This specifies the number of levels of links the parser will traverse when converting the input. It is best to keep this value small, or the size of your document can get very large. If you want just a page, but none of the pages pointed to by that page, use a value of 1. [home_maxdepth]
--maxheight=pixel-height
Specifies the maximum height, in pixels, for an inline image. Overrides the MAXHEIGHT parameter in the configuration file, but is in turn overridden by any height specification in the image link itself. [maxheight]
--maxwidth=pixel-width
Specifies the maximum width, in pixels, for an inline image. Overrides the MAXWIDTH parameter in the configuration file, but is in turn overridden by any width specification in the image link itself. [maxwidth]
--no-backup
Clears the bit in the output file that causes the document to be backed up on Palm HotSync. By default, the document is backed up. [backup_bit]
--noimages
Specifies that no images will be included. Identical to --bpp=0. See also --bpp.
--not-beamable
Sets the bit in the output file that prevents the document from being beamed. By default, the document is beamable. [copyprevention_bit]
--not-launchable
Specifies that the output document should not be shown as an icon in the system launcher. By default, documents are not launchable. [launchable_bit]
--no-urlinfo
Specifies that no URL information will be included in the document. When links are included in documents, the information about the actual URL is included by default. This is often handy for external references (links to documents not included in the document). Use of this option may result in a slightly smaller document. [no_urlinfo]
--owner-id=name
Specifies an owner-id for the document. This causes the document to be lightly encrypted in such a way that it will only open on a device with a matching owner-id. With the PalmOS viewer, the HotSync UserName is used as the owner-id. [owner_id_build]
--pluckerhome=plucker-home-directory (or -P plucker-home-directory)
Overrides the default value for PluckerHome, which is $HOME/.plucker/. Can also be specified by setting the environment variable PLUCKERHOME. An explicit value for --pluckerhome overrides any setting of PLUCKERHOME. [PLUCKERHOME]
--pluckerdir=output-directory (or -p output-directory)
Overrides the default value for PluckerDir, which defaults to PluckerHome (see --pluckerhome). PluckerDir is the default directory to which output documents will be written, and which will be searched for input files if the plucker: URL scheme is used. [pluckerdir]
--quiet (or -q)
Same as --verbosity=0.
--referrer=string
When using HTTP to gather input, send string as the value of the HTTP Referer header (the header name is spelled with a single "r" in HTTP). The default is to send no Referer header. [referrer]
--status-file=filename
Gives the name of a file to read to get an estimate for the total number of pages that have to be processed, and to continually write with a single line giving the number of pages collected so far, the number of links still to process, and the estimated number of total pages that will be gathered (or zero if this is not known). The three values are written as space-separated ASCII numbers. The status line in the file is continually over-written as the pluck progresses, so the file will always contain only a single line. [status_file]
--staybelow=url-prefix
Automatically excludes all URLs that do not start with url-prefix. A handy way to process a subtree. [home_staybelow]
--stayondomain
Specifies that no web hosts other than those in the same domain as the original base URL will be visited for parts of the document. [home_stayondomain]
--stayonhost
Specifies that no web hosts other than that named in the original base URL will be visited for parts of the document. [home_stayonhost]
--title=string
Sets the title of the document to string. This is different from the name of the document (see --doc-name) in that it may be relatively long. The string is assumed to be in the charset of the document (see --charset), or ASCII if no charset is specified. [title_md]
--update-cache (or -c)
Update the Plucker cache of records, rather than build a document. [use_cache]
--url-pattern=pattern
Automatically excludes all URLs that do not match the regular expression pattern. The regular expression language used is that of the Python 're' module, as specified in http://www.python.org/doc/current/lib/re-syntax.html. [home_url_pattern]
--user-agent=string
When using HTTP to gather input, send string as the value of the User-Agent HTTP header. Default is to send "Plucker/Py-XX", where XX is the Plucker version. [user_agent]
--verbosity=verbosity-level (or -V verbosity-level)
Sets the level of status information output to the value specified by verbosity-level. Appropriate values are 0, for total silence, 1, for standard progress status (the default value), and 2, for lots of output about gathering and parsing the input (usually reserved for debugging). Values larger than 2 will also work, but tend to give profuse output that's only useful to developers. See also --quiet. [verbosity]
--zlib-compression
Specifies that Zlib compression should be used for the parts of this document. This is considerably more efficient than the default compression format, Doc compression. See also --doc-compression and --compression.
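
As an illustration of the --status-file format described above (three space-separated ASCII numbers on a single line), the following Python sketch interprets one such line. The function names are hypothetical and are not part of plucker-build; a monitoring program would read the line from whatever file was named with --status-file.

```python
def parse_status_line(line):
    """Split one status line into (collected, to_do, estimated_total)."""
    collected, to_do, estimated_total = (int(field) for field in line.split())
    return collected, to_do, estimated_total


def describe_progress(line):
    """Turn a status line into a short human-readable message."""
    collected, to_do, estimated_total = parse_status_line(line)
    if estimated_total == 0:
        # A total of zero means the estimate is not known.
        return f"{collected} pages collected, {to_do} links still to process"
    return f"{collected} of an estimated {estimated_total} pages collected"


print(describe_progress("12 30 45"))
print(describe_progress("3 7 0"))
```

Because the file is continually overwritten, a reader should be prepared to re-read it and to tolerate a momentarily empty or partial line.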

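The effect of the --staybelow and --url-pattern filters described above can be sketched in Python. This is only a model of the documented behavior, not plucker-build's own code: the function name keep_url is hypothetical, and the sketch assumes the regular expression is applied from the start of the URL (re.match), which this page does not state.

```python
import re


def keep_url(url, staybelow=None, url_pattern=None):
    """Return True if url passes both filters (a filter set to None is skipped)."""
    if staybelow is not None and not url.startswith(staybelow):
        return False
    # Assumption: the pattern is applied from the start of the URL (re.match).
    if url_pattern is not None and re.match(url_pattern, url) is None:
        return False
    return True


urls = [
    "http://www.foo.com/cafe/weeklymenu.html",
    "http://www.foo.com/cafe/logo.gif",
    "http://www.foo.com/ops/index.html",
]
below_cafe = [u for u in urls if keep_url(u, staybelow="http://www.foo.com/cafe/")]
html_only = [u for u in urls if keep_url(u, url_pattern=r".*\.html$")]
print(below_cafe)
print(html_only)
```
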
EXAMPLES

To build a pocket version of the weekly cafeteria menu at the foo.com cafeteria, available on the Web at http://www.foo.com/cafe/weeklymenu.html, without following any links, and without including any images, and naming the document "Cafeteria Menu", and putting the document in a file named /tmp/Menu.pdb, one would say:

% plucker-build --doc-name="Cafeteria Menu" --maxdepth=1 --bpp=0 http://www.foo.com/cafe/weeklymenu.html >/tmp/Menu.pdb

Or alternatively,

% plucker-build --pluckerdir=/tmp \
     --doc-name="Cafeteria Menu" \
     --doc-file=Menu \
     --home-url="http://www.foo.com/cafe/weeklymenu.html" \
     --maxdepth=1 \
     --bpp=0
Pluckerdir is '/tmp'...
---- 0 collected, 1 to do ----
Processing http://www.foo.com/cafe/weeklymenu.html...
  Retrieved ok.
  Parsed ok.
---- all pages retrieved and parsed ----

Writing out collected data...
Writing document 'Cafeteria Menu' to file /tmp/Menu.pdb
Converting http://www.foo.com/cafe/weeklymenu.html...
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= http://www.foo.com/cafe/weeklymenu.html
Wrote 3 <= plucker:/~special~/pluckerlinks
Wrote 5 <= plucker:/~special~/metadata
Wrote 11 <= plucker:/~special~/links1
Done!
% ls -l /tmp/Menu.pdb
-rw-rw-r-- 1 user somegroup 2646 Nov 2 21:19 /tmp/Menu.pdb
%

ENVIRONMENT VARIABLES

HOME
Used to determine the location of the user's configuration file. If not set, the system-wide configuration file is used.
HTTP_PROXY, HTTP_PROXY_USER, HTTP_PROXY_PASS
If set, will be used to retrieve URLs with the http URL scheme.
PLUCKERHOME
Specifies value for PluckerHome. See the option --pluckerhome for more details.
PLUCKERDIR
Specifies value for PluckerDir. See the option --pluckerdir for more details.

CONFIGURATION FILES

Two configuration files are examined for customized settings of the various plucker-build parameters. The first is a system-wide configuration file, by default /usr/local/etc/pluckerrc, or /etc/pluckerrc on Debian systems. Any settings in this may be overridden with a personal configuration file, $HOME/.pluckerrc. Both files contain any number of sections, each of which may contain any number of configuration parameter settings. Each section has a name, which is enclosed in square brackets, followed by parameter settings. Normally, only the section called "default" will be examined. Extra sections may be specified with the --extra-section option to plucker-build; settings in these sections will override values in the default section.

Parameter settings have the form name = value, where name is the name of a plucker-build parameter, and value is a string, integer, floating-point, or boolean value. A colon character (:) may be used instead of the equals sign to separate name and value. Comments may be expressed by starting any line with the characters "rem", or with the character "#", or with the character ";". Boolean values of True may be expressed with "TRUE", "true", "True", "on", or "1". Boolean values of False may be expressed with "FALSE", "false", "False", "off", or "0".
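
The syntax rules above can be made concrete with a small Python sketch. It is an illustration only, not the parser plucker-build uses: the function names are hypothetical, values are kept as plain strings, and the sample settings are invented for the example.

```python
import re

TRUE_WORDS = {"TRUE", "true", "True", "on", "1"}
FALSE_WORDS = {"FALSE", "false", "False", "off", "0"}


def parse_boolean(text):
    """Map the documented boolean spellings to True or False."""
    if text in TRUE_WORDS:
        return True
    if text in FALSE_WORDS:
        return False
    raise ValueError("not a boolean value: " + text)


def parse_config(text):
    """Return {section name: {parameter name: value string}}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the three documented comment styles.
        if not line or line.startswith(("#", ";", "rem")):
            continue
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        # The name ends at the first "=" or ":", so URLs in values survive.
        setting = re.match(r"([^=:\s]+)\s*[=:]\s*(.*)", line)
        if setting and current is not None:
            current[setting.group(1)] = setting.group(2)
    return sections


sample = """
# personal settings
[default]
home_url = http://www.foo.com/cafe/weeklymenu.html
bpp: 0
backup_bit = off
"""
settings = parse_config(sample)["default"]
print(settings["home_url"], settings["bpp"], parse_boolean(settings["backup_bit"]))
```
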

Configuration sections are useful for frequently used groups of options. Such options can be defined in a section of the configuration file, and the section then named with the --extra-section option to plucker-build; the other options are all drawn from that section.
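
How an extra section overrides the default section can be sketched as a simple dictionary merge. This is a simplified model with a hypothetical function name and a hypothetical section called "menu"; it ignores the operating-system-specific sections and the command-line options, which would be applied on top.

```python
def effective_settings(sections, extra_sections=()):
    """Start from the default section, then let each extra section override it."""
    settings = dict(sections.get("default", {}))
    for name in extra_sections:
        settings.update(sections.get(name, {}))
    return settings


sections = {
    "default": {"bpp": "1", "home_maxdepth": "2"},
    "menu": {
        "home_url": "http://www.foo.com/cafe/weeklymenu.html",
        "home_maxdepth": "1",
        "bpp": "0",
    },
}
merged = effective_settings(sections, ["menu"])
print(merged)
```

With such a section in place, the whole group of settings would be selected with --extra-section=menu rather than by repeating each option on the command line.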

The following parameters are understood:

PLUCKERHOME
See option --pluckerhome.
alt_maxheight
See option --alt-maxheight.
alt_maxwidth
See option --alt-maxwidth.
anchor_color
A color to draw all links in, expressed as one of the 16 standard Web color names, or in the Web standard RGB color notation. See the HTML 4.01 specification for more details on allowed color names and RGB notation.
author_md
See option --author.
auto_scale_images
A boolean; if true, plucker-build will automatically attempt to convert images which are too large to include in the document, to a smaller form which will fit in the document. Defaults to false.
backup_bit
See option --backup.
big_icon
See option --icon.
bmp_to_tbmp
Name of the bmp2tbmp program in Windows. Defaults to Bmp2Tbmp.exe.
bmp_to_tbmp_parameter
Parameter for the bmp2tbmp program in the Windows ImageMagick image parser.
bpp
See option --bpp.
cache_dir_name
Specify the subdirectory of PluckerDir to use for cache storage. The default is "cache".
category
See option --category.
color_paragraphs
Boolean; if set, will insert a specific foreground color at beginning of every paragraph. Shouldn't be necessary, and defaults to off.
compression
See option --compression.
convert_program
If using the deprecated imagemagick image parser, the name of the convert program. Defaults to convert (convert.exe for Windows).
convert_program_parameter
Parameter for the Windows ImageMagick image parser's use of convert.
copyprevention_bit
See option --beamable.
db_file
Deprecated alternative to doc_file. May disappear in any release.
db_name
Deprecated alternative to doc_name. May disappear in any release.
default_charset
See option --charset.
depth_first
See option --depth-first.
djpeg_program
Name of the djpeg program. Defaults to djpeg. Used by the netpbm2 image parser.
doc_file
See option --doc-file.
doc_name
See option --doc-name.
exclusion_lists
See option --exclusion-list. If multiple files are specified here, they should be separated by the appropriate separator character for your operating system (a colon on Unix platforms, a semicolon on Windows platforms).
filename_extension
Extension to use for the filename. Defaults to pdb. Another possibility is plkr.
giftopnm_program
Name of program used to convert GIF image files to PNM image files. Used by the netpbm and netpbm2 image parsers. Defaults to giftopnm.
guess_tbmp_size
Boolean, defaults to on. Used by the Windows image parser.
home_maxdepth
See option --maxdepth.
home_staybelow
See option --staybelow.
home_stayondomain
See option --stayondomain.
home_stayonhost
See option --stayonhost.
home_url
See option --home-url.
home_url_pattern
See option --url-pattern.
http_proxy
String giving any HTTP proxy server to use. Sets the environment variable HTTP_PROXY to this value.
http_proxy_pass
String giving a password for any HTTP proxy. Sets the environment variable HTTP_PROXY_PASS to this value.
http_proxy_user
String giving a username for any HTTP proxy. Sets the environment variable HTTP_PROXY_USER to this value.
image_compression_limit
Integer giving the minimum number of image bytes to compress. Defaults to 0. Images smaller than this will not be compressed.
image_parser
String specifying which image parser to use. If not specified, a working default will be used. It's suggested that you not specify this configuration parameter unless you know what you are doing. Acceptable values are netpbm2, pil2, imagemagick2, netpbm (deprecated), pil (deprecated), imagemagick (deprecated), windowspil, windows (deprecated). This value is ignored in the Java version of plucker-build.
imagemagick_convert_command
Identifies the ImageMagick convert program in the imagemagick2 image parser. Defaults to convert.
indent_paragraphs
Boolean which when set will cause paragraphs to have leading indentation, but no extra leading space. Defaults to off.
launchable_bit
See option --launchable.
max_tbmp_size
Integer, maximum size for an image in the windows image parser.
maxheight
See option --maxheight.
maxwidth
See option --maxwidth.
no_dithering_in_java_image_quantization
Boolean, used in the Java plucker-build image parser to turn off dithering when an image is being quantized to the fixed set of colors used in Palm grayscale or eight-bit colormaps. Defaults to false.
no_urlinfo
See option --no-urlinfo.
owner_id_build
See option --owner-id.
palm1bit_graymap_file
String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
palm2bit_graymap_file
String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
palm4bit_graymap_file
String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
palm8bit_stdcolormap_file
String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
palmtopnm_program
String, used by the netpbm2 image parser, giving the location of the palmtopnm program. Defaults to palmtopnm.
pgmtopbm_program
String, used by the netpbm2 image parser, giving the location of the pgmtopbm program. Defaults to pgmtopbm.
pluckerdir
See option --pluckerdir.
pngtopnm_program
String, used by the netpbm2 image parser, giving the location of the pngtopnm program. Defaults to pngtopnm.
pnmcut_program
String, used by the netpbm2 image parser, giving the location of the pnmcut program. Defaults to pnmcut.
pnmdepth_program
String, used by the netpbm2 image parser, giving the location of the pnmdepth program. Defaults to pnmdepth.
pnmfile_program
String, used by the netpbm2 image parser, giving the location of the pnmfile program. Defaults to pnmfile.
pnmscale_program
String, used by the netpbm2 image parser, giving the location of the pnmscale program. Defaults to pnmscale.
ppmquant_program
String, used by the netpbm2 image parser, giving the location of the pnmquant program. Defaults to pnmquant.
ppmtoTbmp_program
String, used by various image parsers, giving the location of either the ppmtoTbmp program (in various deprecated image parsers), or in netpbm2, the pnmtopalm program. In netpbm2, defaults to pnmtopalm.
ppmtopgm_program
String, used by the netpbm2 image parser, giving the location of the ppmtopgm program. Defaults to ppmtopgm.
referrer
See option --referrer.
retrieval_timeout
Integer, used to attempt to set a timeout in seconds on all retrievals. Will not affect timeouts on the Java version of plucker-build.
small_icon
Filename of file containing a Palm icon to use as the small icon for the document, if the launchable bit is set. Defaults to a built-in icon.
status_file
See option --status-file.
status_line_length
Integer, specifying, in characters, the length of status lines output by the distiller. Defaults to 60. If a line is too long, some of the characters in the center are elided.
tbmp_compression
Boolean, used by the windows image parser to indicate whether or not to use Palm compression on images. Defaults to true.
tbmp_compression_type
Apparently also boolean, used by the windows image parser to indicate whether or not to use Palm compression on images. Defaults to true. The difference between this parameter and tbmp_compression is not known.
title_md
See option --title.
try_reduce_bpp
Boolean, controls whether the image parser will attempt to scale a large picture to fit by reducing the number of bits-per-pixel of the image. Only valid for the netpbm2, imagemagick2, pil2, java, and windows image parsers. Defaults to off. try_reduce_bpp has precedence over try_reduce_dimension or auto_scale_images.
try_reduce_dimension
Boolean, controls whether the image parser will attempt to scale a large picture to fit by reducing the size of the image. Only valid for the netpbm2, imagemagick2, pil2, java, and windows image parsers.
use_cache
See option --update-cache. Misleadingly named.
user_agent
See option --user-agent.
verbosity
See option --verbosity.
zlib_compression
Specifies that zlib compression should be used. Deprecated in favor of compression.

BUGS

Report bugs using the Debian BTS and the reportbug tool, or directly upstream to http://bugs.plkr.org/ or <[email protected]>.

AUTHORS

Holger Duerer, <[email protected]>, and Bill Janssen, <[email protected]>