SYNOPSIS
plucker-build [--alt-maxheight=pixel-height] [--alt-maxwidth=pixel-width] [--author=string] [--backup] [--beamable] [--bpp=image-depth] [--category=default-category-name] [--charset=charset-indicator] [--compression=compression-type] [--depth-first] [--doc-file=name-prefix] [--doc-name=document-name] [--doc-compression] [--exclusion-list=filename] [--extra-section=section-name] [--help] [--home-url=base-URL] [--icon=image-filename] [--launchable] [--maxdepth=depth] [--maxheight=pixel-height] [--maxwidth=pixel-width] [--no-backup] [--noimages] [--not-beamable] [--not-launchable] [--no-urlinfo] [--owner-id=name] [--pluckerdir=output-directory] [--pluckerhome=plucker-home-directory] [--quiet] [--referrer=string] [--status-file=filename] [--staybelow=url-prefix] [--stayondomain] [--stayonhost] [--title=string] [--update-cache] [--url-pattern=pattern] [--user-agent=string] [--verbosity=verbosity-level] [--zlib-compression] [HOME-URL]
DESCRIPTION
plucker-build creates a Plucker binary document, which is a kind of e-book, from a URL. This document is formatted for the Plucker viewer program, which currently runs on Palm devices. The normal mode of operation is to take a home URL and 'pluck' it to produce a Plucker document, either to stdout, or to a file if --doc-file is specified. Alternatively, specifying the option --update-cache will update a cache of Plucker records (though it's not clear what this is good for). The Plucker document format is specified at http://www.plkr.org/index.pl/cvs/docs/DBFormat.html?rev=HEAD.
OPTIONS
Many options are also available as parameters in the configuration file $HOME/.pluckerrc, or in the default configuration file. Where applicable, the name of the configuration file parameter is shown after the documentation on the option. An option given on the command line will override any configuration file parameter. For more on configuration files, see below.
- --alt-maxheight=pixel-height
- Specifies the maximum height, in pixels, of the alternate rendition of an image. (When inline images are too large to be included full-size, they are converted into smaller versions, with sizes governed by the MAXHEIGHT and MAXWIDTH parameters, and are linked to larger renditions of the images, called the alternate rendition.) [alt_maxheight]
- --alt-maxwidth=pixel-width
- Specifies the maximum width, in pixels, of the alternate rendition of an image. (When inline images are too large to be included full-size, they are converted into smaller versions, with sizes governed by the MAXHEIGHT and MAXWIDTH parameters, and are linked to larger renditions of the images, called the alternate rendition.) [alt_maxwidth]
- --author=string
- Sets the author of the document to string, which is assumed to be in the charset of the document (see --charset), or ASCII if no charset is specified. [author_md]
- --backup
- Sets the bit in the output file that causes the document to be backed up on Palm HotSync. By default, the document is backed up. [backup_bit]
- --beamable
- Sets the bit in the output file that allows the document to be beamed. By default, the document is beamable. [copyprevention_bit]
- --bpp=image-depth
- Specifies the number of bits-per-pixel to be used for images. Valid values as of Plucker 1.1 are 0, 1 (the default), 2, 4, or 8. If 0 is specified, no images will be included in the document. See also --noimages. [bpp]
- --category=default-category-name
- Specifies a default Plucker category or categories to include in the document. If more than one category is specified, the category names should be separated by semicolons. [category]
- --charset=charset-indicator
- Specifies the default character set encoding used in the text of the documents being plucked. charset-indicator is either a charset name (from a small list; see src/parser/python/PyPlucker/__init__.py.in for a list of valid names), or a decimal integer indicating the charset's MIBenum value, as shown in the table at http://www.iana.org/assignments/character-sets. [default_charset]
- --compression=compression-type
- Specifies the type of compression to use in the document. There are two possible values for compression-type: doc or zlib. The default is doc, which is the same compression system used in Palm DOC-format documents. zlib compression usually results in smaller documents. See also --zlib-compression and --doc-compression. [compression]
- --depth-first
- Specifies a depth-first traversal of the web graph, rather than the default breadth-first traversal. This often works better on bushy acyclic graph structures than the breadth-first traversal. [depth_first]
- --doc-file=name-prefix (or -f name-prefix)
- Specifies the name of the document output file, without the directory (specified with --pluckerdir) or extension (always .pdb). If not specified, and if stdout is not a tty, the document will be written to stdout. [doc_file]
- --doc-name=document-name (or -N document-name)
- Specifies the short name by which the document will be identified in the viewer. Defaults to the value of --doc-file. If --doc-file is not specified, the document name defaults to the home URL. This name should be limited to 26 characters. [doc_name]
- --doc-compression
- Specifies that Doc compression, the compression scheme developed for the Palm DOC format, should be used for the parts of this document. This is the default. See also --zlib-compression and --compression.
- --exclusion-list=filename (or -E filename)
- Adds a file to the exclusion list, a list of files containing information on URLs to exclude from the document. See the User's Guide for more information on exclusion lists. [exclusion_lists]
- --extra-section=section-name (or -s section-name)
- Adds a section to the list of sections searched in the configuration files. A section is a named set of configuration information. By default, the DEFAULT section will be searched, then any operating-system-specific sections, then any sections specified on the command line.
- --help (or -h)
- Outputs help on command-line parameters.
- --home-url=base-URL (or -H base-URL)
- Specifies the URL from which the document is to be constructed. This may also be specified as a single argument on the command line. If a home URL is not specified, it will default to file:/$HOME/.plucker/home.html. This default may be changed in your .pluckerrc file. Note that this value must be a valid absolute URL. A special URL scheme is supported, plucker:. This specifies files on the Plucker search path, which consists of PluckerDir (the Plucker current working directory) followed by PluckerHome (the Plucker home directory). [home_url]
- --icon=image-filename
- If the output file is launchable, this switch can be used to specify the large icon shown in the launcher for the document. If not specified, a default icon is used. If the output file is not launchable, this switch has no effect. See also --launchable. [big_icon]
- --launchable
- Specifies that the output document should be shown as an icon in the system launcher. Clicking on the icon will start Plucker and select this document. By default, documents are not launchable. [launchable_bit]
- --maxdepth=depth (or -M depth)
- This specifies the number of levels of links the parser will traverse when converting the input. It is best to keep this value small, or the size of your document can get very large. If you want just a page, but none of the pages pointed to by that page, use a value of 1. [home_maxdepth]
- --maxheight=pixel-height
- Specifies the maximum height, in pixels, for an inline image. Overrides the MAXHEIGHT parameter in the configuration file, but is in turn overridden by any height specification in the image link itself. [maxheight]
- --maxwidth=pixel-width
- Specifies the maximum width, in pixels, for an inline image. Overrides the MAXWIDTH parameter in the configuration file, but is in turn overridden by any width specification in the image link itself. [maxwidth]
- --no-backup
- Clears the bit in the output file that causes the document to be backed up on Palm HotSync. By default, the document is backed up. [backup_bit]
- --noimages
- Specifies that no images will be included. Identical to --bpp=0. See also --bpp.
- --not-beamable
- Sets the bit in the output file that prevents the document from being beamed. By default, the document is beamable. [copyprevention_bit]
- --not-launchable
- Specifies that the output document should not be shown as an icon in the system launcher. By default, documents are not launchable. [launchable_bit]
- --no-urlinfo
- Specifies that no URL information will be included in the document. When links are included in documents, the information about the actual URL is included by default. This is often handy for external references (links to documents not included in the document). Use of this option may result in a slightly smaller document. [no_urlinfo]
- --owner-id=name
- Specifies an owner-id for the document. This causes the document to be lightly encrypted in such a way that it will only open on a device with a matching owner-id. With the PalmOS viewer, the HotSync UserName is used as the owner-id. [owner_id_build]
- --pluckerhome=plucker-home-directory (or -P plucker-home-directory)
- Overrides the default value for PluckerHome, which is $HOME/.plucker/. Can also be specified by setting the environment variable PLUCKERHOME. An explicit value for --pluckerhome overrides any setting of PLUCKERHOME. [PLUCKERHOME]
- --pluckerdir=output-directory (or -p output-directory)
- Overrides the default value for PluckerDir, which defaults to PluckerHome (see --pluckerhome). PluckerDir is the default directory to which output documents will be written, and which will be searched for input files if the plucker: URL scheme is used. [pluckerdir]
- --quiet (or -q)
- Same as --verbosity=0.
- --referrer=string
- When using HTTP to gather input, send string as the value of the Referer HTTP header (the header name is spelled with a single "r", unlike this option). Default is to send no Referer header. [referrer]
- --status-file=filename
- Names a file used for progress reporting. At startup the file is read to obtain an estimate of the total number of pages to be processed; during the run it is continually rewritten with a single line giving the number of pages collected so far, the number of links still to process, and the estimated total number of pages that will be gathered (or zero if this is not known). The three values are written as space-separated ASCII numbers. Because the status line is overwritten as the pluck progresses, the file always contains only a single line. [status_file]
- --staybelow=url-prefix
- Automatically excludes all URLs that do not start with url-prefix. A handy way to process a subtree. [home_staybelow]
- --stayondomain
- Specifies that no web hosts other than those in the same domain as the original base URL will be visited for parts of the document. [home_stayondomain]
- --stayonhost
- Specifies that no web hosts other than that named in the original base URL will be visited for parts of the document. [home_stayonhost]
- --title=string
- Sets the title of the document to string. This is different from the name of the document (see --doc-name) in that it may be relatively long. The string is assumed to be in the charset of the document (see --charset), or ASCII if no charset is specified. [title_md]
- --update-cache (or -c)
- Update the Plucker cache of records, rather than build a document. [use_cache]
- --url-pattern=pattern
- Automatically excludes all URLs that do not match the regular expression pattern. The regular expression language used is that of the Python 're' module, as specified in http://www.python.org/doc/current/lib/re-syntax.html. [home_url_pattern]
- --user-agent=string
- When using HTTP to gather input, send string as the value of the User-Agent HTTP header. Default is to send "Plucker/Py-XX", where XX is the Plucker version. [user_agent]
- --verbosity=verbosity-level (or -V verbosity-level)
- Sets the level of status information output to the value specified by verbosity-level. Appropriate values are 0, for total silence, 1, for standard progress status (the default value), and 2, for lots of output about gathering and parsing the input (usually reserved for debugging). Values larger than 2 will also work, but tend to give profuse output that's only useful to developers. See also --quiet. [verbosity]
- --zlib-compression
- Specifies that Zlib compression should be used for the parts of this document. This is considerably more efficient than the default compression format, Doc compression. See also --doc-compression and --compression.
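The single-line format described under --status-file is simple enough to read from a monitoring script. The following is a minimal Python sketch and not part of Plucker: the names PluckStatus and parse_status_line are hypothetical, and it assumes the three "ASCII numbers" are non-negative integers.

```python
from typing import NamedTuple, Optional


class PluckStatus(NamedTuple):
    collected: int                  # pages collected so far
    to_do: int                      # links still to process
    estimated_total: Optional[int]  # None when the file reports 0 (unknown)


def parse_status_line(line: str) -> PluckStatus:
    """Parse the three space-separated numbers of a status line.

    Assumption: all three values are integers, as the page count
    semantics suggest; the man page only says "ASCII numbers".
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected three numbers, got %r" % line)
    collected, to_do, total = (int(field) for field in fields)
    # A total of zero means "not known", so map it to None.
    return PluckStatus(collected, to_do, total or None)


if __name__ == "__main__":
    print(parse_status_line("12 30 100\n"))
```

Since the file is overwritten while the pluck is running, a reader may occasionally see an empty or partial line; a real script would probably want to catch ValueError and retry.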
EXAMPLES
To build a pocket version of the weekly cafeteria menu at the foo.com cafeteria, available on the Web at http://www.foo.com/cafe/weeklymenu.html, without following any links, without including any images, naming the document "Cafeteria Menu", and putting the document in a file named /tmp/Menu.pdb, one would say:
% plucker-build --maxdepth=1 --noimages --doc-name="Cafeteria Menu" http://www.foo.com/cafe/weeklymenu.html >/tmp/Menu.pdb
Or alternatively,
% plucker-build --pluckerdir=/tmp \
--doc-name="Cafeteria Menu" \
--doc-file=Menu \
--home-url="http://www.foo.com/cafe/weeklymenu.html" \
--maxdepth=1 \
--bpp=0
Pluckerdir is '/tmp'...
---- 0 collected, 1 to do ----
Processing http://www.foo.com/cafe/weeklymenu.html...
Retrieved ok.
Parsed ok.
---- all pages retrieved and parsed ----
Writing out collected data...
Writing document 'Cafeteria Menu' to file /tmp/Menu.pdb
Converting http://www.foo.com/cafe/weeklymenu.html...
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= http://www.foo.com/cafe/weeklymenu.html
Wrote 3 <= plucker:/~special~/pluckerlinks
Wrote 5 <= plucker:/~special~/metadata
Wrote 11 <= plucker:/~special~/links1
Done!
% ls -l /tmp/Menu.pdb
-rw-rw-r-- 1 user somegroup 2646 Nov 2 21:19 /tmp/Menu.pdb
%
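When plucker-build is driven from a script rather than typed at a prompt, the long-option invocation above maps directly onto an argument list. The sketch below is illustrative only: the helper name pluck_argv and its parameters are hypothetical, it uses only options documented on this page, and it assumes plucker-build is found on the PATH when the list is eventually executed.

```python
import shlex


def pluck_argv(home_url, doc_file, doc_name, pluckerdir, maxdepth=1, images=False):
    """Build an argument list equivalent to the long-option example."""
    argv = [
        "plucker-build",
        "--pluckerdir=" + pluckerdir,
        "--doc-name=" + doc_name,
        "--doc-file=" + doc_file,
        "--home-url=" + home_url,
        "--maxdepth=%d" % maxdepth,
    ]
    if not images:
        # --bpp=0 suppresses images (documented as identical to --noimages).
        argv.append("--bpp=0")
    return argv


if __name__ == "__main__":
    argv = pluck_argv(
        "http://www.foo.com/cafe/weeklymenu.html", "Menu", "Cafeteria Menu", "/tmp"
    )
    # Show the command as it would be typed in a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The list can be handed to subprocess.run(argv, check=True) as it stands; passing a list instead of a single string avoids any shell quoting problems with a document name that contains spaces.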
ENVIRONMENT VARIABLES
- HOME
- Used to determine the location of the user's configuration file. If not set, the system-wide configuration file is used.
- HTTP_PROXY, HTTP_PROXY_USER, HTTP_PROXY_PASS
- If set, will be used to retrieve URLs with the http URL scheme.
- PLUCKERHOME
- Specifies value for PluckerHome. See the option --pluckerhome for more details.
- PLUCKERDIR
- Specifies value for PluckerDir. See the option --pluckerdir for more details.
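The interaction between these variables and the --pluckerhome and --pluckerdir options can be summarized in a few lines. This is a sketch of the documented precedence, not Plucker's actual code: the function name resolve_plucker_paths is hypothetical, configuration-file settings are ignored, the precedence of --pluckerdir over PLUCKERDIR is assumed by analogy with --pluckerhome, and the behaviour when HOME is unset is not documented (the sketch falls back to a relative path).

```python
import os


def resolve_plucker_paths(cli_home=None, cli_dir=None, environ=None):
    """Return (PluckerHome, PluckerDir) from options and environment."""
    env = os.environ if environ is None else environ

    # --pluckerhome overrides PLUCKERHOME, which overrides $HOME/.plucker.
    home = cli_home or env.get("PLUCKERHOME")
    if home is None:
        # Undocumented case: HOME unset. This sketch uses a relative path.
        home = os.path.join(env.get("HOME", ""), ".plucker")

    # PluckerDir defaults to PluckerHome when nothing else is given.
    # Assumption: --pluckerdir wins over the PLUCKERDIR variable.
    plucker_dir = cli_dir or env.get("PLUCKERDIR") or home
    return home, plucker_dir


if __name__ == "__main__":
    print(resolve_plucker_paths(cli_dir="/tmp"))
```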
CONFIGURATION FILES
Two configuration files are examined for customized settings of the various plucker-build parameters. The first is a system-wide configuration file, by default /usr/local/etc/pluckerrc, or /etc/pluckerrc on Debian systems. Any settings in this may be overridden with a personal configuration file, $HOME/.pluckerrc. Both files contain any number of sections, each of which may contain any number of configuration parameter settings. Each section has a name, which is enclosed in square brackets, followed by parameter settings. Normally, only the section called "default" will be examined. Extra sections may be specified with the --extra-section option to plucker-build; settings in these sections will override values in the default section.
Parameter settings have the form name = value, where name is the name of a plucker-build parameter, and value is a string, integer, floating-point, or boolean value. A colon character (:) may be used instead of the equals sign to separate name and value. Comments may be expressed by starting any line with the characters "rem", or with the character "#", or with the character ";". Boolean values of True may be expressed with "TRUE", "true", "True", "on", or "1". Boolean values of False may be expressed with "FALSE", "false", "False", "off", or "0".
Sections are useful for frequently used groups of options. Such a group can be defined once in a section of the configuration file and then selected by giving the section name to the --extra-section option of plucker-build; the other options are all drawn from the section.
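The way an extra section overrides the default section can be illustrated with Python's standard configparser module, which reads a very similar syntax. This is only a sketch: it is not Plucker's own parser, the section name "menu" and the function name effective_settings are hypothetical, and configparser does not recognize "rem" comment lines, so the sample uses only "#" and ";" comments.

```python
import configparser

SAMPLE = """\
[default]
# settings shared by every run
bpp = 1
home_maxdepth: 2
backup_bit = on

[menu]
; selected with --extra-section=menu
home_url = http://www.foo.com/cafe/weeklymenu.html
home_maxdepth = 1
bpp = 0
"""


def effective_settings(text, extra_sections=()):
    """Merge the default section with extra sections, later ones winning."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    merged = dict(parser["default"]) if parser.has_section("default") else {}
    for name in extra_sections:
        # Values in an extra section override those in "default".
        merged.update(parser[name])
    return merged


if __name__ == "__main__":
    for key, value in sorted(effective_settings(SAMPLE, ["menu"]).items()):
        print(key, "=", value)
```

With the sample above, selecting the "menu" section replaces bpp and home_maxdepth and adds home_url, while backup_bit is still inherited from the default section. All values come back as strings; configparser's getboolean accepts the "on"/"off", "true"/"false", and "1"/"0" spellings listed above.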
The following parameters are understood:
- PLUCKERHOME
- See option --pluckerhome.
- alt_maxheight
- See option --alt-maxheight.
- alt_maxwidth
- See option --alt-maxwidth.
- anchor_color
- A color to draw all links in, expressed as one of the 16 standard Web color names, or in the Web standard RGB color notation. See the HTML 4.01 specification for more details on allowed color names and RGB notation.
- author_md
- See option --author.
- auto_scale_images
- A boolean; if true, plucker-build will automatically attempt to convert images which are too large to include in the document, to a smaller form which will fit in the document. Defaults to false.
- backup_bit
- See option --backup.
- big_icon
- See option --icon.
- bmp_to_tbmp
- Name of the bmp2tbmp program in Windows. Defaults to Bmp2Tbmp.exe.
- bmp_to_tbmp_parameter
- Parameter for the bmp2tbmp program in the Windows ImageMagick image parser.
- bpp
- See option --bpp.
- cache_dir_name
- Specify the subdirectory of PluckerDir to use for cache storage. The default is "cache".
- category
- See option --category.
- color_paragraphs
- Boolean; if set, inserts a specific foreground color at the beginning of every paragraph. Shouldn't be necessary, and defaults to off.
- compression
- See option --compression.
- convert_program
- If using the deprecated imagemagick image parser, the name of the convert program. Defaults to convert (convert.exe for Windows).
- convert_program_parameter
- Parameter for the Windows ImageMagick image parser's use of convert.
- copyprevention_bit
- See option --beamable.
- db_file
- Deprecated alternative to doc_file. May disappear in any release.
- db_name
- Deprecated alternative to doc_name. May disappear in any release.
- default_charset
- See option --charset.
- depth_first
- See option --depth-first.
- djpeg_program
- Name of the djpeg program. Defaults to djpeg. Used by the netpbm2 image parser.
- doc_file
- See option --doc-file.
- doc_name
- See option --doc-name.
- exclusion_lists
- See option --exclusion-list. If multiple files are specified here, they should be separated by the appropriate separator character for your operating system (a colon on Unix platforms, a semicolon on Windows platforms).
- filename_extension
- Extension to use for the filename. Defaults to pdb. Another possibility is plkr.
- giftopnm_program
- Name of program used to convert GIF image files to PNM image files. Used by the netpbm and netpbm2 image parsers. Defaults to giftopnm.
- guess_tbmp_size
- Boolean, defaults to on. Used by the Windows image parser.
- home_maxdepth
- See option --maxdepth.
- home_staybelow
- See option --staybelow.
- home_stayondomain
- See option --stayondomain.
- home_stayonhost
- See option --stayonhost.
- home_url
- See option --home-url.
- home_url_pattern
- See option --url-pattern.
- http_proxy
- String giving any HTTP proxy server to use. Sets the environment variable HTTP_PROXY to this value.
- http_proxy_pass
- String giving a password for any HTTP proxy. Sets the environment variable HTTP_PROXY_PASS to this value.
- http_proxy_user
- String giving a username for any HTTP proxy. Sets the environment variable HTTP_PROXY_USER to this value.
- image_compression_limit
- Integer giving the minimum number of image bytes to compress. Defaults to 0. Images smaller than this will not be compressed.
- image_parser
- String specifying which image parser to use. If not specified, a working default will be used. It's suggested that you not specify this configuration parameter unless you know what you are doing. Acceptable values are netpbm2, pil2, imagemagick2, netpbm (deprecated), pil (deprecated), imagemagick (deprecated), windowspil, windows (deprecated). This value is ignored in the Java version of plucker-build.
- imagemagick_convert_command
- Identifies the ImageMagick convert program in the imagemagick2 image parser. Defaults to convert.
- indent_paragraphs
- Boolean which when set will cause paragraphs to have leading indentation, but no extra leading space. Defaults to off.
- launchable_bit
- See option --launchable.
- max_tbmp_size
- Integer, maximum size for an image in the windows image parser.
- maxheight
- See option --maxheight.
- maxwidth
- See option --maxwidth.
- no_dithering_in_java_image_quantization
- Boolean, used in the Java plucker-build image parser to turn off dithering when an image is being quantized to the fixed set of colors used in Palm grayscale or eight-bit colormaps. Defaults to false.
- no_urlinfo
- See option --no-urlinfo.
- owner_id_build
- See option --owner-id.
- palm1bit_graymap_file
- String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
- palm2bit_graymap_file
- String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
- palm4bit_graymap_file
- String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
- palm8bit_stdcolormap_file
- String, used by the netpbm2 and netpbm image parsers to get the location of the Palm colormap file.
- palmtopnm_program
- String, used by the netpbm2 image parser, giving the location of the palmtopnm program. Defaults to palmtopnm.
- pgmtopbm_program
- String, used by the netpbm2 image parser, giving the location of the pgmtopbm program. Defaults to pgmtopbm.
- pluckerdir
- See option --pluckerdir.
- pngtopnm_program
- String, used by the netpbm2 image parser, giving the location of the pngtopnm program. Defaults to pngtopnm.
- pnmcut_program
- String, used by the netpbm2 image parser, giving the location of the pnmcut program. Defaults to pnmcut.
- pnmdepth_program
- String, used by the netpbm2 image parser, giving the location of the pnmdepth program. Defaults to pnmdepth.
- pnmfile_program
- String, used by the netpbm2 image parser, giving the location of the pnmfile program. Defaults to pnmfile.
- pnmscale_program
- String, used by the netpbm2 image parser, giving the location of the pnmscale program. Defaults to pnmscale.
- ppmquant_program
- String, used by the netpbm2 image parser, giving the location of the pnmquant program. Defaults to pnmquant.
- ppmtoTbmp_program
- String, used by various image parsers, giving the location of either the ppmtoTbmp program (in various deprecated image parsers), or in netpbm2, the pnmtopalm program. In netpbm2, defaults to pnmtopalm.
- ppmtopgm_program
- String, used by the netpbm2 image parser, giving the location of the ppmtopgm program. Defaults to ppmtopgm.
- referrer
- See option --referrer.
- retrieval_timeout
- Integer, used to attempt to set a timeout, in seconds, on all retrievals. Does not affect timeouts in the Java version of plucker-build.
- small_icon
- Filename of file containing a Palm icon to use as the small icon for the document, if the launchable bit is set. Defaults to a built-in icon.
- status_file
- See option --status-file.
- status_line_length
- Integer, specifying, in characters, the length of status lines output by the distiller. Defaults to 60. If a line is too long, some of the characters in the center are elided.
- tbmp_compression
- Boolean, used by the windows image parser to indicate whether or not to use Palm compression on images. Defaults to true.
- tbmp_compression_type
- Apparently also boolean, used by the windows image parser to indicate whether or not to use Palm compression on images. Defaults to true. The difference between this parameter and tbmp_compression is not known.
- title_md
- See option --title.
- try_reduce_bpp
- Boolean, controls whether the image parser will attempt to scale a large picture to fit by reducing the number of bits-per-pixel of the image. Only valid for the netpbm2, imagemagick2, pil2, java, and windows image parsers. Defaults to off. try_reduce_bpp takes precedence over try_reduce_dimension and auto_scale_images.
- try_reduce_dimension
- Boolean, controls whether the image parser will attempt to scale a large picture to fit by reducing the size of the image. Only valid for the netpbm2, imagemagick2, pil2, java, and windows image parsers.
- use_cache
- See option --update-cache. Misleadingly named.
- user_agent
- See option --user-agent.
- verbosity
- See option --verbosity.
- zlib_compression
- Specifies that zlib compression should be used. Deprecated in favor of compression.
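Among the parameters above, exclusion_lists takes several filenames joined by the platform's path-list separator (a colon on Unix, a semicolon on Windows). Python exposes that separator as os.pathsep, so a script that generates or inspects this value might split it as sketched below. The function name split_exclusion_lists is hypothetical, and dropping empty entries is a choice made for this sketch, not documented Plucker behaviour.

```python
import os


def split_exclusion_lists(value, sep=os.pathsep):
    """Split an exclusion_lists value into individual filenames.

    sep defaults to the separator of the platform running the script:
    ":" on Unix, ";" on Windows.
    """
    # Empty entries (for example from a trailing separator) are dropped.
    return [part for part in value.split(sep) if part]


if __name__ == "__main__":
    print(split_exclusion_lists("/etc/plucker/exclude.txt:/home/u/exclude.txt", sep=":"))
```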