duck(1) the Debian Url ChecKer

SYNOPSIS

duck [ OPTION ]... [-f file] [-u file] [-c file] [URL]

DESCRIPTION

duck extracts links, email address domains and VCS-* entries from the following files:

  • debian/control
  • debian/upstream, debian/upstream-metadata.yaml and debian/upstream/metadata
  • debian/copyright
  • DEP-3 patch files in every directory where a series file is found
  • systemd.unit files (*.socket, *.device, *.mount, *.automount, *.swap, *.target, *.path, *.timer, *.snapshot, *.slice, *.scope)
  • Appstream files (*.appdata)

If a URL is supplied, duck uses dget to download the specified URL and processes the downloaded source package instead of working on the current directory.

duck tries to access the VCS-* entries and URLs using the appropriate tool to find out whether they are working or broken. If errors are detected, the filename, field name and URL/email of the broken entry are displayed.

duck searches for the default files (see above) and silently skips any that cannot be found. If specific filenames are given with the options -c, -f or -u and one of those files cannot be found, duck exits with exit code 2.

Email address domains are checked for existing MX records, A records, or AAAA records, in this order. If none of these three is found for a given domain, the domain is considered broken.
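
The lookup order described above can be illustrated with a minimal Python sketch. This is not duck's actual implementation: the lookup callable and the fake_zone data are hypothetical stand-ins for a real DNS resolver, chosen so that the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple

# Record types in the order the text describes.
RECORD_ORDER = ("MX", "A", "AAAA")

# Hypothetical resolver interface: (domain, record type) -> list of records.
Lookup = Callable[[str, str], List[str]]


def email_domain_ok(domain: str, lookup: Lookup) -> bool:
    """Return True as soon as one record type yields a result."""
    for record_type in RECORD_ORDER:
        if lookup(domain, record_type):
            return True
    # None of MX, A or AAAA exists: the domain counts as broken.
    return False


# Made-up zone data standing in for real DNS answers (assumption).
fake_zone: Dict[Tuple[str, str], List[str]] = {
    ("example.org", "MX"): ["mail.example.org"],
    ("example.net", "AAAA"): ["2001:db8::1"],
}


def fake_lookup(domain: str, record_type: str) -> List[str]:
    return fake_zone.get((domain, record_type), [])


for name in ("example.org", "example.net", "example.invalid"):
    print(name, "ok" if email_domain_ok(name, fake_lookup) else "broken")
```

A domain with only an AAAA record still passes, because the later record types are consulted whenever the earlier ones return nothing.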

Check results are displayed with three different error levels:

O:
(OK) Indicates that the given check did not result in an error. Only shown if -n is used.
I:
(Information) Indicates informational warnings, such as missing helper tools, as well as failing checks based on searches in unstructured text files, which sometimes lead to false positives.
E:
(Error) Indicates failing checks based on data from well-defined fields (e.g. Homepage: entry in debian/control).

and three different certainty levels:

certain
Data taken from well-defined fields. As the format of such a field is specified (e.g. by Debian Policy), it can be checked by the appropriate tools. If this check then fails, the data in the field is certainly erroneous.
possible
Data extracted using regular expressions (e.g. email addresses, URLs). This might lead to false positives, so the check result is possibly a false positive.
wild-guess
Data extracted from websites using regular expressions. This is still experimental and probably buggy, hence the name "wild-guess".

OPTIONS

-v
verbose mode. This shows all URLs found and the checks run.
-q
quiet mode. Suppress all output.
-n
dry run. Don't run any checks, just show entries to be checked.
--modules-dir=DIRECTORY
specify modules directory. Mostly useful for developing new checks. If this parameter is specified, only modules defined in this directory are used. You have to copy all *.pm files from /usr/share/duck/lib/checks to the directory specified.
--color=[WHEN]
Specify when to emit escape sequences to the output. Available options are:
auto Emit color escape codes on STDOUT/STDERR, no color if output is piped to a file.

always Always emit color escape codes.

never Never emit color escape codes.

--no-https
do not try to find matching https URLs for http URLs. See also the DUCK_NOHTTPS environment variable.
--missing-helpers
display a list of missing external helper tools and exit.
--version
display copyright and version information.
-f
specify path to control file. This overrides the default debian/control.
-F
skip processing of the control file.
-u
specify path to upstream metadata file. This overrides the default files debian/upstream, debian/upstream-metadata.yaml and debian/upstream/metadata.
-U
skip processing of the upstream metadata file.
-c
specify path to copyright file. This overrides the default debian/copyright.
-C
skip processing of copyright file.
-P
skip processing of patch files.
-A
skip processing of appstream metadata files.
-S
skip processing of systemd.unit files.
--disable-urlfix=<fix1,...>
disables the specified url fix(es). A urlfix tries to remove leading/trailing characters from extracted URLs, like trailing dots or parentheses. Using this parameter enables all urlfixes except the specified ones.

--enable-urlfix=<fix1,...>
enables the specified url fix(es). Using this parameter disables all urlfixes except the specified ones.

The following urlfixes are available:

TRAILING_COLON Removes trailing colon ":" character.

TRAILING_PAREN_DOT Removes the string ")." from the end of the URL.

TRAILING_PUNCTUATION Removes trailing "." and "," characters.

TRAILING_QUOTES Removes trailing single quote "'" characters. Note: Double quotes (") are already handled correctly by the Perl regex matchers that duck uses.

TRAILING_SLASH_DOT Removes the string "/." (without the quotes) from the end of the URL.

TRAILING_SLASH_PAREN Removes the string "/)" (without the quotes) from the end of the URL.

--tasks=[number]
Specify the number of checks allowed to run in parallel. The default value is 24. The value must be greater than 0.

All urlfixes are enabled by default.
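
The interaction of --enable-urlfix and --disable-urlfix with the urlfixes listed above can be sketched in Python as follows. This is a hypothetical re-implementation for illustration only: duck itself is written in Perl, and the order in which the fixes are applied here (two-character suffixes first) is an assumption, not documented behaviour. The function names apply_urlfixes and active_fixes are invented for this sketch.

```python
# Hypothetical re-implementation of the documented urlfixes.
# Assumption: two-character suffixes are handled before single characters.
URLFIXES = {
    "TRAILING_PAREN_DOT": lambda u: u[:-2] if u.endswith(").") else u,
    "TRAILING_SLASH_DOT": lambda u: u[:-2] if u.endswith("/.") else u,
    "TRAILING_SLASH_PAREN": lambda u: u[:-2] if u.endswith("/)") else u,
    "TRAILING_COLON": lambda u: u[:-1] if u.endswith(":") else u,
    "TRAILING_QUOTES": lambda u: u.rstrip("'"),
    "TRAILING_PUNCTUATION": lambda u: u.rstrip(".,"),
}


def active_fixes(enable=None, disable=None):
    """Return the names of the fixes to run, in application order."""
    names = list(URLFIXES)
    if enable is not None:
        # --enable-urlfix: only the named fixes stay active.
        return [n for n in names if n in enable]
    if disable is not None:
        # --disable-urlfix: everything except the named fixes stays active.
        return [n for n in names if n not in disable]
    # Default: all urlfixes are enabled.
    return names


def apply_urlfixes(url, enable=None, disable=None):
    for name in active_fixes(enable, disable):
        url = URLFIXES[name](url)
    return url


print(apply_urlfixes("http://example.org/page)."))
print(apply_urlfixes("http://example.org.", disable={"TRAILING_PUNCTUATION"}))
print(apply_urlfixes("http://example.org/:", enable={"TRAILING_COLON"}))
```

With no arguments every fix runs, which mirrors the documented default; passing enable or disable narrows the set in the way the two options describe.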

ENVIRONMENT VARIABLES

DUCK_NOHTTPS
If this variable is set, do not try to find matching https URLs for http URLs.
XDG_CONFIG_HOME
If this variable is set, use the config file (if any) $XDG_CONFIG_HOME/duck/duck.conf. If it is not set, the default location is $HOME/.config/duck/duck.conf.
XDG_CONFIG_DIRS
defines the preference-ordered set of base directories to search for configuration files in addition to the XDG_CONFIG_HOME base directory. The directories in XDG_CONFIG_DIRS should be separated with a colon ':'.

EXAMPLE

To run duck, change your working directory to an extracted Debian source package and run: duck

EXIT STATUS

0
Success, no errors
1
Error(s) detected
2
User-specified file not found
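
For scripted use, the exit codes above can be mapped to messages. The following Python sketch assumes duck is installed and on the PATH; the helper names describe_exit and run_duck are invented for this example and are not part of duck.

```python
import subprocess

# Meanings taken from the EXIT STATUS list above.
EXIT_MEANINGS = {
    0: "success, no errors",
    1: "error(s) detected",
    2: "user-specified file not found",
}


def describe_exit(code: int) -> str:
    """Translate a duck exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_duck(source_dir: str) -> str:
    """Run duck inside source_dir and describe its exit status.

    Assumes a duck executable is available on the PATH.
    """
    result = subprocess.run(["duck"], cwd=source_dir)
    return describe_exit(result.returncode)
```

Calling run_duck(".") from within an extracted source package returns one of the three documented descriptions; any other code is reported as undocumented rather than guessed at.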

FILES

debian/duck-overrides
Overrides file in the Debian package source tree. This file contains a list of URL regular expressions which should not be reported as down/broken. This is useful when URLs are extracted from old or outdated copyright files or patches: such URLs will never work again and would otherwise lead to false positives. An example can be found in /usr/share/doc/duck/examples.
duck.conf
Config file which contains the regular expressions used to detect parked domains and redirected websites. The default file is /etc/duck/duck.conf. duck also honors the XDG Base Directory Specification; see the section ENVIRONMENT VARIABLES for details. The search order for duck.conf is:

$XDG_CONFIG_HOME/duck/duck.conf (default: $HOME/.config/duck/duck.conf)

/etc/duck/duck.conf

$XDG_CONFIG_DIRS/duck/duck.conf (default: /etc/xdg/duck/duck.conf)

Please see the XDG Base Directory Specification (https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) for more details.
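
The search order for duck.conf can be sketched in Python as below. This only lists the candidate paths in the documented order; whether duck stops at the first existing file or combines several is not stated here, so the sketch makes no claim about it. The function name duck_conf_candidates is invented for this example, and treating each colon-separated entry of XDG_CONFIG_DIRS as a base directory follows the XDG specification rather than duck's source.

```python
import posixpath


def duck_conf_candidates(env):
    """Return candidate duck.conf paths in the documented search order.

    env is a mapping of environment variables, e.g. os.environ.
    """
    home = env.get("HOME", "")
    # Unset or empty variables fall back to the XDG defaults.
    config_home = env.get("XDG_CONFIG_HOME") or posixpath.join(home, ".config")
    config_dirs = env.get("XDG_CONFIG_DIRS") or "/etc/xdg"

    paths = [
        posixpath.join(config_home, "duck", "duck.conf"),
        "/etc/duck/duck.conf",
    ]
    # XDG_CONFIG_DIRS is a colon-separated, preference-ordered list.
    for base in config_dirs.split(":"):
        if base:
            paths.append(posixpath.join(base, "duck", "duck.conf"))
    return paths


for path in duck_conf_candidates({"HOME": "/home/user"}):
    print(path)
```

Passing os.environ instead of the literal dictionary shows the candidates for the current user.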