SYNOPSIS
table_dispatcher.py [switches] config.ini
DESCRIPTION
table_dispatcher is a PgQ consumer that reads URL-encoded records from a source queue and writes them into partitioned tables according to a configuration file. It is used to partition data. For example, change logs that need to be kept online only for a short time can be written to daily tables and then dropped as they become irrelevant. It also allows selecting which columns are written into the target database, and it creates target tables as needed, according to the configuration file.
QUICK-START
Basic table_dispatcher setup and usage can be summarized by the following steps:
- 1. PgQ must be installed in the source database. See the pgqadm man page for details. The target database must have the pgq_ext schema installed.
- 2. edit a table_dispatcher configuration file, say table_dispatcher_sample.ini
- 3. create source queue
  $ pgqadm.py ticker.ini create <queue>
- 4. launch table dispatcher in daemon mode
  $ table_dispatcher.py table_dispatcher_sample.ini -d
- 5. start producing events
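How events are produced in step 5 depends on the application. As a minimal sketch, assuming a DB-API cursor connected to the source database (with a driver that uses %s placeholders, such as psycopg2) and the pgq.insert_event(queue, ev_type, ev_data) SQL function from PgQ, an insert event could be pushed by hand as shown here. The helper names encode_record and produce_insert_event are hypothetical, and the encoder is a simplified stand-in for the URL encoding the queue expects.

```python
from urllib.parse import quote_plus


def encode_record(record):
    """URL-encode a dict; a None value is written as a bare key (NULL)."""
    parts = []
    for key, value in record.items():
        if value is None:
            parts.append(quote_plus(key))
        else:
            parts.append(quote_plus(key) + "=" + quote_plus(str(value)))
    return "&".join(parts)


def produce_insert_event(cursor, queue, pkey_fields, record):
    """Insert one 'I' event into the queue through the given cursor."""
    ev_type = "I:" + ",".join(pkey_fields)
    ev_data = encode_record(record)
    # Assumption: the driver accepts %s-style parameters.
    cursor.execute("select pgq.insert_event(%s, %s, %s)",
                   (queue, ev_type, ev_data))
    return ev_type, ev_data
```

In practice events usually come from a pgq.logutriga() trigger on the source table rather than from hand-written calls like this one.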
CONFIG
Common configuration parameters
job_name
- Name for the particular job the script does. The script logs under this name to logdb/logserver. The name is also used as the default PgQ consumer name. It should be unique.
pidfile
- Location for the pid file. If not given, the script is not allowed to daemonize.
logfile
- Location for log file.
loop_delay
- For a continuously running process, how long to sleep after each work loop, in seconds. Default: 1.
connection_lifetime
- Close and reconnect older database connections.
log_count
- Number of log files to keep. Default: 3
log_size
- Max size for one log file. File is rotated if max size is reached. Default: 10485760 (10M)
use_skylog
- If set, search for [./skylog.ini, ~/.skylog.ini, /etc/skylog.ini]. If one is found, that file is used as the config file for Python's logging module. This allows a fully customizable logging setup.
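The log_size and log_count parameters describe size-based log rotation. A minimal sketch of equivalent behaviour, assuming Python's standard logging.handlers.RotatingFileHandler; whether the script builds its handler exactly this way is an assumption, and make_log_handler is a hypothetical helper:

```python
import logging
import logging.handlers


def make_log_handler(logfile, log_size=10485760, log_count=3):
    """Rotate the log once it reaches log_size bytes, keeping log_count old files."""
    return logging.handlers.RotatingFileHandler(
        logfile,
        maxBytes=log_size,
        backupCount=log_count,
        delay=True,  # do not open the file until the first record is written
    )


handler = make_log_handler("table_dispatcher.log")
logger = logging.getLogger("table_dispatcher_sketch")
logger.addHandler(handler)
```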
Common PgQ consumer parameters
pgq_queue_name
- Queue name to attach to. No default.
pgq_consumer_id
- Consumer ID to use when registering. Default: %(job_name)s
table_dispatcher parameters
src_db
- Source database.
dst_db
- Target database.
dest_table
- Where to put data. When partitioning, this is used as the base name.
part_field
- Date field which will be used for partitioning.
part_template
- SQL code used to create partition tables. Various magic replacements are done there:
_PKEY
- comma-separated list of primary key columns.
_PARENT
- schema-qualified parent table name.
_DEST_TABLE
- schema-qualified partition table.
_SCHEMA_TABLE
- same as _DEST_TABLE but with dots replaced by "_", to allow use in index names.
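The partitioning parameters can be illustrated with a short sketch. It assumes daily partitions named by appending the date from part_field to dest_table, and plain string replacement for the template placeholders. Both helper functions are hypothetical, and the real script may name partitions or quote identifiers differently.

```python
def partition_name(dest_table, part_value):
    """Build a daily partition name from the base table and a date value.

    Assumption: the first 10 characters of the value are YYYY-MM-DD.
    """
    date_part = str(part_value)[:10]
    return dest_table + "_" + date_part.replace("-", "_")


def expand_template(template, dest_table, parent, pkeys):
    """Substitute the magic placeholders in a part_template string."""
    sql = template.replace("_DEST_TABLE", dest_table)
    sql = sql.replace("_SCHEMA_TABLE", dest_table.replace(".", "_"))
    sql = sql.replace("_PARENT", parent)
    sql = sql.replace("_PKEY", ",".join(pkeys))
    return sql


template = ("create table _DEST_TABLE () inherits (_PARENT); "
            "create index _SCHEMA_TABLE_idx on _DEST_TABLE (_PKEY);")
part = partition_name("public.orders", "2024-01-15 10:30:00")
print(expand_template(template, part, "public.orders", ["id"]))
```

Replacing the dots for _SCHEMA_TABLE matters because an index name such as public.orders_2024_01_15_idx would otherwise be read as a schema-qualified name.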
Example config
[table_dispatcher]
job_name = table_dispatcher_source_table_targetdb

src_db = dbname=sourcedb
dst_db = dbname=targetdb

pgq_queue_name = sourceq

logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid

# where to put data. when partitioning, will be used as base name
dest_table = orders

# names of the fields that must be read from source records
fields = id, order_date, customer_name

# date field which will be used for partitioning
part_field = order_date

# template used for creating partition tables
part_template =
    create table _DEST_TABLE () inherits (orders);
    alter table only _DEST_TABLE add constraint _DEST_TABLE_pkey primary key (id);
    grant select on _DEST_TABLE to group reporting;
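The %(job_name)s references in the example are expanded from the job_name value in the same section. A minimal sketch of how such a file reads, assuming Python's standard configparser with its default interpolation; the script itself may use its own config wrapper, and load_settings is a hypothetical helper:

```python
import configparser

# A reduced copy of the example config, kept inline so the sketch is self-contained.
SAMPLE = """\
[table_dispatcher]
job_name = table_dispatcher_source_table_targetdb
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid
dest_table = orders
fields = id, order_date, customer_name
part_field = order_date
"""


def load_settings(text):
    """Parse the config text and return a few settings as plain Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["table_dispatcher"]
    return {
        "logfile": section["logfile"],  # %(job_name)s is expanded here
        "pidfile": section["pidfile"],
        "dest_table": section["dest_table"],
        "fields": [name.strip() for name in section["fields"].split(",")],
        "part_field": section["part_field"],
    }


settings = load_settings(SAMPLE)
print(settings["logfile"])
```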
COMMAND LINE SWITCHES
The following switches are common to all skytools.DBScript-based Python programs.
-h, --help
- show help message and exit
-q, --quiet
- make program silent
-v, --verbose
- make program more verbose
-d, --daemon
- make program go background
The following switches are used to control an already running process. The pidfile is read from the config, then the signal is sent to the process ID specified there.
-r, --reload
- reload config (send SIGHUP)
-s, --stop
- stop program safely (send SIGINT)
-k, --kill
- kill program immediately (send SIGTERM)
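The control switches amount to reading a process ID from the pidfile and sending it a signal. A minimal sketch of that mechanism on a POSIX system; the function name and the injectable kill argument are hypothetical and exist only to make the sketch easy to try without signalling a real process:

```python
import os
import signal

# Switch-to-signal mapping as listed above.
SWITCH_SIGNALS = {
    "--reload": signal.SIGHUP,
    "--stop": signal.SIGINT,
    "--kill": signal.SIGTERM,
}


def signal_running_process(pidfile, switch, kill=os.kill):
    """Read the pid from pidfile and send the signal that matches switch."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    sig = SWITCH_SIGNALS[switch]
    kill(pid, sig)
    return pid, sig
```

This is also why a pidfile must be configured before the script may daemonize: without it, there is no recorded process ID to signal.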
LOGUTRIGA EVENT FORMAT
The PgQ trigger function pgq.logutriga() sends a table change event into the queue in the following format:
ev_type
- (op || ":" || pkey_fields), where op is "I", "U" or "D", corresponding to insert, update or delete, and pkey_fields is a comma-separated list of primary key fields for the table. The operation type is always present, but the pkey_fields list can be empty if the table has no primary key. Example: I:col1,col2
ev_data
- URL-encoded record of data. It uses DB-specific URL encoding where the existence of = is meaningful: a missing = means NULL, a present = means a literal value. Example: id=3&name=str&nullvalue&emptyvalue=
ev_extra1
- Fully qualified table name.
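The event format above can be decoded with a few lines of Python. This is a minimal sketch built on the standard library, not the decoder skytools itself uses; parse_ev_type and decode_ev_data are hypothetical names:

```python
from urllib.parse import unquote_plus


def parse_ev_type(ev_type):
    """Split 'I:col1,col2' into the operation and the list of pkey fields."""
    op, _, pkeys = ev_type.partition(":")
    if op not in ("I", "U", "D"):
        raise ValueError("unknown operation: %r" % op)
    return op, [name for name in pkeys.split(",") if name]


def decode_ev_data(ev_data):
    """Decode a URL-encoded record; a key without '=' maps to None (NULL)."""
    record = {}
    for item in ev_data.split("&"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        record[unquote_plus(key)] = unquote_plus(value) if sep else None
    return record


print(parse_ev_type("I:col1,col2"))
print(decode_ev_data("id=3&name=str&nullvalue&emptyvalue="))
```

Standard query-string parsers treat a missing = and an empty value alike, so the distinction between NULL and the empty string has to be handled explicitly, as done here with str.partition.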