SYNOPSIS
pbs_sched [-a alarm] [-b file] [-d home] [-i file] [-L logfile] [-p file] [-S port] [-t file] [-v] [-c file]
DESCRIPTION
The pbs_sched program runs in conjunction with the PBS server. It queries the server about the state of PBS and communicates with pbs_mom to get information such as the status of running jobs and the memory available. It then decides which jobs to run.
pbs_sched must be executed with root permission.
OPTIONS
- -a alarm
- This specifies the time in seconds to wait for a schedule run to finish. If a script takes too long to finish, an alarm signal is sent, and the scheduler is restarted. If a core file does not exist in the current directory, abort() is called and a core file is generated. The default for alarm is 180 seconds.
- -b file
- This specifies the "body" file. The file given is read into memory once at program start, or after the program receives a SIGHUP, and is executed each time the scheduler is awakened by the server. If this option is not given, the file "sched_tcl" in the directory PBS_HOME/sched_priv is read for the body code.
- -d home
- This specifies the PBS home directory, PBS_HOME. The current working directory of the scheduler is PBS_HOME/sched_priv. If this option is not given, PBS_HOME defaults to $PBS_SERVER_HOME as defined during the PBS build procedure.
- -i file
- This specifies the "initialize" file. The file given is executed once before the main processing loop is entered. If this option is not given, no initialization code is executed.
- -L logfile
- Specifies an absolute path name of the file to use as the log file. If not specified, the scheduler will open a file named for the current date in the PBS_HOME/sched_logs directory (see the -d option).
- -p file
- This specifies the "print" file. Any output from the Tcl code which is written to standard out or standard error will be written to this file. If this option is not given, the file used will be PBS_HOME/sched_priv/sched_out. See the -d option.
- -S port
- This specifies the port to use. If this option is not given, the default port for the PBS scheduler is used.
- -t file
- This specifies the "terminator" file. If a QUIT command is sent from the server, this code is executed before the scheduler exits. If this option is not given, no special termination handling is done.
- -v
- This puts the scheduler into "verbose" mode. Errors are always logged regardless of this setting; with this flag, some "uninteresting" events are logged as well. An example is a message each time the server contacts the scheduler.
- -c file
- Specifies a configuration file (see CONFIGURATION FILE below). If this is a relative file name, it is taken relative to PBS_HOME/sched_priv (see the -d option). If the -c option is not supplied, pbs_sched will not attempt to open a configuration file.
The options that specify file names may be absolute or relative. If they are relative, their root directory will be PBS_HOME/sched_priv.
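The path rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not code from pbs_sched: the function name resolve_option_path is hypothetical, and the PBS_HOME value /usr/spool/pbs is an assumed example.

```python
import posixpath


def resolve_option_path(name, pbs_home="/usr/spool/pbs"):
    """Return the path a file option would resolve to (illustrative sketch).

    pbs_home is an assumed example value for PBS_HOME.
    """
    # Absolute names are used unchanged.
    if posixpath.isabs(name):
        return name
    # Relative names are rooted at PBS_HOME/sched_priv.
    return posixpath.join(pbs_home, "sched_priv", name)


print(resolve_option_path("sched_tcl"))
print(resolve_option_path("/tmp/body.tcl"))
```

With these assumed values, a relative name such as sched_tcl ends up under /usr/spool/pbs/sched_priv, while an absolute name is returned as given.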
USAGE
This version of the scheduler requires knowledge of the Tcl language. A set of functions to communicate with the PBS server and resource monitor has been added to those normally available with Tcl. All these calls will set the Tcl variable "pbs_errno" to a value to indicate if an error occurred. In all cases, the value "0" means no error. If a call to a Resource Monitor function is made, any error value will come from the system supplied errno variable. If the function call communicates with the PBS Server, any error value will come from the error number returned by the server.
- openrm host ?port?
- Creates a connection to the PBS Resource Monitor on host using port as the port number or the standard port for the resource monitor if it is not given. A connection handle is returned. If the open is successful, this will be a non-negative integer. If not, an error occurred.
- closerm connection
- The parameter connection is a handle to a resource monitor which was previously returned from openrm. This connection is closed. Nothing is returned.
- downrm connection
- Sends a command to the connected resource monitor to shut down. Nothing is returned.
- configrm connection filename
- Sends a command to the connected resource monitor to read the configuration file given by filename. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- addreq connection request
- A resource request is sent to the connected resource monitor. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- getreq connection
- One resource request response from the connected resource monitor is returned. If an error occurred or there are no more responses, an empty string is returned.
- allreq request
- A resource request is sent to all connected resource monitors. The number of streams acted upon is returned.
- flushreq
- All resource requests previously sent to all connected resource monitors are flushed out to the network. Nothing is returned.
- activereq
- The connection number of the next stream with something to read is returned. If there is nothing to read from any of the connections, a negative number is returned.
- fullresp flag
- Evaluates flag as a boolean value and sets the response mode used by getreq to full if flag evaluates to "true". The full return from a resource monitor includes the original request followed by an equal sign followed by the response. The default situation is only to return the response following the equal sign. If a script needs to "see" the entire line, this function may be used.
- pbsstatserv
- The server is sent a status request for information about the server itself. If the request succeeds, a list with three elements is returned, otherwise an empty string is returned. The first element is the server's name. The second is a list of attributes. The third is the "text" associated with the server (usually blank).
- pbsstatjob
- The server is sent a status request for information about all jobs resident within the server. If the request succeeds, a list is returned, otherwise an empty string is returned. The list contains an entry for each job. Each element is a list with three elements. The first is the job's jobid. The second is a list of attributes. The attribute names which specify resources will have a name of the form "Resource_List:name" where "name" is the resource name. The third is the "text" associated with the job (usually blank).
- pbsstatque
- The server is sent a status request for information about all queues resident within the server. If the request succeeds, a list is returned, otherwise an empty string is returned. The list contains an entry for each queue. Each element is a list with three elements. The first is the queue's name. The second is a list of attributes similar to pbsstatjob. The third is the "text" associated with the queue (usually blank).
- pbsstatnode
- The server is sent a status request for information about all nodes defined within the server. If the request succeeds, a list is returned, otherwise an empty string is returned. The list contains an entry for each node. Each element is a list with three elements. The first is the node's name. The second is a list of attributes similar to pbsstatjob. The third is the "text" associated with the node (usually blank).
- pbsselstat
- The server is sent a status request for information about all runnable jobs resident within the server. If the request succeeds, a list similar to pbsstatjob is returned, otherwise an empty string is returned.
- pbsrunjob jobid ?location?
- Run the job given by jobid at the location given by location. If location is not given, the default location is used. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsasyrunjob jobid ?location?
- Run the job given by jobid at the location given by location without waiting for a positive response that the job has actually started. If location is not given, the default location is used. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsrerunjob jobid
- Re-runs the job given by jobid. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsdeljob jobid
- Delete the job given by jobid. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsholdjob jobid
- Place a hold on the job given by jobid. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsmovejob jobid ?location?
- Move the job given by jobid to the location given by location. If location is not given, the default location is used. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsqenable queue
- Set the "enabled" attribute for the queue given by queue to true. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsqdisable queue
- Set the "enabled" attribute for the queue given by queue to false. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsqstart queue
- Set the "started" attribute for the queue given by queue to true. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsqstop queue
- Set the "started" attribute for the queue given by queue to false. If this is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsalterjob jobid attribute_list
- Alter the attributes for a job specified by jobid. The parameter attribute_list is the list of attributes to be altered. There can be more than one. Each attribute consists of a list of three elements. The first is the name, the second the resource and the third is the new value. If the alter is successful, a "0" is returned, otherwise, "-1" is returned.
- pbsrescquery resource_list
- Obtain information about the resources specified by resource_list. This will be a list of strings. If the request succeeds, a list with the same number of elements as resource_list is returned. Each element in this list will be a list with four numbers. The numbers specify available, allocated, reserved, and down in that order.
- pbsrescreserve resource_id resource_list
- Make (or extend) a reservation for the resources specified by resource_list which will be given as a list of strings. The parameter resource_id is a number which provides a unique identifier for a reservation being tracked by the server. If resource_id is given as "0", a new reservation is created. In this case, a new identifier is generated and returned by the function. If an old identifier is used, that same number will be returned. The Tcl variable "pbs_errno" will be set to indicate the success or failure of the reservation.
- pbsrescrelease resource_id
- The reservation specified by resource_id is released.
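Two of the return formats described in the list above, the full response line enabled by fullresp and the four numbers returned per resource by pbsrescquery, can be illustrated outside the scheduler. The Python sketch below only models those formats. The function names and the sample values are hypothetical, and splitting at the first equal sign is an assumption about how a request and its response are separated.

```python
from collections import namedtuple

# Field order follows the pbsrescquery description: available, allocated,
# reserved, down.
ResourceCounts = namedtuple("ResourceCounts", "available allocated reserved down")


def split_full_response(line):
    """Split a full-mode response line into (request, response).

    Assumption: the first equal sign separates the request from the response.
    """
    request, _, response = line.partition("=")
    return request, response


def name_counts(numbers):
    """Attach field names to the four numbers of one pbsrescquery element."""
    available, allocated, reserved, down = numbers
    return ResourceCounts(available, allocated, reserved, down)


# Made-up sample values, for illustration only.
print(split_full_response("loadave=0.25"))
print(name_counts([8, 3, 1, 0]))
```

In the default (non-full) mode only the part after the equal sign would be returned, so a script that needs to match responses to requests has to enable the full mode first.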
The following two commands are not normally used by the scheduler. They are included here because a scheduler may need to contact a server other than the one with which it normally communicates. These commands are also used by the Tcl tools.
- pbsconnect ?server?
- Make a connection to the named server or the default server if a parameter is not given. Only one connection to a server is allowed at any one time.
- pbsdisconnect
- Disconnect from the currently connected server.
The above Tcl functions use PBS interface library calls for communication with the server and the PBS resource monitor library to communicate with pbs_mom.
- datetime ?day? ?time?
- The number of arguments used determines the type of date to be calculated. With no arguments, the current POSIX date is returned. This is an integer in seconds.
With one argument there are two possible formats. The first is a 12 (or more) character string specifying a complete date in the following format:
YYMMDDhhmmss
All characters must be digits. The year (YY) is given by the first two (or more) characters and is the number of years since 1900. The month (MM) is the number of the month [01-12]. The day (DD) is the day of the month [01-32]. The hour (hh) is the hour of the day [00-23]. The minute (mm) is minutes after the hour [00-59]. The second (ss) is seconds after the minute [00-59]. The POSIX date for the given date/time is returned.
The second option with one argument is a relative time. The format for this is
HH:MM:SS
with hours (HH), minutes (MM) and seconds (SS) separated by colons ":". The number returned in this case will be the number of seconds in the interval specified, not an absolute POSIX date.
With two arguments a relative date is calculated. The first argument specifies a day of the week and must be one of the following strings: "Sun", "Mon", "Tue", "Wed", "Thr", "Fri", or "Sat". The second argument is a relative time as given above. The POSIX date calculated will be the day of the week given which follows the current day, and the time given in the second argument. For example, if the current day was Monday, and the two arguments were "Fri" and "04:30:00", the date calculated would be the POSIX date for the Friday following the current Monday, at four-thirty in the morning. If the day specified and the current day are the same, the current day is used, not the day one week later.
- strftime format time
- This function calls the POSIX function strftime(). It requires two arguments. The first is a format string. The format conventions are the same as those for the POSIX function strftime(). The second argument is POSIX calendar time in seconds, as returned by datetime. It returns a string based on the format given. This gives the ability to extract information about a time, or format it for printing.
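The date arithmetic described for datetime, together with the formatting step performed by strftime, can be modelled with the Python sketch below. It is not the scheduler's implementation. The function names are hypothetical, and the sketch works in UTC so that the results are reproducible, whereas the real commands presumably use the local time zone of the host.

```python
import calendar
import time

# Day names as listed in the datetime description, starting at Sunday.
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thr", "Fri", "Sat"]


def relative_seconds(text):
    """Convert a relative time HH:MM:SS to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def absolute_date(text):
    """Convert YYMMDDhhmmss (YY = years since 1900) to POSIX seconds in UTC."""
    if len(text) < 12 or not text.isdigit():
        raise ValueError("expected at least 12 digits")
    # The last ten digits are MMDDhhmmss; everything before them is the year.
    year = 1900 + int(text[:-10])
    tail = text[-10:]
    month, day, hour, minute, second = (
        int(tail[i:i + 2]) for i in range(0, 10, 2)
    )
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


def next_weekday(day, text, now):
    """POSIX seconds (UTC) for the named weekday, today or later, at the given time."""
    today = time.gmtime(now)
    # struct_time counts Monday as 0; the DAYS list starts at Sunday.
    today_index = (today.tm_wday + 1) % 7
    # The offset is 0 when the requested day is the current day.
    days_ahead = (DAYS.index(day) - today_index) % 7
    midnight = calendar.timegm(
        (today.tm_year, today.tm_mon, today.tm_mday, 0, 0, 0, 0, 0, 0)
    )
    return midnight + days_ahead * 86400 + relative_seconds(text)


# Monday 1970-01-05 00:00:00 UTC, then the following Friday at 04:30:00.
monday = absolute_date("700105000000")
friday = next_weekday("Fri", "04:30:00", monday)
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(friday)))
```

The last line mirrors the role of the strftime command: a POSIX time in seconds goes in together with a format string, and a printable string comes out.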
The Tcl interpreter is started at program initialization and after a reset (the receipt of a SIGHUP signal). It is not deleted between scheduling runs so variables which are set in one can be accessed later.
The "initialize" and "terminator" files are run with no supplied connection to the server. This means that none of the above functions which talk to the server will work unless pbsconnect is called first. The "body" file is run with a connection to the server already established.
CONFIGURATION FILE
A configuration file may be specified with the -c option. This file may be used to specify the hosts (servers) which are allowed to connect to pbs_sched. The hosts are specified in the configuration file in a manner identical to that used in pbs_mom. There is one line per host with the syntax:
$clienthost hostname
where clienthost and hostname are separated by white space.
Two host names are always allowed to connect to pbs_sched: "localhost" and the name returned to pbs_sched by the system call gethostname(). These names need not be specified in the configuration file.
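The resulting set of permitted hosts can be sketched as follows. This Python fragment is an illustration only: the function name allowed_hosts is hypothetical, the "$clienthost hostname" line syntax is assumed from the pbs_mom convention, and the host names in the example are made up.

```python
import socket


def allowed_hosts(config_text, local_name=None):
    """Return the set of host names permitted to connect (illustrative sketch)."""
    # These two names are always allowed, even without a configuration file.
    hosts = {"localhost", local_name or socket.gethostname()}
    for line in config_text.splitlines():
        fields = line.split()
        # Assumed line syntax: "$clienthost hostname", separated by white space.
        if len(fields) == 2 and fields[0] == "$clienthost":
            hosts.add(fields[1])
    return hosts


# Made-up host names, for illustration only.
print(sorted(allowed_hosts("$clienthost server1.example.com\n", local_name="sched1")))
```

An empty configuration still yields the two built-in names, which matches the statement that they need not be listed.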
The configuration file must be "secure". It must be owned by a user id and group id less than 10 and not be world writable.
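The "secure" rule can be expressed as a small predicate. The Python sketch below is one reading of the rule, not the check pbs_sched actually performs, and the function name is_secure is hypothetical. It takes the owner, group and mode values as plain arguments so that it can be tried without a real file.

```python
import stat


def is_secure(uid, gid, mode):
    """Apply the ownership and permission rule to stat-style values (sketch)."""
    # Both the owning user id and the group id must be less than 10.
    owned_by_system = uid < 10 and gid < 10
    # The "other" write bit must not be set.
    world_writable = bool(mode & stat.S_IWOTH)
    return owned_by_system and not world_writable


print(is_secure(0, 0, 0o644))
print(is_secure(0, 0, 0o646))
```

For a real file, the three values could be taken from the st_uid, st_gid and st_mode fields of os.stat(path).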
FILES
- $PBS_SERVER_HOME/sched_priv
- The default directory for configuration files, typically /usr/spool/pbs/sched_priv.
SIGNAL HANDLING
A C based scheduler will handle the following signals:
- SIGHUP
- The server will close and reopen its log file and reread the config file if one exists.
- SIGALRM
- If the site-supplied scheduling module exceeds the time limit, the alarm will cause the scheduler to attempt to core dump and restart itself.
- SIGINT and SIGTERM
- Will result in an orderly shutdown of the scheduler.
All other signals have the default action installed.