Slurm(3) Perl API for libslurm

SYNOPSIS


    use Slurm;
    my $slurm = Slurm::new();
    my $nodes = $slurm->load_node();
    unless ($nodes) {
        die "failed to load node info: " . $slurm->strerror();
    }

DESCRIPTION

The Slurm class provides a Perl interface to the SLURM API functions declared in "<slurm/slurm.h>", along with some frequently used extra functions exported by libslurm.

METHODS

To use the API, first create a Slurm object:

    $slurm = Slurm::new($conf);

Then call the desired functions:

    $resp = $slurm->load_jobs();

In the "METHODS" section below, a parameter that may be omitted is listed as "param=val", where "val" is the default value used when the parameter is not given.

DATA STRUCTURES

Typically, C structures are converted to (possibly blessed) Perl hash references, with field names as hash keys. Arrays in C are converted to arrays in Perl. For example, consider the structure "job_info_msg_t":

    typedef struct job_info_msg {
        time_t last_update;     /* time of latest info */
        uint32_t record_count;  /* number of records */
        job_info_t *job_array;  /* the job records */
    } job_info_msg_t;

This will be converted to a hash reference with the following structure:

    {
        last_update => 1285847672,
        job_array => [ {account => 'test', alloc_node => 'ln0', alloc_sid => 1234, ...},
                       {account => 'debug', alloc_node => 'ln2', alloc_sid => 5678, ...},
                       ...
                     ]
    }

Note that the "record_count" field is absent from the hash. It can be derived from the number of elements in the "job_array" array.
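
As a minimal sketch of working with such a converted structure, the snippet below builds a hand-written hash reference shaped like the example above and derives the record count from "job_array". The data is illustrative rather than real cluster output, and the helper name record_count() is hypothetical (it is not part of the Slurm package).

```perl
use strict;
use warnings;

# Hand-written stand-in for a converted job_info_msg_t (illustrative data only).
my $resp = {
    last_update => 1285847672,
    job_array   => [
        { account => 'test',  alloc_node => 'ln0', alloc_sid => 1234 },
        { account => 'debug', alloc_node => 'ln2', alloc_sid => 5678 },
    ],
};

# Hypothetical helper: the C field record_count is not stored in the hash,
# so count the elements of job_array instead.
sub record_count {
    my ($msg) = @_;
    return scalar @{ $msg->{job_array} || [] };
}

printf "%d job record(s), last updated at %d\n",
    record_count($resp), $resp->{last_update};

# Each element of job_array is itself a hash reference keyed by field name.
for my $job (@{ $resp->{job_array} }) {
    print "account=$job->{account} alloc_node=$job->{alloc_node}\n";
}
```

The same pattern should apply to the other "*_msg_t" responses, where the array of records is a hash member and the count is implied by its length.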

To pass parameters to the API functions, use the corresponding hash references, for example:

    $rc = $slurm->update_node({node_names => 'node[0-7]', node_state => NODE_STATE_DRAIN});

Please see "<slurm/slurm.h>" for the definition of the structures.

CONSTANTS

The enumerations and macro definitions are available in the Slurm package. If ':constant' is given when using the Slurm package, the constants will be exported to the calling package.

Please see Slurm::Constant for the available constants.

METHODS

CONSTRUCTOR/DESTRUCTOR

$slurm = Slurm::new($conf_file=undef);

Create a Slurm object. For now the object is just a hash reference with no members.

  • IN $conf_file: the SLURM configuration file. If omitted, the default SLURM configuration file will be used (the file specified by the environment variable SLURM_CONF, or the file slurm.conf in the directory specified at compile time).
  • RET: blessed opaque Slurm object. On error "undef" is returned.

ERROR INFORMATION FUNCTIONS

$errno = $slurm->get_errno();

Get the error number associated with last operation.

  • RET: error number associated with last operation.

$str = $slurm->strerror($errno=0);

Get the string describing the specified error number.

  • IN $errno: error number. If omitted or 0, the error number returned by "$slurm->get_errno()" will be used.
  • RET: error string.
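
The two error functions can be combined into a small reporting helper, sketched below. The helper name last_error_message() is hypothetical, and the live part only runs if the Slurm module can be loaded; it assumes nothing beyond get_errno(), strerror() and load_node() as described in this page.

```perl
use strict;
use warnings;

# Hypothetical helper: build a readable message from the last error recorded
# on a Slurm object, using only get_errno() and strerror().
sub last_error_message {
    my ($slurm, $what) = @_;
    my $errno = $slurm->get_errno();
    return sprintf "%s failed: %s (errno %d)",
        $what, $slurm->strerror($errno), $errno;
}

# Talk to a real cluster only when the Slurm module is available.
if (eval { require Slurm; 1 }) {
    my $slurm = Slurm::new();
    if ($slurm) {
        my $nodes = $slurm->load_node();
        warn last_error_message($slurm, 'load_node') . "\n" unless $nodes;
    }
}
```

Fetching the error number once and passing it to strerror() keeps the number and the text in the message consistent with each other.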

ENTITY STATE/REASON/FLAG/TYPE STRING FUNCTIONS

$str = $slurm->preempt_mode_string($mode_num);

Get the string describing the specified preempt mode number.

  • IN $mode_num: preempt mode number.
  • RET: preempt mode string.

$num = $slurm->preempt_mode_num($mode_str);

Get the preempt mode number of the specified preempt mode string.

  • IN $mode_str: preempt mode string.
  • RET: preempt mode number.

$str = $slurm->job_reason_string($num);

Get the string representation of the specified job state reason number.

  • IN $num: job reason number.
  • RET: job reason string.

$str = $slurm->job_state_string($num);

Get the string representation of the specified job state number.

  • IN $num: job state number.
  • RET: job state string.

$str = $slurm->job_state_string_compact($num);

Get the compact string representation of the specified job state number.

  • IN $num: job state number.
  • RET: compact job state string.

$num = $slurm->job_state_num($str);

Get the job state number of the specified (compact) job state string.

  • IN $str: job state string.
  • RET: job state number.

$str = $slurm->reservation_flags_string($flags);

Get the string representation of the specified reservation flags.

  • IN $flags: reservation flags number.
  • RET: reservation flags string.

$str = $slurm->node_state_string($num);

Get the string representation of the specified node state number.

  • IN $num: node state number.
  • RET: node state string.

$str = $slurm->node_state_string_compact($num);

Get the compact string representation of the specified node state number.

  • IN $num: node state number.
  • RET: compact node state string.

$str = $slurm->private_data_string($num);

Get the string representation of the specified private data type.

  • IN $num: private data type number.
  • RET: private data type string.

$str = $slurm->accounting_enforce_string($num);

Get the string representation of the specified accounting enforce type.

  • IN $num: accounting enforce type number.
  • RET: accounting enforce type string.

$str = $slurm->conn_type_string($num);

Get the string representation of the specified connection type.

  • IN $num: connection type number.
  • RET: connection type string.

$str = $slurm->node_use_string($num);

Get the string representation of the specified node usage type.

  • IN $num: node usage type number.
  • RET: node usage type string.

$str = $slurm->bg_block_state_string($num);

Get the string representation of the specified BlueGene block state.

  • IN $num: BG block state number.
  • RET: BG block state string.

RESOURCE ALLOCATION FUNCTIONS

$resp = $slurm->allocate_resources($job_desc);

Allocate resources for a job request. If the requested resources are not immediately available, the slurmctld will send the job_alloc_resp_msg to the specified node and port.

  • IN $job_desc: description of resource allocation request, with structure of "job_desc_msg_t".
  • RET: response to request, with structure of "resource_allocation_response_msg_t". This only represents a job allocation if resources are immediately available. Otherwise it just contains the job id of the enqueued job request. On failure "undef" is returned.

$resp = $slurm->allocate_resources_blocking($job_desc, $timeout=0, $pending_callbacks=undef);

Allocate resources for a job request. This call will block until the allocation is granted, or the specified timeout limit is reached.

  • IN $job_desc: description of resource allocation request, with structure of "job_desc_msg_t".
  • IN $timeout: amount of time, in seconds, to wait for a response before giving up. A timeout of zero will wait indefinitely.
  • IN $pending_callbacks: If the allocation cannot be granted immediately, the controller will put the job in the PENDING state. If a pending callback is given, it will be called with the job id of the pending job as the sole parameter.
  • RET: allocation response, with structure of "resource_allocation_response_msg_t". On failure "undef" is returned, with errno set.

$resp = $slurm->allocation_lookup($job_id);

Retrieve info for an existing resource allocation.

  • IN $job_id: job allocation identifier.
  • RET: job allocation info, with structure of "job_alloc_info_response_msg_t". On failure "undef" is returned with errno set.

$resp = $slurm->allocation_lookup_lite($job_id);

Retrieve minor info for an existing resource allocation.

  • IN $job_id: job allocation identifier.
  • RET: job allocation info, with structure of "resource_allocation_response_msg_t". On failure "undef" is returned with errno set.

$str = $slurm->read_hostfile($filename, $n);

Read a specified SLURM hostfile. The file must contain a list of SLURM NodeNames, one per line.

  • IN $filename: name of SLURM hostlist file to be read.
  • IN $n: number of NodeNames required.
  • RET: a string representing the hostlist. Returns NULL if there are fewer than $n hostnames in the file, or if an error occurs.

$msg_thr = $slurm->allocation_msg_thr_create($port, $callbacks);

Start up a message handler that handles messages from the controller during an allocation.

  • OUT $port: port we are listening for messages on from the controller.
  • IN $callbacks: callbacks for different types of messages, with structure of "slurm_allocation_callbacks_t".
  • RET: opaque object of "allocation_msg_thread_t *", or NULL on failure.

$slurm->allocation_msg_thr_destroy($msg_thr);

Shut down the message handler that handles messages from the controller during an allocation.

  • IN $msg_thr: opaque object of "allocation_msg_thread_t" pointer.

$resp = $slurm->submit_batch_job($job_desc_msg);

Issue RPC to submit a job for later execution.

  • IN $job_desc_msg: description of batch job request, with structure of "job_desc_msg_t".
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

$rc = $slurm->job_will_run($job_desc_msg);

Determine if a job would execute immediately if submitted now.

  • IN $job_desc_msg: description of resource allocation request, with structure of "job_desc_msg_t".
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

$resp = $slurm->sbcast_lookup($job_id);

Retrieve info for an existing resource allocation including a credential needed for sbcast.

  • IN $job_id: job allocation identifier.
  • RET: job allocation information including a credential for sbcast, with structure of "job_sbcast_cred_msg_t". On failure "undef" is returned with errno set.

JOB/STEP SIGNALING FUNCTIONS

$rc = $slurm->kill_job($job_id, $signal, $batch_flag=0);

Send the specified signal to all steps of an existing job.

  • IN $job_id: the job's id.
  • IN $signal: signal number.
  • IN $batch_flag: 1 to signal batch shell only, otherwise 0.
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.
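
A hedged sketch of how kill_job() might be wrapped follows. The helper name signal_whole_job() is hypothetical; the signal number comes from the standard POSIX module, and the block only defines the helper, so nothing is signalled until it is called with a real Slurm object.

```perl
use strict;
use warnings;
use POSIX qw(SIGTERM);

# Hypothetical helper: send a signal to all steps of a job and report the
# outcome. Returns 1 on success and 0 on failure.
sub signal_whole_job {
    my ($slurm, $job_id, $signal) = @_;

    # batch_flag=0 signals every step rather than only the batch shell.
    my $rc = $slurm->kill_job($job_id, $signal, 0);
    return 1 if $rc == 0;

    warn "kill_job($job_id) failed: " . $slurm->strerror() . "\n";
    return 0;
}

# Example call, assuming $slurm was created with Slurm::new():
#     signal_whole_job($slurm, 1234, SIGTERM) or exit 1;
```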

$rc = $slurm->kill_job_step($job_id, $step_id, $signal);

Send the specified signal to an existing job step.

  • IN $job_id: the job's id.
  • IN $step_id: the job step's id.
  • IN $signal: signal number.
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

$rc = $slurm->signal_job($job_id, $signal);

Send the specified signal to all steps of an existing job.

  • IN $job_id: the job's id.
  • IN $signal: signal number.
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

$rc = $slurm->signal_job_step($job_id, $step_id, $signal);

Send the specified signal to an existing job step.

  • IN $job_id: the job's id.
  • IN $step_id: the job step's id.
  • IN $signal: signal number.
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

JOB/STEP COMPLETION FUNCTIONS

$rc = $slurm->complete_job($job_id, $job_rc=0);

Note the completion of a job and all of its steps.

  • IN $job_id: the job's id.
  • IN $job_rc: the highest exit code of any task of the job.
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

$rc = $slurm->terminate_job_step($job_id, $step_id);

Terminate a job step by sending a REQUEST_TERMINATE_TASKS RPC to all slurmd daemons of the job step, then call slurm_complete_job_step() after verifying that no node in the job step still has running tasks from it. (May take over 35 seconds to return.)

  • IN $job_id: the job's id.
  • IN $step_id: the job step's id - use SLURM_BATCH_SCRIPT as the step_id to terminate a job's batch script.
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

SLURM TASK SPAWNING FUNCTIONS

$ctx = $slurm->step_ctx_create($params);

Create a job step and its context.

  • IN $params: job step parameters, with structure of "slurm_step_ctx_params_t".
  • RET: the step context. On failure "undef" is returned with errno set.

$ctx = $slurm->step_ctx_create_no_alloc($params, $step_id);

Create a job step and its context without getting an allocation.

  • IN $params: job step parameters, with structure of "slurm_step_ctx_params_t".
  • IN $step_id: fake job step id.
  • RET: the step context. On failure "undef" is returned with errno set.

SLURM CONTROL CONFIGURATION READ/PRINT/UPDATE FUNCTIONS

($major, $minor, $micro) = $slurm->api_version();

Get the SLURM API's version number.

  • RET: a three element list of the major, minor, and micro version number.

$resp = $slurm->load_ctl_conf($update_time=0);

Issue RPC to get SLURM control configuration information if changed.

  • IN $update_time: time of current configuration data.
  • RET: SLURM configuration data, with structure of "slurm_ctl_conf_t". On failure "undef" is returned with errno set.

$slurm->print_ctl_conf($out, $conf);

Output the contents of SLURM control configuration message as loaded using "load_ctl_conf()".

  • IN $out: file to write to.
  • IN $conf: SLURM control configuration, with structure of "slurm_ctl_conf_t".

$list = $slurm->ctl_conf_2_key_pairs($conf);

Put the SLURM configuration data into a List of opaque data type "config_key_pair_t".

  • IN $conf: SLURM control configuration, with structure of "slurm_ctl_conf_t".
  • RET: List of opaque data type "config_key_pair_t".

$resp = $slurm->load_slurmd_status();

Issue RPC to get the status of slurmd daemon on this machine.

  • RET: slurmd status info, with structure of "slurmd_status_t". On failure "undef" is returned with errno set.

$slurm->print_slurmd_status($out, $slurmd_status);

Output the contents of slurmd status message as loaded using "load_slurmd_status()".

  • IN $out: file to write to.
  • IN $slurmd_status: slurmd status info, with structure of "slurmd_status_t".

$slurm->print_key_pairs($out, $key_pairs, $title);

Output the contents of key_pairs which is a list of opaque data type "config_key_pair_t".

  • IN $out: file to write to.
  • IN $key_pairs: List containing key pairs to be printed.
  • IN $title: title of key pair list.

$rc = $slurm->update_step($step_msg);

Update the time limit of a job step.

  • IN $step_msg: step update message descriptor, with structure of "step_update_request_msg_t".
  • RET: 0 on success, or -1 on error.

SLURM JOB RESOURCES READ/PRINT FUNCTIONS

$num = $slurm->job_cpus_allocated_on_node_id($job_res, $node_id);

Get the number of cpus allocated to a job on a node by node id.

  • IN $job_res: job resources data, with structure of "job_resources_t".
  • IN $node_id: zero-origin node id in allocation.
  • RET: number of CPUs allocated to job on this node or -1 on error.

$num = $slurm->job_cpus_allocated_on_node($job_res, $node_name);

Get the number of cpus allocated to a job on a node by node name.

  • IN $job_res: job resources data, with structure of "job_resources_t".
  • IN $node_name: name of node.
  • RET: number of CPUs allocated to job on this node or -1 on error.

SLURM JOB CONFIGURATION READ/PRINT/UPDATE FUNCTIONS

$time = $slurm->get_end_time($job_id);

Get the expected end time for a given slurm job.

  • IN $job_id: SLURM job id.
  • RET: scheduled end time for the job. On failure "undef" is returned with errno set.

$secs = $slurm->get_rem_time($job_id);

Get the expected time remaining for a given job.

  • IN $job_id: SLURM job id.
  • RET: remaining time in seconds or -1 on error.
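
The value returned by get_rem_time() is a plain number of seconds, so it usually needs formatting before display. The sketch below shows one way to do that; format_remaining() is a hypothetical helper, and the live part assumes the script runs inside a job where the SLURM_JOB_ID environment variable is set.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the seconds returned by get_rem_time() into
# D-HH:MM:SS, or return undef for the documented -1 error value.
sub format_remaining {
    my ($secs) = @_;
    return undef if !defined $secs || $secs < 0;
    my $days  = int($secs / 86400);
    my $hours = int(($secs % 86400) / 3600);
    my $mins  = int(($secs % 3600) / 60);
    return sprintf "%d-%02d:%02d:%02d", $days, $hours, $mins, $secs % 60;
}

print format_remaining(93784), "\n";    # prints 1-02:03:04

# Query the controller only inside a job and when the module is available.
if (defined $ENV{SLURM_JOB_ID} && eval { require Slurm; 1 }) {
    my $slurm = Slurm::new();
    if ($slurm) {
        my $left = format_remaining($slurm->get_rem_time($ENV{SLURM_JOB_ID}));
        print defined $left
            ? "time left: $left\n"
            : "could not determine remaining time\n";
    }
}
```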

$rc = $slurm->job_node_ready($job_id);

Report if nodes are ready for job to execute now.

  • IN $job_id: SLURM job id.
  • RET:
      • READY_JOB_FATAL: fatal error
      • READY_JOB_ERROR: ordinary error
      • READY_NODE_STATE: node is ready
      • READY_JOB_STATE: job is ready to execute

$resp = $slurm->load_job($job_id, $show_flags=0);

Issue RPC to get job information for one job ID.

  • IN $job_id: ID of job we want information about.
  • IN $show_flags: job filtering options.
  • RET: job information, with structure of "job_info_msg_t". On failure "undef" is returned with errno set.

$resp = $slurm->load_jobs($update_time=0, $show_flags=0);

Issue RPC to get all SLURM job information if changed.

  • IN $update_time: time of current job information data.
  • IN $show_flags: job filtering options.
  • RET: job information, with structure of "job_info_msg_t". On failure "undef" is returned with errno set.
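
As an illustration of combining load_jobs() with the state string functions, the sketch below tallies jobs per state. The helper count_jobs_by_state() is hypothetical, and the "job_state" hash key is an assumption taken from the "job_info_t" definition in "<slurm/slurm.h>"; the live part runs only if the Slurm module can be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: tally jobs per state. $resp is shaped like
# job_info_msg_t; $to_string maps a state number to a label, for example
# via job_state_string(). The "job_state" key is assumed from slurm.h.
sub count_jobs_by_state {
    my ($resp, $to_string) = @_;
    my %count;
    $count{ $to_string->($_->{job_state}) }++ for @{ $resp->{job_array} || [] };
    return \%count;
}

if (eval { require Slurm; 1 }) {
    my $slurm = Slurm::new();
    my $resp  = $slurm ? $slurm->load_jobs() : undef;
    if ($resp) {
        my $count = count_jobs_by_state(
            $resp, sub { $slurm->job_state_string($_[0]) });
        printf "%-12s %d\n", $_, $count->{$_} for sort keys %$count;
    }
}
```

Passing the state-to-string mapping as a code reference keeps the counting logic independent of the Slurm object, which makes it easy to exercise with hand-built data.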

$rc = $slurm->notify_job($job_id, $message);

Send message to the job's stdout, usable only by user root.

  • IN $job_id: SLURM job id or 0 for all jobs.
  • IN $message: arbitrary message.
  • RET: 0 on success, or -1 on error.

$job_id = $slurm->pid2jobid($job_pid);

Issue RPC to get the SLURM job ID of a given process ID on this machine.

  • IN $job_pid: process ID of interest on this machine.
  • RET: corresponding job ID. On failure "undef" is returned.

$slurm->print_job_info($out, $job_info, $one_liner=0);

Output information about a specific SLURM job based upon message as loaded using "load_jobs()".

  • IN $out: file to write to.
  • IN $job_info: an individual job information record, with structure of "job_info_t".
  • IN $one_liner: print as a single line if true.

$slurm->print_job_info_msg($out, $job_info_msg, $one_liner=0);

Output information about all SLURM jobs based upon message as loaded using "load_jobs()".

  • IN $out: file to write to.
  • IN $job_info_msg: job information message, with structure of "job_info_msg_t".
  • IN $one_liner: print as a single line if true.

$str = $slurm->sprint_job_info($job_info, $one_liner=0);

Output information about a specific SLURM job based upon message as loaded using "load_jobs()".

  • IN $job_info: an individual job information record, with structure of "job_info_t".
  • IN $one_liner: print as a single line if true.
  • RET: string containing formatted output.

$rc = $slurm->update_job($job_info);

Issue RPC to update a job's configuration per request, only usable by user root or (for some parameters) the job's owner.

  • IN $job_info: description of job updates, with structure of "job_desc_msg_t".
  • RET: 0 on success, otherwise return -1 and set errno to indicate the error.

SLURM JOB STEP CONFIGURATION READ/PRINT/UPDATE FUNCTIONS

$resp = $slurm->get_job_steps($update_time=0, $job_id=NO_VAL, $step_id=NO_VAL, $show_flags=0);

Issue RPC to get specific slurm job step configuration information if changed since update_time.

  • IN $update_time: time of current configuration data.
  • IN $job_id: get information for specific job id, NO_VAL for all jobs.
  • IN $step_id: get information for specific job step id, NO_VAL for all job steps.
  • IN $show_flags: job step filtering options.
  • RET: job step information, with structure of "job_step_info_response_msg_t". On failure "undef" is returned with errno set.

$slurm->print_job_step_info_msg($out, $step_info_msg, $one_liner);

Output information about all SLURM job steps based upon message as loaded using "get_job_steps()".

  • IN $out: file to write to.
  • IN $step_info_msg: job step information message, with structure of "job_step_info_response_msg_t".
  • IN $one_liner: print as a single line if true.

$slurm->print_job_step_info($out, $step_info, $one_liner);

Output information about a specific SLURM job step based upon message as loaded using "get_job_steps()".

  • IN $out: file to write to.
  • IN $step_info: job step information, with structure of "job_step_info_t".
  • IN $one_liner: print as a single line if true.

$str = $slurm->sprint_job_step_info($step_info, $one_liner);

Output information about a specific SLURM job step based upon message as loaded using "get_job_steps()".

  • IN $step_info: job step information, with structure of "job_step_info_t".
  • IN $one_liner: print as a single line if true.
  • RET: string containing formatted output.

$layout = $slurm->job_step_layout_get($job_id, $step_id);

Get the layout structure for a particular job step.

  • IN $job_id: SLURM job ID.
  • IN $step_id: SLURM step ID.
  • RET: layout of the job step, with structure of "slurm_step_layout_t". On failure "undef" is returned with errno set.

$resp = $slurm->job_step_stat($job_id, $step_id, $nodelist=undef);

Get status of a current step.

  • IN $job_id: SLURM job ID.
  • IN $step_id: SLURM step ID.
  • IN $nodelist: nodes to check status of step. If omitted, all nodes in step are used.
  • RET: response of step status, with structure of "job_step_stat_response_msg_t". On failure "undef" is returned.

$resp = $slurm->job_step_get_pids($job_id, $step_id, $nodelist);

Get the complete list of pids for a given job step.

  • IN $job_id: SLURM job ID.
  • IN $step_id: SLURM step ID.
  • IN $nodelist: nodes to check pids of step. If omitted, all nodes in step are used.
  • RET: response of pids information, with structure of "job_step_pids_response_msg_t". On failure "undef" is returned.

SLURM NODE CONFIGURATION READ/PRINT/UPDATE FUNCTIONS

$resp = $slurm->load_node($update_time=0, $show_flags=0);

Issue RPC to get all node configuration information if changed.

  • IN $update_time: time of current configuration data.
  • IN $show_flags: node filtering options.
  • RET: response hash reference with structure of "node_info_msg_t". On failure "undef" is returned with errno set.

$slurm->print_node_info_msg($out, $node_info_msg, $one_liner=0);

Output information about all SLURM nodes based upon message as loaded using "load_node()".

  • IN $out: FILE handle to write to.
  • IN $node_info_msg: node information message to print, with structure of "node_info_msg_t".
  • IN $one_liner: if true, each node info will be printed as a single line.

$slurm->print_node_table($out, $node_info, $node_scaling=1, $one_liner=0);

Output information about a specific SLURM node based upon message as loaded using "load_node()".

  • IN $out: FILE handle to write to.
  • IN $node_info: an individual node information record with structure of "node_info_t".
  • IN $node_scaling: the number of nodes each node information record represents.
  • IN $one_liner: whether to print as a single line.

$str = $slurm->sprint_node_table($node_info, $node_scaling=1, $one_liner=0);

Output information about a specific SLURM node based upon message as loaded using "load_node()".

  • IN $node_info: an individual node information record with structure of "node_info_t".
  • IN $node_scaling: number of nodes each node information record represents.
  • IN $one_liner: whether to print as a single line.
  • RET: string containing formatted output on success, "undef" on failure.

$rc = $slurm->update_node($node_info);

Issue RPC to modify a node's configuration per request, only usable by user root.

  • IN $node_info: description of node updates, with structure of "update_node_msg_t".
  • RET: 0 on success, -1 on failure with errno set.

SLURM SWITCH TOPOLOGY CONFIGURATION READ/PRINT FUNCTIONS

$resp = $slurm->load_topo();

Issue RPC to get all switch topology configuration information.

  • RET: response hash reference with structure of "topo_info_response_msg_t". On failure "undef" is returned with errno set.

$slurm->print_topo_info_msg($out, $topo_info_msg, $one_liner=0);

Output all switch topology configuration information based upon message as loaded using "load_topo()".

  • IN $out: FILE handle to write to.
  • IN $topo_info_msg: switch topology information message, with structure of "topo_info_response_msg_t".
  • IN $one_liner: print as a single line if not zero.

$slurm->print_topo_record($out, $topo_info, $one_liner);

Output information about a specific SLURM topology record based upon message as loaded using "load_topo()".

  • IN $out: FILE handle to write to.
  • IN $topo_info: an individual switch information record, with structure of "topo_info_t".
  • IN $one_liner: print as a single line if not zero.

SLURM SELECT READ/PRINT/UPDATE FUNCTIONS

$rc = $slurm->get_select_jobinfo($jobinfo, $data_type, $data);

Get data from a select job credential.

  • IN $jobinfo: select job credential to get data from. Opaque object.
  • IN $data_type: type of data to get.
  • TODO: enumerate data type and returned value.
  • OUT $data: the data retrieved.
  • RET: error code.

$rc = $slurm->get_select_nodeinfo($nodeinfo, $data_type, $state, $data);

Get data from a select node credential.

  • IN $nodeinfo: select node credential to get data from.
  • IN $data_type: type of data to get.
  • TODO: enumerate data type and returned value.
  • IN $state: state of node query.
  • OUT $data: the data retrieved.

SLURM PARTITION CONFIGURATION READ/PRINT/UPDATE FUNCTIONS

$resp = $slurm->load_partitions($update_time=0, $show_flags=0);

Issue RPC to get all SLURM partition configuration information if changed.

  • IN $update_time: time of current configuration data.
  • IN $show_flags: partitions filtering options.
  • RET: response hash reference with structure of "partition_info_msg_t".

$slurm->print_partition_info_msg($out, $part_info_msg, $one_liner=0);

Output information about all SLURM partitions based upon message as loaded using "load_partitions()".

  • IN $out: FILE handle to write to.
  • IN $part_info_msg: partitions information message, with structure of "partition_info_msg_t".
  • IN $one_liner: print as a single line if true.

$slurm->print_partition_info($out, $part_info, $one_liner=0);

Output information about a specific SLURM partition based upon message as loaded using "load_partitions()".

  • IN $out: FILE handle to write to.
  • IN $part_info: an individual partition information record, with structure of "partition_info_t".
  • IN $one_liner: print as a single line if true.

$str = $slurm->sprint_partition_info($part_info, $one_liner=0);

Output information about a specific SLURM partition based upon message as loaded using "load_partitions()".

  • IN $part_info: an individual partition information record, with structure of "partition_info_t".
  • IN $one_liner: print as a single line if true.
  • RET: string containing formatted output. On failure "undef" is returned.

$rc = $slurm->create_partition($part_info);

Create a new partition, only usable by user root.

  • IN $part_info: description of partition configuration with structure of "update_part_msg_t".
  • RET: 0 on success, -1 on failure with errno set.

$rc = $slurm->update_partition($part_info);

Issue RPC to update a partition's configuration per request, only usable by user root.

  • IN $part_info: description of partition updates with structure of "update_part_msg_t".
  • RET: 0 on success, -1 on failure with errno set.

$rc = $slurm->delete_partition($part_info);

Issue RPC to delete a partition, only usable by user root.

  • IN $part_info: description of partition to delete, with structure of "delete_part_msg_t".
  • RET: 0 on success, -1 on failure with errno set.

SLURM RESERVATION CONFIGURATION READ/PRINT/UPDATE FUNCTIONS

$name = $slurm->create_reservation($resv_info);

Create a new reservation, only usable by user root.

  • IN $resv_info: description of reservation, with structure of "resv_desc_msg_t".
  • RET: name of reservation created. On failure "undef" is returned with errno set.

$rc = $slurm->update_reservation($resv_info);

Modify an existing reservation, only usable by user root.

  • IN $resv_info: description of reservation, with structure of "resv_desc_msg_t".
  • RET: error code.

$rc = $slurm->delete_reservation($resv_info);

Issue RPC to delete a reservation, only usable by user root.

  • IN $resv_info: description of reservation to delete, with structure of "reservation_name_msg_t".
  • RET: error code

$resp = $slurm->load_reservations($update_time=0);

Issue RPC to get all SLURM reservation configuration information if changed.

  • IN $update_time: time of current configuration data.
  • RET: response of reservation information, with structure of "reserve_info_msg_t". On failure "undef" is returned with errno set.

$slurm->print_reservation_info_msg($out, $resv_info_msg, $one_liner=0);

Output information about all SLURM reservations based upon message as loaded using "load_reservations()".

  • IN $out: FILE handle to write to.
  • IN $resv_info_msg: reservation information message, with structure of "reserve_info_msg_t".
  • IN $one_liner: print as a single line if true.

$slurm->print_reservation_info($out, $resv_info, $one_liner=0);

Output information about a specific SLURM reservation based upon message as loaded using "load_reservations()".

  • IN $out: FILE handle to write to.
  • IN $resv_info: an individual reservation information record, with structure of "reserve_info_t".
  • IN $one_liner: print as a single line if true.

$str = $slurm->sprint_reservation_info($resv_info, $one_liner=0);

Output information about a specific SLURM reservation based upon message as loaded using "load_reservations()".

  • IN $resv_info: an individual reservation information record, with structure of "reserve_info_t".
  • IN $one_liner: print as a single line if true.
  • RET: string containing formatted output. On failure "undef" is returned.

SLURM PING/RECONFIGURE/SHUTDOWN FUNCTIONS

$rc = $slurm->ping($primary);

Issue RPC to ping Slurm controller (slurmctld).

  • IN $primary: 1 for primary controller, 2 for secondary controller.
  • RET: error code.
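
A minimal sketch of checking both controllers with ping() is given below. The helper controller_status() is hypothetical, and treating a return value of 0 as success is an assumption, since this page only says that an error code is returned. The live part runs only if the Slurm module can be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: ping the primary (1) and secondary (2) controllers.
# Assumes, without confirmation from this page, that 0 means success.
sub controller_status {
    my ($slurm) = @_;
    my %status;
    for my $which ([ primary => 1 ], [ secondary => 2 ]) {
        my ($label, $id) = @$which;
        $status{$label} = $slurm->ping($id) == 0 ? 'UP' : 'DOWN';
    }
    return \%status;
}

if (eval { require Slurm; 1 }) {
    my $slurm = Slurm::new();
    if ($slurm) {
        my $status = controller_status($slurm);
        print "$_: $status->{$_}\n" for sort keys %$status;
    }
}
```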

$rc = $slurm->reconfigure();

Issue RPC to have Slurm controller (slurmctld) reload its configuration file.

  • RET: error code.

$rc = $slurm->shutdown($options);

Issue RPC to have Slurm controller (slurmctld) cease operations. Both the primary and backup controllers are shut down.

  • IN $options:
      • 0: all slurm daemons are shut down.
      • 1: slurmctld generates a core file.
      • 2: only the slurmctld is shut down (no core file).
  • RET: error code.

$rc = $slurm->takeover();

Issue RPC to have Slurm backup controller take over the primary controller. REQUEST_CONTROL is sent by the backup to the primary controller to take control.

  • RET: error code.

$rc = $slurm->set_debug_level($debug_level);

Issue RPC to set slurm controller debug level.

  • IN $debug_level: requested debug level.
  • RET: 0 on success, -1 on error with errno set.

$rc = $slurm->set_schedlog_level($schedlog_level);

Issue RPC to set slurm scheduler log level.

  • IN $schedlog_level: requested scheduler log level.
  • RET: 0 on success, -1 on error with errno set.

SLURM JOB SUSPEND FUNCTIONS

$rc = $slurm->suspend($job_id);

Suspend execution of a job.

  • IN $job_id: job on which to perform operation.
  • RET: error code.

$rc = $slurm->resume($job_id);

Resume execution of a previously suspended job.

  • IN $job_id: job on which to perform operation.
  • RET: error code.

$rc = $slurm->requeue($job_id);

Re-queue a batch job, if already running then terminate it first.

  • IN $job_id: job on which to perform operation.
  • RET: error code.

SLURM JOB CHECKPOINT FUNCTIONS

$rc = $slurm->checkpoint_able($job_id, $step_id, $start_time);

Determine if the specified job step can presently be checkpointed.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • OUT $start_time: time at which checkpoint request was issued.
  • RET: 0 (can be checkpointed) or a slurm error code.

$rc = $slurm->checkpoint_disable($job_id, $step_id);

Disable checkpoint requests for some job step.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • RET: error code.

$rc = $slurm->checkpoint_enable($job_id, $step_id);

Enable checkpoint requests for some job step.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • RET: error code.

$rc = $slurm->checkpoint_create($job_id, $step_id, $max_wait, $image_dir);

Initiate a checkpoint request for some job step. The job will continue execution after the checkpoint operation completes.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • IN $max_wait: maximum wait for operation to complete, in seconds.
  • IN $image_dir: directory to store image files.
  • RET: error code.

$rc = $slurm->checkpoint_vacate($job_id, $step_id, $max_wait, $image_dir);

Initiate a checkpoint request for some job step. The job will terminate after the checkpoint operation completes.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • IN $max_wait: maximum wait for operation to complete, in seconds.
  • IN $image_dir: directory to store image files.
  • RET: error code.

$rc = $slurm->checkpoint_restart($job_id, $step_id, $stick, $image_dir);

Restart execution of a checkpointed job step.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • IN $stick: if true, stick to nodes previously running on.
  • IN $image_dir: directory to find checkpoint image files.
  • RET: error code.

$rc = $slurm->checkpoint_complete($job_id, $step_id, $begin_time, $error_code, $error_msg);

Note the completion of a job step's checkpoint operation.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • IN $begin_time: time at which checkpoint began.
  • IN $error_code: error code, highest value for all complete calls is preserved.
  • IN $error_msg: error message, preserved for highest error_code.
  • RET: error code.

$rc = $slurm->checkpoint_task_complete($job_id, $step_id, $task_id, $begin_time, $error_code, $error_msg);

Note the completion of a task's checkpoint operation.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • IN $task_id: task which completed the operation.
  • IN $begin_time: time at which checkpoint began.
  • IN $error_code: error code, highest value for all complete calls is preserved.
  • IN $error_msg: error message, preserved for highest error_code.
  • RET: error code.

$rc = $slurm->checkpoint_error($job_id, $step_id, $error_code, $error_msg);

Gather error information for the last checkpoint operation for some job step.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • OUT $error_code: error number associated with the last checkpoint operation.
  • OUT $error_msg: error message associated with the last checkpoint operation.
  • RET: error code.

$rc = $slurm->checkpoint_tasks($job_id, $step_id, $image_dir, $max_wait, $nodelist);

Send checkpoint request to tasks of specified job step.

  • IN $job_id: job on which to perform operation.
  • IN $step_id: job step on which to perform operation.
  • IN $image_dir: location to store checkpoint image files.
  • IN $max_wait: seconds to wait for the operation to complete.
  • IN $nodelist: nodes to which the request is sent.
  • RET: 0 on success, non-zero on failure with errno set.

SLURM TRIGGER FUNCTIONS

$rc = $slurm->set_trigger($trigger_info);

Set an event trigger.

  • IN $trigger_info: hash reference of specification of trigger to create, with structure of "trigger_info_t".
  • RET: error code.

$rc = $slurm->clear_trigger($trigger_info);

Clear an existing event trigger.

  • IN $trigger_info: hash reference of specification of trigger to remove, with structure of "trigger_info_t".
  • RET: error code.

$resp = $slurm->get_triggers();

Get all event trigger information.

  • RET: hash reference with structure of "trigger_info_msg_t". On failure "undef" is returned with errno set.
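
The response of get_triggers can be walked like any other converted structure. The sketch below uses a hypothetical helper, "describe_triggers". It assumes the field names "trigger_array", "trig_id" and "program" from "trigger_info_msg_t" and "trigger_info_t" in "<slurm/slurm.h>"; check the header of the installed version before relying on them.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the Slurm module): turn a
# get_triggers() response into one descriptive line per trigger.
# Assumed field names: trigger_array, trig_id, program.
sub describe_triggers {
    my ($resp) = @_;
    my @lines;
    for my $trig (@{ $resp->{trigger_array} || [] }) {
        my $program = defined $trig->{program} ? $trig->{program} : '(none)';
        push @lines, sprintf("trigger %d runs %s", $trig->{trig_id}, $program);
    }
    return @lines;
}

# Usage on a host where the Slurm module is installed:
#   use Slurm;
#   my $slurm = Slurm::new();
#   my $resp  = $slurm->get_triggers()
#       or die "failed to get triggers: " . $slurm->strerror() . "\n";
#   print "$_\n" for describe_triggers($resp);
```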

JOB/NODE STATE TESTING FUNCTIONS

The following functions test job and node state, based on the macros defined in src/common/slurm_protocol_defs.h. Each function takes a hash reference describing a job or node and returns a boolean value. For a job, $job->{job_state} is tested; for a node, $node->{node_state} is tested.

$cond = IS_JOB_PENDING($job);

$cond = IS_JOB_RUNNING($job);

$cond = IS_JOB_SUSPENDED($job);

$cond = IS_JOB_COMPLETE($job);

$cond = IS_JOB_CANCELLED($job);

$cond = IS_JOB_FAILED($job);

$cond = IS_JOB_TIMEOUT($job);

$cond = IS_JOB_NODE_FAILED($job);

$cond = IS_JOB_COMPLETING($job);

$cond = IS_JOB_CONFIGURING($job);

$cond = IS_JOB_STARTED($job);

$cond = IS_JOB_FINISHED($job);

$cond = IS_JOB_COMPLETED($job);

$cond = IS_JOB_RESIZING($job);

$cond = IS_NODE_UNKNOWN($node);

$cond = IS_NODE_DOWN($node);

$cond = IS_NODE_IDLE($node);

$cond = IS_NODE_ALLOCATED($node);

$cond = IS_NODE_ERROR($node);

$cond = IS_NODE_MIXED($node);

$cond = IS_NODE_FUTURE($node);

$cond = IS_NODE_DRAIN($node);

$cond = IS_NODE_DRAINING($node);

$cond = IS_NODE_DRAINED($node);

$cond = IS_NODE_COMPLETING($node);

$cond = IS_NODE_NO_RESPOND($node);

$cond = IS_NODE_POWER_SAVE($node);

$cond = IS_NODE_POWER_UP($node);

$cond = IS_NODE_FAIL($node);

$cond = IS_NODE_MAINT($node);
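
The state testing functions combine naturally with the array of job records returned by load_jobs. The following is a minimal sketch with a hypothetical helper, "select_job_ids", which takes the test as a code reference so that any of the functions above can be plugged in. It assumes each job record carries a "job_id" key, following the "job_info_t" structure.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the Slurm module): return the IDs
# of all jobs in a load_jobs() response for which $test returns true.
# $test receives one job hash reference per call.
sub select_job_ids {
    my ($resp, $test) = @_;
    return map  { $_->{job_id} }
           grep { $test->($_) }
           @{ $resp->{job_array} || [] };
}

# Usage on a host where the Slurm module is installed:
#   use Slurm;
#   my $slurm = Slurm::new();
#   my $resp  = $slurm->load_jobs()
#       or die "failed to load jobs: " . $slurm->strerror() . "\n";
#   my @pending = select_job_ids($resp, sub { IS_JOB_PENDING($_[0]) });
#   my @running = select_job_ids($resp, sub { IS_JOB_RUNNING($_[0]) });
```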

EXPORT

The job/node state testing functions are exported by default.

If ':constant' is specified, all constants are exported.

AUTHOR

This library was created by Hongjia Cao, <hjcao(AT)nudt.edu.cn> and Danny Auble, <da(AT)llnl.gov>. It is distributed with SLURM.

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.