Submitting batch jobs on the PC farm


The batch system used on the BaBar-UK PC farm is PBS (Portable Batch System). All the farms in the UK institutions are set up in the same way, so these instructions can be used on any of them.

To see the list of available queues, type the following command from one of the servers (bfa or bfb):

 qstat -q 
When run on the bfa server, it gives the following output:
server: bfa.pp.rhul.ac.uk
Queue            Memory CPU Time Walltime Node Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
express            --      --       --     --    0   0 --   E R
D                  --   30:00:00    --     --    0   0  2   D R
mc-simu            --      --       --     --    0   0  8   E R
bulk               --      --       --     --    0   0 --   E R
E                  --   30:00:00    --     --    0   0 20   E R
S                  --   08:00:00    --     --    0   0 --   E R
M                  --   30:00:00    --     --    0   0 --   E R
L                  --   168:00:0    --     --    0   0 --   E R
mc-reco            --      --       --     --    0   0  9   E R
mc-mixr            --      --       --     --    0   0  3   E R
The queues "mc-xxx" are used for MC production and should not be used for user jobs. The queues "express" and "bulk" are routing queues: jobs sent to them are forwarded to one of the execution queues "E" (express), "S" (short), "M" (medium) or "L" (long), according to the job requirements given in the qsub command (see below). The queue "express" routes jobs to "E", while "bulk" routes them to one of the others. It is always better to submit jobs to "bulk", which by default routes them to "M".
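The routing rule can be illustrated with a small /bin/sh sketch that guesses the destination of a job sent to "bulk" from its -l cput request, using the CPU-time limits in the qstat -q listing above (S up to 08:00:00, M up to 30:00:00, L up to 168 hours, shown truncated as 168:00:0). It assumes that "bulk" tries S, M and L in that order and takes the first queue whose limit covers the request; the real order and limits are set in the server configuration, so treat this only as an aid to choosing a cput value. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: guess which execution queue the routing queue "bulk"
# would pick for a given "-l cput=hh:mm:ss" request.
# ASSUMPTION: "bulk" tries S, M, L in that order and picks the first queue
# whose CPU-time limit (as listed by "qstat -q") covers the request.

# Convert an hh:mm:ss string to seconds.
# awk reads "08" as decimal 8, avoiding the octal pitfall of $(( 08 )).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the queue name for a cput request; no request means the default "M".
bulk_destination() {
    if [ -z "$1" ]; then
        echo M
        return 0
    fi
    secs=$(to_seconds "$1")
    if [ "$secs" -le $(( 8 * 3600 )) ]; then
        echo S
    elif [ "$secs" -le $(( 30 * 3600 )) ]; then
        echo M
    elif [ "$secs" -le $(( 168 * 3600 )) ]; then
        echo L
    else
        echo "no queue accepts cput=$1" >&2
        return 1
    fi
}
```

For example, bulk_destination 08:00:00 prints S, while bulk_destination 100:00:00 (the value used in the #PBS example further down) prints L.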

Jobs are submitted with the command qsub. There are two ways to do this: write a script file containing all the needed commands and submit the whole file to the batch queue, or open an interactive session on the batch machine that runs the selected queue and type each command interactively. A simple example of a script file that builds the analysis-11 library as a batch job is the following:

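#!/bin/csh

# Move to the working directory of the analysis-11 release
echo 'Start execution of the script, cd to user directory'
cd /home/salvator/physics/test/anal11
pwd
# Print the environment inherited by the job (see the -V flag of qsub below)
echo 'Environment variables set for this job:'
env
# Build the libraries and list what was produced
echo 'Make the libraries'
gmake lib ROPT=-noDebug
ls -la lib/Linux2
echo 'end job'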
Assuming the script is saved as test.batch, submit it by typing:
qsub -V -m e -M < user >@ppu1 -q bulk -l cput=08:00:00 -j oe -N test.lib -o lib.log test.batch 
Here is a brief explanation of the flags used in the qsub command:
 -V  exports to the batch job all environment variables in the qsub command's 
     environment;
 -m e  the execution server will send a mail message when the job is finished;
 -M user1@host,user2@host...  specifies the list of users to which the mail is 
     sent. If unset, the mail is sent only to the job owner on the submitting 
     node (bfa or bfb);
 -q < queue@server >  selects the queue to which the job is sent. If @server
     is not specified, one of the servers running the selected queue is
     automatically chosen;
 -l < cput=hh:mm:ss >  specifies the cpu time required for the job. The job is
     then routed to the queue that meets the cput requirements; if this flag
     is not specified in the qsub command, the default queue is "M";
 -j oe  merges the standard output and standard error files in the std output
     file (you can merge the outputs in the std error file by doing  -j eo );
 -N < name >  declares a name for the job (job_name);
 -o < path >  defines the path to be used for the std output stream of
     the batch job. If this option is not specified, the std output is
     written to a file named job_name.osequence_number (and the std error,
     unless merged with -j, to job_name.esequence_number) in the directory
     from which the job has been submitted.
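As a sketch of the naming rule, the /bin/sh function below prints the file name(s) to look for in the submission directory when -o is not given, from the job name (-N) and the job identifier that qsub prints on submission. It assumes the usual PBS convention that the sequence number is the part of the identifier before the first dot; the function name and the identifier 1234.bfa.pp.rhul.ac.uk used below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the default output file name(s) PBS uses when
# "-o" is not given. Arguments: job name (-N), full job id printed by qsub,
# and optionally "oe" if the job was submitted with "-j oe".
# ASSUMPTION: the sequence number is the part of the job id before the first dot.
default_output_files() {
    name=$1
    seq=${2%%.*}              # "1234.bfa.pp.rhul.ac.uk" -> "1234"
    echo "$name.o$seq"        # standard output file
    if [ "$3" != "oe" ]; then
        echo "$name.e$seq"    # standard error file, unless merged into stdout
    fi
}
```

For instance, default_output_files test.lib 1234.bfa.pp.rhul.ac.uk oe prints test.lib.o1234.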
The log file for this example should look like this.
The user's directives for a job can also be specified directly in the script file, using lines that begin with #PBS. For example:
#!/bin/csh
####################################################
#         USER'S DIRECTIVES FOR THIS JOB           
#                                                  
#PBS -N test.lib                                   
#PBS -V                                            
#PBS -m e                                          
#PBS -M < user >@ppu1                          
#PBS -j oe                                         
#PBS -q bulk                                       
#PBS -l cput=100:00:00                               
#                                                  
####################################################
#
:
:
In this case it is enough to type:
qsub -o lib.log test.batch 
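qsub reads #PBS directives only up to the first executable line of the script, and an option given on the qsub command line (such as -o lib.log here) takes precedence over the same option in a directive. The /bin/sh sketch below lists the directives a script would contribute under that rule, which can be handy for checking a script before submitting it; the function name list_pbs_directives is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: print the "#PBS" directives that appear before the
# first executable line of a job script (later ones are ignored by qsub).
# Blank lines and "#" comment lines (including "#!/bin/csh") do not stop the scan.
list_pbs_directives() {
    awk '
        /^#PBS/ { print; next }   # directive: report it
        NF == 0 { next }          # blank line: keep scanning
        /^#/    { next }          # other comment line: keep scanning
                { exit }          # first executable line: stop reading
    ' "$1"
}
```

Applied to the example script above, list_pbs_directives test.batch should print its seven #PBS lines, since the first executable line comes after them.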
To run the same job as an interactive session instead, type:
qsub -V -m e -M < user >@ppu1 -q bulk -l cput=08:00:00 -j oe -N test.lib -o lib.log -I   
A new session is then opened on the batch machine running the selected queue, and the desired commands can be typed interactively.

The command qstat is used to check the status of a batch job; typing:

qstat -a 
the status of all jobs is displayed.
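When many jobs are queued, the qstat -a listing can be summarised by state. The /bin/sh sketch below assumes the usual qstat -a layout, in which every job line starts with the numeric job identifier and the last two columns are the state letter (for example Q for queued, R for running) and the elapsed time; the layout can differ between PBS versions, and the function name count_job_states is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: count jobs per state from "qstat -a" output on stdin.
# ASSUMPTION: job lines begin with the numeric job id, and the state letter
# is the second-to-last column (followed by the elapsed time).
count_job_states() {
    awk '
        /^[0-9]/ { n[$(NF - 1)]++ }            # job line: tally its state
        END      { for (s in n) print s, n[s] }
    ' | sort
}
```

Typical use: qstat -a | count_job_states.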

The command qcat is used to check the stdout or the stderr file while the batch job is running (it's the equivalent of bpeek on SUN). By typing

qcat -o < job_id > 
the stdout file is displayed, while qcat -e < job_id > displays the stderr file (unless the two have been merged into a single file with the -j flag of qsub). Adding the flag -f as well (qcat -f -o < job_id >) makes the command wait and display lines as they are appended to the stdout (or stderr) file.



This page is maintained by Fabrizio Salvatore (salvator@smtp.pp.rhul.ac.uk).

Last Update: 17/10/2001 15:30 GMT