Running Jobs on Improv
Quickstart
Below are the fundamental commands that most LCRC users will need for day-to-day work on Improv. Comprehensive guides are available in the other sections linked within our documentation.
Check your Current Allocation Balance(s):
sbank-list-allocations -p <project_name>
Check your Filesystem Quota(s):
lcrc-quota
Submit a Batch Job:
qsub -A <project> <your job script>
List All Jobs:
qstat
Delete a Job:
qdel <jobid>
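For example, a typical workflow might look like the following (a sketch; MyProject and myjob.sh are hypothetical names to be replaced with your own project and job script):
sbank-list-allocations -p MyProject
qsub -A MyProject myjob.sh
qstat -u $USER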
Job Scheduling System
Improv's job scheduling system is characterized by:
- Uses PBS Pro
- Uses the sbank accounting system
- Allocations are calculated in node hours
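For example, a job that runs on 4 nodes for 2 hours of walltime is charged 4 × 2 = 8 node hours against the project's allocation.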
Queues
Improv currently enforces the following limits on publicly available queues:
- 15 Running Jobs per user.
- 100 Queued Jobs per user.
- 3 Days (72 Hours) Maximum Walltime.
- 1 Hour Default Walltime if not specified.
- compute (Standard Compute Nodes) is the default queue.
Use the -q option with qsub to select a queue.
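For example, to submit a job to the bigmem queue:
qsub -A <project> -q bigmem <your job script>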
| Improv Queue Name | Description | Number of Nodes | CPU Type | Cores Per Node | Memory Per Node | Local Scratch Disk | Max Walltime |
|---|---|---|---|---|---|---|---|
| compute | Standard Compute Nodes | 805 | 2x AMD EPYC 7713 64-Core Processor | 128 | 256GB DDR4 | 960GB (6TB bigdata Nodes) | 72 Hours (3 Days) |
| bigmem | Large Memory Compute Nodes | 12 | 2x AMD EPYC 7713 64-Core Processor | 128 | 1TB DDR4 | 6TB | 72 Hours (3 Days) |
| debug | Reduced Walltime Compute Nodes | 8 | 2x AMD EPYC 7713 64-Core Processor | 128 | 256GB DDR4 | 960GB | 1 Hour |
The compute queue also has 68 nodes with a 6TB local NVMe scratch disk. You can request these directly by adding bigdata=true
to your PBS select statement. For example:
#PBS -l select=8:ncpus=128:mpiprocs=128:bigdata=true
Running MPI Applications
OpenMPI is the recommended MPI for use on Improv.
Here are some parameters that might be useful:
- -np <processes>: Specifies the number of processes to launch. It should match the number of processors or cores you've requested in your job script. If you plan to use all the cores on each node you are allocated, this is not necessary in a submission script; the #PBS -l select=<# of nodes>:ncpus=128:mpiprocs=128 script header takes care of this.
- --display-map: Displays a map showing where MPI processes are running.
- --report-bindings: Reports how MPI processes are bound to cores or sockets.
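For example, a minimal sketch of an mpirun invocation combining these options, assuming a 4-node allocation with 128 ranks per node (512 total) and an executable named hello_mpi:
mpirun -np 512 --display-map --report-bindings ./hello_mpi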
For the most accurate and detailed information, please refer to the OpenMPI documentation.
#!/bin/bash -l
# UG Section 2.5, page UG-24 Job Submission Options
# Note: Command line switches will override values in this file.
# ----------------- PBS Directives ----------------- #
# These options are mandatory in LCRC; qsub will fail without them.
# select: number of nodes. Adjust these values as per your job's requirement. In the below example, 4 nodes are requested.
# ncpus: number of cores per node. (use 128 unless you have a reason otherwise)
# mpiprocs: number of MPI processes (use the same value as ncpus unless you have a reason otherwise).
# walltime: a limit on the total time from the start to the completion of a job
#PBS -A <project_name>
#PBS -l select=4:ncpus=128:mpiprocs=128
#PBS -l walltime=HH:MM:SS
# Queue for the job submission
#PBS -q <queue>
# Job name (first 15 characters are displayed in qstat)
#PBS -N <job_name>
# Output options: 'oe' merges stderr into stdout, 'eo' merges stdout into stderr, 'n' keeps them separate.
#PBS -j n
# Email notifications: 'b' at job begin, 'e' at end, 'a' on abort; use 'n' for no emails.
#PBS -m be
#PBS -M <your_email_address>
# --------------- Script Execution Part --------------- #
# The following is an example setup for an MPI job.
echo "Working directory: $PBS_O_WORKDIR"
cd $PBS_O_WORKDIR
echo "Job ID: $PBS_JOBID"
echo "Running on host: $(hostname)"
echo "Running on nodes: $(cat $PBS_NODEFILE)"
# Loading GCC and MPI modules, and verifying the environment.
module load gcc openmpi
module list
which mpirun
# Run the MPI program
mpirun ./hello_mpi
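Assuming the script above is saved as a file such as submit.sh (a hypothetical name), it can be submitted and monitored with:
qsub submit.sh
qstat -u $USER
Because the project, queue, and resource requests are set via #PBS directives inside the script, they do not need to be repeated on the qsub command line (command-line switches will override the values in the file).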
Important things to note:
- PBS Pro on Improv is currently not configured to allow node sharing. When a node is allocated to you, you receive the ENTIRE node. Ensure that you use all of your allocated nodes' resources unless you have a reason not to.
- More information about running jobs on PBS can be found on our Running Jobs on PBS Clusters page.