gnuparallel

About

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.

Versions and Availability

Softenv Keys for gnuparallel on supermike2

Machine      Version    Softenv Key
supermike2   20161022   +gnuparallel-20161022-gcc-4.4.6

The information here is applicable to LSU HPC and LONI systems.

Softenv

SoftEnv is a utility that helps users manage complex user environments with potentially conflicting application versions and libraries.

System Default Path

When a user logs in, the system /etc/profile or /etc/csh.cshrc (depending on login shell, and mirrored from csm:/cfmroot/etc/profile) calls /usr/local/packages/softenv-1.6.2/bin/use.softenv.sh to set up the default path via the SoftEnv database.

SoftEnv looks for a user's ~/.soft file and updates the variables and paths accordingly.

Viewing Available Packages

Using the softenv command, a user may view the list of available packages. Currently, there is no guarantee that the packages shown are actually available or working on a particular machine. Every attempt is made to present an identical environment on all of the LONI clusters, but sometimes this is not the case.

Example:

$ softenv
These are the macros available:
*   @default
These are the keywords explicitly available:
+amber-8                       Applications: 'Amber', version: 8 Amber is a
+apache-ant-1.6.5              Ant, Java based XML make system version: 1.6.
+charm-5.9                     Applications: 'Charm++', version: 5.9 Charm++
+default                       this is the default environment...nukes /etc/
+essl-4.2                      Libraries: 'ESSL', version: 4.2 ESSL is a sta
+gaussian-03                   Applications: 'Gaussian', version: 03 Gaussia
....

Listing of Available Packages

See Packages Available via SoftEnv on LSU HPC and LONI.

For a more accurate, up-to-date list, use the softenv command.

Caveats

Currently there are some caveats to using this tool.

  1. packages might be out of sync between what is listed and what is actually available
  2. the resoft and soft utilities are not available; to update the environment for now, log out and log back in after modifying the ~/.soft file.

Availability

softenv is available on all LSU HPC and LONI clusters to all users, both in interactive login sessions (i.e., just logging into the machine) and in the batch environment created by the PBS job scheduler on Linux clusters and by LoadLeveler on AIX clusters.

Packages Availability

This information can be viewed using the softenv command:

% softenv

Managing Environment with SoftEnv

The file ~/.soft in the user's home directory is where the different packages are managed. To add a package, add its +keyword to your .soft file. For instance, if one wants to add the Amber Molecular Dynamics package to their environment, the end of the .soft file should look like this:

+amber-8

@default

To update the environment after modifying this file, one simply uses the resoft command:

% resoft
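
The edit to ~/.soft described above can also be scripted. The following is a minimal sketch, not part of SoftEnv itself: add_soft_key is a hypothetical helper name, and placing the key ahead of @default simply mirrors the ordering shown in the example above. It is demonstrated on a scratch file so the real ~/.soft is not touched.

```shell
#!/bin/bash
# Hypothetical helper (not a SoftEnv command): add a key to a .soft file
# ahead of the @default line. Usage: add_soft_key KEY [FILE]
add_soft_key() {
    local key soft_file
    key="$1"
    soft_file="${2:-$HOME/.soft}"
    touch "$soft_file"
    # Do nothing if the key is already present on a line of its own.
    if grep -qxF -e "$key" "$soft_file"; then
        return 0
    fi
    if grep -qxF -e '@default' "$soft_file"; then
        # Print the key just before the first @default line.
        awk -v key="$key" '
            $0 == "@default" && !done { print key; done = 1 }
            { print }
        ' "$soft_file" > "$soft_file.tmp" && mv "$soft_file.tmp" "$soft_file"
    else
        # No @default line yet: append the key followed by @default.
        printf '%s\n@default\n' "$key" >> "$soft_file"
    fi
}

# Demonstration on a scratch file rather than the real ~/.soft.
demo_soft=$(mktemp)
printf '@default\n' > "$demo_soft"
add_soft_key +gnuparallel-20161022-gcc-4.4.6 "$demo_soft"
add_soft_key +gnuparallel-20161022-gcc-4.4.6 "$demo_soft"   # second call changes nothing
cat "$demo_soft"
```

Calling the helper without a second argument would edit ~/.soft directly; the change takes effect only after the environment is refreshed as described in this section.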

Usage

GNU parallel can be used to run many typical serial or MPI-based application jobs in parallel.

(1) Parallel serial jobs

Example of a blast job on Mike:

#!/bin/bash

#PBS -A hpc_smictest3
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00
#PBS -q workq

cd $PBS_O_WORKDIR
export JOBS_PER_NODE=16
export WDIR=$PBS_O_WORKDIR

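# --progress shows progress; --joblog writes the job log to "logfile"
# -j sets the number of jobs per node; --slf lists the nodes assigned to your job
# --workdir sets the directory in which each job runs
# Arguments: script to parallelize, input ({}), output name ({/.}), job list
parallel --progress \
         --joblog logfile \
         -j "$JOBS_PER_NODE" \
         --slf "$PBS_NODEFILE" \
         --workdir "$WDIR" \
         ./cmd_blast.sh {} {/.} :::: input.lst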

where: input.lst contains job input list:

/work/$USER/blast/data/input1.faa
/work/$USER/blast/data/input2.faa
....
/work/$USER/blast/data/input200.faa
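
A list like the one above does not have to be written by hand. The following sketch builds it with find, assuming all inputs sit in one directory and share the .faa extension. The scratch directory and file names are stand-ins created only for the demonstration.

```shell
#!/bin/bash
# Scratch layout standing in for the real data directory (assumption).
WORK=$(mktemp -d)
mkdir "$WORK/data"
touch "$WORK/data/input1.faa" "$WORK/data/input2.faa" "$WORK/data/notes.txt"

# One absolute path per line; only *.faa files are listed, sorted so the
# job order is reproducible.
find "$WORK/data" -maxdepth 1 -type f -name '*.faa' | sort > "$WORK/input.lst"

# GNU parallel starts one job per line of this file.
cat "$WORK/input.lst"
```

Because find prints fully expanded paths, a list generated this way contains no literal shell variables.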

where: cmd_blast.sh is the script for running a serial blast job

For example, a single serial job is run as: ./cmd_blast.sh input1.faa input1

#!/bin/bash

export WDIR=/xxx/xxx
cd $WDIR
blastp -query "$1" -db db/img_v400_PROT.00 -out "output/$2.out" -outfmt 7 -max_target_seqs 100 -num_threads 2
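
The job script passes {} and {/.} to cmd_blast.sh as its two arguments. As a rough illustration, the sketch below reproduces with plain shell parameter expansion what GNU parallel substitutes for these replacement strings for one input line. The path is a made-up example entry.

```shell
#!/bin/bash
# Emulate GNU parallel's {} and {/.} for a single input line.
line=/work/someuser/blast/data/input1.faa   # hypothetical list entry

full=$line          # {}   : the input line unchanged
base=${line##*/}    # {/}  : basename, directory part removed
stem=${base%.*}     # {/.} : basename with the extension removed

# The command GNU parallel would build for this line.
echo "./cmd_blast.sh $full $stem"
```

On a system where GNU parallel is installed, adding --dry-run to the parallel command prints the generated commands without running them, which is a convenient way to check the substitutions before submitting the job.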
	

(2) Parallel MPI jobs

Use "mpirun" to run an MPI Laplace program (lap_mpi):

#!/bin/bash

#PBS -A your_allocation_name
#PBS -l walltime=2:00:00
#PBS -l nodes=4:ppn=16
#PBS -q checkpt

export JOBS_PER_NODE=8
export NPROCS=2
export WDIR=$PBS_O_WORKDIR
cd $WDIR
parallel --progress \
         -j $JOBS_PER_NODE \
         --slf $PBS_NODEFILE \
         --workdir $WDIR \
         ./cmd_mpi.sh {} $NPROCS :::: input.lst
	

where: cmd_mpi.sh is the script to run one MPI job

#!/bin/bash

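export WDIR=$PBS_O_WORKDIR
# Expand shell variables such as $USER that appear literally in the list entry
FILE=$(eval echo "$1")
# Read the program arguments from the input file
param=$(cat "$FILE")
# $param is left unquoted so that it splits into separate arguments
mpirun -ppn "$2" "$WDIR/lap_mpi" $param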
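
The eval echo step in cmd_mpi.sh is presumably there because the entries in input.lst contain the literal text $USER, and GNU parallel quotes each argument, so the variable reaches the script unexpanded. The sketch below shows the effect on a single entry. The fallback user name is a placeholder used only if USER is unset.

```shell
#!/bin/bash
# Fallback only for environments where USER is not set (placeholder value).
USER=${USER:-someuser}

# A list entry as stored in input.lst: $USER is literal text here.
entry='/work/$USER/laplace/data/input1'

# eval makes the shell parse the string again, which expands $USER.
expanded=$(eval echo "$entry")

echo "$entry"
echo "$expanded"
```

Since eval executes whatever shell code the string contains, this approach is only appropriate for job lists you wrote yourself or otherwise trust.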

where: input.lst contains job input list:

/work/$USER/laplace/data/input1
/work/$USER/laplace/data/input2
....
/work/$USER/laplace/data/input200

cat input1:
4096 4096 2 2 0.08 20000 0 0
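
In the MPI job script, JOBS_PER_NODE=8 and NPROCS=2 multiply out to the 16 cores per node requested with ppn=16. The sketch below shows that arithmetic, assuming the aim is one MPI process per core. CORES_PER_NODE, NODES, and TOTAL_SLOTS are illustrative names that do not appear in the scripts above.

```shell
#!/bin/bash
# Size the number of concurrent jobs so that MPI processes match the cores.
CORES_PER_NODE=16   # matches ppn=16 in the PBS request
NODES=4             # matches nodes=4 in the PBS request
NPROCS=2            # MPI processes per job

# Jobs that fit on one node without oversubscribing its cores.
JOBS_PER_NODE=$(( CORES_PER_NODE / NPROCS ))

# Jobs that can run at the same time across the whole allocation.
TOTAL_SLOTS=$(( JOBS_PER_NODE * NODES ))

echo "JOBS_PER_NODE=$JOBS_PER_NODE TOTAL_SLOTS=$TOTAL_SLOTS"
```

If NPROCS is changed, recomputing JOBS_PER_NODE in this way keeps the per-node process count at the number of cores requested.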

Resources

Last modified: January 30 2017 12:37:10.