gaussian

Versions and Availability

▶ Display Module Names for gaussian on all clusters.

Machine Version Module
qb3 g09-d01 gaussian/g09-d01
qb3 g16-b01 gaussian/g16-b01
qb3 g16-c01 gaussian/g16-c01
qb2 g09-d01(default) gaussian/g09-d01(default)
qb2 g16-a03 gaussian/g16-a03
qb2 g16-b01 gaussian/g16-b01
qb2 g16-c01 gaussian/g16-c01
smic g09-d01 gaussian/g09-d01
smic g16-c01 gaussian/g16-c01
supermike3 g09-d01 gaussian/g09-d01
supermike3 g16-c01 gaussian/g16-c01

▶ Module FAQ?

The information here is applicable to LSU HPC and LONI systems.

Shells

A user may choose between using /bin/bash and /bin/tcsh. Details about each shell follow.

/bin/bash

System resource file: /etc/profile

When one accesses the shell, the following user files are read in if they exist (in order):

  1. ~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
  2. ~/.bashrc (interactive login only)
  3. ~/.profile
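
As the note on ~/.bash_profile warns, unconditional output from a startup file will break non-interactive sessions such as rsync and scp. A minimal sketch of a guard, assuming any greeting or other output is wrapped this way (the echo is just a placeholder):

# In ~/.bash_profile: produce output only in interactive shells so that
# non-interactive logins (rsync, scp) keep working.
if [[ $- == *i* ]]; then
    echo "Welcome to the cluster"   # placeholder for any output-producing command
fi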

When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.

The default value of the environment variable PATH is set automatically using Modules. See below for more information.

/bin/tcsh

The file ~/.cshrc is used to customize the user's environment if their login shell is /bin/tcsh.

Modules

Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.

Default Setup

When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.
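
For example, a ~/.modules file might contain the following (a minimal sketch; the module names below appear elsewhere on this page and should be adjusted to the cluster you use):

# ~/.modules: module commands executed at login
module load gaussian/g16-c01
module load INTEL/14.0.2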

Viewing Available Modules

The command

$ module avail

displays a list of all the modules available. The list will look something like:

--- some stuff deleted ---
velvet/1.2.10/INTEL-14.0.2
vmatch/2.2.2

---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
EasyBuild/1.11.1       GCC/4.9.0              INTEL-140-MPICH/3.1.1
EasyBuild/1.13.0       INTEL/14.0.2           INTEL-140-MVAPICH2/2.0
--- some stuff deleted ---

The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).

Managing Modules

Besides avail, there are other basic module commands to use for manipulating the environment. These include:

add/load mod1 mod2 ... modn . . . Add modules
rm/unload mod1 mod2 ... modn  . . Remove modules
switch/swap mod . . . . . . . . . Switch or swap one module for another
display/show mod  . . . . . . . . Show the changes a module makes to the environment
list  . . . . . . . . . . . . . . List modules currently loaded
avail . . . . . . . . . . . . . . List available module names
whatis mod1 mod2 ... modn . . . . Describe listed modules

The -h option to module will list all available commands.
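
For instance, a typical session might look like the following (the module names are illustrative):

$ module load gaussian/g16-c01                     # add Gaussian to the environment
$ module whatis gaussian/g16-c01                   # describe the module
$ module swap gaussian/g16-c01 gaussian/g09-d01    # switch versions
$ module unload gaussian/g09-d01                   # remove it again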

▶ Didn't find the version you want to use?

If a software package you would like to use for your research is not available on a cluster, you can request that it be installed. Software requests are evaluated by the HPC staff on a case-by-case basis. Before sending in a software request, please go through the information below.

Types of request

Depending on how many users need to use the software, software requests are divided into three types, each of which corresponds to the location where the software is installed:

  • The user's home directory
    • Software packages installed here will be accessible only to the user.
    • It is suitable for software packages that will be used by a single user.
    • Python, Perl and R modules should be installed here.
  • /project
    • Software packages installed in /project can be accessed by a group of users.
    • It is suitable for software packages that
      • need to be shared by users from the same research group, or
      • are bigger than the quota on the home file system.
    • This type of request must be sent by the PI of the research group, who may be asked to apply for a storage allocation.
  • /usr/local/packages
    • Software packages installed under /usr/local/packages can be accessed by all users.
    • It is suitable for software packages that will be used by users from multiple research groups.
    • This type of request must be sent by the PI of a research group.

How to request

Please send an email to sys-help@loni.org with the following information:

  • Your user name
  • The name of the cluster where you want to use the requested software
  • The name, version and download link of the software
  • Specific installation instructions, if any (e.g. compiler flags, variants, flavor, etc.)
  • Why the software is needed
  • Where the software should be installed (home directory, /project, or /usr/local/packages), with a justification indicating how many users are expected to use it

Please note that, once the software is installed, testing and validation are the user's responsibility.

About the Software

Gaussian is an electronic structure modeling package that computes energies, molecular geometries, vibrational frequencies, and other molecular properties using a wide range of quantum chemistry methods.

Usage

Gaussian is run from the command line and does not provide a graphical interface, so interactive and batch usage are the same. The TCP Linda extension is required to run Gaussian in parallel across more than one node per job. Currently only LSU has such a license.

Please refer to the Common Problems FAQ below, or to the Gaussian User Manual, for the memory requirements of your Gaussian job.

An input file is used to specify the desired calculations. It may be as simple as:

%chk=water.chk

# HF/6-31G(d)

water energy              Title section

0   1
O  -0.464   0.177   0.0
H  -0.464   1.137   0.0
H   0.441  -0.143   0.0

Please refer to the program documentation for details.

Once an input file has been created, the next step is creating a PBS or SLURM job file.

▶ QSub FAQ?

Portable Batch System: qsub

qsub

All HPC@LSU clusters use the Portable Batch System (PBS) for production processing. Jobs are submitted to PBS using the qsub command. A PBS job file is basically a shell script which also contains directives for PBS.

Usage
$ qsub job_script

Where job_script is the name of the file containing the script.

PBS Directives

PBS directives take the form:

#PBS -X value

Where X is one of many single letter options, and value is the desired setting. All PBS directives must appear before any active shell statement.

Example Job Script
 #!/bin/bash
 #
 # Use "workq" as the job queue, and specify the allocation code.
 #
 #PBS -q workq
 #PBS -A your_allocation_code
 # 
 # Assuming you want to run 16 processes, and each node supports 4 processes, 
 # you need to ask for a total of 4 nodes. The number of processes per node 
 # will vary from machine to machine, so double-check that you have the right
 # values before submitting the job.
 #
 #PBS -l nodes=4:ppn=4
 # 
 # Set the maximum wall-clock time. In this case, 10 minutes.
 #
 #PBS -l walltime=00:10:00
 # 
 # Specify the name of a file which will receive all standard output,
 # and merge standard error with standard output.
 #
 #PBS -o /scratch/myName/parallel/output
 #PBS -j oe
 # 
 # Give the job a name so it can be easily tracked with qstat.
 #
 #PBS -N MyParJob
 #
 # That is it for PBS instructions. The rest of the file is a shell script.
 # 
 # PLEASE ADOPT THE EXECUTION SCHEME USED HERE IN YOUR OWN PBS SCRIPTS:
 #
 #   1. Copy the necessary files from your home directory to your scratch directory.
 #   2. Execute in your scratch directory.
 #   3. Copy any necessary files back to your home directory.

 # Let's mark the time things get started.

 date

 # Set some handy environment variables.

 export HOME_DIR=/home/$USER/parallel
 export WORK_DIR=/scratch/myName/parallel
 
 # Set a variable that will be used to tell MPI how many processes will be run.
 # This makes sure MPI gets the same information provided to PBS above.

 export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`

 # Copy the files, jump to WORK_DIR, and execute! The program is named "hydro".

 cp $HOME_DIR/hydro $WORK_DIR
 cd $WORK_DIR
 mpirun -machinefile $PBS_NODEFILE -np $NPROCS $WORK_DIR/hydro

 # Mark the time processing ends.

 date
 
 # And we're out'a here!

 exit 0

An example PBS batch job file for Gaussian follows:

 #!/bin/tcsh
 #PBS -A your_allocation
 # specify the allocation. Change it to your allocation
 #PBS -q checkpt
 # the queue to be used.
 #PBS -l nodes=1:ppn=4
 # Number of nodes and processors
 #PBS -l walltime=1:00:00
 # requested Wall-clock time.
 #PBS -o g09_output
 # name of the standard out file to be "g09_output".
 #PBS -j oe
 # standard error output merge to the standard output file.
 #PBS -N g09test
 # name of the job (that will appear on executing the qstat command).
 #
 # cd to the directory with your input file
 cd ~/g09test
 #
 # Change this line to reflect your input file and output file
 g09 water.inp

An example SLURM batch job file follows:

 #!/bin/bash
 #SBATCH -A loni_loniadmin1
 # specify the allocation. Change it to your allocation
 #SBATCH -p checkpt
 # the queue to be used.
 #SBATCH -N 1
 #SBATCH -n 48
 # Number of nodes and processors
 #SBATCH -t 2:00:00
 # requested Wall-clock time.
 #SBATCH -o slurm-%x-%j.out-%N
 # name of the standard output file; %x, %j and %N expand to the job name,
 # job ID and node name (e.g. "slurm-g16-100460.out-qbc185").
 #SBATCH -J g16_job
 # name of the job (that will appear on executing the squeue command).
 #
 # cd to the directory with your input file
 cd $SLURM_SUBMIT_DIR
 #
 # Change this line to reflect your input file and output file
 g16 myinput.com

Multi-node Job Submission

Below is an example PBS job script for running gaussian/g16-b01 on QB2

(Ref: https://github.com/ResearchComputing/Documentation/wiki/Gaussian#parallel-jobs)

#!/bin/bash
#PBS -N g16job
#PBS -l nodes=2:ppn=20
#PBS -l walltime=01:00:00
#PBS -A your_allocation

module load gaussian/g16-b01

for n in $(cat $PBS_NODEFILE | uniq);
do
    echo ${n}
done | paste -s -d, > nodes.$PBS_JOBID

# the next line prevents OpenMP parallelism from conflicting with Gaussian's internal parallelization
export OMP_NUM_THREADS=1

# increases the verbosity of Linda output messages
export GAUSS_LFLAGS="-v"

cd $PBS_O_WORKDIR
date
g16 -p=20 -w=$(cat nodes.$PBS_JOBID) myinput
date

Below is an example SLURM job script for running gaussian/g16-b01 on QB3

(Ref: https://curc.readthedocs.io/en/latest/software/gaussian.html)

#!/bin/bash
#SBATCH -p checkpt
#SBATCH -N 2
#SBATCH -n 48
#SBATCH -c 1
#SBATCH -t 2:00:00
#SBATCH -A loni_loniadmin1
#SBATCH -J g16
#SBATCH -o slurm-%x-%j.out-%N

module load gaussian/g16-b01

cd $SLURM_SUBMIT_DIR

for n in `scontrol show hostname | sort -u`; do
 echo ${n}
done | paste -s -d, > nodes.$SLURM_JOBID

# the next line prevents OpenMP parallelism from conflicting with Gaussian's internal parallelization
export OMP_NUM_THREADS=1

# increases the verbosity of Linda output messages
export GAUSS_LFLAGS="-v"

g16 -p=48 -w=$(cat nodes.$SLURM_JOBID)  myinput.com

#End-of-file (EOF)

Contents of myinput (or myinput.com):

#P b3lyp/6-31g* test stable=(opt,qconly)

Gaussian Test Job 135:
Fe=O perpendicular to ethene, in triplet state.

0 3
X
Fe X  RXFe
C1 X  RXC  Fe  90.
C2 X  RXC  Fe  90.  C1  180.
O  X  RXO  C1  90.  Fe 0.
H1 C1 RCH  C2 CCH   Fe  Angle1
H2 C1 RCH  C2 CCH   Fe -Angle1
H3 C2 RCH  C1 CCH   Fe  Angle2
H4 C2 RCH  C1 CCH   Fe -Angle2

RXFe  1.7118
RXC   0.7560
RXO   3.1306
RCH   1.1000
Angle1 110.54
Angle2 110.53
CCH   117.81

Below are examples of using g09.

TCP Linda is required to run Gaussian jobs on more than one node.

#!/bin/bash
#PBS -q checkpt
#PBS -l nodes=2:ppn=16
#PBS -l cput=00:20:00
#PBS -l walltime=00:20:00
#PBS -o output-file
#PBS -j oe
#PBS -V
#PBS -N jobtest

export WORK_DIR=$PBS_O_WORKDIR
cd $WORK_DIR

cat $PBS_NODEFILE | sort | uniq > /tmp/.nodes.$PBS_JOBID
export GAUSS_LFLAGS="-nodefile /tmp/.nodes.$PBS_JOBID"

g09 < input.inp > output.log

In the sample job above, the Link 0 directives in input.inp should be:

%UseSSH
%chk=/work/mmcken6/g09.chk
%mem=16mw
%nprocshared=16
! Note: %nprocshared must match ppn in the job submission script
%NprocLinda=2
! Note: %NprocLinda must match nodes in the job submission script
(rest of input)
...

Note that the "%UseSSH" directive is necessary for Linda jobs, which may fail otherwise. Alternatively, you can add the "-opt Tsnet.Node.lindarsharg: ssh" flag to the g09 command, which has the same effect for Linda jobs.

Resources

▶ Common Problems FAQ?

Gaussian Common Problems

There are a few common Gaussian problems that can be easily resolved. These issues usually stem from disk or memory space limitations.

Memory Requirements

%Mem=N sets the amount of dynamic memory used to N 8-byte words (the default unit); the value may also be followed by KB, MB, GB, KW, MW or GW (without intervening spaces) to specify units of kilobytes, megabytes, gigabytes, kilowords, megawords or gigawords. The default memory size is 256 MB.

All LONI clusters and the LSU HPC Tezpur cluster have only 4 GB of RAM per node. For jobs on these clusters, the value of N should not be greater than 3500MB or 450MW.

LSU HPC clusters such as Philip, Pandora and SuperMike II have 24/48/96 GB, 128 GB and 32 GB of RAM per node, respectively. The maximum value of N should be 120GB or 15GW on Pandora, 28GB or 3500MW on SuperMike II, and 20/40/90GB on Philip (depending on the queue).

If you use a value of N greater than these limits, your job will use virtual memory, which not only slows the job down but also causes excessive memory swapping that can bring down the node. If your jobs repeatedly use more memory than is available on the node and/or bring down compute nodes, your privileges to use the cluster will be suspended.

LSU HPC users have access to TCP Linda to run Gaussian jobs on multiple nodes. Note that %Mem=N sets the amount of dynamic memory per node, not the total memory for the job, so the maximum value of N is the same as described above.
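
As an illustration, the Link 0 section for a job on one of the 4 GB nodes might begin like this (the checkpoint file name is hypothetical):

%Mem=3500MB
%chk=/work/username/myjob.chk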

You can estimate the amount of memory in 8-byte words that your job will require using the formula

N = M + 2(NB)^2

where NB is the number of basis functions used in the calculation, and M is a minimum value that is usually generously covered by the default memory size.
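
For example, a calculation with NB = 1000 basis functions needs at least 2 x 1000^2 = 2,000,000 words beyond M, i.e. about 16 MB; with NB = 5000 the estimate grows to 50,000,000 words, or roughly 400 MB.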

Please refer to the Gaussian manual sections on Link 0 Commands and Efficiency Considerations for more details.

Scaling

First one needs to understand the basic run-time needs of Gaussian calculations. The table below gives the formal scaling behavior of Gaussian methods, where N is the number of basis functions. Use this table to determine how much more work will be required, compared to the current selection, if N is increased (e.g. if the behavior is N^4, doubling N results in 16 times more work).

Scaling Behavior   Method(s)
N^4                HF
N^5                MP2
N^6                MP3, CISD, CCSD, QCISD
N^7                MP4, CCSD(T), QCISD(T)
N^8                MP5, CISDT, CCSDT
N^9                MP6
N^10               MP7, CISDTQ, CCSDTQ

Large files and memory usage

Computational cost and demand increase quickly when trying to obtain accuracies better than the MP2 level. On the other hand, one can supply a large molecule at a lower level of theory and still run into the same disk/memory errors.

If one has a large model and needs a good electron correlation method, starting the calculation from an initial guess wave function will likely cause it to fail instantly. A typical route to achieving such accuracies with a large model begins with a good initial guess of the wave function at a lower level of theory. One takes the orbital coefficients from the lower-level calculation, projects them onto a larger basis set, and uses the result as the initial guess for the higher level of theory. Every chemical model is different; care and caution need to be taken at each step, perhaps even repeating the calculation with a different set of inputs to check that it converges properly.

For instance, suppose one would like to run a large model at the MP2/6-311G** level of theory:

  1. Optimize the wave function at the HF/3-21G level
  2. Re-optimize at the MP2/6-31G* level
  3. Re-optimize at the MP2/6-311G** level

When restarting the calculation, the following Guess and SCF options are important:

Guess=Read
Reads the initial guess from the checkpoint file. If the basis set specified is different from the basis set used in the job which generated the checkpoint file, then the wave function will be projected from one basis to the other. This is an efficient way to switch from one basis to another.
Geom=AllCheckpoint
Reads the molecular geometry, charge, multiplicity and title from the checkpoint file. This is often used to start a second calculation at a different level of theory.
SCF=Restart
Restarts an SCF calculation from the information saved in the checkpoint file.
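
A minimal sketch of step 2 above, reading the HF/3-21G result from its checkpoint file (the file name is hypothetical; with Geom=AllCheckpoint the title, charge/multiplicity and geometry are taken from the checkpoint file, so those input sections are omitted):

%chk=bigmodel.chk
# MP2/6-31G* Opt Guess=Read Geom=AllCheckpoint
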
Break up restart files

Sometimes when writing a large restart file, Gaussian will crash, complaining that shared memory is too small or that there is not enough memory. This is caused by reading/writing too much information at one time. One can break up how Gaussian writes its read-write restart file (*.rwf) with:

%rwf=/work/username/tmp1,2GB,/work/username/tmp2,2GB,/work/username/tmp3,2GB

If no size is given for the last file, the rest of the .rwf is written to that file.
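
For example, to let the third file absorb whatever remains (the paths are hypothetical):

%rwf=/work/username/tmp1,2GB,/work/username/tmp2,2GB,/work/username/tmp3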

One problem, two solutions

If one obtains two different solutions to the same problem (either the same calculation on two different machines, or the same calculation run at different times on the same machine), one is likely using an incorrect restart file. Check your output, in particular the NOrb value: a different number of orbitals will likely produce a different energy result.

Refer to the Gaussian manual for more information on memory and disk space usage.

Warnings not to be ignored
Warning!!: The largest alpha MO coefficient is

This warning is usually associated with post-HF calculations (MP2 or CC). Although this is not an error and will not cause your job to crash, it is an important warning: it concerns the accuracy of your calculation. It occurs when there are near-linear dependencies in the basis set; for instance, diffuse functions on two close atoms are likely to be nearly linearly dependent. When transforming to molecular orbitals, the atomic orbital integrals are multiplied by all the molecular orbital coefficients, and the accuracy of the molecular orbitals decreases when one or more of these coefficients are very large.
