blast

Versions and Availability

h4

h5
Module Names for blast on qb3
Machine Version Module Name
qb3 2.10.1 blast/2.10.1/gcc-9.3.0

▶ Module FAQ?

The information here is applicable to LSU HPC and LONI systems.

h4

Shells

A user may choose between using /bin/bash and /bin/tcsh. Details about each shell follows.

/bin/bash

System resource file: /etc/profile

When one access the shell, the following user files are read in if they exist (in order):

  1. ~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
  2. ~/.bashrc (interactive login only)
  3. ~/.profile

When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.

The default value of the environmental variable, PATH, is set automatically using Modules. See below for more information.

/bin/tcsh

The file ~/.cshrc is used to customize the user's environment if his login shell is /bin/tcsh.

Modules

Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.

Default Setup

When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.

Viewing Available Modules

The command

$ module avail

displays a list of all the modules available. The list will look something like:

--- some stuff deleted ---
velvet/1.2.10/INTEL-14.0.2
vmatch/2.2.2

---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
EasyBuild/1.11.1       GCC/4.9.0              INTEL-140-MPICH/3.1.1
EasyBuild/1.13.0       INTEL/14.0.2           INTEL-140-MVAPICH2/2.0
--- some stuff deleted ---

The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).

Managing Modules

Besides avail, there are other basic module commands to use for manipulating the environment. These include:

add/load mod1 mod2 ... modn . . . Add modules
rm/unload mod1 mod2 ... modn  . . Remove modules
switch/swap mod . . . . . . . . . Switch or swap one module for another
display/show  . . . . . . . . . . List modules loaded in the environment
avail . . . . . . . . . . . . . . List available module names
whatis mod1 mod2 ... modn . . . . Describe listed modules

The -h option to module will list all available commands.

About the Software

Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. - Homepage: http://blast.ncbi.nlm.nih.gov/

Usage

A suite of tools are provided in the BLAST+, such as blastx, blastn, blastp. The command line below is provided by NCBI, which is a blastn search against a database. This command:

    blastn -db nt -qurey nt.fsa -out results.out
    

will run a blastn search of nt.fsa (a nucleotide sequence in FASTA format) against the nt database, printing results to the file results.out.

BLAST+ uses an environment variable $BLASTDB to point to the directory where the database is located in. So this environment variable should be specified before running blast commands, see example below:

Example: run blastx search in PBS batch job

This example shows a blastx search of trinity.fasta against the nr database, printing results to the file blastx.out. The nr database files are in the directory /work/ychen64/nr.

#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
export BLASTDB=/work/ychen64/nr
blastx -query trinity.fasta -db nr -out blastx.out -num_threads 20
    

The option -num_threads is used to specify the number of threads (CPUs) in the BLAST search. This number should match the number in "ppn=" in the "#PBS -l nodes=1:ppn=" to fully utilize the CPU power.

Example: create local BLAST database in PBS batch job

The local BLAST database can be created by a perl script called "update_blastdb.pl" included as part of BLAST+. perl is required to run this script. In this example, a nr database is created at /work/ychen64

#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
cd /work/ychen64
# Specify the database name here:
export DATABASE=nr

mkdir $DATABASE
cd $DATABASE
update_blastdb.pl --decompress --verbose $DATABASE
    

It will take a while to create a large local BLAST database, so update_blastdb.pl should be run with the interactive or batch job.

Once the local BLAST database is created, it needs to be updated in a timely manner (every couple of days/weeks months based on your database). Just run the script above again to update the database. The Documentation for the update_blastdb.pl script is available by running the script without any arguments.

Note:

  • Please don't search against the database on the NCBI BLAST server (i.e. using "-remote" option), especially for the large case. Search against local BLAST database only.
  • Please only use one compute node to run BLAST job. The parallel search technique provided by BLAST+ is based on the thread parallelism , so it cannot be used on the multiple nodes.
  • On the cluster with SLURM job scheduler (rather than PBS), use #SBATCH -c to specify the number of threads per process in the SLURM directives. Skip #SBATCH -n in the SLURM directives.

Resources

  • The BLAST Home Page provides links to protein and genomic data sets, as well as information on specific tools.

Last modified: September 10 2020 17:18:38.