Virus Mutational Stability Pipeline

Bioinformatic pipeline diagram

Download:

Open a Linux terminal, then download the latest pipeline archive (e.g. virmut.2.1.tar.gz) and unpack it:

wget http://virmut.eimb.ru/virmut.2.1.tar.gz

tar -xvzf virmut.2.1.tar.gz

cd virmut.2.1/install

Supported operating systems:

Ubuntu 16.04 LTS 64-bit. Works under all Ubuntu 16.04 derivatives: Kubuntu 16.04, Lubuntu 16.04, Xubuntu 16.04, Linux Mint 18.2. It should also work on Debian 9.2 and Astra Linux.
CentOS 7.3 64-bit.
OpenSUSE Leap 42.3 64-bit. It should also work on OpenSUSE 42.xx and 15.
Fedora 26 Server 64-bit. It should also work on Fedora 19-25 and RedHat 7 64-bit.
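
To see whether the current machine matches one of the distributions listed above, one option is to read the standard /etc/os-release file. The sketch below is only an illustration and is not part of the pipeline: the function name classify_distro is hypothetical, and the ID and VERSION_ID strings it matches are assumptions about what each distribution typically reports (for example, CentOS reports only the major version, so 7.3 cannot be told apart from other 7.x releases).

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the pipeline.
# Classifies a distribution by the ID and VERSION_ID fields of /etc/os-release.
# The patterns below are assumptions about typical os-release values.
classify_distro() {
    id=$1
    version=$2
    case "$id:$version" in
        # Systems the list names as working
        ubuntu:16.04*|linuxmint:18.2|centos:7*|opensuse*:42.3|fedora:26)
            echo "listed" ;;
        # Systems the list says it "should work" on
        debian:9*|astra:*|opensuse*:42.*|opensuse*:15*|rhel:7*|fedora:19|fedora:2[0-5])
            echo "expected to work" ;;
        *)
            echo "not listed" ;;
    esac
}

# Usage: classify the running system, if /etc/os-release is available.
# A subshell keeps the sourced variables out of the calling environment.
if [ -r /etc/os-release ]; then
    (
        . /etc/os-release
        echo "${ID:-unknown} ${VERSION_ID:-}: $(classify_distro "${ID:-unknown}" "${VERSION_ID:-}")"
    )
fi
```

A result of "not listed" does not mean the pipeline cannot run; it only means the system is not among those named above.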

The full description of the statistical criteria, their implementation, and the pipeline itself can be found in the following article.

This work is supported by RSCF grant no. 15-14-00005.

Software required for the pipeline

cutadapt: Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads.
Bowtie 2: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
SAMtools: Samtools is a suite of programs for interacting with high-throughput sequencing data.
Seqtk Toolkit: Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format.
FASTA local sequence alignment program: The FASTA programs find regions of local or global similarity between DNA sequences and report the statistical significance of each alignment. FASTA can be used to infer functional and evolutionary relationships between sequences.
MAFFT multiple sequence alignment program: MAFFT offers a range of multiple alignment methods, such as L-INS-i (accurate; for alignment of <200 sequences) and FFT-NS-2 (fast; for alignment of <30,000 sequences).
Gnuplot command-line graphing utility: Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms.
Required Perl modules: BioPerl modules Bio::SeqIO and Bio::SearchIO; Statistics::Descriptive; BioUtil::Seq; Getopt::Long; Config::Tiny; Sys::CpuAffinity.
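
Before running the pipeline it can be useful to confirm that the required software above is actually installed. The following is a minimal sketch, not part of the pipeline itself. The function names are hypothetical, and the executable names (in particular fasta36 for the FASTA package) are assumptions that may differ on a given installation.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not shipped with the pipeline.

# Prints each name from the argument list that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Prints each Perl module from the argument list that fails to load.
# If perl itself is absent, every module is reported as missing.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Executable names are assumptions; adjust them to the local installation.
missing_commands cutadapt bowtie2 samtools seqtk fasta36 mafft gnuplot perl

# Module names are taken from the list of required Perl modules.
missing_perl_modules Bio::SeqIO Bio::SearchIO Statistics::Descriptive \
    BioUtil::Seq Getopt::Long Config::Tiny Sys::CpuAffinity
```

Empty output means that every command and module was found; otherwise each missing item is printed on its own line.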

Optional software

FastQC: FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high-throughput sequencing pipelines.
Unipro UGENE toolkit: Unipro UGENE is a unified bioinformatics toolkit.
WebLogo, a sequence logo generator: A sequence logo is a graphical representation of a nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.