[Figure: Bioinformatic pipeline diagram]

Bioinformatic pipeline

Open a Linux terminal, download the latest pipeline archive (e.g. virmut.2.1.tar.gz), and unpack it:

wget http://virmut.eimb.ru/virmut.2.1.tar.gz
tar -xvzf virmut.2.1.tar.gz
cd virmut.2.1/install
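The download step can also be scripted. The following is a minimal bash sketch and is not part of the VirMut distribution. It assumes that other releases follow the same virmut.<version>.tar.gz naming on the same server (check the site for the actual latest archive). It skips the download when the archive is already present and lists the archive contents before extracting, so a truncated download fails early. The function name fetch_virmut is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: fetch and unpack a VirMut release archive.
# Assumption: releases are published as virmut.<version>.tar.gz on the same
# server as virmut.2.1.tar.gz; fetch_virmut is a hypothetical helper name.
set -euo pipefail

fetch_virmut() {
    local version="${1:-2.1}"
    local archive="virmut.${version}.tar.gz"
    local url="http://virmut.eimb.ru/${archive}"

    # Download only if the archive is not already in the current directory.
    if [ ! -f "$archive" ]; then
        wget "$url"
    fi

    # Listing the contents first makes a corrupt or truncated archive fail
    # before anything is extracted.
    tar -tzf "$archive" > /dev/null
    tar -xzf "$archive"

    # Print the install directory so the caller can cd into it.
    echo "virmut.${version}/install"
}
```

For example, cd "$(fetch_virmut 2.1)" reproduces the three manual commands above.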

Installation and usage instructions are available in the full manual and in the install.txt and readme.txt files in the virmut.2.1 directory.

A full description of the statistical criteria, their implementation, and the pipeline itself is given in the following article:

Kravatsky YV, Chechetkin VR, Fedoseeva DM, Gorbacheva MA, Kravatskaya GI, Kretova OV, Tchurikov NA.
A bioinformatic pipeline for monitoring of the mutational stability of viral drug targets with deep-sequencing technology.
Viruses 2017, 9(12), 357.
DOI: 10.3390/v9120357, PMID: 29168754

This work was supported by RSCF grant no. 15-14-00005.

© 2017 Yuri Kravatsky, Creative Commons CC BY-NC-SA 3.0 license.

Software required for pipeline

cutadapt
- Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Bowtie 2
- Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
SAMtools
- Samtools is a suite of programs for interacting with high-throughput sequencing data.
Seqtk Toolkit
- Seqtk is a fast and lightweight tool for processing sequences in FASTA or FASTQ format.
FASTA local sequence alignment program
- The FASTA programs find regions of local or global similarity between DNA sequences. They provide information on the statistical significance of an alignment, and can be used to infer functional and evolutionary relationships between sequences.
MAFFT multiple sequence alignment program
- MAFFT is a multiple sequence alignment program. It offers a range of alignment methods, such as L-INS-i (accurate; for alignment of <200 sequences) and FFT-NS-2 (fast; for alignment of <30,000 sequences).
Gnuplot command-line graphing utility
- Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms.
Required Perl modules:
- BioPerl modules Bio::SeqIO and Bio::SearchIO
- Statistics::Descriptive
- BioUtil::Seq
- Getopt::Long
- Config::Tiny
- Sys::CpuAffinity
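Before installing, it can help to check which of these prerequisites are already present. The bash sketch below is illustrative and not part of the VirMut installer. It assumes the usual executable names for each package; in particular, fasta36 is assumed for the FASTA package, and the exact FASTA binaries the pipeline calls should be checked in install.txt. The function names missing_programs, missing_perl_modules and check_prerequisites are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: list required programs and Perl modules that are missing.
# Assumption: executable names (bowtie2, bowtie2-build, fasta36, ...) are the
# usual ones for each package; install.txt in the VirMut archive is authoritative.
set -euo pipefail

# Print each program name that is not found on PATH.
missing_programs() {
    local prog
    for prog in "$@"; do
        command -v "$prog" > /dev/null 2>&1 || echo "$prog"
    done
}

# Print each Perl module that cannot be loaded (all of them if perl is absent).
missing_perl_modules() {
    local mod
    for mod in "$@"; do
        perl -M"$mod" -e 1 > /dev/null 2>&1 || echo "$mod"
    done
}

check_prerequisites() {
    missing_programs cutadapt bowtie2 bowtie2-build samtools seqtk \
        fasta36 mafft gnuplot
    missing_perl_modules Bio::SeqIO Bio::SearchIO Statistics::Descriptive \
        BioUtil::Seq Getopt::Long Config::Tiny Sys::CpuAffinity
}
```

Running check_prerequisites prints nothing when everything is found; otherwise it prints one missing program or module per line.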

Optional software

FastQC
- FastQC aims to provide a simple way to run quality control checks on raw sequence data coming from high-throughput sequencing pipelines.
Unipro UGENE toolkit
- Unipro UGENE is a unified bioinformatics toolkit.
WebLogo: A sequence logo generator
- A sequence logo is a graphical representation of a nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than a consensus sequence does.
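To make the stack heights concrete: in the usual information-content convention, the stack at position i has height R_i = log2(4) − H_i bits, where H_i = −Σ f(b,i)·log2 f(b,i) is the Shannon entropy of the base frequencies f(b,i) at that position. Each symbol is drawn with height f(b,i)·R_i. The awk-in-bash sketch below computes these values from a plain alignment with one aligned sequence per line, all of equal length. It is illustrative only. It omits the small-sample correction e(n) from the original logo definition, counts only A, C, G and T (other characters are ignored), and the function name logo_columns is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: per-position stack height (bits) and per-base symbol heights
# for a nucleotide alignment read from stdin, one aligned sequence per line.
# Simplification: no small-sample correction; only A/C/G/T are counted.
set -euo pipefail

logo_columns() {
    awk '
    {
        seq = toupper($0)
        n = length(seq)
        if (n > width) width = n
        for (i = 1; i <= n; i++) {
            b = substr(seq, i, 1)
            if (b ~ /[ACGT]/) { count[i, b]++; total[i]++ }
        }
    }
    END {
        split("A C G T", bases, " ")
        for (i = 1; i <= width; i++) {
            h = 0                                   # Shannon entropy in bits
            for (k = 1; k <= 4; k++) {
                f = (total[i] > 0) ? count[i, bases[k]] / total[i] : 0
                freq[k] = f
                if (f > 0) h -= f * log(f) / log(2)
            }
            r = (total[i] > 0) ? 2 - h : 0          # log2(4) = 2 bits maximum
            # Output: position, stack height, then height of each base symbol.
            printf "%d\t%.3f", i, r
            for (k = 1; k <= 4; k++) printf "\t%s=%.3f", bases[k], freq[k] * r
            printf "\n"
        }
    }'
}
```

For example, printf 'AA\nAC\n' | logo_columns reports 2.000 bits (all of it for A) at the fully conserved first position and 1.000 bit, split evenly between A and C, at the second.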