Bioinformatic pipeline diagram
Open a Linux terminal, then download the latest pipeline archive (e.g. virmut.2.1.tar.gz) and unpack it:
wget http://virmut.eimb.ru/virmut.2.1.tar.gz
tar -xvzf virmut.2.1.tar.gz
cd virmut.2.1/install
VirMut 2.1 includes installation scripts for all prerequisites. The scripts have been tested on the following Linux distributions:
- Ubuntu 16.04 LTS 64-bit. Works under all Ubuntu 16.04 derivatives: Kubuntu 16.04, Lubuntu 16.04, Xubuntu 16.04, Linux Mint 18.2.
It should also work on Debian 9.2 and Astra Linux.
- CentOS 7.3 64-bit
- OpenSUSE Leap 42.3 64-bit. It should also work on OpenSUSE 42.xx and 15.
- Fedora 26 Server 64-bit. It should also work on Fedora 19-25 and RedHat 7 64-bit.
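Before running the installation scripts, it can help to check whether the current system matches one of the distributions listed above. The following is a minimal sketch, not part of VirMut: classify_distro is a hypothetical helper, and the ID and VERSION_ID values it matches are assumptions about what each distribution writes to /etc/os-release. CentOS typically reports only the major version there, so the sketch cannot tell 7.3 from other 7.x releases.

```shell
#!/bin/sh
# Sketch only: classify_distro is a hypothetical helper, not shipped with VirMut.
# It maps the ID and VERSION_ID fields of /etc/os-release to the support
# levels described in the list above. The matched ID strings are assumptions.

classify_distro() {
    # $1 = ID, $2 = VERSION_ID (either may be empty)
    case "$1:$2" in
        ubuntu:16.04|linuxmint:18.2)  echo "tested" ;;
        centos:7*)                    echo "tested" ;;
        opensuse*:42.3)               echo "tested" ;;
        fedora:26)                    echo "tested" ;;
        debian:9*|astra*:*)           echo "expected to work" ;;
        opensuse*:42.*|opensuse*:15*) echo "expected to work" ;;
        fedora:19|fedora:2[0-5])      echo "expected to work" ;;
        rhel:7*)                      echo "expected to work" ;;
        *)                            echo "untested" ;;
    esac
}

if [ -r /etc/os-release ]; then
    # Source the file in a subshell so its variables do not leak into this shell.
    status=$( . /etc/os-release; classify_distro "${ID:-}" "${VERSION_ID:-}" )
    echo "Distribution status: $status"
else
    echo "Distribution status: unknown (/etc/os-release not readable)"
fi
```

An "untested" result does not mean the pipeline cannot run; it only means the bundled installation scripts were not checked on that system.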
Installation and usage instructions are available in the full manual or in the install.txt and readme.txt files in the virmut.2.1 directory.
A full description of the statistical criteria, their implementation, and the pipeline itself can be found in the following article:
Kravatsky YV, Chechetkin VR, Fedoseeva DM, Gorbacheva MA, Kravatskaya GI, Kretova OV, Tchurikov NA.
A bioinformatic pipeline for monitoring of the mutational stability of viral drug targets with deep-sequencing technology.
Viruses 2017, 9(12), 357, DOI: 10.3390/v9120357, PMID: 29168754
This work is supported by RSCF grant no. 15-14-00005.
© 2017 Yuri Kravatsky, Creative Commons CC BY-NC-SA 3.0 license.
Software required for the pipeline
- cutadapt
- - Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
- Bowtie 2
- - Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
- SAMtools
- - Samtools is a suite of programs for interacting with high-throughput sequencing data.
- Seqtk Toolkit
- - Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format.
- FASTA local sequence alignment program
- - The FASTA programs find regions of local or global similarity between DNA sequences and provide information on the statistical significance of an alignment. FASTA can be used to infer functional and evolutionary relationships between sequences.
- MAFFT multiple sequence alignment program
- - MAFFT is a multiple sequence alignment program. It offers a range of multiple alignment methods, including L-INS-i (accurate; for alignment of <200 sequences) and FFT-NS-2 (fast; for alignment of <30,000 sequences).
- Gnuplot command-line graphing utility
- - Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms.
- Required Perl modules:
- - BioPerl modules Bio::SeqIO and Bio::SearchIO
- - Statistics::Descriptive
- - BioUtil::Seq
- - Getopt::Long
- - Config::Tiny
- - Sys::CpuAffinity
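After installation, the required software listed above can be checked with a short script. This is a minimal sketch, not part of VirMut: missing_commands and missing_perl_modules are hypothetical helpers, and the executable names (in particular fasta36 for the FASTA package) are assumptions that may differ on a given system.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not shipped with VirMut.
# Executable names below are assumptions; adjust them to the local install.

missing_commands() {
    # Print every argument that is not found on PATH, one per line.
    for mc_cmd in "$@"; do
        command -v "$mc_cmd" >/dev/null 2>&1 || echo "$mc_cmd"
    done
}

missing_perl_modules() {
    # Print every Perl module that fails to load, one per line.
    # If perl itself is absent, every module is reported as missing.
    for mp_mod in "$@"; do
        perl -M"$mp_mod" -e 1 >/dev/null 2>&1 || echo "$mp_mod"
    done
}

tools="cutadapt bowtie2 samtools seqtk fasta36 mafft gnuplot"
modules="Bio::SeqIO Bio::SearchIO Statistics::Descriptive BioUtil::Seq Getopt::Long Config::Tiny Sys::CpuAffinity"

# $tools and $modules are intentionally unquoted so they split into words.
echo "Missing tools: $(missing_commands $tools | tr '\n' ' ')"
echo "Missing Perl modules: $(missing_perl_modules $modules | tr '\n' ' ')"
```

Empty lists after both labels indicate that every checked prerequisite was found. The script only reports; it does not install anything.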
Optional software
- FastQC
- - FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines.
- Unipro UGENE toolkit
- - Unipro UGENE is a unified bioinformatics toolkit.
- WebLogo: A sequence logo generator
- - A sequence logo is a graphical representation of a nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.