msaexplorer
# What is MSAexplorer?

MSAexplorer allows the analysis and straightforward plotting of multiple sequence alignments. Its focus is to act as a simple Python 3 package or Shiny app with minimal dependencies and syntax. It is easy to set up and highly customizable.

# Usage as a shiny application

The current version of the app is deployed to [GitHub Pages](https://jonas-fuchs.github.io/MSAexplorer/shiny/). The application is serverless: all computation runs in your browser, so there is nothing to install. Just enjoy the app!

Alternatively, you can deploy it yourself and host it however you like:

```bash
git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
pip install -r requirements.txt # installs all dependencies
shiny run shinyapp/app.py
```

Now just follow the link provided in your terminal.

# Usage as a python3 package

## Installation

At the moment, a few manual steps are needed; in the future, you will be able to install MSAexplorer via `pip install msaexplorer`.

```bash
git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
pip install .
```

Now you can use MSAexplorer like any other package installed via `pip`.

## Analysis

The `explore` module lets you load an alignment file and analyze it.

```python
'''
a small example on how to use the MSAexplorer package
'''

from msaexplorer import explore

# load MSA
msa = explore.MSA('example_alignments/DNA.fasta')
annotation = explore.Annotation(msa, 'example_alignments/DNA_RNA.gff3')

# you can set the zoom range and the reference id if needed
msa.zoom = (0, 1500)
msa.reference_id = 'your_ref_id'  # replace with the id of a sequence in your alignment

# call methods to compute statistics on the MSA
identity_matrix = msa.calc_pairwise_identity_matrix()
```
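To give an idea of what a pairwise identity matrix contains, here is a minimal pure-Python sketch. It is not MSAexplorer's implementation, and the gap handling is an assumption: columns where both sequences have a gap are ignored, while a gap opposite a character counts as a mismatch. `pairwise_identity_matrix` is a hypothetical helper.

```python
def pairwise_identity_matrix(alignment: dict[str, str], gap: str = "-") -> dict[tuple[str, str], float]:
    """Percent identity for every ordered pair of aligned sequences.

    Assumption for this sketch: columns in which both sequences have a gap are
    skipped; a gap opposite a character counts as a mismatch.
    """
    ids = list(alignment)
    matrix: dict[tuple[str, str], float] = {}
    for a in ids:
        for b in ids:
            # keep only columns where at least one of the two sequences has a character
            compared = [
                (x, y)
                for x, y in zip(alignment[a], alignment[b])
                if not (x == gap and y == gap)
            ]
            matches = sum(x == y for x, y in compared)
            matrix[(a, b)] = 100 * matches / len(compared) if compared else 0.0
    return matrix


if __name__ == "__main__":
    example = {"Seq1": "ATCG-A", "Seq2": "ATCGTA", "Seq3": "TTCG-A"}
    for pair, identity in pairwise_identity_matrix(example).items():
        print(pair, round(identity, 1))
```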

Importantly, multiple sequence alignments must be in FASTA format, with all sequences having the same length (gaps included):

```
>Seq1
ATCGATCGATCGATCG
>Seq2
ATCGATCGATCGATCG
>Seq3
ATCGATCGATCGATCG
```

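Before loading a file, it can help to check that it really is an alignment: every record needs a `>` header and all sequences must have the same length. The following is a minimal pure-Python sketch; `read_alignment` is a hypothetical helper, not part of MSAexplorer, and taking the first word of the header as the sequence identifier is an assumption.

```python
def read_alignment(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence} and check that it forms an alignment."""
    alignment: dict[str, str] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            current = parts[0]  # identifier = first word of the header (assumption)
            if current in alignment:
                raise ValueError(f"duplicate sequence id: {current}")
            alignment[current] = ""
        elif current is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            alignment[current] += line  # sequences may span several lines
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return alignment
```

A file with sequences of different lengths raises a `ValueError` instead of silently producing a ragged "alignment".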
Additionally, you can read in an annotation in `bed`, `gff3` or `gb` format and connect it to the MSA. Importantly,
the sequence identifier in the annotation has to match a sequence in the alignment. All genomic locations are then
automatically mapped to alignment coordinates.

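The idea behind adapting genomic locations can be pictured as follows: a position in the ungapped sequence moves to the right by the number of gaps that precede it in the aligned sequence. This is a minimal sketch of that idea, not MSAexplorer's implementation; the helper names are hypothetical. It uses BED-style 0-based, half-open intervals; GFF3 and GenBank use 1-based, closed coordinates, so those would have to be converted first.

```python
def ungapped_to_alignment(aligned_seq: str, gap: str = "-") -> list[int]:
    """Map each 0-based position of the ungapped sequence to its alignment column."""
    return [column for column, char in enumerate(aligned_seq) if char != gap]


def feature_to_alignment(aligned_seq: str, start: int, end: int) -> tuple[int, int]:
    """Translate a half-open [start, end) feature on the ungapped sequence into alignment columns."""
    mapping = ungapped_to_alignment(aligned_seq)
    # map the first and the last base of the feature; +1 keeps the result half-open
    return mapping[start], mapping[end - 1] + 1


if __name__ == "__main__":
    aligned = "AT--CGA-T"
    print(ungapped_to_alignment(aligned))
    print(feature_to_alignment(aligned, 1, 4))
```

For `"AT--CGA-T"`, the feature covering ungapped bases 1 to 3 (`TCG`) spans alignment columns 1 to 5, because the two gaps after `T` are absorbed into the feature.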
## Plotting

The `draw` module provides several predefined functions for plotting alignments.

```python
'''
an example demonstrating how to plot multiple sequence alignments
'''
# import all packages
import matplotlib.pyplot as plt
from msaexplorer import explore
from msaexplorer import draw

# load alignment
aln = explore.MSA("example_alignments/DNA.fasta", reference_id=None, zoom_range=None)
# set reference to first sequence
aln.reference_id = list(aln.alignment.keys())[0]
# configure the plot layout
fig, ax = plt.subplots(nrows=10, height_ratios=[0.2,0.2,0.2,0.2,2,0.2,2,0.2,0.2,0.5], sharex=False)

# plot everything
draw.stat_plot(
    aln,
    ax[0],
    "gc",
    rolling_average=20,
    line_color="black"
)
draw.stat_plot(
    aln,
    ax[1],
    stat_type="entropy",
    rolling_average=1,
    line_color="indigo"
)
draw.stat_plot(
    aln,
    ax[2],
    "coverage",
    rolling_average=1
)
draw.stat_plot(
    aln,
    ax[3],
    stat_type="identity",
    rolling_average=1,
    line_color="grey"
)
draw.identity_alignment(
    aln,
    ax[4],
    show_gaps=False,
    show_mask=True,
    show_mismatches=True,
    reference_color='lightsteelblue',
    color_mismatching_chars=True,
    show_seq_names=False,
    show_ambiguities=True,
    fancy_gaps=True,
    show_x_label=False,
    show_legend=True,
    bbox_to_anchor=(1,1.05)
)
draw.stat_plot(
    aln,
    ax[5],
    stat_type="similarity",
    rolling_average=1,
    line_color="darkblue"
)
draw.similarity_alignment(
    aln, ax[6],
    fancy_gaps=True,
    show_gaps=True,
    matrix_type='TRANS',
    show_cbar=True,
    cbar_fraction=0.02,
    show_x_label=False
)
draw.orf_plot(
    aln,
    ax[7],
    cmap='hsv',
    non_overlapping_orfs=False,
    show_cbar=True,
    cbar_fraction=0.2,
    min_length=150
)
draw.annotation_plot(
    aln,
    'example_alignments/DNA_RNA.gff3',
    ax[8],
    feature_to_plot='gene',
    show_x_label=False
)
draw.variant_plot(
    aln,
    ax[9],
    show_x_label=True,
    show_legend=True,
    bbox_to_anchor=(1,1.35)
)

# set the size of the figure
fig.set_size_inches(14, 29)
fig.tight_layout()  # adjust spacing so that labels and subplots do not overlap

# display the figure (use fig.savefig('alignment.png') instead to save it to a file)
plt.show()
```
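Judging by the parameter names, the tracks drawn by `draw.stat_plot` are per-column statistics smoothed over a window of `rolling_average` columns. Purely as an illustration of that idea, here is a pure-Python sketch of two such statistics (GC content and Shannon entropy) and a sliding-window mean. The helper names are hypothetical, and MSAexplorer's exact definitions, gap handling and window alignment may differ.

```python
import math
from collections import Counter
from collections.abc import Callable


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (in bits) of the non-gap characters in one alignment column."""
    chars = [c for c in column if c != gap]
    if not chars:
        return 0.0
    total = len(chars)
    return -sum(n / total * math.log2(n / total) for n in Counter(chars).values())


def column_gc(column: str, gap: str = "-") -> float:
    """Fraction of G/C among the non-gap characters in one alignment column."""
    chars = [c for c in column.upper() if c != gap]
    return sum(c in "GC" for c in chars) / len(chars) if chars else 0.0


def per_column(alignment: dict[str, str], stat: Callable[[str], float]) -> list[float]:
    """Apply `stat` to every column of an alignment given as {id: sequence}."""
    return ["".join(col) for col in zip(*alignment.values())] and [
        stat("".join(col)) for col in zip(*alignment.values())
    ]


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Mean of every full window of `window` consecutive values (no padding at the ends)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    example = {"s1": "ATGC", "s2": "ATGG", "s3": "AAGC"}
    gc = per_column(example, column_gc)
    print(gc, rolling_mean(gc, 2))
    print([round(h, 3) for h in per_column(example, column_entropy)])
```

A larger window (as with `rolling_average=20` for the GC track above) trades positional resolution for a smoother curve.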