msaexplorer

What is MSAexplorer?

MSAexplorer allows the analysis and straightforward plotting of multiple sequence alignments. Its focus is to act as a simple python3 extension or shiny app with minimal dependencies and syntax. It's easy to set up and highly customizable.

Usage as a shiny application

The current version of the app is deployed to GitHub Pages (https://jonas-fuchs.github.io/MSAexplorer/shiny/). The application is serverless and all computation runs in your browser, so there is no need to install anything. Just enjoy the app!

You can also deploy it yourself and host it wherever you like:

git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
pip install -r requirements.txt  # installs all dependencies
shiny run shinyapp/app.py

Now just follow the link provided in your terminal.

Usage as a python3 package

Installation

A few simple steps are needed at the moment, but in the future you will be able to install MSAexplorer via pip install msaexplorer.

git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
pip install .

Now you are able to use MSAexplorer like any package that you would install via pip.
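As a quick sanity check after installation, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of MSAexplorer.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# hypothetical helper, used here only to confirm that the install succeeded
version = installed_version("msaexplorer")
print(version if version else "msaexplorer is not installed in this environment")
```

If the second message is printed, the `pip install .` step was most likely run in a different Python environment.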

Analysis

The explore module lets you load an alignment file and analyze it.

'''
a small example on how to use the MSAexplorer package
'''

from msaexplorer import explore

# load MSA
msa = explore.MSA('example_alignments/DNA.fasta')
annotation = explore.Annotation(msa, 'example_alignments/DNA_RNA.gff3')

# you can set the zoom range and the reference id if needed
msa.zoom = (0, 1500)
msa.reference_id = 'your_ref_id'

# compute statistics on the MSA, e.g. the pairwise identity matrix
msa.calc_pairwise_identity_matrix()
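To illustrate what a pairwise identity matrix contains, here is a minimal pure-Python sketch. It is not MSAexplorer's implementation: the helpers `pairwise_identity` and `identity_matrix` are hypothetical, and the sketch assumes that identity is the percentage of matching characters over columns where neither sequence has a gap, which may differ from the definition the package uses.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap (assumed definition)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Build a symmetric matrix (list of lists) in the order of the dict keys."""
    seqs = list(alignment.values())
    return [[pairwise_identity(a, b) for b in seqs] for a in seqs]


# toy alignment, made up for illustration
toy = {"Seq1": "ATCGATCG", "Seq2": "ATCGATGG", "Seq3": "AT-GATCG"}
for name, row in zip(toy, identity_matrix(toy)):
    print(name, [round(value, 1) for value in row])
```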

Importantly, multiple sequence alignments should be provided in FASTA format:

>Seq1
ATCGATCGATCGATCG
>Seq2
ATCGATCGATCGATCG
>Seq3
ATCGATCGATCGATCG
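The format above can be checked before loading a file. The following is a minimal sketch, not part of MSAexplorer: the helpers `read_fasta` and `is_aligned` are hypothetical, and the check assumes that a usable alignment has at least two sequences of equal length.

```python
import io


def read_fasta(handle):
    """Parse FASTA records from an open text handle into {identifier: sequence}."""
    records = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            current = header[0]  # identifier is the first word of the header
            if current in records:
                raise ValueError(f"duplicate identifier: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def is_aligned(records):
    """True if there are at least two sequences and all share the same length."""
    lengths = {len(seq) for seq in records.values()}
    return len(records) >= 2 and len(lengths) == 1


example = io.StringIO(
    ">Seq1\nATCGATCGATCGATCG\n>Seq2\nATCGATCGATCGATCG\n>Seq3\nATCGATCGATCGATCG\n"
)
records = read_fasta(example)
print(list(records), is_aligned(records))
```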

Additionally, you can read in an annotation in bed, gff3 or gb format and connect it to the MSA. Importantly, the sequence identifier of the annotation has to be part of the alignment. All genomic locations are then automatically adapted to the alignment.
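The idea of adapting genomic locations to the alignment can be sketched as follows. This is an illustration under stated assumptions, not MSAexplorer's actual code: the helpers `build_position_map` and `to_alignment_coords` are hypothetical, coordinates are assumed to be 0-based and end-exclusive, and gaps are assumed to be written as `-`.

```python
def build_position_map(aligned_reference, gap="-"):
    """Map 0-based ungapped positions of a sequence to 0-based alignment columns."""
    return [col for col, char in enumerate(aligned_reference) if char != gap]


def to_alignment_coords(start, end, position_map):
    """Convert a 0-based, end-exclusive feature interval to alignment columns."""
    if not 0 <= start < end <= len(position_map):
        raise ValueError("feature lies outside the sequence")
    return position_map[start], position_map[end - 1] + 1


# toy aligned reference with gaps, made up for illustration
aligned_ref = "AT--CGAT-CG"
position_map = build_position_map(aligned_ref)
# a feature covering ungapped positions 2..5 (CGAT) spans alignment columns 4..7
print(to_alignment_coords(2, 6, position_map))
```

Precomputing the position map once keeps each feature lookup cheap, which matters when an annotation file holds many features.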

Plotting

The draw module has several predefined functions to plot alignments.

'''
an example demonstrating how to plot multiple sequence alignments
'''
# import all packages
import matplotlib.pyplot as plt
from msaexplorer import explore
from msaexplorer import draw

#  load alignment
aln = explore.MSA("example_alignments/DNA.fasta", reference_id=None, zoom_range=None)
# set reference to first sequence
aln.reference_id = list(aln.alignment.keys())[0]
# configure the plot layout
fig, ax = plt.subplots(nrows=10, height_ratios=[0.2,0.2,0.2,0.2,2,0.2,2,0.2,0.2,0.5], sharex=False)

# plot everything
draw.stat_plot(
    aln,
    ax[0],
    "gc",
    rolling_average=20,
    line_color="black"
)
draw.stat_plot(
    aln,
    ax[1],
    stat_type="entropy",
    rolling_average=1,
    line_color="indigo"
)
draw.stat_plot(
    aln,
    ax[2],
    "coverage",
    rolling_average=1
)
draw.stat_plot(
    aln,
    ax[3],
    stat_type="identity",
    rolling_average=1,
    line_color="grey"
)
draw.identity_alignment(
    aln,
    ax[4],
    show_gaps=False,
    show_mask=True,
    show_mismatches=True,
    reference_color='lightsteelblue',
    color_mismatching_chars=True,
    show_seq_names=False,
    show_ambiguities=True,
    fancy_gaps=True,
    show_x_label=False,
    show_legend=True,
    bbox_to_anchor=(1,1.05)
)
draw.stat_plot(
    aln,
    ax[5],
    stat_type="similarity",
    rolling_average=1,
    line_color="darkblue"
)
draw.similarity_alignment(
    aln, ax[6],
    fancy_gaps=True,
    show_gaps=True,
    matrix_type='TRANS',
    show_cbar=True,
    cbar_fraction=0.02,
    show_x_label=False
)
draw.orf_plot(
    aln,
    ax[7],
    cmap='hsv',
    non_overlapping_orfs=False,
    show_cbar=True,
    cbar_fraction=0.2,
    min_length=150
)
draw.annotation_plot(
    aln,
    'example_alignments/DNA_RNA.gff3',
    ax[8],
    feature_to_plot='gene',
    show_x_label=False
)
draw.variant_plot(
    aln,
    ax[9],
    show_x_label=True,
    show_legend=True,
    bbox_to_anchor=(1,1.35)
)

# set the size of the figure
fig.set_size_inches(14, 29)
fig.tight_layout()  # ensures that everything is correctly plotted

# show the figure (use fig.savefig('alignment.pdf') instead to save it to a file)
plt.show()
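To give an idea of what a per-column statistic smoothed with a rolling average looks like, here is a minimal pure-Python sketch. It is not how MSAexplorer computes its statistics: the helpers `column_entropy` and `rolling_average` are hypothetical, the entropy is plain Shannon entropy in bits over non-gap characters, and the smoothing uses a centered window that shrinks at the edges. The package may normalize or smooth differently.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the non-gap characters in one alignment column."""
    chars = [c for c in column if c != gap]
    if not chars:
        return 0.0
    total = len(chars)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(chars).values())


def rolling_average(values, window):
    """Centered moving average; the window shrinks at both ends of the list."""
    half = window // 2  # assumes an odd window; even windows are widened by one
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# toy alignment, made up for illustration
sequences = ["ATCGA", "ATCGT", "ATGGC", "AT-GG"]
entropies = [column_entropy(col) for col in zip(*sequences)]
print([round(e, 2) for e in entropies])
print([round(v, 2) for v in rolling_average(entropies, window=3)])
```

A window of 1 leaves the values unchanged, which corresponds to plotting the raw per-column statistic.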

import importlib.metadata
import pathlib
import tomllib

# get __version__ from pyproject.toml when running from a source checkout
source_location = pathlib.Path(__file__).parent  # directory of this package
if (source_location.parent / "pyproject.toml").exists():
    with open(source_location.parent / "pyproject.toml", "rb") as f:
        __version__ = tomllib.load(f)['project']['version']
else:
    # otherwise fall back to the metadata of the installed package
    __version__ = importlib.metadata.version("msaexplorer")