msaexplorer

What is MSAexplorer?

MSAexplorer lets you analyze and plot multiple sequence alignments in a straightforward way. It is designed to act as a simple Python 3 extension or Shiny app with minimal dependencies and syntax. It is easy to set up and highly customizable.

Installation

pip install msaexplorer # or
pip install msaexplorer[process]  # additionally installs pyfamsa and pytrimal (not required, but optional in the app)
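
To check from Python whether the optional packages pulled in by the `process` extra are importable, a minimal sketch using only the standard library could look like this. The helper name `optional_dependencies_available` is hypothetical and not part of MSAexplorer; the package names are the two mentioned in the install comment above.

```python
from importlib.util import find_spec


def optional_dependencies_available(names=("pyfamsa", "pytrimal")):
    """Return a dict mapping each package name to whether it can be imported."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in optional_dependencies_available().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

A check like this avoids importing the packages themselves, so it stays cheap even when they are installed.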

From this repo

git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
pip install . # or
pip install .[process]

Usage as a shiny application

The current version of the app is also deployed to GitHub Pages (https://jonas-fuchs.github.io/MSAexplorer/app/). The deployed application is serverless: all computation runs in your browser, so there is no need to install anything. Enjoy the app!

You can also run it locally or host it however you like!

Running the app

msaexplorer --run

Now just follow the link provided in your terminal.

Exporting as a static site

pip install shinylive
git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
shinylive export ./ site/  # you should now have a new 'site' folder with the app

Usage as a python3 package

If you only want to use the MSAexplorer package without the shiny app, you can install it as follows:

pip install msaexplorer

Analysis

The explore module lets you load an alignment file and analyze it.

'''
a small example on how to use the MSAexplorer package
'''

from msaexplorer import explore

# load MSA
msa = explore.MSA('example_alignments/DNA.fasta')
annotation = explore.Annotation(msa, 'example_alignments/DNA_RNA.gff3')

# you can set the zoom range and the reference id if needed
msa.zoom = (0, 1500)
msa.reference_id = 'your_ref_id'

# access functions on what to compute on the MSA
msa.calc_pairwise_identity_matrix()
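
To give a feel for what a pairwise identity matrix contains, here is a minimal pure-Python sketch. It is not MSAexplorer's implementation: the function names are hypothetical, and the choice to skip columns that are gaps in both sequences is an assumption made for illustration.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # assumption: columns that are gaps in both are ignored
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Square matrix (list of lists) of pairwise identities, in dict order."""
    seqs = list(alignment.values())
    return [[pairwise_identity(a, b) for b in seqs] for a in seqs]


alignment = {"Seq1": "ATCGATCG", "Seq2": "ATCGATCC", "Seq3": "AT-GATCG"}
matrix = identity_matrix(alignment)
for name, row in zip(alignment, matrix):
    print(name, row)
```

The matrix is symmetric with 100.0 on the diagonal; how gaps are counted changes the off-diagonal values, so check the package documentation for the options it actually offers.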

Importantly, multiple sequence alignments should be provided in FASTA format, for example:

>Seq1
ATCGATCGATCGATCG
>Seq2
ATCGATCGATCGATCG
>Seq3
ATCGATCGATCGATCG
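
As an illustration of the layout above, the following sketch reads such a file into a dictionary and verifies that all sequences have the same length, which any alignment requires. `explore.MSA` does its own parsing; the helper `read_aligned_fasta` below is hypothetical and only meant to make the expected input concrete.

```python
import io


def read_aligned_fasta(handle):
    """Parse FASTA text into {identifier: sequence} and check alignment lengths."""
    records = {}
    current_id = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            current_id = header[0]  # identifier is the first word of the header
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current_id].append(line)  # sequences may span several lines
    sequences = {name: "".join(parts) for name, parts in records.items()}
    if len({len(seq) for seq in sequences.values()}) > 1:
        raise ValueError("sequences differ in length, so this is not an alignment")
    return sequences


example = io.StringIO(">Seq1\nATCGATCG\nATCG\n>Seq2\nATCGATCGATCG\n")
alignment = read_aligned_fasta(example)
print(alignment)
```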

You can also read in an annotation in bed, gff3 or gb format and connect it to the MSA. Importantly, the sequence identifier of the annotation has to be part of the alignment. All genomic locations are then automatically adapted to the alignment.
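
One way to picture how genomic locations can be adapted to an alignment: every ungapped position of the annotated sequence is mapped to the alignment column it occupies, skipping gap characters. The sketch below shows that idea with 0-based, end-exclusive coordinates. The function names are hypothetical, and this is an illustration of the concept rather than the way MSAexplorer necessarily implements it.

```python
def reference_to_alignment_positions(aligned_reference, gap="-"):
    """Map each 0-based ungapped reference position to its alignment column."""
    return [col for col, char in enumerate(aligned_reference) if char != gap]


def map_feature(start, end, aligned_reference):
    """Translate a 0-based, end-exclusive feature on the reference to alignment columns."""
    columns = reference_to_alignment_positions(aligned_reference)
    if not 0 <= start < end <= len(columns):
        raise ValueError("feature lies outside the reference sequence")
    # end is exclusive, so map the last covered base and add one
    return columns[start], columns[end - 1] + 1


aligned_ref = "AT--CGAT-CG"
print(map_feature(2, 6, aligned_ref))  # (4, 8)
```

In this example the feature covering the third to sixth base of the ungapped sequence shifts right by two columns because of the two gaps that precede it.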

Plotting

The draw module provides several predefined functions for plotting alignments.

'''
an example demonstrating how to plot multiple sequence alignments
'''
# import all packages
import matplotlib.pyplot as plt
from msaexplorer import explore
from msaexplorer import draw

# load alignment
aln = explore.MSA("example_alignments/DNA.fasta", reference_id=None, zoom_range=None)
# set reference to e.g. the first sequence
aln.reference_id = list(aln.alignment.keys())[0]

fig, ax = plt.subplots(nrows=2, height_ratios=[0.2, 2], sharex=False)

draw.stat_plot(
    aln,
    ax[0],
    stat_type="entropy",
    rolling_average=1,
    line_color="indigo"
)

draw.identity_alignment(
    aln,
    ax[1],
    show_gaps=False,
    show_mask=True,
    show_mismatches=True,
    reference_color='lightsteelblue',
    color_scheme='purine_pyrimidine',
    show_seq_names=False,
    show_ambiguities=True,
    fancy_gaps=True,
    show_x_label=False,
    show_legend=True,
    bbox_to_anchor=(1,1.05)
)

plt.show()
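
The example above plots a per-position statistic with `stat_type="entropy"`. As a rough illustration of what such a statistic measures, the sketch below computes the Shannon entropy of each alignment column in plain Python. The function names are hypothetical, and the details (log base 2, no normalization, gaps counted as ordinary characters) are assumptions; MSAexplorer may define its entropy differently.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy for every column of equally long aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


sequences = ["ATCG", "ATCC", "ATGA", "ACGT"]
entropies = alignment_entropy(sequences)
print([round(value, 3) for value in entropies])
```

A fully conserved column has an entropy of zero, while a column in which all four nucleotides occur equally often reaches the maximum of two bits.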
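
# Resolve the package version from the installed distribution metadata.
from importlib.metadata import version, PackageNotFoundError

try:
    __version__ = version("msaexplorer")
except PackageNotFoundError:
    # The package is not installed as a distribution (e.g. a plain source checkout).
    __version__ = "unknown"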