msaexplorer

What is MSAexplorer?

MSAexplorer allows the analysis and straightforward plotting of multiple sequence alignments. It is designed to work as a simple Python 3 extension or Shiny app with minimal dependencies and syntax. It is easy to set up and highly customizable.

Usage as a shiny application

The current version of the app is deployed to GitHub Pages (https://jonas-fuchs.github.io/MSAexplorer/app/). The application is serverless: all computation runs in your browser, so there is no need to install anything. Enjoy the app!

You can also deploy it yourself and host it however you like:

git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
pip install -r requirements.txt  # installs all dependencies
shiny run app.py

Now just follow the link provided in your terminal.

Usage as a python3 package

Installation

For now, installation takes a few manual steps. In the future you will be able to install MSAexplorer via pip install msaexplorer.

git clone https://github.com/jonas-fuchs/MSAexplorer
cd MSAexplorer
pip install .

You can now use MSAexplorer like any other package installed via pip.

Analysis

The explore module lets you load an alignment file and analyze it.

'''
a small example on how to use the MSAexplorer package
'''

from msaexplorer import explore

# load MSA
msa = explore.MSA('example_alignments/DNA.fasta')
annotation = explore.Annotation(msa, 'example_alignments/DNA_RNA.gff3')

# you can set the zoom range and the reference id if needed
msa.zoom = (0, 1500)
msa.reference_id = 'your_ref_id'

# call methods on the MSA object to compute statistics
msa.calc_pairwise_identity_matrix()
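To make the idea of a pairwise identity matrix concrete, here is a minimal pure-Python sketch. It is not MSAexplorer's implementation: the function names `pairwise_identity` and `identity_matrix` are hypothetical, and the identity definition (columns where both sequences have a gap are ignored) is an assumption chosen for illustration. The library may define identity differently.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences (assumed definition)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Symmetric matrix of pairwise identities; the diagonal is 100."""
    ids = list(alignment)
    matrix = [[100.0] * len(ids) for _ in ids]
    for i, j in combinations(range(len(ids)), 2):
        value = pairwise_identity(alignment[ids[i]], alignment[ids[j]])
        matrix[i][j] = matrix[j][i] = value
    return ids, matrix


# toy alignment, invented for this example
toy = {"Seq1": "ATCGATCG", "Seq2": "ATCGATCC", "Seq3": "AT-GATCG"}
ids, matrix = identity_matrix(toy)
```

Here `matrix[i][j]` holds the identity between `ids[i]` and `ids[j]`, so the matrix can be passed directly to a heatmap function if desired.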

Multiple sequence alignments should be provided in FASTA format, as in this example:

>Seq1
ATCGATCGATCGATCG
>Seq2
ATCGATCGATCGATCG
>Seq3
ATCGATCGATCGATCG
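Before loading a file, it can help to check that it really is an alignment, i.e. that every record has the same length. The following is a small standalone sketch that does not use MSAexplorer; `read_fasta` and `alignment_length` are hypothetical helper names, and the check only assumes the FASTA layout shown above.

```python
import io


def read_fasta(handle) -> dict[str, str]:
    """Parse FASTA records from a text handle into {identifier: sequence}."""
    records: dict[str, list[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate identifier: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)  # sequences may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_length(records: dict[str, str]) -> int:
    """Return the shared sequence length, or raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop()


example = io.StringIO(
    ">Seq1\nATCGATCGATCGATCG\n>Seq2\nATCGATCGATCGATCG\n>Seq3\nATCGATCGATCGATCG\n"
)
records = read_fasta(example)
length = alignment_length(records)
```

For a file on disk, replace the `io.StringIO` object with `open(path)` inside a `with` block.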

You can also read in an annotation in bed, gff3 or gb format and connect it to the MSA. The sequence identifier used in the annotation must be present in the alignment. All genomic locations are then automatically adapted to the alignment.
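Adapting genomic locations to the alignment means translating positions on the ungapped sequence into alignment columns. The sketch below illustrates the general idea only; it is not how MSAexplorer implements it, and the helper names `residue_columns` and `map_feature` as well as the 0-based, end-exclusive coordinate convention are assumptions made for this example.

```python
def residue_columns(aligned_seq: str, gap: str = "-") -> list[int]:
    """Index i holds the alignment column of the i-th ungapped residue."""
    return [col for col, char in enumerate(aligned_seq) if char != gap]


def map_feature(aligned_seq: str, start: int, end: int) -> tuple[int, int]:
    """Map a feature (0-based, end-exclusive) from sequence to alignment coordinates."""
    columns = residue_columns(aligned_seq)
    if not 0 <= start < end <= len(columns):
        raise ValueError("feature lies outside the sequence")
    # the last residue of the feature is at index end - 1
    return columns[start], columns[end - 1] + 1


# toy aligned sequence, invented for this example; ungapped it reads ATCGATCG
reference = "AT--CGAT-CG"
shifted = map_feature(reference, 2, 6)   # residues CGAT, pushed right by two gaps
spanning = map_feature(reference, 5, 7)  # residues TC, interrupted by a gap
```

A feature that spans a gap becomes longer in alignment coordinates, which is why `spanning` covers three columns for two residues.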

Plotting

The draw module provides several predefined functions for plotting alignments.

'''
an example demonstrating how to plot multiple sequence alignments
'''
# import all packages
import matplotlib.pyplot as plt
from msaexplorer import explore
from msaexplorer import draw

# load alignment
aln = explore.MSA("example_alignments/DNA.fasta", reference_id=None, zoom_range=None)
# set reference to e.g. the first sequence
aln.reference_id = list(aln.alignment.keys())[0]

fig, ax = plt.subplots(nrows=2, height_ratios=[0.2, 2], sharex=False)

draw.stat_plot(
    aln,
    ax[0],
    stat_type="entropy",
    rolling_average=1,
    line_color="indigo"
)

draw.identity_alignment(
    aln,
    ax[1],
    show_gaps=False,
    show_mask=True,
    show_mismatches=True,
    reference_color='lightsteelblue',
    color_scheme='purine_pyrimidine',
    show_seq_names=False,
    show_ambiguities=True,
    fancy_gaps=True,
    show_x_label=False,
    show_legend=True,
    bbox_to_anchor=(1,1.05)
)

plt.show()
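The statistic plotted above with stat_type="entropy" is a per-column measure of variability. As a rough illustration of what such a value represents, here is a plain Shannon entropy calculation per alignment column. This is a sketch under assumptions: the helper names are hypothetical, gaps are simply ignored, and MSAexplorer may normalize the values or treat gaps and ambiguous characters differently.

```python
import math
from collections import Counter


def column_entropy(column, gap: str = "-") -> float:
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    residues = [char for char in column if char != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences: list[str]) -> list[float]:
    """Entropy for every column of a list of equal-length aligned sequences."""
    return [column_entropy(column) for column in zip(*sequences)]


# toy alignment, invented for this example
toy = ["ATCG", "ATCC", "AGCA", "A-CT"]
entropy = alignment_entropy(toy)
```

Fully conserved columns give 0 bits, while a DNA column with all four bases in equal proportion gives the maximum of 2 bits.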

import importlib.metadata
import pathlib
import tomllib

# get __version__ from pyproject.toml when running from a source checkout
source_location = pathlib.Path(__file__).resolve().parent
pyproject = source_location.parent / "pyproject.toml"
if pyproject.exists():
    with open(pyproject, "rb") as f:
        __version__ = tomllib.load(f)['project']['version']
else:
    # otherwise fall back to the metadata of the installed package
    __version__ = importlib.metadata.version("msaexplorer")