Issue 11, 2025

MAAPE: a tool for modular evolution analysis of protein embeddings

Abstract

We present MAAPE, a novel algorithm that integrates a k-nearest neighbour (KNN) similarity network with co-occurrence matrix analysis to extract evolutionary insights from protein language model (PLM) embeddings. The KNN network captures diverse evolutionary relationships and events, whereas the co-occurrence matrix identifies directional evolutionary paths and potential signals of gene transfer. MAAPE addresses the limitations of traditional sequence alignment methods by effectively detecting structural homology and functional associations in protein sequences with low similarity. By employing sliding windows of varying sizes, it analyses embeddings to uncover both local and global evolutionary signals encoded by PLMs. We benchmarked the MAAPE approach on three well-characterised protein family datasets: the RecA/RAD51 DNA repair protein families, the form I Rubisco families, and P450 proteins from oomycetes. In all cases, MAAPE successfully reconstructed evolutionary networks that aligned with established phylogenetic relationships. This approach offers a deeper understanding of evolutionary relationships and holds significant potential for applications in protein evolution research, functional prediction, and rational design of novel proteins. The MAAPE algorithm is available from the GitHub repository at https://github.com/Qinlab502/MAAPE.
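The abstract describes MAAPE only at a high level. As a rough, hypothetical illustration of the general idea of combining sliding windows, k-nearest-neighbour relations and a co-occurrence matrix, the sketch below mean-pools per-residue embedding vectors over windows of several sizes, finds each protein's k most cosine-similar proteins in every window, and counts how often protein j appears among protein i's neighbours. This is not the MAAPE implementation (see the GitHub repository for that). Assumptions made purely for illustration: embeddings are plain lists of per-residue vectors, windows are taken at the same residue offsets in every protein without alignment, cosine similarity is the metric, and all function names (`mean_pool`, `knn_sets`, `window_cooccurrence`) are hypothetical.

```python
import math
import random


def mean_pool(residues, start, end):
    """Average the per-residue vectors residues[start:end] into one vector."""
    window = residues[start:end]
    dim = len(window[0])
    return [sum(vec[d] for vec in window) / len(window) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_sets(vectors, k):
    """For each vector i, return the indices of its k most similar other vectors."""
    result = []
    for i, vi in enumerate(vectors):
        scored = sorted(
            ((cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i),
            reverse=True,
        )
        result.append({j for _, j in scored[:k]})
    return result


def window_cooccurrence(proteins, window_sizes, k):
    """Count how often protein j is one of protein i's k nearest neighbours.

    Hypothetical illustration only: windows are taken at identical residue
    offsets in every protein (no alignment) and limited to the shortest protein.
    """
    n = len(proteins)
    counts = [[0] * n for _ in range(n)]
    min_len = min(len(p) for p in proteins)
    for w in window_sizes:
        for start in range(min_len - w + 1):
            pooled = [mean_pool(p, start, start + w) for p in proteins]
            for i, neighbours in enumerate(knn_sets(pooled, k)):
                for j in neighbours:
                    counts[i][j] += 1  # directed count: i -> j, not necessarily j -> i
    return counts


# Usage with synthetic embeddings: 4 "proteins", 10 residues, 8-dim vectors.
rng = random.Random(0)
LENGTH, DIM = 10, 8
base = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(LENGTH)]
near_copy = [[x + rng.gauss(0, 1e-3) for x in vec] for vec in base]
others = [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(LENGTH)] for _ in range(2)]
proteins = [base, near_copy] + others

C = window_cooccurrence(proteins, window_sizes=(3, 5), k=2)
for row in C:
    print(row)
```

Because a KNN relation is not symmetric (j can be among i's nearest neighbours without i being among j's), `C[i][j]` can differ from `C[j][i]`; this is one simple way an asymmetric signal can arise in such a matrix. How MAAPE itself derives directional evolutionary paths is not specified in this abstract.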

Article information

Article type: Paper
Submitted: 08 Jan 2025
Accepted: 30 Sep 2025
First published: 01 Oct 2025
This article is Open Access (Creative Commons BY license).

Digital Discovery, 2025, 4, 3245-3259

X. Wang, Q. Gao, H. Zhang, J. Huang and Z. Qin, Digital Discovery, 2025, 4, 3245 DOI: 10.1039/D5DD00009B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.