Protein language visualizer: a repository for homology exploration with language model embeddings

Abstract

AI-driven representations of proteins are advancing rapidly, yet tools for their intuitive visualization and exploration lag behind. Sequence Similarity Networks (SSNs) have long filled this role for alignment-based methods, providing simple but widely adopted platforms for grouping proteins by homology. Building on this foundation, we present the Protein Language Visualizer (PLVis), a modular framework that applies existing pre-trained protein language model (pLM) embeddings, dimensionality reduction, and clustering to generate interactive maps of protein relationships. The central contribution is the PLVis repository, an online resource where thousands of reference proteomes can be compared and annotated through an accessible, interactive interface; its value, as with SSNs, lies less in technical novelty than in broad usability. We first validate that well-separated clusters in PLVis reliably capture homology information, while emphasizing caution when interpreting central “fuzzy” regions. We then illustrate the value of PLVis through case studies ranging from individual protein families to full proteome comparisons across Mycobacterium and Plasmodium species. By combining methodological clarity with broad accessibility, the PLVis repository provides a low-barrier platform for exploring proteomes through the lens of language models.
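The three stages named in the abstract (embeddings, dimensionality reduction, clustering) can be pictured with a minimal sketch. This is not the PLVis implementation: the abstract does not say which pLM, reduction method, or clustering algorithm is used, so here random vectors stand in for pLM embeddings, PCA stands in for the reduction step, and k-means stands in for the clustering step. All function names and parameters are hypothetical.

```python
import math
import random


def synthetic_embeddings(n_per_family=20, dim=16, seed=0):
    """Assumption: random vectors around two distinct centers stand in for
    real pLM embeddings of two hypothetical protein families."""
    rng = random.Random(seed)
    centers = [[0.0] * dim, [5.0] * dim]
    vectors, families = [], []
    for family, center in enumerate(centers):
        for _ in range(n_per_family):
            vectors.append([c + rng.gauss(0.0, 1.0) for c in center])
            families.append(family)
    return vectors, families


def pca_2d(vectors, iterations=200):
    """Project vectors onto their top two principal components, found by
    power iteration with deflation on the sample covariance matrix."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    components = []
    for k in range(2):
        vec = [1.0 / (j + 1 + k) for j in range(dim)]  # deterministic start
        for _ in range(iterations):
            nxt = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        cov_vec = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        eigenvalue = sum(vec[a] * cov_vec[a] for a in range(dim))
        components.append(vec)
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[a][b] - eigenvalue * vec[a] * vec[b] for b in range(dim)]
               for a in range(dim)]
    return [[sum(r[j] * comp[j] for j in range(dim)) for comp in components]
            for r in centered]


def kmeans(points, k=2, iterations=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vectors, families = synthetic_embeddings()
coords = pca_2d(vectors)   # 2D map coordinates, one pair per "protein"
labels = kmeans(coords)    # cluster assignment per "protein"
for (x, y), family, label in list(zip(coords, families, labels))[:3]:
    print(f"family={family} cluster={label} x={x:.2f} y={y:.2f}")
```

In this toy setting the two synthetic families are far apart, which corresponds to the well-separated case the abstract describes as reliable; overlapping groups, like the central “fuzzy” regions, would not be recovered this cleanly.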

Article information

Article type: Paper
Submitted: 21 Oct 2025
Accepted: 21 May 2026
First published: 16 Jun 2026
This article is Open Access
Creative Commons BY license

Digital Discovery, 2026, Advance Article

J. Espinoza-Herrera, M. F. Manríquez-García, S. Medina-Bermejo, A. López-Jasso, J. P. Ruiz-Alcocer, A. Siordia, S. M. Veskimägi, N. Roethler and A. Jinich, Digital Discovery, 2026, Advance Article, DOI: 10.1039/D5DD00472A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
