Protein language visualizer: a repository for homology exploration with language model embeddings

Javier Espinoza-Herrera; María F. Manríquez-García; Sofía Medina-Bermejo; Ailyn López-Jasso; Juan P. Ruiz-Alcocer; Adriana Siordia; Sarah M. Veskimägi; Nate Roethler; Adrian Jinich

doi:10.1039/D5DD00472A

Javier Espinoza-Herrera,^a María F. Manríquez-García,^b Sofía Medina-Bermejo,^c Ailyn López-Jasso,^d Juan P. Ruiz-Alcocer,^e Adriana Siordia,^a Sarah M. Veskimägi,^a Nate Roethler^a and Adrian Jinich*^af

Author affiliations

* Corresponding authors

^a Department of Chemistry and Biochemistry, University of California San Diego, San Diego, USA

^b Instituto Politécnico Nacional, Silao, Mexico

^c Universidad Autónoma de Baja California, Mexicali, Mexico

^d Instituto Politécnico Nacional, Mexico City, Mexico

^e Instituto Tecnológico y de Estudios Superiores de Occidente, Guadalajara, Mexico

^f Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, USA
E-mail: ajinich@health.ucsd.edu

Abstract

The era of modern AI-driven representations of proteins is here, and moving fast, yet tools for their intuitive visualization and exploration lag behind. Sequence Similarity Networks (SSNs) have long filled this role for alignment-based methods, providing simple but widely adopted platforms for grouping proteins by homology. Building on this foundation, we present the Protein Language Visualizer (PLVis), a modular framework that applies existing pre-trained protein language model (pLM) embeddings, dimensionality reduction, and clustering to generate interactive maps of protein relationships. The central contribution is the PLVis repository, an online resource where thousands of reference proteomes can be compared and annotated through an accessible, interactive interface, much like SSNs became impactful not for their technical novelty but for their broad usability. We first validate that well-separated clusters in PLVis reliably capture homology information, while emphasizing caution when interpreting central “fuzzy” regions. We then illustrate the value of PLVis through case studies spanning individual protein families to full proteome comparisons across Mycobacterium and Plasmodium species. By combining methodological clarity with broad accessibility, the PLVis repository provides a low-barrier platform for exploring proteomes through the lens of language models.
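The three stages named in the abstract (embeddings, dimensionality reduction, clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state which pLM, reduction method, or clustering algorithm PLVis uses, so the example substitutes synthetic vectors for real embeddings, a simple power-iteration PCA for the reduction step, and k-means for the clustering step. All function names are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract the per-dimension mean from every row."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, mean)] for row in rows]

def leading_direction(rows, iters=100, seed=0):
    """Approximate the top principal direction of centered rows by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in rows[0]]
    for _ in range(iters):
        scores = [dot(row, v) for row in rows]
        # w = X^T (X v): one multiplication by the unnormalised covariance matrix
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(len(v))]
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Project rows onto two approximate principal directions."""
    x = center(rows)
    v1 = leading_direction(x, seed=0)
    s1 = [dot(row, v1) for row in x]
    # Deflation: remove the first direction, then search the residual for the next.
    residual = [[a - s * b for a, b in zip(row, v1)] for row, s in zip(x, s1)]
    v2 = leading_direction(residual, seed=1)
    s2 = [dot(row, v2) for row in residual]
    return list(zip(s1, s2))

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Assumption: synthetic 8-dimensional vectors stand in for real pLM embeddings.
# Two hypothetical "families" differ only in the first dimension.
rng = random.Random(42)

def fake_family(offset, n=20, dim=8):
    return [[(offset if j == 0 else 0.0) + rng.gauss(0, 0.3) for j in range(dim)]
            for _ in range(n)]

embeddings = fake_family(3.0) + fake_family(-3.0)
coords = pca_2d(embeddings)      # 2D map coordinates, one pair per "protein"
labels = kmeans(coords, k=2)     # cluster assignment per "protein"
print(labels)
```

With well-separated synthetic families like these, each family should fall into its own cluster, which mirrors the abstract's point that well-separated clusters are the reliable ones. Real embeddings would also produce overlapping regions that call for the caution the authors describe.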

Article information

DOI: https://doi.org/10.1039/D5DD00472A
Article type: Paper
Submitted: 21 Oct 2025
Accepted: 21 May 2026
First published: 16 Jun 2026
This article is Open Access

Digital Discovery, 2026, Advance Article

J. Espinoza-Herrera, M. F. Manríquez-García, S. Medina-Bermejo, A. López-Jasso, J. P. Ruiz-Alcocer, A. Siordia, S. M. Veskimägi, N. Roethler and A. Jinich, Digital Discovery, 2026, Advance Article, DOI: 10.1039/D5DD00472A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
