Protein language visualizer: a repository for homology exploration with language model embeddings
Abstract
AI-driven representations of proteins are advancing quickly, yet tools for visualizing and exploring them intuitively lag behind. Sequence Similarity Networks (SSNs) have long filled this role for alignment-based methods, providing a simple, widely adopted way to group proteins by homology. Building on this foundation, we present the Protein Language Visualizer (PLVis), a modular framework that applies embeddings from existing pre-trained protein language models (pLMs), dimensionality reduction, and clustering to generate interactive maps of protein relationships. The central contribution is the PLVis repository, an online resource where thousands of reference proteomes can be compared and annotated through an accessible, interactive interface; as with SSNs, its intended impact comes from broad usability rather than technical novelty. We first validate that well-separated clusters in PLVis reliably capture homology information, while urging caution when interpreting the central “fuzzy” regions. We then illustrate the value of PLVis through case studies ranging from individual protein families to full proteome comparisons across Mycobacterium and Plasmodium species. By combining methodological clarity with broad accessibility, the PLVis repository provides a low-barrier platform for exploring proteomes through the lens of language models.
