SEM-VLM, a domain-specific vision-language model trained via contrastive learning on SEM image–text pairs mined from scientific literature, demonstrates superior performance in recognizing SEM images of nanomaterials.
We trained multiple peptide language models and demonstrated their efficacy in predicting the substrates of ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic enzymes.
Harnessing big data by combining molecular modelling with machine learning accelerates rational enzyme design for applications in fine chemical synthesis and waste valorization, helping to address global environmental challenges and support sustainable development.
This review surveys trends in molecular representation learning, including GNNs, VAEs, transformers, and hybrid SSL models, and their roles in property prediction, generative modeling, and cross-domain generalization.
Integrating physics-based and deep learning methods advances protein–ligand modeling, boosting accuracy, scalability, and efficiency. This review surveys progress, integration strategies, challenges, and the outlook for AI-driven drug discovery.