InChINet: a self-supervised molecular representation learning framework leveraging SMILES and InChI

Yongna Yuan; Jiahe Kang; Yuanchen Li; Ruisheng Zhang; Wei Su

doi:10.1039/D5CP04869A


Yongna Yuan,*^a Jiahe Kang,^a Yuanchen Li,^a Ruisheng Zhang^a and Wei Su^a

Author affiliations

* Corresponding authors

^a School of Information Science & Engineering, Lanzhou University, South Tianshui Road, Lanzhou, Gansu, China
E-mail: yuanyn@lzu.edu.cn

Abstract

Molecular representation is one of the fundamental challenges in artificial intelligence-driven drug discovery. It has attracted increasing attention because of its low cost and speed when applied to molecular property prediction, drug molecule generation, drug–drug interaction prediction, and related tasks. Numerous models that integrate multi-modal representations have been proposed for molecular representation learning. However, existing methods have not yet considered the IUPAC International Chemical Identifier (InChI) as one of the multi-modal inputs. To address this gap, we propose InChINet, a self-supervised molecular representation learning framework pre-trained on 10 million unlabeled molecules. It leverages the mutual information between the simplified molecular input line entry system (SMILES) and InChI. In addition, we present token reordering and token masking for SMILES. Combined with SMILES enumeration, these three strategies introduce domain knowledge and improve the model's stability against syntactic variations in SMILES representations. Benefiting from the introduction of InChI and these augmentation strategies, InChINet achieves strong performance on a wide range of downstream tasks, including molecular property prediction, drug–drug interaction (DDI) prediction, clustering analysis, and zero-shot cross-lingual retrieval. An ablation study is also reported.
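To make the idea of token masking on SMILES concrete, the following is a minimal sketch and not the authors' implementation. It assumes a commonly used regular-expression tokenizer, a hypothetical `[MASK]` placeholder, and an illustrative masking rate of 15%; none of these details are given in the abstract. Token reordering and SMILES enumeration are not shown, since the abstract does not specify how they are performed.

```python
import random
import re

# A widely used regex for splitting SMILES into tokens (bracket atoms,
# two-letter halogens, ring-closure labels, bonds, branches).
# Assumption: InChINet's actual tokenizer is not described in the abstract.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[=#\-\+\\/:~@\?\*\$\.\(\)]|\d)"
)

MASK_TOKEN = "[MASK]"  # hypothetical placeholder token


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize SMILES: {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Replace a random subset of tokens with MASK_TOKEN.

    mask_rate is an illustrative default, not a value from the paper.
    At least one token is masked whenever the input is non-empty.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate)) if tokens else 0
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    return [
        MASK_TOKEN if i in masked_positions else tok
        for i, tok in enumerate(tokens)
    ]


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    tokens = tokenize_smiles(aspirin)
    print(tokens)
    print(mask_tokens(tokens, seed=0))
```

In a self-supervised setting, a masked sequence of this kind would typically be paired with the original sequence (or with another view of the same molecule, such as its InChI string) so that the model learns representations that are robust to the missing tokens.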

Article information

https://doi.org/10.1039/D5CP04869A

Article type

Paper

Submitted

15 Dec 2025

Accepted

23 Feb 2026

First published

13 Mar 2026

Citation

Phys. Chem. Chem. Phys., 2026, 28, 8115-8127


Physical Chemistry Chemical Physics
