InChINet: a self-supervised molecular representation learning framework leveraging SMILES and InChI

Abstract

Molecular representation learning is a fundamental challenge in artificial intelligence-driven drug discovery. It has attracted increasing attention because of its low cost and speed in applications such as molecular property prediction, drug molecule generation, and drug–drug interaction prediction. Numerous models that integrate multi-modal representations have been proposed for molecular representation learning. However, existing methods have not yet considered the IUPAC International Chemical Identifier (InChI) as one of the multi-modal inputs. To address this gap, we propose InChINet, a self-supervised molecular representation learning framework pre-trained on 10 million unlabeled molecules. It leverages the mutual information between the simplified molecular-input line-entry system (SMILES) and InChI. In addition, we present token reordering and token masking for SMILES. Combined with SMILES enumeration, these three strategies introduce domain knowledge and improve the model's robustness to syntactic variations in SMILES representations. Benefiting from the introduction of InChI and these augmentation strategies, InChINet achieves strong performance on a wide range of downstream tasks, including molecular property prediction, drug–drug interaction (DDI) prediction, clustering analysis, and zero-shot cross-lingual retrieval; an ablation study is also reported.
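To make the token masking idea concrete, the following is a minimal sketch of masking applied to a tokenized SMILES string. It is an illustration only and not the authors' implementation: the regular expression is a commonly used community pattern for SMILES tokenization, and the function names `tokenize` and `mask_tokens`, the `[MASK]` symbol, and the 15% masking ratio are all assumptions that the abstract does not specify.

```python
import random
import re

# Regex-based SMILES tokenizer. This pattern is a common community convention,
# NOT taken from the paper. Bracket atoms and two-letter halogens are matched
# before single-letter atoms so that "Cl" is not split into "C" and "l".
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%[0-9]{2}|[BCNOSPFIbcnosp]|[0-9]|[()=#+\-\\/.:~@*$])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail loudly on uncovered characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def mask_tokens(tokens, ratio=0.15, mask="[MASK]", seed=None):
    """Replace a random subset of tokens with a mask symbol.

    The ratio and mask symbol are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)
    # Mask at least one token, but never more than the sequence contains.
    n_mask = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask if i in positions else t for i, t in enumerate(tokens)]


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    tokens = tokenize(aspirin)
    print(tokens)
    print(mask_tokens(tokens, seed=0))
```

Which positions are masked depends on the seed. In a self-supervised setup of this kind, the masked sequence would typically be passed to the SMILES encoder while the unmasked InChI (or SMILES) serves as the other view, but the exact pairing used by InChINet is described in the full article rather than in this abstract.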

Article information

Article type: Paper
Submitted: 15 Dec 2025
Accepted: 23 Feb 2026
First published: 13 Mar 2026

Phys. Chem. Chem. Phys., 2026, Advance Article

Y. Yuan, J. Kang, Y. Li, R. Zhang and W. Su, Phys. Chem. Chem. Phys., 2026, Advance Article, DOI: 10.1039/D5CP04869A
