Enhancing Predictive Modeling with Molecular Fingerprint Fusion Strategies

Abstract

A large number of chemicals remain poorly characterized in terms of their physicochemical properties, biological activity, and environmental fate. Quantitative structure-activity relationship (QSAR) models have become indispensable tools for predicting these properties, especially for compounds that lack comprehensive experimental data. The choice of structural representation used as model input plays a critical role in achieving high predictive performance and in identifying the molecular features that contribute most strongly to the predicted activity. Both hashed and non-hashed molecular fingerprints are widely employed as inputs in QSAR modeling across various domains. While some studies have explored combining multiple fingerprints to improve molecular representation, comprehensive investigations into different fingerprint fusion strategies and into the generalizability of a fused fingerprint across diverse prediction tasks remain limited. In this study, we applied low-, mid-, and high-level fusion strategies to combine six non-hashed fingerprints and evaluated model performance across six publicly available datasets, comprising three regression and three classification tasks. Our results demonstrate that mid-level fusion, where fingerprint bits are selectively combined based on their importance within individual models, consistently improves predictive accuracy, as assessed by RMSE and R² for regression, and F1-score and ROC-AUC for classification. The algorithm developed for molecular fingerprint fusion is universal and can be applied to a wide range of predictive modeling problems and to other non-hashed molecular fingerprints.
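To make the three fusion levels concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes the usual data-fusion meaning of the terms: low-level fusion concatenates all fingerprint bits, mid-level fusion keeps only the most important bits from each fingerprint's own model before concatenating, and high-level fusion combines the predictions of the per-fingerprint models. Several pieces are assumptions made purely for illustration: random binary matrices stand in for real fingerprints, a closed-form ridge regression stands in for the individual models, the absolute coefficient stands in for bit importance, and the helper names and the `top_k` cutoff are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def low_level_fusion(fps):
    """Low-level fusion: concatenate every bit of every fingerprint."""
    return np.hstack(fps)


def mid_level_fusion(fps, y, top_k=5, alpha=1.0):
    """Mid-level fusion: per fingerprint, keep the top_k bits ranked by
    |ridge coefficient| (an assumed importance proxy), then concatenate."""
    selected = []
    for X in fps:
        w, _ = ridge_fit(X, y, alpha)
        selected.append(np.sort(np.argsort(np.abs(w))[-top_k:]))
    fused = np.hstack([X[:, idx] for X, idx in zip(fps, selected)])
    return fused, selected


def high_level_fusion(fps, y, alpha=1.0):
    """High-level fusion: fit one model per fingerprint and average
    their predictions (shown on the training data for brevity)."""
    preds = []
    for X in fps:
        w, b = ridge_fit(X, y, alpha)
        preds.append(X @ w + b)
    return np.mean(preds, axis=0)


# Synthetic stand-ins for three non-hashed fingerprints of 40 molecules.
rng = np.random.default_rng(0)
fps = [rng.integers(0, 2, size=(40, n)).astype(float) for n in (16, 24, 12)]
# Hypothetical target driven by one bit in each of the first two fingerprints.
y = 3.0 * fps[0][:, 3] + 3.0 * fps[1][:, 7] + rng.normal(0.0, 0.1, size=40)

low = low_level_fusion(fps)
fused, selected = mid_level_fusion(fps, y, top_k=5)
high = high_level_fusion(fps, y)

print(low.shape, fused.shape, high.shape)
```

In practice the importance ranking would come from whatever model is trained on each fingerprint, and the bit selection would be done on training data only so that the fused representation can be evaluated on held-out compounds.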

Article information

Article type: Paper
Submitted: 10 Jul 2025
Accepted: 09 Apr 2026
First published: 16 Apr 2026
This article is Open Access
Creative Commons BY license

Digital Discovery, 2025, Accepted Manuscript

V. Turkina, M. R. W. Messih, E. Kant, J. T. Gringhuis, A. Petrignani, G. Corthals, J. W. O'Brien and S. Samanipour, Digital Discovery, 2025, Accepted Manuscript, DOI: 10.1039/D5DD00302D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
