Issue 27, 2025

Variable-temperature token sampling in decoder-GPT molecule-generation can produce more robust and potent virtual screening libraries

Abstract

Token generation in generative pretrained transformers (GPTs) that produce text, code, or molecules often uses conventional approaches such as greedy decoding, temperature-based sampling, or top-k and top-p sampling. This work shows that, for a model trained to generate inhibitors of the enzyme HMG-coenzyme-A reductase, a variable-temperature approach that applies a temperature ramp during inference produces larger sets of molecules (screening libraries) than either greedy decoding or single-temperature sampling. These libraries also have lower predicted IC50 values, lower docking scores, and lower synthetic accessibility scores than libraries produced by the other sampling techniques, especially at very short prompt lengths. This work explores several variable-temperature schemes for generating molecules with a GPT and recommends a sigmoidal temperature ramp early in the generation process.
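To make the idea of a temperature ramp concrete, the following is a minimal sketch of variable-temperature sampling, not the implementation used in the paper. The abstract does not give the ramp's direction, endpoints, midpoint, or steepness, so `t_start`, `t_end`, `midpoint`, and `steepness` are placeholder values chosen only for illustration. The function names and `toy_logits` (a stand-in for a trained GPT) are likewise hypothetical.

```python
import math
import random


def sigmoid_temperature(step, t_start=1.0, t_end=0.1, midpoint=5.0, steepness=1.0):
    """Temperature at a generation step, moving from t_start to t_end on a logistic curve.

    All default values are illustrative assumptions, not values from the paper.
    """
    frac = 1.0 / (1.0 + math.exp(-steepness * (step - midpoint)))
    return t_start + (t_end - t_start) * frac


def sample_token(logits, temperature, rng):
    """Sample a token index from temperature-scaled logits.

    A temperature of zero or below falls back to greedy decoding (argmax).
    """
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(next_logits, prompt, max_new_tokens, rng, **ramp):
    """Autoregressive loop that recomputes the temperature at every step."""
    tokens = list(prompt)
    for step in range(max_new_tokens):
        temperature = sigmoid_temperature(step, **ramp)
        tokens.append(sample_token(next_logits(tokens), temperature, rng))
    return tokens


def toy_logits(tokens, vocab_size=4):
    """Hypothetical stand-in for a trained model: favours (last token + 1) mod vocab_size."""
    target = (tokens[-1] + 1) % vocab_size
    return [3.0 if i == target else 0.0 for i in range(vocab_size)]


rng = random.Random(0)
# A short prompt (one token) and a small "library" of five sampled sequences.
library = [generate(toy_logits, [0], 8, rng) for _ in range(5)]
```

With these placeholder settings, early tokens are drawn from a flatter distribution and later tokens from a sharper one; swapping `t_start` and `t_end` reverses the ramp. In a real setting, `next_logits` would wrap a forward pass of the trained model, and the sampled token sequences would be decoded into molecular strings before scoring.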

Article information

Article type: Paper
Submitted: 21 Feb 2025
Accepted: 18 Jun 2025
First published: 19 Jun 2025
This article is Open Access
Creative Commons BY license

Phys. Chem. Chem. Phys., 2025,27, 14455-14468

M. Cafiero, Phys. Chem. Chem. Phys., 2025, 27, 14455 DOI: 10.1039/D5CP00692A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
