A simple similarity metric for comparing synthetic routes†
Abstract
Experimentally validated routes to synthetic compounds can be compared with one another by quantitative metrics (step count, yield, atom economy) or by qualitative assessments (strategy, novelty). AI-predicted routes are typically compared to experimental syntheses by checking for an exact match among the top-ranked predictions (top-N accuracy). This method is ideal for evaluating retrosynthetic algorithms on large datasets (>10⁶ routes), but it cannot assess the degree of similarity between routes, which would be desirable for small datasets (<10² routes). Here, we present a simple method to calculate a similarity score between any two synthetic routes to a given molecule. The score is based on two concepts: which bonds are formed during the synthesis, and how the atoms of the final compound are grouped together throughout the synthesis. As a result, the similarity score overlaps well with chemists' intuition and provides a finer assessment of prediction accuracy.