A robust data analytical method to investigate sequence dependence in flow-based peptide synthesis†
Abstract
Computer-assisted methods, which hold the promise to transform synthetic organic chemistry, are often limited by experimental data that lack quality, diversity, and quantity. In solid-phase peptide synthesis (SPPS), automated flow chemistry is well suited to deliver such data, which are key to predicting and optimizing sequence-dependent “difficult couplings”, and insights obtained in flow-SPPS can be transferred to batch-SPPS. Current data analysis techniques rely on the height and width of fluorenylmethoxycarbonyl (Fmoc) deprotection peaks and perform well under standard conditions. Yet any deviation in parameters (e.g. temperature, flow rate, resin loading) leads to incomplete capture of information and to exclusion of the affected data from the dataset. Here, we present a flexible and robust processing and analysis method, based on the Gaussian shape of the deprotection peaks, that overcomes these challenges and drastically increases the interpretable size of our dataset. This straightforward method retains the full information and data quality while reducing the generation of hazardous dimethylformamide solvent waste by 50%. Overall, this work highlights how the interplay between synthetic and computational analysis enables the collection of high-quality data even under non-ideal, non-standard conditions.
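To illustrate why a Gaussian peak shape can help when a deprotection peak is only partly captured, the following is a minimal sketch and not the processing pipeline described in this work. It assumes an idealized, noise-free Gaussian peak and a hypothetical scenario in which the recorded trace is clipped at a ceiling value. The function name `fit_gaussian_log_parabola`, the synthetic peak parameters, and the clipping threshold are all assumptions made for illustration. Because the logarithm of a Gaussian is a parabola, a quadratic fit to the unclipped flanks is enough to estimate the height, position, width, and area of the full peak.

```python
import numpy as np


def fit_gaussian_log_parabola(t, y, mask=None):
    """Estimate Gaussian parameters from a quadratic fit to ln(y).

    Hypothetical helper for illustration only. Model:
        y = A * exp(-(t - mu)**2 / (2 * sigma**2))
    so that ln(y) = a*t**2 + b*t + c with a = -1 / (2 * sigma**2).
    Only points where `mask` is True (and y > 0) enter the fit.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if mask is None:
        mask = np.ones_like(y, dtype=bool)
    mask = np.asarray(mask, dtype=bool) & (y > 0)

    # Quadratic least-squares fit in log space.
    a, b, c = np.polyfit(t[mask], np.log(y[mask]), 2)
    if a >= 0:
        raise ValueError("selected points do not describe a peak")

    sigma = np.sqrt(-1.0 / (2.0 * a))
    mu = -b / (2.0 * a)
    amplitude = np.exp(c - b**2 / (4.0 * a))
    area = amplitude * sigma * np.sqrt(2.0 * np.pi)
    return amplitude, mu, sigma, area


# Synthetic, noise-free peak (all values are assumed, not measured data).
t = np.linspace(0.0, 60.0, 601)
true_amp, true_mu, true_sigma = 2.0, 30.0, 4.0
signal = true_amp * np.exp(-((t - true_mu) ** 2) / (2.0 * true_sigma**2))

# Hypothetical incomplete capture: the trace is clipped at a ceiling.
ceiling = 1.2
recorded = np.minimum(signal, ceiling)

# Use only the unclipped flanks, ignoring the near-zero baseline.
usable = (recorded < ceiling) & (recorded > 0.01 * ceiling)

amp, mu, sigma, area = fit_gaussian_log_parabola(t, recorded, usable)
print(f"height={amp:.4f}, centre={mu:.4f}, width(sigma)={sigma:.4f}, area={area:.4f}")
```

On real traces, noise and baseline drift make the log-space fit sensitive to low-signal points, so a weighted or nonlinear least-squares fit would likely be preferable; the sketch only shows the underlying idea.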
- This article is part of the themed collection: Emerging Investigator Series