Machine learning-assisted data-driven optimization and understanding of the multiple stage process for extraction of polysaccharides and secondary metabolites from natural products†
Abstract
Extraction process optimization is currently based on only a few features, without regard to their differing trends or to a panoramic view of the extraction process. Comprehensive evaluation and understanding are hard to establish because of the small number of experiments. Here, machine learning-assisted optimization is demonstrated as a way to better understand the complex extraction process, based on data from an orthogonal experimental design (OED). From the two perspectives of panoramic and specific characteristics, several observations are adopted to evaluate extraction performance: quantitative ¹H NMR, HPLC fingerprint, molecular weight, yield of dry extract and content of components. The closeness of the relationship between the influencing factors and extraction performance is described by grey relational analysis. With a radial basis function neural network (RBFNN), a nonlinear regression fit is developed for every observation and influencing factor. A genetic algorithm is then introduced for multi-objective optimization, and Pareto fronts are obtained. To select the best combination of water extraction and ethanol extraction processes, the combinations of Pareto front points from the two processes are listed and ranked using CRITIC-TOPSIS. Finally, the ideal extraction is characterized by molecular weight, monosaccharide composition and UHPLC-MS/MS. In verification of the machine learning results against the OED experiments, the rates of change of all observations range from 1.33% to 30.11%, which confirms that machine learning-assisted optimization performs better than conventional OED. The molecular weight ranges from 61.5 to 594.9 kDa, with some values exceeding the measuring range, and mannose and glucose are the most abundant monosaccharides in the polysaccharide from the ideal extraction. In addition, 160 components are identified via UHPLC-MS/MS.
In conclusion, machine learning (ML) is a powerful tool for predicting and understanding extraction processes, thus accelerating the development of eco-friendly extraction processes.
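The CRITIC-TOPSIS ranking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact normalization or weighting variant used, so the code below assumes one common formulation: CRITIC weights from min-max scaled criteria (standard deviation times summed conflict, 1 − Pearson correlation), followed by TOPSIS with vector normalization. The candidate values, the three criteria and all function names are hypothetical and are not taken from the article.

```python
import math
from statistics import mean, pstdev


def _columns(matrix):
    """Transpose a row-major decision matrix into a list of columns."""
    return [list(col) for col in zip(*matrix)]


def _pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def critic_weights(matrix, benefit):
    """CRITIC weights: contrast (std. dev.) times conflict (1 - correlation)."""
    scaled = []
    for col, is_benefit in zip(_columns(matrix), benefit):
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            scaled.append([0.0] * len(col))
        elif is_benefit:
            scaled.append([(v - lo) / span for v in col])
        else:  # cost criterion: smaller raw value maps to a larger score
            scaled.append([(hi - v) / span for v in col])
    info = []
    for cj in scaled:
        conflict = sum(1.0 - _pearson(cj, ck) for ck in scaled)
        info.append(pstdev(cj) * conflict)  # population standard deviation
    total = sum(info)
    if total == 0:  # degenerate case: fall back to equal weights
        return [1.0 / len(info)] * len(info)
    return [c / total for c in info]


def topsis_scores(matrix, weights, benefit):
    """Relative closeness to the ideal solution (1.0 = ideal, 0.0 = anti-ideal)."""
    weighted = []
    for col, w in zip(_columns(matrix), weights):
        norm = math.sqrt(sum(v * v for v in col))
        weighted.append([w * v / norm if norm else 0.0 for v in col])
    best = [max(c) if b else min(c) for c, b in zip(weighted, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(weighted, benefit)]
    scores = []
    for row in zip(*weighted):
        d_best = math.sqrt(sum((v - b) ** 2 for v, b in zip(row, best)))
        d_worst = math.sqrt(sum((v - w) ** 2 for v, w in zip(row, worst)))
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.0)
    return scores


# Hypothetical candidates (rows) scored on three illustrative criteria:
# dry extract yield (benefit), component content (benefit), solvent use (cost).
candidates = [
    [30.0, 3.0, 4.0],
    [25.0, 5.0, 3.0],
    [20.0, 4.0, 2.0],
    [28.0, 4.5, 3.5],
]
benefit = [True, True, False]

weights = critic_weights(candidates, benefit)
scores = topsis_scores(candidates, weights, benefit)
# Candidate indices ordered from best to worst closeness score.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print("weights:", weights)
print("ranking (best first):", ranking)
```

In a workflow like the one described, each row would correspond to one combination of Pareto front points from the water and ethanol extraction processes, and the criteria would be the predicted observations. Because the weights come from the data rather than from expert judgement, criteria that vary more and correlate less with the others receive more influence in the ranking.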
- This article is part of the themed collection: 2023 Green Chemistry Hot Articles