Multivariate statistical analysis methods in QSAR
Abstract
This review focuses on the multivariate statistical methods currently used in quantitative structure–activity relationship (QSAR) studies. The mathematical methods for constructing QSAR models include linear and non-linear methods that address regression and classification problems in the data. The most widely used methods for classification or pattern recognition are principal component analysis (PCA) and hierarchical cluster analysis (HCA), which serve as exploratory data analysis methods. The regression analysis tools covered are artificial neural networks (ANN), principal component regression (PCR), partial least squares (PLS) and classification and regression trees (CART). Pattern recognition approaches are also described: k-nearest neighbors (kNN), soft independent modelling of class analogy (SIMCA) and support vector machines (SVM). Finally, various applications are presented to further characterize these techniques.
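To make one of the pattern recognition approaches concrete, the sketch below shows the core idea of kNN classification. A compound is assigned the majority class among its k closest training compounds in descriptor space. This is a minimal illustration, not the implementation used in any study covered by this review. The function name `knn_predict`, the choice of Euclidean distance, and the two-descriptor toy data with "active"/"inactive" labels are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical helper: classify `query` by majority vote among its
    k nearest training compounds (Euclidean distance in descriptor space)."""
    # Pair each training compound's distance to the query with its class label,
    # then sort so the closest compounds come first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the labels of the k closest compounds.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Invented toy data: each row is a compound described by two (already scaled)
# descriptors; labels are illustrative activity classes, not real measurements.
train_X = [(1.0, 0.2), (1.2, 0.1), (0.9, 0.3), (3.0, 2.1), (3.2, 1.9), (2.8, 2.2)]
train_y = ["active", "active", "active", "inactive", "inactive", "inactive"]

print(knn_predict(train_X, train_y, (1.1, 0.25)))  # prints "active"
print(knn_predict(train_X, train_y, (3.1, 2.0)))   # prints "inactive"
```

Because kNN relies directly on distances, descriptors measured on different scales would normally be autoscaled before classification. The choice of k is a tuning parameter that is typically selected by cross-validation. `math.dist` requires Python 3.8 or later.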