N. Kritikos, A. Tsantili-Kakoulidou, Y. Loukas
Oct 1, 2019
Journal
Chromatographia
Abstract
The core idea of this study was to use principal components (PCs) as a substitute for the original dataset of molecular descriptors for the derivatization products of amino acid analogues with n-propyl chloroformate. The derivatives were described by a total of over 1200 different molecular descriptors, split into groups according to their principal chemical characteristic and summarized through principal component analysis (PCA). The form of the chemical space was modeled by PCA and optimized through a supervised procedure, whose quality was tested by internal leave-more-out cross-validation, yielding the Q²X metric (> 0.90 on most sets). The independent “spaces” thus formed contributed a total of 63 variables (their PCs). The potential of these variables was evaluated in two independent tests, namely the construction of quantitative structure–retention relationships for two different chromatography systems (gas and liquid), based on published experimental data for those systems. The first model was developed with projection to latent structures (PLS), the second with multilinear regression (MLR). In both cases, the new descriptor set for the derivatives produced models of good quality (Q²Y ≥ 0.9), validated both internally and against an external test set. That the PCs can be used alongside other variables in simpler modelling methods such as MLR attests to their potential in multivariate calibration models without inflating the dataset unduly. Regardless of the modelling method selected, the data support their use in building predictive models for analytical purposes.
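The workflow summarized in the abstract (block-wise PCA of grouped descriptors, then regression of retention on the resulting PC scores, judged by a cross-validated Q²) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the two descriptor blocks, their sizes, the number of PCs kept per block, and the helper names `block_pca_scores` and `q2_leave_many_out` are hypothetical, and this is not the authors' software or their supervised optimization procedure. For brevity the PCA is fitted once on all samples and only the regression step is cross-validated.

```python
import numpy as np

def block_pca_scores(X, n_components):
    """Autoscale one descriptor block and return its leading PC scores."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant descriptors stay at zero instead of dividing by 0
    Z = Xc / std
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def q2_leave_many_out(T, y, n_groups=5, seed=0):
    """Q2 = 1 - PRESS / SS_tot, with MLR refitted for each held-out group."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    press = 0.0
    for held_out in np.array_split(idx, n_groups):
        train = np.setdiff1d(idx, held_out)
        A = np.column_stack([np.ones(len(train)), T[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(len(held_out)), T[held_out]])
        press += np.sum((y[held_out] - B @ coef) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in data (assumption): 40 compounds, two descriptor blocks
# of 40 descriptors each, both driven by a few hidden latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(40, 4))
block_a = latent[:, :2] @ rng.normal(size=(2, 40)) + 0.05 * rng.normal(size=(40, 40))
block_b = latent[:, 2:] @ rng.normal(size=(2, 40)) + 0.05 * rng.normal(size=(40, 40))

# Hypothetical retention values depending linearly on the latent factors.
y = 2.0 * latent[:, 0] - latent[:, 1] + 0.5 * latent[:, 2] + 0.05 * rng.normal(size=40)

# Each block contributes its own PCs; the concatenated scores replace
# the 80 original descriptors as regression variables.
T = np.hstack([block_pca_scores(block_a, 2), block_pca_scores(block_b, 2)])
q2 = q2_leave_many_out(T, y)
print(f"{T.shape[1]} PC variables, cross-validated Q2 = {q2:.3f}")
```

Because the scores within a block are mutually orthogonal, a handful of PCs per block can stand in for many correlated descriptors, which is what keeps an MLR model well conditioned in this kind of setup.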