A chemometric study in the area of feasible solution of an acid–base titration of N-methyl-6-oxyquinolone

Multivariate curve resolution methods aim at recovering the underlying chemical components from spectroscopic data on chemical reaction systems. In most cases the spectra and concentration profiles of the pure components cannot be uniquely determined from the given spectral data. Instead continua of possible factors exist. This fact is known as rotational ambiguity. The sets of all possible pure component factors can be represented in the so-called area of feasible solutions (AFS). This paper presents an AFS study of the pure component reconstruction problem for a series of UV/Vis spectra taken from an acid–base titration of N-methyl-6-oxyquinolone. Additional information on the equilibrium concentration profiles for a varying acid concentration is taken from fluorescence measurements. On this basis chemometric duality arguments lead to the construction of a unique final solution.


Introduction
In chemistry and catalysis we are oen faced with the problem that the spectral signatures of reactants, intermediates and products overlap. A proper analysis of UV/Vis, uorescence or infrared spectra as well as deriving kinetics requires a clear model-independent decomposition method. Herein we present a general tool that is based on multivariate curve resolution methods in order to recover pure component spectra and simultaneously the concentration proles along the reaction coordinate. The concentration proles can depend on the time (progress of a reaction) or can depend on a changing temperature, acidity and so on. In most cases, a multi-component system cannot be uniquely determined from the given spectra. Mathematically, continua of possible factors exist, including the chemically correct solution. In our method, all possible component factors are represented in the so-called area of feasible solutions (AFS).
Exemplarily, we present an AFS study on the UV/Vis spectra of a recently published dye system, which has only been characterized by a two-component analysis. 1 The new approach goes much further, which is shown for the titration grades at an acid-base reaction of the dye. Now, systems including more than two components can be decomposed easily. All mathematically possible solutions are displayed in the AFS. With the additional information on the equilibrium concentration proles for a varying acid concentration taken from uorescence measurements, the AFS can be reduced to one distinct solution. For the given dye system the concentration proles have been achieved and the chemical reaction could be described properly.
The AFS approach provides a comfortable graphical user interface and any programming is superuous. For time dependent measurements reaction kinetics and thermodynamic properties could be derived. Concentration dependent studies such as titrations allow the determination of equilibrium constants, here the acid constant.

Multivariate curve resolution
Multivariate curve resolution (MCR) methods aim at extracting the contributions of the underlying sources from a given data set. An important application in chemometrics is the case in which the spectroscopic observation of a chemical reaction system has yielded a matrix $D \in \mathbb{R}^{k \times n}$ of absorption values on a time × frequency grid. Therein $k$ is the number of measured spectra and $n$ is the number of spectral channels of each spectrum. The problem is to find the underlying spectra and concentration profiles of the pure components. The Lambert-Beer law in matrix notation relates the pure component recovery problem to the nonnegative matrix factorization problem

$$D = C S^T. \qquad (1)$$

Proper nonnegative matrix factors $C \in \mathbb{R}^{k \times s}$ and $S \in \mathbb{R}^{n \times s}$ can be interpreted in a way that the $s$ columns of $C$ are the concentration profiles of the $s$ pure components and the columns of $S$ are the associated pure component spectra, see e.g. ref. 2 and 3. If additional information on the reaction system is available, for example some pure component spectra or concentration profiles, then this can simplify the construction of proper matrix factors $C$ and $S$, see e.g. ref. 4 and the references therein.
For an overview on chemometric methods for solving the MCR problem see the monographs [2,3]. The MCR-ALS method [5,6], which is based on alternating least squares (ALS), is very important. Without claiming any completeness we would also like to mention the window factor analysis [7], the evolving factor analysis [8-11] and the algorithms described in ref. 12 and 13. Here we focus on MCR methods which use a singular value decomposition (SVD) of the matrix $D$ [2,3,14], see Sec. 2.1. All these MCR methods suffer from the fact that the nonnegative matrix factorization problem (1) typically has continua of possible solutions $(C, S)$. This fact is known as "rotational ambiguity" of the solution [15-19]. Soft-modeling (regularization) or even hard constraints (e.g. by kinetic models) are proper tools for reducing the rotational ambiguity, see e.g. ref. 2 and 3. In the best case these additional constraints are sufficiently restrictive so that a unique solution can be determined.
An approach for a systematic investigation of the rotational ambiguity is to get access to the set of all nonnegative factorizations of the form (1) for the given spectral data matrix $D$. A low-dimensional representation of this set is called the area of feasible solutions (AFS), see e.g. ref. 16, 17, 20 and 21. Within the AFS setting it is possible to adjoin extra information on the matrix factors, for example known concentration profiles or spectra, in a very transparent way. By means of duality arguments, see ref. 4 and 22-24, this additional information can be used in order to restrict the AFS and to visualize the mutual influence of a given spectrum on the dual concentration profiles and vice versa [25-27].
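The factorization and its ambiguity can be illustrated with a small numerical sketch. All numbers below are hypothetical toy values (not the titration data), and the transformation matrix `T` is chosen by hand so that both transformed factors stay nonnegative.

```python
import numpy as np

# Hypothetical toy system: k = 5 spectra, n = 4 channels, s = 2 components.
progress = np.linspace(0.0, 1.0, 5)

# Columns of C: concentration profiles (neither profile reaches zero).
C = np.column_stack([1.0 - 0.8 * progress, 0.2 + 0.8 * progress])  # (k, s)

# Columns of S: pure component spectra.
S = np.array([[1.0, 0.0],
              [0.8, 0.2],
              [0.3, 0.9],
              [0.0, 1.0]])                                          # (n, s)

# Lambert-Beer law in matrix notation: D = C S^T.
D = C @ S.T

# Rotational ambiguity: a regular matrix T yields a second factorization
# with the same product, C_alt S_alt^T = C T^{-1} T S^T = D.
T = np.array([[1.0, 0.1],
              [0.1, 1.0]])
C_alt = C @ np.linalg.inv(T)
S_alt = S @ T.T

print(np.allclose(C_alt @ S_alt.T, D))          # same data matrix
print((C_alt >= 0).all(), (S_alt >= 0).all())   # both factors nonnegative
```

Both `(C, S)` and `(C_alt, S_alt)` are nonnegative and reproduce the same `D`, so the data alone cannot distinguish between them.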

Contents and organization of the paper
In this paper we analyze a series of spectra taken from an acid-base titration of the highly sensitive dye N-methyl-6-oxyquinolone as an acidometer in acetonitrile. First we analyze the ambiguity of the MCR solution. It turns out that considerable ambiguities exist for one spectrum and also for one profile of equilibrium concentrations in dependence on the acid concentration. The application of the so-called closure constraint, namely a mass balance, does not lead to a unique solution. Additional information (namely fixed pure component spectra in combination with fluorescence data) is used in order to construct the final solution. The software FACPACK [17,28] is used for all computations. The final pure component decomposition is validated against the results of a rank annihilation analysis and a kinetic-model-based factorization [29,30]; see also the related rank-1 downdates [31].
The paper is organized as follows: Section 2 introduces SVD-based MCR techniques, the AFS approach for representing the rotational ambiguity and the related duality principles for the solution of the spectral recovery problem. The implementation of these methods in the FACPACK software is briefly reviewed in Sec. 3. The chemometric analysis for an acid-base titration is contained in Sec. 4.

Chemometric pure component recovery
Next the AFS and the related duality principles are briefly explained. The starting point is the SVD-based construction of factorizations $D = CS^T$.

SVD-based construction of pure component factorizations
From a mathematical point of view the factorization (1) is a nonnegative matrix factorization of $D$. Typically, the dimensions $k$ and $n$ of $D$ are much greater than the number $s$ of the underlying chemical components. For an appropriate value of $s$ (typical values are $s \le 7$) the factors $C$ and $S$ are computed by means of a truncated SVD of the data matrix [14]. The truncated SVD has a noise-filtering effect and reads $D = U \Sigma V^T$ with matrices $U \in \mathbb{R}^{k \times s}$ and $V \in \mathbb{R}^{n \times s}$ that have orthonormal columns. Further, $\Sigma \in \mathbb{R}^{s \times s}$ is a diagonal matrix with the singular values on its diagonal. According to ref. 2, 3, 14 and 32 the factors $C$ and $S$ can be represented within the truncated bases of left and right singular vectors by means of a basis transformation matrix $T \in \mathbb{R}^{s \times s}$ as follows

$$D = U \Sigma V^T = \underbrace{U \Sigma T^{-1}}_{C} \, \underbrace{T V^T}_{S^T}. \qquad (2)$$

Thus $C = U \Sigma T^{-1}$ and $S = V T^T$ are representations of the $s(k + n)$ matrix elements of $C$ and $S$ by the much smaller number of $s^2$ matrix elements of $T$ (and its inverse $T^{-1}$). Sec. 2.3 shows how these degrees of freedom can be reduced from $s^2$ to $(s - 1)s$. For general $T$ the matrices $C$ and $S$ are called abstract factors and can have large negative entries. The next step is to extract only the nonnegative, chemically relevant factors.
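The representation of the factors by a transformation matrix can be sketched numerically as follows. The data are random synthetic matrices and the hand-picked `T` is an arbitrary regular matrix, both chosen only for illustration; the sketch shows that any regular `T` reproduces `D`, and that one particular `T` recovers the factors used to generate the data.

```python
import numpy as np

# Synthetic, noise-free three-component data (illustrative only).
rng = np.random.default_rng(0)
k, n, s = 12, 50, 3
C_true = rng.random((k, s))
S_true = rng.random((n, s))
D = C_true @ S_true.T

# Truncated SVD: keep the s dominant singular triplets.
U_full, sing, Vt_full = np.linalg.svd(D, full_matrices=False)
U = U_full[:, :s]
Sigma = np.diag(sing[:s])
V = Vt_full[:s, :].T

# An arbitrary regular T gives abstract factors, possibly with negative entries.
T = np.array([[1.0, 0.5, -0.2],
              [1.0, -0.3, 0.4],
              [1.0, 0.1, 0.6]])
C_abs = U @ Sigma @ np.linalg.inv(T)
S_abs = V @ T.T

# The rows of T are the expansion coefficients of the spectra in the basis V.
# Projecting the generating spectra onto V therefore yields the matching T.
T_true = S_true.T @ V
C_rec = U @ Sigma @ np.linalg.inv(T_true)
S_rec = V @ T_true.T

print(np.allclose(C_abs @ S_abs.T, D))   # any regular T reproduces D
print(np.allclose(C_rec, C_true), np.allclose(S_rec, S_true))
```

The search for chemically meaningful factors is thereby reduced to a search over the $s^2$ entries of `T`.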

Computation of nonnegative factors
SVD-based MCR methods on the basis of eqn (2) aim at constructing a proper matrix $T$ so that $C$ and $S$ are the chemically correct factors. The matrix $T$ can be determined by solving a minimization problem for an objective function which is a weighted combination of penalty/regularization functions [32-35]. The scalar weight factors enable a proper balance between the different constraints and steer the factorization process. However, the resulting factors $C$ and $S$ sometimes depend on the constraint presetting of the MCR program. This is an unwanted effect. The minimization of an objective function is usually not sufficient in order to enforce only one solution, ideally the chemically correct one.
Instead of aiming at a single solution, which potentially is only an approximation, it is also possible to compute the sets of all possible nonnegative factors $C$ and $S$ with $D = CS^T$. Such approaches are band boundary computations [36,37] and the AFS computation.

The area of feasible solutions
The AFS is a low-dimensional representation of either all nonnegative spectra, namely the possible columns of $S$, or all nonnegative concentration profiles, namely the columns of $C$, with $D = CS^T$. In other words, we consider all concentration profiles and all spectra which can be extended to nonnegative matrices $C$ and $S$ in $D = CS^T$ [16,17,20,21,38-40]. These feasible columns of $C$ or $S$ with either $k$ or $n$ components can be described in a low-dimensional way by the rows of $T$. The reason for this is that the matrix elements of $T$ in eqn (2) are the expansion coefficients of the spectra with respect to the basis of the right singular vectors. The associated concentration profiles depend in a similar way on $T^{-1}$. Without loss of generality the desired nonnegative spectrum can be assumed to be located in the first column of $S = VT^T$, cf. eqn (2). The associated expansion coefficients are given by the first row of $T$ with the form

$$T = \begin{pmatrix} 1 & x \\ \mathbf{1} & W \end{pmatrix}, \qquad (3)$$

where $x$ is a row vector with $s - 1$ components and $W$ is an $(s - 1) \times (s - 1)$ submatrix of $T$. The first column of $T$ equals the all-ones vector; see ref. 17 for the justification of this implicit scaling. On the basis of these arrangements the AFS for the spectral factor is defined as

$$\mathcal{M}_S = \{ x \in \mathbb{R}^{s-1} : \text{there exists } W \in \mathbb{R}^{(s-1) \times (s-1)} \text{ with } \operatorname{rank}(T) = s \text{ and } C, S \ge 0 \}. \qquad (4)$$

The AFS comprises all $(s - 1)$-dimensional vectors $x \in \mathbb{R}^{s-1}$ which can be completed by a matrix $W \in \mathbb{R}^{(s-1) \times (s-1)}$ so that $T$ by eqn (3) is a regular matrix and $C, S \ge 0$. Similarly, one can also define the AFS $\mathcal{M}_C$ which represents all feasible nonnegative columns of $C$, see ref. 39.
The AFS sets $\mathcal{M}_S$ and $\mathcal{M}_C$ for two-component systems can easily be constructed [14,15,41]. Several geometric and numerical algorithms are known to compute the AFS for $(s = 3)$-component systems [16,17,20,21,28,42-44]. For $(s = 4)$-component systems the AFS computation is much more difficult and only few publications are available [18,44]. See also ref. 38 and 39 for an overview on the AFS topic.
Here three-component systems ($s = 3$) are in the focus of interest. For this case the polygon inflation method [17,28] is an effective, very fast and easy-to-control algorithm for AFS computations. In Sec. 3 the software module complementarity & AFS (3 components) of FACPACK is used in order to construct the AFS. It is also used to reduce the ambiguity successively by involving additional system information, see Sec. 2.4.
Up to now we have rigorously assumed nonnegativity of $D$, $C$ and $S$. However, experimental spectral data after preprocessing steps, e.g. background subtraction, may contain small negative entries. The rank-$s$ truncation of the data matrix by the SVD can be a further source of small negative entries. Then small negative entries should also be accepted in $C$ and $S$, as otherwise the product $CS^T$ cannot reproduce small negative entries of $D$.
To this end the polygon inflation algorithm uses a control parameter $\varepsilon \ge 0$ on the acceptance of small negative entries of $C$ and $S$. The feasibility check works as a lower bound on the relative magnitude of negative entries. If $\operatorname{rank}(T) = s$, then a violation of the inequalities

$$\frac{\min_j C_{ji}}{\max_j |C_{ji}|} \ge -\varepsilon \quad \text{and} \quad \frac{\min_j S_{ji}}{\max_j |S_{ji}|} \ge -\varepsilon, \qquad i = 1, \dots, s, \qquad (5)$$

is used for a penalization in the minimization process.
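A relaxed nonnegativity test of this kind can be sketched as follows. The sketch assumes that the check compares, column by column, the smallest entry with the largest absolute entry; the function name `is_feasible` is hypothetical and this is not the FACPACK implementation.

```python
import numpy as np

def is_feasible(M, eps):
    """Return True if every column i of M satisfies
    min_j M[j, i] / max_j |M[j, i]| >= -eps (assumed form of the check)."""
    col_min = M.min(axis=0)
    col_scale = np.abs(M).max(axis=0)
    if np.any(col_scale == 0.0):
        return False  # a zero column cannot represent a component
    return bool(np.all(col_min / col_scale >= -eps))

# Hypothetical factor with one slightly negative entry.
C_example = np.array([[1.0, 2.0],
                      [-1e-5, 3.0]])

print(is_feasible(C_example, eps=2e-4))  # small negative entry is tolerated
print(is_feasible(C_example, eps=0.0))   # strict nonnegativity rejects it
```

A larger `eps` accepts more candidate factors, which is why increasing the parameter enlarges the computed AFS.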

Duality underlying the factors C and S
The factorization problem $D = CS^T$ is sometimes accompanied by a certain pre-knowledge of parts of the factors. For instance, a spectrum of a reactant or a reaction product might be known, or it is possible to determine the concentration profile of a chemical component. A further case is that a frequency window is known in which some of the chemical components are absent. This information on the columns of $C$ and/or $S$ can be exploited in order to reduce the rotational ambiguity of the solution. The reason for this is that the constraints of nonnegativity of $C$ and $S$ and the equality $D = CS^T$ imply restrictions on $C$ if $S$ is partially given and vice versa. These mutual constraints are related to the duality principle or complementarity theory [4,22-24,26]. The underlying idea for the detailed analysis, which is explained in ref. 4, is based on eqn (2) where $C$ and $S$ are coupled via the matrix $T$. If for example one pure component spectrum is given, then an associated row of $T$ can be determined. Due to the equation $T^{-1}T = I_s$, a known row of $T$ implies linear and affine constraints on the columns of $T^{-1}$. By $C = U \Sigma T^{-1}$ this results in linear, respectively affine, constraints for the columns of $C$. An extreme case is that all but one of the spectra are given. Then the concentration profile of the remaining/complementary chemical component is uniquely determined except for positive scaling.
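The coupling of the two factors through $T$ and $T^{-1}$ can be made concrete with a small sketch for $s = 3$. The matrix `T` below is an arbitrary regular example, not taken from the titration data. The sketch checks the linear and affine constraints that one known row of $T$ imposes, and shows that two known rows fix the remaining column of $T^{-1}$ up to scaling.

```python
import numpy as np

# Arbitrary regular example matrix (hypothetical values).
T = np.array([[1.0, 0.5, -0.2],
              [1.0, -0.3, 0.4],
              [1.0, 0.1, 0.6]])
T_inv = np.linalg.inv(T)

# A known spectrum of component 1 fixes the first row of T.
t1 = T[0]

# Affine constraint on column 1 of T^{-1}, linear constraints on the others.
print(np.isclose(t1 @ T_inv[:, 0], 1.0))
print(np.isclose(t1 @ T_inv[:, 1], 0.0), np.isclose(t1 @ T_inv[:, 2], 0.0))

# If the spectra of components 1 and 2 are known (two rows of T), the third
# column of T^{-1} must be orthogonal to both rows. For s = 3 this direction
# is given by the cross product, so it is unique up to scaling.
direction = np.cross(T[0], T[1])
print(np.allclose(np.cross(direction, T_inv[:, 2]), 0.0))  # parallel vectors
```

Since $C = U \Sigma T^{-1}$, fixing a column of $T^{-1}$ up to scaling fixes the corresponding concentration profile up to scaling.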

Reduction of the AFS by duality arguments
The linear and affine constraints due to known parts of $C$ or $S$ can be visualized in the AFS [25-27]. The reduced ambiguity expresses itself in a reduced size of the AFS after taking into consideration the known parts of $C$ or $S$. The reduction of the ambiguity is analyzed in this paper for the three-component system of an acid-base titration, see Sec. 4. For this system we demonstrate how a known spectrum of one of the components (this spectrum is represented by a certain point in the AFS) restricts by duality arguments the $s - 1$ concentration profiles of the two remaining chemical components. In the AFS of the concentration factor these components are located in an $(s - 2)$-dimensional affine hyperplane. This hyperplane is (in a mathematical sense) dual to a given fixed point in the spectral AFS. To be explicit, the dual affine hyperplane of a three-component system for the case of a given spectrum is a line in the concentrational AFS. Similar relations hold in the reversed direction. For an $(s = 4)$-component system a given point in the spectral AFS is dual to a plane in the concentrational AFS and vice versa. See ref. 25 and 45 for more details on these relations and for the mathematical formulas underlying this duality of points and affine hyperplanes.
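The point-line duality for $s = 3$ can be checked numerically. The sketch below assumes that a point of the concentrational AFS is obtained from a column of $T^{-1}$ by scaling its first entry to one and keeping the remaining two entries; this convention and the example matrix are assumptions made for illustration, see ref. 25 and 45 for the precise setting.

```python
import numpy as np

# Arbitrary regular example matrix with all-ones first column (hypothetical).
T = np.array([[1.0, 0.5, -0.2],
              [1.0, -0.3, 0.4],
              [1.0, 0.1, 0.6]])
T_inv = np.linalg.inv(T)

# Fixed point in the spectral AFS: first row of T without the leading 1.
x = T[0, 1:]

def concentration_afs_point(column):
    """Scale a column of T^{-1} so that its first entry is 1 (assumed
    convention) and return the remaining s - 1 coordinates."""
    return column[1:] / column[0]

# AFS coordinates of the two remaining components.
y2 = concentration_afs_point(T_inv[:, 1])
y3 = concentration_afs_point(T_inv[:, 2])

# Orthogonality of the first row of T to these columns gives 1 + x . y = 0,
# so both points lie on one straight line in the concentrational AFS.
residuals = np.array([1.0 + x @ y2, 1.0 + x @ y3])
print(np.allclose(residuals, 0.0))
```

The line $\{y : 1 + x \cdot y = 0\}$ depends only on the fixed spectral point `x`, which is why selecting a point in one AFS plot determines a line in the other.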

Data analysis with FACPACK
The chemometric analysis in Sec. 4 uses the software package FACPACK, which provides a convenient MatLab graphical user interface (GUI) for AFS computations for two-, three- and four-component systems. The software is available on the FACPACK homepage [46]. In particular we utilize the FACPACK module complementarity & AFS (3 components) that serves to construct a pure component decomposition on the basis of the two AFS sets for the factors $C$ and $S$. Known parts of the factors can be identified in the AFS. The program uses duality arguments, see the complementarity theorem [4], in order to visualize the correlations of the factors $C$ and $S$ interactively. This approach reduces the rotational ambiguity of the nonnegative matrix factorization problem drastically.
The steps of the chemometric analysis are illustrated by Fig. 1 and 2, which show screen-shots of this program applied to the UV/Vis data of Sec. 4. First the spectral data is loaded into the program (see step 1 in Fig. 1). Certain control parameters can be set in an optional step (see step 2 in Fig. 1). The AFS sets are drawn after checking the AFS box (see step 3). The chemometric pure component reconstruction is started by selecting the radio button first (see step 4). Then the mouse pointer can be moved through the concentrational AFS. Simultaneously the concentration profile which belongs to the AFS coordinates under the mouse pointer is drawn. Any solution can be locked by clicking the left mouse button. The selected solution in the concentrational AFS is linked to a straight line in the spectral AFS (by duality arguments). This blue straight line in Fig. 1 represents a significant restriction on the feasible spectral profiles.
Then Fig. 2 (upper screen-shot) demonstrates how a second concentration profile is determined. Once again, duality arguments result in restrictions in the spectral AFS, see the green straight line. The point of intersection of these two straight lines uniquely determines the spectrum of one chemical component. Finally, the screen-shot in the lower part of Fig. 2 illustrates how the pure component decomposition is completed after determining a third concentration profile. The user then has the option to refine the decomposition by releasing any of the concentration or spectral profiles and to modify it until a complete optimal solution is found.

Fig. 1 A screen-shot of the graphical user interface of the FACPACK module complementarity & AFS (3 components). A first concentration profile is constructed. The example data set is explained in Sec. 4. The construction steps are explained in Sec. 3. The boundaries of the two AFS sets for $C$ and $S$ are drawn in black in the two lower plots. The user can move the mouse pointer through the AFS and the associated spectrum or concentration profile is shown simultaneously. By pushing the left mouse button, a certain solution can be fixed. The different scaling in the plot of $\mathcal{M}_C$ compared to the AFS plots in Fig. 6-8 is explained by the fact that the matrix $\Sigma$ is taken into account here, but is omitted in Fig. 6-8.

Fig. 2 Following Fig. 1, these two screen-shots demonstrate the construction of the second (upper screen-shot) and of the third (lower screen-shot) concentration profile. The duality theory increasingly limits the feasible solutions, which means that the rotational ambiguity is reduced.

The FACPACK software uses the polygon inflation algorithm for AFS computations and provides all the chemometric software tools within a convenient graphical user interface. This includes interfaces for the data import, for an optional data preprocessing and for the data export. Other AFS computation methods are the so-called Borgen plots [20,21] and the recent dual Borgen plot approach [45,47]. Alternatively, the rotational ambiguity underlying MCR factorizations can be illustrated in terms of the bands of feasible profiles [36,37] and by using the MCR-Bands software. The steps of our chemometric analysis can be applied in similar form to the sets of feasible bands.

Control parameter setting
The numerical AFS computation is controlled by several parameters, e.g. stopping criteria for the optimization procedure, the boundary precision, a bound on the sum of least squares of the objective function, the maximal number of cycles of the optimization and the maximal number of function evaluations. For the detailed description of these parameters see ref. 17. The program provides default values for all parameters which ensure in most cases a stable, precise and fast AFS computation. Finally, the parameter $\varepsilon$ in eqn (5) controls the size of acceptable negative entries of $C$ and $S$ and thus the size of the AFS. Increasing $\varepsilon$ results in an expansion of the AFS sets. For all computations we used $\varepsilon = 2 \times 10^{-4}$.

Chemometric analysis of an acid-base titration
Here we study a series of UV/Vis spectra of a titration of N-methyl-6-oxyquinolone (MQz) in acetonitrile with the trifluoromethanesulfonic superacid. The acid is denoted by HA. The series of spectra is plotted in Fig. 3. The AFS is constructed for the spectral factor and for the factor of equilibrium concentration profiles in dependence on the acid concentration. Finally, a unique pure component factorization is constructed by involving information on known pure component spectra and fluorescence measurements of the equilibrium concentrations. The addition of information for the two matrix factors $C$ and $S$ distinguishes the present approach from other works such as ref. 25-27. See Sec. 4.4 for the details. Fig. 3 shows the series of spectra in a 2D and a 3D plot.

Experiment and spectral data
The three dominant chemical components of this reaction system are the chemical indicator MQz, the dimer species [MQzHMQz]$^+$ and the protonated indicator MQc$^+$; in addition HA and A$^-$ are present. The latter two components contribute only to a negligible extent to the absorption in the analyzed wavelength interval. The reaction equations with kinetic constants read

The singular values and the singular vectors indicate a relatively large signal-to-noise ratio for the given spectra $D$. This is a good basis for a successful construction of the two AFS sets and also for exploiting the underlying duality of the factors $C$ and $S$. The polygon inflation method is applied with $\delta = \varepsilon_b = 10^{-4}$ and $\varepsilon = 2 \times 10^{-4}$ as upper bounds on the relative size of negative entries. The AFS sets indicate a small ambiguity of the solution for the two components MQz and MQc$^+$ (in blue and red) in the spectral AFS, since the area of the associated subsets of the AFS is very small. The subsets of the concentrational AFS which belong to the components MQz (blue) and [MQzHMQz]$^+$ (green) are also small. Thus the associated series of spectra and concentration profiles only show a small variation. In other words, the rotational ambiguity is of moderate magnitude. Only the pure component spectrum of [MQzHMQz]$^+$ and the concentration profile of MQc$^+$ contain considerable ambiguities.

Bands of possible proles representing the ambiguity
The rotational ambiguity inherent to an AFS can also be represented by drawing the associated bands of feasible spectra and the band of feasible equilibrium concentration proles. This is done in Fig. 7. The colored crosses in the le two AFS plots mark positions for which the associated spectra or concentration proles are drawn. More than one point for one chemical component is considered in the spectral AFS of [MQzHMQz] + and in the concentrational AFS of MQc + .
The series of spectra and concentration proles are drawn in Fig. 7. The upper row of plots show the spectral AFS and their spectral bands. The color code for the AFS sets and the bands is as follows. Blue color is used for MQz, green for [MQzHMQz] + and red for MQc + . The subsets of the AFS-sets with the largest area, namely [MQzHMQz] + in the spectral AFS and MQc + in the concentrational AFS, are associated to the series of the feasible spectra (green) and concentration proles (red), see the centered column of Fig. 7.
The two plots in the centered column of Fig. 7 show the bands of the possible factors in a non-scaled form (as obtained by the FACPACK soware). Two spectra (MQz and MQc + ) and one concentration prole ([MQzHMQz] + ) are almost uniquely determined; the latter by duality. The equilibrium concentration prole of (MQz) has a very low rotational ambiguity. However, the spectrum of [MQzHMQz] + and the concentration prole of MQc + show a considerable ambiguity.
The two plots in the right column of Fig. 7 show the same proles aer an application of a scaling with respect to the socalled closure constraint, which is the mass balance underlying. 7 The scaling constants are computed in the sense of least-squares along the full acid concentration axis. This results in concentration values of MQc + equal to the initial value c 0 ¼ 9.84269 Â 10 À4 at the highest acid concentration. A side effect of this scaling is that an additional scaling ambiguity appears for the concentration prole of the dimer [MQzHMQz] + (green curves). In other words the prole of this component has been qualitatively determined, but not quantitatively. With the given information on the system this  remaining ambiguity cannot be broken up. For the related triples of concentration proles in the right lower plot of Fig. 7 the squared sum of errors has approximately the value 4.1 Â 10 À8 . Therein the index i runs through the 12 different values of the acid concentration for which the equilibrium concentrations of the three components MQz, [MQzHMQz] + and MQc + are to be determined.
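A least-squares scaling of this kind can be sketched as follows. The sketch assumes the simplest form of the closure constraint, namely that the three scaled profiles sum to $c_0$ at every acid concentration; the mass balance actually used in the paper may weight the components differently, and the profiles below are synthetic stand-ins, not the measured data.

```python
import numpy as np

c0 = 9.84269e-4                      # initial concentration from the text
grid = np.linspace(0.0, 1.0, 12)     # 12 hypothetical titration steps

# Synthetic profiles that satisfy the assumed closure exactly:
# (1 - t)^2 + 2 t (1 - t) + t^2 = 1 for every t.
closed = c0 * np.column_stack([(1.0 - grid) ** 2,
                               2.0 * grid * (1.0 - grid),
                               grid ** 2])

# MCR delivers each column only up to an unknown positive factor.
unscaled = closed / np.array([2.0, 0.5, 4.0])

# Scaling constants in the least-squares sense along the full axis:
# minimize || unscaled @ alpha - c0 * 1 ||_2 over alpha.
alpha, *_ = np.linalg.lstsq(unscaled, np.full(grid.size, c0), rcond=None)
scaled = unscaled * alpha

print(alpha)                # recovers the factors 2.0, 0.5 and 4.0
print(scaled[-1, 2] / c0)   # third component reaches c0 at the last step
```

In this synthetic case the closure determines all three factors. When a family of feasible profiles is scaled in this way, each member receives its own factors, so the absolute level of a profile can still vary across the family.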

Involvement of additional chemometric information
In order to attain a nal and unique pure component decomposition some additional information on the chemical reaction system is to be added. This is done in two steps: First the pure component spectrum of MQz is set to be equal to the rst measured spectrum D(1,:). The justication for this is that the concentration vector of the three chemical components for an initial acid concentration of zero equals (c 0 ,0,0). Furthermore, the last spectrum D(12,:) is set to the pure component spectrum of component MQc + . This xes two points in the spectral AFS. The underlying duality uniquely determines  This journal is © The Royal Society of Chemistry 2018 (up to scaling) the equilibrium concentration prole of the dimer [MQzHMQz] + , see the le column of plots in Fig. 8. As explained in Sec. 4.3 some ambiguity still remains.
The second step is that uorescence measurements make it possible to determine the equilibrium concentration proles of MQz (blue curve) and MQc + (red curve). Once again the duality of these known parts of the factor C to the factor S uniquely determines the spectrum of the dimer [MQzHMQz] + . This completes the pure component recovery. All results are shown in Fig. 8.
If perturbations are ignored, thenD is a rank-1 matrix which contains in its columns only multiples of the spectrum of the dimer [MQzHMQz] + . For experimental spectral data we must take into account noise and other perturbations. Thus a singular value decomposition ofD is applied. The le and the right singular vectors corresponding to the largest singular value are the desired equilibrium concentration prole and spectrum of [MQzHMQz] + . The proles are plotted in Fig. 9 by dashed lines. The results of the AFS-based approach are plotted by solid lines. Relevant difference must be stated in particular for the spectrum of the dimer [MQzHMQz] + which attains close to 500 nm a minimal negative component of À1.7 Â 10 À2 by rank annihilation. The AFS-based approach prevents negative entries of such a magnitude. There are also differences between the equilibrium concentration proles of the two methods.
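The extraction of the dominant singular pair can be sketched on synthetic data. The sketch assumes that the reduced matrix (named `D_red` here) is formed by subtracting the contributions of the two known components from the data matrix; the surviving text does not show this construction, so it is an assumption, and all numbers are random stand-ins.

```python
import numpy as np

# Synthetic three-component data with weak noise (illustrative only).
rng = np.random.default_rng(1)
k, n = 12, 40
C = rng.random((k, 3))
S = rng.random((n, 3))
D = C @ S.T + 1e-6 * rng.standard_normal((k, n))

# Assumed rank annihilation step: remove components 1 and 3, which are
# taken as known, so that only the middle component remains.
D_red = D - np.outer(C[:, 0], S[:, 0]) - np.outer(C[:, 2], S[:, 2])

# The dominant singular triplet approximates the remaining rank-1 part.
U, sing, Vt = np.linalg.svd(D_red, full_matrices=False)
c_est = sing[0] * U[:, 0]
s_est = Vt[0]

# Singular vectors are defined only up to sign; choose the positive one.
if s_est.sum() < 0:
    c_est, s_est = -c_est, -s_est

# Direction cosine between the estimated and the generating spectrum.
cosine = s_est @ S[:, 1] / np.linalg.norm(S[:, 1])
print(sing[1] / sing[0])   # small ratio: D_red is numerically rank 1
print(cosine)              # close to 1
```

Nothing in the SVD enforces nonnegativity, so with larger perturbations the dominant singular vectors can acquire negative entries.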
In order to judge which of the approaches provides the better results, we have tted the kinetic model eqn (6) to the computed pure component factors each for the two computational approaches. Such kinetic models are well known to be stringent decision makers. 40 For these computations we have set k À1 ¼ k À2 ¼ 0 as the triuoromethanesulfonic superacid does not let expect a notable back reaction. The results are plotted in Fig. 10. They clearly indicate that the AFS-based decomposition provides the better results. This conclusion is supported by the following relative error values

Conclusion
The ambiguity of the solutions of the pure component factorization problem is a fundamental complication, which is often hidden by the fact that MCR software packages produce only one solution. However, this single solution must be considered to be only a more or less reliable approximation of the true solution. In this study we have shown that a unique pure component decomposition can be gained for the given three

Conflicts of interest
There are no conicts to declare.  This journal is © The Royal Society of Chemistry 2018