Open Access Article

Kennedy Putra Kusumo,a Kamal Kuriyan,a Shankarraman Vaidyaraman,b Salvador García Muñoz,b Nilay Shah a and Benoît Chachuat *a

a The Sargent Centre for Process Systems Engineering, Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK. E-mail: b.chachuat@imperial.ac.uk
b Synthetic Molecule Design and Development, Lilly Research Laboratories, Eli Lilly & Company, Indianapolis, IN 46285, USA

First published on 22nd July 2022
The predictive capability of any mathematical model is intertwined with the quality of the experimental data collected for its calibration. Model-based design of experiments helps compute maximally informative campaigns for model calibration. In the early stages of model development, however, it is crucial to account for model uncertainties in order to mitigate the risk of uninformative or infeasible experiments. This article presents a new method to design optimal experimental campaigns subject to hard constraints under uncertainty, alongside a tractable computational framework. The framework involves two stages, whereby the feasible experimental space is sampled using a probabilistic approach in the first stage, and a continuous-effort optimal experiment design is determined by searching over the sampled feasible space in the second stage. The tractability of this methodology is demonstrated on a case study involving the exothermic esterification of propionic anhydride, with a significant risk of thermal runaway during experimentation. An implementation is made freely available based on the Python packages DEUS and Pydex.
This paper focuses on the latter objective. The term model-based design of experiments (MBDoE) was coined to refer to techniques that focus on nonlinear process models, and it has found numerous applications in chemical engineering and process systems engineering; see the seminal survey by Franceschini and Macchietto.10 A particular emphasis in this literature has been on gradient-based approaches to MBDoE and the design of dynamic experiments. By contrast, the concepts of experimental supports and efforts, which are ubiquitous in the statistical experiment design literature,8,9,11,12 have been somewhat foregone in engineering applications, and only recently have effort-based approaches to MBDoE been investigated.13,14
A primary challenge with nonlinear process models is that optimal campaigns become dependent on the unknown model parameter values that the experiments are meant to help estimate in the first place. The basic technique entails a local design approach, whereby the model is linearized around a nominal model parameter value. The challenge of inappropriate nominal values leading to uninformative experiments can be counteracted to a certain extent with robust experimental designs.15–22 A second challenge, which remains to be fully addressed, is that of infeasible experiments. To ensure the safety and economic feasibility of experimental runs, many systems must comply with operational constraints. Inappropriate nominal model parameter values exacerbate the risk of designing infeasible experiments under such constraints.
The methodology developed in this paper aims at the construction of experimental campaigns for nonlinear processes subject to hard operational constraints. Both its importance and relevance have been highlighted in the process systems engineering literature. For instance, the position paper by Bonvin et al.23 stresses the need for satisfying safety constraints when applying experimental design techniques to pilot- and full-scale processing plants, or in developing physiological models for personalized health care. More generally, this problem is of paramount importance whenever data are collected from a constraint-sensitive system. One class of relevant problems is model maintenance experiments, which are run to maintain, update or re-calibrate a model over time; e.g., to capture catalyst degradation,24,25 fouling,26,27 aging of heart valves,28 etc. Another class is model addition experiments, which are run to calibrate extra model components introduced to account for previously unaccounted phenomena; e.g., with applications in troubleshooting, retrofitting, or decommissioning.29–31 These additional data, taken directly from the full-scale processing plant, are required to capture atypical process behaviour before drawing any conclusions.
Precursor work by Galvanin et al.32 involved backing off of the active process constraints in a maximally-informative experiment to ensure its feasibility against process disturbances and modeling uncertainties. Mesbah and Streif33 formulated a robust OED problem with chance constraints, for which they applied the Cantelli–Chebyshev inequality34 to obtain a deterministic surrogate of the chance constraints in combination with polynomial chaos expansion for propagating the uncertainty to the process dependent variables. The formulation by Telen et al.35 also maximizes the expected value of the information criterion subject to probabilistic constraints by applying the sigma-point method36 to approximate the uncertainty propagation. More recently, Petsagkourakis and Galvanin37 used Gaussian Process (GP) surrogates to establish a safe experimental space and to approximate the expected value and variance of the experimental information criterion.
Due to the computational complexity of considering uncertainty and hard constraints when designing experiments, the aforementioned contributions have focused on the problem of designing a single experimental run, as opposed to experimental campaigns comprising multiple parallel runs. And despite this restricted view, the resulting optimization problems remain computationally challenging, especially for systems with large uncertainty. The standard alphabetic information criteria38 lead to nonconvex optimization problems for many practically relevant process models, whose solution with gradient-based techniques39 can result in suboptimality. Numerical failure caused by singular information matrices is also commonplace.
The main objective of this paper is to propose a computational methodology for ensuring feasibility of constrained experimental campaigns under parametric model uncertainty, which relies on the synergistic and tractable combination of sampling and convex optimization techniques, and proceeds in two stages: (i) the discretization of the feasible experimental space using an adaptive sampling method; followed by (ii) the design of feasible experimental campaigns using a convex, continuous-effort OED formulation that searches over the discretized space. Section 2 starts with a formalization of the OED problems subject to process constraints, adopting the Bayesian paradigm to establish the restricted experimental space. An illustrative example is presented that involves a response surface model before the two-stage computational methodology is described in section 3, including numerical results of the illustrative example to highlight key aspects of the solution framework. This is complemented by a simulated case study in section 4 involving the design of a dynamic experimental campaign for the exothermic esterification of propionic anhydride and butan-2-ol, with risks of thermal runaway during experimentation. Another simulated case study for re-calibration of a model of a jacketed stirred tank reactor that is undergoing a doubling in throughput is provided as part of the ESI.†
Consider a process with nx experimental controls x ∈ 𝒳 ⊆ ℝnx and ny measured responses y ∈ ℝny, described by the model

y = η(θ, x) + ε.    (1)

The model parameters θ in the mathematical model η are typically inferred from noisy experimental data using frequentist or Bayesian inference and are thus inherently uncertain. For simplicity, the measurements are assumed to be independent throughout, and the measurement error ε is assumed to have zero mean, E(ε) = 0, with uncorrelated and homoscedastic covariance Σy. Although it is given in closed form in (1), note that the model η may also be defined implicitly via a set of nonlinear algebraic or differential equations without loss of generality.
The experiments must furthermore comply with operational constraints involving the experimental controls and measured responses,

f(x, y) ≤ 0.    (2)

Substituting the model-predicted responses η(θ, x) for y in these constraints gives the parameter-dependent constraints

g(θ, x) := f(x, η(θ, x)) ≤ 0.    (3)
One can define the restricted experimental space in various ways. Ignoring the model uncertainty first, a nominal restricted experimental space is readily defined as
𝒳0 := {x ∈ 𝒳 : g(θ0, x) ≤ 0},    (4)

with θ0 a nominal value of the model parameters.
To mitigate the risk of constraint violations, one may adopt the Bayesian paradigm and describe the model parameter uncertainty with a probability distribution π(θ); for instance, obtained as the posterior distribution from a Bayesian inference problem with an appropriate likelihood function and prior distribution. In this way, the feasibility probability of the constraints can be predicted as
ℙ[g(θ, x) ≤ 0 | π(θ)] = ∫{θ: g(θ,x)≤0} π(θ) dθ.    (5)

The probabilistic restricted experimental space at a given reliability level α ∈ (0, 1] is then defined as

𝒳α := {x ∈ 𝒳 : ℙ[g(θ, x) ≤ 0 | π(θ)] ≥ α}.    (6)
Another definition for the restricted space can adopt a set-based paradigm, which considers a bounded set Θ representing all possible values θ may take. The set-based restricted experimental space below is the counterpart to the probabilistic set (6),
𝒳Θ := {x ∈ 𝒳 : g(θ, x) ≤ 0, ∀θ ∈ Θ}.    (7)
Although both provide valid definitions for a restricted experimental space, our focus herein is on the probabilistic restricted space 𝒳α. In particular, the definition of 𝒳α in (6) assumes that any structural mismatch present in the model η is captured through the probability distribution π of the model parameters θ. Note that other methods directly account for the presence of structural model mismatch, for instance by further restricting the experimental space to subdomains where the model is deemed reliable,46,47 but these are outside the scope of this work.
An experimental campaign of Nt runs is described by a design ξ, which collects the Nc distinct experimental conditions x1, …, xNc together with the fraction pi of the total runs, or effort, allocated to each of them,

ξ := {(x1, p1), …, (xNc, pNc)}.    (8)

The set of experimental controls {x1, …, xNc} is called the support of the experimental design ξ, denoted by supp(ξ). In an actual campaign a support xi must not have an effort of zero; that is, a support xi ∈ supp(ξ) is a unique run within ξ with non-zero effort pi > 0 by convention.
If ri is the number of times xi is repeated, then the effort pi = ri/Nt is a quantized variable that may only take values of positive multiples of the fraction 1/Nt. Designs where the pi's are quantized are called exact designs and require Nt to be specified. To emphasize this, exact designs are denoted with a subscript ξNt. In contrast, a continuous design allows pi to vary continuously in the standard simplex ∑ipi = 1 with pi > 0, ∀i. Continuous designs are independent of Nt and can be viewed as exact designs with Nt → ∞. The distinction between continuous and exact thereby vanishes as Nt gets larger.
An optimal design ξ* is a design that maximizes some scalar information criterion ϕ,
ξ* ∈ arg maxξ ϕ(ξ),    (9)

where the maximization is carried out over the number of supports Nc ∈ ℕ+, the experimental controls x1, …, xNc ∈ 𝒳, and the corresponding efforts p1, …, pNc ∈ (0, 1]. Note that (9) is applicable to both continuous and exact designs.
Herein, the focus is on the problem of computing experimental campaigns which are optimal for model calibration. The information criterion ϕ is often expressed as a function of the Fisher information matrix (FIM), M, given by8

M(ξ, θ) := ∑i pi A(xi, θ),    (10)

where A(xi, θ) is the atomic information matrix contributed by a single run at xi,

A(x, θ) := F(x, θ)ᵀ Σy−1 F(x, θ),    (11)

with F(x, θ) the regressor matrix that is specific to the model at hand.
For models that are linear in the parameters θ, the regressor matrix F becomes independent of θ and is readily available through rearrangement of the model eqn (1) into its standard form,
y = F(x)θ + ε.    (12)
For nonlinear models, a local approximation of the regressor matrix is obtained from the parametric sensitivities of the predicted responses,

F(x, θ) := ∂η/∂θ (θ, x).    (13)

The local design (LD) approach then evaluates the information criterion at a nominal parameter value θ0,

ϕ(ξ) := Φ(M(ξ, θ0)),    (14)

where Φ is one of the optimality criteria listed in Table 1.
Table 1

| Optimality criterion | Φ(M), standard form | Φ(M), eigen-form |
|---|---|---|
| Determinant | log(det(M)) | log(Πk λk) |
| Average or trAce | −Tr(M−1) | −∑k λk−1 |
| Extreme | λmin(M) | mink λk |
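To make the quantities in (10) and (11) and the criteria of Table 1 concrete, the short NumPy sketch below evaluates the effort-weighted FIM of a design and the three listed criteria. It is purely illustrative: the regressor matrices, the noise covariance Σy and the effort normalization are assumed to be supplied by the user in the form shown.

```python
# Illustrative sketch (not from the paper's codes): atomic information matrices (11),
# effort-weighted FIM (10), and the D/A/E criteria of Table 1.
import numpy as np

def atomic_information(F_x, sigma_y):
    """A(x, theta) = F^T Sigma_y^{-1} F for one candidate run."""
    return F_x.T @ np.linalg.solve(sigma_y, F_x)

def fisher_information(atomics, efforts):
    """Effort-weighted sum of atomic information matrices."""
    return sum(p * A for p, A in zip(efforts, atomics))

def d_criterion(M):
    return np.linalg.slogdet(M)[1]         # log(det(M))

def a_criterion(M):
    return -np.trace(np.linalg.inv(M))     # -Tr(M^{-1})

def e_criterion(M):
    return np.linalg.eigvalsh(M).min()     # smallest eigenvalue of M
```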
Analogous to defining the probabilistic restricted experimental space 𝒳α, the average design (AD) approach makes use of a prior probability distribution π(θ) of the uncertain model parameters when designing experiments.22 It maximizes the expected value of the information content over the uncertain scenarios,

ϕ(ξ) := Eπ(θ)[Φ(M(ξ, θ))].    (15)

Notice that robust experimental design techniques such as the AD approach still rely on linearization, as the experimental information at a given θ is computed through the linearized regressor matrix F(·, θ) in (13). Other techniques that do not rely on linearization exist, such as the Bayesian experimental design method,16 but are outside the scope of this work.
The OED problem subject to process constraints under parametric uncertainty can thus be formulated as

maxξ ϕ(ξ)    (16)

s.t. ℙ[g(θ, x) ≤ 0 | π(θ)] ≥ α, ∀x ∈ supp(ξ).    (17)
Depending on the distribution π(θ) and the confidence level α, the set 𝒳α could be empty. This would indicate that the process model is too uncertain, and there is simply not enough information to ascertain a safe experiment at the desired confidence level α using that model. The options available are to lower the confidence level α, modify the experimental setup itself, and/or gather extra information about the system through other means before designing experimental runs.
The illustrative example considers two experimental controls x = (x1, x2) in the experimental space 𝒳 := [−1, 1]2 and a single response y, predicted by the following response-surface model:

y = θ1 + θ2x1 + θ3x2 + θ4x1x2 + θ5x1² + θ6x2² + ε.    (18)

The model involves six uncertain parameters θ ∈ ℝ6, whose uncertainty is described by a multivariate Gaussian distribution 𝒩(μθ, Σθ). The measurement error ε is assumed to be Gaussian distributed with zero mean, E(ε) = 0, and homoscedastic variance Σy = 1.
The task at hand is to compute an optimally informative campaign ξ* to estimate θ in (18), in the D-optimal sense (Table 1). Since (18) is linear with respect to θ, notice that the LD and AD approaches are equivalent.
A complicating factor is the presence of a safety concern whenever the response y goes outside a known range of [1.85, 3], thereby imposing hard operational constraints. The model (18) is used to predict the response y, allowing for the probabilistic restricted experimental space to be defined as‡
𝒳α := {x ∈ [−1, 1]2 : ℙ[1.85 ≤ F(x)θ ≤ 3.00 | 𝒩(μθ, Σθ)] ≥ α},    (19)

with regressor vector F(x) := [1, x1, x2, x1x2, x1², x2²].
The resulting constrained OED problem is given by
maxξ log det( ∑i pi F(xi)ᵀ F(xi) )    (20)

s.t. ℙ[1.85 ≤ F(xi)θ ≤ 3.00 | 𝒩(μθ, Σθ)] ≥ α, i = 1, …, Nc,    (21)

where the search is over the number of supports Nc, the support points x1, …, xNc ∈ [−1, 1]2 and the associated efforts p1, …, pNc.
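For this linear response-surface model with Gaussian parameter uncertainty, the chance constraint in (19) and (21) admits a closed-form evaluation (cf. footnote ‡). The sketch below is one such evaluation, assuming the constraint acts on the noise-free prediction F(x)θ as written in (19); the function names are illustrative.

```python
# Sketch: feasibility probability of the bounds 1.85 <= F(x)theta <= 3.00 when
# theta ~ N(mu_theta, Sigma_theta), so that F(x)theta is itself Gaussian.
import numpy as np
from scipy.stats import norm

def regressor(x):
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])   # F(x) for model (18)

def feasibility_probability(x, mu_theta, sigma_theta, lower=1.85, upper=3.00):
    f = regressor(x)
    mean = f @ mu_theta                          # mean of F(x)theta
    std = np.sqrt(f @ sigma_theta @ f)           # std of F(x)theta
    return norm.cdf(upper, mean, std) - norm.cdf(lower, mean, std)
```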
Although the model is linear in the uncertain parameters, computing a globally optimal solution to the constrained OED problem (20) and (21) is challenging. This complexity will only be exacerbated in the case of more complex, possibly implicit and dynamic models, as well as higher-dimensional experimental spaces. In response to this, the following section introduces a computationally tractable methodology, based on sampling and convex optimization, for approximating the solution to the general problem (16) and (17).
Fig. 1 Illustration of the two-step computational methodology and information flow within the model calibration cycle.
Leveraging the process model and π(θ), the first stage focuses on establishing the probabilistic restricted experimental space 𝒳α at a given reliability value α. This is achieved through a suitable sampling algorithm that generates a sample of Ns experimental points that satisfy the probabilistic constraints (17),

𝒮α := {x1, …, xNs} ⊆ 𝒳α.    (22)

A failure to generate any such samples provides an indication that the restricted space 𝒳α is indeed empty. One drawback is the loss of continuum within the restricted experimental space, which may lead to sub-optimally informative campaigns. As with any sampling methodology, another drawback is attributed to the curse of dimensionality with the number of experimental controls nx.
The key component for an efficient stage 1, therefore, is the sampling algorithm that is used to generate the experimental samples 𝒮α. To mitigate the loss of continuum and the chances that large parts of 𝒳α are missed by the discretization, the Ns samples in 𝒮α should be as uniformly distributed as possible inside 𝒳α. An efficient sampler will furthermore have a high acceptance rate, in order to minimize the total time spent on drawing samples outside of 𝒳α, and be highly parallelizable to enable reduction in wall-clock time when needed.
The samples 𝒮α are passed on to the second stage, where the focus is on designing the experimental campaign within 𝒮α, under the crucial assumption that 𝒳α is sufficiently well described by 𝒮α. This stage comprises three steps:
(i) The sensitivity analysis step involves computation of the atomic information matrices A(xi, ·) of all samples in 𝒮α, and thus the corresponding regressor matrices F(xi, ·);

(ii) The optimization step solves a convex optimization problem to determine an optimal continuous experimental design ξ* as a set of optimal efforts pi* for each sample xi ∈ 𝒮α; and

(iii) The rounding step apportions the fractional efforts pi* to non-negative multiples of 1/Nt, given the total number Nt of experimental runs.
In practice, the rounded efforts would be used to run the experimental campaign. The experimenter would get to choose the frequency of model parameter updates and whether runs are conducted sequentially or in parallel. The optional updates are depicted with the dotted lines connecting the experimental campaign and the model parameter uncertainty in Fig. 1. In principle, as soon as a single measurement is taken, the probability distribution π(θ) could be updated, e.g., through Bayesian inference. This would update the restricted space 𝒳α in turn, and thus the experimental design. Any parameter update in a nonlinear model will affect stage 2 regardless of whether or not stage 1 was affected, because the atomic information matrix A depends on θ. In other words, a different experimental campaign may be obtained even when 𝒮α is unaffected by the parameter update. In the case of a linear model, by contrast, the atomic information matrix A is independent of θ. A different experimental campaign, therefore, may only be obtained when the experimental samples 𝒮α have changed.
An ideal model calibration cycle involves conducting parameter updates as frequently as possible, thereby minimizing the chances of uninformative or infeasible experimental campaigns. However, this frequency is limited by the computational cost of completing an iteration of parameter update and experimental design within the restricted space, on top of actually running the physical experiments. In practice, a termination condition for this iteration could entail checking whether the variances in model parameters or predicted responses are within acceptable levels.
Following an outline of the methodology and a discussion on how it fits into the overall model calibration cycle, we describe and discuss both stages in more detail next.
The first stage aims to generate samples from the probabilistic restricted experimental space 𝒳α defined in (6). We consider an extension of nested sampling (NS) for feasible space characterization, as described in Kusumo et al.40 and its improvements in Kusumo et al.49 An excellent feature of NS for our methodology is the ability to specify a minimal number of samples to draw from 𝒳α as a stopping criterion. This allows for easier control over the complexity of the subsequent solution steps. Furthermore, NS is especially designed to draw samples uniformly from a target set with a high acceptance rate, which meets the desiderata of an ideal sampler for stage 1 as discussed before. Only a brief account of NS is given next for the sake of brevity, while interested readers are referred to previous publications40,49 for further details on this approach.
NS for feasible space characterization40 is an adaptive sampling technique that evolves a set of live points through regions of increasing feasibility probability, until a stopping criterion is met. The number of live points is chosen as Ns here. The sampling starts with a uniform sample of these Ns live points within the unrestricted experimental space 𝒳. The feasibility probability of each live point is approximated by discretizing the probability distribution π(θ) into a set 𝒮π comprising Nπ parameter scenarios θj,

ℙ[g(θ, x) ≤ 0 | π(θ)] ≈ (1/Nπ) ∑j 𝕀[g(θj, x)],    (23)

where 𝕀[·] is the indicator function, with 𝕀[v] = 1 when vk ≤ 0, ∀k, and 𝕀[v] = 0 otherwise. Since one can always transform a set of scenarios with unequal weights into an equally weighted one, we shall assume that the scenarios are equally weighted hereafter. To arrive at a truly Bayesian approach, 𝒮π may be chosen as the sampled posterior returned by a Bayesian estimation algorithm on some initial experimental data using an appropriate prior. Nevertheless, NS is applicable regardless of how 𝒮π was generated.
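The Monte Carlo estimator (23) is straightforward to implement, as in the minimal sketch below; it assumes equally weighted parameter scenarios and a constraint function g(θ, x) returning a vector of constraint values, and the function name is illustrative rather than taken from DEUS.

```python
# Sketch of the scenario-based feasibility probability (23).
import numpy as np

def feasibility_probability_mc(x, theta_scenarios, g):
    """Fraction of scenarios theta_j for which all components of g(theta_j, x) are <= 0."""
    feasible = [np.all(np.asarray(g(theta_j, x)) <= 0.0) for theta_j in theta_scenarios]
    return float(np.mean(feasible))
```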
During the NS iterations, the live points are replaced by new points. Np proposals are generated per iteration, and their feasibility probabilities are again estimated using (23). Each proposal replaces the current live point with the lowest feasibility probability if it has a higher feasibility probability; otherwise it is rejected. Rejected replacements are called ghost points, and replaced live points are called dead points. The effectiveness of NS hinges on the strategy chosen to propose these replacements. The strategy used here samples within an enlarged ellipsoid around the current live points, as first proposed by Mukherjee et al.50 The enlargement factor of the ellipsoids decreases over the iterations according to two user-defined tuning parameters: the initial enlargement factor and the shrinking rate of this factor. Replacements are done one at a time within a single iteration, following the order in which the points are generated. It is possible, therefore, that a new live point is itself replaced by another one within the same iteration. At the end of an iteration, two stopping criteria are checked: the primary criterion checks whether all live points have feasibility probabilities higher than the target reliability value α; a secondary criterion checks the maximum number of iterations, without which the sampling technique could never terminate when 𝒳α is empty. When the primary stopping criterion is met, the final collection of live points constitutes (at least) Ns samples from 𝒳α. The follow-up work by Kusumo et al.49 implements vectorized function evaluations and two enhancements of this algorithm: (i) increasing the number of live points as the algorithm progresses closer to the target reliability; and (ii) warm-starting the initial live points by conducting a nominal search first.
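For concreteness, the following is a heavily simplified, self-contained sketch of an NS-style feasibility sampler in the spirit of the above description. It is not the DEUS implementation: proposals are drawn from a shrinking axis-aligned box around the live points instead of Mukherjee-style enlarged ellipsoids, and all names and tuning values are illustrative.

```python
# Simplified nested-sampling-style sampler for a probabilistic restricted space.
import numpy as np

def ns_feasible_space(prob_fun, bounds, n_live=500, n_prop=10, alpha=0.95,
                      enlarge0=1.5, shrink=0.99, max_iter=5000, seed=0):
    """prob_fun(x): feasibility probability of x, e.g. the estimator (23)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T            # experimental space bounds
    live = rng.uniform(lo, hi, size=(n_live, lo.size))    # initial live points
    prob = np.array([prob_fun(x) for x in live])
    enlarge = enlarge0
    for _ in range(max_iter):                             # secondary stopping criterion
        if prob.min() >= alpha:                           # primary stopping criterion
            break
        centre = 0.5 * (live.max(axis=0) + live.min(axis=0))
        half = 0.5 * enlarge * (live.max(axis=0) - live.min(axis=0))
        p_lo = np.maximum(lo, centre - half)
        p_hi = np.minimum(hi, centre + half)
        for _ in range(n_prop):                           # replacements one at a time
            x_new = rng.uniform(p_lo, p_hi)
            p_new = prob_fun(x_new)
            worst = prob.argmin()
            if p_new > prob[worst]:                       # replace the worst live point
                live[worst], prob[worst] = x_new, p_new
        enlarge *= shrink                                 # shrink the proposal region
    return live[prob >= alpha]                            # samples at reliability alpha
```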
Stage 2 relies on the sampled restricted space 𝒮α obtained from stage 1 and proceeds in three steps, which are further discussed below. This dependence of stage 2 on stage 1 highlights the importance of an accurate characterization of the restricted space, and in turn the need for selecting an accurate probability distribution π. It is worth re-emphasizing the benefits of adopting a Bayesian approach in this context, where π corresponds to the posterior distribution from an inference problem.

The sensitivity analysis step computes the atomic information matrices A(xi, ·) for each experimental sample xi ∈ 𝒮α passed on from stage 1, and possibly for each uncertainty realization in 𝒮π in the AD approach (15). These information matrices are computed from the sensitivities of the response model η with respect to the model parameters θ, as given in (13). In principle, the analysis can start as soon as a first sample is drawn from the restricted space 𝒮α. However, for ease of conveying computational complexity and run-times, we do not use this enhancement herein. The atomic information matrices are furthermore completely independent from one another, allowing them to be computed entirely in parallel.
For process models that are linear with respect to the model parameters and given in the standard form (12), computation of each atomic information matrix (11) is cheap, requiring a single matrix multiplication. For nonlinear models available in explicit form (1), one can apply either symbolic differentiation or automatic differentiation51,52 to compute the linearized regressor matrices (13). For models given in implicit form, such as systems of differential-algebraic equations (DAEs), a finite-difference scheme may be used; for instance, simple forward or central differentiation or Richardson extrapolation53,54 for higher accuracy. Although readily implemented, this approach can become computationally prohibitive for large process models. Instead, methods and software for sensitivity analysis of DAE models may be used. They have reached a stage of maturity where both forward and adjoint sensitivities can be computed reliably and efficiently.55–58 They rely on numerical integration techniques tailored to the special structure of the sensitivity system, such as staggered or simultaneous corrector methods for forward-in-time sensitivity integration and state interpolation and check-pointing for backward-in-time adjoint integration, and can be conveniently combined with automatic differentiation tools.52
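As an illustration of the simplest option mentioned above, the sketch below approximates the regressor matrix (13) by forward finite differences on an explicit response function η(θ, x); the function name and step-size rule are arbitrary choices, and dedicated sensitivity solvers remain preferable for DAE models.

```python
# Forward finite-difference approximation of F(x, theta) = d eta / d theta.
import numpy as np

def regressor_fd(eta, x, theta, rel_step=1e-6):
    """eta(theta, x) returns the ny-vector of predicted responses."""
    theta = np.asarray(theta, dtype=float)
    y0 = np.asarray(eta(theta, x), dtype=float)
    F = np.zeros((y0.size, theta.size))
    for j in range(theta.size):
        h = rel_step * max(abs(theta[j]), 1.0)   # scale step with parameter magnitude
        theta_pert = theta.copy()
        theta_pert[j] += h
        F[:, j] = (np.asarray(eta(theta_pert, x)) - y0) / h
    return F
```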
The optimization step determines an optimal continuous-effort design over the sampled restricted space 𝒮α passed on from stage 1. The optimization problem searches over the experimental efforts pi associated with each experimental point xi ∈ 𝒮α, and its construction relies on the atomic matrices computed in step 2.1. For instance, the local design (LD) problem can be defined as

maxp1,…,pNs Φ( ∑i pi A(xi, θ0) )    (24)

s.t. ∑i pi = 1, pi ≥ 0, i = 1, …, Ns.    (25)

Likewise, the average design (AD) problem is defined as

maxp1,…,pNs (1/Nπ) ∑j Φ( ∑i pi A(xi, θj) )    (26)

s.t. ∑i pi = 1, pi ≥ 0, i = 1, …, Ns,    (27)

where the expectation in (15) is approximated using the parameter scenarios θj in 𝒮π.
Both formulations take advantage of the available discretization 𝒮α of the restricted experimental space to reduce the constrained, typically nonconvex, optimization problem (16) and (17) into a more tractable optimization problem. In particular, the chance constraints (17) are always satisfied by construction (up to the error introduced by the uncertainty discretization 𝒮π). Both cost functions (24) and (26) are furthermore convex whenever the selected criterion Φ is concave, a property shared by the classical A, D and E criteria listed in Table 1. Convexity eliminates potential issues with local optima during optimization and implies quicker solutions when modern convex optimization solvers are used. Another feature is that, while (24)–(27) scale linearly with the number Ns of samples in the restricted experimental space 𝒮α, they are independent of the total number of experimental runs Nt, thereby making the formulation tractable for designing large experimental campaigns.
An important difference with the general OED problem (16) and (17) is that the optimal efforts pi* obtained from (24) and (25) or (26) and (27) are allowed to take a value of zero. But the support of the optimal design ξ* is readily recovered after solving the latter, as the collection of experimental samples in 𝒮α with a nonzero effort,

supp(ξ*) := {xi ∈ 𝒮α : pi* > 0}.    (28)
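For illustration, the continuous-effort LD problem (24) and (25) with the D-criterion can be stated in a few lines of CVXPY, the interface through which MOSEK is called in this work (see the computational tools described later). The sketch below is a generic formulation rather than the Pydex implementation; the AD problem (26) and (27) is obtained by averaging the criterion over the parameter scenarios, and names and thresholds are illustrative.

```python
# Sketch of the convex continuous-effort D-optimal design over Ns candidates.
import cvxpy as cp
import numpy as np

def continuous_d_optimal(atomics):
    """atomics: list of Ns atomic information matrices (n_theta x n_theta arrays)."""
    Ns = len(atomics)
    p = cp.Variable(Ns, nonneg=True)                     # efforts p_i >= 0
    M = sum(p[i] * atomics[i] for i in range(Ns))        # FIM, affine in p
    problem = cp.Problem(cp.Maximize(cp.log_det(M)),     # D-criterion (concave)
                         [cp.sum(p) == 1])               # simplex constraint (25)
    problem.solve(solver=cp.MOSEK)                       # or another conic solver, e.g. SCS
    return p.value

# Support recovery in the spirit of (28): keep candidates with non-negligible effort.
# p_star = continuous_d_optimal(atomics)
# support = np.where(p_star > 1e-4)[0]
```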
Regarding initialization, an effective and generic initial guess that seeks to avoid a singular FIM consists of allocating equal efforts to all candidates, pi = 1/Ns, ∀i. Note that this approach is akin to conducting a practical identifiability (or estimability) analysis59 with exactly one repetition of all Ns experimental candidates: obtaining a regular FIM with these initial efforts provides a confirmation that the model is practically identifiable, while a singular FIM suggests a lack of practical identifiability with the current discretization of the experimental space. In this latter scenario, either fixing the values of certain model parameters or adding more experiment candidates may help recover practical identifiability.
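A minimal sketch of this equal-effort identifiability check is given below; it assumes the atomic information matrices are stacked into a single array, and the rank tolerance is an arbitrary illustrative choice.

```python
# Sketch: practical identifiability check with equal efforts p_i = 1/Ns.
import numpy as np

def is_practically_identifiable(atomics, rtol=1e-10):
    """atomics: array of shape (Ns, n_theta, n_theta)."""
    M = np.asarray(atomics).mean(axis=0)                      # FIM under equal efforts
    tol = rtol * np.linalg.norm(M, 2)
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[0]    # True if the FIM is regular
```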
The focus herein is on the efficient rounding method, defined in terms of the following maximum likelihood ratio.60 Given two experimental designs (whether exact or continuous) ξ1 and ξ2 such that supp(ξ1) ⊆ supp(ξ2) and with respective efforts p1 and p2, the maximum likelihood ratio of ξ1 relative to ξ2 is computed as
εξ1/ξ2 := minx∈supp(ξ1) p1(x)/p2(x).    (29)

This ratio bounds the relative information content of the two designs from below,

ϕ(ξ1) ≥ εξ1/ξ2 ϕ(ξ2).    (30)

The actual rounding efficiency is measured by the corresponding criterion ratio,

κξ1/ξ2 := ϕ(ξ1)/ϕ(ξ2).    (31)
When used to indicate rounding quality, ξ1 and ξ2 take the role of the rounded design ξNt and the continuous design ξ*, respectively. The efficient rounding method guarantees that the rounded design ξNt has a criterion value that is at least εξNt/ξ* times the criterion value of the optimal continuous design ξ*. Another property of this method is that an efficiently rounded design for Nt + 1 runs corresponds to the rounded design ξNt for Nt runs with one extra run assigned to one of the support points of ξ*. In that sense, ξNt is necessarily a subset of ξNt+1. Regarding the actual rounding efficiency, the closer κξNt/ξ* is to 1, the closer the information content of the rounded design ξNt to that of the optimal continuous design ξ*. But the actual rounding efficiency remains a relative measure of the information content, not an absolute one. Although a rounded campaign ξNt+1 will always contain more information than ξNt when using the efficient rounding method, nothing can be said as to the relative magnitude of κξNt+1/ξ* compared to κξNt/ξ*.
A limitation of the efficient rounding method is that it may only be meaningful when the continuous design ξ* has a number of supports Nc ≤ Nt; otherwise, εξNt/ξ* takes a value of zero. When the number of supports Nc > Nt, we may apply the greatest effort rounding procedure in place of the efficient rounding, which involves assigning an equal effort 1/Nt to the Nt supports in ξ* with the largest efforts and an effort of zero to the rest.
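The sketch below illustrates one common formulation of the efficient apportionment procedure, together with the greatest-effort fallback described above. It is an illustrative implementation rather than the exact Pydex routine, and it assumes strictly positive continuous efforts on the supports.

```python
# Sketch of effort apportionment: efficient-rounding-style procedure when the
# number of supports Nc <= Nt, greatest-effort rounding otherwise.
import numpy as np

def apportion(p, n_total):
    """p: continuous efforts over the Nc supports (summing to one, all > 0)."""
    p = np.asarray(p, dtype=float)
    n_support = len(p)
    if n_support > n_total:
        r = np.zeros(n_support, dtype=int)
        r[np.argsort(p)[-n_total:]] = 1        # one run each for the Nt largest efforts
        return r
    r = np.ceil((n_total - 0.5 * n_support) * p).astype(int)   # initial quota
    while r.sum() < n_total:
        r[np.argmin(r / p)] += 1               # add a run to the most under-allocated support
    while r.sum() > n_total:
        r[np.argmax((r - 1) / p)] -= 1         # remove a run from the most over-allocated support
    return r
```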
The main algorithmic settings to choose are: (i) the number of samples Ns used to discretize the restricted experimental space 𝒳0.85; and (ii) the number of Monte Carlo scenarios Nπ to discretize the model uncertainty 𝒩(μθ, Σθ). A simple rule-of-thumb is to start small for both parameters and gradually increase them. A discussion of this process for the illustrative example is reported in the ESI.† The computational statistics with the settings described below are reported in Table 3.
The first stage generates a set of experimental control points from within 𝒳0.85. Running DEUS with Nπ = 1000 uncertainty scenarios and drawing Np = 10 proposals at each nested sampling iteration leads to a sampled restricted space 𝒮0.85 with Ns = 3885 points from 𝒳0.85, illustrated as the blue circles in Fig. 2. Believing in the model (18), therefore, suggests that 𝒳0.85 takes the shape of a 2-dimensional torus. This is a consequence of the model (18) describing a paraboloid and the process constraints 1.85 ≤ y ≤ 3 cutting this paraboloid at two different levels. It is also worth re-emphasizing that although the model is linear in the parameters θ, the corresponding parametric uncertainty still affects the restricted space via the chance constraints (21).
Fig. 2 Continuous D-optimal design for the response-surface model (18) using 3885 experimental candidates drawn from the restricted experimental space 1.85 ≤ y ≤ 3 at reliability level 0.85. Translucent blue circles represent the drawn candidates. The red hollow circles represent the effort allocated to the experimental candidates, their radii being proportional to the corresponding optimal efforts. The solid blue line illustrates the boundaries of the analytical feasible region at reliability level 0.85.
The second stage computes the optimal experimental design within the restricted space 𝒮0.85, and is further divided into three steps. The first step computes the 6-by-6 atomic information matrices A for all Ns experimental samples generated in stage 1, according to (11). Given the simplicity of the model, the regressor matrices F were provided analytically.
The second step of stage 2 determines the optimal efforts of each experimental candidate in order to maximize the D-optimal criterion, by solving the optimization problem (26) and (27). The resulting D-optimal efforts are reported in Table 2 and represented as red hollow circles in Fig. 2. It is worth re-emphasizing that no explicit constraint is imposed on the efforts to limit the number of non-zero efforts in the optimal solution, whose support set is recovered via (28). That this number remains small is a consequence of Caratheodory's theorem, which when applied to experimental designs states that any optimal information matrix M can always be written as a convex combination of at most d = nθ(nθ + 1)/2 = 21 atomic information matrices; see theorems 2.1 and 2.2 of Fedorov and Leonov.9 The constrained D-optimal campaign comprises 14 support points and shows a general pattern where all support points lie on the inner and outer boundaries defined by the constraints y ≤ 3 and y ≥ 1.85, respectively. A larger share of the total experimental effort is allocated to the outer boundary with 9 supports, while the remaining 5 supports lie on the inner boundary.
Table 2

| x1 | x2 | p*a (%) | Nt = 6 | Nt = 7 | Nt = 8 | Nt = 9 | Nt = 10 | Nt = 12 | Nt = 14 | Nt = 20 | Nt = 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| −0.485 | −0.437 | 1.1 | — | — | — | — | — | — | 1 | 1 | 1 |
| 0.129 | −0.524 | 1.4 | — | — | — | — | — | 1 | 1 | 1 | 1 |
| 0.138 | −0.841 | 8.5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |
| −0.113 | −0.624 | 0.6 | — | — | — | — | — | — | 1 | 1 | 1 |
| −0.282 | −0.823 | 8.3 | — | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
| −0.764 | 0.311 | 12.9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 |
| −0.849 | −0.192 | 5.4 | — | — | — | — | 1 | 1 | 1 | 1 | 2 |
| 0.481 | −0.186 | 12.4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 |
| −0.719 | −0.486 | 9.8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 |
| −0.296 | 0.497 | 6.0 | — | — | — | 1 | 1 | 1 | 1 | 1 | 2 |
| −0.401 | 0.188 | 4.7 | — | — | — | — | — | 1 | 1 | 1 | 2 |
| 0.414 | −0.645 | 6.8 | — | — | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
| 0.092 | −0.029 | 11.1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 |
| 0.042 | −0.388 | 11.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 |
| Efficiency bound (%) | | | — | — | — | — | — | — | 50.24 | 70.34 | 81.72 |
| Actual efficiency (%) | | | 98.6 | 95.7 | 95.6 | 95.6 | 93.5 | 95.2 | 93.3 | 98.3 | 99.8 |

a Experiment candidates with efforts below a minimum threshold of 10−4 are reassigned a zero effort and the remaining non-zero efforts are re-normalized so they add up to one.
The third step of stage 2 apportions the continuous design to the desired number of experimental runs Nt. Because the response-surface model (18) has 6 parameters, the theoretical minimum for an invertible FIM is also Nt = 6 experimental runs. Moreover, the continuous D-optimal campaign (Table 2) has 14 support points, so the efficient rounding method60 cannot be applied if Nt < 14, in which case the greatest effort rounding method is applied. Table 2 summarizes the rounded D-optimal designs for 6 to 30 runs. The bottom two rows of Table 2 report the efficiency bound εξNt/ξ* and the actual efficiency κξNt/ξ*. Overall, the actual efficiencies are consistently above 90%, even for a low number of runs Nt. This indicates that the exact design ξNt retains a majority of the information of the optimal continuous campaign ξ*, despite the low values of the corresponding efficiency bounds. Notice also that κξ12/ξ* < κξ6/ξ* and κξ14/ξ* < κξ7/ξ*, an indication that the rounding scheme applied is suboptimal for Nt = 12 or 14. Indeed, more informative exact designs for ξ12 and ξ14 could be obtained by doubling the efforts in ξ6 and ξ7, respectively. The same does not occur for either Nt = 20 or 30, since κξ20/ξ* > κξ10/ξ* and κξ30/ξ* > κξ10/ξ*.
propionic anhydride (A) + butan-2-ol (B) → 2-butyl propionate + propionic acid    (32)
Experiments start with an initial charge of the anhydride (A), containing N0A moles with volume V0A at the given reaction conditions. The alcohol (B) is fed into the reactor at a time-varying volumetric feed rate u (L s−1) and concentration CinB (mol L−1). The anhydride is acidified prior to the initial charge, containing 0.8% of sulfuric acid (relative to the mass of A). Both the initial charge of A and the feed of B are pre-heated to the set reaction temperature before starting the experiments. The esterification reaction is initiated by introducing the alcohol into the reactor and runs for a given batch time τ.
The candidate mechanistic model of the calorimeter posits that the reaction is first-order in each reactant,69
r(t) = k° exp(−Ea/(R T)) (NA(t)/V(t))^α (NB(t)/V(t))^β    (33)

dξA/dt (t) = r(t) V(t)/N0A,  ξA(0) = 0    (34)

dV/dt (t) = u(t),  V(0) = V0A    (35)
NA(t) = N0A[1 − ξA(t)]    (36)

NB(t) = CinB[V(t) − V0A] − N0A ξA(t)    (37)

q̇r(t) = (−ΔHr) r(t) V(t),    (38)

with q̇r (J s−1) the thermal power converted from the chemical reaction, T (K) the reaction temperature, ΔHr (J mol−1) the molar enthalpy of reaction, k° (L, mol, s) the pre-exponential factor, Ea (J mol−1) the activation energy, α and β the reaction orders for A and B, respectively, and R = 8.314 J K−1 mol−1 the universal gas constant.
To quantify the risk of a thermal runaway should the temperature control system fail, a safety constraint is imposed on the cooling failure temperature Tcf—also known as the maximum temperature of synthesis reaction (MTSR). This constraint is expressed in terms of the amount of accumulated non-converted reactant at failure of the cooling system, in order to ensure that a maximum cooling failure temperature Tmax = 405 K is never exceeded even under adiabatic conditions,
Tcf(t) ≤ Tmax, ∀t ∈ [0, τ],    (39)

where the cooling failure temperature Tcf(t) is computed from the current reaction temperature T and the amount NA(t) of accumulated unconverted anhydride, assuming complete adiabatic conversion of the accumulated reactant upon failure of the cooling system.
The goal of the case study is to design maximally informative experiments using the bench-scale calorimeter to precisely estimate the thermokinetic parameters Ea, k°, α, and β while guaranteeing a safe operation. The prior knowledge on the thermokinetic parameters assumes independent and Gaussian distributed uncertainties, given by Ea ∼ 𝒩(8.25 ± 0.41) × 104 J mol−1 and k° ∼ 𝒩(9.72 ± 2.92) × 107 (L, mol, s). These values are based on the model calibration results reported in Ubrich et al.,69 herein with a 2- and 4-fold increase in the uncertainty of Ea and k°, respectively, to add to the conservatism. Additionally, discrete uncertainties are introduced on both reaction orders α and β, which assume a 75% probability of taking a value of 1 and a 25% probability of taking a value of 2.
The calorimeter and the installed spectroscopic probes provide measurements, at regular 50 minute intervals, of the thermal power q̇r and the molar conversion ξA.67 The measurements corresponding to each output are assumed to be independently and identically distributed and follow Gaussian distributions with a standard deviation of 0.01 for the conversion ξA, and 1 J s−1 for the thermal power q̇r.
The reaction temperature T and feedrate u(t) are selected as the experimental controls. In particular, the time-varying feedrate u(t) is parameterized into a piecewise-constant profile on 4 stages,
u(t) = ui, for t ∈ [ti−1, ti), i = 1, …, 4,    (40)

with t0 = 0 and the final switching time equal to the batch time τ = 30 000 s (8 h 20 min).
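As a small illustration of the control parameterization (40), the sketch below evaluates a piecewise-constant feedrate profile; equal stage durations over the batch time are assumed purely for illustration and may differ from the switching times used in the case study.

```python
# Sketch: piecewise-constant feedrate u(t) with four stages over a 30 000 s batch.
def feedrate(t, u_stages, tau=30_000.0):
    """u_stages: the stage values (u1, ..., u4) in L/s; t in [0, tau]."""
    n = len(u_stages)
    stage = min(int(t // (tau / n)), n - 1)   # equal stage durations assumed here
    return u_stages[stage]
```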
The resulting samples are presented in a corner plot in Fig. 3. Of the 1793 candidate experiments drawn by nested sampling, 1080 experiments belong to the target restricted experimental space at 95% reliability (red points), another 599 experiments have a feasibility probability in the range 80–95% (yellow points), 91 a feasibility probability in the range 70–80% (blue points), and the remaining 23 a feasibility probability below 70% (magenta points), with the lowest feasibility probability being 44%. Clearly, a much larger number of samples are drawn inside or near the target level (c. 60% acceptance rate), thereby confirming the suitability of nested sampling in the proposed methodology.
Despite the significant overlap between the various coloured samples, meaningful insights can be drawn from Fig. 3. First of all, notice that a large part of the experiment design space is empty (top 3 rows and columns), particularly areas of high alcohol feed rate throughout the batch where it would be overly risky to operate in the event of a cooling system failure. This aligns with the intuition that a lower alcohol feedrate lowers the risk of thermal runaway. Consequently, the red samples occupy a smaller subspace in the lower feedrate range, u(t) ≤ 1.2 × 10−5 L s−1. Regarding the effect of temperature (bottom row), the cluster of red samples is relatively unaffected by the reaction temperature. This indicates that within the relatively narrow temperature range considered, the reaction temperature plays a smaller role than the alcohol feedrate on the risk of thermal runaway. Notice also that a majority of the red samples are found within the higher temperature range, which may seem counter-intuitive: a higher reaction temperature means a faster rate of reaction and a lower amount NA throughout the batch, thus decreasing the second term of the MTSR to an extent that overpowers the effect of the increase in reaction temperature T, and ultimately permitting a wider range of feed rates to satisfy the safety constraint (39).
The yellow samples occupy a similar, albeit larger, subspace than the red samples. Intuitively, if a lower reliability level were selected, experiments with a larger alcohol feedrate would be accepted, leading to more informative, yet also riskier, experimental campaigns. Despite their small numbers, the blue and magenta samples indicate even riskier combinations of feedrates and temperatures. Located away from the main cluster of red points, they depict a clear separation between safe combinations of low alcohol feedrates and unsafe ones with overly large feedrates.
Overall, the D-optimal average design comprises five supports. The temperature subplots (bottom row of Fig. 5) show that the supports correspond to low- and high-temperature experiments only, that is, no experiments with mid-range temperatures. The optimal experimental effort is split amongst the high and low temperature experiments, with a slightly larger total (c. 62%) allocated to the high temperature experiments. The u1 subplots (first column) show that four of the supports have high u1 values among the restricted experimental space. The u2 subplots (second column) show that three supports have low u2 values, and medium to high u2 values for the other two. By contrast, the u3 and u4 subplots (third and fourth columns) show a wider range of values, with a less discernible pattern in the average D-optimal design as compared to u1 and u2. Four of the five supports are visually at, or near, the boundary of the restricted space in the projections of Fig. 5. This is confirmed by Fig. S7,† which presents the cooling failure temperature Tcf over time for each support.
From Table 4, the actual efficiencies obtained with the rounding scheme (section 3.2.3) are found to be acceptable, even though the efficiency bounds do not provide any guarantee when the total number of runs Nt is lower than the number of supports Nc = 5. It is worth re-emphasizing that we apply the greatest effort rounding for Nt less than Nc, and the efficient rounding otherwise. As expected, the actual efficiencies approach 100% when a large number of runs Nt is considered. Interestingly, a relatively high actual efficiency is obtained for Nt = 3, as compared to the others except Nt = 10, with a rounded design that has 3 supports and an equal effort distribution. However, one should keep in mind that the actual efficiencies do not provide any information on the relative amount of predicted information between the rounded design and the optimal exact design.
The sampling of the restricted experimental space 𝒳α in stage 1 was conducted using the Python code DEUS.40,49 Optimal experimental campaigns within the sampled restricted space 𝒮α as part of stage 2 were computed using the Python code Pydex. The solver MOSEK61 interfaced through CVXPY62 was used to solve the convex continuous-effort optimization problems. For the illustrative example, the atomic matrices needed to construct these optimization problems relied on analytical sensitivities passed by the user. For the case study, the DAE model was implemented using Pyomo63 and Pyomo.DAE,64 and its sensitivity analysis was automated with the solver IDAS, part of the SUNDIALS suite,65 coupled with the automatic differentiation tool CasADi.52 The effort rounding procedure in Pydex implements the efficient rounding method60 and defaults to the greatest effort rounding procedure when the continuous design has more supports than the total number of experimental runs in the campaign. Links to both Python codes DEUS and Pydex can be found in the ESI.†
Table 3

| Case | Sampling: CPU model | Sampling: time (hh:mm:ss) | Sensitivity analysis: CPU model | Sensitivity analysis: # cores | Sensitivity analysis: time (hh:mm:ss) | Optimization: CPU model | Optimization: time (hh:mm:ss) |
|---|---|---|---|---|---|---|---|
| Illustrative example | AMD Ryzen 2600x | 00:00:53 | AMD Ryzen 2600x | 1 | 00:00:11 | AMD Ryzen 2600x | 00:00:15 |
| Restricted, average | Intel Xeon E5-2667 v4 | 01:34:36 | Intel Xeon E5-2697 v2 | 24 | 00:08:31 | Intel Xeon E5-2667 v4 | 03:42:23 |
| Restricted, local | Intel Xeon E5-2667 v4 | 01:34:36 | Intel Xeon E5-2667 v4 | 1 | 00:00:46 | Intel Xeon E5-2667 v4 | 00:03:44 |
| Unrestricted, local | AMD Ryzen 2600x | 00:00:01 | Intel Xeon E5-2667 v4 | 1 | 00:00:44 | Intel Xeon E5-2667 v4 | 00:02:52 |
In experimental design problems where the optimization becomes intractable, one way of lowering both computational time and memory requirement is to consider a local design instead of an average design in stage 2. While this ignores the effect of uncertainty on the predicted information content, the crucial effect of uncertainty on the restricted experimental space is still accounted for as part of stage 1. As reported in Table 3, opting for a local design cuts the optimization wall-time by 60 fold, down from about 4 h to 4 min. Notice that the cut in sensitivity analysis wall-time (from 9 min to 1 min) is significantly less because this step was parallelized for the average design using 24 cores while conducted in serial for the local design.
The consequence of a local design is of course a higher risk for designing a sub-optimal or uninformative experiment campaign. The results of the restricted D-optimal local design are shown in Fig. S8 and S9 and Table S1 of the ESI.† They present large similarities with the restricted D-optimal average design (Fig. 4 and 5, Table 4), implying that a local design may not cause significant information loss in the present case study under the prior probability distribution used for the uncertain model parameters.
Table 4

| T (K) | u1 × 105 (L s−1) | u2 × 105 (L s−1) | u3 × 105 (L s−1) | u4 × 105 (L s−1) | p*a (%) | Nt = 2 | Nt = 3 | Nt = 4 | Nt = 5 | Nt = 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 347.67 | 1.23 | 0.91 | 0.54 | 0.66 | 44.9 | 1 | 1 | 1 | 1 | 4 |
| 348.14 | 1.19 | 0.07 | 0.19 | 1.04 | 5.0 | — | — | — | 1 | 1 |
| 339.33 | 0.55 | 0.07 | 0.26 | 0.69 | 7.1 | — | — | 1 | 1 | 1 |
| 338.34 | 1.16 | 0.63 | 0.27 | 0.16 | 31.3 | 1 | 1 | 1 | 1 | 3 |
| 347.85 | 1.24 | 0.19 | 0.09 | 0.17 | 11.7 | — | 1 | 1 | 1 | 1 |
| Efficiency bound (%) | | | | | | — | — | — | 44.6 | 85.5 |
| Actual efficiency (%) | | | | | | 89.2 | 98.1 | 91.1 | 87.8 | 99.5 |

a Experiment candidates with efforts below a minimum threshold of 10−4 are reassigned a zero effort and the remaining non-zero efforts are re-normalized so they add up to one.
A rather radical simplification to further reduce the computational time would be to ignore the operational constraints altogether, and conduct the experiment design in the unrestricted space instead. The hope is that the resulting optimal experiments would still satisfy the operational constraints with a high enough confidence, with the feasibility probability of the experiments computed a posteriori, but there is naturally no way that this can be guaranteed. In the present case study, this approach entails designing the experiments within the whole range of experimental factor values permitted, T ∈ [338, 348] (K) and u1, u2, u3, u4 ∈ [0.80, 2.80] × 10−5 (L s−1). Doing so essentially eliminates the time required to sample the experimental space, so an unrestricted local design can now be computed within minutes of wall-time. However, opting for an unrestricted local design leads to all the experimental supports having feasibility probabilities lower than 70%, which is considered to be unsafe; the results of this unrestricted D-optimal local design are shown in Fig. S10 and S11 and Table S2 of the ESI.† A possible compromise could be to consider a nominal restricted space in stage 1, as defined in (4), where the effect of the model uncertainty is ignored. To increase the likelihood of satisfying the desired reliability target, one could then choose the nominal parameter values θ0 conservatively. For instance, in the present case study, one could recognize that higher reaction rates are more likely to cause the system to violate the MTSR constraint and adjust the chosen θ0 accordingly.
Applications of the methodology to a simple response-surface model, and then to a more challenging dynamic reaction system described by differential-algebraic equations, demonstrate its suitability and tractability for designing robust experimental campaigns. In particular, the latter depicts a realistic experimental design problem with safety constraints and model uncertainty, which we believe to be currently out of reach for state-of-the-art model-based design of experiment techniques. Although discretization brings advantages, some key challenges with this solution approach also stem from it. In problems with a large number of experimental degrees of freedom, a prohibitively large number of samples may be required to retain a faithful enough representation of the experimental space. A silver lining is that discretization allows one to start with a small number of samples and gradually refine the covering of the experimental space whenever needed and possible. Herein, we utilize a tailored sampling approach adapted from nested sampling in Bayesian inference to efficiently and uniformly sample the feasible experimental space at a desired reliability level. This sampling method furthermore enables control over the number of feasible experimental candidates generated, thereby also controlling the complexity of the subsequent steps. Parallel computation of the atomic information matrices is also demonstrated to be effective in significantly cutting computational times and enabling the solution of larger problems. Although not shown, both stages of the methodology could furthermore be conducted in parallel rather than sequentially in order to further expedite the solution.
We envision a few important follow-ups to further improve the methodology. Approaches to enhance computational tractability need to be explored. For instance, the use of sigma points could help propagate uncertainty more efficiently than standard Monte Carlo simulation. Deployment of surrogate modeling techniques is another promising approach to mitigate computational intractability with expensive process models, while for tackling problems with more experimental controls, recursive schemes that either adapt or augment the set of sampling points could be devised. Lastly, analyzing the trade-offs between experiments conducted sequentially or in parallel, especially in the presence of operational constraints, delineates a promising research area.
Footnotes

† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d1re00465d
‡ An explicit expression of the constraint feasibility probability is given by ℙ[1.85 ≤ F(x)θ ≤ 3.00 | 𝒩(μθ, Σθ)] = ΦN((3.00 − F(x)μθ)/σ(x)) − ΦN((1.85 − F(x)μθ)/σ(x)), with σ(x) := (F(x)ΣθF(x)ᵀ)1/2 and ΦN the standard normal cumulative distribution function.
This journal is © The Royal Society of Chemistry 2022