Combining random walk and regression models to understand solvation in multi-component

Polysaccharides, such as cellulose, are often processed by dissolution in solvent mixtures, e.g. an ionic liquid (IL) combined with a dipolar aprotic co-solvent (CS) in which the polymer does not dissolve. A multi-walker, discrete-time, discrete-space 1-dimensional random walk can be applied to model solvation of a polymer in a multi-component solvent mixture. The number of IL pairs in a solvent mixture and the number of solvent shells formable, x, are associated with n, the model time-step, and N, the number of random walkers, respectively. The mean number of distinct sites visited is proportional to the amount of polymer soluble in a solution. By also fitting a polynomial regression model to the data, we can associate the random walk terms with chemical interactions between components and probe where the system deviates from a 1-D random walk. The ‘frustration’ between solvent shells is given as ln x in the random walk model and as a negative IL:IL interaction term in the regression model. This frustration appears in regime II of the random walk model (high volume fractions of IL), where walkers interfere with each other and the system tends to its limiting behaviour. In the low concentration regime (regime I), the solvent shells do not interact, and the system depends only on IL and CS terms. In both models (and both regimes), the system is almost entirely controlled by the volume available to solvation shells, and thus is a counting/space-filling problem, where the molar volume of the CS is important. Small deviations are observed when there is an IL–CS interaction. The use of two models, built on separate approaches, confirms these findings, demonstrating that this is a real effect and offering a route to identifying such systems.
Specifically, the majority of CSs – such as dimethylformamide – follow the random walk model, whilst 1-methylimidazole, dimethyl sulfoxide, 1,3-dimethyl-2-imidazolidinone and tetramethylurea offer a CS-mediated improvement and propylene carbonate results in a CS-mediated hindrance. It is shown here that systems that are very complex at a molecular level may nonetheless be effectively modelled as a simple random walk in phase-space. The 1-D random walk model allows prediction of the ability of solvent mixtures to dissolve cellulose based on only two dissolution measurements (one in neat IL) and molar volume.


Introduction
Random walks are models of a walker, be it a drunkard, a molecule, or a stock price, exploring space (real space for the drunkard and molecule, phase space for the stock price) randomly. There has been a wealth of research into random walks in many fields. 1,2 Within chemistry and materials science, random walks have been applied to polymer absorption, 3 copolymer structure, 4 electron traps in semiconductors, 5,6 quantum mechanical paths, 2,7 phonons in liquids, 8 phospholipid motion in cell membranes, 9 diffusion in zeolites, 10 motion of insulin granules in cells, 11 intramolecular migration of chemical species along oligomers, 12 rotaxanes diffusing along polymers, 13 and modelling of polymer motion. 14 The most extensive use has been that of a self-avoiding random walk to generate polymer conformations. 2,7,15,16

There are several types of random walks. Brownian motion 17 and bacterial motion 18 are examples of continuous-time, continuous-step random walks, where the walker takes a step at random points in time in a random direction. Time and space can also be discretized, giving the discrete-time, discrete-space random walk model, which is useful for modelling lattices [19][20][21] and is uniquely accessible by cellular automata modelling techniques. 22 Most random walks are Markovian (and can be modelled as Markov chains) because they are memoryless, i.e. the direction (and timing) of a step does not depend on the system's previous history. For Markov chains, each step depends only on the step before. Self-avoiding random walks, contrastingly, must have a complete memory of the system, as the walker must not go where it has been before (it is this rule that allows this type of walk to be used to model polymers: two bonds cannot be in the same place; or quantum mechanics: two electrons cannot occupy the same state). 2 Random walks can also take place in a number of dimensions: 1-D walks are walks along a line, 2-D over a plane, and 3-D over a volume (and there are higher dimensional random walks).
Most mathematical results are concerned with many-walker systems. Studying the motion of a single walker selected from a set can, as the walkers are identical and the walk is selected randomly, be a study of the average properties. 23 Conversely, order statistics, which is the study of the first walkers to arrive at a point, can model single-molecule experiments. 24,25 Another useful property is the size of a random walk, which relates to the radius of gyration for a polymer, for example. 26,27 In 1951 Dvoretzky and Erdős suggested that the number of distinct sites visited by any random walker, when there are N interacting random walkers, was an interesting problem. 28 The first solution was published in 1992 by Weiss et al., [29][30][31] and was obtained by describing the system with generating functions for probability distributions, and then approximating the behaviour of a coordinate (Laplace) transform approximation of the generating function at a singular point: this gave solutions that were valid only in the limit of a large number of walkers and extended time. 30

Universality is the concept that the microscopic details of a system do not alter the asymptotic properties of that system, 2 which is why many systems seem to have similar behaviour. The most well-known result is the central limit theorem, which states that, for many statistically independent random variables, the output distribution will be a Gaussian (normal) distribution regardless of the underlying distribution, and any system that exhibits this behaviour is a member of the Gaussian universality class. (Interestingly, the memoryless random walk belongs to the Gaussian universality class, while the self-avoiding random walk used to model polymer structure belongs to a different universality class. 2) Thus, it is possible to model seemingly unrelated systems with a random walk, if that system is a member of the same universality class. In this paper it is posited that the amount of a solute dissolvable in a mixture across phase space can be modelled as a 1-D random walk because once a 'site' in a mixture is filled, it does not spontaneously 'unfill' (i.e. the polymer does not come out of solution) and can frustrate the dissolution of further solute in a manner similar to queuing effects seen in 1-D random walks. In this paper, a 1-D random walk model is applied to the problem of cellulose solvation in an IL:CS solution.
An alternative model for the system is a multi-polynomial regression model. Polynomial regression models are a particular form of a linear regression model that include higher order terms following the general equation:

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_h x^h + \varepsilon \]

where y is the response variable, x is the predictor variable, h is the degree of the polynomial, β_0 to β_h are coefficients, and ε is the error variable. Multiple predictor variables can also be included within a linear regression model, giving rise to interaction terms:

\[ y = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz + \varepsilon \]

where z is a second predictor variable and xz is the interaction term between x and z. These two models can be combined, resulting in polynomial interaction terms with higher order terms for either predictor variable, e.g. x²z. This skews the interaction term such that the variance of one predictor variable has a greater impact than that of another, i.e. the weighting is not equal. Models including polynomial interaction terms have previously been used to describe the variation of aerosol optical depth, 32 flow-induced fibre orientation, 33 and cardiorespiratory interaction. 34

In order to process materials, it is often necessary to dissolve them. Tuning solvent mixtures to obtain useful properties while balancing requirements such as chemistry, cost, toxicity, physical properties, environmental-friendliness, and so on, has recently gained traction in industry. The dissolution of polymers is an important problem for the semiconductor industry (polymeric resists), membrane science, plastics recycling and drug delivery. 35

An example is that of cellulose dissolution. Cellulose, the main constituent of plant cell walls, is a naturally occurring biopolymer with an estimated 28.2 billion tonnes produced via biomass each year. 36 It is already well established as a raw material for biocompatible and environmentally-friendly products, including synthetic fibres, coatings and additives for foods and cosmetics.
[36][38] The recent development of the dissolution of cellulose using ionic liquids (ILs) has enabled the facile dissolution of cellulose without the need for chemical modification. 39 Whilst a number of different ILs have been used, the majority are based on an imidazolium cation; hence the focus on 1-ethyl-3-methylimidazolium acetate, perhaps the most studied IL for lignocellulosic dissolution, in this paper. 38,40 Whilst ILs have negligible vapour pressure and are often miscible with water, the high viscosity and economic cost limit their current use. 41 The observation that mixtures of dipolar aprotic solvents, such as dimethyl sulfoxide (DMSO), with some ILs facilitate instantaneous dissolution of cellulose 42,43 has broadened the range of solvent systems available. Mixtures of ILs and CSs have been termed ''organic electrolyte solutions'' (OESs). 42 Recently, we have described a predictive methodology for selection of the CS. 440][51][52] Whilst computational models have been developed for specific systems at set IL molar fractions (χ_IL), 44,52 there have been, to the best of our knowledge, no studies aimed at developing an operational understanding of the cellulose dissolution curves generated as χ_IL changes.
In this paper, we build a 1-D random walk model for the quantity of cellulose that dissolves in an OES with variable CSs, and compare this to a multi-polynomial regression model of the same system. We demonstrate that the 1-D random walk model can be used to vastly reduce (to two) the number of dissolution experiments required to characterise the cellulose dissolution profile in new OESs, such that the model is of great utility to experimentalists.

Random walk model
In the 1-dimensional (1-D), discrete-time, discrete-space random walk for many walkers, space is represented as a 1-D lattice. Sites on the lattice can be occupied by a walker, or are empty. Time is discretised into time-steps; walkers can move left or right with equal probability; two walkers cannot occupy the same site; and walkers cannot 'hop over' each other (i.e. walkers can get 'stuck' behind each other). One of the commonly calculated properties of a random walk is the number of distinct sites visited by N walkers in n steps, often given as an expectation value on a distribution, ⟨S_N(n)⟩, which is given by: for a large number of N, where σ is the variance. 29 For a single walker, ⟨S_1(n)⟩ = √n, so the effect of extra walkers frustrating each other's random walks is √(ln N). If N is fixed and n → ∞, then ⟨S_N(n)⟩ ≈ σ√n in the case where the walkers have taken enough steps to get out of each other's way and the system is best described as N versions of a single random walker. This is not possible for the 1-D random walk, and walkers always frustrate each other's exploration.
When the number of steps is low, the system is in regime I, where n ≪ N early on in the random walk, and then ⟨S_N(n)⟩ ≈ n. In regime II, n ≈ N, and the walkers start to interact.
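The lattice rules described above (equal left/right probability, no shared sites, no hopping over) can be sketched as a small simulation. This is a minimal illustration rather than the calculation used in this paper: the function name, the starting configuration (walkers on adjacent sites) and the update scheme (one attempted move per walker per time-step, in random order) are assumptions made for the example.

```python
import random


def distinct_sites_visited(num_walkers, num_steps, seed=0):
    """Count distinct sites visited by mutually excluding 1-D walkers.

    Returns (number of distinct sites visited, final walker positions).
    """
    rng = random.Random(seed)
    # Assumption: walkers start on adjacent lattice sites 0 .. N-1.
    positions = list(range(num_walkers))
    occupied = set(positions)
    visited = set(positions)
    for _ in range(num_steps):
        # Assumption: each walker attempts one move per time-step,
        # and the walkers are updated in a random order.
        for i in rng.sample(range(num_walkers), num_walkers):
            target = positions[i] + rng.choice((-1, 1))
            if target in occupied:
                continue  # blocked: two walkers cannot share a site
            occupied.remove(positions[i])
            occupied.add(target)
            positions[i] = target
            visited.add(target)
    return len(visited), positions


def mean_distinct_sites(num_walkers, num_steps, trials=200):
    """Estimate the expectation value by averaging over seeded runs."""
    total = sum(distinct_sites_visited(num_walkers, num_steps, seed=s)[0]
                for s in range(trials))
    return total / trials
```

Because every move is a single step onto an empty neighbouring site, walkers can never pass one another, so their left-to-right order is preserved throughout. Averaging with `mean_distinct_sites` over a range of `num_steps` gives an empirical curve that can be compared against the regimes described above.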
This model is here applied to a system where a solute, Z, dissolves in a mixture of two components, X and Y, where Z is soluble in pure X, insoluble in pure Y, and soluble in some mixtures of X and Y, dependent on the amount of X. The solution comprises solvation shells (SSs), which are the part of the solvent that is interacting with the solute, perturbed by its presence and providing energetic support to keep the solute in solution. These are embedded in a bulk background solvent, which is the part of the solvent that is not perturbed by the solute. This model could apply to colloids, or phase-separated mixtures, as well as true solutions, and the mixture need not be restricted to only two components. For our example system, X is the IL, Y is a CS and Z is cellulose. The solvent shell is expected to be mostly made up of IL embedded in a bulk mixture of IL and CS: see Fig. 1. For this particular model, it is assumed there is no interaction between CS and IL.
The number of SSs dissolved in the mixture is modelled as the amount that will fit into the physical volume of solvent. The structure of the solvent shells is ignored and they are modelled only as occupied volumes of space, reflecting the assumption that these do not change size, or shape, for different SS concentrations. Rate of dissolution, or any time-based measure, is not included, so it is assumed that the amount of solute dissolved is the maximum soluble in that solvent mixture, i.e. that it is at equilibrium. As such, this is a counting/space-filling problem: we count the number of solvent shell volumes that are filled, out of the maximum number that would be achieved if the volume were maximally occupied by solvent shells. As the number of 'sites' filled is being counted, this is a 1-D problem: see Fig. 1. A macroscopic physical model for this would be placing hard spheres into a tube.
There is a measure of spatial frustration here. A volume of mixture can contain only so much of Z and each molecule of Z has a SS of a certain size. The maximum amount of Z that can fit into a volume is c_max, where it is understood that this is the maximum value over all possible combinations of X and Y (i.e. all molar fractions), and is generally expected to be the value measured when χ_X = 1 if Y is a CS that does not dissolve Z. The maximum amount of Z that can dissolve in a mixture at any given χ_X is c(χ_X). This is a measure of how many sites there are in the liquid mixture. Thus, c is a measure of the holding capacity of a solvent mixture (and the fact that we can describe the process in terms of c/c_max explains why the process is 1-dimensional). If a mixture is close to its value of c, then for another molecule of Z to dissolve, the SSs of all molecules of Z must rearrange to admit another molecule of Z. Thus, solvation is modelled as site filling, and the number of sites is proportional to χ_X.

Fig. 1 (a) [...] there is plenty of 'space' in the solution and little interaction between SSs. (b) At a high concentration (for example, χ_IL = 3/4) the SSs compete for space, which is modelled as random walkers frustrating each other's exploration of model space. χ_IL is the molar fraction of IL, and the maximum number of SS that could be fit into the volume, c_max, is 4 in this schematic.
Instead of N random walkers, there are x 'sites' in the liquid mixture capable of 'holding' a unit of c, i.e. a volume that would be occupied by the solute and its SS. These sites interact with each other because two SSs cannot occupy the same position. Instead of n time-steps, there are n IL ion pairs available, and the process is not considered over time, but over phase-space: it is modelled across chemical phase space on the continuum χ_X from 0 (pure Y) to 1 (pure X).
The theoretical model for the expectation value of the mean number of distinct sites visited by N random walkers is used. Once visited by a walker for the first time, a site cannot be visited again for the first time. In this system, once a site is visited for the first time, a SS is formed around a 'piece' of cellulose and the site is occupied. The model uses discretized space in the discrete sites available for dissolving Z, bearing in mind that these sites do not exist until Z is added to the mixture and the solvent forms a 'dissolution site' around it (this is identified with SSs, as we shall see later). The existence of these sites is predicated on the number of IL ion pairs available (n), thus compositions with greater n can form more SSs. The mean number of distinct sites visited, ⟨S_N⟩, is then proportional to the amount of Z that can dissolve in a mixture.

Applying the random walk model to cellulose dissolution in ionic liquid mixtures
The system under study is a multi-component liquid consisting of an IL, 1-ethyl-3-methylimidazolium acetate, [EMIM][OAc], (X = [EMIM][OAc]), and a dipolar aprotic CS (Y), one of: dimethyl sulfoxide (DMSO), N,N-dimethylformamide (DMF), N,N-dimethylacetamide (DMAc) or N-methylpyrrolidinone (NMP). These had previously been identified as good OES CSs. 44 In addition, key solvent parameters were identified from amongst Catalan SP, SdP, SA, SB parameters and Lawrence DI, ES, α_1 and β_1 parameters. Ranges were identified for these parameters that would describe a 'good' CS for the formation of OESs for the dissolution of cellulose. Specifically, the Lawrence α_1 and Catalan SA measures were found to be the most important measures of these sets for identifying good CSs. 44

Random walk model results for the full set of solvents previously tested, 44 which additionally includes γ-valerolactone (γ-val), γ-butyrolactone (γ-but), 1-methylimidazole (1-MI), 1,3-dimethyl-2-imidazolidinone (DMI), sulfolane, and propylene carbonate (PC), are given in Fig. S1 in the ESI.†

Multi-polynomial regression model for cellulose dissolution in an OES
In order to investigate the effect of different CSs on the maximum dissolution of cellulose in an OES, a multi-polynomial regression model was independently developed based on the molar fraction of IL (χ_IL) and the molar fraction of CS (χ_CS = 1 − χ_IL):

\[ \chi_{\mathrm{cell}} = A\chi_{\mathrm{IL}} + B(1-\chi_{\mathrm{IL}}) + C\chi_{\mathrm{IL}}^{D}(1-\chi_{\mathrm{IL}})^{E} - F\chi_{\mathrm{IL}}^{2} \qquad (4) \]

where χ_cell is the molar fraction of cellulose dissolved, normalised to the value obtained in the pure IL (χ_IL = 1), and A–F are constants. The first term describes the linear effect due to the IL; the second is the linear effect due to the CS; the third term describes the CS:IL cross-term interaction; and the final term describes the IL interaction with itself. In all cases, A = 2 and F = 1 to ensure that χ_cell = 1 at χ_IL = 1 (thus the maximum number of fitting parameters used is four).
Given that the pure CS cannot dissolve cellulose, B < 0 and determines the point at which the curve crosses the abscissa, i.e. the point at which dissolution of cellulose becomes possible. The term Cχ_IL^D(1 − χ_IL)^E determines whether the dissolution of cellulose is modulated due to interaction between the IL and CS. C can be either positive or negative (for example, CSs that inhibit the dissolution of cellulose in this system have negative C values) and is equal to 0 when there is no interaction between the CS and IL. The coefficients D and E weight the interaction term in favour of the IL or the CS.
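The role of each term can be illustrated with a short sketch. It assumes the functional form implied by the term-by-term description above, namely χ_cell = Aχ_IL + B(1 − χ_IL) + Cχ_IL^D(1 − χ_IL)^E − Fχ_IL² with A = 2 and F = 1; the function names and the example value of B are hypothetical and are not fitted constants from this work.

```python
import math


def cellulose_fraction(x_il, B, C=0.0, D=1.0, E=1.0, A=2.0, F=1.0):
    """Normalised cellulose molar fraction for a given IL molar fraction.

    Assumed form: A*x + B*(1 - x) + C*x**D*(1 - x)**E - F*x**2.
    """
    x_cs = 1.0 - x_il
    return A * x_il + B * x_cs + C * (x_il ** D) * (x_cs ** E) - F * x_il ** 2


def onset_without_interaction(B):
    """IL molar fraction at which the C = 0 curve crosses the abscissa.

    With A = 2, F = 1 and C = 0 the model reduces to
    x**2 - (2 - B)*x - B = 0; the smaller root lies in [0, 1) for B <= 0.
    """
    return ((2.0 - B) - math.sqrt(4.0 + B * B)) / 2.0


# Hypothetical example value of B (not taken from Table 3).
example_onset = onset_without_interaction(-0.2)
```

The constraint A − F = 1 makes the curve pass through χ_cell = 1 at χ_IL = 1 whatever the values of B to E, while a more negative B moves the abscissa crossing to a higher χ_IL.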
Constants B, C, D and E were calculated using the 'Solver' function in Microsoft Excel, applying the GRG non-linear solving method. 56 Both the sum of the differences between the modelled and real values, and the sum of the squared differences, were minimised.
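For the special case with no CS:IL interaction (C = 0), the squared-difference minimisation has a closed form, because the assumed model χ_cell = 2χ_IL + B(1 − χ_IL) − χ_IL² is linear in B. The sketch below uses that closed form in place of the Excel Solver routine described above; it is not the procedure used in this work, the function name is hypothetical, and the data points are synthetic values generated from the model itself.

```python
def fit_b_no_interaction(x_il_values, y_values):
    """Least-squares estimate of B when C = 0, A = 2 and F = 1.

    Rearranging the assumed model gives y - (2x - x**2) = B*(1 - x),
    a straight line through the origin in the variable (1 - x).
    """
    numerator = 0.0
    denominator = 0.0
    for x, y in zip(x_il_values, y_values):
        residual = y - (2.0 * x - x * x)
        numerator += residual * (1.0 - x)
        denominator += (1.0 - x) ** 2
    if denominator == 0.0:
        raise ValueError("need at least one point with x_il != 1")
    return numerator / denominator


# Synthetic, noise-free data generated with a hypothetical B = -0.3.
x_points = [0.1, 0.2, 0.5, 0.8, 1.0]
y_points = [2.0 * x - 0.3 * (1.0 - x) - x * x for x in x_points]
fitted_b = fit_b_no_interaction(x_points, y_points)
```

When C is non-zero the model is non-linear in D and E, so an iterative solver of the kind used here is needed instead.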
The minimum IL, χ_IL|min, for each solvent, taken as the abscissa crossing point, was fitted against the molar volume of CS and OES, V_M^CS. This was fitted in Mathematica using the equation: where v_IL is the volume of IL in the solution and (1 − v_IL) replaces the volume of CS in the solution, v_CS, leaving f as a free fitting parameter. A linear fit, with m and g as fitting parameters, was also performed on the transformed data: These fits were executed in Mathematica 11 using the 'NonlinearModelFit' and 'LinearModelFit' functions, respectively. 57

This journal is © the Owner Societies 2017 Phys. Chem. Chem. Phys., 2017, 19, 17805–17815

Results

The random walk model
The identification of the relation between the site model of a solution and a random walk over space and time leads to the question of how the regimes mentioned above correspond to the solution model. Regime I is identifiable as a low concentration regime, χ_IL ≪ 1, where the number of occupied sites is much less than the maximum available (c_max), and is limited by the amount of IL in the mixture. Thus, there is little interaction between pairs of IL ion pairs and the SSs do not interact or encounter each other. At this point, c ∼ x, where x is the number of solvent shells; all ions of IL are associated with a SS, and c scales linearly with χ_IL.
Regime II occurs when the concentration of IL is high enough that the SSs start to interfere with each other; at this point c is non-linear with χ_IL.
If the amount of cellulose, c, is proportional to the expected number of distinct sites visited by N random walkers, we would expect that: As the random walk model is valid for the large x case, we calculated the number of dissolved cellobiose units in a mole of OES, n_cell, and plotted it against the number of IL pairs in a mole of OES, n_IL (which was taken from the molar fraction data), Fig. 2. The data fits the theory with the equation: which, by comparison with eqn (3), directly relates the number of dissolved cellobiose residues to the expected number of sites visited by N random walkers. In the random walk model the walk starts at '1' because there are many walkers present at the first time step and it is possible to have an expectation value of zero. As we need a certain number of IL molecules present (which is more than 1) in order for the first 'site' to be 'occupied' (i.e. to form a SS), the random walk is offset from the origin by d, the point where the curve crosses the abscissa.
Fitting parameters are given in Table 1. The curvature of the curves in Fig. 2 is m in the fit, which is identified as √(ln x) in our model, and so should be related to the extent of interaction (i.e. frustration) between the solvent shells. This value is different for different CSs. The quantity √(ln x) is a measure of the volume the CS occupies. Bulky CSs take up a larger volume for a given solvent composition compared with smaller, less bulky CSs, with the proviso that some of the CS 'bulk' is a measure of electrostatic interactions, see Fig. 3. The molar volumes of the CSs investigated here ranged from 71–107 cm³ mol⁻¹ (see Table S2 in the ESI†), or 43–65% of the IL molar volume (which is 166 cm³ mol⁻¹).
In Fig. 4 the volume fraction that is available to the IL to form SSs is plotted against the amount of cellulose dissolved: this has removed the effect of the CS in the mixture entirely. There is a linear dependence (R² is over 0.95 for each good CS, see Table 2), and there is no appreciable difference between solvents; in fact, a straight-line fit to all the good CSs has an R² value of 0.976. Including all CSs only perturbs this fit slightly (the gradient is unchanged and the R² value is 0.930). There is little difference between OESs and certainly any difference is within the spread of the experimental data; thus, the only important quantity is the volume available to the IL. Similarly, the number of IL ion pairs per cellobiose in pure IL is between 7.4 and 8.7 (Fig. 5), with NMR experiments suggesting that there are 6–8 IL ion pairs to cellobiose in the primary SS. 58 The gradient is 1/5, so an extra 5 IL pairs are needed to allow each extra cellobiose to be dissolved. This number reflects that determined from NMR studies, 58,60 as there are 5 hydroxyl groups for the IL to interact with, and it has been found that the volume fraction of cellulose in IL is 0.2, 60 which relates to the gradient (1/5) observed when the system is in regime II. Therefore, the spatial frustration from SSs accounts for a loss of around 2.6 IL pairs from SSs.
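The conversion from molar fraction to the volume fraction available to the IL can be sketched as follows. The sketch assumes ideal mixing (no excess volume of mixing), and the function name is hypothetical; the molar volumes are the IL value (166 cm³ mol⁻¹) and the two ends of the CS range (71 and 107 cm³ mol⁻¹) quoted in this section.

```python
def il_volume_fraction(x_il, v_m_il, v_m_cs):
    """Volume fraction of IL in an IL:CS mixture, assuming ideal mixing.

    x_il   -- molar fraction of IL (0 to 1)
    v_m_il -- molar volume of the IL (cm^3 mol^-1)
    v_m_cs -- molar volume of the CS (cm^3 mol^-1)
    """
    volume_il = x_il * v_m_il          # volume of IL per mole of mixture
    volume_cs = (1.0 - x_il) * v_m_cs  # volume of CS per mole of mixture
    return volume_il / (volume_il + volume_cs)


V_M_IL = 166.0  # cm^3 mol^-1, IL molar volume quoted above

# Same molar fraction, smallest and largest CS molar volumes in the range.
fraction_small_cs = il_volume_fraction(0.2, V_M_IL, 71.0)
fraction_large_cs = il_volume_fraction(0.2, V_M_IL, 107.0)
```

At a fixed molar fraction, the IL occupies a larger share of the mixture volume when the CS is smaller, which is why plotting against volume fraction rather than molar fraction can collapse the curves for different CSs.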

The multi-polynomial regression model
The fitted constants for the multi-polynomial regression model, eqn (4), are given in Table 3. For the majority of the dipolar aprotic CSs tested (NMP, DMF, DMAc, sulfolane, γ-but and γ-val) there is no interaction between the CS and the IL: the constants C, D and E equal 0 (Table 3). An example of the resulting curve is given in Fig. 6a. This gives rise to three regimes (0, I, and II). In regime 0 cellulose dissolution is not possible; the IL is prevented from forming SSs around the cellulose. From a packing perspective, the free space available to the IL is not large enough to fit a complete SS; thus, cellulose cannot be dissolved. It follows that the larger the CS molar volume, the greater the χ_IL required to initiate dissolution, implying that the χ_IL at which cellulose dissolves should be proportional to CS molar volume, as is shown in Fig. 7. Regime I, where Aχ_IL ≫ Fχ_IL², is equivalent to c ∼ x; the increase in the number of SSs results in a corresponding linear increase in the amount of cellulose that can be dissolved, as described by the random walk model. At a higher proportion of IL, Aχ_IL ≫ Fχ_IL² → Aχ_IL = Fχ_IL², regime II occurs; a greater number of SSs are required to dissolve a cellulose unit due to their interaction with one another, limiting cellulose dissolution. This increase in relative weight of the χ_IL² term is a measure of SS frustration, equivalent to √(ln x) in the random walk model.

Fig. 3 Different CSs with different molar volumes (V_M) can correspond to the same χ_IL. The CS with the smaller molar volume (left) has less available volume for SS, and thus a higher frustration interaction than the CS with a larger molar volume (right), which has larger OES volume for the same molar fraction. This effect is the basis for most of the differences between OES formulations.

[...] 2. Note, the excess volume of mixing is ignored as it was found to be very small in experiments. 44

Table 2 Coefficients for the χ_cell versus n_IL fits. The equation used was: (a) All CSs with a positive interaction are: 1-MI, DMSO, DMF, DMI, DMAc, sulfolane, γ-but, γ-val, TMU and NMP (data are plotted in Fig. S3 of the ESI).

Fig. 5 The number of IL pairs per cellobiose residue. Upper and lower boundaries (blue lines) are drawn at [...] The full set of data is given in Fig. S4 in the ESI.†

A positive interaction between the CS and IL is predicted for 1-MI, DMSO, DMI and TMU. For 1-MI, DMSO and DMI, E > 1, resulting in experimental curves that initially follow the modelled curve with an equal weighting between the CS and IL, before falling into line with the curve reflecting no interaction at higher IL fractions (Fig. 6b). However, for TMU, E ≈ 1, giving little deviation from the curve with equal weighting (Fig. 6c). It is theorised that these four CSs (1-MI, DMSO, DMI and TMU) are able to participate in the formation of the SS between the IL and cellulose, resulting in a reduction in the number of IL ion pairs required in the SS and enhancing dissolution over that of the pure IL. This is in agreement with RISM calculations conducted on a glucan chain-[EMIM][OAc]-DMSO system with a low concentration of IL (χ_IL = 0.019), where it was observed that DMSO appeared to solvate the glucan chain in a similar manner to the acetate anion. 44

The transitional nature of the enhancement is also in agreement with a previous molecular dynamics study in which it was reported that DMSO does not interact with cellulose at χ_IL = 0.5; 52 at this concentration the CS:IL interaction term is negligible.
In this scenario, regime I is described by Aχ_IL + Cχ_IL(1 − χ_IL)^E ≫ Fχ_IL², and whilst c ∼ x, x is no longer directly proportional to the number of IL molecules in the system, but rather dependent on the number of IL and CS molecules (as x is a measure of SSs, this fits with the supposition that these CSs are participating in the SS). For DMSO, 1-MI and DMI, regime II still tends to the Aχ_IL = Fχ_IL² limit because the Cχ_IL(1 − χ_IL)^E term becomes insignificant. For the TMU-based OES, regime II tends to the Aχ_IL + Cχ_IL(1 − χ_IL)^E = Fχ_IL² limit instead. For DMSO, 1-MI and DMI it is apparent that the number of CS molecules interacting with the SS decreases between regimes I and II, resulting in a transition region, regime IIa (Fig. 6c), whereas the number of TMU molecules in the SS remains constant over the entire phase-space, resulting in a direct transition from regime I to II (Fig. 6b).
Finally, PC has a negative interaction with the IL, resulting in suppression of the dissolution of cellulose. Its transition from having no interaction to interacting at higher χ_IL is unexpected. In regime I it conforms to Aχ_IL ≫ Fχ_IL² and then transitions to tending towards regime II as the CS:IL interaction term becomes more significant. From the perspective of the random walk model, the presence of PC results in an increase in the interaction between the SSs above that of the 'normal' system where there is no CS:IL interaction. (Conversely, the presence of DMSO, 1-MI, DMI or TMU results in a decrease in the theoretical interaction at particular molar fractions.) It is theorised that at low χ_IL, where SSs have formed, the PC molecules interact with the shells, but there are enough other PC molecules present in the bulk such that they cannot interact with more than one shell. Therefore, the interactive term is negligible and the curve initially follows the Aχ_IL ≫ Fχ_IL² curve. However, as χ_IL increases, the 'shielding' effect of the free PC molecules decreases, which implies that PC molecules interact with more than one SS and decrease the dissolution of cellulose compared to a non-interactive system (regime IIa). Finally, in regime II the PC molecules interact with the maximum number of SSs, minimising dissolution.

Regime 0

The random walk model starts with regime I, as, if there is an empty site, a walker can always move into it. As this model relies on site filling, it is not possible to predict regime 0, i.e. where cellulose is insoluble in the mixture. It is, however, useful to discern the start of regime I, i.e. the lowest IL concentration at which cellulose begins to be soluble, based on a small number of measured data points, as is discussed in detail below. In our system, an empty site is a SS and the point of transition from the zeroth regime, where there is no cellulose dissolution, is a measure of the amount of cellulose solvated by a SS and the amount of IL in the OES. The transition point between regime 0 and regime I is correlated with the CS molar volume, Fig. 7. This trend is described by the following equation (which comes from the definition of χ_IL): where v_IL is the volume of IL in the mixture, v_CS is the volume of CS in the mixture, and V_M^IL and V_M^CS are the molar volumes of IL and CS, respectively. Eqn (8) fits the data well (R² of 0.957) with f = −28.89 and v_IL = 0.0414 (this quantity is related to the choice of IL). As a check, the data were also linearised (see eqn (6)), yielding a fit with R² = 0.830.

An intuitive explanation of how SSs with approximately the same volume could have drastically different χ_IL|min is given in Fig. 8. In contrast with Fig. 3, where χ_IL was constant, here we assume that the volume available is constant, and compare the effect of V_M^CS on χ_IL. As shown in Fig. 8, the number of CS molecules that can fill the available volume (v_CS) depends on its V_M. The bulkier the CS, the fewer molecules fit into the available volume (v_CS); thus the ratio of IL:CS is greater, resulting in a larger χ_IL.
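The fixed-volume argument of Fig. 8 can be checked numerically from the definition of the molar fraction alone. This sketch is not eqn (8): it omits the fitted parameter f, assumes ideal mixing, and uses a hypothetical function name. The IL volume fraction of 0.0414 and the molar volumes are the values quoted in this paper.

```python
def il_mole_fraction(v_il, v_m_il, v_m_cs):
    """Molar fraction of IL for a given IL volume fraction (ideal mixing).

    v_il   -- volume fraction of IL in the mixture (0 to 1)
    v_m_il -- molar volume of the IL (cm^3 mol^-1)
    v_m_cs -- molar volume of the CS (cm^3 mol^-1)
    """
    moles_il = v_il / v_m_il          # moles of IL per unit volume
    moles_cs = (1.0 - v_il) / v_m_cs  # moles of CS per unit volume
    return moles_il / (moles_il + moles_cs)


V_IL = 0.0414   # fitted IL volume fraction quoted above
V_M_IL = 166.0  # cm^3 mol^-1

# Same IL volume fraction, smallest and largest CS molar volumes.
x_small_cs = il_mole_fraction(V_IL, V_M_IL, 71.0)
x_large_cs = il_mole_fraction(V_IL, V_M_IL, 107.0)
```

With the IL volume held fixed, fewer molecules of a bulkier CS fit into the remaining volume, so `x_large_cs` exceeds `x_small_cs`, in line with the trend in Fig. 7.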

Comparison of the models
Both models were built separately, from separate assumptions, and both lead to the conclusion that (if the CS is an appropriate OES former, i.e. has suitable solvent properties, as defined previously 44) the volume available to form solvent shells is the most important aspect of the system. The 1-D random walk, grounded in ideas of universality, explains this: once the solvent system is tending towards its limiting behaviour, a deep understanding of the underlying chemistry of the system is not required; the system can be modelled as described (as evidenced by the fact that differences between different high-χ_IL OESs are within experimental error) and solvent shells act like hard spheres in a finite box. This holds over the range identified as regime II of that model, which is associated with a significant IL:IL term in the linear model (the models are compared in Table 4). That this model is able to describe most of the behaviour for most of the OESs is remarkable. The random walk model does not allow direct separation of the effects of a CS and IL interaction, but the multi-polynomial model does: a comparison of the two models shows where this interaction affects the system, and thus where the chemistry of the CS and IL should be considered.

Use of the random walk model
The random walk model is useful to the experimentalist, allowing good prediction of experimental data from very few measurements, and should work for other ILs. As a demonstration, the predictions that can be obtained from the model and measurements at a single OES composition will be calculated for Fig. 2; other figures can be calculated following a similar method. The value for a mixture with w IL = 1 is known and the value of w IL | min can be obtained from eqn (8) (or Fig. 7, or eqn (6)), so only a single OES composition needs to be measured. For an experimentalist, the most important range is nearest the corner of the curve, as this is the most efficient composition. 44 As this is not known, a value of 0.2 ≤ w IL ≤ 0.4 is a good choice for measurement. To demonstrate the method, typical CSs, γ-but and DMAc, were chosen and, to demonstrate the limits, the CS least well described by the model in Fig. 2, DMSO, was chosen. A subset of three points was chosen and the fits to these were compared to the fit to the entire data. The molar volumes of γ-but, DMAc, DMSO and IL were input into eqn (8). DMAc has a w IL | min value of 0.09 (the less accurate, linearised eqn (6) gives 0.12 and the value calculated from the fit in Fig. 2 is 0.06); w IL | min for DMSO is 0.056 and for γ-but is 0.06. The measurements around w IL = 0.35, w IL = 0.4 and w IL = 0.4 were chosen as the single measurement (which is made of three repeats for DMSO and DMAc, and two repeats for γ-but) for DMAc, DMSO and γ-but, respectively. To balance the fit, the points at w cell = 0 were repeated the same number of times as the experimentally measured point (i.e. three repeats for DMSO and DMAc and two repeats for γ-but). The final point was the value for w IL = 1, which was known.
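The procedure above can be sketched in Python as an ordinary least-squares fit of the Fig. 2 form, n cell = d + m√(n IL), to three kinds of point: the onset (w cell = 0), one measured composition and the neat IL. All numerical values below are hypothetical placeholders chosen to lie on a known curve, not data from this work. The function names, the clamp at zero for regime 0, and the use of unweighted least squares with repeated points are assumptions about how such a fit might be set up.

```python
import math


def fit_sqrt_model(n_il, n_cell):
    """Least-squares fit of n_cell = d + m*sqrt(n_il); returns (d, m)."""
    s = [math.sqrt(x) for x in n_il]  # the model is linear in sqrt(n_il)
    k = len(s)
    s_mean = sum(s) / k
    y_mean = sum(n_cell) / k
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxy = sum((si - s_mean) * (yi - y_mean) for si, yi in zip(s, n_cell))
    m = sxy / sxx
    d = y_mean - m * s_mean
    return d, m


def predict(n_il, d, m):
    """Predicted n_cell; clamped at zero below the onset (assumed regime 0)."""
    return max(0.0, d + m * math.sqrt(n_il))


# Hypothetical (n_il, n_cell) points in arbitrary units, NOT measured data.
points = (
    [(0.0625, 0.0)] * 3   # onset point, repeated to balance the measured repeats
    + [(0.36, 0.07)] * 3  # single measured composition, three repeats
    + [(1.0, 0.15)]       # neat IL, assumed known
)
n_il = [p[0] for p in points]
n_cell = [p[1] for p in points]

d, m = fit_sqrt_model(n_il, n_cell)
print(round(d, 6), round(m, 6))  # -0.05 0.2
print(predict(0.01, d, m))       # 0.0 (below the onset)
```

Repeating the onset point as many times as the measured point keeps either from dominating the fit, which mirrors the balancing step described in the text.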
As an example, the fit to the chosen subset of points is shown for γ-but in Fig. 9 (the data for DMSO and DMAc are given in Fig. S7 and S8 of the ESI†). For the DMAc OES, the single measurement fit had an R² value of 0.995 to the whole data. As the fit to the whole data had an R² of 0.999 (see Table 1), measuring only one point already explains most of the variance in the data. The average residual between the single measurement fit and the actual data is only 7.06%. For DMSO, the single measurement fit has an R² to the whole data of 0.991 (again, using just a single measurement point affects only the third decimal place), and an average error of 9.43%. For γ-but, the single measurement fit has an R² to the whole data of 0.995 (the fit using the entire data has an R² of 0.996), and an average error of 7.40%.
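For readers wishing to reproduce this kind of comparison, the two summary statistics can be computed as in the sketch below. The exact definition of the "average residual" used here is not stated in this section, so the mean absolute percentage error (skipping points where the observed value is zero) is an assumption, as are the function names and the toy numbers.

```python
def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observed data."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_abs_percent_error(observed, predicted):
    """Mean of |observed - predicted| / |observed|, in percent.

    Points with observed == 0 are skipped (an assumption) to avoid
    division by zero.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in pairs) / len(pairs)


# Toy numbers for illustration only, NOT data from this work.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)
mape = mean_abs_percent_error(observed, predicted)
print(round(r2, 3), round(mape, 2))  # 0.98 6.67
```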
Thus, to predict the data to within 10% error, an experimentalist need only locate the molar volume of the selected CS and measure one point in the range 0.2 ≤ w IL ≤ 0.4 (assuming the solubility in pure IL is known; if not, that measurement is also required). This is valid because, as the results presented here have demonstrated, the volume of the SS (which relates to the volume of the CS and IL) is the most important feature for predicting behaviour in these systems. A good predicted solubility curve is provided even where there are interactions between CS and IL that cause the OES to deviate slightly from the 1-D random walk model. For example, the predicted curve for the DMSO-based OES fits well with that predicted from the random walk model, even though the regression model suggests CS/IL interactions: the non-random walk terms account for only around 10% of the variance. So, even for these CSs, an experimentalist could save significant time by starting with the predictions from the one-measurement fit to the random walk model and refining any critical regions if very precise data are required.
If the experimentalist chose to investigate a different IL (provided that it was measured experimentally with a minimum of two data points, including the solubility of cellulose in the neat IL), this method would be appropriate and, moreover, very quick to apply. The key value needed, which is the minimum mole fraction of IL required to initiate cellulose dissolution, could be calculated from eqn (8). An alternative and simpler approach, especially for finding the OES composition needed to dissolve a known amount of cellulose, is to use the fit to Fig. 4.

Conclusions
It has previously been demonstrated that suitable cellulose-dissolving OESs may be selected by matching co-solvent parameters. 42,44 Here we show that, once such a co-solvent is identified (either by data-mining of solvent properties, 44 or by experimental means), minimal experimentation combined with the 1-D random walk model described here allows prediction of the entire experimental dissolution curve of a solute in an OES. In this scenario, the most relevant parameter governing the behaviour of 'good' OESs is the volume available to form solvent shells, which is related to the molar volumes of both CS and IL. This fits with the previously noted indication of preferential solvation of cellulose chains derived from RISM calculations. 44 The 1-D random walk model is an example of a type of universal process, which has been applied to complicated multicomponent systems. It is useful to identify and understand the single dimension by which a system can be described, and doing so gives insight into the process and economy of the model. The success of this model helps highlight the important aspects of these systems and the most relevant control parameter (for the OESs in this paper), which is the volume fraction available to the IL. The amount of cellulose soluble in an OES is largely a function of the number of solvent shells that can be formed by the available IL, which relates to the number of IL ion pairs available and the volume of the solvent shell as a proportion of the total volume available. Thus, a space-packing/counting model of solvent shells, modelled as hard spheres being added to a box, is sufficient to describe most (and in some cases all) observed behaviour. This model is general and could be applied to other solvent mixtures.
The 1-D random walk model is suitable for rapid and efficient prediction of a dissolution curve with changing composition of a mixed solvent system, such as an OES, requiring only two experimental measurements: the maximum solubility of the solute in the pure IL and the solubility of the solute in the OES in the descending portion of the solubility curve, i.e. where solubility is sensitive to changes in composition. Thus, this provides a useful tool for vastly reducing the number of experiments required to develop the solubility curve for a new co-solvent, as demonstrated here for γ-butyrolactone. The very utility of this model derives from the reductionist approach. However, even the most superficial consideration of the chemistry of the components would suggest that intermolecular/interionic interactions are likely to modulate solvent properties in a manner not modelled by consideration of space-filling alone. Thus, a polynomial regression model was developed independently of the 1-D random walk model, to allow robust comparison. This highlights OESs that demonstrate slight deviations from the behaviour predicted by the 1-D random walk model and, thus, points to interactions that are likely to be important. For example, interaction (or cross) terms of the form (w IL)^D (1 − w IL)^E are required to describe some systems. Amongst the CSs tested, consideration of molar volume alone proved adequate to describe OESs formed with [EMIM][OAc] and NMP, γ-but, DMF, sulfolane, DMAc or γ-val, i.e. the CS:IL interaction was negligible. Positive interactions between CS and IL result in enhanced solubility, above that predicted by the 1-D model, for 1-MI, DMSO and DMI at low w IL, although it is notable that these require such minor adjustments to the model that most effects would disappear into experimental error, except in very comprehensive testing. In addition, a positive CS:IL interaction was indicated for TMU across the entire range and a negative CS:IL interaction was suggested for PC (a rather poor solvent for preparing OESs for cellulose dissolution). This shows the value of our approach, whereby a simple, analytical and easy-to-understand model can describe almost all of the measured data, and a fitted regression model can highlight where other aspects, such as CS:IL interactions, are important. In one of these cases, previous results from more detailed theory (RISM) agreed with the findings from the regression model and confirmed the presence of a DMSO:IL interaction at low w IL and its absence at mid-range w IL. 44,52 Clearly, such intentional simplification of OES systems does not serve to describe the detailed interactions at a molecular or electronic level, which would require detailed modelling approaches utilising high levels of theory. Nonetheless, the remarkably good fits demonstrated indicate a model that is very useful for reducing the experimentation required, allowing rapid selection and implementation of new OESs. Furthermore, the deviations identified in the regression model point to groups of solvents and CS/IL combinations that merit further in-depth modelling to understand the subtle interactions occurring.

Fig. 1
Fig. 1 Model schematic. The dissolved cellulose is surrounded by ionic liquid (IL)-containing solvent shells (SS) embedded in a bulk solvent mixture of IL and CS (bulk OES); the SSs are modelled as filled or empty sites available in the solution. In 1-D model space, only the number of these volumes is included in the model; their spatial arrangement, temporal dynamics or kinetics are ignored. (a) At a low concentration (in this case, w IL = 1/4) there is plenty of 'space' in the solution and little interaction between SSs. (b) At a high concentration (for example, w IL = 3/4) the SSs compete for space, which is modelled as random walkers frustrating each other's exploration of model space. w IL is the molar fraction of IL, and the maximum number of SSs that could be fitted into the volume, c max, is 4 in this schematic.

Fig. 5
shows the number of IL pairs per cellobiose residue. The system is linear when it is in regime II (between ca. 0.2 < w IL ≤ 1), and there is little difference between the different CS-based OESs in this regime. The minimum number of IL pairs required per cellobiose residue is ca. 2.4, and the straight lines are drawn at $n_{\mathrm{IL}}/n_{\mathrm{cell}} = 2.4 + 5w_{\mathrm{IL}}$ and $n_{\mathrm{IL}}/n_{\mathrm{cell}} = 3.7 + 5w_{\mathrm{IL}}$, which gives an estimate (from the ordinate crossing point) of the minimum number of IL pairs that would be required, were the system permanently in regime II, of $2.4 < \min(n_{\mathrm{IL}}/n_{\mathrm{cell}}) < 3.7$. This range fits with the values of 2 ≤ IL < 3 found for cellulose in a DMSO-based OES.

Fig. 2
Fig. 2 Number of cellobiose units of cellulose, n cell (taken from molar fraction data), in a mole of solution against the number of IL pairs for 'good' CSs. The fit is $n_{\mathrm{cell}} = d + m\sqrt{n_{\mathrm{IL}}}$. These good CSs demonstrate curves that resemble 1-D random walk curves. (Data for all solvents tested are given in Fig. S2 of the ESI.†) Points are experimental data measured in pairs of over- and under-estimates of the maximum cellulose dissolvable; the lines are the 1-D random walk fits.

Fig. 4
Fig. 4 Molar fraction of cellobiose, w cell, versus volume fraction of ionic liquid, n IL (as calculated from molar volumes and molar fractions). The amount of cellulose dissolvable in a solution mixture is only related to the volume of IL available, suggesting that a space-filling model of dissolution is appropriate. Blue circles: DMSO; orange squares: DMF; yellow diamonds: DMAc; pink triangles: NMP. The fit parameters are given in Table 2. Note, the excess volume of mixing is ignored as it was found to be very small in experiments. 44

Fig. 6
Fig. 6 Lower-limit (underestimate) experimental (red dots) and theoretical solubility curves for (a) γ-butyrolactone, (b) TMU, (c) DMSO and (d) propylene carbonate. Curves with no (w IL)(1 − w IL) interaction term are denoted in blue; red includes the interaction term (w IL)(1 − w IL) (equal weighting); orange curves include (w IL)^D (1 − w IL) and curves with (w IL)(1 − w IL)^E are in purple. Regime 0 is white, regime I is highlighted in blue, the transition regime IIa in yellow, and regime II in red. 'Normal' CS OESs have no CS:IL interaction and follow the random walk model (a); deviations can come from a positive interaction (b and c) or a negative interaction (d), which slightly modify the curve.

Fig. 7
Fig. 7 Correlation between co-solvent molar volume, V CS M, and minimum w IL at which the dissolution of cellulose occurs, based on upper and lower limits for each CS. The bulkier the CS, the greater the w IL required to initiate dissolution, as less free space is available for SS formation at low w IL. The fit is given in eqn (5).

Fig. 8
Fig. 8 Cartoon depiction of the relationship between V CS M and w IL | min. Assuming a fixed volume available to the solution and a minimal number of SSs, fewer bulky CS molecules are required to fill the remaining volume (10 in this example) than smaller CS molecules (42). As the number of IL ion pairs is fixed (6), this number makes up a larger proportion for the bulky CS (6/16) than for the smaller CS (6/48), as illustrated on the right.

Fig. 9
Fig. 9 Using the random walk model for prediction of γ-but OES data. Brown: data (dots) and fit (line) to all measured data. Black: fit (line) to measured or estimated points (dots). Only the point at around w IL = 0.4 needs to be measured; the point at w IL = 1 is known and the point at w cell = 0 can be estimated from eqn (8). The single measurement fit differs from the actual data by only 7.4% on average.

Table 1
Coefficients for the n cell versus n IL fits in Fig. 2. The equation used was: $n_{\mathrm{cell}} = d + m\sqrt{n_{\mathrm{IL}}}$. The norm of the residuals is given by R². Full data are given in Table S1 of the ESI

Table 3
Upper and lower values for polynomial regression model constants A-F, eqn (4), for tested CSs.PC is propylene carbonate

Table 4
Comparison between the models. $x_{\max} \to A w_{\mathrm{IL}} - F w_{\mathrm{IL}}^{2}$ as $w_{\mathrm{IL}} \to 1$