Marcus J. Tillotson,a Nikolaos I. Diamantonis,b Corneliu Buda,c Leslie W. Boltond and Erich A. Müller*a
a Department of Chemical Engineering, Imperial College London, London, UK. E-mail: e.muller@imperial.ac.uk
b bp, Innovation and Engineering, Sunbury, UK
c bp, Innovation and Engineering, Naperville, USA
d Independent Consultant, Fleet, UK
First published on 28th April 2023
This manuscript provides an overview of the current state of the art in the molecular modelling of the thermophysical properties of fluids. It is intended to manage expectations and to serve as guidance to practising physical chemists, chemical physicists and engineers regarding the scope and accuracy of the more commonly available intermolecular potentials, along with the peculiarities of the software and methods employed in molecular simulations, while providing insights into the gaps and opportunities in this field. The discussion is focused around case studies which showcase both the precision and the limitations of frequently used workflows.
Fig. 1 Influence of errors in the determination of the separation factor, α (the ratio of vapour mole fraction to liquid mole fraction, compared between two components), on the minimum number of theoretical stages in a typical distillation column. For “difficult” separations (α close to 1), many theoretical stages are needed, e.g. between 100 and 200, with correspondingly high investment costs. An uncertainty of the order of a few percent in α can cause deviations in the required number of stages larger than 100%. Taken with permission from ref. 5. Copyright 2002 Elsevier.
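The sensitivity illustrated in Fig. 1 follows from the Fenske relation for the minimum number of theoretical stages at total reflux, Nmin = ln[(xD/(1 − xD))·((1 − xB)/xB)]/ln α: since ln α appears in the denominator, a small relative error in α is greatly amplified when α is close to 1. A minimal numerical sketch (the 99%/1% product purities are illustrative assumptions, not values from ref. 5):

```python
import math

def n_min(alpha, x_dist=0.99, x_bot=0.01):
    """Fenske equation: minimum number of theoretical stages (total reflux)
    for a binary separation with light-component mole fractions x_dist in
    the distillate and x_bot in the bottoms."""
    return math.log((x_dist / (1 - x_dist)) * ((1 - x_bot) / x_bot)) / math.log(alpha)

alpha = 1.05                        # a "difficult" separation
for rel_err in (0.0, 0.02, -0.02):  # +/- 2% uncertainty in alpha
    a = alpha * (1 + rel_err)
    print(f"alpha = {a:.3f}  ->  N_min = {n_min(a):.0f}")
# alpha = 1.050 -> N_min ~ 188; alpha = 1.071 -> ~ 134; alpha = 1.029 -> ~ 321
```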
Physical properties of fluids can be, and have been, measured experimentally since the dawn of modern science. Densities, vapour pressures, critical point data, viscosities, solubilities, surface tensions, thermal conductivities, etc. are all routinely measured and collated in existing open-access6 and proprietary databanks. Pure-compound data are presently available for several thousand to tens of thousands of chemical compounds of interest (depending on the databank), a number which pales against the over 120 million chemical structures identified to date.7 When one considers mixtures, a much smaller (relative) number of systems has been explored (e.g. DECHEMA claims to have data for roughly 200 000 systems formed from 80 000 constituent pure fluids). The databases themselves grow at a slow pace, of the order of a million data points per year. Entire scientific journals8 devoted to this pursuit give evidence that progress is, at best, incremental. This level of “data ignorance” is problematic, and no practical increase in the level of experimentation will allow us to explore this universe in any meaningful way. The implications of this void are vast and not limited to chemical processes: the plight of the pharma industry to discover new drugs is a clear consequence of the asymmetry between our existing physical property knowledge and the phase space that could potentially be searched.9
Notwithstanding their immense value, the slow pace of acquisition (and high cost) of experimental data requires that they be extended and supported by theories and numerical models. The impact that models which help correlate and predict physical properties have had on the scientific community can be summarized by two Nobel prizes awarded almost a century apart. The 1910 physics Nobel prize was awarded to J. D. van der Waals, in recognition that analytical (mathematical) models were capable of modelling vapour–liquid equilibria, including the unique phenomenon of the appearance of a critical point. Today, successors of these expressions, better known as cubic equations of state (EoS), are a staple of engineering design. The van der Waals equation kick-started a century of research into the link between the different macroscopic thermodynamic properties, at a time when science was still assessing and understanding the atomic nature of matter. Statistical mechanics ultimately provided the link between the intermolecular forces amongst molecules and the macroscopic observables that emerge as the integrated product of the collective behaviour of ensembles of molecules.10,11 While elegant and self-consistent, the expressions provided by statistical mechanics are only solvable for a handful of ideal cases. Our understanding of the nature of intermolecular forces provided yet another path to the description of molecular interactions through the solution of Schrödinger's equations. Referring to this, Dirac suggested almost a century ago12 that “the underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are … completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble”. The combination of quantum mechanical calculations and highly sophisticated numerical tools promises an avenue to process and correlate the available experimental data. This, in its own right, is a thriving area of research, particularly with the use of group contribution approaches,13,14 but its practical usefulness is limited by the requirement of large computational resources and the availability of experimental data to validate the results.
A more recent chemistry Nobel prize was presented in 2013 to Martin Karplus, Michael Levitt and Arieh Warshel, acknowledging that atomistically-detailed force fields based on information taken from quantum mechanical calculations represented a game-changing approach allowing the computer-based prediction of physical properties. Indeed, in the past 50 years we have experienced an unexpected and unparalleled change in the way scientists view physical property prediction.15 Within the lifetime of many (and in particular of the more senior) industrialists and academics, computational power has progressed immensely on the back of Moore's law (the empirical observation/prediction that computer hardware speed doubles every two years).16 Digital computers, which started becoming available to mainstream scientists as early as the 1950s, were initially employed to support the development of statistical mechanical theories.17,18 Seminal papers describing molecular dynamics19,20 and Monte Carlo21,22 simulations appeared in the literature at this time and garnered the attention of science-driven engineers and companies who saw them as platforms for the calculation of properties of real substances.23 Today, those “heroic” simulations of the past century are dwarfed by equivalent (or much larger) ones performed routinely even as undergraduate-level coursework.24,25 The reader is referred to some of the review papers on the history of molecular modelling and the overview of its implementation in the physical and chemical sciences26–28 for a detailed chronicle.
The level of molecular modelling described in the preceding paragraph is orders of magnitude less demanding than the full quantum calculations it tries to emulate and matches well the current availability of rather inexpensive and relatively powerful hardware. This, alongside the accessibility of dedicated software, prompts the use of these molecular modelling tools to predict the physical properties of fluids.29,30 The rationale behind this is self-evident. Simulations have the potential to provide exact solutions of the statistical mechanical expressions that govern the macroscopic behaviour of matter. They do so within a reasonably inexpensive framework and are able to access regions of phase space inaccessible to experiments because of practical constraints such as toxicity,31 extreme conditions,32 and occasionally low (or unknown) purity of materials. Furthermore, molecular simulations are in principle able to predict the behaviour and properties of new materials and molecules prior to their synthesis, a key driving tool for innovation in the pharmaceutical and material science industries. Unfortunately, one of the largest challenges faced by molecular simulation is the “hype” or over-selling of the techniques implied by this last sentence. This has led to unrealistic expectations among those seeking to use them to supplement, or even to replace, experimental measurements. A method of modelling all entities and properties of a system with a low degree of uncertainty is certainly the ultimate goal; however, such a predictive method remains far from being realised. In practice, some degree of judgement is needed to select simplifications which make the solution tractable yet do not compromise accuracy. Reasonable expectations must always be borne in mind, and only in very few places can one find information on the myths and misconceptions of computer modelling.33
Two decades ago, a multidisciplinary panel reviewed the development and applications of molecular and materials modelling across the globe, with a particular focus on deployment within industrial companies.34 The report, and the interviews with leaders and practising engineers and scientists, reflect the mindset of the time – that molecular modelling was only for very difficult, ultra-high-value problems, such as the discovery of a new catalyst (ostensibly because they were experimentally challenging), and that pursuing the presumably simpler things, such as predicting fluid properties, was not a high priority – as measurement was often cheaper (and always more accurate!) than simulation. Nonetheless, the complexity of doing such “high value” calculations was consistently underestimated (e.g., it was not always obvious what to simulate) and the results were rather underwhelming. The report already highlighted the disproportionate hype and over-promising that affected the field, maybe just a manifestation of the excitement of the players at the opportunities that lay ahead. In fact, this report suggested that in the ten years following it, “molecular based modelling approaches (understood here as the collective of computational quantum chemistry, molecular simulations, mesoscale modelling, molecular-structure/macroscale property correlations and related information technologies) would profoundly affect how chemistry, biology, and materials physics are understood, communicated, and transformed to technology, both in intellectual and commercial applications”. It anticipated that in the near future (i.e., today) experimentalists and management would not only become used to accepting the use of molecular modelling, but would expect it.3,35 But a decade past this deadline, has this prophecy been fulfilled? Where do we stand with respect to the balance between promise and delivery? This manuscript is focused on providing an objective assessment and a guide to manage these expectations and to understand both the opportunities and limitations of molecular modelling36 from the viewpoint of a joint academic/industrial experience, complementing some of the previous guidance in the open literature.37,38
Specialized hardware has been designed for the unique task of performing MD simulations,46 capable of running a 23 558-atom benchmark system at a rate of 85 μs day−1. More recently, the community has benefitted from advances derived from the gaming industry, for which graphics processing units (GPUs) were developed at relatively low cost. It was rather straightforward to adapt these gaming boards to molecular modelling software, as parallel algorithms are now the norm for MD simulations.47 The result was a noticeable increase in the available computational power and the possibility of having desktop units performing at the level of the supercomputers of a few decades ago (see, for example, the comments made during a recent Faraday Discussion on the topic48).
In summary, Moore's law16 has been a contributing factor to the explosion of capabilities in the field. However, no amount of computer power will ever satisfy the researcher, who will always find a bigger and more complex system to study. Notwithstanding, the sustained increase in computer power with time (understood either as the increase in simulation speed per unit of currency, the maximum speed, or the maximum floating-point operations per second) clearly suggests that hardware is not the most relevant bottleneck of physical property prediction49 (or at least not one we have control over). For more detailed and nuanced overviews of the increase in computer power for simulations (and especially biomolecular simulations) the reader is referred to the discussions by Schlick et al.50,51
Quantum computers hold significant promise in terms of propelling chemical computations (in particular, the solution of Schrödinger's equations) into a new era, by providing several orders of magnitude more computational power than their digital counterparts. While so-called “quantum supremacy” has already been hailed,52 it is unlikely that large-scale quantum computers will be commonly available in the near future. Certainly, there are already glimpses that they could provide a transformative change in the way thermophysical properties are predicted.53 This is, however, a very long-term prospect which it would be foolish to appraise here.14
In general, software for running simulations is nowadays readily available, both in free open-access form and in commercial suites. Most of the “free” packages are maintained by, or have their roots in, collaborative academic programs.55 These codes are the product of many coder-years of effort and as such are not simple for the novice to employ. It might take a typical graduate student a few months to become reasonably confident with the use of the programs, even with prior knowledge of the theory behind modelling. The choice is bewildering and is commonly driven by familiarity rather than by performance. While some open-access programs have well-established user manuals and a community of users willing to provide guidance, it is evident that there is a large entrance barrier to being able to use these codes efficiently. Even at this stage, the use of the programs is far from straightforward and much care needs to be taken in running the simulations and interpreting the results. For each of the several steps associated with performing a molecular simulation (setup of the system and data files, production runs, data analysis and visualization), one encounters a large choice of stand-alone programs which may or may not be used in an integrated fashion. Commercial packages56 strive to bridge this gap, providing graphical user interfaces (GUIs), pre-packaged force fields, and technical support, all for a significant price. The field is now being driven towards open-source suites which integrate different programs, e.g. the MoSDeF software stack.57
In our view, one of the hazards here (and not one which uniquely applies to molecular modelling) is that user-friendly GUI-based tools enable non-expert users to set up and successfully run molecular simulations, but not necessarily the most “appropriate” ones. Here, the story of the “Sorcerer's Apprentice”58 comes to mind.
Atomistically-detailed, also referred to as all-atom (AA), analytical force fields (e.g. OPLS,59 AMBER,60 CHARMM,61 COMPASS62 etc.), developed historically for biomolecular simulations, are now extensively used in engineering and in the physical chemistry fields.63 The underlying functional forms of these force-field families are strikingly similar (cf. Fig. 2), with differences in the technical details and parametrization strategies. The original force fields based on the general form shown in Fig. 2 (sometimes referred to as Class I force fields) have been expanded by the inclusion of cross terms describing the coupling between stretching, bending and torsion, leading to a new category of force fields, referred to as Class II. The most important recent enhancement to the standard potential is the explicit inclusion of polarization effects.64,65 In general, the force-field approach has proven extremely successful; however, high accuracy in the results is closely tied to whether the simulated system of interest is constructed from a set of similar, previously parametrized chemical moieties. Force fields of this sort are empirically optimized to reproduce internal molecular degrees of freedom calculated through quantum mechanics (QM) and some limited experimental properties.66 This fitting process is difficult and cumbersome,67 and most often relies on reaching compromises amongst multi-objective functions and/or on employing ad hoc procedures. The consequence is that, in general, a force field will perform well for the description of the systems and properties which were used to parametrize it, but issues of representability, robustness and transferability68,69 will inevitably be present. No force field can claim to be useful for all systems of interest, and it is not uncommon to find that a force field of choice lacks parameters for certain atom groups. Consequently, the engineer must use their judgement and expertise to select from a library of force fields, each having some advantages and disadvantages. This choice can have a profound impact on the simulation results.
Fig. 2 Hand-drawn representation of the total potential energy contributions of a molecule as the sum of simple analytical terms allowing for bond stretching, bond angle bending, bond twisting, van der Waals interactions and electrostatics, attributed to Shneior Lifson in the early 1970s. Almost all force fields employed today still retain most of the elements shown above. Reprinted by permission from Springer Nature from a retrospective by Michael Levitt, ref. 70.
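For reference, a generic Class I functional form of the type sketched in Fig. 2 can be written as follows (the exact terms, combining rules and parameters differ between the force-field families named above; this is a schematic form, not the specific expression of any one of them):

```latex
U = \sum_{\text{bonds}} k_b\,(r - r_0)^2
  + \sum_{\text{angles}} k_\theta\,(\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} \sum_n \frac{V_n}{2}\bigl[1 + \cos(n\phi - \gamma_n)\bigr]
  + \sum_{i<j} \left\{ 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
    - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
    + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}} \right\}
```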
Coarse graining (CG) is a term that refers to the use of simplified molecular models, where the atomistic detail is removed and substituted by a description of molecules in terms of “super-atoms”, each typically representing a small number of heavy atoms. For example, the TraPPE71 force field, parametrized extensively toward liquid-state and phase-equilibrium properties of fluid systems widely encountered in chemical engineering, recognizes that explicitly considering the hydrogen atoms in organic molecules adds a substantial degree of complexity to the calculations which is not balanced by a corresponding gain in accuracy. Hence, in a first level of coarse graining, “united atoms” are considered, where the influence of the associated hydrogen atoms is included in effective heavy-atom beads. In a further degree of coarse graining, a propane molecule could be modelled as an isotropic spherical bead where all the electronic details, intramolecular vibrations, bond fluctuations and molecular topology are incorporated within a point pair-wise interaction model. Further levels of integration (and correspondingly lower-fidelity coarse graining) would be suitable, for example, for describing entire polymer segments as single particles.72 One of the key issues in developing CG force fields is the methodology used to parameterize the intermolecular potential.73 Most CG approaches, although not all, start with an atomistically-detailed model and integrate out the degrees of freedom not deemed to be relevant.74 This procedure, by its own nature, removes information, and the resulting force field is inherently deficient, especially in terms of transferability and robustness. Furthermore, it is driven by the judgement and expertise required to decide what is and is not “relevant” detail in a simulation problem. A fundamentally different “top-down” approach can be employed where the CG potential parameters are optimized to reproduce macroscopically observed thermophysical properties (instead of integrating high-fidelity atomistic models).75,76 With judicious choices, the resulting models and force fields have proved that they can be of equal or superior accuracy to the current state-of-the-art quantum and atomistic models77 while reducing the computational requirements by several orders of magnitude.
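As an illustration of the pair interactions used in such top-down CG models, the sketch below implements the Mie (generalised Lennard–Jones) potential that underlies, for example, the SAFT-γ Mie approach; the parameter values in the usage line are placeholders for a single CG bead and are not a fit to any particular fluid:

```python
def mie(r, eps, sigma, lam_r=12.0, lam_a=6.0):
    """Mie (generalised Lennard-Jones) pair potential.
    eps   : well depth
    sigma : size parameter (u = 0 at r = sigma)
    lam_r, lam_a : repulsive and attractive exponents (lam_r > lam_a)."""
    # Prefactor chosen so that the minimum of the potential is exactly -eps
    c = (lam_r / (lam_r - lam_a)) * (lam_r / lam_a) ** (lam_a / (lam_r - lam_a))
    return c * eps * ((sigma / r) ** lam_r - (sigma / r) ** lam_a)

# Placeholder usage (units are whatever eps and sigma are given in):
u = mie(r=0.45, eps=0.35, sigma=0.42, lam_r=15.0, lam_a=6.0)
```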
A new area of research has emerged in the quest to expand the range of application of accurate quantum calculations by correlating them with the aid of machine learning (ML) models78,79 (cf. Section 5.5). This approach, in principle, allows the production of force fields with a high level of accuracy which can be embedded in classical MD simulations80 at a significantly reduced computational overhead compared to the underlying quantum models.81 A particularly interesting feature of ML force fields is that they can be made to include many-body interactions in a natural fashion without having to pre-empt any particular closed mathematical form. Topical research in the field aims at the production of “universal” force fields, for example the ANI-182 and GAP83 families of potentials. These workflows attempt to both generalize and extend the results obtained from quantum-based calculations (such as density functional theory). While useful for electronic structure prediction, reactions and solids, they struggle with the prediction of dispersion interactions, which, unfortunately, are key to most liquid and dense-fluid properties,84 and even the requirement of “quantum chemical accuracy” has proven to be woefully insufficient for fluid-phase properties.85
For the novice, and for environments where dedicated resources are scarce, turnkey solutions are becoming available: “black-box” equipment pre-loaded with the required software and databases, comprising the required force fields and GUIs that allow the rapid set-up and running of molecular simulations.100 This presents a key problem: the outcome of a molecular simulation is dependent on the expertise of the individual selecting and applying the force field. It also supposes that a force field capable of correctly describing all of the different molecules in the system exists, whereas in practice most force fields are optimised for particular components or chemical families.
To illustrate, consider an apparently simple task, such as determining the density of an unknown compound, 1-(2-hydroxyethyl)-2-imidazolidinone (HEIA), a compound of interest for capturing carbon dioxide from flue gas. No experimental data are (yet) available in the open literature. Correlations are not applicable, and common equations of state and models fail, as the molecule has an unknown critical point and peculiar non-idealities (a consequence of a significant dipole moment and the distortion of the electronic clouds due to the presence of the nitrogen centres). Fig. 3 shows the density predictions of a well-developed force field (PCFF+)104 tested against unknown and undisclosed data, along with the results for a similar molecule, 1,3-dimethyl-2-imidazolidinone (DMIA), sharing the same atoms and functionalities, but for which published data exist. There is a clear discrepancy between the accuracy of the results for the two molecules: while for DMIA the simulation data fall within the spread of the experimental data, the results for the unknown HEIA show a systematic underestimation of the density by 2%. A more meaningful measure, however, is this error taken as a proportion of the difference between the densities of the two species (i.e., more than 10%).
Fig. 3 MD simulations with the PCFF+ force field are able to represent the properties of a known compound (DMIA – blue symbols), to which the force-field parameters are tuned, but fail to accurately predict the properties of an “unknown” compound with similar morphology (HEIA). Open symbols are simulation data, closed symbols are experimental data.105–109
In general, the accuracy of molecular simulations in predicting liquid densities will be close to 2% for mono-functional molecules and 4% for multi-functional molecules. An exception is the region very close to the critical point, where, in most cases, even the shape of the phase envelope is poorly captured. This is caused, amongst other factors, by the fact that the finite size of the simulations precludes capturing the large fluctuations and long correlation lengths (which rapidly exceed the dimensions of the simulation boxes). Notwithstanding, the main limitation here is most likely the adequacy of the force fields used.36 Fig. 4 shows a further example where one can appreciate the extent to which liquid density predictions for some cyclic and polycyclic compounds can be made with united-atom models, with typical errors of less than 1%.
Fig. 4 The TraPPE-UA potential, modified for ternary and quaternary carbons in naphthenes and aromatics, provides excellent liquid density predictions for a wide range of molecular conformations. Symbols represent predictions of the TraPPE-UA force field compared to experimental liquid densities (straight line). No significant drift in the observed deviation with regard to chemical family was found. Adapted with permission from ref. 110. Copyright 2019 Elsevier.
From a simulation point of view, vapour pressure calculation requires a two-phase system in which the saturated phases coexist (in sufficient quantity). This can be particularly challenging for low-vapour-pressure compounds (e.g. molecules with an MW > 200 Da), where the statistics will be poor due to the constraints on system size (as the density of the vapour phase will be very low). Pressures are usually calculated in simulations by evaluating largely fluctuating quantities.112 The direct consequence of this is that an average error of 2–4% is generally expected for the normal boiling point of most small molecules, although the error in the vapour pressure (at other temperatures) is generally much larger. Fig. 5 shows a typical example, which also highlights the effect of using different force fields.
Fig. 5 Comparison of the average absolute deviations of the AUA and TraPPE-UA force fields for the prediction of boiling point data for ethers and glycols. Boiling point calculations differed from DIPPR data by no more than 5% for TraPPE-UA and no more than 4% for AUA. Adapted with permission from ref. 113. Copyright 2013 Taylor & Francis.
A closely related property is the heat of vaporization, ΔHvap, which can be obtained from the slope of the vapour pressure curve in a P–T representation. The heat of vaporization can also be obtained directly from GEMC, by assessing the configurational energy difference between the equilibrium vapour and liquid simulation boxes (plus the corresponding PΔv term). Furthermore, the calculation of a single value of ΔHvap and a point on the saturation line is, in principle, enough to calculate the whole saturation line via thermodynamic integration.114 In addition, the cohesive energy difference, and the related Hildebrand solubility parameter, can also be obtained directly from equilibrium simulations (particularly GEMC) by recording the energy difference between the phases. It is obvious that the fidelity in describing the vapour pressure impinges on the accuracy of the prediction of the heats of vaporization (and the solubility parameter). Even for simple fluids (such as alkanes), the typical error is of the order of 10%.115 The agreements (or disagreements) are evidence of the quality of fit of the energy scales (in essence, the non-bonded interactions) of the force fields.116
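Written out explicitly, the two routes mentioned above are the Clausius–Clapeyron slope of the saturation curve (here in its usual ideal-vapour, negligible-liquid-volume approximation) and the direct evaluation from the coexisting GEMC boxes:

```latex
\Delta H_{\mathrm{vap}} \simeq -R\,\frac{\mathrm{d}\ln P_{\mathrm{sat}}}{\mathrm{d}(1/T)}
\qquad\text{or}\qquad
\Delta H_{\mathrm{vap}} = \bigl(U_{\mathrm{vap}} - U_{\mathrm{liq}}\bigr) + P\bigl(v_{\mathrm{vap}} - v_{\mathrm{liq}}\bigr)
```

where U denotes the configurational molar energy of each coexisting box and v the corresponding molar volume.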
The table below compares two common fixed-charge water models (SPC/E and TIP4P) against experimental data for selected properties of water; errors are relative to experiment.

Property | Experimental value | SPC/E | Error (%) | TIP4P | Error (%)
---|---|---|---|---|---
Heat of vaporization, ΔHvap/kcal mol−1 | 10.52 | 11.79 | 12 | 10.65 | 1.2
Vapour pressure, Pv (350 K)/bar | 0.417 | 0.14 | −66 | 0.57 | 37
Vapour pressure, Pv (450 K)/bar | 9.32 | 5.8 | −38 | 13.3 | 43
Liquid density, ρ (298 K)/kg m−3 | 997 | 994 | −0.3 | 988 | −0.9
Liquid density, ρ (450 K)/kg m−3 | 890.3 | 860 | −3.4 | 823 | −7.6
Shear viscosity, η (298 K)/mPa s | 0.896 | 0.729 | −19 | 0.494 | −45
Shear viscosity, η (373 K)/mPa s | 0.284 | 0.269 | −5.3 | 0.196 | −31
Interfacial tension, γ (300 K)/mN m−1 | 71.73 | 63.2 | −12 | 59 | −18
Interfacial tension, γ (450 K)/mN m−1 | 42.88 | 36.7 | −14 | 27.5 | −36
While water itself is a fascinating fluid, it rarely exists in pure form. Many areas of interest exist for the study of electrolyte solutions, including biological, geological and industrial applications. The scope of properties of interest includes solubility, osmotic pressure, chemical potentials, and activity and osmotic coefficients. To its merit, simulation has the potential to explore a diverse range of concentrations from the infinitely dilute up to the solubility limit and beyond, into the thermodynamically metastable supersaturated regime. Central to the simulation of brines, polyelectrolytes, deep eutectic solvents131 and/or ionic liquids132 using classical force fields is the incorporation of many-body forces such as polarizability and charge transfer. Early on it was recognised that the use of fixed charges in studies of ionic liquids leads to unsatisfactory prediction of transport properties (viscosity was overestimated and diffusivity underestimated with regard to experiment).133–135 Additionally, ions exhibited overly strong clustering, resulting in too-low solubility values as the system phase-separated into a salt-rich phase and a salt-poor phase.136–141 Unsurprisingly, the results for solubility improve if polarisable models are used.142,143 Examples of polarisable interaction charge models include fluctuating charge models,144 AMOEBA145 and models based on Drude oscillators.146 Fig. 6 shows a typical example of the improvement that can be achieved in the description of the mean ionic activity of an electrolyte in water. However, incorporating polarizability in simulations is demanding and only partially captures the effect of charge transfer. A work-around involves scaling the ionic charge values.122,147,148 Scaling the short-distance ion charge interaction improves the ion–water term and leads to more realistic ion pairing and clustering. Electronic polarizability depends on the density of the polarisable ions so, unavoidably, the scaling factor will also depend on the density.
Fig. 6 Mean ionic activity coefficient, γ, of NaCl in water at (red) 298.15 K and 1 bar and (black) 473.15 K and 15.5 bar versus molality, m. Solid lines are experimental data. Filled symbols are polarizable models, open symbols are fixed charge models. Adapted with permission from ref. 149. Copyright 2015 American Chemical Society.
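In its simplest form, the charge-scaling work-around mentioned above (often justified as an electronic continuum correction) divides the formal ionic charges by the square root of the electronic (high-frequency) dielectric constant of the solvent; for water this gives the frequently used factor of roughly 0.75. This is a schematic relation, not necessarily the specific prescription of the references cited:

```latex
q_i^{\mathrm{scaled}} = \frac{q_i}{\sqrt{\varepsilon_{\mathrm{el}}}}\,,
\qquad \varepsilon_{\mathrm{el}}(\mathrm{H_2O}) \approx 1.78
\;\Rightarrow\; q_i^{\mathrm{scaled}} \approx 0.75\, q_i
```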
In summary, (i) predicting properties of pure water is a challenging activity in its own right and we are still far from being able to champion a force field that is satisfactory for all properties, and (ii) predicting properties of mixtures involving water is likely to be prone to a high degree of error, as models not only need to be accurate both for water and for the other molecules involved but also require a physically accurate description of the density-dependent polarizability effects.
Since the size, shape and flexibility of the molecule are key properties for the accurate description of transport properties, one would expect that all-atom models would excel in this regard, and progressively, as the refinement in the potentials decreases, that the quality of fit would decline. However, united-atom and even coarse-grained models can be successfully employed to describe transport properties.
An example is provided in Fig. 7 comparing the shear viscosity of three linear alkanes (n-decane, n-hexadecane and n-docosane) with different force fields. While one may improve the results with an appropriate choice of force field, the overall accuracy of viscosity calculations is low (with typical errors of 35%).
Fig. 7 Percentage error in the calculation of shear viscosities as compared to experimental data for long chain alkanes. Colours refer to different force fields, blue symbols are TraPPE,67 green and red symbols are Mie-based potentials.152,153 Adapted with permission from ref. 154. Copyright 2019 Elsevier.
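For context, shear viscosities such as those compared in Fig. 7 are commonly obtained from equilibrium MD through the Green–Kubo relation, η = (V/kBT)∫0∞⟨Pxy(0)Pxy(t)⟩dt, i.e. by integrating the autocorrelation function of an off-diagonal component of the pressure tensor (non-equilibrium methods are the main alternative). A minimal post-processing sketch, assuming a time series of Pxy in consistent units is already available from a simulation:

```python
import numpy as np

def green_kubo_viscosity(p_xy, dt, volume, kB_T):
    """Running Green-Kubo integral for the shear viscosity:
    eta(t) = (V / kB*T) * int_0^t <P_xy(0) P_xy(t')> dt'.
    p_xy : 1D array of an off-diagonal pressure-tensor component
    dt   : sampling interval (units must be consistent throughout)."""
    n = len(p_xy)
    # Unbiased estimate of the autocorrelation function for lags 0 .. n-1
    acf = np.correlate(p_xy, p_xy, mode="full")[n - 1:] / np.arange(n, 0, -1)
    return np.cumsum(acf) * dt * volume / kB_T

# In practice one averages over the xy, xz and yz components and reads the
# viscosity off the plateau of the running integral.
```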
A particular pitfall of the molecular modelling of viscosity arises when one approaches relatively low temperatures, close to those where melting (freezing) is expected to occur. In this region, the fluid viscosity typically increases exponentially as the system becomes progressively arrested. This behaviour is rarely captured by common force fields for fluids, as they seldom reproduce accurately the correct solid (crystal) phase(s). The determination of the melting point of a compound depends crucially on the details of the potential but, even more importantly, on the electronic structure and polarizability. Researchers in the area of crystal structure prediction routinely fit intermolecular force fields that capture the different polymorphs of rigid molecules using a combination of ab initio (quantum calculations) and empirical components,155 or by DFT calculations.156 While it would seem logical that said potentials would give accurate representations of melting points and/or be good choices to predict fluid-phase behaviour, practice suggests otherwise, and the potentials fitted to the solid phases do not accurately reproduce the thermophysical properties of the corresponding liquids. In fact, it is rare to find molecular models that can be used for both solid and fluid phases.157 This apparent inconsistency points to some of the underlying limitations of the standard “Lennard–Jones plus point charge” models in current use for fluids and to the fact that some of the neglected features (such as the anisotropy of the atomic van der Waals radii, multi-body effects and polarizability, or the non-spherical features of the atomic charge densities) might be important.158 Analogous arguments can be made with respect to the calculation of self-diffusion coefficients close to the melting point and of second-order phase transitions (e.g., glass transition temperatures159) and related properties such as pour point calculations and wax appearance temperatures.160
Although the calculation of viscosities at high pressures will inevitably explore very high molecular packings, it seemingly does not suffer from the limitations of an underlying solid phase, and one can confidently explore up to the GPa region161 as long as the system retains fluidity and the molecular model retains a semblance of the full atomic structure (CG models inevitably fail, as they are incapable of resolving the correct molecular packing162). A study by Ewen et al.163 compares the viscosity of hexadecane at ambient conditions to that at high pressures (and moderate increases in temperature) for a series of force fields. Although the density predictions are satisfactory (below 15% error) in all cases, the viscosities were consistently under-predicted by UA force fields. Furthermore, for some AA force fields, viscosity predictions were two orders of magnitude higher than the experimental value due to crystallization. The predictions are poorer at higher pressures, where the molecular 'roughness' has a greater impact on the viscosity, suggesting that the molecules move past each other more easily at low-density conditions. Similar arguments come into play for very polar fluids and those where hydrogen bonding is relevant. The physical representation of hydrogen atoms may not be important at ambient conditions but becomes relevant at high pressures, where the accurate description of the structure of the fluid becomes important. The use of UA force fields is likely to have a detrimental effect on the simulated friction coefficients of interest in tribological situations. In spite of this, they continue to confer an advantage for capturing trends in large, complex systems due to their relatively low computational expense.164 In a similar way, polymer structural properties such as glass transition temperatures, entanglement and relaxation times can be confidently explored with AA and UA models, provided the systems are large enough to avoid finite-size effects.165
In the study by Bellaire et al.,171 the self-diffusion coefficients of binary mixtures in the bulk were measured by 1H pulsed-field-gradient (PFG) NMR spectroscopy, a method that enables the tracking of diffusive motion without perturbing the system. The corresponding D from MD simulation was obtained using rigid multi-centre Lennard–Jones (LJ) models with superimposed point dipoles and point quadrupoles. There was good agreement between the experimental data and the MD simulations for the D of simple organics (e.g. the toluene/cyclohexane mixture). However, a challenge to the simulation is observed for the toluene/ethanol system (Fig. 8). The poor agreement with MD simulation is presumably a consequence of the effect of hydrogen bonding not being well described. This is a particularly common occurrence when dealing with mixtures in which the cross-interactions are of a different nature than those occurring in the pure components (e.g., water–alkanes, CO2–organics, ketones–alcohols, etc.).
Fig. 8 PFG-NMR data (open symbols) for the self-diffusion, D, of individual toluene or ethanol molecules in the binary mixture at 298.15 K and ambient pressure as compared to simulation data (filled symbols). Adapted with permission from ref. 171. Copyright 2018 Springer Nature.
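On the simulation side, the self-diffusion coefficients compared in Fig. 8 are routinely extracted from the long-time slope of the mean-squared displacement via the Einstein relation, D = limt→∞ MSD(t)/(6t), with finite-size corrections often applied afterwards. A minimal sketch, assuming unwrapped trajectories are available as a (frames × particles × 3) array; the fit window is an illustrative choice:

```python
import numpy as np

def self_diffusion(positions, dt, fit_start=0.2, fit_end=0.8):
    """Self-diffusion coefficient from the Einstein relation,
    D = slope(MSD vs t) / 6, fitted over an intermediate window to avoid
    the short-time ballistic and poorly sampled long-time regimes.
    positions : unwrapped coordinates, shape (n_frames, n_particles, 3)
    dt        : time between stored frames."""
    n_frames = positions.shape[0]
    lags = np.arange(1, n_frames)
    msd = np.array([
        np.mean(np.sum((positions[lag:] - positions[:-lag]) ** 2, axis=-1))
        for lag in lags
    ])
    t = lags * dt
    i0, i1 = int(fit_start * len(t)), int(fit_end * len(t))
    slope = np.polyfit(t[i0:i1], msd[i0:i1], 1)[0]
    return slope / 6.0
```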
The prediction of interfacial and transport properties is a strenuous test for any force field within molecular models, since most intermolecular potentials are fitted to a set of properties (e.g., densities, heats of vaporisation, radial distribution functions) in the homogeneous fluid state. Therefore, the representation of other thermodynamic state points not involved in the original fitting can be employed as a gauge of the overall performance of the molecular model.
Fundamentally, the interfacial tension is directly related to the components of the macroscopically observed pressure tensor and to the vapour and liquid densities, hence the quality of the prediction depends crucially on these factors. Models such as those for water (see Section 4.3), which consistently provide poor estimates of the vapour pressure, provide equally poor predictions of the interfacial tension. For example, Underwood and Greenwell174 exemplify how the surface tension is underestimated by at least 15% by most water models, except TIP4P/2005, which underestimated the value by 7%.1
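For the usual geometry of a planar interface normal to z in a periodic box of length Lz containing two interfaces, this mechanical route is written as:

```latex
\gamma = \frac{L_z}{2}\left[\langle P_{zz}\rangle
  - \tfrac{1}{2}\bigl(\langle P_{xx}\rangle + \langle P_{yy}\rangle\bigr)\right]
```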
An interesting comparison of models took place during the 9th Industrial Fluid Properties Simulation Collective (IFPSC) competition.71 Liquid–liquid interfacial tensions were investigated for binary mixtures of dodecane + water, toluene + water and a 50:50 (wt%) mix of dodecane/toluene + water at 1.825 MPa (250 psig) and 110–170 °C. A wide range of models and techniques was tested (Fig. 9), including atomistic models, semi-empirical theories and coarse-grained models. In most of these simulations, interfacial tensions are calculated from an elongated simulation cell, sampled through either molecular dynamics or Monte Carlo methods, where the global composition is predetermined.175 A consequence of this set-up is that the coexisting phase compositions cannot be specified a priori, but result from the combination of the phase split and the (usually unknown) surface enrichment at the given pressure and temperature. The use of Gibbs ensemble Monte Carlo simulations circumvents this latter problem.176 Binary interaction parameters describing the cross-interactions were obtained by fitting the constituent binaries at lower temperatures and pressures, then taken as constants for all conditions and mixtures studied. The spread of the computed results was very high, with overpredictions of up to 10 mN m−1, although the trends with temperature were followed faithfully. Importantly, the two-phase simulations were able to shed light on the molecular detail of the interfaces (see Fig. 10).
Fig. 9 Benchmark and predicted interfacial tensions for the water/50:50 n-dodecane + toluene mixture. The uncertainty is of the order of 1 mN m−1 for the experimental data and approx. 0.5–3 mN m−1 for the predictive methods. Adapted with permission from ref. 77. See the original source for a full discussion of the methods and force fields used. Copyright 2018 Elsevier.
Fig. 10 Snapshot of the liquid–liquid interface of an equilibrium configuration of the ternary mixture of water (blue) + toluene (red) + dodecane (green) at 130 °C from the winning entry of the 9th IFPSC challenge.177 The interfacial tension of the ternary mixture was overpredicted by an average of 1.3 mN m−1. The liquid–liquid interface of the water–toluene–dodecane mixture is seen to be very diffuse, spanning about 3 nm. The aqueous phase is essentially pure water, whilst an appreciable amount of water is seen to diffuse into the organic phase. As the temperature increases, the interface becomes wider, the toluene enrichment is less pronounced and the interfacial tension decreases. Taken with permission from ref. 177. Copyright 2018 Elsevier.
While considerable attention has been placed on the prediction of volumetric properties, much less attention has been given to directly measurable caloric properties, exemplified by heat capacities, thermal conductivities and the coefficient of thermal expansion. In the particular case of heat capacities, classical simulations provide only the configurational contribution (i.e., the contribution that stems from the intermolecular force field). The contributions stemming from the translational (and rotational) degrees of freedom need to be explicitly and independently taken into account. In the case of flexible molecules, further contributions arise from the internal degrees of freedom and intramolecular potentials.181 The resulting value is a sum of terms which need to be calculated both from a pure-fluid simulation and from an additional single-molecule-in-vacuum calculation.182 The prerequisite of knowing the ideal gas contribution is by no means a constraint, as this quantity can be estimated by using well-established quantum mechanical techniques (or even by semi-empirical group contribution methods) with errors of a few percent.
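Schematically, for a rigid-molecule model the working decomposition reads as below (a sketch only; flexible molecules require the additional intramolecular terms mentioned above, and the quantum corrections to the classical intramolecular contributions need care):

```latex
C_p(T,P) = C_p^{\mathrm{id}}(T) + C_p^{\mathrm{res}}(T,P), \qquad
C_p^{\mathrm{res}} = \frac{\bigl\langle \delta H_{\mathrm{conf}}^{\,2} \bigr\rangle_{NPT}}{k_{\mathrm{B}} T^{2}} - N k_{\mathrm{B}},
\qquad H_{\mathrm{conf}} = U_{\mathrm{conf}} + PV
```

where the ideal contribution, Cp^id, is taken from quantum mechanical (or group contribution) estimates and the residual term from the configurational-enthalpy fluctuations of an NPT simulation.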
The MD-predicted heat capacity at constant pressure, Cp, is often overestimated,183 indicating that more energy is absorbed (i.e., in molecular modes of vibration) for every degree of temperature rise. In a related example, Fig. 11 showcases the calculation of the thermal conductivity of n-decane employing all-atom force fields.184 The thermal conductivity is found to be heavily overestimated; interestingly enough, the TraPPE-UA force field provided better performance in comparison to the experimental data, presumably due to the removal of high-frequency degrees of freedom which, in reality, behave as quantum-mechanical oscillators and do not contribute to thermal conduction.
Fig. 11 Comparison of the thermal conductivity of n-decane for different force fields at 3 MPa. Experimental data (solid line). Figure adapted with permission from ref. 184. Copyright 2021 Elsevier.
Other properties which are influenced by the internal degrees of freedom require similar care in their calculation, such as the bulk (not shear) viscosity, κ, also known as the volume or dilatational viscosity.185 Additionally, derivative properties, such as the Joule–Thomson coefficient, μJT, which are commonly used to validate both equations of state and force fields, can be calculated from intermolecular potentials.186,187
As an example of how the different implementations of a program can affect the ultimate result, consider the simulation of the density and vapour pressure of pure CO2. Details of the force field and computations are provided in ref. 142. A comparison is made between a small selection of available simulation packages (Table 2).
T/K | DL_POLY | GROMACS | ms2 | GIBBS
---|---|---|---|---
Vapour pressure/MPa | | | |
220 | 0.496 | 0.652 | 0.636 | 0.629
240 | 1.211 | 1.343 | 1.264 | 1.284
260 | 2.416 | 2.483 | 2.412 | 2.422
280 | 4.24 | 4.19 | 4.07 | 4.08
Saturated liquid density/kg m−3 | | | |
220 | 1144 | 1149 | 1151 | 1153
240 | 1067 | 1075 | 1077 | 1077
260 | 980 | 988 | 993 | 994
280 | 868 | 875 | 889 | 888
A further comparison with the expected theoretical results is provided in the ESI.† In any case, the important comparison is amongst the simulation results themselves, as they should all reproduce the same values within the simulation uncertainty. In most cases this is true, although some values calculated with DL_POLY are seen to be outliers, particularly at low temperatures. It is in this type of analysis that the human element (often guided by experience) is invaluable. In this particular case there is a cross-reference which serves to validate the results, but the bigger question is what happens when such a gauge does not exist. While this is by no means an exhaustive comparison,109 it does show that care has to be taken both in the selection of the program employed to run the molecular simulations and in the subsequent interpretation of the results.
On the other hand, simulations must be treated as if they were experiments performed in silico. They are subject to statistical uncertainty, due to the relatively small system sizes and the limited number of configurations that are explored.196 They should be repeated or cross-checked if the data are to be used for sensitive calculations. As with the experiments they try to supplant, some types of calculations are prone to larger errors, e.g. interfacial tension calculations depend on monitoring the difference between two large quantities (the normal and tangential elements of the pressure tensor), which are themselves subject to large fluctuations, while others, e.g. densities, are relatively insensitive to the simulation parameters. The expected fluctuations and statistical uncertainty of the results should always be reported and taken into consideration.
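A common way to obtain the statistical uncertainty referred to above from a single (time-correlated) simulation series is block averaging; a minimal sketch (the choice of ten blocks is illustrative):

```python
import numpy as np

def block_average(x, n_blocks=10):
    """Mean and standard error of a correlated time series, estimated by
    splitting it into contiguous blocks and treating the block means as
    approximately independent samples."""
    x = np.asarray(x, dtype=float)
    x = x[: (len(x) // n_blocks) * n_blocks]      # trim to a multiple of n_blocks
    block_means = x.reshape(n_blocks, -1).mean(axis=1)
    sem = block_means.std(ddof=1) / np.sqrt(n_blocks)
    return block_means.mean(), sem

# mean_g, err_g = block_average(gamma_trace)
# In practice the block length is increased until the error estimate plateaus.
```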
For the case of polymers, the problem becomes apparent from the sheer molecular weight of single molecules. If one considers that, even with state-of-the-art equipment, it is unrealistic to model more than a few million atoms at a time (cf. Section 3.1), it becomes apparent how futile it can be to attempt to model a realistic high-molecular-weight polymer blend or a complex biological system. Another aspect of the problem is the time scale involved. Molecular dynamics can only “observe” events whose expectation times are typically of the order of nanoseconds and, in the best of cases, of the order of fractions of microseconds. However, it is very possible that the characteristic time of the system easily exceeds the simulation time. The long-standing scientific question of qualitatively predicting protein folding events is a particularly extreme example of a research question which surpasses the current and foreseeable capacity of atomistic modelling.202 However, more mundane problems, regarding the self-assembly of soft matter or the solidification of simple fluids, will also encounter the same limitations. While in some cases the limitations are obvious, in others they are not – and therein lies the danger. Fig. 12 showcases the results of a reasonably large simulation of a system of 27 asphaltene-like molecules dissolved in a good (toluene) and a poor (heptane) solvent. Both simulations (Fig. 12, middle) appear to reach a plateau in terms of the average cluster size, suggesting that no further aggregation is expected after 80 ns of simulation time. It is only when looking at a simulation which is an order of magnitude longer that one perceives the actual physical behaviour, which implies a rather complete clustering (precipitation) of the asphaltene in heptane.
Fig. 12 All-atom simulations of 27 asphaltene C molecules in 7% mass heptane (green) and toluene (red). The plots show the average number of asphaltene molecules in a cluster. Middle corresponds to an 80 ns simulation while right showcases the results of a 0.5 μs simulation. Taken with permission from ref. 203 and 204. Copyright 2017 American Chemical Society.
In cases where either the size complexity is insurmountable and/or the disparity between the observable and inherent time scales becomes important, it might be necessary to consider coarse-grained (CG) models. A distinction is made here between models focused on describing the general phenomenology and those which are quantitatively accurate and can be used directly to estimate thermophysical properties.
Fig. 13 showcases the results of the simulation of a CG model of a long-chain complex system, namely atactic polystyrene (PS) in n-hexane solvent, modelled using the SAFT-γ force field.70 The model is employed within large-scale simulations that emulate approximately one million atoms and serve to describe the temperature–composition fluid-phase behaviour of binary systems. A single temperature-independent unlike interaction energy parameter is employed to reproduce the experimental solubility behaviour; this is sufficient for the quantitative prediction of both upper and lower critical solution points and the transition to the characteristic “hourglass” phase behaviour expected for these systems. Transferability was demonstrated through the ability to represent PS models of different molecular weight. Noticeably, the values of the diffusion coefficient were between 2 and 3 orders of magnitude higher than experiment, presumably due to the lack of friction resulting from the coarser representation, which leads to faster dynamics.
Fig. 13 Temperature–volume fraction (Φ) phase diagram for polystyrene (MW = 4800) + n-hexane. The snapshots correspond to equilibrium configurations of the system at different temperatures. Greyed-out regions correspond to the two-phase regions. Dashed lines are smoothed experimental data. Taken with permission from ref. 205. Copyright 2017 American Chemical Society.
The method requires each molecule to be described by a quantum-chemically generated charge density (σ) surface. The 3D distribution of the polarisation charges σ on the surface of each molecule X is converted into a surface composition function, pX(σ) (a histogram function called the σ-profile), which describes the amount of surface with polarity σ for each molecule. The interactions between molecules are modelled by applying statistical mechanics to the interactions between surfaces. Phase equilibria, the solubility of solid solutes (including polymers) in different solvents, acid dissociation constants, partitioning in micellar systems and the modelling of systems that contain isomers can all be calculated from the σ-potential of the mixture. The main advantage of quantum-chemically generated charge densities is that they enable the prediction of properties of molecules that include functional groups not available from traditional group contribution methods.
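Conceptually, the σ-profile described above is simply an area-weighted histogram over the screening-charge densities of the surface segments produced by the quantum (COSMO) calculation. A schematic sketch of its construction is given below; the segment charge densities and areas are assumed to be available from such a calculation, and the binning range and width are illustrative rather than those of any particular implementation:

```python
import numpy as np

def sigma_profile(seg_sigma, seg_area, n_bins=51, sigma_range=(-0.025, 0.025)):
    """Schematic sigma-profile p_X(sigma): the amount of molecular surface
    area carrying each screening-charge density sigma.
    seg_sigma : charge density of each surface segment (e.g. e/Angstrom^2)
    seg_area  : surface area of each segment."""
    edges = np.linspace(sigma_range[0], sigma_range[1], n_bins + 1)
    profile, _ = np.histogram(seg_sigma, bins=edges, weights=seg_area)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, profile   # bin centres and surface area per sigma bin

# sigma, p_x = sigma_profile(seg_sigma, seg_area)   # p_X(sigma) for molecule X
```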
A popular implementation, known as COSMO-RS,209 positions itself as an alternative predictive method to the rather empirical, structure-interpolating group contribution methods and to the more time-consuming force-field methods based on Monte Carlo and molecular dynamics simulation. As such, it has become one of the standard industrial tools for the simulation of fluid-phase thermodynamics and especially for solvent and solute screening.210,211 COSMO-RS is implemented in the commercial software package COSMOtherm.212 Open-source implementations of a competing version (COSMO-SAC) are, however, available online.213
While COSMO models excel at calculating solvation properties (e.g., solubilities of small molecules in organic solvents, pKa, etc.), their performance in calculating fluid-phase equilibria is comparable to that of the other force-field methodologies and of traditional excess Gibbs energy models (e.g., UNIFAC). A key area where COSMO models are a gold standard is the prediction of ionic liquid mixtures.214–217
The 6th IFPSC challenge218 was set up to predict the liquid–liquid equilibria (LLE) of a commercial dipropylene glycol dimethyl ether (DPGDME) isomeric mixture and water at ambient pressure and a range of temperatures. Modellers were provided with the results of the liquid–liquid equilibria at room temperature and asked to predict the temperature dependence of the mutual solubilities in the aqueous and ether-rich phases. The experimental data show an inverse temperature dependence of the solubility, i.e., increased solubility at lower temperatures, resulting from the interplay of hydrogen bonding and hydrophobic interactions. The system consists of interactions between a conformationally flexible and relatively large solvent molecule and the small, strongly hydrogen-bonding water molecules. This DPGDME/water LLE system is challenging to predict using molecular simulation for two reasons: (i) the DPGDME molecules are much too large to allow direct particle exchange in Gibbs ensemble Monte Carlo simulations, and (ii) the DPGDME diffusion coefficients are extremely low, making molecular dynamics simulations of phase transitions impractical. In addition, the DPGDME molecules are amphiphilic, due to a balance of hydrophilic and hydrophobic moieties, and hence are likely to form complex aggregates in the aqueous phase. Fig. 14 shows the resulting predictions obtained by employing configurational-bias Gibbs ensemble Monte Carlo simulations using the TraPPE-UA force field219 (along with the TIP4P water model). Alongside are the predictions employing two flavours of the COSMO models: COSMO-RS220 and COSMO-SAC.221 The predictions are of remarkable accuracy, especially at high temperatures. The industrial referees commented that “The molecular structures studied here are neither especially large nor exotic, yet predictions of their phase behaviour by molecular modelling represent a very significant challenge (especially in the absence of any experimental data)”.36 It is important, however, to note that the modellers were able to fine-tune the models (cf. red symbols in Fig. 14), as the predictions tended to degrade significantly without that “calibration”.
Fig. 14 Liquid–liquid equilibrium of POLYGLYDE MM (a mixture of dipropylene glycol dimethyl ethers) in water. Experimental calibration points (red) were provided to fine-tune parameters for the models. Experimental results and cloud point data222 are compared against results from Gibbs ensemble Monte Carlo using the TraPPE-UA force field, and two versions of the COSMO approach. Adapted with permission from ref. 220. Copyright 2011 Elsevier.
A path which is not currently mature, but which is gaining considerable momentum, leads from the reasonably automated production of “pseudo-experimental” data from quantum-based simulations and other less refined scales of modelling to the full prediction of properties.72,242,243 The workflows to make these techniques available to the process industry, and in particular within process simulators, as we currently do with classical theories and correlations, are still in their infancy.244–246 A key tool here will be the development and maturation of machine-learned potentials, capable of combining quantum ab initio accuracy with a much more manageable computational overhead. One can confidently foresee that data-driven research will be a key element of our toolbox in the future.
The underlying question, however, is whether the current molecular modelling approaches have matured enough to become an infallible and universal tool to predict thermophysical properties of fluids and the immediate answer is probably negative. For the “simpler” properties discussed in this manuscript (densities, vapour pressures, viscosities, etc.) and “simple” organic fluids, the outlook is promising. However, the systems very rapidly become challenging (and the results unreliable) as the demand to incorporate complexity increases. Furthermore, molecular modelling is not simple to deploy in an industrial scenario, requiring specialized software used by experts with a broad knowledge of its pitfalls and limitations. More accurate computer modelling, in the realm of quantum mechanical calculations is even more difficult, less applicable to larger systems and complex molecules and even further away from practical deployment. On the other hand, experimental determination is time-consuming and requires specialist laboratories and personnel. The real “competition” comes from empirical correlations, including group contribution models.247,248 These models have been refined over many years, employing large amounts of data, and provide for a rather robust interpolation in the domains to which they have been fitted. Molecular modelling, however, has a significant advantage: once a potential has been validated (or fitted) to a particular state point, the extrapolation to other closely related conditions can be confidently made. Furthermore, although we have not dealt with it here, molecular modelling is capable of providing a wealth of additional information, which might be relevant to the user and would be challenging to obtain through either experimentation or correlation. We think here of molecular-level characterizations such as the distribution of molecules along interfaces and within clusters, incipient stages of phase separation, liquid crystal behaviour, etc. Similarly, related properties such as water–octanol partition coefficients,223,249 infinite-dilution activity coefficients, etc. are usually well-behaved extrapolations which build upon the underlying robustness of the physics behind the force fields. Notwithstanding, while we have high hopes for the digitalization of thermodynamics,250 the reality is that we are still far away from achieving success and substantial progress remains to be made. The sheer diversity and complexity of chemical systems continues to defy attempts to find a universally applicable, yet tractable, approach for accurate, reliable simulation. It remains necessary for modellers to apply deep insight and judgement to choose appropriate models and apply them effectively. This uncertainty in the accuracy of the prediction from simulations employing classical force fields has a critical impact on our ability to trust the results. A method of modelling all entities and properties of a system with a low degree of uncertainty is certainly the ultimate goal, however such a predictive method is far from being realised.
• Never trust the results of a simulation on the basis that they come from a computer; the greatest strength of a computer is the speed of its computations, not its “intelligence”. There is still no substitute for the human intuition which ultimately has to decide on the applicability of the results. The effort to train researchers in this area lags behind the scale of the progress in software and hardware.
• The digitalization of thermodynamics has been advancing at a heightened pace in the last decade; advances in cheminformatics fuelled by enhanced hardware and machine learning algorithms are bringing in rapid changes to the way we look at physical property prediction.
• Molecular dynamics simulation of pure n-alkanes and their mixtures at elevated temperatures.
• Molecular simulation of thermodynamic properties from models with internal degrees of freedom.
• The use of molecular dynamics to measure thermodynamic properties of n-alkanes – case study with GROMACS.
• Fluid-solid phase transition of n-alkane mixtures.
• Comparison of classical force-fields for Molecular Dynamics simulations of lubricants.
• Water-alkane interface at various NaCl salt concentrations.
• Molecular dynamics simulations of CO2 diffusivity in n-hexane, n-decane, n-hexadecane, cyclohexane and squalane.
• Self-diffusion coefficient and viscosity of propane.
• Self-diffusion coefficients – force field comparison and finite boundary effects in the simulation of methane/n-hexane mixtures at high pressures.
• Comparison of force fields with fixed bond lengths and flexible bond lengths.
• Thermal conductivity of n-decane at sub/supercritical pressure.
• Thermodynamic and transport properties of supercritical carbon dioxide and methane.
• Cyclic and polycyclic compounds.
• Enthalpy of mixing predicted using molecular dynamics.
• Phase equilibria applied to alkanes, perfluoroalkanes, alkenes and alcohols.
• VLE and interfacial properties of fatty acid methyl esters from molecular dynamics simulations.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2cp05423j