The molecular self-association of carboxylic acids in solution: testing the validity of the link hypothesis using a quantum mechanical continuum solvation approach

According to the "link hypothesis" there is a connection between the solvent-dependent most stable self-associated groups of molecules and the building unit present in the polymorph that crystallizes from solution. We have tested this hypothesis by computing the Gibbs free energies for the molecular self-association in solution of three selected monocarboxylic acids that crystallize in different polymorphic forms and exhibit selective crystallization depending on the solvent: tetrolic acid (TTA), m-aminobenzoic acid (mABA) and m-hydroxybenzoic acid (mHBA). Calculations have been conducted at the density functional theory (M06-2X) level with the SMD polarizable continuum model to simulate aqueous and organic solutions. For all three systems we have found that the solvation environment significantly affects the stability of dimers. The most stable dimer in solution is the classic carboxylic acid dimer, and its stability decreases on going from nonpolar solvents, such as chloroform or acetonitrile, to aqueous and alcoholic solutions. For TTA and mABA there is a link between the carboxylic dimer synthon in solution and the structural synthon packed in the metastable α-TTA polymorph, which crystallizes from chloroform, and in the crystal form II of mABA, which crystallizes from acetonitrile. For mHBA, however, the stabilization of the centrosymmetric carboxylic acid dimer in acetonitrile and ethyl acetate does not correspond with recent experimental observations that the polymorph containing this structural unit (form I) is not the crystal form that nucleates preferentially from these solvents (Cryst. Growth Des., 2013, 13, 1140). Starting from the carboxylic dimer of TTA, the processes of trimerization and tetramerization of TTA from solution have also been modelled. The calculations suggest that the formation of tetramers in chloroform occurs from the self-association of the carboxylic acid dimers and not through the association of the TTA monomers and trimers. The free energy of formation of the ionized forms of the mABA dimer (non-zwitterionic–zwitterionic and zwitterionic–zwitterionic) has been evaluated and, according to our results, zwitterionic–zwitterionic mABA dimers might be abundant in supersaturated aqueous and alcoholic solutions of m-aminobenzoic acid.


Introduction
Crystallization of molecular crystals is a key industrial process that is closely linked to the phenomenon of polymorphism, that is, the ability of a molecule to crystallise in more than one structure. The pharmaceutical industry in particular requires the production of the desired polymorph, as drugs receive regulatory approval for only a single polymorph. Approximately one-third of all active pharmaceutical ingredients are confirmed polymorphic systems [1], and a similar number has been reported for organic compounds in general [2]. It is therefore essential to have reliable and effective methods for the selection of each specific polymorphic form of a molecular crystal. However, the lack of fundamental understanding of the factors dictating the outcome of crystallization processes is such that achieving control of a specific polymorph remains a significant challenge.
While a variety of mechanisms are thought to play a role, including growth in confinement, adjustment of the precipitating medium and the formation of amorphous precursor phases, it is the solvent that mostly influences the thermodynamics and kinetics of crystal growth processes [3]. In fact, the approach that is usually followed to achieve polymorph selection during solution crystallization is the modification of the solvation environment by changing the nature of the solvent or through the addition of additives (organics, peptides or simple ionic salts) [4,5]. However, this approach to polymorph selection remains ad hoc and is heavily dependent on trial and error.
At present there is no general model explaining how the modification of the chemistry of a solution causes the formation of different polymorphs, and simple kinetic models such as Classical Nucleation Theory are unable to predict, or even qualitatively explain, the phenomenon of polymorph selection during solution crystallization [6]. There are, however, some hypotheses regarding which processes in solution could be responsible for the selective crystallization of organics.
One of these is the "link hypothesis", according to which there is a connection between the solvent-dependent most stable self-associated groups of molecules and the building unit present in the polymorph that crystallizes from solution [10]. Therefore, if the "link hypothesis" is correct, then from the knowledge of the growth synthons present in solution, as a consequence of thermodynamically driven self-association equilibrium, it should be possible to determine the structural synthons packed in the crystal, as identified by crystallography, and consequently predict the formation of a specific polymorph. For example, if the association to a specific dimer or trimer occurs in solution, then the same building unit should be observed in the crystal [11][13][14][15][16]. However, to evaluate the thermodynamic stability of molecular clusters in solution, what is required is the free energy change associated with the process of molecular association. In molecular dynamics (MD) simulations the free energies of association can be computed using umbrella sampling or metadynamics, but these methods require the identification of one or a set of collective variables to span the "reactive" subspace of the chemical system under investigation. With the exception of simple molecular systems, such as tetrolic acid [17], the choice of collective variables associated with the molecular self-assembly process of organic molecules is not always obvious. Moreover, MD simulations of organic molecules in explicit solvent usually rely on classical molecular mechanics force fields, which are unable to account for electronic polarization or proton transfer. For example, contradictory results regarding the stability and structure of the molecular associates of glycine [9,12] and urea [18,19] in aqueous solutions were obtained using different force fields, and it was only through the application of ab initio molecular dynamics that the most stable dimers of glycine [20] and urea [21] could be identified. However, this technique is limited by the size and complexity of the system, as well as by the time frame of the simulation.
An alternative first principles method that can be used to compute the free energy of molecular association in solution is based on the quantum mechanical continuum solvation approach [22], in which the solute (monomer, dimer, trimer, etc.) is described at a quantum mechanical level (ab initio or density functional) and the solvent is treated as a polarizable continuum. In particular, the quantum mechanical treatment of molecular associates using modern density functionals specifically developed for non-bonding interactions [23,24] gives an accurate description of the hydrogen (H) bonding and van der Waals forces that can be important in polymorph selection at an accessible computational cost. On the other hand, polarizable continuum models represent a practical approach to simulate the solvation environment and determine the effect of the medium on the structure and stability of solutes in solution.
Here, we report the application of this quantum mechanical continuum solvation approach to compute the free energy of association in solution of three monocarboxylic acids that crystallize in different polymorphic forms and exhibit a selective crystallization depending on the solvent: tetrolic acid (TTA), m-aminobenzoic acid (mABA) and m-hydroxybenzoic acid (mHBA) (see Table 1). The objectives of this paper are to assess the effect of the solvation environment on the stability of pre-nucleation molecular aggregates of tetrolic acid, m-aminobenzoic acid and m-hydroxybenzoic acid, and to verify whether there is a connection, as suggested by the link hypothesis, between the structure of the most thermodynamically stable molecular self-associate in a certain solvent and the structural synthon of the crystal polymorph that crystallizes from that particular solvent [8].

Free energies of association in solution
In the gas phase, the free energy of molecular association has been computed as:
$$\Delta G^{\circ}_{\mathrm{ass}} = G^{\circ}_{\mathrm{AB}} - G^{\circ}_{\mathrm{A}} - G^{\circ}_{\mathrm{B}} \qquad (1)$$
where $G^{\circ}_{X}$ is the total free energy of the species X (X = AB, A or B) in the gas phase:
$$G^{\circ}_{X} = E_{e,\mathrm{gas}} + \delta G^{\circ}_{\mathrm{VRT,gas}} \qquad (2)$$
In eqn (2), $E_{e,\mathrm{gas}}$ is the gas-phase total electronic energy of the solute, and $\delta G^{\circ}_{\mathrm{VRT,gas}}$ is the sum of the vibrational, rotational and translational contributions to the gas-phase Gibbs free energy at T = 298 K under a standard-state partial pressure of 1 atm.
In the liquid phase the molecular association has been modelled using the thermodynamic cycle reported in Scheme 1, and the free energy change for the association reaction in solution in the 1 mol L⁻¹ standard state has been computed using the following equation:
$$\Delta G^{*}_{\mathrm{ass}} = G^{*}_{\mathrm{AB}} - G^{*}_{\mathrm{A}} - G^{*}_{\mathrm{B}} \qquad (3)$$
where $G^{*}_{X}$ is the total free energy of the species X in the liquid at 298 K:
$$G^{*}_{X} = E_{e,\mathrm{gas}} + \delta G^{\circ}_{\mathrm{VRT,gas}} + \Delta G^{*}_{\mathrm{solv}} + \Delta G^{\circ \rightarrow *} \qquad (4)$$
In eqn (4), $\Delta G^{*}_{\mathrm{solv}}$ is the solvation free energy of the solute corresponding to transfer from an ideal gas at a concentration of 1 M to an ideal solution at a liquid-phase concentration of 1 M, and $\Delta G^{\circ \rightarrow *}$ = 1.89 kcal mol⁻¹ (T = 298.15 K) is the free energy change of 1 mol of an ideal gas from 1 atm to 1 M [25]. This approach follows the recommendation by Ho et al. that free energies in solution should be obtained from separate gas- and solution-phase calculations [26]. However, there are instances where stationary points in solution do not correspond to stationary points in the gas phase, making it impossible to compute the relevant gas-phase vibrational, rotational and translational contributions ($\delta G^{\circ}_{\mathrm{VRT,gas}}$). In these circumstances an alternative approach is to perform the geometry optimization and frequency calculations of the species involved in the equilibrium in the liquid phase, and to compute the total free energy of the species X in the liquid using the following equation:
$$G^{*}_{X} = E^{\mathrm{Tot}}_{\mathrm{soln}} + \delta G^{*}_{\mathrm{VRT,soln}} \qquad (5)$$
where $E^{\mathrm{Tot}}_{\mathrm{soln}}$ is given by the sum of the liquid-phase expectation value of the gas-phase Hamiltonian ($E_{e,\mathrm{soln}}$), the electronic polarization contribution to the solvation free energy based on bulk electrostatics ($\Delta G_{\mathrm{EP}}$) and the contribution from cavitation, dispersion and solvent structural effects ($G_{\mathrm{CDS}}$) [27]:
$$E^{\mathrm{Tot}}_{\mathrm{soln}} = E_{e,\mathrm{soln}} + \Delta G_{\mathrm{EP}} + G_{\mathrm{CDS}} \qquad (6)$$
In the SMD continuum solvation model, which has been used in this study, the $G_{\mathrm{CDS}}$ contribution to the free energy has been parameterized in order to take into account solvent effects specific to the first solvation shell [28]. Therefore, it should be capable of describing the extra stabilization in solution associated with the ability of the solute to form hydrogen bonds with the molecules of the solvent. If the calculations are conducted with the code Gaussian09 and the SMD polarizable continuum model, the term $E^{\mathrm{Tot}}_{\mathrm{soln}}$ corresponds to the basic energy of a density functional theory calculation using the SMD model. The vibrational, rotational and translational contribution to the solution free energy ($\delta G^{*}_{\mathrm{VRT,soln}}$ in eqn (5)) is computed by applying the ideal gas partition functions to the frequencies calculated in the dielectric continuum and the 1 M standard state. This approach has been questioned by Ho et al. [26] because ideal gas partition functions are unlikely to be valid in solution, but Ribeiro et al. have demonstrated that vibrational contributions to a solute's free energy are in general insensitive to whether the solute vibrational free energy is computed in the gas phase or in solution [27].
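The bookkeeping behind eqns (3) and (4) can be illustrated with a minimal Python sketch. It assumes that all inputs have already been converted to kcal mol⁻¹; the function names and the numbers in the usage note are hypothetical and are not taken from this work. The sketch also shows where the 1.89 kcal mol⁻¹ standard-state term comes from, namely the ideal-gas expression RT ln(RT/P° × 1 M).

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
R_L_ATM = 0.082057     # gas constant in L atm mol^-1 K^-1


def standard_state_correction(temperature=298.15):
    """Free energy (kcal/mol) to bring 1 mol of ideal gas from 1 atm to 1 M."""
    molar_volume = R_L_ATM * temperature / 1.0  # L/mol at a pressure of 1 atm
    # Compressing from molar_volume litres to 1 litre per mole costs RT ln(V)
    return R_KCAL * temperature * math.log(molar_volume)


def solution_free_energy(e_gas, dg_vrt_gas, dg_solv, temperature=298.15):
    """Eqn (4): total free energy of a species in the 1 M solution standard state."""
    return e_gas + dg_vrt_gas + dg_solv + standard_state_correction(temperature)


def association_free_energy(g_ab, g_a, g_b):
    """Eqn (3): free energy change for A + B -> AB."""
    return g_ab - g_a - g_b
```

For example, with hypothetical inputs `solution_free_energy(0.0, 0.0, -5.0)` for each monomer and `solution_free_energy(-15.0, 10.0, -7.0)` for the dimer, `association_free_energy` returns the gas-phase value (−5 kcal mol⁻¹) plus the change in solvation free energy (+3 kcal mol⁻¹) minus one standard-state term, because two solute particles become one.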

Electronic structure calculations
All calculations were carried out with the NWChem 6.1 [29] and Gaussian09 [30] codes. When the molecular association in solution was modelled using the thermodynamic cycle in Scheme 1, and the related eqn (3) and (4), the following computational protocol was employed to compute the free energies of the association reaction. Gas-phase calculations were performed with the M06-2X hybrid-meta density functional theory (DFT) functional [23]. We have chosen M06-2X because its assessment against representative databases showed that this method is one of the most accurate density functionals for a combination of main-group thermochemistry, kinetics and noncovalent interactions [23,31]. Optimised structures and frequencies were computed using the 6-31++G(d,p) basis set with the temperature and pressure fixed at 298 K and 1 atm, respectively, and the gas-phase vibrational, rotational and translational contributions to the free energy ($\delta G^{\circ}_{\mathrm{VRT,gas}}$) were evaluated on the basis of these frequencies and the ideal-gas partition functions q(p,T).
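As an illustration of how an ideal-gas partition function enters $\delta G^{\circ}_{\mathrm{VRT,gas}}$, the following minimal Python sketch evaluates only the translational term, $-RT \ln[(2\pi m k_B T/h^2)^{3/2}\, k_B T/P]$. The rotational and vibrational terms require the optimised geometry and the computed frequencies and are not shown. The function name and default arguments are assumptions made for this example; in practice these quantities are reported by the electronic structure code.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J K^-1
H = 6.62607015e-34        # Planck constant, J s
AMU = 1.66053906660e-27   # atomic mass unit, kg
ATM = 101325.0            # 1 atm in Pa
R_KCAL = 1.987204e-3      # gas constant, kcal mol^-1 K^-1


def translational_free_energy(mass_amu, temperature=298.15, pressure_atm=1.0):
    """Ideal-gas translational Gibbs free energy (kcal/mol) of one species."""
    mass = mass_amu * AMU
    # Translational partition function per unit volume (inverse cubed thermal wavelength)
    q_per_volume = (2.0 * math.pi * mass * K_B * temperature / H**2) ** 1.5
    # Volume available per molecule for an ideal gas at the chosen pressure
    volume_per_molecule = K_B * temperature / (pressure_atm * ATM)
    return -R_KCAL * temperature * math.log(q_per_volume * volume_per_molecule)
```

Calling `translational_free_energy(84.07)` gives the translational term for a molecule with the molar mass of tetrolic acid (C4H4O2). Because a dimer replaces two translating monomers by a single translating body, this term always disfavours association in the gas phase.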
For the evaluation of the term $E_{e,\mathrm{gas}}$ in eqn (2), more accurate energetics were derived from single-point energy calculations with the aug-cc-pVDZ basis set, because for the M06-2X method this functional/basis set combination gives the best compromise between accurate binding energies and computational cost [32]. The free energies of solvation of the species involved in the association process ($\Delta G^{*}_{\mathrm{solv}}$) were calculated using the SMD solvation model [28], along with the M06-2X functional and the aug-cc-pVDZ basis set [SMD/M06-2X/aug-cc-pVDZ] and the optimized gas-phase geometries. SMD is the continuum solvation model recommended by the Gaussian09 manual for computing the free energy of solvation. In fact, the SMD model was originally tested over a set of 2892 solvation free energies and transfer free energies for neutral solutes and ions in water and non-aqueous solutions, and the mean unsigned error (MUE) over 26 combinations of various basis sets and density functionals was 0.8 kcal mol⁻¹ for neutral solutes and 4.3 kcal mol⁻¹ for ions [33]. In particular, the MUE in calculated aqueous solvation free energies of carboxylic acids was only 0.25 kcal mol⁻¹ using the M06-2X/6-31+G(d,p) level of theory. The SMD model together with the M06-2X density functional was also applied to predict the free energies of aqueous solvation for 61 drug-like molecules in the SAMPL1 test set, and the authors reported a MUE of 2.0 kcal mol⁻¹ [34]. The SMD/M06-2X level of theory was also used by Ribeiro and co-workers to compute the partition coefficients of nucleobases between chloroform and water with a MUE of 0.8 kcal mol⁻¹ [25].
For the zwitterionic form of mABA (mABA±), during the gas-phase geometry optimization of (mABA±)₂ dimers we observed intermolecular H-transfer between the two monomers and the formation of (mABA)₂ dimers. Therefore, to model the formation of (mABA±)₂, the geometry optimization and frequency calculations were conducted in solution using the SMD solvation model, the B97-D density functional [35] and the 6-31++G(d,p) basis set. More accurate energetics were derived from single-point energy calculations at the SMD/M06-2X/aug-cc-pVDZ level of theory. The total free energies of mABA± and (mABA±)₂ were then evaluated according to eqn (5) and (6).
Molecular systems like m-aminobenzoic acid and m-hydroxybenzoic acid can form complex mHBA/mHBA and mABA/mABA intermolecular interactions, i.e. X–H···O and X–H···π (X = O, N) hydrogen bonds and π–π interactions, all of which are generally important in polymorph formation. Therefore, in order to determine all the configurational minima of the dimers of TTA, mABA and mHBA, the starting structures for the geometry optimization procedure were obtained by chemical intuition and from the trajectories of ab initio molecular dynamics simulations (using CP2K version 2.3.16 [36] and the PBE-D functional [35]) of (mABA)₂, (mABA)(mABA±), (mABA±)₂ and (mHBA)₂ pairs. A simple algorithm was also developed to generate, from the structure of the lowest energy monomer, an arbitrary set of dimer structures.
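The dimer-generation algorithm is not described in detail here. One possible way to build such a set of starting structures, given purely as a hypothetical sketch and not as the procedure actually used in this work, is to place a randomly rotated copy of the monomer at a random position around the original and to reject structures with atomic clashes. All function names, the separation range and the clash threshold below are assumptions.

```python
import math
import random


def random_rotation_matrix(rng):
    """Uniformly distributed rotation matrix built from a random unit quaternion."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:
            break
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def random_unit_vector(rng):
    """Random direction on the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def generate_dimers(monomer, n_structures, separation=(3.0, 6.0),
                    min_contact=1.5, seed=0, max_attempts=100000):
    """Return dimer geometries as lists of [x, y, z] coordinates (in angstrom).

    monomer: list of [x, y, z] atomic positions of the optimised monomer.
    separation: range of distances between the geometric centres of the monomers.
    min_contact: shortest intermolecular atom-atom distance that is accepted.
    """
    rng = random.Random(seed)
    n_atoms = len(monomer)
    centre = [sum(atom[i] for atom in monomer) / n_atoms for i in range(3)]
    first = [[atom[i] - centre[i] for i in range(3)] for atom in monomer]
    dimers = []
    for _ in range(max_attempts):
        if len(dimers) == n_structures:
            break
        rot = random_rotation_matrix(rng)
        direction = random_unit_vector(rng)
        distance = rng.uniform(*separation)
        # Rigidly rotate the copy, then shift it along the random direction
        second = [
            [sum(rot[i][j] * atom[j] for j in range(3)) + distance * direction[i]
             for i in range(3)]
            for atom in first
        ]
        closest = min(math.dist(a, b) for a in first for b in second)
        if closest >= min_contact:  # discard structures with overlapping atoms
            dimers.append(first + second)
    if len(dimers) < n_structures:
        raise RuntimeError("could not generate enough clash-free dimers")
    return dimers
```

Each generated structure would then be used as the starting point of a geometry optimization, after which duplicate minima are discarded. The sketch uses the geometric centre rather than the centre of mass, which is sufficient for producing starting guesses.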

Results and discussion
In this section we report the structure and stability of the self-associates of tetrolic acid, m-aminobenzoic acid and m-hydroxybenzoic acid in the gas phase and in different solvation environments.

Tetrolic acid
Tetrolic acid is a simple carboxylic acid with two polymorphic forms: the metastable α-TTA form, which is based on a classic dimer motif and crystallizes from carbon tetrachloride or chloroform, and the stable β-TTA form, which is based on a catemeric chain structure and can be recovered from alcoholic solutions [4]. We have explored the configurational space of the (TTA)₂ dimer by generating 60 starting geometries, and the free energies of dimerization of the "surviving" candidate structures in the gas phase ($\Delta G^{\circ}_{\mathrm{ass}}$) and in polar (water and methanol) and nonpolar (cyclohexane and chloroform) polarizable solvents ($\Delta G^{*}_{\mathrm{ass}}$) are reported in Table 2. For all dimers except c51_(TTA)₂ the association of TTA molecules is endergonic. c51_(TTA)₂ is the only dimer for which the association process is exergonic in the gas phase ($\Delta G^{\circ}_{\mathrm{ass}}$ = −4.52 kcal mol⁻¹) and in the apolar solvents cyclohexane ($\Delta G^{*}_{\mathrm{ass}}$ = −3.99 kcal mol⁻¹) and chloroform ($\Delta G^{*}_{\mathrm{ass}}$ = −0.64 kcal mol⁻¹), but not in water and ethanol ($\Delta G^{*}_{\mathrm{ass}}$ = 2.82 kcal mol⁻¹). The structures of c51_(TTA)₂ and of other representative (TTA)₂ dimers are reported in Fig. 1. The species c51_(TTA)₂ corresponds to the classic dimeric structural synthon found in the metastable α-TTA polymorph, which crystallizes from chloroform but not from alcoholic solutions. This result agrees with the experimental interpretation of the IR spectrum of tetrolic acid solutions, which indicates that in chloroform TTA molecules can form stable carboxylic acid dimers [8], and with previous classical MD simulations of tetrolic acid molecules in various solvents [15,17], which showed that the formation of the classic dimer c51_(TTA)₂ is favourable in chloroform and carbon tetrachloride but not in ethanol. In particular, although continuum solvation models do not explicitly take into account specific interactions between a solute and one or more first-shell solvent molecules, for example hydrogen bonding, it is important to notice that the free energies obtained in the present study for the formation of c51_(TTA)₂ in chloroform (−0.64 kcal mol⁻¹) and ethanol (+2.82 kcal mol⁻¹) compare quite well with the free energy difference between the carboxylic acid dimer and the two fully solvated TTA monomers calculated by the MD umbrella sampling method in chloroform (−1.1 kcal mol⁻¹) and ethanol (+5.6 kcal mol⁻¹) [17]. Therefore, in agreement with what was previously suggested, our calculations support the hypothesis that in chloroform there is a link between the solvent-dependent most thermodynamically stable dimer of TTA [the classic $R_2^2(8)$ carboxylic dimer synthon c51_(TTA)₂] and the polymorph that crystallizes from solution (α-TTA). It is also interesting to notice that the second most stable dimer in ethanol, c8_(TTA)₂ in Fig. 1 ($\Delta G^{*}_{\mathrm{ass,EtOH}}$ = 4.24 kcal mol⁻¹), corresponds to the structural motif found in the β-TTA crystal form.
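To give a feel for what these dimerization free energies imply for the composition of a solution, the following Python sketch converts a free energy in the 1 M standard state into an equilibrium constant, K = exp(−ΔG*/RT), and solves the mass balance of the equilibrium 2 TTA ⇌ (TTA)₂. It assumes ideal behaviour (activity coefficients of one) and a single dimer species; the function names and the total concentration chosen are hypothetical and serve only as an illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def equilibrium_constant(dg_ass, temperature=298.15):
    """K for 2 M <-> D from the association free energy (kcal/mol, 1 M standard state)."""
    return math.exp(-dg_ass / (R_KCAL * temperature))


def monomer_concentration(k_dim, total_conc):
    """Free monomer concentration (mol/L) solving [M] + 2 K [M]^2 = total_conc."""
    return (-1.0 + math.sqrt(1.0 + 8.0 * k_dim * total_conc)) / (4.0 * k_dim)


def dimer_fraction(dg_ass, total_conc, temperature=298.15):
    """Fraction of solute molecules that are bound in dimers."""
    k_dim = equilibrium_constant(dg_ass, temperature)
    return 1.0 - monomer_concentration(k_dim, total_conc) / total_conc


# Dimerization free energies of c51_(TTA)2 quoted in the text (kcal/mol)
for solvent, dg in (("chloroform", -0.64), ("ethanol", 2.82)):
    print(solvent, round(dimer_fraction(dg, total_conc=1.0), 3))
```

Under these assumptions a negative free energy of less than 1 kcal mol⁻¹ is already enough for dimers to account for a majority of the solute at a total concentration of 1 M, whereas a value of about +3 kcal mol⁻¹ leaves only a small fraction of the molecules associated.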
Starting from the carboxylic acid dimer c51, we have then computed the free energies for the formation of the trimers, (TTA)₃, and tetramers, (TTA)₄, of tetrolic acid in the gas phase and in chloroform. For the trimers, 60 candidate structures were generated using as building units the optimized gas-phase geometries of TTA and c51_(TTA)₂. The free energies of association of these species have been computed according to the following association reaction:
$$(\mathrm{TTA})_2 + \mathrm{TTA} \rightarrow (\mathrm{TTA})_3 \qquad (7)$$
The structures and free energies of the most stable trimers are reported in Fig. 2, whilst the free energies of reaction in the gas phase and in chloroform for the full set of generated trimers are listed in Table SI.1 of the ESI. The formation of a molecular aggregate (TTA)₃ is favourable when the interaction occurs between the carboxylic groups of the tetrolic acid molecules [c10_(TTA)₃, c37_(TTA)₃ and c39_(TTA)₃], but the most stable configuration of (TTA)₃ is c60_(TTA)₃, where the negatively charged sp-hybridized carbon atoms of the TTA monomer interact with the positively charged hydrogen atoms of the hydroxyl groups of the dimer (C(sp)···H–O).
For the tetramers, starting from the structure of the dimer [c51_(TTA)₂], we have generated and optimized 50 candidate structures, and the free energies of formation of (TTA)₄ in the gas phase and in chloroform have been computed using the following reactions:
$$(\mathrm{TTA})_3 + \mathrm{TTA} \rightarrow (\mathrm{TTA})_4 \qquad (8)$$
$$2\,(\mathrm{TTA})_2 \rightarrow (\mathrm{TTA})_4 \qquad (9)$$
In chloroform, the free energies of reaction computed according to reaction (8) are positive for all tetramers (see Table SI.2 in the ESI), whereas for reaction (9) we could identify two stable tetramers in the liquid phase: c18_(TTA)₄ ($\Delta G^{*}_{\mathrm{ass}}$ = −1.55 kcal mol⁻¹) and c35_(TTA)₄ ($\Delta G^{*}_{\mathrm{ass}}$ = −0.33 kcal mol⁻¹) (see the ESI). The results of the molecular association of TTA in chloroform are summarized in Fig. 3, where we have reported the structures and free energies of association in solution of the most stable dimer [c51_(TTA)₂], trimer [c60_(TTA)₃] and tetramer [c18_(TTA)₄]. The results reported in Fig. 3 would suggest that the tetramerization of TTA molecules occurs from the self-association of the carboxylic acid dimers c51_(TTA)₂ and not through the association of the TTA monomers and trimers. However, it is important to notice that the approach used in the present study to investigate the formation of trimers and tetramers excludes the possibility that trimers could also form from the association of solvated monomers:
$$3\,\mathrm{TTA} \rightarrow (\mathrm{TTA})_3 \qquad (10)$$
or that tetramers could form from the association of dimers with two monomers:
$$(\mathrm{TTA})_2 + 2\,\mathrm{TTA} \rightarrow (\mathrm{TTA})_4 \qquad (11)$$
For example, chain-like trimers could become energetically competitive with the carboxylic acid dimer, and the process of molecular aggregation in solution could differ from the one outlined in Fig. 3. On the other hand, the configurational space that must be explored to simulate the formation of (TTA)₃ and (TTA)₄ according to eqn (10) and (11) is computationally very demanding for the full DFT-based approach employed in the present study.
Fig. 1 Optimized structures of representative (TTA)₂ dimers. Distances in Å.
Fig. 2 Optimized structures of representative (TTA)₃ trimers and free energies of formation in the gas phase and in chloroform. Distances in Å and free energies in kcal mol⁻¹.
Fig. 3 The most stable monomer, dimer, trimer and tetramer of TTA in chloroform. Structures optimised in the liquid phase at the SMD/M06-2X/6-31++G(d,p) level of theory. Free energies in kcal mol⁻¹.

m-Aminobenzoic acid
In the solution state m-aminobenzoic acid (mABA) can exist in both zwitterionic and non-zwitterionic forms. The nucleation is expected to be dependent on the distribution between non-zwitterionic and zwitterionic molecules, which is expressed by the equilibrium constant $K_Z$ = [mABA±]/[mABA]; the value of $K_Z$ has been found to be solvent-dependent [37]. Also, this system exhibits abundant polymorphism, with five reported forms [38]: in forms I, III and IV the m-aminobenzoic acid exists in zwitterionic form, whereas in forms II and V the molecules are non-zwitterionic. Moreover, the nucleation of the mABA polymorphs depends chiefly on the solvent: in water and methanol form I is obtained, whereas form II crystallizes from acetonitrile [37]. In this study we have computed the formation of non-zwitterionic–non-zwitterionic [(mABA)₂], non-zwitterionic–zwitterionic [(mABA)(mABA±)] and zwitterionic–zwitterionic [(mABA±)₂] dimers in the gas phase and in water, methanol and acetonitrile. The dimerization free energies and structures of a representative set of non-zwitterionic dimers (mABA)₂ are reported in Table 3 and Fig. 4, respectively. The free energies of formation of the full set of dimers generated to explore the configurational space of (mABA)₂ are reported in Table SI.4 of the ESI. Only two species [c60_(mABA)₂ and c61_(mABA)₂] are stable in the gas and liquid phases (see Table 3) and, as shown in Fig. 4, these dimers correspond to the classic carboxylic acid dimer. On the other hand, the formation of (mABA)₂ species where the monomers interact via other types of H-bonds [c3_(mABA)₂, c41_(mABA)₂, c48_(mABA)₂], π–π interactions [c70_(mABA)₂], or a combination of H-bonding and π–π interactions [c66_(mABA)₂ and c71_(mABA)₂], is endergonic in all solvation environments. The dimer c60_(mABA)₂ corresponds to the carboxylic dimeric unit that has been found in form II of mABA [37].
In aqueous solutions the distribution of the non-zwitterionic and zwitterionic forms of mABA is close to unity [39]. We have therefore computed the free energies for the formation of (mABA)(mABA±) dimers in water and, for comparison, also in methanol and acetonitrile, and the results are reported in Table SI.5 of the ESI. We have not been able to find values of $K_Z$ in acetonitrile, but in other aprotic solvents like dioxane and chloroform it has been shown that mABA is largely present in its non-ionic form [40]. In any case, in acetonitrile the process of association of mABA and mABA± has been found to be endergonic, as $\Delta G^{*}_{\mathrm{ass}}$ > 0 for all generated dimers (see Table SI.5 of the ESI), which implies that under the simulated conditions molecular associates of the type (mABA)(mABA±) are unlikely to be stable in this solvent. In water and methanol only one dimeric structure, c37_(mABA)(mABA±), has been found to be stable with respect to the separated mABA and mABA± molecules ($\Delta G^{*}_{\mathrm{ass,H_2O}}$ = −3.17 kcal mol⁻¹ and $\Delta G^{*}_{\mathrm{ass,CH_3OH}}$ = −2.41 kcal mol⁻¹). Its structure corresponds to a sandwich-like benzene dimer that allows the formation of a double H-bond (H₂N⁺–H···NH₂ and C–O⁻···HO–C) and π–π interactions between the mABA and mABA± molecules (see Fig. 5).
Because in aqueous and alcoholic solutions the zwitterionic form of mABA is also likely to form molecular associates, we have computed the free energy of formation of (mABA±)₂ dimers in water and methanol. The structures and free energies in water for a representative set of zwitterionic dimers (mABA±)₂ are reported in Fig. 6, whilst the free energies of formation in water and methanol of the full set of dimers that have been generated to explore the configurational space of (mABA±)₂ are reported in Tables SI.6 and SI.7 of the ESI, respectively. In water the most stable dimer, c30_(mABA±)₂, is stabilized by a double H-bond (H₂N⁺–H···⁻O–C=O) and π–π interactions between the two rings (see Fig. 6). When the monomers are not in an optimal position to form π–π interactions, the stabilization free energy reduces to approximately −7 kcal mol⁻¹ [c6_(mABA±)₂, c9_(mABA±)₂, c52_(mABA±)₂ and c55_(mABA±)₂], whereas when they can only form a single H-bond [c31_(mABA±)₂, c32_(mABA±)₂ and c34_(mABA±)₂] the stabilization free energy is in the range of −1 to −4 kcal mol⁻¹. Similar results have been found for (mABA±)₂ in methanol. Therefore, according to our results, the zwitterionic–zwitterionic dimers might be abundant in supersaturated aqueous and alcoholic solutions of m-aminobenzoic acid, a result that could be related to the abundant polymorphism exhibited by the zwitterionic form of mABA [38].

m-Hydroxybenzoic acid
m-Hydroxybenzoic acid (mHBA) crystallizes in two different polymorphic forms: the thermodynamically stable form I, which contains the classic carboxylic dimer motif, and the metastable form II, which features intermolecular hydrogen bond chains between the COOH and OH groups of different molecules [41]. Calculations at the M06-2X/aug-cc-pVDZ/6-31++G(d,p) level of theory indicated that mHBA exists mainly as four, almost isoenergetic, conformers, for which optimized geometries and relative energies are reported in Fig. 7.
Another four stable forms were found by pointing the –OH of the carboxylic group towards the hydrogen of the benzene group, but their optimised structures were found to be 5.3–7.2 kcal mol⁻¹ higher in energy than the lowest conformer, and therefore statistically less important than conformers I–IV. The geometries of conformers I–IV were then used to generate and optimize the structures of 140 (mHBA)₂ dimers using the following combinations of mHBA conformers: conformer-I–conformer-I, conformer-II–conformer-II, conformer-III–conformer-III, conformer-IV–conformer-IV and conformer-II–conformer-III. The free energies of formation and structures for a representative set of (mHBA)₂ dimers are reported in Table 4 and Fig. 8, respectively, whereas the free energies for the full set of candidate dimers that were generated to explore the configurational space of (mHBA)₂ are reported in Table SI.8 of the ESI. Similarly to what was found for tetrolic acid and the non-zwitterionic form of m-aminobenzoic acid, the classic carboxylic acid dimers c52_(mHBA)₂, c60_(mHBA)₂, c89_(mHBA)₂ and c116_(mHBA)₂ (see Fig. 7) are the only stable species in acetonitrile and ethyl acetate, whereas in water and methanol the free energy of formation of these species is close to zero. In particular, the species c89_(mHBA)₂ corresponds to the classic dimeric structural synthon found in the thermodynamically stable form I.
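The statement that conformers lying 5.3–7.2 kcal mol⁻¹ above the lowest one are statistically unimportant can be checked with a simple Boltzmann estimate. The Python sketch below is illustrative only: the individual relative energies in the example are hypothetical placeholders chosen within the ranges quoted in the text (the actual values are given in Fig. 7), and relative energies are used in place of free energies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def boltzmann_populations(relative_energies, temperature=298.15):
    """Normalised Boltzmann populations from relative energies in kcal/mol."""
    rt = R_KCAL * temperature
    weights = [math.exp(-energy / rt) for energy in relative_energies]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical relative energies: four nearly isoenergetic conformers (I-IV)
# followed by four higher-energy forms in the 5.3-7.2 kcal/mol range.
energies = [0.0, 0.1, 0.2, 0.3, 5.3, 6.0, 6.6, 7.2]
populations = boltzmann_populations(energies)
high_energy_share = sum(populations[4:])
```

With these placeholder values the four higher-energy forms together account for well under 0.1% of the population at 298 K, which supports restricting the dimer search to conformers I–IV.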
The formation of other (mHBA)₂ species, where the monomers interact via π–π interactions or single H-bonds between the COOH and OH groups, is endergonic in all solvation environments (see Tables 4 and SI.8 in the ESI). Therefore, if the "link hypothesis" is correct, then the polymorph containing the carboxylic dimer motif should preferentially crystallize from the solvent that most stabilizes the centrosymmetric carboxylic acid dimer c52_(mHBA)₂, that is, ethyl acetate (see Table 5). Svärd and Rasmuson have recently reported nucleation experiments of mHBA in several different pure solvents: in alcoholic solutions (ethanol and methanol) form I appeared to be favoured over form II; in water, acetonitrile and ethyl acetate the percentage of nucleations resulting in form II was more than 80% [42]. The preferential nucleation from ethanol of form I, the polymorph containing the carboxylic dimer motif, partially agrees with the computed free energies of formation of carboxylic acid dimers in ethanol, which are close to zero and significantly lower than the free energies of other dimer structures (see Table 5). On the other hand, our theoretical prediction that in acetonitrile and ethyl acetate the carboxylic dimers are the most stable species in solution does not correlate with the preferential crystallization of form II from these solvents. Moreover, this result also contrasts with what was found for TTA and mABA, where the polymorph containing the carboxylic dimeric structural synthon nucleates from the solvent that most stabilizes the carboxylic acid dimer, chloroform and acetonitrile, respectively. This suggests that there is not always a connection between the most thermodynamically stable molecular self-associate in a certain solvent and the polymorph that crystallizes from that particular solvent, as dictated by the "link hypothesis". Therefore, this study suggests that it is not possible to predict the formation of a specific polymorph only from the knowledge of the most thermodynamically stable molecular self-associate in solution. The observed effect of the solvent on the free energy of formation of molecular associates, and consequently on the population statistics of molecular clusters in solution, is not the only mechanism dictating the outcome of a crystallization process. Other mechanisms, occurring for example at the crystal–liquid interface, could also be responsible for the preferential growth of a specific polymorph of mHBA over another.
In order to verify the accuracy of our calculations, the free energies of formation of one carboxylic acid dimer, c52_(mHBA)₂, and those of the species c38_(mHBA)₂ and c59_(mHBA)₂ were evaluated using different basis sets [6-31+G(d,p), 6-31++G(d,p) and aug-cc-pVDZ] and density functionals (M06-2X, PBE, PBE-D, B97-D and B3LYP). Species c38_(mHBA)₂ was chosen because the mHBA molecules interact via π–π and hydrogen bonding in this dimer (see Fig. 8), and c59_(mHBA)₂ was chosen because it contains the intermolecular H-bond motif found in the polymorphic form II of mHBA. The results in Table 5 show that all methods agree in predicting the carboxylic acid dimer c52_(mHBA)₂ as the most stable species in all solvation media. It is also important to point out that the gas-phase interaction energies of the structure c38_(mHBA)₂ computed using the meta-hybrid DFT method M06-2X and the generalized gradient DFT methods with dispersion correction, PBE-D and B97-D, differ by less than 0.3 kcal mol⁻¹. The PBE and B3LYP stabilization energies computed on the optimised M06-2X geometries, however, are positive and more than 10 kcal mol⁻¹ higher than the free energies obtained at the M06-2X level of theory. This is due to the neglect of dispersion in the PBE and B3LYP calculations. Moreover, starting from the M06-2X/6-31++G(d,p) optimized geometry of c38_(mHBA)₂, we have re-optimized the structure at the PBE, PBE-D, B97-D and B3LYP levels of theory (see Fig. 9). The structures obtained at the PBE-D and B97-D levels are similar to the M06-2X optimised geometry. On the other hand, PBE generates a structure where the two monomers interact through H-bonds between the COOH and OH groups, whereas in the B3LYP structure the rings of the two mHBA monomers are more than 4.5 Å apart; this result shows that these functionals fail to describe the stacking interaction [31,43].
Fig. 5 Optimized structure and free energies of formation of the most stable (mABA)(mABA±) dimer in water and methanol. Distances in Å and free energies in kcal mol⁻¹.
Fig. 6 Optimized structures of representative zwitterionic–zwitterionic (mABA±)₂ dimers and free energies of formation in water. Distances in Å and free energies in kcal mol⁻¹.

Conclusions
We have reported a computational investigation of the process of molecular self-association in different solvation environments of three selected monocarboxylic acids that crystallize in different polymorphic forms and exhibit selective crystallization depending on the solvent: tetrolic acid, m-aminobenzoic acid and m-hydroxybenzoic acid. The free energies of formation, both in the gas phase and in solution (water and organic solvents), were computed at the DFT (M06-2X) level with the SMD polarizable continuum model to simulate the solution environment. One of the aims of this study was to verify whether a link exists between the most stable molecular associates of TTA, mABA and mHBA in solution and the polymorph obtained from crystallization. Based on the results of our calculations and comparison with nucleation experiments, we conclude the following. For all three selected monocarboxylic acids the most stable dimer in solution is the classic carboxylic acid dimer, and its stability decreases on going from nonpolar solvents, such as chloroform or acetonitrile, to polar environments, like water and alcoholic solvents.
For TTA and mABA our calculations support the hypothesis that there is a connection between the carboxylic dimer synthon in solution and the structural synthon packed in the metastable α-TTA polymorph, which crystallizes from chloroform, and in the crystal form-II of mABA, which crystallizes from acetonitrile, in agreement with the "link" hypothesis. However, for mHBA, the stabilization of the centrosymmetric carboxylic acid dimer in ethyl acetate and acetonitrile does not correspond with recent experimental observations that the crystal form containing the classic carboxylic dimer motif (form I) is not the dominant polymorph that crystallizes from these solvents. 42 This study shows that for carboxylic acids the structure of the most stable species in solution (the classic carboxylic acid dimer) is not always linked to the structural synthon of the polymorph that crystallizes from solution. Consequently, the formation of a specific polymorph does not depend only on the population statistics of molecular clusters in solution.
Other factors, such as the interactions occurring at the crystal-liquid interfaces of the polymorphs of mHBA, could be important in directing the polymorphic outcome of a crystallization process.
Trimers of TTA are stable in chloroform, but the formation of tetramers occurs from the self-association of the carboxylic acid dimers and not through the association of the TTA monomers and trimers. However, we have not considered the formation of trimers and tetramers in terms of the association of solvated monomers, as the configurational space to be explored would be too large and computationally demanding for the DFT-based approach employed in the present study. Therefore, we cannot exclude that the formation of trimers and tetramers could also occur through alternative molecular-aggregation pathways.
The free energy of formation of the ionized forms of the mABA dimer (non-zwitterionic-zwitterionic and zwitterionic-zwitterionic) has been evaluated and, according to our results, zwitterionic-zwitterionic dimers would be abundant in supersaturated aqueous and alcoholic solutions of m-aminobenzoic acid.

Table 1
Scheme 1 and its stability increases significantly on going from water (ΔG*solv = −0.35 kcal mol⁻¹) and methanol (ΔG*solv = −0.06 kcal mol⁻¹) to acetonitrile (ΔG*solv = −2.92 kcal mol⁻¹), from where the polymorphic form II of mABA crystallizes. Therefore, similarly to what was found for tetrolic acid, the solvent controls the stability of the dimers in solution and, as suggested by the link hypothesis, the most stable dimer of mABA in acetonitrile [the carboxylic dimer synthon c60_(mABA)2] corresponds to the structural synthons packed in the polymorph that crystallizes from this solvent (form II).

Fig. 9
Structures of the c38_(mHBA)2 dimer and interaction energies as computed using different DFT methods. Distances in Å and gas-phase interaction energies in kcal mol⁻¹.

Table 2
Dimerization of TTA. Gas-phase interaction energies (ΔE_e,gas), standard state (1 atm) gas-phase free energies of association (ΔG°_ass) at 298 K, and standard state (1 M) free energies of association in the liquid phase (ΔG*_ass). Values in kcal mol⁻¹.
(TTA)2 + (TTA)2 → (TTA)4

Table 3
Dimerization of the non-zwitterionic form of mABA. Gas-phase interaction energies (ΔE_e,gas), standard state (1 atm) gas-phase free energies of association (ΔG°_ass) at 298 K, and standard state (1 M) free energies of association in the liquid phase (ΔG*_ass). Values in kcal mol⁻¹.

Table 4
Dimerization of mHBA. Gas-phase interaction energies (ΔE_e,gas), standard state (1 atm) gas-phase free energies of association (ΔG°_ass) at 298 K, and standard state (1 M) free energies of association in the liquid phase (ΔG*_ass). Values in kcal mol⁻¹.

Materials Chemistry Consortium (EPSRC grant no. EP/D504872). Dr Simon Black (AstraZeneca) is thanked for useful discussions.

Table 5
Dimerization of mHBA. Gas-phase interaction energies (ΔE_e,gas), standard state (1 atm) gas-phase free energies of association (ΔG°_ass) at 298 K, and standard state (1 M) free energies of association in the liquid phase (ΔG*_ass) for different basis sets and density functional theory methods. Geometries and frequencies computed at the M06-2X/6-31++G(d,p) level of theory. Values in kcal mol⁻¹.
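The table captions refer to two standard states, 1 atm for the gas phase and 1 M for the liquid phase. The sketch below shows how these are commonly related through a thermodynamic cycle: the gas-phase free energy of association is shifted by a standard-state term and by the difference in solvation free energies of products and reactants. This is the textbook cycle, given here as an assumption and not necessarily the exact protocol used in this work; the function names and numerical inputs are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
R_L_ATM = 0.0820574    # gas constant in L atm mol^-1 K^-1


def standard_state_correction(delta_n, temperature=298.15):
    """Free energy shift (kcal/mol) on going from 1 atm to 1 M.

    delta_n is the change in the number of species in the reaction
    (for example -1 for the dimerization A + A -> A2).
    """
    molar_volume = R_L_ATM * temperature  # ideal-gas molar volume at 1 atm, in L/mol
    return delta_n * R_KCAL * temperature * math.log(molar_volume)


def solution_free_energy(dg_gas_1atm, dg_solv_products, dg_solv_reactants,
                         delta_n, temperature=298.15):
    """Liquid-phase (1 M) free energy of association from a thermodynamic cycle."""
    ddg_solv = sum(dg_solv_products) - sum(dg_solv_reactants)
    return dg_gas_1atm + standard_state_correction(delta_n, temperature) + ddg_solv


if __name__ == "__main__":
    # Hypothetical inputs in kcal/mol, for illustration only.
    dg = solution_free_energy(
        dg_gas_1atm=-5.0,
        dg_solv_products=[-10.0],
        dg_solv_reactants=[-6.0, -6.0],
        delta_n=-1,
    )
    print(f"Standard-state term for dimerization: {standard_state_correction(-1):.2f} kcal/mol")
    print(f"Liquid-phase free energy of association: {dg:.2f} kcal/mol")
```

For a dimerization at 298 K the standard-state term is close to −1.9 kcal mol⁻¹, so it is not negligible on the scale of the solvent-dependent differences discussed in the text.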