 Open Access Article
 Open Access Article
      
        
          
            Prantar 
            Dutta
          
        
      , 
      
        
          
            Deepak 
            Jain
          
        
      , 
      
        
          
            Rakesh 
            Gupta
          
        
       * and 
      
        
          
            Beena 
            Rai
* and 
      
        
          
            Beena 
            Rai
          
        
      
      
Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune 411013, India. E-mail: gupta.rakesh2@tcs.com;   Tel: + 91-20-66086422
    
First published on 21st December 2022
Calculating the free energy of drug permeation across membranes carries great importance in pharmaceutical and related applications. Traditional methods, including experiments and molecular simulations, are expensive and time-consuming, and existing statistical methods suffer from low accuracy. In this work, we propose a hybrid approach that combines molecular dynamics simulations and deep learning techniques to predict the free energy of permeation of small drug-like molecules across lipid membranes with high accuracy and at a fraction of the computational cost of advanced sampling methods like umbrella sampling. We have performed several molecular dynamics simulations of molecules in water and lipid bilayers to obtain multidimensional time-series data of features. Deep learning architectures based on Long Short-Term Memory networks, attention mechanisms, and dense layers are built to estimate free energy from the time series data. The prediction errors for the test set and an external validation set are much lower than that of existing data-driven approaches, with R2 of the best model around 0.99 and 0.82 for the two cases. Our approach estimates free energy with satisfactory accuracy using deep learning models within an order-of-magnitude less computational time than required by extensive simulations. This work presents an attractive option for high-throughput virtual screening of molecules based on their membrane permeabilities, demonstrates the applicability of language processing techniques in biochemical problems, and suggests a novel way of integrating physics with statistical learning to great success.
Drug transport across membranes occurs mainly in three ways – passive, carrier-facilitated, and active. Passive diffusion is the simple movement of molecules from a higher concentration region to a lower concentration region without any energy expenditure, driven by the concentration gradient. Carrier-facilitated transport relies on membrane proteins that bind to the drug molecules at one side of the membrane, undergo a conformational change, and release the drug on the other side. This mechanism follows the concentration gradient and does not require cellular energy but is useful for polar molecules with a low affinity towards the hydrophobic membrane core. Active transport refers to the protein-mediated movement of molecules from a lower concentration region to a higher concentration region that exploits energy from adenosine triphosphate (ATP) hydrolysis. As passive diffusion predominantly governs the permeation of most available drugs, its study is crucial for gaining mechanistic insights into a fundamental biological phenomenon and designing novel formulations in the pharmaceutical industry. Passive permeability depends on physicochemical properties like size and hydrophobicity for small organic molecules. It can be measured by both experiments5,6 and molecular simulations.7,8
According to the inhomogeneous solubility-diffusion model, the permeability is expressed as:
|  | (1) | 
While experimental methods are available for permeability calculations, they are not conducive to high throughput screening of potential drug candidates due to the enormous time and cost of sampling the vast small-molecule chemical space of more than 1060 compounds.13 Molecular dynamics (MD) simulations offer an alternative, in silico approach for calculating PMF and diffusivity. MD-based enhanced sampling algorithms like umbrella sampling (US)14,15 and metadynamics16 have been widely used for obtaining free energy profiles of drug-membrane permeations as a function of intermolecular and intramolecular coordinates.7,17–21 However, the computational expense of MD limits the applicability of these algorithms to studying only hundreds and thousands of molecules within a realistic time frame, even with coarse-grained (CG) modeling techniques.22 Recent developments in Machine Learning (ML) methods and tools permit high throughput computation of physicochemical and pharmaceutical properties.23 The pipeline for this process includes obtaining fingerprints and/or descriptors from cheminformatics packages to represent molecules, building predictive models using suitable statistical learning techniques, and evaluating the models on different datasets. Chen et al. followed this procedure to investigate the permeability of drug-like molecules across lipid membranes.24 Although the computational cost of ML algorithms is minuscule compared to MD simulations, they suffer from several drawbacks like moderate accuracy, low interpretability, sparse training data, and lack of transferability. Combining MD simulations and ML within an integrated computational framework can leverage the best of both worlds and minimize their individual limitations. Physical insights from MD inform ML models to make better predictions, leading to improved accuracy and interpretability, while handling thousands of molecules. Additionally, simulations generate training data when experiments are not available.
Riniker proposed molecular dynamics fingerprints (MDFPs) to featurize small molecules.25 To encode enthalpic and entropic information, the mean, standard deviation, and median of properties like potential-energy components, solvent-accessible surface area, and radius of gyration were extracted from short MD simulations of the molecules in water and vacuum. Combined with simple 2D counts, these fingerprints were used to train ML models to estimate solvation free energies and partition coefficients in various solvents. This approach increased prediction accuracy without being computationally expensive like MD-based methods and attracted a flurry of interest. MDFPs in diverse forms have been employed for identifying P-glycoprotein substrates,26 studying the impact of mutations on protein–ligand binding affinity,27 virtual screening of caspase-8 inhibitors against Alzheimer's disease,28 predicting water–octanol partition coefficients of small molecules,29 computing self-solvation free energies and limiting activity coefficients of chemicals,30 screening ligands for ERK2 Kinase inhibition31 and other applications. Bennett et al. predicted transfer free energies of small molecules from water to cyclohexane using convolutional neural networks, with both voxel-based and graph-based featurization from MD simulations.32 Although their models were trained on a large dataset, the accuracy was moderate, and the implementation was quite complex.
All the studies discussed above flatten the MD trajectories to obtain statistical quantities and hence do not utilize the enormous data generated by simulations. While this makes the data handling easier, a lot of the entropic contribution to free energy is ignored, leading to low model accuracies. Additionally, the features are calculated only from the simulations of the small molecules, and thus the models of free energy prediction of drug permeation fail to capture the differences in lipids.
Two modifications to existing approaches are necessary to capture the physics of permeation effectively: (i) inclusion of the interactions of drugs with both the lipid bilayer and the surrounding aqueous phase in the feature space, and (ii) accounting for the entropic information more rigorously. We accomplished both these refinements by considering MD trajectories as multidimensional time series, with each snapshot described by a vector of features based on drug-lipid and drug-water interactions. This treatment of trajectories allowed us to apply state-of-the-art sequence modeling techniques from natural language processing (NLP) and signal processing to simulation data. In view of this, Deep Learning (DL) algorithms developed for tasks such as machine translation, document classification, sentiment analysis, speech-to-text and text-to-speech conversions, and image captioning can be adapted for predicting properties from short MD runs. Berishvili et al. utilized this strategy for protein–ligand binding affinity prediction and observed better results for time series-based models than average feature-based models.33 Despite model overfitting, their work demonstrated the capability of DL in solving biochemical problems and aiding researchers in the high-throughput virtual screening of molecules. Tsai et al. contributed important theoretical insights and discussion on applying language models on MD trajectories to study the dynamics of complex systems.34
In this work, we developed regression models for predicting the free energy of permeation of small drug-like molecules from the aqueous phase to different lipid bilayers using DL. Free energy data of CG solutes in six phospholipid membranes were obtained from the database reported by Hoffmann and co-workers.35 We performed several short MD simulations of the molecules in water and within the bilayer to calculate time series features. Long short-term memory (LSTM)36 network, a specialized technique for sequence modeling, was used to build the DL architectures. LSTMs are said to resolve the vanishing gradient problem of traditional recurrent neural networks by using a gating mechanism. They learn long-term dependencies in sequences and can be particularly useful in analysing MD trajectories as the volume of data generated by MD simulations is enormous. We further investigate the applicability of the attention mechanism,37,38 the ubiquitous component of modern language models, for the first time in the context of property prediction from simulation data. This study generates highly accurate models of drug permeability and instantiates the potential of physics-informed statistical learning in biochemical sciences.
|  | ||
| Fig. 1 Overall model development workflow, including molecular dynamics simulations, data processing, model selection and evaluation. | ||
![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 small molecules. US simulations were performed for the 105 dimers in 6 single-component phospholipid membranes – 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-dilauroyl-sn-glycero-3-phosphocholine (DLPC), 1,2-diarachidonoyl-sn-glycero-3-phosphocholine (DAPC), and 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DIPC), leading to a total of 630 drug-membrane systems. All the PMFs were calculated along the z-axis (bilayer normal) using 24 US simulations for each system, followed by reconstruction with the weighted histogram analysis method.40,41 The detailed simulation and analysis protocols can be found in the original paper of this database. The available 15
000 small molecules. US simulations were performed for the 105 dimers in 6 single-component phospholipid membranes – 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-dilauroyl-sn-glycero-3-phosphocholine (DLPC), 1,2-diarachidonoyl-sn-glycero-3-phosphocholine (DAPC), and 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DIPC), leading to a total of 630 drug-membrane systems. All the PMFs were calculated along the z-axis (bilayer normal) using 24 US simulations for each system, followed by reconstruction with the weighted histogram analysis method.40,41 The detailed simulation and analysis protocols can be found in the original paper of this database. The available 15![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 120 trajectories (630 systems × 24 simulations) have biased potentials due to harmonic restraints applied on the solute and are not suitable for feature generation. Consequently, we only take starting structures from the trajectories. The output labels for our dataset are the free energies of drug permeation in membranes. We obtain these values from the PMF profiles as the difference between the free energies at the lipid bilayer midplane (z = 0.0 nm) and in water away from the interface (z = 4.1 nm). Fig. 2 summarizes the information present in the database.
120 trajectories (630 systems × 24 simulations) have biased potentials due to harmonic restraints applied on the solute and are not suitable for feature generation. Consequently, we only take starting structures from the trajectories. The output labels for our dataset are the free energies of drug permeation in membranes. We obtain these values from the PMF profiles as the difference between the free energies at the lipid bilayer midplane (z = 0.0 nm) and in water away from the interface (z = 4.1 nm). Fig. 2 summarizes the information present in the database.
        |  | ||
| Fig. 2 Overview of the information present in the database by Hoffmann and co-workers.35 Potential of Mean Force (PMF) profiles of 105 neutral Martini dimers in six phospholipids are reported. Purple beads in the lipid tails indicate unsaturation. The free energies of permeation are extracted from the PMF profiles, as shown. See ESI† for details on the bead types. | ||
We further tested and evaluated our models on an out-of-distribution dataset. In another study, Hoffmann et al. published the free energies of permeation of many linear trimers and tetramers of Martini beads from aqueous phase to DOPC bilayer.42 We selected 50 linear trimers for testing from the reported 694 in the dataset. The subset was chosen so that its distribution was similar to that of the entire set. This validation was performed to ensure the generalizability of our deep learning models to molecules of relatively bigger sizes, thus ensuring that a greater portion of the small-molecule chemical space is covered.
Three out of eight features (lj-mol-wat, lj-mol-lip, sasa) were normalized by the number of CG beads making up the drug (two in the case of dimers) to make our model scalable and generalizable. In total, 350 snapshots were saved from each MD trajectory, and a vector of features described each snapshot. The vectors from drug-lipid and drug-water simulations were concatenated at every timestep. Inspired by Berishvili et al.,33 we represent the input data for one sample as a 2D array with 350 rows (timesteps) and 8 columns (features). Table 1 lists the features used along with their brief descriptions. While choosing them, we took hints from existing literature on ML models involving biomolecules, as well as the theoretical conception of statistical mechanics and molecular simulations. A feature was selected considering two conditions – it should be easily calculable using GROMACS without any additional functionality, and it should at least have a qualitative physical or thermodynamic relation with the free energy. Our procedure encodes the simulation data in terms of collective variables. It thus captures the physics of the systems without storing and handling entire trajectories consisting of particle positions and velocities at all time steps.
| Feature | Description | 
|---|---|
| Bondener | Bond energy of a molecule in water | 
| lj-mol-wat | Lennard–Jones interaction energy between molecule and water; normalized by the number of beads making up the molecule | 
| Molwat-enthalpy | Enthalpy of the molecule-in-water system | 
| Sasa | Solvent accessible surface area of molecule in water; normalized by the number of beads making up the molecule | 
| lj-mol-lip | Lennard–Jones interaction energy between molecule and lipid; normalized by the number of beads making up the molecule | 
| Apl | Area per lipid of bilayer | 
| rmsd-lip | Root mean square deviation of lipid | 
| rmsd-mollip | Root mean square deviation of molecule + lipid | 
The dataset was randomly split into training and test sets with a train-test ratio of 90![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) :
:![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 10. Furthermore, 10% of the data from the training set was taken to be the validation set, and the rest was used for model building. Hence, there were 510 training examples, 57 validation examples, and 63 test examples. Multiple training, validation, test sets were created with different random splits to verify the model reliability (please see ESI† for details). We applied min–max normalization to the time-series features such that the rescaled features were between 0 and 1. The data pre-processing was performed with the Scikit-learn library.51
10. Furthermore, 10% of the data from the training set was taken to be the validation set, and the rest was used for model building. Hence, there were 510 training examples, 57 validation examples, and 63 test examples. Multiple training, validation, test sets were created with different random splits to verify the model reliability (please see ESI† for details). We applied min–max normalization to the time-series features such that the rescaled features were between 0 and 1. The data pre-processing was performed with the Scikit-learn library.51
With the conjecture that DL techniques developed for NLP work well with time series, we leveraged the attention mechanism to learn from our trajectory data. Earlier, neural network-based machine translation relied on encoder-decoder architectures where the encoder transforms the input sentences into fixed-length vectors from which the decoder produces the translated output. Bahdanau et al. hypothesized that the fixed length encoding vector is a bottleneck because the decoder has restricted knowledge of the input information.37 They proposed the attention mechanism to overcome this bottleneck, which identifies parts of the input sentence pertinent to an output word.
We adopted the Bahdanau attention mechanism to sequentially compute the alignment score, weights, and context vector. In translation tasks, the context vector is fed to the decoder at each time step. However, our network has no decoder; we simply replace the second LSTM layer in our first model with the attention mechanism. Therefore, the output of the first LSTM layer is fed to the attention layer, which generates the context vector that acts as an input to the dense layer. Fig. 3b shows the architecture of this hybrid model. The training protocol and settings were kept the same as the first model. Henceforth, the first model with two LSTM layers and a dense layer will be referred to as Model-L, and the second model with an attention mechanism in place of an LSTM layer will be referred to as Model-LA. A magnified view of the architecture, along with technical details about LSTM and attention mechanism, is presented in Section S4 (ESI†).
We also built a baseline deep neural network (DNN) model, also known as multilayer perceptron, to compare the results with those of Model-L and Model-LA. The eight features were averaged over the entire 21 ns trajectory, and an input vector, instead of an input matrix, was obtained corresponding to each free energy value. The data splitting and training protocols of the DNN model were kept similar to those of Model-L and Model-LA. The optimal model consisted of the input layer, two hidden layers with 128 neurons each, and the output neuron. The rectified linear unit (ReLU) activation function was used for both the hidden layers, along with the dropout technique with a probability of 0.3. All the models were implemented using the Keras API of TensorFlow 2.52
The dataset of free energies of permeation of linear trimers of CG beads from the aqueous phase to the DOPC bilayer consists of 694 entries. We chose a subset of 50 representatives among them in a quasi-random way for additional testing and validation of our DL models. A series of subsets were generated using different random states and subjected to the two-sample Kolmogorov–Smirnov (KS) test for goodness of fit with the complete set, and the one with the lowest value of KS statistic was selected. The kernel density estimation plots of the entire trimer dataset and the final test set of 50 overlap almost entirely, as evident from Fig. 4b. This sampling method maximizes the likelihood that the performance metrics on this test data, hereafter referred to as trimer-test set, represent our model's general relevance to linear trimers. If the prediction errors are within an acceptable range, the applicability of the model increases to a few hundred thousand more small molecules.
          Table 2 summarizes the performance metrics of Model-L and Model-LA on the training, validation, and test sets. The results show that both models are powerful in predicting the free energy of permeation from the multidimensional time series generated using MD simulations. The MAEs and RMSEs on all datasets are lower than standard errors in US, with R2 being around 0.99. The comparable results in all three cases indicate the generalizability of our models. Model-LA performed marginally better than Model-L on the training and validation data, while the accuracy was similar for test data for both the models. To the best of our knowledge, this work presents the most accurate statistical approach for predicting permeation free energies of small molecules in membranes to date. More model refinements are not meaningful since our test errors are already less than the magnitude of energy fluctuations. We also compared the performances of Model-L and Model-LA with a baseline case of the DNN model trained on trajectory-averaged features. The MAEs for the training, validation, and test sets are 0.87 kcal mol−1, 0.91 kcal mol−1, and 0.95 kcal mol−1, respectively, which are more than twice of the other two models. Fig. 5 plots the predicted free energies of the two sequential models with the actual free energies from the dataset. No significant outliers can be visually detected in either case. The maximum deviations of the test predictions from the actual values were 1.52 kcal mol−1 (∼2.5 kBT) and 1.68 kcal mol−1 (∼2.8 kBT) for Model-L and Model-LA, respectively. Hence, these models can estimate the free energy of permeation of the ∼400![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 small molecules that maps to two Martini beads in various phospholipid membranes, using a combination of CG MD and DL, at a of the computational expense of US, with high accuracy.
000 small molecules that maps to two Martini beads in various phospholipid membranes, using a combination of CG MD and DL, at a of the computational expense of US, with high accuracy.
| Model | Performance metrics | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Training set | Validation set | Test set | |||||||
| MAE | RMSE | R 2 | MAE | RMSE | R 2 | MAE | RMSE | R 2 | |
| Model-L | 0.30 | 0.39 | 0.993 | 0.44 | 0.54 | 0.988 | 0.45 | 0.56 | 0.987 | 
| Model-LA | 0.24 | 0.32 | 0.996 | 0.34 | 0.46 | 0.992 | 0.46 | 0.61 | 0.984 | 
Neural network architectures like LSTMs are regularly applied for diverse time series problems in science and engineering with reasonable success. Hence, their superior performance with MD simulation data comes as no surprise. However, in this case, the key challenge is the representation of complex trajectories in terms of a few variables. Unlike other problems where the raw time series is modeled, feature selection and engineering play a critical role in determining model performance. We studied the inclusion of several additional features like the radius of gyration of the small molecule, LJ interaction between the lipid molecules, and LJ interaction between the lipid and water molecules, but they did not improve the models. The features are further discussed in Section 3.4 in terms of model generalizability. Continuing with the assumption that sentences in natural languages and time series are both sequences and hence analogous, the attention mechanism becomes an obvious choice for building a network to model the latter. We implemented a simple block in Model-LA, similar to the encoder section of Bahdanau attention, for computing the context vector from the output of our first LSTM layer. The results show that Model-LA achieves comparable or better performance than Model-L, thus justifying the analogy. Moreover, the training time per epoch for Model-LA is half of that for Model-L, making it cheaper to train.
| Model | Simulation Time | Performance metrics | |||||
|---|---|---|---|---|---|---|---|
| Training set | Validation set | Test set | |||||
| MAE | R 2 | MAE | R 2 | MAE | R 2 | ||
| Model-L | 6 ns | 0.30 | 0.994 | 0.33 | 0.993 | 0.42 | 0.988 | 
| 12 ns | 0.40 | 0.988 | 0.47 | 0.986 | 0.50 | 0.983 | |
| 18 ns | 0.35 | 0.992 | 0.39 | 0.990 | 0.49 | 0.983 | |
| Model-LA | 6 ns | 0.27 | 0.994 | 0.34 | 0.990 | 0.40 | 0.989 | 
| 12 ns | 0.23 | 0.995 | 0.31 | 0.993 | 0.47 | 0.984 | |
| 18 ns | 0.22 | 0.996 | 0.36 | 0.992 | 0.39 | 0.986 | |
This lesser data requirement can be attributed to the simpler physics of CG systems whose interactions are reliably represented with short trajectories. We don't need to capture any biophysical phenomena with slow kinetics and long relaxation times. Furthermore, the energy minimization and equilibration of the systems before the production runs ensure high-quality data right from the start of the simulations. Simulations longer than what is used in our work may have a detrimental effect on the model quality, in addition to being computationally expensive. Molecules with extreme degrees of hydrophobicity or hydrophilicity may spontaneously diffuse from the lipid phase to the aqueous phase or vice versa during the simulation, leading to undesirable data points in the time series. However, an exact interpretation of neural network performances based on the physics of the system is still elusive. Although further optimization may be possible, we consider both 21 ns and 6 ns CG MD of small drug-water and drug-lipid systems to be extremely viable tasks for high-throughput free energy calculations, especially using modern computing resources with hardware accelerators and massive parallelization.
It is often more vital in virtual screening applications to correctly rank molecules based on their relative free energy of permeation even if the absolute predictions are not too accurate. Generally, a few top candidates are chosen from hundreds or thousands after screening and subjected to explicit simulations or in vitro experiments. The Pearson correlation coefficient and Spearman's rank correlation coefficient for Model-LA with the trimer-test set were calculated to be 0.969 and 0.967, respectively. Fig. 6 also reveals the linear, monotonous relationship between the actual and predicted free energy values. Hence, this model can be reliably deployed to screen small drug-like molecules represented by three beads, even though the training data only involved two-bead molecules.
The normalization of the three properties – LJ interaction between molecule and water, solvent accessible surface area, and LJ interaction between molecule and lipids, by the number of beads constituting the molecules, enables the models' scale-up. The model predictions are especially sensitive to the two LJ interaction features. We studied models without this normalization and observed errors over 5 kcal mol−1 and negative R2 for the trimer-test set. If we compare a dimer and a trimer made of a single bead type, the permeation free energies should ideally be similar because of their comparable hydrophobicity or hydrophilicity. But the average LJ interaction values for the trimer are about one and a half times that of the dimer due to their bead counts. As the models are trained on dimers only, they do not generalize to trimers without the normalization. The radius of gyration, a popular choice as a feature for studying small molecules, caused no significant change in the model performance for dimers but considerably worsened the trimer predictions and was omitted.
Our approach can be best utilized in combination with the Auto-martini tool,57 which constructs Martini structure and topology file for simulation in GROMACS from simplified molecular-input line-entry system (SMILES) notation of small organic molecules. More than 500![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 molecules fall within the scope of our DL models and hence can be easily converted to CG representations. Using our method, high-performance computing systems can potentially screen hundreds to thousands of molecules per day, depending on their configuration. End-users with limited computational resources too can compute free energies of hundreds of molecules within a realistic time frame using standard personal computers.
000 molecules fall within the scope of our DL models and hence can be easily converted to CG representations. Using our method, high-performance computing systems can potentially screen hundreds to thousands of molecules per day, depending on their configuration. End-users with limited computational resources too can compute free energies of hundreds of molecules within a realistic time frame using standard personal computers.
The training times per epoch using the same processor were 6 seconds and 3 seconds for Model-L and Model-LA, respectively, with 350 data points from the entire 21 ns trajectory as input. They were reduced to 2 seconds and 1 second, respectively, when only 6 ns simulation data was used. Therefore, replacing the second LSTM layer in Model-L with an attention mechanism halves the training duration. This decrease can be particularly advantageous when the amount of training data is enormous. The errors can also be minimized below desired levels by only training for 200 to 300 epochs. Hence, our architectures, especially Model-LA, can be easily retrained with more data to cover an even larger section of the small molecule chemical space.
The primary motive behind integrating MD simulations and ML is enormous data being generated in form of trajectory, while the power of ML models increase as more data is fed into it. Hence, combining the two techniques can help in generating deeper insights and making better predictions at low computational costs, along with acceleration and automation of molecule and material design pipelines. In this work, we use data from entire trajectories instead of compressing them into average quantities to leverage the power of advanced sequence modeling techniques like LSTM and attention mechanism. Calculation of free energy from simulations is challenging since the entropy contribution to free energy cannot be estimated simply by averaging over the snapshots of a trajectory, unlike other thermodynamic quantities like temperature, pressure, density, and enthalpy. For accurate computation, MD-based methods like free energy perturbation, Bennett acceptance ratio, umbrella sampling, and metadynamics rely on extensive sampling of the system to capture various relevant configurations at different states of interest. By representing the trajectories as multidimensional series of features, we similarly incorporate information regarding multiple microstates during the temporal evolution of the system along a low dimensional space of eight parameters. Properties like LJ interactions, bond energy, and solvent-accessible surface area are not functions of time themselves, but their fluctuating values over a trajectory help us in constructing time series to encode vital entropic information. Moreover, free energy is a thermodynamic quantity whose calculation using simulations relies on extensive sampling of a molecular system and our approach incorporates multiple relevant microstates in the model. As a result, the two models developed in this paper performs better than traditional neural networks trained on static or average data. Our approach also captures the exact physics required to study permeation–the interaction of the small molecule with water and with the lipid bilayer, unlike the existing works discussed previously where bulk simulation of the molecule or molecule in water simulation is performed.
Application of ML in MD simulations is a growing field of inquiry, especially with the rise in deep learning and high-performance computing. Researchers have applied data-driven techniques, including advanced neural networks, for a wide range of fundamental and applied tasks like force field development, coarse graining of molecules, prediction of thermodynamic and physicochemical properties, improvement of sampling efficiency, and acceleration of the simulations. An excellent review of the state-of-the-art can be found in the article by Frank Noé and co-workers.58 Simultaneously, development of open-source software tools like TorchMD59 is critical for easier and faster implementation and benchmarking of new ideas. The simulation design, trajectory handling, and modeling approaches proposed in our work can generalize to other applications as well, if aided by relevant domain expertise in problem formulation and feature selection.
Although our models are powerful, their scope and robustness can be enhanced by training with datasets of trimers, tetramers, or even larger molecules. The initial simulations to generate data may be computationally expensive because the number of possible combinations of beads blows up as the bead count increases. Still, eventually, it can replace US as the go-to alternative when only the free energy value is needed and not the PMF profile and mechanistic details. Recently, Kadupitiya et al. showed that the statistical errors associated with MD trajectories can be leveraged to improve neural network predictions—a noteworthy contribution that can influence the development of more generalizable DL models based on simulation data in the future.60 This work does not cover free energy barriers at the lipid–water interface for permeation, which may play a crucial role in some instances. Building models which learn from simulation of molecules at interfaces and can simultaneously predict the free energy of permeation and interface barrier can be a topic of future study. Developing an automated framework that integrates coarse-graining tools, MD simulations using GROMACS, and DL models will be hugely beneficial for practical purposes. Such a framework would require the SMILES string and the lipid type as an input from the user and furnish the predicted free energy as the output after carrying out the intermediate operations. This work also demonstrates the relevance of adopting recent progress in NLP to problems in biochemistry and provides insights that can be useful for studying other biomolecular systems and processes.
| Footnote | 
| † Electronic supplementary information (ESI) available: Martini bead types; alternate training-validation-test data splits; molecules in trimer-test set; Unravelling LSTM and Attention mechanism; Loss history plots. See DOI: https://doi.org/10.1039/d2dd00119e | 
| This journal is © The Royal Society of Chemistry 2023 |