Advances in the study of biomolecular processes, materials science, and nanotechnology are hindered by the disparity between the time scales that atomistic simulations must resolve and those on which the phenomena of interest unfold. Resolving local oscillations in all-atom molecular dynamics simulations requires time steps on the order of femtoseconds (1.0e-15 s), whereas relevant biochemical processes take place on timescales that exceed several milliseconds (1.0e-3 s).
This discrepancy of more than 12 orders of magnitude between the simulated time horizon and the time-step resolution required by molecular dynamics results in a prohibitive number of simulation steps. These spatiotemporal limitations can, however, be overcome with simulation tools that bridge multiple scales.
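For a rough sense of scale, assuming a round time step of exactly 1 fs, the number of integration steps needed to cover a single millisecond of physical time is:

```latex
N_{\text{steps}} \approx \frac{T_{\text{process}}}{\Delta t}
  = \frac{1.0 \times 10^{-3}\,\text{s}}{1.0 \times 10^{-15}\,\text{s}}
  = 10^{12}
```

Processes lasting several milliseconds push this count beyond 10^12 steps.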
In the context of equilibrium statistical mechanics, this work developed data-driven and variational approaches for coarse-graining atomistic systems. Central to these is a novel approach to mapping between scales. Whereas existing methodologies rely on many-to-one, fine-to-coarse mappings (e.g., defined by grouping atoms into larger coarse-grained units), we introduced a probabilistic coarse-to-fine map. This map corresponds to a directed probabilistic graphical model in which the coarse-grained variables are defined implicitly. Hence, the coarse-grained variables act as latent variables that generate fully atomistic representations.
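The generative structure described above can be written compactly as follows. The notation is assumed here for illustration: x denotes the fine-scale (atomistic) coordinates, X the latent coarse-grained variables, p_c a coarse-scale model with parameters θ_c, and p_cf the probabilistic coarse-to-fine map with parameters θ_cf. Fine-scale configurations are drawn by first sampling X and then x given X, so the implied atomistic distribution is a marginal over the latent coarse variables:

```latex
X \sim p_c(X \mid \theta_c), \qquad
x \mid X \sim p_{cf}(x \mid X, \theta_{cf}), \qquad
p(x \mid \theta) = \int p_{cf}(x \mid X, \theta_{cf})\, p_c(X \mid \theta_c)\, \mathrm{d}X
```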
Essentially, we reformulated coarse-graining as a likelihood maximization problem embedded in a consistent Bayesian framework. This formulation allows the fully atomistic configuration to be reconstructed, which enables the estimation of macroscopic observables that depend on fine-scale interdependencies. Furthermore, it yields posterior distributions of the model parameters, which express the uncertainty due to limited training data. These uncertainties are propagated to a predictive posterior distribution over the quantities of interest. More broadly, the predictive distributions reflect the credibility of the coarse-grained model and quantify the uncertainty arising from the available data. As a result, the framework can either operate with small amounts of training data (50–1000 samples) or, by adopting variational approaches, circumvent the production of training data entirely.
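The following is a minimal sketch of how parameter uncertainty can be propagated to an observable through a generative coarse-to-fine model. Everything in it is an illustrative assumption, not the model used in this work: a scalar standard-normal coarse variable, a hypothetical linear-Gaussian coarse-to-fine map with weights `w` and noise level `sigma`, an approximate Gaussian posterior over `w` (as might come from a Laplace or variational approximation), and a hypothetical observable (the mean squared fine coordinate). The function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical toy model (NOT the thesis's actual model):
#   coarse variable  X ~ N(0, 1)                           (role of p_c)
#   fine variables   x | X ~ N(w * X, sigma^2 I), dim d    (role of p_cf)
# Observable: a(x) = mean of x**2 over the d fine coordinates.


def sample_fine(w, sigma, n_samples, rng):
    """Ancestral sampling: draw coarse X first, then fine x given X."""
    X = rng.standard_normal(n_samples)                        # coarse latent draws
    noise = sigma * rng.standard_normal((n_samples, w.size))  # isotropic Gaussian noise
    return X[:, None] * w[None, :] + noise                    # shape (n_samples, d)


def observable(x):
    """Hypothetical observable: per-sample mean squared fine coordinate."""
    return np.mean(x**2, axis=1)


def predictive_observable(w_mean, w_std, sigma, n_theta, n_samples, seed=0):
    """Propagate posterior uncertainty in w to the expected observable.

    Assumes an (approximate) independent Gaussian posterior over w with mean
    w_mean and standard deviation w_std. For each posterior draw of w, the
    expected observable is estimated by Monte Carlo over fine samples; the
    spread across draws reflects uncertainty due to limited training data.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_theta)
    for k in range(n_theta):
        w = w_mean + w_std * rng.standard_normal(w_mean.size)  # one posterior draw
        x = sample_fine(w, sigma, n_samples, rng)
        estimates[k] = observable(x).mean()
    return estimates


est = predictive_observable(np.array([1.0, 0.5, -0.5]), w_std=0.1, sigma=0.2,
                            n_theta=200, n_samples=5000)
lo, hi = np.quantile(est, [0.025, 0.975])
print(f"E[a] ~ {est.mean():.3f}, 95% band [{lo:.3f}, {hi:.3f}]")
```

Each posterior draw of `w` yields one Monte Carlo estimate of the observable, and the collection of estimates forms a sample-based predictive distribution. For this toy model the expected value is known analytically, mean(w_mean**2) + w_std**2 + sigma**2 = 0.5 + 0.01 + 0.04 = 0.55, which gives a simple sanity check on the sampler.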
In addition to developing predictive machine-learned coarse-graining frameworks, we sought to extract physical insight in the absence of any system-specific prior knowledge. One component of this work focuses on obtaining sparse, physically interpretable descriptions of the interactions among coarse-grained variables. Another seeks to reveal parsimonious lower-dimensional representations by accelerating the discovery of collective variables in the reference system.
Overall, the sparse learning methods yield robust and interpretable models that can readily be generalized to further unsupervised learning problems. The capabilities of the framework are demonstrated by coarse-graining physically relevant reference systems, namely the Ising model, SPC/E water, alanine dipeptide, and ALA-15.