Sonification for Research: Techniques, Efficacy, and Applications (SoniTEA)
Flatiron Institute, Center for Computational Astrophysics
20-22 October 2025
Preprints
“Auditory Analytics for pattern discovery in protein folding dynamics,” submitted to the Journal of Chemometrics | CONTRIBUTORS: Carla Scaletti*; Kurt J. Hebel; Martin Gruebele*
We introduce Auditory Analytics, a methodological framework that uses data sonification for scientific discovery. Auditory Analytics describes a cycle of collecting and deriving datasets, mapping data to audible signals (sonification), analytical listening, hypothesis formulation, and tool building. Human insights from any stage can feed back into further iterations of the cycle in the form of new datasets, alternative mappings, and new models of the original phenomenon. In Auditory Analytics, the remarkable capacities of the human auditory system to monitor complex soundscapes, track multiple sources, and extract meaningful information across multiple timescales are repurposed for exploring, interpreting, and analyzing data. To demonstrate its potential for uncovering relationships and dynamics in physical systems, we apply the Auditory Analytics methodology to protein folding, investigating state transitions in a molecular dynamics simulation of the GTT WW domain protein. Auditory Analytics led to the identification of distinct hydrogen bonding patterns that occur as the protein transits between folded and unfolded states, and thus to a deeper understanding of the protein folding process. A single, isolated data mapping (whether visual, auditory, haptic, mathematical, or verbal) provides an incomplete picture of reality; by adding the Auditory Analytics cycle to our portfolio of data interpretation tools, we can build a more complete picture of physical phenomena.
“Water-mediated hydrogen bonds and local side chain interactions in the cooperative collapse and expansion of PNIPAM oligomers,” submitted to the Proceedings of the National Academy of Sciences | CONTRIBUTORS: Wanlin Chen; Martin Gruebele; Martina Havenith; Kurt J. Hebel; Carla Scaletti | Supporting Information includes links to sonifications and animations on YouTube.
Poly(N-isopropylacrylamide) (PNIPAM), a thermoresponsive homopolymer, is a well-established model for investigating coil-to-globule transitions. Here, we combine molecular dynamics (MD) simulations, data sonification, and graph-theory analysis to elucidate the roles of intramolecular and PNIPAM–solvent hydrogen-bond (H-bond) patterns in PNIPAM compaction dynamics. Our analysis separates the driving forces for compaction into two contributions: the entropic gain from the loss of hydration water around hydrophobic patches, and the enthalpic stabilization from water H-bonded to PNIPAM. We find that the solvent plays a more active and complex role in polymer compaction than previously assumed. Our observations indicate that direct intra-chain hydrogen bonds between amide groups (N-H···O=C) are not the primary stabilizing force. Instead, the collapsed globule contains an N···HN network of local side chain interactions and is stabilized by a dynamic network of persistent, long-distance water bridges, in which single water molecules form hydrogen bonds with multiple parts of the polymer chain.
Publications
“Hydrogen bonding heterogeneity correlates with protein folding transition state passage time as revealed by data sonification,” Proceedings of the National Academy of Sciences, 2024-05-28 | Journal article | DOI: 10.1073/pnas.2319094121 | CONTRIBUTORS: Carla Scaletti*; Premila P. Samuel Russell; Kurt J. Hebel; Meredith M. Rickard; Mayank Boob; Franz Danksagmüller; Stephen A. Taylor; Taras V. Pogorelov; Martin Gruebele*
Protein–protein and protein–water hydrogen bonding interactions play essential roles in the way a protein passes through the transition state during folding or unfolding, but the large number of these interactions in molecular dynamics (MD) simulations makes them difficult to analyze. Here, we introduce a state space representation and an associated “rarity” measure to identify and quantify transition state passage (transit) events. Applying this representation to a long MD simulation trajectory that captured multiple folding and unfolding events of the GTT WW domain, a small protein often used as a model for the folding process, we identified three transition categories: Highway (faster), Meander (slower), and Ambiguous (intermediate). We developed data sonification and visualization tools to analyze hydrogen bond dynamics before, during, and after these transition events. By means of these tools, we were able to identify characteristic hydrogen bonding patterns associated with “Highway” versus “Meander” versus “Ambiguous” transitions and to design algorithms that can identify these same folding pathways and critical protein–water interactions directly from the data. Highly cooperative hydrogen bonding can either slow down or speed up transit. Furthermore, an analysis of protein–water hydrogen bond dynamics at the surface of the WW domain shows an increase in hydrogen bond lifetime from folded to unfolded conformations, with Ambiguous transitions as an outlier. In summary, hydrogen bond dynamics provide a direct window into the heterogeneity of transits, which can vary widely in duration (by a factor of 10) due to a complex energy landscape.