Researchers Teach Autofill-like Algorithms To Predict How and When Proteins Would Assume Different Shapes
University of Maryland researchers used an artificial intelligence system to create an abstract language from the constant motion of biological molecules, such as the lysozyme molecule shown here. This language describes the multiple shapes a protein molecule can take and how and when it transitions from one shape to another—key information for understanding disease and developing therapeutics.
The same artificial intelligence technology that guesses the next word as you type an email on your smartphone can be used to decipher a language spoken by the molecules of life, according to new research by University of Maryland scientists.
By applying natural language processing tools to the movements of protein molecules—a central element in countless biological processes—the researchers for the first time created an abstract language that describes the multiple shapes a protein molecule can take and how and when it transitions from one shape to another.
This insight into the dynamics that control the shape and structure of proteins can open a door to understanding everything from how such molecules function to the causes of disease and the best way to design targeted drug therapies.
“We show that the movement of these molecules can be mapped into an abstract language, and that AI techniques can be used to generate biologically truthful stories out of the resulting abstract words,” said chemistry and biochemistry Assistant Professor Pratyush Tiwary, senior author of a research paper published Friday in Nature Communications.
Biological molecules are constantly in motion, jiggling around their environments. Their shape is determined by how they are folded and twisted, and may remain constant for seconds or days before suddenly springing open and refolding in a process that occurs in picoseconds (trillionths of a second) or faster.
This rapidity makes it difficult for experimental methods such as high-powered microscopy and spectroscopy to capture exactly how the unfolding happens, what parameters affect the process and what different shapes are possible. The answers to those questions form the biological story that Tiwary’s new method can reveal.
Using powerful supercomputers, including UMD's Deepthought2, he and his team applied Newton’s laws of motion (which can predict the movement of atoms within a molecule) to develop statistical physics models that simulate the shape, movement and trajectory of individual molecules.
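As a rough illustration of what applying Newton's laws step by step looks like, here is a toy sketch, not the researchers' simulation code. It follows a single particle in a hypothetical double-well potential, where the two wells stand in for two molecular shapes. The potential, the function names and every parameter value are assumptions made for this example; real simulations track thousands of interacting atoms.

```python
def potential(x):
    """Hypothetical double-well energy U(x) = (x^2 - 1)^2, with wells at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def force(x):
    """Force is the negative slope of the potential: F = -dU/dx."""
    return -4.0 * x * (x * x - 1.0)


def simulate(x0, v0, dt=0.001, steps=10000, mass=1.0):
    """Integrate Newton's second law (F = m * a) with the velocity Verlet scheme.

    Returns the list of visited positions and the final velocity.
    """
    x, v = x0, v0
    a = force(x) / mass
    trajectory = [x]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
        trajectory.append(x)
    return trajectory, v


if __name__ == "__main__":
    # Start in the left well with enough kinetic energy to cross the barrier at x = 0.
    traj, v_final = simulate(x0=-1.0, v0=1.5)
    print("first positions:", [round(x, 4) for x in traj[:3]])
    print("reached the right-hand well:", max(traj) > 0.0)
```

The list of positions this returns is the kind of raw trajectory that a later analysis step could turn into a sequence of symbols.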
Then they fed those models into a machine learning algorithm, like the one Gmail uses to automatically complete sentences as you type. The algorithm approached the simulations as a language in which each molecular movement forms a letter that can be strung together with others to make words and sentences. By learning the rules of syntax and grammar that determine which shapes and movements follow one another and which don’t, the algorithm predicts how the protein untangles as it changes shape and the variety of forms it takes along the way.
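The idea of reading a simulation as text can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the study's method: the researchers trained a neural network, while this sketch only counts which letter follows which (a bigram model). The bin edges, the sample trajectory and all function names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def to_letters(trajectory, edges, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Turn each coordinate into a letter according to the bin it falls in."""
    letters = []
    for x in trajectory:
        index = sum(x >= edge for edge in edges)  # how many bin edges lie at or below x
        letters.append(alphabet[index])
    return "".join(letters)


def fit_bigram(text):
    """Count how often each letter is followed by each other letter."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, letter):
    """Return the most frequently observed successor, or None if the letter was never seen."""
    followers = counts.get(letter)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up coordinate values hopping between two shapes near -1 and +1.
    trajectory = [-1.0, -0.9, -1.1, -0.2, 0.3, 1.0, 1.1, 0.9, 1.0, 0.1, -0.8, -1.0]
    text = to_letters(trajectory, edges=[-0.5, 0.5])
    counts = fit_bigram(text)
    print(text)
    print("most likely letter after A:", predict_next(counts, "A"))
    print("times A was followed directly by C:", counts["A"]["C"])
```

In this toy data the molecule never jumps straight from shape A to shape C without passing through B, which is the sort of grammar rule such a model picks up. A counting model like this one only looks one step back, whereas a neural network can learn longer-range patterns.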
To demonstrate that their method works, the team applied it to a small biomolecule called a riboswitch, which had previously been analyzed using spectroscopy. The results, which revealed the various forms the riboswitch could take as it was stretched, matched those of the spectroscopy studies.
“One of the most important uses of this, I hope, is to develop drugs that are very targeted,” said Tiwary, who has an appointment in the Institute for Physical Science and Technology. “You want to have potent drugs that bind very strongly, but only to the thing that you want them to bind to. We can achieve that if we can understand the different forms that a given biomolecule of interest can take, because we can make drugs that bind only to one of those specific forms at the appropriate time and only for as long as we want.”
Equally important is the knowledge Tiwary and his team gained about the language processing system they used. The researchers analyzed the mathematics underpinning the network as it learned the language of molecular motion, and found that the network used a kind of logic that resembles an important concept from statistical physics called “path entropy.”
Understanding this opens opportunities for improving artificial intelligence in the future.
“Now that we know this, it opens up more knobs and gears we can tune to do better AI for biology and perhaps, ambitiously, even improve AI itself,” Tiwary said. “Anytime you understand a complex system such as AI, it becomes less of a black box and gives you new tools for using it more effectively and reliably.”
Additional authors of the paper from UMD include Department of Physics graduate students En-Jui Kuo and Sun-Ting Tsai.