- October 29, 2025
- By John Tucker
ABBA enthusiasts instantly recognize the 1976 hit “Money, Money, Money” when they hear its quick-tempo staccato piano intro, while pop music devotees with musical training might also note the 4/4 time signature in the key of A.
But would the Swedish band’s superfans register that the song isn’t just pop, but a disco-funk-pop hybrid brightened by frequent major-chord lifts? Or that its four-on-the-floor acoustic drumbeat complements a syncopated electric bass and full brass section, or that lead vocalist Anni-Frid Lyngstad is a mezzo-soprano whose slightly nasal timbre evokes sassiness?
Such details are tracked by Music Flamingo, a new artificial intelligence (AI) model trained by University of Maryland computer scientists in collaboration with Nvidia to experience music as much like a human listener as possible. The technology builds on years of work to make audio and speech understandable to AI models, and could ultimately enable a tool that makes song and playlist recommendations determined not just by listening habits, but also by the mood of the moment.
Popular streaming platforms like Spotify use algorithms that convert metadata into broad labels—Top 100 rap hits of the 1980s, for instance—but don’t break down a song’s emotions or musical structure, according to Sreyan Ghosh, a UMD doctoral student in computer science who describes the system in a technical paper posted this week.
“It’s based on clicks rather than deep understanding of the music, but multiple elements go into a song that makes us feel the way we feel,” he said.
An app based on Music Flamingo, in contrast, could start with a listener’s baseline preferences pegged to listening history—like a fondness for gravelly vocals or waltz crescendos—then adjust its recommendations in response to ChatGPT-like prompts.
“You could tell the model, ‘I’m feeling low. Can you cheer me up?’” said Ghosh.
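As a rough sketch of that interaction, a prompt-aware recommender might combine stored listening-history traits with a free-text mood request. The class and method names below (MoodAwareRecommender, recommend) are hypothetical illustrations, not part of any released Music Flamingo interface:

```python
# Hypothetical sketch of a prompt-aware recommender built on a model like
# Music Flamingo. Names and structure are illustrative assumptions only.
from dataclasses import dataclass, field


@dataclass
class ListenerProfile:
    """Baseline preferences inferred from listening history."""
    favorite_genres: list[str] = field(default_factory=list)
    favorite_traits: list[str] = field(default_factory=list)  # e.g. "gravelly vocals"


class MoodAwareRecommender:
    def __init__(self, profile: ListenerProfile):
        self.profile = profile

    def recommend(self, prompt: str, k: int = 5) -> list[str]:
        """Combine stored preferences with a free-text mood prompt.

        A deployed app would pass both to the music-understanding model and
        rank a catalog; this placeholder only shows the shape of the request.
        """
        query = {
            "history_traits": self.profile.favorite_traits,
            "genres": self.profile.favorite_genres,
            "mood_prompt": prompt,  # e.g. "I'm feeling low. Can you cheer me up?"
            "num_results": k,
        }
        return [f"<track ranked for {query['mood_prompt']!r}>"] * k


profile = ListenerProfile(
    favorite_genres=["disco", "indie"],
    favorite_traits=["gravelly vocals", "waltz crescendos"],
)
print(MoodAwareRecommender(profile).recommend("I'm feeling low. Can you cheer me up?"))
```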
Music Flamingo is designed so that it doesn’t just recognize a guitar; it recognizes a nylon-string guitar plucked in the flamenco style. Lyrically, the indie ballad “Jim and Pam,” in which the vocalist croons, “I fell in love with my best friend… I’ll leave you sleeping on my shoulder,” teaches the model about affectionate friendship and unwavering support, so it can respond to requests mentioning those things.
Ghosh plans to release the open-source technology within months through Nvidia, which funded his Ph.D. with a graduate fellowship. The tool could ultimately be packaged into a music listening app by a third party, he said.
As generative AI has spread across language and vision, from composing digital poetry to helping recognize faces in crowds, audio technology has been slower to adapt, largely because of the complex physics of how sound travels into the ear to create signals and evoke feelings, explained Ramani Duraiswami, a UMD professor of computer science, paper coauthor and Ghosh’s Ph.D. co-adviser.
“Audio has always been on peripheral devices like phones and speakers, but not connected to massive computing,” he said.
While AI applications in language and video tools have traditionally been part of computer science research, audio processing and digitization are commonly pursued by electrical engineers and neuroscientists.
In recent years, however, AI developers have crashed the audio space, building smart devices that can detect the breaking of glass to signal an accident, and therapy robots that can sense depression based on a speaker’s tone.
But music understanding presents a greater challenge because it requires comprehension of both lyrics and instrumentation that can whipsaw across six minutes, if we’re talking about “Bohemian Rhapsody.”
The shifting landscape has created an “arms race” among major tech companies to build music-understanding models, Duraiswami said, but he and Ghosh say Music Flamingo is currently the most advanced.
To train the system over the last three months, the UMD researchers loaded as many as 20 million songs representing 50 international subcultures into their model, annotating information as broad as beats per minute and as fine-grained as the level of sarcasm in a singer’s voice. Sometimes musicians joined the effort.
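A single per-song annotation record of that kind might look something like the sketch below. The field names and example values are illustrative assumptions drawn from details mentioned in this article, not the team’s actual schema:

```python
# Illustrative sketch of a per-song annotation record, spanning coarse
# features (tempo) to fine-grained ones (sarcasm in the vocal).
# Field names and values are assumptions, not the paper's schema.
from dataclasses import dataclass, field


@dataclass
class SongAnnotation:
    title: str
    bpm: float                        # coarse: tempo in beats per minute
    key: str                          # e.g. "A"
    time_signature: str               # e.g. "4/4"
    instruments: list[str] = field(default_factory=list)
    vocal_timbre: str = ""            # e.g. "mezzo-soprano, slightly nasal"
    lyrical_themes: list[str] = field(default_factory=list)
    sarcasm_level: float = 0.0        # fine-grained: 0 (none) to 1 (heavy)


example = SongAnnotation(
    title="Money, Money, Money",
    bpm=120.0,                        # placeholder value, not the song's actual tempo
    key="A",
    time_signature="4/4",
    instruments=["staccato piano", "electric bass", "brass section", "drums"],
    vocal_timbre="mezzo-soprano, slightly nasal",
    lyrical_themes=["money", "aspiration"],
    sarcasm_level=0.4,                # placeholder value for illustration
)
print(example)
```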
Ghosh, a computer scientist trained in AI, sees his system for recommending the right tune at the right time as his best opportunity to bring joy to people.
“AI understanding coding will only help coders, but understanding good music will help average humans,” he said.