- October 30, 2025
- By Georgia Jiang
When you make a voice call through Zoom, FaceTime or WhatsApp, you’re not just having a conversation; you’re also revealing everything from your age and gender to your emotional state and social background. And increasingly, artificial intelligence is listening.
Voice data can be dangerous in the wrong hands, enabling targeted phishing attacks, deepfake generation, biometric theft and even sophisticated social engineering, said a University of Maryland researcher working to address the threat.
“We already see phishing based on our online activities and what we type in emails,” said Nirupam Roy, an associate professor of computer science. “Now, a significant amount of our voice communications flow through digital platforms, so there’s an unprecedented vulnerability in privacy when it concerns our own speech.”
To protect human voice data from being stolen and used by malicious third parties, Roy and his research group at UMD designed VoiceSecure, an innovative system that essentially uses AI to fight AI—obscuring speech from artificial intelligence while keeping conversations crystal clear to human ears.
According to Roy, the greatest challenge in addressing privacy concerns is not the content of the conversation, but the “meta-linguistic” information that human voices carry: emotions, biological characteristics and stress patterns.
“Government and military conversations often require strong protection against voice eavesdropping, but even low-stakes conversations can reveal a ton of information,” Roy said. “A mother’s FaceTime conversation with her son can reveal crucial personal details that can be used for creating anything from targeted ads to voice cloning for use in fraud.”
Scammers and deepfake creators use AI-generated voices to make their schemes more convincing. Biometric theft allows unauthorized access to voice-authenticated systems, such as bank accounts or patient health records. And sophisticated social engineering attacks—such as the impersonation of a loved one in trouble as part of a scam—become far more effective when attackers use detailed profiles built from genuine human speech patterns and biometric details.
Companies and platforms typically already have procedures in place to keep user data safe, Roy said, but these strategies often fall short in practice. Some solutions involve adding obscuring noise to audio conversations, which can degrade call quality for users.
Traditional encryption, the most commonly used technique, also faces significant challenges, including the need for both ends to encrypt and decrypt content in real time, consuming computing power that not every device can comfortably sustain. Incompatibilities between users’ devices, such as a desktop computer and a mobile phone, can also create security weak spots that adversaries can exploit.
“When communication systems become more complicated, end users lose control over their own data,” Roy said. “Even when we have end-to-end encryption on many platforms, these protections are often optional, difficult to implement or simply not followed. And it becomes easier for bad actors with tools like AI to exploit these weaknesses.”
His VoiceSecure system aims to address those constraints and fight malicious attacks by leveraging one key difference between humans and machines: how each processes sound. Human hearing has built-in limitations, and people aren’t equally sensitive to every sound frequency, Roy explained. For example, two tones that sit close together at higher frequencies often can’t be distinguished from one another.
“Psychoacoustic effects shape how our brains understand sound—it’s not just about frequency, but also sensitivity and context,” he said. “By contrast, machines treat all frequencies as individual data points with mathematical precision. They analyze every acoustic feature to identify speakers and extract information.”
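A toy computation makes that contrast concrete. The sketch below is purely illustrative and has no connection to VoiceSecure’s code; it uses the Traunmüller approximation of the Bark scale, a standard psychoacoustic frequency mapping, to compare how far apart two tones are for a human listener versus for a machine’s FFT analysis (the sample rate and window size are arbitrary choices here).

```python
import numpy as np

def bark(f_hz):
    """Traunmüller approximation of the Bark critical-band scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def same_critical_band(f1, f2):
    """Tones within roughly 1 Bark tend to blur together for human listeners."""
    return abs(bark(f1) - bark(f2)) < 1.0

SAMPLE_RATE = 44_100               # Hz (arbitrary example value)
N_FFT = 4_096                      # analysis window a machine might use
bin_width = SAMPLE_RATE / N_FFT    # about 10.8 Hz per FFT bin

for f1, f2 in [(8_000, 8_400), (200, 600)]:
    bins_apart = round(abs(f2 - f1) / bin_width)
    print(f"{f1} Hz vs {f2} Hz: "
          f"{abs(bark(f1) - bark(f2)):.2f} Bark apart "
          f"(fused for humans: {same_critical_band(f1, f2)}), "
          f"{bins_apart} FFT bins apart for a machine")
```

The same 400 Hz gap spans an identical number of FFT bins in both cases, yet human hearing fuses the high-frequency pair while cleanly separating the low-frequency one. That asymmetry is the space a system like VoiceSecure can exploit: perturb what the machine measures while leaving what the listener hears intact.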
Using AI-powered reinforcement learning, the VoiceSecure system optimizes voice signals to suppress the features that machines rely on for recognition and profiling, while preserving the characteristics our brains use to understand speech and recognize voices. VoiceSecure works as a microphone module at the firmware or driver level, capturing and transforming voice data at the earliest possible point in the communication pipeline, before it even reaches a device’s operating system.
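Roy’s description doesn’t specify the training objective, but the trade-off he names can be sketched as a reinforcement-learning reward that scores a candidate transformation on two axes: how human-like the result still sounds, and how confidently an adversarial speaker-recognition model can profile it. Everything in the sketch below, including the stand-in metrics, the frozen random “speaker model” and the high-frequency perturbation, is a hypothetical illustration, not the VoiceSecure implementation.

```python
import numpy as np

SR = 16_000                                # assumed sample rate, 1-second frames
rng = np.random.default_rng(0)
PROJECTION = rng.standard_normal(SR)       # frozen stand-in "speaker model"

def perceptual_similarity(clean, transformed):
    """Crude stand-in for a psychoacoustic quality metric: compare
    log-magnitude spectra, weighting low frequencies (where human
    hearing resolves detail) more heavily than high ones."""
    spec = lambda x: np.abs(np.fft.rfft(x)) + 1e-9
    freqs = np.fft.rfftfreq(len(clean), d=1.0 / SR)
    weights = 1.0 / (1.0 + freqs / 1_000.0)          # de-emphasize high bands
    diff = np.log(spec(clean)) - np.log(spec(transformed))
    return -np.sum(weights * diff**2) / len(diff)    # 0 is best, more negative is worse

def speaker_id_confidence(audio):
    """Stand-in for an adversarial speaker-recognition model; a real
    system would query an actual speaker-embedding network here."""
    return abs(np.tanh(np.dot(audio, PROJECTION) / len(audio)))

def reward(clean, transformed, alpha=1.0, beta=2.0):
    """RL reward: keep speech human-like (first term) while defeating
    machine speaker profiling (second term)."""
    return (alpha * perceptual_similarity(clean, transformed)
            - beta * speaker_id_confidence(transformed))

# Toy episode: one candidate "policy action" that perturbs only the
# high frequencies, where human hearing is least discriminating.
clean = rng.standard_normal(SR)            # 1 s of placeholder audio
high_band = np.fft.rfftfreq(SR, d=1.0 / SR) > 4_000
perturbation = np.fft.irfft(np.fft.rfft(clean) * high_band * 0.3)
transformed = clean + perturbation
print("reward:", reward(clean, transformed))
```

In a real system, the perturbation would come from a learned policy updated to maximize this reward, and the stand-ins would be replaced by an actual intelligibility metric and a genuine speaker-recognition network.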
That delicate balance between human and machine listening could be a protective barrier between a private conversation and an unwanted AI eavesdropper, Roy noted.
“Voice communication is very personal, so we wanted to maintain that human quality in our system,” Roy said. “A mother should still be able to recognize her son’s voice during a call, but automated AI surveillance systems should fail to identify the speaker or extract sensitive biometric data.”
Roy and his team have already successfully tested VoiceSecure’s altered audio on real users, confirming that conversations remain intelligible to humans while impenetrable to machines. Users can also customize their preferred privacy levels and maintain control of their voices without relying on the actions or technology of other parties, including their conversation partners and the communication platform. The team hopes to work with engineers and industry partners to package the system as installable software for computers and smart devices.
In the meantime, Roy noted that human vigilance is just as vital as technological defense in protecting digital systems and our privacy.
“Awareness is the key to ensuring security when humans are in the loop,” he said.