Inside AI’s “Black Box”
New Research Helps Explain How Learning Algorithms (Like Your Phone’s Autocorrect) Generate Nonsense
Whether it’s helping you quickly write text messages on your smartphone or recommending new artists and songs you’ll actually like, artificial intelligence can help make life easier. But as anyone who’s been the victim of an autocorrect disaster can tell you, AI can also make mistakes.
It can be challenging for computer scientists to figure out what went wrong in such cases. This is because many of these state-of-the-art machine-learning algorithms assimilate information and make their predictions inside a virtual “black box,” leaving few clues for researchers to follow.
But now, computer scientists at the University of Maryland have developed a promising new approach for interpreting how machine-learning algorithms “think.” The researchers presented their work yesterday at the 2018 Conference on Empirical Methods in Natural Language Processing in Brussels.
“Black-box models do seem to work better than simpler models, such as decision trees, but even the people who wrote the initial code can’t tell exactly what is happening,” said Jordan Boyd-Graber, senior author of the study and an associate professor of computer science at UMD. “When these models return incorrect or nonsensical answers, it’s tough to figure out why.”
Unlike previous efforts, which typically sought to “break” the algorithms by removing key words from inputs to yield the wrong answer, the UMD group instead reduced the inputs to the bare minimum required to yield the correct answer. On average, the researchers got the correct answer with an input of fewer than three words—and in some cases, they only needed one.
In one example, the researchers entered a photo of a sunflower and the text-based question, “What color is the flower?” into a model algorithm. These inputs yielded the correct answer of “yellow.” After rephrasing the question into several different shorter combinations of words, they found that just “flower?” yielded the same answer.
In a more complex example, the researchers used the prompt, “In 1899, John Jacob Astor IV invested $100,000 for Tesla to further develop and produce a new lighting system. Instead, Tesla used the money to fund his Colorado Springs experiments.”
They then asked the algorithm, “What did Tesla spend Astor’s money on?” and received the correct answer, “Colorado Springs experiments.” Reducing this input to the single word “did” yielded the same correct answer.
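The idea of shrinking an input while the answer stays the same can be illustrated with a minimal Python sketch. It is not the UMD group's implementation: the greedy word-dropping loop, the function names `reduce_input` and `toy_predict`, and the toy model itself are all assumptions made for illustration. The toy model simply answers “yellow” whenever the word “flower?” is present.

```python
from typing import Callable, List


def reduce_input(words: List[str], predict: Callable[[List[str]], str]) -> List[str]:
    """Greedily drop words for as long as the model's answer does not change.

    Illustrative only: a simple leave-one-out loop, not the method from the study.
    """
    target = predict(words)  # answer on the full input
    current = list(words)
    changed = True
    while changed and len(current) > 1:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if predict(candidate) == target:
                # The answer survived without this word, so keep the shorter input.
                current = candidate
                changed = True
                break
    return current


def toy_predict(words: List[str]) -> str:
    """Hypothetical stand-in for a trained model (an assumption, not a real model)."""
    return "yellow" if "flower?" in words else "unknown"


question = "What color is the flower?".split()
reduced = reduce_input(question, toy_predict)
print(reduced)  # ['flower?']
```

With a real model, `predict` would wrap the model's own prediction call, and the reduced input would show which words the model actually relies on.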
The work reveals important insights about the rules that machine learning algorithms apply to problem solving, said Boyd-Graber, who has co-appointments at the Institute for Advanced Computer Studies (UMIACS), the College of Information Studies and the Language Science Center.
Many real-world problems with algorithms arise when an input that makes sense to humans produces a nonsensical answer. By showing that the opposite is also possible, that nonsense inputs can yield correct, sensible answers, Boyd-Graber and his colleagues could help computer scientists build more effective algorithms that can recognize their own limitations.
“The bottom line is that all this fancy machine learning stuff can actually be pretty stupid,” said Boyd-Graber. “When computer scientists train these models, we typically only show them real questions or real sentences. We don’t show them nonsensical phrases or single words. The models don’t know that they should be confused by these examples.”
Most algorithms will force themselves to provide an answer, even with insufficient or conflicting data, he said. This could be at the heart of some of the incorrect or nonsensical outputs generated by machine-learning algorithms, and greater understanding could help computer scientists find solutions and build more reliable algorithms.
“We show that models can be trained to know that they should be confused,” Boyd-Graber said. “Then they can just come right out and say, ‘You’ve shown me something I can’t understand.’”
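One simple way to picture a model that admits confusion is a confidence threshold on its output. The Python sketch below is not the training procedure used in the study. It only shows how a model that has learned to be less confident on nonsense inputs could decline to answer. The function name `answer_or_abstain`, the 0.6 threshold, and the example probabilities are all hypothetical.

```python
from typing import Dict

ABSTAIN_MESSAGE = "You've shown me something I can't understand."


def answer_or_abstain(probs: Dict[str, float], threshold: float = 0.6) -> str:
    """Return the most likely answer, or abstain if the model is not confident.

    `probs` maps each candidate answer to the model's probability for it.
    The threshold value is an arbitrary assumption for illustration.
    """
    best = max(probs, key=probs.get)
    if probs[best] < threshold:
        return ABSTAIN_MESSAGE
    return best


# Hypothetical outputs: confident on a real question, uncertain on a one-word input.
confident = {"yellow": 0.9, "red": 0.1}
uncertain = {"yellow": 0.4, "red": 0.35, "blue": 0.25}

print(answer_or_abstain(confident))  # yellow
print(answer_or_abstain(uncertain))  # You've shown me something I can't understand.
```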