Produced by the Office of Marketing and Communications
Researcher Puts Natural Language Processing to the Test With Sci-Fi Language
From the Elvish and other languages spoken in “Lord of the Rings” to Dothraki in “Game of Thrones,” successful fantasy and science fiction franchises frequently feature their own real, but constructed, languages. These creations often have many of the same syntactic or semantic features as commonly spoken languages, and some—such as Klingon from “Star Trek”—have been extensively developed, complete with online dictionaries and translators.
Now, a leading University of Maryland expert in natural language processing (NLP)—a subfield that combines linguistics, computer science and artificial intelligence to better understand the interactions between computers and languages—is giving another “Star Trek” language the NLP treatment in the first study of its type.
Computer science Associate Professor Jordan Boyd-Graber, a lifelong Trekkie known for incorporating Klingon into class NLP assignments, collaborated with University of Arizona Assistant Professor Peter A. Jansen to investigate machine translation of Tamarian with a collection of translated English-Tamarian phrases.
The task, like the fictional language itself, is anything but straightforward. Instead of making direct references, “Star Trek’s” Tamarians speak in metaphors grounded in stories that, like symbols, have learned associations with their true meaning. For example, instead of saying, “I want to give this to you,” a Tamarian would say, “Temba, his arms wide.”
This unusual structure poses a challenge for both the characters and the automated translation systems onboard the Enterprise. Likewise, the Tamarians cannot understand starship Capt. Jean-Luc Picard’s straightforward use of language.
First, the researchers created a dictionary of 50 Tamarian phrases paired with 456 parallel English phrases that captured the inferred meaning of each Tamarian expression. Almost half of them were gleaned from a Reddit thread, while the rest came from context clues in tie-in novels from the “Star Trek” universe.
They discovered that their machine translation system had a 76% accuracy rate in translating English phrases to Tamarian metaphorical utterances.
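To make the setup concrete, here is a minimal Python sketch of how a parallel phrase dictionary and an accuracy score could be represented. It is an illustration under stated assumptions, not the researchers' code: the article does not say how accuracy was measured, so exact-match scoring is assumed here, and the names `PARALLEL_PAIRS`, `build_dictionary` and `exact_match_accuracy` are hypothetical. Only the first pair comes from the article; the other two are illustrative paraphrases, with "Shaka, when the walls fell" taken from the television episode rather than from the researchers' dataset.

```python
from collections import defaultdict

# Illustrative (English, Tamarian) pairs. Only the first pair appears in the
# article; the others are assumed paraphrases added for demonstration.
PARALLEL_PAIRS = [
    ("I want to give this to you", "Temba, his arms wide"),
    ("Take this as a gift", "Temba, his arms wide"),
    ("We failed", "Shaka, when the walls fell"),
]


def build_dictionary(pairs):
    """Group English paraphrases under the Tamarian phrase they map to."""
    dictionary = defaultdict(list)
    for english, tamarian in pairs:
        dictionary[tamarian].append(english)
    return dict(dictionary)


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions identical to their reference.

    Exact-match scoring is an assumption; the article does not describe
    the metric behind the reported accuracy figure.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    dictionary = build_dictionary(PARALLEL_PAIRS)
    for tamarian, english_phrases in dictionary.items():
        print(f"{tamarian}: {len(english_phrases)} English paraphrase(s)")
```

Grouping several English paraphrases under one Tamarian phrase mirrors the many-to-one shape of the dataset described above, in which 456 English phrases map onto 50 Tamarian expressions.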
“Our results suggest that automatically translating metaphor-grounded languages may be feasible, but it is extremely difficult,” said Boyd-Graber, who has appointments in the College of Information Studies (iSchool) and University of Maryland Institute for Advanced Computer Studies.
While Tamarian is a fictional language, the researchers said that their paper demonstrates large language models’ abilities and limitations. They also discuss what it would take to grow Tamarian—or a similar language—into a more complete artificial language like Klingon, and how their work can help computers—which work best with literal language—better understand metaphors like “between a rock and a hard place.”
Maryland Today is produced by the Office of Marketing and Communications for the University of Maryland community on weekdays during the academic year, except for university holidays.