Language Applied Differently by Gender, According to Machine Learning Analysis of 3.5M Books
An analysis of 3.5 million books showed women were described more frequently based on appearance, while descriptions of men focused on behavior or character.
As the adage goes, it’s your actions that define you … unless you’re a woman. Because then, according to a mountain of English-language books, it’s your looks that define you.
Computer scientists from the University of Maryland, the University of Copenhagen and elsewhere deployed machine learning to analyze 3.5 million books published from 1900 to 2008, and found that men are typically described by words that refer to behavior, while adjectives ascribed to women tend to be associated with physical appearance. The research was recently presented at the 2019 meeting of the Association for Computational Linguistics.
“Beautiful” and “sexy” were two of the adjectives used most frequently for women, while “righteous,” “rational” and “brave” were common descriptors for men.
The research team, which includes first author Alexander Hoyle, a doctoral student in computer science, trawled through an ocean of fiction and non-fiction literature—in all, the books contained about 11 billion words—extracting adjectives and verbs associated with gender-specific nouns in combinations such as “sexy stewardess” or “girls gossiping.”
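To make the extraction step concrete, here is a minimal Python sketch of counting descriptors that sit next to gendered nouns. It is not the team's method, which this article does not detail. The word lists and the `count_descriptors` function are hypothetical, and simple word adjacency stands in for real linguistic analysis.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, chosen only for illustration.
FEMININE_NOUNS = {"woman", "women", "girl", "girls", "stewardess"}
MASCULINE_NOUNS = {"man", "men", "boy", "boys", "steward"}
ADJECTIVES = {"sexy", "beautiful", "brave", "rational", "righteous"}
VERBS = {"gossiping", "fighting", "smiling"}


def count_descriptors(text):
    """Count adjective+noun and noun+verb pairs among adjacent words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {"feminine": Counter(), "masculine": Counter()}
    groups = (("feminine", FEMININE_NOUNS), ("masculine", MASCULINE_NOUNS))
    for left, right in zip(tokens, tokens[1:]):
        for gender, nouns in groups:
            if left in ADJECTIVES and right in nouns:
                counts[gender][left] += 1   # e.g. "sexy stewardess"
            elif left in nouns and right in VERBS:
                counts[gender][right] += 1  # e.g. "girls gossiping"
    return counts


sample = "The sexy stewardess smiled. Girls gossiping nearby ignored the brave man."
result = count_descriptors(sample)
print(result["feminine"].most_common())
print(result["masculine"].most_common())
```

A heuristic like this misses descriptors that are not directly adjacent to the noun, so it only illustrates the idea of pairing descriptors with gendered nouns.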
One of the project’s key aspects is that it takes into account whether descriptors are positive or negative, Hoyle said.
“What really makes this novel is we’re able to incorporate sentiment,” he said. “Words like ‘pregnant’ or ‘bearded’ might be neutral, but others like ‘hysterical,’ ‘shrewish’ or ‘chaste’ for women, are not. Scoring them gives us the ability to make quantitative comparisons in the paper.”
The researchers found that negative verbs associated with body and appearance are used five times more often for females than for males. Meanwhile, positive and neutral body-appearance adjectives occur approximately twice as often in descriptions of females, while males are most frequently described by references to behavior and personal qualities.
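A comparison such as "five times more often" can be read as a ratio of usage rates. The short Python sketch below shows that arithmetic under stated assumptions. The `usage_ratio` function is hypothetical, and the counts are made up for illustration and are not figures from the study.

```python
def usage_ratio(count_a, total_a, count_b, total_b):
    """Return how many times more often a descriptor type occurs for group A
    than for group B, after normalizing by each group's total descriptions."""
    # Equivalent to (count_a / total_a) / (count_b / total_b),
    # rearranged to avoid intermediate rounding.
    return (count_a * total_b) / (count_b * total_a)


# Made-up counts for illustration only; not data from the study.
ratio = usage_ratio(count_a=50, total_a=1000, count_b=20, total_b=2000)
print(ratio)
```

Normalizing by each group's total matters because raw counts alone would reflect how often each group is mentioned, not how it is described.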
While it’s perhaps not news that stereotypes and sexism exist in literature, the study puts it in stark, clear terms, said principal investigator Isabelle Augenstein, an assistant professor in the University of Copenhagen’s Department of Computer Science.
“Thus, we have been able to confirm a widespread perception, only now at a statistical level,” she said.
Although many of the books were published several decades ago, she said, they still play an active role in our lives, because the algorithms behind machines and applications that understand human language, such as those in smartphones, are fed this kind of text, which is available online. This is the technology that allows smartphones to recognize our voices and enables Google to provide keyword suggestions.
As artificial intelligence and language technology become more prominent across society, it is important to be aware of gendered language.
“The algorithms work to identify patterns, and whenever one is observed, it is perceived that something is ‘true,’” she said. “If any of these patterns refer to biased language, the result will also be biased. The systems adopt, so to speak, the language that we people use, and thus, our gender stereotypes and prejudices.”
In addition to Hoyle and Augenstein, the research group includes Lawrence Wolf-Sonkin of Google Research; Ryan Cotterell of Johns Hopkins University and the University of Cambridge; and Hanna Wallach of the University of Massachusetts-Amherst and Microsoft Research.
This article was adapted from a news release by the University of Copenhagen.