Natural language processing models use statistics to collect a wealth of information about the meaning of words.
In “Through the Looking-Glass”, Humpty Dumpty scornfully declares, “When I use a word, it means just what I choose it to mean – neither more nor less.” Alice replies, “The question is whether you can make words mean so many different things.”
The meaning of words has long been the subject of research. To understand their meaning, the human mind must sort through a complex web of flexible and detailed information.
More recently, a new question about the meaning of words has surfaced: can artificial intelligence systems mimic human thought processes and understand words the same way? Researchers from UCLA, MIT and the National Institutes of Health have just published a study that addresses this question.
The study, published in the journal Nature Human Behaviour, demonstrates that artificial intelligence systems really can grasp very complex word meanings. The researchers also found a simple method to access this sophisticated information: the AI system they examined represents the meaning of words in a way that closely matches human judgment.
The AI system the authors explored has been widely used to analyze word meaning over the past decade. It captures the meaning of words by “reading” huge amounts of material on the Internet, containing tens of billions of words.
When words frequently appear together – “table” and “chair”, for example – the system learns that their meanings are related. And if pairs of words very rarely appear together – like “table” and “planet” – it learns that they have very different meanings.
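The co-occurrence idea described above can be sketched in a few lines. This is a deliberately tiny toy, not the system from the study: the corpus, window size and counting scheme are all assumptions chosen for illustration, whereas the real system learns from tens of billions of words.

```python
from collections import Counter
from math import sqrt

# Toy corpus; the real system reads billions of words of web text.
corpus = (
    "the table and the chair stand in the kitchen "
    "the chair is near the table in the corner "
    "the planet orbits a distant star "
    "the star and the planet shine at night"
).split()

WINDOW = 2  # words within this distance count as co-occurring (an assumption)

def cooc_vector(target):
    """Count which words appear near `target` anywhere in the corpus."""
    counts = Counter()
    for i, word in enumerate(corpus):
        if word == target:
            for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

table, chair, planet = (cooc_vector(w) for w in ("table", "chair", "planet"))
print(cosine(table, chair))   # words sharing contexts score higher
print(cosine(table, planet))  # words from unrelated contexts score lower
```

Even on this toy corpus, “table” ends up more similar to “chair” than to “planet”, purely because of the neighbors the words keep.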
This approach seems like a logical starting point, but consider how well humans would understand the world if the only way to learn meaning were to count how often words occur near each other, with no ability to interact with other people or the environment.
Idan Blank, assistant professor of psychology and linguistics at UCLA and co-senior author of the study, said the researchers sought to find out what the system knows about the words it learns and what kind of “common sense” it possesses.
Before the research began, Blank said, the system seemed to have one major limitation: “As far as the system is concerned, any two words have only a single numerical value that represents how similar they are.”
In contrast, human knowledge is much more detailed and complex.
“Consider our knowledge of dolphins and alligators,” Blank said. “When we compare the two on a scale of size, from ‘small’ to ‘large’, they are relatively similar. In terms of intelligence, they are somewhat different. In terms of how dangerous they are to us, on a scale from ‘safe’ to ‘dangerous’, they differ greatly. Thus, the meaning of a word depends on the context.
“We wanted to ask if this system actually knows about these subtle differences – if its idea of similarity is flexible in the same way that it is for humans.”
To find out, the authors developed a technique they call “semantic projection”. One can draw a line between the model’s representations of the words “big” and “small”, for example, and see where the representations of different animals fall along that line.
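The line-drawing step above amounts to projecting each word’s vector onto the axis running from “small” to “big”. Here is a minimal sketch of that computation, assuming hand-made 3-dimensional vectors invented purely for illustration; the study worked with real, much higher-dimensional embeddings learned from text.

```python
# Toy 3-d "embeddings"; these numbers are assumptions, not learned vectors.
vecs = {
    "small":   (-1.0, 0.2, 0.1),
    "big":     ( 1.0, 0.1, 0.3),
    "mouse":   (-0.8, 0.5, 0.2),
    "dolphin": ( 0.3, 0.4, 0.1),
    "whale":   ( 0.9, 0.2, 0.4),
}

def semantic_projection(word, neg="small", pos="big"):
    """Project `word` onto the axis from `neg` to `pos`.

    Returns a scalar: the larger it is, the closer the word sits to
    the `pos` end of the scale (here, "big").
    """
    axis = [p - n for p, n in zip(vecs[pos], vecs[neg])]
    length = sum(a * a for a in axis) ** 0.5
    return sum(v * a for v, a in zip(vecs[word], axis)) / length

# Rank animals along the small-to-big scale.
for animal in ("mouse", "dolphin", "whale"):
    print(animal, round(semantic_projection(animal), 2))
```

Swapping in a different word pair – “safe” and “dangerous”, say – reuses the same projection to rank the same animals along a different scale, which is exactly the context-dependence the article describes.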
Using this method, the scientists studied groups of 52 words to see if the system could learn to sort out meanings – like judging animals by their size or how dangerous they are to humans, or ranking US states by weather or by overall wealth.
Other word groups included terms related to clothing, professions, sports, mythological creatures and first names. Each category was assigned several contexts or dimensions – size, danger, intelligence, age and speed, for example.
The researchers found that, across these many objects and contexts, their method produced judgments very similar to human intuition. (To make this comparison, the researchers also asked cohorts of 25 people each to make similar ratings on each of the 52-word groups.)
Remarkably, the system learned that the names “Betty” and “George” are similar in being relatively “old”, but represent different genders; and that “weightlifting” and “fencing” are similar in that both generally take place indoors, but differ in how much intelligence they require.
“It’s such a simple and completely intuitive method,” Blank said. “The line between ‘big’ and ‘small’ is like a mental scale, and we place animals on that scale.”
Blank said he didn’t expect the technique to work, but was thrilled when it did.
“It turns out that this machine learning system is a lot smarter than we thought; it contains very complex forms of knowledge, and that knowledge is organized in a very intuitive structure,” he said. “Just by keeping track of the words that coexist in the language, you can learn a lot about the world.”
Reference: “Semantic projection recovers rich human knowledge of multiple object features from word embeddings” by Gabriel Grand, Idan Asher Blank, Francisco Pereira and Evelina Fedorenko, April 14, 2022, Nature Human Behaviour.
The study was funded by the Office of the Director of National Intelligence, Intelligence Advanced Research Projects Activity, via the Air Force Research Laboratory.