The Neology Observatory at UPF launches Garbell, a digital tool that evaluates words that are not in the dictionary

Garbell, developed at the Neology Observatory at Pompeu Fabra University within the framework of the Research Program of the Institute of Catalan Studies (IEC), analyses words that are not in the standard dictionary but that we use on a daily basis, and tells us whether they are more or less dictionarizable. This technological tool is the first in the world capable of making this assessment and could inspire similar tools for other languages. It is now available in open access for Catalan.

Hiperactivitat, flipar, xarxa social, metavers, sostenibilitat, canvi climàtic, guacamole, parafarmàcia, sudoku, hip-hop or poliamor (hyperactivity, flip out, social networking, metaverse, sustainability, climate change, guacamole, parapharmacy, sudoku, hip-hop or polyamory) are examples of the thousands of words that we use in our daily communication, whether formal or informal, but that are not found in the standard dictionary. While the Philological Section of the Institute of Catalan Studies (IEC) does update this dictionary, Garbell helps us to know whether the words that are not yet in it have a higher or lower probability of being included in the future. It will be up to the IEC to decide which words are included, and when, but the Neology Observatory has developed a tool that assesses this likelihood based on criteria related to use, language structure and other lexicographic aspects.

Why the name Garbell?

Just as the garbell (sieve) has traditionally served to separate grain from straw, UPF’s Garbell analyses neologisms documented in recent years in texts in the press, magazines, on the radio and television and on social networks, and separates them according to whether they are more or less dictionarizable, that is, whether they are more or less eligible for inclusion in the dictionary. To do this, the program applies twenty criteria grouped into blocks.

"We wanted to get away from the discussion as to whether the IEC incorporates too many or too few words into the dictionary and focus on developing an innovative, sustainable and, above all, useful tool, both for speakers, helping them to value the words they use that are not in the dictionary, and for the IEC, providing them with our expertise in neology", sums up Judit Freixa, director of the UPF Neology Observatory and principal investigator of the project.

How can you consult Garbell?

Anyone can access the public version of Garbell (http://garbell.upf.edu/) and search for specific words or segments (starting with, ending in, containing...). At the moment there are almost 10,000 reviewed units, which are fairly evenly split into three categories: dictionarizable units (marked green), pre-dictionarizable units (unmarked), and undictionarizable units (marked red). When you consult a word, you are shown its dictionarizability status and a summary of the score obtained in each block of criteria. In addition, you can see the score for each specific criterion and a diagram showing the frequency and stability of the word over the years.
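The three segment searches mentioned above (starting with, ending in, containing) can be pictured with a short sketch. This is only an illustration and not Garbell's actual code: the function name `search_units`, the mode names and the sample word list are assumptions made here for the example.

```python
from typing import Iterable, List

# Hypothetical sample taken from the words quoted in this article.
SAMPLE_UNITS = [
    "hiperactivitat", "flipar", "metavers",
    "sostenibilitat", "poliamor", "parafarmàcia",
]

def search_units(units: Iterable[str], segment: str, mode: str = "contains") -> List[str]:
    """Return the units matching `segment` under the chosen search mode.

    The mode names are illustrative, not taken from the Garbell interface.
    """
    matchers = {
        "starts_with": str.startswith,
        "ends_with": str.endswith,
        "contains": str.__contains__,
    }
    if mode not in matchers:
        raise ValueError(f"unknown search mode: {mode!r}")
    match = matchers[mode]
    needle = segment.casefold()  # case-insensitive comparison
    return [unit for unit in units if match(unit.casefold(), needle)]

if __name__ == "__main__":
    print(search_units(SAMPLE_UNITS, "para", "starts_with"))
    print(search_units(SAMPLE_UNITS, "itat", "ends_with"))
```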

How has Garbell been developed?

Garbell has been developed by researchers at the Neology Observatory, within the research group at the Institute of Applied Linguistics (IULA-IULATERM) at Pompeu Fabra University. It is a project of the IEC Research Program. The team, led by Judit Freixa, is made up of specialists in lexicography, neology and computer science, and has worked intensively on this project since 2018. In an initial stage, the team of linguists worked to identify the criteria that could be applied automatically, starting from NADIC ("Neologisms to update the standard [Catalan] dictionary", https://www.upf.edu/web/nadic), a project carried out previously by the same group with the participation of the network of Catalan language neology observatories (NEOXOC) and also funded by the IEC.

Among its usage criteria, Garbell takes into account the frequency of the word, how long it has existed, its stability over time, and its use in different types of texts, among others. From a linguistic point of view, the program assesses the type of neologism, the regularity of its formation and its formal variability. In addition, the final score also takes into account the presence of the word in other Catalan dictionaries or in dictionaries of other Romance languages. The twenty criteria do not all carry the same weight, nor are they applied linearly: the final score of each word is assigned by an algorithm designed especially for this project, which is adjusted as the program sieves.
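To give a rough idea of what unequal weighting across blocks of criteria can look like, here is a minimal sketch. It is not Garbell's algorithm, which is not described in detail here and is explicitly not linear; the sketch uses a plain weighted average per block, and the criterion names, block names, weights and 0-to-1 scores are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class Criterion:
    name: str      # hypothetical criterion label
    block: str     # hypothetical block label, e.g. "use" or "linguistic"
    weight: float  # relative importance (assumed, must be positive)
    score: float   # assumed to be normalised between 0 and 1

def block_scores(criteria: Iterable[Criterion]) -> Dict[str, float]:
    """Weighted average score for each block of criteria."""
    weighted_sum: Dict[str, float] = {}
    weight_total: Dict[str, float] = {}
    for c in criteria:
        if c.weight <= 0:
            raise ValueError(f"weight must be positive: {c.name}")
        weighted_sum[c.block] = weighted_sum.get(c.block, 0.0) + c.weight * c.score
        weight_total[c.block] = weight_total.get(c.block, 0.0) + c.weight
    return {block: weighted_sum[block] / weight_total[block] for block in weighted_sum}

if __name__ == "__main__":
    example = [
        Criterion("frequency", "use", weight=3.0, score=1.0),
        Criterion("stability", "use", weight=1.0, score=0.0),
        Criterion("regular formation", "linguistic", weight=1.0, score=0.5),
    ]
    print(block_scores(example))
```

In this toy example, frequency counts three times as much as stability within the "use" block, which is the kind of imbalance the paragraph above describes; the real weights and the way blocks are combined are not public in this text.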

In this initial phase of Garbell’s public operation, the results have been reviewed by the team of linguists, who are still working to increase the number of neologisms that can be consulted.

How is Garbell innovative?

"We don’t know of any tool that performs a task similar to Garbell’s. We have not been inspired by any known tool. We have innovated simply by following the path that technology allows and trying to solve the linguistic needs of Catalan", Judit Freixa asserts.

At the moment Garbell works from the data of BOBNEO, the database where the Neology Observatory has been collecting the neologisms detected in the general field since 1989. But as the program incorporates artificial intelligence strategies, it will be able to work without the database already collected and analysed and will be able to be developed in other languages.

The most innovative thing about this tool is that it goes beyond the dichotomy between right and wrong and instead separates the units into three clearly differentiated categories. Dictionarizable neologisms meet all or most of the requirements proposed by Garbell: they already have a track record of use in society and show no characteristic that might call their lexicographic sanctioning into question. Pre-dictionarizable neologisms are not yet sufficiently used but, at the same time, present no characteristic that would prevent them from becoming so in the near future. Finally, neologisms are not dictionarizable if they are at the start of the process (and, therefore, do not meet the minimum requirements) or if they have advanced along the process and have already shown characteristics that make them unsuitable candidates.
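The three-way separation described above can be summarised as a small decision rule. The following is a simplified sketch under stated assumptions, not the tool's real logic: it reduces the twenty criteria to three hypothetical yes/no inputs (`meets_minimum`, `established_use`, `has_blocking_trait`), all of which are names invented here.

```python
from enum import Enum

class Status(Enum):
    DICTIONARIZABLE = "dictionarizable"
    PRE_DICTIONARIZABLE = "pre-dictionarizable"
    NOT_DICTIONARIZABLE = "not dictionarizable"

def classify(meets_minimum: bool, established_use: bool, has_blocking_trait: bool) -> Status:
    """Assign one of the three categories from three simplified flags.

    meets_minimum:      the unit satisfies the minimum requirements (assumed flag)
    established_use:    the unit has a track record of use (assumed flag)
    has_blocking_trait: the unit shows a characteristic that makes it unsuitable (assumed flag)
    """
    # Units at the very start of the process, or with a disqualifying trait, are excluded.
    if has_blocking_trait or not meets_minimum:
        return Status.NOT_DICTIONARIZABLE
    # Established use and nothing against the unit: a candidate for the dictionary.
    if established_use:
        return Status.DICTIONARIZABLE
    # Nothing against the unit, but not yet used enough.
    return Status.PRE_DICTIONARIZABLE

if __name__ == "__main__":
    print(classify(meets_minimum=True, established_use=False, has_blocking_trait=False).value)
```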

Garbell has already been presented at several international forums, such as CINEO, the Congress of Neology in Romance Languages (Genoa, 2 September 2022), and has been very well received. An article has also just been published in the journal Terminàlia presenting the operation and results of Garbell.

Garbell, a complementary, progressive dictionary that guides usage

In short, Garbell is also a linguistic resource that serves as a complementary, descriptive and progressive dictionary and guides us in the use of neologisms. It is complementary because it includes precisely what is not included in the normative dictionary; progressive because it places candidate neologisms at different points, further from or nearer to the goal of inclusion in the dictionary; and a guide to usage because it also provides information about the excluded units.