A vast majority of Americans believe that the tone and nature of political debate in the United States has become more negative in recent years [1] and more than half have the impression that Donald Trump is responsible.
But do these subjective impressions reflect the true state of US political discourse? Because politics affects nearly every aspect of our personal lives, the answer to this question has important societal implications, including for the level of support for, and perceived legitimacy of, political institutions, and for a possible decrease of trust in political processes.
To date, data-driven evidence regarding the perceived shift towards a more negative political tone has been scarce, partly due to the difficulty of obtaining a comprehensive, longitudinal record of what politicians say. Now, researchers in the Data Science Lab (DLAB), part of the School of Computer and Communication Sciences, have developed a new database, Quotebank, and they used it to analyze how the tone of US politicians’ language, as reported in online media, evolved between 2008 and 2020.
Quotebank contains a corpus of nearly a quarter-billion (235 million) unique quotes extracted from 127 million online news articles published by a comprehensive set of online news sources over the course of nearly 12 years. A machine learning algorithm automatically attributes the quotes to the speakers who likely uttered them.
In the study, published in Nature Scientific Reports, the researchers focused on US politicians, deriving a subset of 24 million quotes by 18,627 speakers, enriched with biographic information from the Wikidata knowledge base. As no comparable dataset of speaker-attributed quotes was available before, Quotebank enabled the researchers to analyze the tone of US politicians’ public language, as seen through the lens of online news media, at a level of representativeness and completeness that was previously impossible.
Negativity confirmed
"In order to quantify the prevalence of negative language over time, we used established psycholinguistic tools to score each quote with respect to its emotional content, aggregated quotes by month, and worked with the resulting time series," explained Professor Robert West, Head of DLAB. "We found that during Barack Obama’s tenure in the White House, the frequency of negative emotion words decreased continuously, but then it suddenly jumped up with the 2016 primary campaigns. When removing Donald Trump’s quotes from the data, the June 2015 increase in negative emotion words, when Trump kicked off his campaign, dropped by 40%. Seen the other way round, Trump’s quotes increased the effect size by 63%. So, Trump was clearly the main driver of the effect, although not the only one," West continued.
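The pipeline West describes (score each quote, aggregate by month, compare the series with and without one speaker) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the five-word lexicon is a toy stand-in for the established psycholinguistic tools used in the study, and the record fields (`speaker`, `date`, `text`) and sample quotes are hypothetical, not Quotebank data.

```python
from collections import defaultdict

# Toy stand-in lexicon (assumption); the study used established psycholinguistic tools.
NEGATIVE_WORDS = {"bad", "terrible", "disaster", "worst", "hate"}


def negative_share(text):
    """Fraction of words in `text` that appear in the toy negative lexicon."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


def monthly_series(quotes, exclude_speaker=None):
    """Average the per-quote score by month ('YYYY-MM'), optionally dropping one speaker."""
    buckets = defaultdict(list)
    for q in quotes:
        if q["speaker"] == exclude_speaker:
            continue
        buckets[q["date"][:7]].append(negative_share(q["text"]))
    return {month: sum(s) / len(s) for month, s in sorted(buckets.items())}


# Hypothetical sample records, invented for this sketch.
quotes = [
    {"speaker": "A", "date": "2015-05-10", "text": "We made good progress."},
    {"speaker": "A", "date": "2015-06-02", "text": "This is a good plan."},
    {"speaker": "B", "date": "2015-06-16", "text": "A terrible, terrible disaster."},
]

with_all = monthly_series(quotes)
without_b = monthly_series(quotes, exclude_speaker="B")
print(with_all)
print(without_b)
```

Comparing `with_all` against `without_b` mirrors the idea of removing one speaker's quotes to see how much of a monthly shift that speaker accounts for; the study's actual scoring and statistical modeling are more involved than this simple average.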
The results objectively confirm the subjective impression held by most Americans that recent years have indeed seen a profound and lasting change toward a more negative tone in US politicians’ language as reflected in online news, and that Donald Trump’s appearance in the political arena was linked to a directional change, rather than a continuation of previously existing trends in political tone.
"Our results have implications for how we see both the past and the future of US politics. They emphasize the symptoms of growing toxicity in US politics from a new angle and they highlight the future danger of a positive feedback loop of negativity. Finding ways to break out of this cycle of negativity is one of the big challenges faced by the United States today," reflected West.
Quotebank
The paper, "United States politicians’ tone became more negative with 2016 primary campaigns," was the first to be published using Quotebank, the unique new tool designed and built by DLAB, and there are other research projects in the pipeline. The team began working on the platform in 2017, using artificial intelligence to extract the dataset and then fine-tuning Google’s open-source BERT natural language processing model for the specific task of attributing quotes to speakers. Quotebank contains two types of data: quotation-centric data, in which quotations are aggregated across all their occurrences in the news, and article-centric data, in which each entry is a news article that contains one or more quotations.
Akhil Arora, a PhD candidate based in DLAB whose research has focused on Quotebank, says he’s proud of helping develop a tool that is very accessible and can play a key role in keeping democracy honest. He also acknowledges the contributions of Jozef Coldenhoff, a Master’s student affiliated with DLAB, who played an important role in the deployment of the tool.
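To make the difference between the two data types concrete, here is a small Python sketch that derives a quotation-centric view from article-centric records. The field names (`url`, `quotations`, `text`, `speaker`, `num_occurrences`) and the sample entries are hypothetical and are not Quotebank's actual schema; grouping by exact quote text is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical article-centric records: one entry per news article.
articles = [
    {"url": "https://example.org/a1", "quotations": [
        {"text": "We will win.", "speaker": "Speaker X"},
    ]},
    {"url": "https://example.org/a2", "quotations": [
        {"text": "We will win.", "speaker": "Speaker X"},
        {"text": "Time will tell.", "speaker": "Speaker Y"},
    ]},
]


def to_quotation_centric(articles):
    """Group every occurrence of the same quotation text across articles."""
    grouped = defaultdict(lambda: {"speaker": None, "urls": []})
    for article in articles:
        for quote in article["quotations"]:
            entry = grouped[quote["text"]]
            entry["speaker"] = quote["speaker"]
            entry["urls"].append(article["url"])
    # Add an occurrence count to each aggregated quotation.
    return {
        text: {**entry, "num_occurrences": len(entry["urls"])}
        for text, entry in grouped.items()
    }


quotation_centric = to_quotation_centric(articles)
for text, entry in quotation_centric.items():
    print(text, entry["speaker"], entry["num_occurrences"])
```

In the article-centric view the unit of analysis is the news article; in the quotation-centric view it is the quote, with its occurrences across articles collected in one place.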
"This data has always been accessible to programmers, but with this interface anyone can now explore it. I think Quotebank is highly useful for accountability and for plain fact checking. For the first time, we have an open-access sharp knife that allows you to dig up all the facts about what someone has said, and in politics this is so important," said Arora. "Another is reflexivity. Journalists, for example, may not even be aware that their field has moved into a worse state over time, and the Trump research clearly shows an increase in the use of non-objective language."
West agrees. One limitation of Quotebank is that the dataset is static, and he sees the potential for partnerships to overcome data access challenges.
"Our technology is straightforward, so instead of us bringing data to our algorithm, Google News or Reuters or Bloomberg, which collect news and make it publicly available, could use our algorithm and add a quotation interface to their database. It would be the ideal scenario and give us an ongoing barometer of the state of our democracy and information structures, meaning that we could then act if things start going wrong," he concluded.
[1] https://www.pewresearch.org/politics/wp-content/uploads/sites/4/2019/06/PP_2019.06.19_Political-Discourse_FINAL.pdf