Study finds Twitter mentions of pneumonia up in Europe ahead of COVID-19 outbreak

The study compared mentions of pneumonia in the winter of 2019-2020 to those of previous winters.
By Laura Lovett
01:58 pm
Coronavirus tracking

Twitter could play a role in predicting COVID-19 outbreaks across Europe, according to a new study published in Nature. The research compared the winter of 2019-2020 to the previous five winters and found an uptick in levels of concern about pneumonia in regions that soon after became coronavirus hot spots.

“[W]e have analyzed data from Twitter across a number of European countries to show that unexpected levels of concerns about pneumonia had been raised for several weeks before the first cases of infection were officially announced,” authors of the report wrote.

“Interestingly, we also show that whistleblowing came primarily from the geographical regions that turned out to be the key breeding grounds for infections. Our infodemiological approach to studying the spread of COVID-19 across Europe can help policymakers to better identify, geo-localize and manage chains of infection across national borders and linguistic barriers.”


Researchers created a new database of Twitter messages and posts containing the word “pneumonia” in seven European languages, covering the period from December 2014 to March 2020.

The study found that in every country examined except Germany, the number of posts and messages containing the word “pneumonia” was statistically higher in the winter of 2019-2020 than in the winter of 2018-2019. When researchers compared the 2019-2020 winter to the previous five winters, they reported similar findings.

While tweets about pneumonia increased in the winter of 2019-2020, the authors said these tweets were posted before an outbreak of COVID-19 was reported.

“By leveraging social media, these findings offer the first clear accounting of how far behind many European countries were in detecting the virus,” authors of the study wrote.

“At the same time, the approach here outlined shows how governments, policy-makers and local authorities can obtain important contextual geo-localized information in real time for devising effective intervention policies throughout the whole epidemiological cycle, from the investigation and recognition phases of a pandemic up to the deceleration and preparation phases.”

Researchers also created a database of tweets mentioning dry cough and found similar results, with mentions increasing significantly before a coronavirus outbreak was reported.


Authors of the report used the Twitter API to collect information. This data was then vetted.

“The initial data set concerned with the winter seasons 2020–2019 and 2019–2018 included 573,298 unique users and a total of 891,195 unique tweets. From this data set we extracted a sample including tweets concerned with pneumonia and posted in the period between 15 December 2018 and 21 January 2019 and the period between 15 December 2019 and 21 January 2020. To conduct further robustness checks, we also extracted samples of tweets posted in all other corresponding winter seasons since 2014,” authors of the report wrote.

Researchers then narrowed down those tweets by location, focusing on the U.K., Germany, France, Italy, Spain, Poland and the Netherlands. The team adjusted for potential bias by removing posts with links to news URLs, posts from users with over 2,000 followers, and posts that included any mention of the coronavirus.
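The filtering steps described above can be illustrated with a short Python sketch. This is not the authors' code: the `Tweet` record, its field names, the country codes and the `COVID_TERMS` keyword list are all hypothetical, chosen only to show how the three exclusion rules and the country restriction might be combined.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record layout; the study's actual schema is not described.
    text: str
    country: str         # assumed to be an already geolocated country code
    follower_count: int
    has_news_link: bool  # assumed precomputed flag for links to news URLs


# The seven countries named in the article, written as assumed two-letter codes.
COUNTRIES = {"GB", "DE", "FR", "IT", "ES", "PL", "NL"}
MAX_FOLLOWERS = 2000
COVID_TERMS = ("coronavirus", "covid")  # illustrative keyword list


def keep(tweet: Tweet) -> bool:
    """Return True if the tweet passes every filter described in the article."""
    text = tweet.text.lower()
    return (
        tweet.country in COUNTRIES
        and not tweet.has_news_link
        and tweet.follower_count <= MAX_FOLLOWERS
        and not any(term in text for term in COVID_TERMS)
    )


def filter_tweets(tweets):
    """Keep only the tweets that pass all filters."""
    return [t for t in tweets if keep(t)]
```

Dropping accounts with large followings and posts linking to news is meant to leave messages that are more likely to reflect individual users' own concerns rather than media coverage.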

“We then identified the users that cited pneumonia in the selected European countries between 15 December 2019 and 21 January 2020, and compared them with the total number of users that cited pneumonia in the same weeks of the previous year,” authors wrote.
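As a rough illustration of that year-over-year comparison, the Python sketch below counts distinct users who posted within each of the two date windows quoted in the study. The `(user_id, date)` tuple layout and the sample records are made up for the example and do not come from the study's data.

```python
from datetime import date


def unique_users_in_window(posts, start, end):
    """Count distinct users with at least one post between start and end, inclusive.

    posts is an iterable of (user_id, posting_date) pairs (a hypothetical layout).
    """
    return len({user for user, day in posts if start <= day <= end})


# Made-up records, for illustration only.
posts = [
    ("u1", date(2018, 12, 20)),
    ("u1", date(2019, 1, 5)),
    ("u2", date(2019, 1, 10)),
    ("u3", date(2019, 12, 16)),
    ("u4", date(2019, 12, 30)),
    ("u5", date(2020, 1, 20)),
    ("u5", date(2020, 1, 21)),
    ("u6", date(2020, 2, 1)),
]

previous_winter = unique_users_in_window(posts, date(2018, 12, 15), date(2019, 1, 21))
current_winter = unique_users_in_window(posts, date(2019, 12, 15), date(2020, 1, 21))
```

Counting users rather than tweets keeps a single prolific account from inflating the comparison.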

Researchers performed two-sample Kolmogorov-Smirnov tests on the data, as well as an Anderson-Darling test.
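For readers unfamiliar with the first of these tests, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distribution functions of two samples. The Python sketch below computes only that statistic, using made-up daily counts; it does not reproduce the study's analysis and does not cover p-values or the Anderson-Darling test.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest vertical gap between the two samples' empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so checking those values is enough to find the largest gap.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up daily counts of users mentioning pneumonia, for illustration only.
previous_winter = [3, 4, 2, 5, 4, 3, 4]
current_winter = [6, 8, 7, 9, 6, 10, 8]

d_stat = ks_two_sample_statistic(previous_winter, current_winter)
```

A statistic near 0 means the two distributions look alike, while a value near 1 means they barely overlap. In practice, library routines such as SciPy's `scipy.stats.ks_2samp` and `scipy.stats.anderson_ksamp` also report significance levels.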


Over the course of the year, social media has frequently been tapped as a means of tracking and tracing the spread of the coronavirus. In April, Facebook added three new maps to its Data for Good program, which is aimed at tracking the spread of the virus. The maps include a co-location map, which helps determine the probability of users coming into contact with each other, another focused on movement range, and a third on social connectedness.

Big tech also made its way into contact-tracing efforts. Apple and Google teamed up on a contact-tracing effort that notifies users when they may have been exposed to the virus.

