How data science is helping to combat COVID-19

by Tim Coupe

Since December 2019, we have seen a barely reported virus in China become truly devastating. With over 6 million infections and 375,000 deaths, the world is largely at a standstill. As governments try to minimise the impact, it has become abundantly clear that modelling the spread of the disease is vital.

The study of how diseases spread is called epidemiology. Epidemiologists typically come from a strong mathematical background rather than a biological one, and use traditional techniques to model that spread.
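To make "traditional techniques" concrete, here is a minimal sketch of a compartmental SIR (susceptible, infected, recovered) model, a classic textbook approach. The article does not name a specific model, so SIR is an assumption chosen for illustration, and the parameter values below are hypothetical rather than fitted to COVID-19 data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Integrate the SIR equations with a simple forward-Euler scheme.

    beta  : transmission rate per day (assumed constant)
    gamma : recovery rate per day (1 / average infectious period)
    Returns one (susceptible, infected, recovered) tuple per day, including day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters, chosen only for illustration (beta / gamma = 2.5).
trajectory = simulate_sir(population=1_000_000, initial_infected=10,
                          beta=0.5, gamma=0.2, days=160)

# Day on which the number of simultaneously infected people is highest.
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"Infections peak on day {peak_day}")
```

In a model like this, the additional information that machine learning provides would typically feed into better estimates of parameters such as `beta`, rather than replace the model itself.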

It’s often tempting to believe that a machine learning solution will instantly perform more effectively than traditional techniques. However, rather than replace these ingrained, tried and tested epidemiological techniques, machine learning complements them.

Machine learning tools can quickly and effectively analyse large volumes of data in a variety of formats. This can give epidemiologists additional, crucial information about how the disease is behaving, which improves the effectiveness of their traditional disease modelling.

Natural Language Processing

Natural Language Processing (NLP) is a hot topic in the world of machine learning right now. NLP tools use machine learning to read and understand spoken or written language. Alibaba has reportedly created an NLP solution that performs better than humans when interpreting text.

In response to the COVID-19 pandemic, NLP is being used by the Allen Institute for AI and the World Health Organisation to interrogate and extract key information from over 52,000 scholarly articles.

Similarly, NLP solutions are being used to track Twitter activity relating to COVID-19. By tracking the tweets from an area over time, you can estimate how many people in that area are showing symptoms and how many have recovered.
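The idea of tracking tweets by area over time can be sketched in a few lines. This is only an illustration: a crude keyword match stands in for a real NLP model, and the keyword lists, area names and tweets are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real system would use a trained language model.
SYMPTOM_TERMS = {"cough", "fever", "breathless"}
RECOVERY_TERMS = {"recovered", "recovering"}


def classify(text):
    """Label a tweet as 'recovered', 'symptomatic' or None (not relevant)."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & RECOVERY_TERMS:
        return "recovered"
    if words & SYMPTOM_TERMS:
        return "symptomatic"
    return None


def daily_counts(tweets):
    """Count labelled tweets per (area, day, label)."""
    counts = Counter()
    for area, day, text in tweets:
        label = classify(text)
        if label is not None:
            counts[(area, day, label)] += 1
    return counts


# Invented sample data: (area, date, tweet text).
sample_tweets = [
    ("Area A", "2020-04-01", "Woke up with a fever and a dry cough"),
    ("Area A", "2020-04-01", "Lovely weather for a walk today"),
    ("Area A", "2020-04-01", "This cough will not go away"),
    ("Area A", "2020-04-08", "Finally recovered, feeling much better"),
    ("Area B", "2020-04-01", "Fever all night, staying home"),
]

counts = daily_counts(sample_tweets)
for key in sorted(counts):
    print(key, counts[key])
```

Keyword matching misses sarcasm, negation ("no fever today") and second-hand reports, which is exactly where a proper NLP model earns its place.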

The information gleaned from these analyses gives epidemiologists a greater understanding of how the virus works and how it is changing, which strengthens their traditional modelling techniques.

Collaborative ML Solutions

To answer more specific questions, a different approach is required.

Kaggle is a well-established data science community owned by Google. The Kaggle platform facilitates the global sharing of machine learning tools and insights amongst researchers, and it is being used extensively to analyse many different aspects of COVID-19.

For instance, people are using statistical and visualisation methods to show the current state of the virus and how it is affecting different countries in different ways. Clustering algorithms are also being used to give a more mathematically rigorous picture of where the virus is currently concentrated.
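As a rough illustration of the clustering idea, the sketch below groups regions with a bare-bones k-means. The article does not say which clustering algorithms are used, so k-means is an assumption, and the region names and feature values (already scaled to the 0 to 1 range) are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Very small k-means: returns (assignment, centroids).

    Uses the first k points as starting centroids, a naive but
    deterministic choice that keeps the example reproducible.
    """
    centroids = [points[c] for c in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


# Invented features per region: (scaled cases per capita, scaled weekly growth).
regions = {
    "Region A": (0.10, 0.90),
    "Region B": (0.90, 0.10),
    "Region C": (0.15, 0.85),
    "Region D": (0.85, 0.20),
    "Region E": (0.20, 0.80),
}

assignment, centroids = kmeans(list(regions.values()), k=2)
clusters = dict(zip(regions, assignment))
print(clusters)
```

Here the regions with few cases but fast growth fall into one group, and those with many cases but slow growth into the other, which is the kind of distinction a simple case-count map would not show.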

Researchers use all of this information to improve their understanding of how the virus is spreading, which in turn helps them build better models.

Standalone Machine Learning Solutions

It should be made clear that data science and machine learning do have more to offer than purely complementing epidemiology.

To name a few examples:

Given the hype surrounding machine learning, many would automatically look to ML tools and techniques to better model the spread of COVID-19. However, traditional, established models should be the first port of call, with ML tools helping to make these models more effective.

Because machine learning is so widely applicable, it can still have a part to play in other areas, but it is important to understand when machine learning should take centre stage and when it should be an enabler.
