Researchers use genomic data to map webs of COVID-19 transmission, forecast peaks for local outbreaks
Researchers at Georgia State University are modeling the real-time global spread of the SARS-CoV-2 virus, and their visualizations of COVID-19 outbreaks and transmission networks around the world reveal one key takeaway: we are truly interconnected, across all borders and oceans.
Dr. Pavel Skums, an assistant professor of computer science in the College of Arts & Sciences, and Dr. Gerardo Chowell, professor of mathematical epidemiology in the School of Public Health, teamed up to track the rapid global transmission of the new coronavirus, and published their findings in a preprint on medRxiv in late March.
Using viral genomes collected and shared by researchers from around the world, they traced how, shortly after the virus emerged in Wuhan, China, in November 2019, it jumped to other parts of Asia, western Europe, Australia, Canada and the United States, and eventually South America and Africa.
Skums has applied similar bioinformatics techniques to track transmission and outbreaks of other infections such as hepatitis C. He and Chowell now have new tools and extraordinary amounts of data at their disposal. Creating a global transmission network like this is possible because advances in genomic sequencing technologies have made sequencing rapid and affordable.
"The data on the virus is growing as fast as the virus," Skums said. "This is actually the first outbreak in history where we have so much data. It's the first global public health emergency for which next-generation sequencing technologies have been employed at such a vast scale. For Ebola, we had nothing of this magnitude."
Skums said he was working at the Centers for Disease Control and Prevention at the time of Ebola, and "scientists were traveling to Africa to help produce and analyze the data."
Scientists can now access global data from their own "shelter-at-home" computers, working together to solve the challenges the coronavirus presents. Skums, Chowell and computer science doctoral students Pelin Icer Baykal and Fatemeh Mohebbi are mining freely available data from the GISAID database, a global database where researchers upload their virus sequences, as well as related clinical and epidemiological data.
The team's analysis allows them to determine where the virus has peaked, is peaking or is yet to peak.
"Right now we see that the hotspots like New York City, Italy and Spain have reached their maximum incidence rate," Chowell said. "They are leveling off or just started to follow a downward trend, though at very high levels."
Atlanta is about a week from the peak, he said, because interventions were not implemented here until recently.
The first genome of the novel coronavirus was sequenced in January, and scientists around the globe have since sequenced more than 5,000 others.
"Global modeling like this helps us understand that there was not one single introduction of the virus in each country," said Skums.
Almost every country had multiple introductions of the virus, depicted as multiple arcs across land and ocean. For example, strains of the virus landed in Hong Kong via Shanghai, and jumped to the United Kingdom, Italy, Norway, Portugal, France and even Iceland. France may have received the virus from multiple countries, ranging from Iceland to Switzerland, Finland, Portugal, Spain and Australia. Washington can be linked to Canada, Shanghai and Australia, among other places.
Multiple points of entry show that "it is not enough to try and find the single patient zero," Skums said. Epidemiologists try to determine that first patient, because they can use the information to help determine the ultimate curve of exponential growth that fans out from that first infection. They do this through mathematical models that include how contagious the virus is, how long the incubation period is, whether it can be transmitted while someone is asymptomatic and other factors.
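To make the kind of model described above concrete, here is a minimal sketch of a textbook SEIR (susceptible, exposed, infectious, recovered) compartmental model. It is not the researchers' model. Every parameter value is an illustrative assumption rather than a COVID-19 estimate, and the names `simulate_seir`, `peak_day` and `exposed_infectiousness` are hypothetical, introduced only for this example. Transmission before symptoms is approximated crudely by letting people in the incubation stage transmit at a fraction of the normal rate.

```python
def simulate_seir(beta, incubation_days, infectious_days,
                  exposed_infectiousness=0.0, population=1_000_000,
                  initial_infectious=10, days=400, steps_per_day=10):
    """Simulate a single, closed outbreak; return infectious counts per day.

    beta: transmission rate per day (how contagious the virus is).
    incubation_days: average time from infection to becoming infectious.
    infectious_days: average time a person stays infectious.
    exposed_infectiousness: relative transmission during incubation
        (0 means no spread before the infectious stage). This is a
        simplifying assumption, not a measured quantity.
    """
    sigma = 1.0 / incubation_days   # rate of leaving the exposed stage
    gamma = 1.0 / infectious_days   # rate of recovery
    dt = 1.0 / steps_per_day        # Euler time step, in days

    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    infectious_by_day = [i]

    for _ in range(days):
        for _ in range(steps_per_day):
            # Per-susceptible risk of infection per day.
            force = beta * (i + exposed_infectiousness * e) / population
            new_exposed = force * s * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        infectious_by_day.append(i)
    return infectious_by_day


def peak_day(series):
    """Index of the day with the most infectious people."""
    return max(range(len(series)), key=series.__getitem__)


# Illustrative comparison: same virus, with and without spread during incubation.
baseline = simulate_seir(beta=0.5, incubation_days=5, infectious_days=5)
early_spread = simulate_seir(beta=0.5, incubation_days=5, infectious_days=5,
                             exposed_infectiousness=0.5)
print("peak day, no early spread:", peak_day(baseline))
print("peak day, with early spread:", peak_day(early_spread))
```

A model like this starts from one seeding event in one closed population, so it has no way to represent repeated introductions from abroad.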
This information is not enough in a global pandemic.
"Our model really shows that closing travel from one country, such as China, won't make enough of a difference," Skums said.
By the time a country or the world realizes we are in danger of a pandemic, the seeds have already been widely dispersed.
"An epidemic is formed by clusters, or local outbreaks, that are not entirely synchronous," said Chowell, who has studied the arc of outbreaks ranging from the so-called "Spanish flu" of 1918 to the Ebola epidemic of 2014. Chowell said one can create an overall picture of a pandemic across the world from those clusters, but one can also drill down to discover just how and where the virus is moving, find out where more severe outbreaks are occurring and predict what lies ahead for the next 2-3 weeks locally and globally.
The researchers' model shows multiple interconnected vertices across the world, and each vertex, they say, represents one distinct genome of the virus that has been sequenced. The model, which is updated weekly, shows where the most intense outbreaks are and the infectious arcs that branch out from them, like the fan of an immense spider web. Skums compares such a hub to a person on Twitter who has a few million followers.
"Their social network is huge," Skums said. "But another person on Twitter may have only a few followers."
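The idea of vertices, links and "followers" can be sketched in a few lines of code. The example below is not the method from the preprint. It assumes, purely for illustration, that two genomes are linked when their aligned sequences differ at no more than one position, and it uses made-up ten-letter toy sequences with hypothetical labels. The helper names (`hamming`, `build_network`, `degrees`) are likewise invented for this sketch.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def build_network(genomes, max_distance=1):
    """Link every pair of genomes that differ at no more than max_distance sites.

    genomes: dict mapping a label to an aligned sequence string.
    Returns a dict mapping each label to the set of labels it is linked to.
    """
    network = {name: set() for name in genomes}
    for (name_a, seq_a), (name_b, seq_b) in combinations(genomes.items(), 2):
        if hamming(seq_a, seq_b) <= max_distance:
            network[name_a].add(name_b)
            network[name_b].add(name_a)
    return network


def degrees(network):
    """Number of links per vertex (the 'follower count' in the analogy)."""
    return {name: len(neighbors) for name, neighbors in network.items()}


# Made-up toy sequences; real SARS-CoV-2 genomes are about 30,000 letters long.
toy_genomes = {
    "genome_hub": "AAAAAAAAAA",
    "genome_1":   "CAAAAAAAAA",  # one change from the hub
    "genome_2":   "ACAAAAAAAA",  # one change from the hub
    "genome_3":   "AACAAAAAAA",  # one change from the hub
    "genome_4":   "AACCAAAAAA",  # one change from genome_3, two from the hub
}

toy_network = build_network(toy_genomes)
toy_degrees = degrees(toy_network)
most_connected = max(toy_degrees, key=toy_degrees.get)
print(toy_degrees)
print("most connected vertex:", most_connected)
```

In this toy network, the vertex with the most links plays the role of the Twitter account with millions of followers, while a vertex with a single link resembles the account with only a few.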
This kind of information is especially important as a new virus like this spreads rapidly through what is known as a "naïve population." Humans have never encountered this virus before, so people have no pre-existing immunity to it.
Skums hopes that as researchers study the transmission networks in the coming weeks and months, they may be able to see where outbreaks are more intense, which strains are most prevalent and whether certain strains may have mutations responsible for higher infectivity or virulence. He and Chowell have also begun to model transmission networks at a state level, which will let individual states see when their peak has occurred and how many cases are forecast in the next few weeks.
More information: Pavel Skums et al. Global transmission network of SARS-CoV-2: from outbreak to pandemic, medRxiv (2020). DOI: 10.1101/2020.03.22.20041145