Bioinformatics: At the forefront, behind the scenes

Bioinformatics is an inherently interdisciplinary effort, combining the molecular biology revolution heralded by pioneers like Watson and Crick with advances in computer science that have placed previously unimaginable abilities at our fingertips. Reflecting the collaborative nature of bioinformatics, my colleagues and I at the A*STAR Bioinformatics Institute (BII) study everything from sequence analysis to image processing, working with academic, clinical and industry partners alike.

Nowhere has the synergistic power of bioinformatics been clearer, however, than during the ongoing COVID-19 pandemic: in its first six months, over 15 million people have been infected and over half a million have sadly lost their lives. Amid the uncertainty and upheaval, bioinformaticians have been at the forefront, supporting the development of urgently needed diagnostics and drug repurposing, as well as behind the scenes, carefully monitoring the virus genome for potentially dangerous mutations and tracing the virus's evolution to study and help curb transmission.

From sequence to test kit

We first heard reports of an unusual viral pneumonia spreading in Wuhan in late December 2019. In the field of infectious diseases, this is usually not a cause for concern; experts are always on the lookout for 'the next big one' but most of the time, it turns out to be nothing. This one, however, did progress further and by the second week of January, the world was informed by authorities in China that it was caused by a coronavirus, the same family of viruses that caused severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). We immediately sat up and took notice, and were soon called up by our colleagues from the Global Initiative on Sharing All Influenza Data (GISAID) to take action.

Originally designed to rapidly disseminate information about influenza viruses, GISAID was called upon by affected countries to open up its platform, known for its unique sharing mechanism, so that they could share their virus sequences with unprecedented speed. It all started with five genomes from three Chinese labs.

The realization that we were dealing with something bigger than we had ever faced before didn't happen the very first moment the sequences of what later became known as SARS-CoV-2 were made publicly available. Instead, the realization grew gradually as the cases climbed to the hundreds and then thousands, spreading from the city of Wuhan to other Chinese cities.

From an average of three or four new sequences per day, the number of submissions quickly escalated to hundreds per day, and we soon had to build a more robust database to store the roughly 30 kilobases of each sequence, along with important metadata like where the virus was isolated and when the sample was taken. GISAID could rely on programmers and scientists in different parts of the world working literally day and night for weeks and months. To deal with the influx, we also had to adapt our procedures for handling incoming genomes and quickly develop computational tools to screen for mistakes and assess quality. My colleagues at A*STAR's BII and Genome Institute of Singapore (GIS) developed software tools that allow us to automatically flag genomes for issues such as illegal characters, as well as sort them into categories based on the quality of coverage. Colleagues at Institut Pasteur in France improved the checking of metadata, and colleagues in Brazil and Argentina also joined our team of up to 50 people in this huge curation effort, which covers all time zones so that we can respond at every hour of the day. As of July 2020, the GISAID database has over 60,000 sequences and that number is growing by the day.
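The kind of automated screening described above can be illustrated with a minimal Python sketch. It is not the actual BII/GIS software: the function name, the category names and the thresholds are all assumptions chosen for illustration. It flags characters outside the IUPAC nucleotide alphabet and sorts a genome into a coverage category based on its length and its share of undetermined bases (N).

```python
# Illustrative QC sketch. The thresholds, category names and function name
# are assumptions for this example, not the rules used by GISAID.
VALID_CHARACTERS = set("ACGTURYSWKMBDHVN-")  # IUPAC nucleotide codes plus gap


def screen_genome(sequence, expected_length=30000,
                  max_n_fraction=0.01, min_length_ratio=0.97):
    """Flag illegal characters and assign a coarse coverage category."""
    seq = sequence.upper()
    # Any character outside the allowed alphabet is reported as illegal.
    illegal = sorted(set(seq) - VALID_CHARACTERS)
    # Fraction of undetermined bases; an empty sequence counts as all unknown.
    n_fraction = seq.count("N") / len(seq) if seq else 1.0
    length_ratio = len(seq) / expected_length

    if illegal or not seq:
        category = "flagged"
    elif n_fraction <= max_n_fraction and length_ratio >= min_length_ratio:
        category = "high_coverage"
    else:
        category = "low_coverage"

    return {
        "illegal_characters": illegal,
        "n_fraction": n_fraction,
        "length_ratio": length_ratio,
        "category": category,
    }
```

In practice such a check would run on every incoming submission, with flagged genomes passed to a human curator rather than rejected outright.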

While tracking the differences between SARS-CoV-2 sequences can give us important insight into where the virus could possibly have come from and how it is evolving over time, the first thing everyone had to do to prepare for the virus was to develop accurate and reliable diagnostic kits. My A*STAR colleague Dr. Masafumi Inoue at the Experimental Drug Development Center (EDDC) and Dr. Timothy Barkham at Tan Tock Seng Hospital immediately got down to work. Aided by Dr. Sidney Yee, CEO of the Diagnostics Development (DxD) Hub, and the entire A*STAR ecosystem, we were able to launch the Fortitude quantitative reverse transcription polymerase chain reaction (qRT-PCR) diagnostic kits by the first week of February.

To develop Fortitude, we needed access not just to one genome but to multiple genomes, both from this outbreak and from previously known viruses, to identify a region that is not only unique to the new virus but also relatively stable, so that it is shared by all the current outbreak strains. This is where bioinformatics had a profound impact: not only did it enable us to make comparisons rapidly across the thousands of bases in the genome, but it also continues to help us make sense of the constant flow of sequences so that we can ensure that the diagnostic kits will continue to work even as the virus mutates.
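The search for a region that is both stable and unique can be sketched in a few lines of Python. This is a simplified illustration, not the method used to design Fortitude: it assumes the genomes have already been aligned to equal length, and the function name, window size and mismatch threshold are hypothetical. It slides a window along the alignment and keeps positions where all outbreak genomes are identical and every related virus differs by at least a chosen number of bases.

```python
def candidate_regions(outbreak, related, window=20, min_differences=3):
    """Return start positions of windows that are identical across all
    outbreak genomes and differ from every related genome.

    Assumes all sequences are pre-aligned strings of equal length.
    """
    length = len(outbreak[0])
    if any(len(seq) != length for seq in outbreak + related):
        raise ValueError("all sequences must be aligned to the same length")

    hits = []
    for start in range(length - window + 1):
        end = start + window
        segment = outbreak[0][start:end]
        # Skip windows containing gaps or ambiguous bases.
        if any(base not in "ACGT" for base in segment):
            continue
        # Stable: every outbreak genome carries exactly the same segment.
        if any(seq[start:end] != segment for seq in outbreak):
            continue
        # Unique: each related virus differs in enough positions.
        if all(
            sum(a != b for a, b in zip(segment, seq[start:end])) >= min_differences
            for seq in related
        ):
            hits.append(start)
    return hits
```

Real assay design adds further constraints, such as melting temperature and avoiding secondary structure, which this sketch leaves out.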

Beyond diagnostics, once you have the genome sequenced, you can also use it to predict drug targets and start screening existing drugs in silico, greatly speeding up the search for much-needed therapies.

The meaning of mutations

Apart from addressing the urgent need for diagnostics and drugs and triggering vaccine development on the frontlines, bioinformatics also plays a crucial role in helping us understand how the virus is mutating. First of all, I would like to stress that mutations are normal, particularly for RNA viruses like SARS-CoV-2, which naturally make mistakes when replicating and so produce imperfect copies of themselves. But just because the virus is mutating does not necessarily mean that it is more dangerous.

Secondly, most mutations are small and are either bad for the virus or have no impact at all. To give you an analogy, if the entire virus genome is like a car, the mutated form of the virus would be the same car in the same color with only a tiny difference such as a single letter difference in the license plate. Just as this change in the license plate doesn't affect the performance of the car or make it more fuel-efficient, these mutations do not mean that the virus has become more or less virulent.

However, this "changed license plate" can tell us where the car came from and when it was registered. Similarly, mutations can give us a sense of how the different viral "cars" are related to each other, a piece of information that we can then use in contact tracing.
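To make the "license plate" idea concrete, here is a small Python sketch that counts the differences between pairs of genomes. It assumes the sequences are already aligned to the same length, and the function names and the choice to ignore ambiguous bases are illustrative assumptions. Genomes separated by few differences are likely to be closely related, which is the kind of signal that can support contact tracing.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two aligned genomes carry different bases.

    Positions with gaps or ambiguous bases in either genome are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def pairwise_distances(genomes):
    """Map each pair of genome names to its difference count."""
    return {
        (name_a, name_b): snp_distance(seq_a, seq_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(genomes.items(), 2)
    }
```

Phylogenetic tools build on the same raw signal but use evolutionary models rather than simple counts.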

On very rare occasions, mutations do change the performance and fitness of the virus, though this often requires multiple steps. For example, one such set of mutations in the evolution of SARS-CoV-2 is thought to have given it the ability to jump from animals like bats or pangolins into human hosts.

With real-time genomic surveillance enabled by platforms such as GISAID in combination with modern tools of bioinformatics, we can quickly detect these rare changes when they occur and judge whether they affect diagnostics or treatments, or increase virulence. For example, the virus that caused the outbreak in Europe had evolved to be slightly different from the original strain, such that it was not so well detected by the RT-PCR kit initially developed by colleagues in China based on the first outbreak genomes. As soon as we saw that change, we notified our colleagues and, armed with that information, they quickly changed their protocols and were subsequently able to fully detect the new European strains as well.
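A check of this kind can be sketched in Python as follows. This is a deliberately simplified illustration with hypothetical function names: it assumes a forward primer written in the same orientation as the genome, a known binding position, and genomes that share the same coordinates. It lists the genomes whose binding site no longer matches the primer closely enough.

```python
def primer_mismatches(primer, genome, start):
    """Return the primer positions that mismatch the genome at `start`."""
    site = genome[start:start + len(primer)]
    if len(site) != len(primer):
        raise ValueError("primer binding site runs past the end of the genome")
    return [i for i, (p, g) in enumerate(zip(primer, site)) if p != g]


def genomes_at_risk(primer, genomes, start, max_mismatches=0):
    """Map genome names to mismatch positions when the count exceeds the limit."""
    at_risk = {}
    for name, sequence in genomes.items():
        mismatches = primer_mismatches(primer, sequence, start)
        if len(mismatches) > max_mismatches:
            at_risk[name] = mismatches
    return at_risk
```

Whether a given mismatch actually reduces detection depends on its position in the primer and on the assay chemistry, so a hit from such a scan is a prompt for laboratory verification rather than a verdict.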

As the battle against SARS-CoV-2 continues to be waged across the globe, bioinformaticians around the world are racing against the continually evolving virus. By following its genomic evolution, we hope to catch up with it or sometimes even be one step ahead.

Provided by Agency for Science, Technology and Research (A*STAR), Singapore
