Samtools CRAMS in support for improved compression formats

August 15, 2014

Computer scientists at the Wellcome Trust Sanger Institute have released a major upgrade of Samtools, one of the most popular next-generation sequence analysis tools. The revised Samtools 1.0 enables researchers to easily compress, share and analyse genomic sequence data, reducing costs and supporting genomics research around the world.

The Global Alliance for Genomics and Health, in which the Sanger Institute is a partner, has been set up to enable researchers and clinicians to work together using standardised and efficient DNA sequence data formats to find the genetic variants responsible for disease. Samtools 1.0 supports this initiative by enabling researchers to read and write data in the new CRAM format, which was recently adopted by the Global Alliance, in addition to the existing SAM and BAM file formats for genomic sequence information.

The benefits of using CRAM are immediate: it gives a size reduction of 10-30 per cent. In addition, much as the JPEG format does for images, CRAM supports far greater compression, up to a hundredfold, in "lossy" mode, which preserves almost all of the important information.
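
As a rough illustration of what those figures could mean in practice, the short Python sketch below works through the arithmetic for a hypothetical 100 GB file. The file size and the function names are assumptions made for illustration and are not part of Samtools. The sketch also assumes that the hundredfold figure is measured against the same starting file, which the announcement does not specify.

```python
def size_after_reduction(original_gb, reduction_percent):
    """Return the size left after shrinking a file by a given percentage."""
    return original_gb * (100 - reduction_percent) / 100


def size_after_fold(original_gb, fold):
    """Return the size left after an N-fold compression."""
    return original_gb / fold


# Hypothetical starting point: a 100 GB file (an assumption, not a figure from the article).
original_gb = 100.0

# Lossless CRAM: the quoted 10-30 per cent size reduction.
lossless_worst = size_after_reduction(original_gb, 10)
lossless_best = size_after_reduction(original_gb, 30)

# Lossy CRAM: the quoted upper bound of a hundredfold compression,
# assumed here to be relative to the same starting file.
lossy_best = size_after_fold(original_gb, 100)

print(f"Lossless CRAM: {lossless_best:.0f}-{lossless_worst:.0f} GB")
print(f"Lossy CRAM, best case: {lossy_best:.0f} GB")
```

Under these assumptions, the lossless figures leave most of the file in place, while the best-case lossy figure shrinks it to a small fraction of its starting size.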

"This major rebuild of Samtools reflects our commitment to supporting the global use of sequencing data," says Dr Richard Durbin, Head of Computational Genomics at the Sanger Institute. "Genome science worldwide relies on fast and efficient data analysis and storage, and Samtools 1.0 fulfils this need by supporting new sequencing and analysis technologies."

Samtools software is embedded in many bioinformatics pipelines and is the foundation of many thousands of genomic research papers. Since its creation in 2009, the program has been downloaded more than 225,000 times. Samtools 1.0 is freely available at http://www.htslib.org/. This new version was substantially rewritten to support the highly efficient genomic data format CRAM, add new functionality, and integrate more cleanly with other tools.

"Samtools 1.0 embeds CRAM into genomic data analysis pipelines and removes the need for additional processing," says Dr John Marshall, from the Sanger Institute. "This development paves the way for widespread uptake of this highly efficient file format in genomic research and will lead to lower storage costs."

The significant savings in storage that can be achieved are due to incorporating data compression techniques developed jointly by the Sanger Institute and the EMBL-European Bioinformatics Institute.

"It has been exciting to work on implementing CRAM into Samtools," says James Bonfield, at the Sanger Institute. "The great flexibility of CRAM has allowed a number of new compression techniques to be incorporated, which when combined with Samtools 1.0 will help to future-proof genomic data storage and analysis."
