Noisy data facilitates Dartmouth investigation of breast cancer gene expression

Researchers from Dartmouth's Norris Cotton Cancer Center, led by Casey S. Greene, PhD, reported in Pacific Symposium on Biocomputing on the use of denoising autoencoders (DAs) to effectively extract key biological principles from gene expression data and summarize them into constructed features with convenient properties.

"Cancers are very complex," explained Greene. "Our goal is to measure which genes are being expressed, and to what extent they're being expressed, and then automatically summarize what the cancer is doing and how we might control it."

Normally, it is difficult to apply computational models across different studies because the data is "noisy," meaning that many factors differ in the way it is measured. To begin their analysis, Greene's team added more noise to the data and then trained a computer to remove that noise. To do so, the computer had to learn key underlying features of the data. "This approach of removing noise makes the models we constructed more generally applicable," Greene said.
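As a rough illustration of the add-noise-then-remove-it idea, the following Python sketch trains a small denoising autoencoder with NumPy. It is not the Dartmouth team's code: the masking noise, sigmoid units, tied weights, squared-error loss, and every function name and parameter value here are assumptions chosen for brevity, and the input is assumed to be an expression matrix (samples by genes) scaled to the range 0 to 1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def corrupt(x, noise_level, rng):
    """Masking noise (an assumption): zero out a random fraction of entries."""
    keep = rng.random(x.shape) >= noise_level
    return x * keep


def encode(x, w, b_hidden):
    """Map expression values to the constructed (hidden) features."""
    return sigmoid(x @ w + b_hidden)


def train_denoising_autoencoder(x, n_hidden=4, noise_level=0.1,
                                learning_rate=0.5, epochs=200, seed=0):
    """Train on corrupted input, but score reconstructions against clean x."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = x.shape
    w = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))  # tied weights
    b_hidden = np.zeros(n_hidden)
    b_visible = np.zeros(n_genes)
    losses = []

    for _ in range(epochs):
        x_noisy = corrupt(x, noise_level, rng)   # add noise
        h = encode(x_noisy, w, b_hidden)         # constructed features
        x_hat = sigmoid(h @ w.T + b_visible)     # attempt to remove noise

        err = x_hat - x                          # compare with CLEAN data
        losses.append(float(np.mean(err ** 2)))

        # Backpropagation for squared error with sigmoid units.
        d_out = err * x_hat * (1.0 - x_hat)
        d_hid = (d_out @ w) * h * (1.0 - h)
        grad_w = x_noisy.T @ d_hid + d_out.T @ h  # encoder + decoder terms

        w -= learning_rate * grad_w / n_samples
        b_hidden -= learning_rate * d_hid.mean(axis=0)
        b_visible -= learning_rate * d_out.mean(axis=0)

    return w, b_hidden, b_visible, losses
```

Calling `train_denoising_autoencoder(x)` on a scaled matrix `x` returns the learned weights, both bias vectors, and the per-epoch reconstruction error; `encode(x, w, b_hidden)` then gives one row of constructed features per sample. Because the network only ever sees corrupted input yet is scored against the clean values, it cannot simply copy its input and has to pick up structure shared across genes.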

Greene and the Dartmouth team studied DAs, which train computers directly on the data without requiring researchers to supply prior knowledge to the computer, as a method to identify and extract complex patterns from gene expression data. The model that the computer constructs can then be compared to previous discoveries to understand where the data supports those discoveries and where it raises new questions. The team evaluated the performance of DAs by applying them to a large collection of breast cancer gene expression data. The results show that DAs were able to recognize changes in gene expression that corresponded to the cancers' molecular and clinical information.
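One simple way to check whether a constructed feature lines up with clinical information can be sketched as follows. The scoring method is an assumption made for illustration, not necessarily the evaluation the team used: for each feature, it computes the fraction of sample pairs in which the sample from the positive group has the higher activation (the area under the ROC curve). The function name `feature_auc` and the binary label are hypothetical.

```python
import numpy as np


def feature_auc(activations, labels):
    """Score each constructed feature against a binary clinical label.

    activations: array of shape (n_samples, n_features)
    labels:      array of 0/1 values, one per sample (hypothetical label)
    Returns one value per feature: 1.0 means the feature is always higher
    in the positive group, 0.5 means it carries no information about it.
    """
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    pos = activations[labels == 1]
    neg = activations[labels == 0]

    # Compare every positive sample with every negative sample, per feature.
    greater = (pos[:, None, :] > neg[None, :, :]).mean(axis=(0, 1))
    ties = (pos[:, None, :] == neg[None, :, :]).mean(axis=(0, 1))
    return greater + 0.5 * ties
```

Features scoring near 1.0 or 0.0 track the label closely, while features near 0.5 are unrelated to it and may reflect other biology or technical variation.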

"These techniques and findings will enable others to use the DAs to evaluate gene expression data in a variety of disease sites," reported Greene. "While noise in data is usually viewed as a problem, adding it to data can actually be a good thing because it can help reveal the underlying signal. When we did this to analyze data from breast cancers, we found gene expression features that generalize across studies and represent important clinical factors."

Next for Greene's research team are more complex models that take multiple levels of regulation into account. Their goal is to develop methods that not only model data but that can automatically explain to researchers what the models have learned.

More information: … edings/psb15/tan.pdf
Citation: Noisy data facilitates Dartmouth investigation of breast cancer gene expression (2015, January 22) retrieved 18 April 2021 from