Shakespeare and cancer diagnoses: how bard can it be?

July 24, 2013

Shakespeare's plays and cancer: two seemingly unrelated topics with an underlying common thread.

The techniques that computer scientists use to analyse the Bard's works are also used in cancer diagnostic procedures – and it all comes down to quantifying subtle variations in attributes present in large amounts of data.

In a collaboration published last month in the journal PLoS ONE, we applied a simple, novel ranking method to a dataset of plays of undisputed authorship from the Shakespearean era.

We ranked the frequency of words used by playwrights John Fletcher, Ben Jonson, Thomas Middleton and William Shakespeare, testing all 55,055 unique words used in 168 plays.

The results of using this new method were very encouraging. For some authors, such as Shakespeare, the slight under-use of particular words provided better markers of individuation than over-used words. We found Shakespeare's four lowest ranked words to be:

  • all
  • to (infinitive)
  • now
  • ye

The last one was also among the 20 lowest-ranked words for Jonson and Middleton but, interestingly, was the highest-ranked word for Fletcher. His preference for "ye", compared with the average across the plays that do not belong to him, is now very clear.

These are quantifiable markers that can objectively measure an author's creative mind at work.
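To make the idea of over-used and under-used words concrete, here is a minimal Python sketch. It is not the scoring method from the PLoS ONE paper, whose formula is not given here. It simply assumes a score equal to a word's relative frequency in one author's texts minus its relative frequency in everyone else's. The function names and the toy lines are invented for illustration.

```python
from collections import Counter


def relative_frequencies(texts):
    """Return each word's share of all tokens in the given texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def rank_words(author_texts, other_texts):
    """Rank words from most over-used to most under-used by the author.

    Assumed score (illustrative only): the author's relative frequency
    minus the relative frequency in all other texts.
    """
    author = relative_frequencies(author_texts)
    others = relative_frequencies(other_texts)
    vocabulary = set(author) | set(others)
    scores = {w: author.get(w, 0.0) - others.get(w, 0.0) for w in vocabulary}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy lines, not taken from any real play.
fletcher_like = ["ye shall see ye now", "hear ye all"]
rest = ["you shall see all now", "hear you all now"]

ranking = rank_words(fletcher_like, rest)
top_word = ranking[0][0]      # most over-used word in the toy "author" texts
bottom_word = ranking[-1][0]  # most under-used word in the toy "author" texts
```

In this toy corpus, "ye" lands at the top of the ranking and "you" at the bottom, mirroring the kind of contrast described for Fletcher.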

The idea that variations in the use of words over time can give clues about psychological problems, or even serve as markers of depression in the work of suicidal poets, has already been discussed.

But this simple idea for a new scoring method may also pay dividends in other areas, such as diagnostics and medical algorithms.

Information-based medicine

In a study from 2009, Shakespeare and other English Renaissance authors were studied using methods based on information theory (the scientific field that deals with the quantification of information).

The authors observed that Shakespeare's work seemed remarkable for its homogeneity in the probability of use of common words and for its closeness to the overall average use of words at the time. This naturally triggers a central question:

Would it be possible to find some distinctive signatures of individual authors by looking at the fluctuations of the observed frequencies of words used?
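One standard information-theoretic way to measure how far an author's word use sits from the overall average is the Jensen-Shannon divergence between two word-frequency distributions. The sketch below assumes that measure purely for illustration, since the exact quantifiers of the 2009 study are not described here. The function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (in bits) between two word distributions.

    p and q map words to probabilities that each sum to 1. The result is
    0 for identical distributions and 1 for distributions with no words
    in common.
    """
    words = set(p) | set(q)
    p_full = {w: p.get(w, 0.0) for w in words}
    q_full = {w: q.get(w, 0.0) for w in words}
    mix = {w: 0.5 * (p_full[w] + q_full[w]) for w in words}
    return 0.5 * kl_divergence(p_full, mix) + 0.5 * kl_divergence(q_full, mix)


# Invented probabilities for illustration only.
author = {"ye": 0.5, "all": 0.5}
average = {"ye": 0.1, "all": 0.5, "you": 0.4}
distance = jensen_shannon(author, average)
```

A small divergence would correspond to an author whose word use sits close to the average of the period, and a larger one to a more distinctive signature.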

So, you may be asking yourself:

Why would this be a question of interest for the analysis of biomedical data?

The identification of biological markers is critical for information-based medicine. Such biomarkers are quantitative indicators that can be objectively measured and indicate normal biological processes, the existence of pathogenic processes, or altered pharmacologic responses to a therapeutic intervention.

Biomarkers are needed for cancer diagnostics and early screening (for example, levels of the enzyme Kallikrein-3, also known as PSA or prostate-specific antigen, are often elevated in men with prostate cancer or other prostate disorders).

Biomarkers are central to the core aims of personalised medicine and the quest to individualise risk, identification, therapies and the post-treatment monitoring of possible recurrence.

But the use of a single biomarker is controversial (this is already the case even for established biomarkers such as PSA for prostate cancer), so current medical research advocates finding panels of biomarkers.

Statistical scores are usually employed to rank and identify the best biomarkers when individually tested. But to identify panels it is important to find the best combination of biomarkers. Other mathematical methods are needed.

To do so, our team uses combinatorial optimisation, the branch of computer science and discrete applied mathematics that deals with these optimal selection problems. We apply it not only in cancer and the selection of therapeutic combinations but also in multiple sclerosis and Alzheimer's disease.

Using panels of biomarkers, it is possible to improve the classification accuracy of the tests, boosting sensitivities and specificities to approximately 90%, as we have recently shown in studies in Alzheimer's disease.
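The following Python sketch illustrates why a panel can outperform any single marker. It is a brute-force toy, not the combinatorial optimisation approach our team uses. It assumes binary markers (1 means elevated), a majority-vote rule for calling a sample positive, and the average of sensitivity and specificity as the objective. The marker names and data are invented.

```python
from itertools import combinations


def sensitivity_specificity(predictions, labels):
    """Return (sensitivity, specificity) for boolean predictions and 0/1 labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives


def panel_predictions(markers, panel):
    """Call a sample positive when more than half of the panel's markers are elevated."""
    n_samples = len(next(iter(markers.values())))
    return [2 * sum(markers[m][i] for m in panel) > len(panel)
            for i in range(n_samples)]


def best_panel(markers, labels):
    """Try every non-empty subset of markers and keep the best-scoring one."""
    best = None
    for size in range(1, len(markers) + 1):
        for panel in combinations(sorted(markers), size):
            sens, spec = sensitivity_specificity(
                panel_predictions(markers, panel), labels)
            score = (sens + spec) / 2  # assumed objective: balanced accuracy
            if best is None or score > best[0]:
                best = (score, panel, sens, spec)
    return best


# Invented toy data: four diseased samples followed by four healthy ones.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
markers = {
    "A": [1, 1, 1, 0, 1, 0, 0, 0],
    "B": [1, 1, 0, 1, 0, 1, 0, 0],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
}

score, panel, sens, spec = best_panel(markers, labels)
single = sensitivity_specificity(panel_predictions(markers, ("A",)), labels)
```

In this toy data each marker alone reaches 75% sensitivity and 75% specificity, while the three-marker panel classifies every sample correctly. Exhaustive search like this only works for a handful of markers, because the number of subsets doubles with each marker added, which is why more sophisticated optimisation methods are needed on real datasets.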

Finding the best fit

This is not the first time that combinatorial optimisation has been used at the University of Newcastle's Centre for Bioinformatics, Biomarker Discovery and Information-based Medicine (CIBM), both in cancer and in literary and linguistic studies.

In a different paper published in 2006, combinatorial optimisation methods were used to produce a consensus phylogenetic tree of 84 Indo-European languages. In that same study, we showed how to generate a classification of several different cancer cell lines.

Again, our approach was heavily based on combinatorial optimisation.

The application of these more sophisticated methods is necessary for personalised medicine as they can be used to subtype different types of cancers at the molecular level by analysing patterns of variations across different samples.

While our team's work concentrates on developing molecular signatures of disease states based on a combination of biomarkers (as opposed to single scores like the novel one used in our study), we also recognise the usefulness of this new score, presented in the analysis of Shakespeare's works, for a rapid preliminary analysis of large biomarker datasets.

Our team now routinely analyses large biomedical datasets with this new method. As in the Shakespeare study mentioned above, it has served to identify potentially mislabelled samples, outliers of a major class of interest of a disease, and other potential pitfalls identifiable and avoidable during early processing of the data.

For our institution, this contribution counts as one of those success stories of collaboration across faculties and disciplines: a rare curiosity-driven basic research endeavour of the kind that generally does not get the nod from national funding agencies, which look to support only translational medical research under simplistic definitions.

These "unthinkable quests" are vital to spinning off breakthrough translational research.

They need to be protected, supported and developed. Computer science provides the core expertise that may lead to new, scalable ways of addressing the tidal wave of data coming from the life sciences, and that may ultimately prove a blessing for your health.

More information: … journal.pone.0066813
