Scientific research published in the current issue of the Journal of the American Medical Informatics Association (JAMIA) reports on a study of genetic variants that influence human susceptibility to peripheral arterial disease (PAD), made possible by leveraging electronic medical records (EMRs; also called EHRs or electronic health records). A team of authors from the Mayo Clinic Divisions of Cardiovascular Diseases and Biomedical Informatics and Statistics conducted the study and concluded that EMR-based data, used across institutions in a structured way, "offer great potential for diverse research studies, including those related to understanding the genetic bases of common diseases."
The authors, Iftikhar Kullo, MD; Jin Fan, MD; Jyotishman Pathak, PhD; Guergana Savova, PhD; Zeenat Ali, MD; and Christopher G. Chute, MD, DrPH, demonstrated the feasibility of leveraging EHRs to launch a genome-wide association study of PAD. The disease affects approximately eight million Americans aged 40 and older, including 20 percent of the elderly (70+ years old) in the United States. According to the authors, PAD is associated with "significant mortality and morbidity, underscoring the necessity of a rigorous investigation."
The physicians used EHRs to confirm cases of PAD and to identify phenocopies, i.e., mimics of atherosclerotic PAD. With patient consent and the approval of Mayo's Institutional Review Board, the research team accessed electronic health records in a federated warehouse of patient data that Mayo Clinic has used since 1997, a database of more than eight million patients. Using the Mayo Enterprise Data Trust (EDT), the researchers extracted clinical variables for study participants that could confound the association between genetic susceptibility variants and PAD.
Dr. Chute observed that the EDT "provides a scalable solution for clinical research, providing comparable and consistent data that can be employed in comparative effectiveness studies, outcomes research, or translational research as illustrated by this JAMIA paper."
In the study, PAD was defined as a resting/post-exercise ankle-brachial index (ABI) less than or equal to 0.9, a history of lower extremity revascularization, or having poorly compressible leg arteries. Controls were patients without evidence of PAD. Demographic data and laboratory values were extracted from EHRs. Medication use and smoking status were identified by natural language processing (NLP) of clinical notes.
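The case definition above can be sketched as a simple rule. This is only an illustration, not the authors' algorithm: the record fields and function name are hypothetical, "resting/post-exercise" is read here as "either measurement," and poorly compressible arteries are taken as a pre-determined yes/no flag because the article gives no numeric cutoff for them.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold stated in the article: ABI less than or equal to 0.9.
ABI_THRESHOLD = 0.9


@dataclass
class VascularRecord:
    """Hypothetical container for the fields the case definition needs."""
    resting_abi: Optional[float] = None
    post_exercise_abi: Optional[float] = None
    prior_revascularization: bool = False        # history of lower extremity revascularization
    poorly_compressible_arteries: bool = False   # assumed to be supplied as a flag


def meets_pad_definition(record: VascularRecord) -> bool:
    """Return True if any one of the three stated criteria is satisfied."""
    # Assumption: a low value on either the resting or the post-exercise ABI qualifies.
    measured = [v for v in (record.resting_abi, record.post_exercise_abi) if v is not None]
    low_abi = any(v <= ABI_THRESHOLD for v in measured)
    return low_abi or record.prior_revascularization or record.poorly_compressible_arteries
```

For example, `meets_pad_definition(VascularRecord(resting_abi=0.85))` returns `True`, while a record with a resting ABI of 1.1 and no other criteria returns `False`.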
"Although manual abstraction of medical records can provide high-quality data," the authors write, "for large studies such as genetic association studies, manual review of medical records can be prohibitively expensive and time-consuming. Our study demonstrates . . . .several significant advantages over traditional approaches to genomic medicine research by simplifying logistics, reducing timelines, and overall costs through efficient data acquisition."
In their statistical analyses, the researchers used metrics long recognized in the NLP and information-retrieval community (precision, recall, and F-measure) to evaluate EMR-based algorithms against manual medical record review. Most cardiovascular risk factors and co-morbidities were captured from the EMRs with an accuracy rate higher than 90 percent. The researchers analyzed age, sex, BMI, race, geographical distribution, risk factors, co-morbidities, smoking status, and medications.
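For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F-measure might be computed when an algorithm's yes/no output is compared with manual review as the reference standard. It is illustrative only: the function name and the sample labels are invented for this example, not taken from the study, and the balanced form of the F-measure (F1) is assumed.

```python
from typing import Sequence, Tuple


def precision_recall_f1(predicted: Sequence[bool],
                        reference: Sequence[bool]) -> Tuple[float, float, float]:
    """Compare algorithm output with manual review (the reference standard)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    false_neg = sum(1 for p, r in pairs if not p and r)

    # Precision: share of algorithm-flagged records that manual review confirms.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of manually confirmed records that the algorithm found.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F-measure (balanced F1 assumed): harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Made-up labels for five patients, e.g. "current smoker" as found by NLP vs. by chart review.
nlp_says_smoker = [True, True, False, True, False]
chart_review_says_smoker = [True, False, False, True, True]
precision, recall, f1 = precision_recall_f1(nlp_says_smoker, chart_review_says_smoker)
```

With these invented labels there are two true positives, one false positive, and one false negative, so precision, recall, and F-measure all equal 2/3.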