Electronic medical records (EMRs) contain enormous amounts of information that could be used in clinical research and quality improvement. However, ethical concerns, such as obtaining patient consent and minimizing the risk of reidentification, abound.

Those who work with records tied to clinics, hospitals, insurance companies, and similar organizations typically have to pull records from multiple databases. Using this data often requires approval from an institutional review board (IRB) and documentation of HIPAA compliance.

Access can be messy, often requiring a computer programmer who can design an algorithm to extract the data. Even then, the data is not always of high quality. However, there are now viable alternatives that provide clinically relevant data that has already been deidentified. Here is a look at a few:

The NIH Collaboratory Distributed Research Network allows researchers to collaborate with each other using various institutional and organizational EMRs, while also safeguarding protected health information and proprietary data. The Network's querying capabilities reduce the need to share confidential or proprietary data and significantly minimize the legal, regulatory, privacy, proprietary, and technical barriers associated with data sharing for research.
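The idea behind a distributed query, where each partner answers a question locally and returns only a summary, can be illustrated with a minimal Python sketch. This is not the Network's actual software or API; the `Site` class, the `distributed_count` function, and the sample records are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Site:
    """A hypothetical data partner that keeps patient-level records local."""
    name: str
    records: List[Record]

    def run_count_query(self, predicate: Callable[[Record], bool]) -> int:
        # Only the aggregate count leaves the site, never the records themselves.
        return sum(1 for record in self.records if predicate(record))


def distributed_count(sites: List[Site],
                      predicate: Callable[[Record], bool]) -> Dict[str, int]:
    """Send the same query to every site and collect one count per site."""
    return {site.name: site.run_count_query(predicate) for site in sites}


def older_diabetic(record: Record) -> bool:
    # Example research question: diabetic patients aged 65 or older.
    return record["dx"] == "diabetes" and record["age"] >= 65


# Invented sample data for two hypothetical partner sites.
sites = [
    Site("site_a", [{"age": 70, "dx": "diabetes"}, {"age": 45, "dx": "asthma"}]),
    Site("site_b", [{"age": 66, "dx": "diabetes"}, {"age": 81, "dx": "diabetes"}]),
]

per_site = distributed_count(sites, older_diabetic)
total = sum(per_site.values())
print(per_site, total)
```

In this sketch the researcher sees only the per-site counts and their total, which is the sense in which querying can reduce the need to share confidential or proprietary data.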

i2b2 (Informatics for Integrating Biology and the Bedside) is an NIH-funded National Center for Biomedical Computing based at Partners HealthCare System and led by officials and researchers from both Harvard and MIT. The group is currently developing and piloting a tool that will allow clinical researchers to use existing deidentified clinical data (provided by a complex network of healthcare providers and academic health centers) for clinical research.

Kaiser Permanente's Research Program on Genes, Environment, and Health (RPGEH) is a data set based on the 6 million-member Kaiser Permanente Medical Care Plan. This resource links comprehensive EMRs, data on relevant behavioral and environmental factors, and biobank data from consenting health plan members.

The Million Veteran Program (MVP) is funded and managed by the U.S. Department of Veterans Affairs (VA). The aim of this program is to establish one of the largest databases of genetic, military exposure, lifestyle, and health information. Any patient who comes into contact with a VA hospital is screened for participation.

Mini-Sentinel is a pilot project sponsored by the U.S. Food and Drug Administration (FDA). The original purpose of this database was to serve as an active surveillance system that monitors the safety of FDA-regulated medical products. However, the massive volume of electronic healthcare data will also allow this database to serve as a rich reservoir for potential research. As of September 2013, the database included quality-checked data from 18 partner organizations, covering 153 million patients.

Still prefer to use your own network of EMR data? Dr. Richard H. Kennedy, vice provost and senior associate dean for research at Loyola University Chicago Stritch School of Medicine, had this preference, but found his institution's EMR system cumbersome and challenging to work with.

As a result, he created a large-scale, easily accessed clinical research database with deidentified, quality-checked data from Loyola University Health System's Epic EMR. He then created a supporting intranet website with predefined dashboards and ad hoc query tools that allowed end users to directly identify targeted patient cohorts, define and refine data categories, and identify frequent types of data requests.

This is one way institutions can make their EMRs more accessible to researchers and quality improvement experts without compromising data quality or risking patient identification. This method also improves cost-effectiveness by removing the "middle man" (the computer programmer) from EMR queries.
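To make the idea of a deidentified research database with self-service cohort queries more concrete, here is a minimal Python sketch. It is not the Loyola system and does not by itself satisfy HIPAA deidentification requirements; the field names, the `deidentify` and `find_cohort` functions, the salt value, and the sample records are all hypothetical assumptions.

```python
import hashlib

# Hypothetical set of fields treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "mrn", "address"}


def deidentify(record, salt):
    """Return a copy of the record without direct identifiers."""
    # Replace the medical record number with a salted hash so that rows for
    # the same patient can still be linked inside the research database.
    digest = hashlib.sha256((salt + record["mrn"]).encode("utf-8")).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "birth_date"
    }
    cleaned["patient_key"] = digest[:12]
    # Generalize the full birth date (YYYY-MM-DD) to the year only.
    cleaned["birth_year"] = int(record["birth_date"][:4])
    return cleaned


def find_cohort(records, diagnosis, max_birth_year):
    """Select deidentified records matching a diagnosis and a birth-year cutoff."""
    return [
        r for r in records
        if r["diagnosis"] == diagnosis and r["birth_year"] <= max_birth_year
    ]


# Invented sample rows standing in for an EMR extract.
raw = [
    {"name": "Patient One", "mrn": "0001", "address": "1 Main St",
     "birth_date": "1950-03-02", "diagnosis": "heart failure"},
    {"name": "Patient Two", "mrn": "0002", "address": "2 Oak Ave",
     "birth_date": "1988-11-20", "diagnosis": "heart failure"},
    {"name": "Patient Three", "mrn": "0003", "address": "3 Elm Rd",
     "birth_date": "1947-07-15", "diagnosis": "asthma"},
]

research_db = [deidentify(r, salt="example-salt") for r in raw]
cohort = find_cohort(research_db, diagnosis="heart failure", max_birth_year=1960)
print(len(cohort), cohort[0]["birth_year"])
```

A query tool built on data prepared this way lets an end user narrow a cohort directly, without a programmer writing a custom extraction for each request.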