Opinion: Reducing barriers to data access for research in the public interest—lessons from COVID-19
In a joint letter, Research Fellow Francesca Cavallaro, Associate Professor Katie Harron (both UCL GOS Institute of Child Health) and their colleagues explain how the COVID-19 pandemic has generated an urgency to improve data access.
The COVID-19 outbreak has sparked increased awareness of the importance of timely, system-wide data for examining trends and modeling different scenarios to inform policy response. The scale and speed of data access and use have been unprecedented in public health history. Pre-print articles sharing results before peer review have proliferated (with implications for research quality) and over 500 vaccine and treatment clinical trials have been initiated in record time. The entire economy of knowledge production related to COVID-19 has been accelerated, with the understanding that, if we wait for perfect information before acting, we will be too late. COVID-19 is providing valuable lessons on improving data access and the importance of using data for efficient and effective service response.
This situation contrasts sharply with the cumbersome processes usually faced by researchers using administrative (or routinely collected) health data to inform policy making on other topics. These processes result from systems that are not purpose-built for research, and can be summarized as four key obstacles.
First, the cost of using administrative data is prohibitive. For example, a non-commercial license for GP data through the Clinical Practice Research Datalink costs £75,000, and roughly double with linked socioeconomic and hospital data. Second, there are lengthy approval processes (up to one year) even for de-identified data posing little risk to confidentiality: researchers are required to demonstrate scientific quality and public benefit in applications to data providers and governance bodies, even when these important aspects have already been assessed by peer review and funders. While appropriate governance is important for protecting confidentiality and preserving public trust, approval processes are not streamlined and timelines do not reflect expectations of the public. Access to UK-wide data is particularly problematic due to different approval processes in different countries. Third, standard datasets are finalized several months after the time period covered, and inefficiencies in releasing data to researchers mean that it can take many months to receive them. These delays hinder the rapid production of results to inform policy in a timely way. Lastly, data access is inefficient: most data providers mandate the hosting of data in specified secure settings, often involving travel outside of usual research environments, with limited computing capacity, restricted hours and software. All these obstacles become exponentially greater for cross-sectoral linkage of administrative datasets, for which clear legal pathways for access may not exist.
Pre-COVID-19, these problems caused substantial delays to analyzing and reporting results on research in the public interest. These delays have been exacerbated since the start of the pandemic by the diversion of resources from non-COVID-related areas. Important research simply is not done when access is refused or when timelines jeopardize grant funding. Considerable opportunity costs are associated with non-use of health data and delayed evaluations of public programs, leading to a lack of evidence to inform more effective and equitable services, and to save lives (as well as money).
COVID-19 has highlighted the fundamental limitations of existing systems, and has sparked innovation for supporting data access. For example, the need for approval under Regulation 3(4) of the Health Service Control of Patient Information Regulations 2002 has been suspended by the Secretary of State for Health and Social Care for specific COVID-19 related research projects, as the public benefit from this research is clear. The Office for National Statistics has enabled temporary remote access to data during the COVID-19 lockdown, exercising additional flexibility within the scope of regulations, albeit with logistical challenges. Existing research studies such as UK Biobank have been granted new access to data sources. However, these changes are too little and too late. When we reach the "new normal," we should not return to business-as-usual, but instead take heed of lessons learned during the pandemic and rebalance the public benefits of wider data use against numerous existing barriers. We recommend the following measures:
Reduce costs of administrative data access to researchers through core government funding for data processing, linkage and curation (avoiding cost-recovery models). This would enable more researchers to address questions in the public interest. This is already possible in some sectors, as demonstrated by the Department for Education for England and Wales, and in Sweden, where two thirds of MONA data system costs are centrally funded.
Simplify approval processes for de-identified data access through standardized guidance on necessary approvals proportionate to identification risk. Approval processes should be streamlined across organizations, including for demonstration of public benefit.
Reduce data release delays through increased capacity and more specialized data providers. Independent, accredited data providers should be created, with expert processing and disseminating capacity, knowledge of how data are used in research, and understanding of how best to prepare and deliver datasets to researchers (emulating the successful Secure Anonymised Information Linkage (SAIL) Databank in Wales). Innovations that have allowed more timely data release during COVID-19, such as the OpenSAFELY collaborative or more frequent releases of GP and hospital data, should continue and be made available to researchers to allow timely research on many topics. Timely data release should not compromise quality, and organizations providing data should adhere to transparent and efficient response times.
Enable more efficient data use through remote systems that comply with data protection requirements. E-infrastructure must be improved to enable rapid data extraction and analysis.
In addition, better data collection should be established for community services and social care, and household-based cohorts, among others. This would have facilitated tracking of transmission patterns during COVID-19, and is equally important for a range of other public health topics.
Underpinning all the above, public trust and understanding are essential if researchers are to continue to use administrative data, and we should harness the surge in recognition of the value of data for decision-making resulting from COVID-19. Public engagement and involvement should be included "by design and default" within systems for data access, via individual research projects and high-profile national engagement campaigns.
COVID-19 has demonstrated the value of timely data sharing, while highlighting flaws in UK data access systems that prevent agile and responsive research. Although these concerns have been communicated to the government previously, it was not until COVID-19 that the potential impact was realized and actions taken. However, the implications are no less critical for other public health topics. The potential risks involved in the use of administrative data will always need to be carefully considered, but COVID-19 has shown that increased capacity and political will can successfully simplify approval processes, reduce delays and enable more efficient data access whilst respecting data protection principles. Building on the substantial interest in health data—and appreciation of its complexities—arising from the pandemic, we urge the government and data providers to learn the lessons of COVID-19, and to work with the research community to build data access systems that are timely, resilient and responsive to changing local, national, and international contexts. Data providers need to fulfill their social license with the public to use administrative data from the public, to benefit the public. COVID-19 shows this can be done—it needs to continue.
This piece was first published as an open letter, which was sent to the UK Information Commissioner, the Chief Medical Officers of the UK, and UK data providers, and was signed by 374 signatories.
This article was published in the BMJ on 6 July.