Justin Keen, Radu Calinescu, Richard Paige, John Rooksby: Big Health Data: Institutional And Technological Challenges

This paper has been published as: Justin Keen, Radu Calinescu, Richard Paige and John Rooksby (2013) Big data + politics = open data: The case of health care data in England. Policy and Internet 5 (2) 228-243.

Justin Keen, University of Leeds
Radu Calinescu, University of York
Richard Paige, University of York
John Rooksby, University of St Andrews


This paper discusses Big Data in the NHS in England, viewing it as a development of current policies and practices for managing large datasets. We argue that there are overlapping, and sometimes conflicting, policy, managerial, academic and commercial interests in exploiting Big Data in health care. Each of these interests has its own aspirations for the integration and use of Big Data, but it is possible to identify three main objectives, each shared by two or more key interests. The first is accountability, an objective shared with the Open Data movement. The second is the identification of opportunities for productivity improvements in the NHS. The third is commercial exploitation, involving the provision of NHS patient information to pharmaceutical and other firms, a development supported by the Prime Minister (Cameron 2011). It has been claimed that the value of Big Health Data could run into the hundreds of billions of dollars (Manyika et al 2011), realised through a combination of re-engineering health services and commercial exploitation.

Much of the discussion of Big Data to date has focused on the opportunities and problems associated with the manipulation of large datasets. In practice, however, data management cannot be separated from consideration of technological infrastructure and governance arrangements (Bowker and Star 1999). This can be seen clearly when an organisation responsible for ‘live’ data has to integrate hitherto separate large datasets. We use the example of the work of the NHS Information Centre for Health and Social Care to illustrate our arguments. The current Health and Social Care Bill proposes that the Information Centre will become a discrete legal entity (a Body Corporate) in 2013, and will be the central collection, processing and distribution point for health and social care data. It will have a duty to publish as much information as possible, subject to legal and regulatory constraints. It will also provide a data linkage service, linking a number of large datasets and making them available to the NHS, information intermediaries and the research community.

The NHS Information Centre has to address a set of data, technology and governance challenges. Taking data first, there are challenges associated with the nature of health care information. NHS datasets have widely recognised limitations. Coverage is uneven, notably in mental health and social care services. Some important datasets are reasonably accurate and complete, but many others are chronically incomplete and, because much of the data is keyed in by hand, prone to inaccuracies. Historically, data has been collected along functional lines, which makes it difficult to use for evaluating the performance of the NHS along key dimensions, such as the effectiveness of the co-ordination of services. Information about the management of clinical risks (such as the risk of an older person falling) is not routinely collected, even though it matters both to patients, in preventing or minimising problems, and to the NHS, in managing demand. Where it is collected, as in the case of a relatively new indicator of patients’ risk of developing dangerous blood clots (thromboses) in hospital, collection can take up excessive staff time. While combining datasets may generate insights in some areas, key questions about the equity and efficiency of services simply cannot be addressed with existing datasets.

On the technology infrastructure, progress has been made in developing data standards over the last two decades, but there is still a lack of standards compliance in some services (notably social care), a lack of effective interoperability between systems, and a lack of useful metadata to support the analysis of important trends in health services. These gaps limit the possibilities for automating the linkage of datasets in meaningful ways. For example, the site provides access to hundreds of health-related datasets, but for these reasons they are of limited value to practitioners or researchers. A further problem is that some legacy datasets lack effective schemas and cannot easily be manipulated. A move to the G-Cloud (Cabinet Office 2011) would require solving the technical challenges of re-architecting enterprise systems for new and very different technology platforms.
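The effect of weak standards and keyed-in errors on automated linkage can be illustrated with a small sketch. It is a hypothetical illustration, not the Information Centre's linkage method: it assumes two extracts keyed on the 10-digit NHS number, and the field names (`nhs_number`, `episode`, `practice`), the sample records and the helper functions `normalise_id`, `nhs_check_digit_ok` and `link` are all invented for the purpose. A difference in formatting alone defeats a naive exact match, and a single mistyped digit, caught here by the NHS number's modulus 11 check digit, leaves a record unlinked.

```python
import re


def normalise_id(raw):
    """Strip spaces and hyphens; return None unless exactly 10 digits remain."""
    digits = re.sub(r"[\s-]", "", raw)
    return digits if re.fullmatch(r"\d{10}", digits) else None


def nhs_check_digit_ok(nhs_number):
    """Modulus 11 check: weights 10..2 on the first nine digits."""
    total = sum(int(d) * w for d, w in zip(nhs_number[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check value of 10 means the number can never be valid.
    return check != 10 and check == int(nhs_number[9])


def link(left, right, key="nhs_number"):
    """Hypothetical deterministic linkage: exact match on a normalised,
    checksum-valid identifier. Returns (linked, unlinked) left records."""
    index = {}
    for rec in right:
        k = normalise_id(rec[key])
        if k and nhs_check_digit_ok(k):
            index[k] = rec
    linked, unlinked = [], []
    for rec in left:
        k = normalise_id(rec[key])
        match = index.get(k) if k and nhs_check_digit_ok(k) else None
        if match is not None:
            # Merge both records and store the identifier in normalised form.
            linked.append({**match, **rec, key: k})
        else:
            unlinked.append(rec)
    return linked, unlinked


# Hypothetical extracts: the same patient recorded in two formats,
# plus one record with a keying error in the last digit.
hospital = [
    {"nhs_number": "943 476 5919", "episode": "A&E"},
    {"nhs_number": "943 476 5918", "episode": "outpatient"},
]
gp = [{"nhs_number": "9434765919", "practice": "P1"}]

# A naive exact comparison of the raw strings finds no match at all.
naive_matches = [h for h in hospital for g in gp if h["nhs_number"] == g["nhs_number"]]
linked, unlinked = link(hospital, gp)
```

Here `naive_matches` is empty, `linked` holds one merged record for the correctly keyed patient, and the mistyped record ends up in `unlinked`. Real linkage services also have to deal with missing identifiers, which typically calls for probabilistic matching on other fields; the sketch leaves that out.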

Turning to information governance, the NHS is subject to general legislation, including the Data Protection Act 1998. Over and above this, the Health Act 2006 and the Health and Social Care Act 2008 define safeguards for accessing personal information without consent (creating an exception to the DPA 1998). There are also many internal regulations, including the Caldicott guidelines, and guidance published by professional bodies. It seems reasonable to say that current arrangements within the NHS are defensive in nature, protecting the interests of the NHS and its staff rather than actively promoting the interests of patients. Nor do they resolve questions of data ownership. Key clinical datasets are currently managed by a number of bodies and are subject to data sharing agreements with a range of parties, so access to those datasets will have to be negotiated before linkage can be undertaken. Looking forward, the current Bill does not include provisions that address Zittrain’s (2008) Privacy 2.0, even though Big Data hastens its arrival. Privacy and confidentiality are recognised as major concerns by the UK Government (O’Hara 2011) and by health care policy makers elsewhere (Kundra 2011), and they are an area of active research in the information security community.

In summary, it is a moot point how far existing datasets, however manipulated, will support the achievement of the objectives relating to accountability, efficiency and commercial exploitation. It could be argued that the current governance arrangements are the central issue, and that key data and technology challenges are a consequence of weaknesses in those arrangements.


