Data is intertwined with our day-to-day lives. From mobile maps and GPS to energy consumption and online shopping, it is increasingly hard to take part in normal society without data. It’s everywhere, and for many of us our working lives are focused on understanding it. However, the world of data, including its sources, uses and governance, changes swiftly. Decisions are made every day that affect our lives, our businesses and our government, often in ways that are obscure and known only to specialists, or so subtle that we hardly notice the changes in our everyday life.
A wider perspective can often spark innovation for those open to seeking it.
The Data Research, Access and Governance Network (DRAGoN) is an interdisciplinary research cluster that aims to bring together researchers and practitioners from academia, think-tanks, industry and government. Effective data use and governance require contributions from many different professions: ethicists, statisticians, computer scientists, psychologists, economists and management scientists. Recognising this, the group aims to create an environment for discourse that brings differing perspectives together for the wider benefit. Its members see the application of theory to practice as essential to the ethos of the group.
Research from DRAGoN aims to create synergy between different research interests and to build a sustainable, effective research network embracing all aspects of the role, value, use and management of data. The goal is to create research that enables improved decision making in public life, taking a holistic view that encompasses the providers of sensitive data, the designers and appliers of privacy-preserving technology, and responsible analysts and end-users. To surmount the limitations of current perspectives, interdisciplinary research is needed into how we can holistically understand:
- who contributes their sensitive data, and who has access to it;
- what is collected, and what is deemed ‘disclosive’ or an ‘acceptable risk’;
- when training and algorithms can reduce disclosure and improve use; and
- why – the ethical and societal context for data collection and use.
The following projects showcase some of the work DRAGoN is currently involved in.
Project 1: QUACK: Qualidata Use And Confidential Knowledge
Led by Elizabeth Green, Lecturer in Economics and DRAGoN member, Qualidata Use And Confidential Knowledge (QUACK) aims to explore methods for preserving confidentiality in qualitative research results.
When research data consists of personal, identifiable information, there is a risk that publishing analyses will inadvertently release confidential information about data subjects.
Quantitative data has well-established practices to manage this risk. As well as statistical theory, there are widely used practical guidelines and teaching materials. Research in Statistical Disclosure Control (SDC) was mostly sponsored by national statistics institutes (NSIs), which provided both the motivation and the market for the research.
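As an illustration of the kind of practical guideline used for quantitative outputs, the sketch below applies a simple threshold rule to a frequency table: any cell based on fewer than a minimum number of data subjects is suppressed before release. The function name, the example table and the threshold of 10 are assumptions made for illustration, not values taken from any particular guideline.

```python
from __future__ import annotations


def apply_threshold_rule(counts: dict[str, int], threshold: int = 10) -> dict[str, int | None]:
    """Return a copy of a frequency table with small cells suppressed.

    Cells with fewer than `threshold` observations are replaced by None.
    The default threshold of 10 is an illustrative assumption.
    """
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


# Hypothetical table: number of survey respondents per region.
raw_counts = {"Region A": 124, "Region B": 3, "Region C": 57}
safe_counts = apply_threshold_rule(raw_counts)

for region, count in safe_counts.items():
    print(region, "suppressed" if count is None else count)
```

This covers primary suppression only. If a total is published alongside the table, a suppressed cell may be recoverable by subtraction, so real checks also need to consider the other cells and related outputs.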
In contrast, there are almost no guidelines for qualitative data. Researchers working with qualitative data must trust their own judgement and experience, often without any training or mentoring. As well as increasing the risk of a confidentiality breach, this is inefficient, as each generation must learn the same lessons for itself.
One reason for this is the sheer range of qualitative data: ethnographic studies, social media analyses, interviews, videos, clinical case studies, court records… Guidelines in one field may be meaningless in another. A subsidiary reason is that there is no equivalent of the NSI network to sponsor qualitative data research.
We think this needs to change.
Project 2: Expanding Research Excellence – Output Statistical Disclosure Control (EREOSDC)
The aim of this proposal is to develop research and practice material for output statistical disclosure control (OSDC) and statistical output checking.
Both the market for output checking and UWE’s capability are growing, and not just because of the proliferation of ‘trusted research environments’ (TREs), which insist upon output checking. When UWE was commissioned to design an output checking course for ONS in 2019, the expectation was that 10-15 people a year, across the UK, would need training; but in 2021 alone the UWE team has already trained 50 people, and anticipates strong demand in 2022 from Europe, Canada and Australia.
While TREs are currently the main customers for output checking, there is a vast untapped market of organisations and individuals producing statistics from their own confidential data – including, for example, research students. Information and training are almost non-existent for this group.
Whilst there is a large literature on checking official tables produced by national statistical institutes, for analytical users the theoretical underpinning is thin and the guidance is largely drawn from a single report in 2010, which drew heavily on ideas drafted by UWE staff. Whilst UWE staff have continued to develop theory, practical guidelines and technological solutions, this has been in an ad hoc way. The aim of this project is to review the whole landscape and develop new theory, new guidelines, and new tools to help practitioners manage outputs more securely and more efficiently.
One area rarely discussed is the public’s view on all this. Is there public concern about the residual risk produced by statistical outputs? The universal answer is “no, it’s too complicated/niche/technical”; yet many TREs highlight output checking as one of their ‘five safes’ of security. UWE’s Science Communication Unit and UWE Philosophy will help DRAGoN to explore, for what we think is the first time anywhere, what the public view actually is. This will lay the foundation for a wider programme of work on public attitudes to data ethics, perceptions of data governance, (pseudo-)anonymisation, data storage and sharing, mitigation of risks, and data and cyber security.
Specifically, the project has five workstreams:
- Statistical research – to develop the underlying knowledge base (lead: Felix Ritchie)
- Webinars/website – to produce material and publicise the project (lead: Lizzie Green)
- Manual – to develop the core resource (lead: Felix Ritchie)
- Tech solutions – to expand upon two existing potential technological solutions developed at UWE (lead: Jim Smith)
- Public engagement – to collect evidence on public attitudes (lead: Paul White)
The slides from the kick-off workshop in November 2021 are now available.
Project 3: Guidelines and Resources for Artificial Intelligence Model Access from Trusted Research Environments (GRAIMatter)
The GRAIMatter project will investigate the risks associated with machine learning (ML) modelling in TREs, and aims to develop metrics, good-practice guidelines and reference texts for managing the risks. It is led by the University of Dundee, with Jim Smith (UWE lead), Felix Ritchie and a team of analysts from UWE.
Machine learning (ML) models are growing in popularity, particularly in health, where they can play an important role supporting clinical and operational practice. These models are trained to, for example, recognise early-stage carcinomas or predict demand for a service to improve resource scheduling. ML models are ‘trained’ by providing them with a large number of examples, and by allowing the model to identify for itself the relationships that matter. This training data can comprise sensitive information – such as electronic health records or confidential business data – and can cover millions of individual records.
Trusted research environments (TREs) can play a key role in developing machine learning (ML) models where the training data is confidential. The TRE provides a secure space to develop algorithms, with only the model being released. Although no data is deliberately released from TREs, there is a possibility that the models could inadvertently reveal confidential information. TREs have well-established processes to manage disclosure control in traditional statistical outputs, but ML models present several new challenges, including uncertainty over what even counts as disclosure. Some data sources more commonly used in ML, such as images, offer incentives for malicious attacks which are not relevant in regular statistical modelling.
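To make the risk concrete, the sketch below shows one simple way a released model can carry confidential records with it. A nearest-neighbour classifier 'learns' by memorising its training examples, so anyone who receives the model object can read the records straight back out. The class, the field names and the three patient-style records are hypothetical, and this is only one illustrative route to disclosure, not a description of the GRAIMatter project's methods.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NearestNeighbourModel:
    """A 1-nearest-neighbour classifier: training simply stores the examples."""

    points: list[tuple[float, float]]
    labels: list[str]

    def predict(self, x: tuple[float, float]) -> str:
        # Squared Euclidean distance from x to every stored training point.
        distances = [(px - x[0]) ** 2 + (py - x[1]) ** 2 for px, py in self.points]
        return self.labels[distances.index(min(distances))]


def train(records: list[tuple[tuple[float, float], str]]) -> NearestNeighbourModel:
    """'Train' the model inside the secure environment."""
    return NearestNeighbourModel(
        points=[features for features, _ in records],
        labels=[label for _, label in records],
    )


# Hypothetical confidential records: ((age, biomarker level), diagnosis).
training_records = [
    ((34.0, 1.2), "negative"),
    ((71.0, 4.8), "positive"),
    ((52.0, 2.9), "negative"),
]

model = train(training_records)      # only this object leaves the TRE
prediction = model.predict((70.0, 4.5))

# Outside the TRE, the training data can be rebuilt from the model itself.
recovered = list(zip(model.points, model.labels))
print(recovered == training_records)
```

Other model types do not store records so directly, which is part of why it is uncertain what counts as disclosure for a trained model.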
Project 4: Wage and Employment Dynamics
Led by Professor Felix Ritchie and Damian Whittard, this project links data from various official surveys and administrative datasets, with the objective of providing new insights into the dynamics of earnings and employment in the UK. The project aims to create the basis for a sustainable, documented ‘wage and employment spine’ with the potential to fundamentally transform UK research and policy analysis across a vast range of topics. Alongside the creation of data infrastructure, the project will also generate research findings of direct interest to policy makers. Public benefit will be maximised through the provision of high-quality metadata and training for users.
To find out more about the work undertaken by the DRAGoN team, follow the links below:
This research cluster is funded through the Expanding Research Excellence scheme at UWE Bristol. The scheme aims to support and develop interdisciplinary, challenge-led research across the University. It is designed to bring together research clusters or networks that will work together to respond to challenges (local, regional, national, global) aligned with major research themes.