Past DRAGoN Webinars: The importance of solidarity in data sharing

Posted on

DRAGoN runs a very popular webinar series that brings academics together with practitioners and members of the public, facilitating conversations that flow from theory to practice. The next run of these webinars is due to start at the end of February 2023.

In the first of three blogs looking back on the Data Ethics and Governance webinar series so far, Francesco Tava reflects on past seminars and the impact they have had on his work. In this post Francesco focuses on the webinar closest to one of his main research areas: data ethics.

Francesco starts by explaining: “The idea of this series is to overcome the typical academic boundaries by involving professionals working in various sectors of data governance in a discussion around ethical concepts and problems arising from the use, access and sharing of data. From the webinars we have held so far, one that aligns with my current research interests was the discussion on solidarity-based data governance with Barbara Prainsack (University of Vienna).”

Francesco goes on to say that often the first and only consideration underpinning data governance is how to defend privacy. However, there are other principles, equally important when it comes to data access and sharing, that tend to fall by the wayside; one of these is solidarity. How can we rebalance this approach to include principles such as solidarity when assessing the practicalities and risks involved in data management?

“Data is not just something completely impersonal, objective material… it somehow describes who we are”

Barbara Prainsack’s talk and the discussion that followed were very valuable. Not only did this discussion investigate a series of issues stemming from data management, but it also envisaged how a solidarity-based approach could provide possible solutions to these problems.

You can listen to the whole conversation for a deeper dive into Francesco’s reflections on this webinar here.

You can listen to past DRAGoN talks here.

New linked dataset available to provide insights into earnings and employment in Britain

Posted on

The ADR UK-funded Wage and Employment Dynamics (WED) initiative, which aims to provide new insights into the dynamics of earnings and employment in Great Britain, has made a new dataset available. Accredited researchers can apply to use the de-identified Annual Survey of Hours and Earnings (ASHE) – 2011 Census linked dataset from the Office for National Statistics (ONS) Secure Research Service.

Using the ASHE – 2011 Census linked dataset, researchers can explore how factors such as gender, ethnicity, disability and migration affect individuals’ wage levels and pay progression. Access to this data will provide fresh insights into areas such as the experiences of young people entering the labour market, job mobility, and career progression to retirement. This will enable policymakers to make better-informed decisions to improve the experiences of people in employment.

About the data

The ASHE dataset is an annual survey based on a 1% sample of employee jobs, drawn from the Pay As You Earn (PAYE) register, and conducted by the ONS. This de-identified dataset contains information on employees’ earnings, paid hours and occupation, as well as a limited number of personal characteristics: gender, age and residential location.

To expand the number of de-identified, personal characteristics that can be observed for employees, the ASHE dataset has been linked to the 2011 Census. The 2011 Census dataset includes characteristics such as educational and vocational qualifications, health, disability, and household circumstances for individuals in England and Wales.

The 2011 Census has been matched to all employees observed in ASHE in 2010, 2011 or 2012. The ASHE – 2011 Census linked dataset therefore covers the period 1997–2020, but only includes individuals who were in the 2011 Census and could be matched to individuals in ASHE 2010, 2011 or 2012. It also includes some enhancements to the basic ASHE data, such as minimum wage rates and survey dates.
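The matching rule described above can be sketched in a few lines. This is purely illustrative: the identifiers, field names and data below are invented, and the real linkage carried out by the ONS is far more involved.

```python
# Hypothetical sketch of the linkage rule: keep a census record only if
# its person identifier appears in ASHE in 2010, 2011 or 2012.
# All identifiers and field names are invented for illustration.

ashe_ids_by_year = {
    2010: {"A01", "A02"},
    2011: {"A02", "A03"},
    2012: {"A04"},
}

census = [
    {"id": "A02", "qualification": "degree"},  # appears in ASHE -> linked
    {"id": "A99", "qualification": "none"},    # never in ASHE -> dropped
]

# Anyone observed in ASHE in at least one of the three years is linkable.
linkable = set().union(*ashe_ids_by_year.values())
linked = [person for person in census if person["id"] in linkable]
print([p["id"] for p in linked])  # ['A02']
```

The point of the sketch is simply that the linked dataset is conditioned on being matchable around 2011, which is why the cohort design described above matters for interpreting results.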

The linked ASHE – 2011 Census dataset allows researchers to examine pay and career progression of a cohort of individuals who were in employment around the time of the 2011 Census. This data is linked and made available under the provisions of the 2017 Digital Economy Act, which provides a legal gateway for researchers to access government data in a secure way. For more information about the linked dataset, see the ASHE – 2011 Census user guide.

Huge potential for this dataset

This dataset will help to fill a crucial evidence gap in information about the experiences of individuals in the labour market. For example, the ASHE – 2011 Census dataset will enable researchers to understand the underlying causes of intersectional pay gaps and provide a deeper understanding of how demographic factors have informed labour market transitions.

The dataset could help answer a range of research questions to inform policy, such as:

  • How do personal characteristics such as household structure affect wage progression?
  • How does wage progression differ depending on characteristics such as gender, disability, or ethnicity?  
  • What role do employers play in wage inequality?  
  • What are the returns on investment from education?
  • What is the relationship between migration and the labour market?

Tim Butcher, Chief Economist at the Low Pay Commission, said:

The WED project addresses weaknesses in our evidence base – improving the quality of longitudinal earnings data and extending coverage to a broader range of characteristics – that should enable researchers to give new and innovative insights into the wage and employment dynamics of the lowest paid.

How to access the dataset

The ASHE – 2011 Census dataset is available from the ONS Secure Research Service. Access to the dataset is managed securely in line with the Five Safes framework. Accredited researchers can apply to access the data by submitting a project application. For more guidance on submitting project applications, visit the ONS Secure Research Service.

The ASHE – 2011 Census linked dataset was produced as a collaboration between ADR UK, the ONS and the Wage and Employment Dynamics project team. The project team is led by Professor Felix Ritchie and Damian Whittard of the University of the West of England, and involves partners from UCL, Bayes Business School, the University of Reading and National Institute of Economic and Social Research.

Teaching researchers about data protection law: a terrible idea

Posted on

Written by Elizabeth Green (UWE Bristol), Felix Ritchie (UWE Bristol) and Amy Tilbrook (University of Edinburgh)

Many researchers in the UK working with confidential data attend the ‘Safe Researcher Training’ course (SRT). This training course was commissioned by the Office for National Statistics (ONS) in 2017 and is (so far) the only accredited training programme for researchers under the Digital Economy Act 2017. Attendance is compulsory for all those using the ‘safe havens’ or ‘trusted research environments’ run by the ONS, the UK Data Archive, Health and Social Care NI, the Northern Ireland Statistics and Research Agency, HM Revenue and Customs, various big data research centres, and for those using certain public sector datasets in Scotland.

The SRT was designed using novel principles of confidentiality training, recognising that researchers are human: intrinsically motivated (i.e. threats of fines and punishment don’t work); self-interested but also well-intentioned; mostly quite bright but occasionally very foolish; and, crucially, able to engage in nuanced discussions about the safe and effective use of confidential data. Over three thousand researchers (from academia, government and the private sector) have been through the training and taken the test since 2017, and the SRT has become a reference point for basic training in research good practice, with materials being adapted for use in Europe, North America and Australasia. The training is delivered by several UK organisations; each has taken a slightly different approach to delivery of the core material but the learning outcomes (as measured by test results) are broadly similar, suggesting the core training material is robust to the presentation style of different trainers.

One omission from the SRT is a detailed discussion of data protection laws: which laws govern access, what the researcher’s formal rights and responsibilities are, and what penalties can be incurred. This is in direct contrast to many data management and governance training courses; indeed, the SRT’s predecessor included a before-and-after ‘quiz’ on the parts of the Data Protection Act 1998 relevant to research. This omission of a detailed exposition of the applicable law(s) causes concern amongst organisations who require users of their data to take the SRT and pass the test.

The rationale for this position is: how can you expect researchers to obey the law if they don’t know what the law is? Researchers should know:

  • which laws are relevant, including common law duties of confidentiality
  • the lawful basis of access
  • the specific limitations of each law in relation to their data
  • the consequences of breaches (for which read: fines and jail)

If these legal specifics are outlined, researchers should have no doubt as to their responsibilities. This also transfers responsibility from the data holder: those who breach laws (intentionally or, more likely, accidentally) cannot attribute the breach to a lack of knowledge of those laws.

This approach arises naturally from the ‘defensive’ approach to data governance typically taken by data holders, aiming to ensure that all potential risks are covered before release is considered. Intellectually, the foundation of this idea is economic models of rational decision-making: data has value and so may be misused, but only if the benefits of misuse exceed the costs. By providing data users with a clear legal basis, evidence of monitoring and control, and knowledge of the severe penalties for transgressing the limits, researchers are kept on the right side of the line. Moreover, if they do transgress, the data holder has a solid foundation for civil or criminal prosecution, which in itself should increase compliance.

The trouble with this approach is that it lacks any substantive evidence to support it. In contrast, there is a great deal of well-founded evidence, particularly from psychologists, suggesting the opposite: for well-intentioned, intrinsically motivated professionals, threats of punishment are a poor motivator.

Alongside this fundamental misunderstanding of human nature, simplistic training assumptions about legal liability are also likely to be misaligned with real-life case law, and may miss nuances of legal rules and pathways, resulting in further confusion. Most importantly, focusing on legal liability ignores the fact that genuine breaches of confidentiality in research are vanishingly rare and very difficult to prove. In contrast, breaches of operating procedures (but not of the law) are not unusual, and are generally easy to prove.

Some data controllers see this communication as part of being a responsible data controller – even if the attendees don’t register the detail, they remember the message about legal conditions being important and costly to break. So what is the harm in including this in the course?

The main argument against this is that it is counter-productive in the context of the SRT. The SRT is designed to build trust and community engagement. An assumption of lawlessness and the highlighting of inappropriate behaviour disrupts that message, by implying “I don’t trust you, so here’s what will happen if you put a finger wrong”. This weakens the community message.

In contrast, the SRT is designed so that researchers know what constitutes safe behaviour with data in most research cases. Researchers are shown how operating procedures serve to protect them from accidentally breaking the law, and how to actively engage with those procedures. Researchers are encouraged to discuss the compromises involved in designing data access systems, and so develop a sense of shared responsibility. The SRT is not a textbook for how to complete a successful data application or project with each data controller or research centre, but it is a way to approach such tasks so that both parties are satisfied that all risks have been covered. The SRT approach to legal aspects is therefore grounded in three questions:

  • What do researchers need to know to behave safely?
  • How and what do they learn?
  • How do we build a community identity so that when things do go wrong we co-operate to resolve them?

Focusing on the above three questions allows researchers to actively reflect on their own actions and conceptualise their responsibility to the project and the data. Moreover, by not examining and outlining specific laws, the material retains relevance even if the law changes, or if the material is used in different countries or for international projects (as it has been). There are evidence-based answers to these three questions, and in our next blog we explore them further.

Finally, for those who still believe that threats are helpful, it is worth noting that criminal sanctions are not seen as credible. The lack of successful prosecutions, the researcher’s own self-belief that they are not law-breakers, and the obvious disincentive for data holders to publicise a data breach means that criminal sanctions become a straw man, and the teacher’s authority is damaged. In contrast, the SRT focuses on ‘soft’ negatives (reputation, kudos, access to funding, access to data, employability), and emphasises the difference between honest mistakes and selfish behaviour. As well as being more meaningful to researchers, these also align to the ‘community spirit’ being developed. The consistency of the message on this topic is as important as the contents.

DRAGoN Seminar: Sphere transgressions: Risks and Benefits of the Digitisation of Health

Posted on

The digitalization of health and medicine has engendered a proliferation of new collaborations between public health institutions and data corporations, such as Google, Apple, Microsoft and Amazon. Critical perspectives on this “Googlization of health” tend to frame the risks involved in this phenomenon in one of two ways: either as predominantly privacy and data protection risks, or as predominantly commodification of data risks. In this short talk, Prof. Tamar Sharon (Radboud University) discussed the limitations of each of these framings and advanced a novel conceptual framework for studying the Googlization of health beyond (just) privacy and (just) market transgressions. The framework draws on Michael Walzer’s theory of justice and Boltanski and Thévenot’s orders of worth to advance a “normative pragmatics of justice” that is better equipped to identify and address the challenges of the Googlization of health and possibly of the digitalization of society more generally. 

Angela Daly then addressed socio-legal issues pertaining to healthcare data, drawing on political economy perspectives to consider the risks and benefits of healthcare data use, taking as an example her current collaborative research as part of the UKRI DARE GRAIMatter project on AI model export from Safe Havens and Trusted Research Environments (TREs).

You can watch the full recording here.

Speaker bios:

Tamar Sharon is Professor of Philosophy, Digitalization and Society, Chair of the Department of Ethics and Political Philosophy and Co-Director of the Interdisciplinary Hub for Digitalization and Society (iHub) at Radboud University, Nijmegen. Her research explores how the increasing digitalization of society destabilizes public values and norms, and how best to protect them. She studied History and Political Theory at Paris VII and Tel Aviv Universities and obtained her PhD on the ethics of human enhancement from Bar Ilan University. Her research has been funded by the ERC and the Dutch Research Council. Tamar is a member of the European Commission’s European Group on Ethics in Science and New Technologies.

Angela Daly is Professor of Law & Technology at the University of Dundee, with a joint position between the Leverhulme Research Centre for Forensic Science (LRCFS) and Dundee Law School. She is a socio-legal scholar of the regulation and governance of new (digital) technologies. She is the author of Socio-Legal Aspects of the 3D Printing Revolution (Palgrave 2016) and Private Power, Online Information Flows and EU Law: Mind the Gap (Hart), and the co-editor of the open access collection Good Data (INC 2019). She is currently chairing an independent expert group for the Scottish Government on Unlocking the Value of Public Sector Personal Data for Public Benefit, which aims to produce recommendations and guidance for public sector actors in Scotland on allowing access to their datasets.

UWE staff appointed to help ESRC plan its data infrastructure strategy

Posted on

The Economic and Social Research Council (ESRC), the body that allocates and oversees social science research funding across the UK higher education sector, will face some significant decisions in data infrastructure and services over the next few years: several of its major investments are due for re-tendering, while others are already in the process of restructuring. At the same time, UK Research and Innovation (UKRI) is reviewing the wider investment landscape.

As a result, the ESRC has begun a major exercise to review the research data infrastructure and services landscape. This project began in August 2021, with a public engagement exercise to identify key issues. This year, ESRC advertised two Future Data Services (FDS) ‘Strategic Fellowships’, and we are pleased to announce that two UWE staff, Elizabeth Green and Felix Ritchie from the Data Research Access and Governance Network (DRAGoN), were successful in bidding for the roles.

This is a great opportunity for UWE: DRAGoN staff are widely involved with all aspects of data access and governance, in the UK and abroad, but this will provide Felix and Lizzie with a unique insight into the strategic decision-making process for UK research investments; and they in turn will be using their expertise and networks to help ESRC design and evaluate a data services infrastructure for the social sciences that will take on board best practices, and challenge ways of thinking.

Professor Ritchie notes that “The UK starts from a strong position, with a long track record of successful investment in data services, and thought leaders across the data landscape. But that landscape continually changes, and although we do many things well in the UK, there are also many examples from other countries of doing things better.”

“Some of the gaps are about co-ordination and communication: for example, how can we better share good practice in data governance or researcher training? Others are about adapting the experience of others to the UK: for example, what can we learn from other countries about creating a default-open model of data accessibility and sharing? And some gaps are where we have to fundamentally (re)think basic concepts: how do we put a value on effective data services when we can’t even put a meaningful value on the data itself?

“These aren’t straightforward problems, or we wouldn’t need a two-year strategy development period. But they are – or will have to be – solvable, and the benefit of getting it right will be felt across the UK research community, as well as in other countries.”

The ESRC commented: “ESRC is delighted to make this award. With ongoing transformations in the data services landscape, this is an exciting time to be undertaking our Future Data Services strategic review. We look forward to working with Felix Ritchie and Elizabeth Green, who will provide a very valuable contribution to this review.”

What are the output disclosure control issues associated with qualitative data?

Posted on

Green paper

Elizabeth Green1, Felix Ritchie1, Libby Bishop2, Deborah Wiltshire2, Simon Parker3, Allyson Flaster4 and Maggie Levenstein4

1 University of the West of England; 2 GESIS; 3 DKFZ German Cancer Research Center; 4 University of Michigan


When carrying out research with confidential quantitative data, there is much support for researchers. There is ample advice on best practice for collecting data (remove identifiers as soon as possible, only collect statistically useful information), a vast literature on how to reduce the risk in microdata (swapping, top coding, local suppression, rounding, perturbation, …), and a small but effective literature on how to protect statistical outputs from residual disclosure risk (e.g. a combination of tables revealing that a one-legged miner in Bristol earns £50k a year).
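Two of the microdata protections listed above, top coding and rounding, can be illustrated in a few lines. This is a toy sketch with invented numbers, not the procedure of any particular statistical institute:

```python
# Toy illustration of two statistical disclosure control techniques:
# top coding (capping extreme values) and rounding (coarsening values).

def top_code(values, cap):
    """Replace any value above `cap` with the cap itself."""
    return [min(v, cap) for v in values]

def round_to(values, base):
    """Round each value to the nearest multiple of `base`."""
    return [base * round(v / base) for v in values]

salaries = [21_400, 34_200, 48_900, 250_000]  # invented annual salaries
protected = round_to(top_code(salaries, cap=100_000), base=1_000)
print(protected)  # [21000, 34000, 49000, 100000]
```

The single extreme earner, who would otherwise stand out in any published table, is capped at the top-code threshold, and all values are coarsened so that exact salaries cannot be recovered.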

For qualitative data, there is much less guidance. At the collecting/storing stage there is clear good practice (such as removing direct identifiers), although it may be hard to separate out analytically vital information from contextual information. In particular, the trade-off between anonymization and fitness for use may be much sharper for qualitative than for quantitative data. Improvements in natural language processing (NLP) have enabled the development of anonymization tools for qualitative data in the UK and in Germany (e.g. QualiAnon). However, when producing analyses there appears to be little or no general guidance on output disclosure control (ODC), and researchers must rely on informal advice and rules of thumb for good practice. This challenge is exacerbated by the wide variety of genres of qualitative data, which makes guidance difficult to generalise.
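To make the idea of rule-based anonymization concrete, here is a deliberately simple redaction sketch. Real tools such as QualiAnon are far more sophisticated; the patterns and placeholder tags below are invented for illustration, and a rule-based approach like this would miss many direct identifiers and all contextual ones:

```python
import re

# Toy rule-based redaction of direct identifiers in a transcript.
# The patterns and [TAG] placeholders are illustrative only.
PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.\w+",
    "PHONE": r"\b0\d{3}[ -]?\d{3}[ -]?\d{4}\b",
}

def redact(text):
    """Replace each matched identifier with its placeholder tag."""
    for tag, pattern in PATTERNS.items():
        text = re.sub(pattern, f"[{tag}]", text)
    return text

print(redact("Contact me at jane.doe@example.org or 0117 123 4567."))
# Contact me at [EMAIL] or [PHONE].
```

The gap this post identifies is precisely what such tools cannot do: they remove direct identifiers, but cannot judge whether the remaining context still identifies someone.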

Why is there such a lack of practical guidance for output checking of qualitative data when there is a well-established set of guidelines for quantitative data? From one perspective, the lack of guidelines is not surprising. Guidelines for quantitative data were almost exclusively developed to meet the needs of national statistics institutes (NSIs), and thence filtered down to trusted research environments (TREs, secure research facilities usually specialising in quantitative data). Outside of NSIs and TREs, knowledge of output disclosure control is very limited, not even making it onto the syllabi of research methods courses. In this context, perhaps it is not surprising that there are no guidelines for qualitative data: guidelines for quantitative research only appeared because of economies of scale and scope, and have remained largely in the environment in which they were developed.

The need for qualitative data ODC guidelines has four drivers. First, there is a greater awareness of the need to maintain confidentiality, driven by legislation and regulation; every journal has to trust that researchers have anonymized enough (but not too much), yet no metrics exist for assessing this. Second, the lack of consistent guidelines means each generation of researchers must develop their own rules, which is inefficient and increases the likelihood of error. Third, the increased use of NLP tools has increased the number and types of researchers who are working with qualitative data. Finally, the development of TREs offers great opportunities for very detailed, unredacted qualitative data to be shared easily whilst maintaining security; but this must be supported by clear disclosure guidance for outputs. Whilst most TREs have a policy of checking outputs for residual risk, it is not clear whether the skills, resources and processes exist to do this for qualitative data.


UKDA provides some guidelines and also a tool for anonymising qualitative outputs; however, the approach focuses on the removal of direct identifiers and does not address nuanced or contextual identifiers. The CESSDA Data Management Expert Guide provides a worked example of transcript anonymisation.

Kaiser (2009) outlines deductive disclosure, in which the contextual nuance of a situation allows an individual to be identified. Kaiser suggests that, to address this issue, researchers should discuss the use of the research with participants, including describing in the consent to participate how the data will be made available to researchers, the types of research that are permissible, and the protections that will be in place for both the original research team and secondary analysts. However, this may not be possible with some types of data, and it also assumes that the discussion leads to genuinely informed consent rather than meeting a procedural tickbox.

The problem

ODC of quantitative data is conceptually straightforward. While quantitative data may be very highly structured (e.g. multilevel, multi-period data on GPs, patients and hospitals) or highly unstructured (e.g. quantitative textual analysis), all quantitative data can be seen, ultimately, as tables of numbers used to produce summary data. The same ODC rules can be applied in all cases.
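As a concrete example, one widely used quantitative output rule is a minimum cell count: any frequency-table cell based on too few observations is suppressed before release. The function name and the threshold below are illustrative, not any institute's actual rule:

```python
# Toy sketch of a threshold rule for output disclosure control:
# table cells based on fewer than `n_min` observations are suppressed.

def suppress_small_cells(table, n_min=10):
    """Return a copy of `table` with small counts replaced by None."""
    return {cell: (count if count >= n_min else None)
            for cell, count in table.items()}

# Invented frequency table: (area, occupation) -> number of people.
counts = {
    ("Bristol", "miner"): 1,      # the one-legged-miner problem: suppress
    ("Bristol", "teacher"): 540,  # safe to publish
}
print(suppress_small_cells(counts))
```

The point of the contrast is that no equivalently mechanical rule exists for, say, an interview quotation: there is no count to threshold, which is why qualitative ODC resists this kind of universal treatment.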

In contrast, qualitative data are varied in both content and structure; examples could be

  • Interview recordings/transcripts
  • Written responses in surveys  
  • Psychiatric case studies
  • Videos and images
  • Ethnographic studies
  • Court records
  • Social media text

In each of these cases, protecting confidentiality may require different solutions. In psychiatric case studies, information may remain identifiable when published, but informed consent is used to agree to the higher level of re-identification risk. In interview responses, redaction may be a very effective response; in videos, pixelation. In court records and social media, the semi-public nature of the source data may cause difficulties, particularly around de-identification. Future technology is likely to throw up more options, such as digital behaviour data, or currently unimaginable data types.

Approaches to solutions

Given the range of qualitative data types, it seems unlikely that universal rules could be developed. However, there may be ways to develop more general solutions or frameworks:

  • Method-specific solutions
  • Data type-specific solutions

There may also be solutions which combine input controls (e.g. consent) with output controls (e.g. redaction). This may allow us to sidestep the question of what is permissible, but it does not address what is ethical to disclose.

When considering types of qualitative data output, it may be useful to consider where the value is generated. For example, in a video recording, nuances of the subject’s body language may be more important than the words; if so, redaction of text is possible, but pixelation is not. Understanding the research value may direct outputs of the same type towards different solutions. It is thus important to distinguish between disclosure of “raw” data that researchers interact with and disclosure of outputs that are available to the general public in an unregulated environment. It is also important to focus on fitness for use, as certain kinds of disclosure control may degrade data quality in ways that do not affect its use for certain types of analysis, while making the data essentially worthless for other types of analysis.

Guidelines should address the different types of data, the accessibility of the data, and the intended use of the data in order to develop broader organising principles.

There may also be a need to create some definitions to provide a common language for discussing risks and solutions.

Next steps

Given the lack of consensus as to how to ensure safe outputs from qualitative analysis, the most productive first step may be a webinar with a credible global audience of interested parties to explore some of the issues raised here. Ideally, the audience would include both researchers working with a variety of qualitative data types and some data protection and confidentiality specialists.

The aim of the webinar would be to develop a programme of work. The initial step in this programme could be the formation of study groups, each focused on a particular type of data; these groups could then report back during a second workshop. In addition, the workshop could consider how to share guidelines amongst the research community, e.g. by embedding good practice into Research Methods courses or Data Management Plans, with the intent of avoiding the quantitative route of concentrating disclosure control training in limited environments. It may also be helpful to explore funding opportunities if this looks to be a significant programme of work.

The workshop will take place on Friday 10th December, 15:00–17:30, via Microsoft Teams.

To register for the event, please click here

For any questions or queries, please contact Lizzie Green.

Legal bases and Using Secure Data for Research

Posted on

Researchers requesting microdata (individual records) from data centres or data access panels are usually required to describe the legal basis for their use of the data. This is because data controllers and processors need to have a legal basis documented for each use of data under UK GDPR. 

However, researchers are usually unaware of legal bases. The form fields may be left blank or, more often, filled with vague answers that may not fulfil what the access panel is requesting. This creates an inefficient process in which support teams, panel managers and researchers are engaged in a back-and-forth to get the required information in the right box.

Underlying the current application form dance is the hot potato of responsibility. Someone needs to decide whether, and under which legal basis, the data can be legally processed. For those who have to put their name to a decision, one question always hovers in the background: “If something goes wrong, will I/we be blamed?” This encourages shifting responsibility for providing evidence onto the applicants, as the requestors of the data: you want to do this new thing with the data, so you have to show it’s safe and legal. But few researchers, data access panels or data centre staff are legal experts, and so the responsibility starts its journey. All players want the same thing – confirmed safe and efficient use of data – but cannot always agree on the best way of getting there.

A popular solution is that researchers are requested to go and speak to their institution’s Data Protection Officer (DPO) or legal team to decipher which legal basis fits for their use of the data. But this shifts the problem; it doesn’t solve it. Institutional guardians face the same concerns about taking responsibility. Often stock answers are copied and pasted into forms based on previous experience of what has “passed”.

If an academic researcher is requesting data for academic or government-sponsored research, is it worth sending them to DPOs or expensive lawyers to get the same answer as the ten researchers before them, for something the panels are likely to know the answer to? Do researchers now need to be experts in GDPR and data sharing, as well as project managers, grant writers, statistical experts, public speakers and all of the other currently required skills?

Most importantly, does this encourage the data sharing community to work together to use data safely? Or is it an example of misunderstanding and division?

From the data controller/support team/access panel point of view, an obvious solution seems to be training researchers in what legal bases are and how to find out what applies. This is the “tell them what they need to do” approach. Guidance documents can be written; if the forms are not completed appropriately, this is down to the applicants not reading or using the guidance.

The trouble is that applicants and the assessors of applications don’t necessarily share the same language, interests or understanding. To the assessor, ‘Show how this project supports organisation X’s public function’ has a clear context, purpose and meaning, and directly provides a legal basis for access. To the applicant, the question is gibberish unless she happens to be familiar with the legislation; even then, it is not clear how to answer it.

Is there another better solution?

Pedagogical evidence shows that researchers/applicants can understand and apply complex data protection issues if couched in language and examples that have meaning for them. Instead of telling people what they need to know, decide what you need to get out of them, what they can reasonably be expected to give you that fills that need, and make it interesting and easy for them to give you that information – as Mary Poppins would say “snap, the job’s a game!”.

This encourages a more cooperative frame of mind, a more compliant researcher, a sharing rather than shedding of responsibility. It reflects a broader movement towards the ‘community’ model of data access, where emphasis is placed on shared understanding and joint responsibility rather than separation of duties/risks.

This is not straightforward. Is there a way to ask researchers to describe what they’re going to do with the data, to allow data access panels to be comfortable enough to categorise a legal basis? Could it be a joint conversation? Could a checklist be used in the first instance to help researchers understand what answers MIGHT be acceptable? Could the data centre community create and publish a consensus on what is appropriate, acceptable and will be used as standard – allowing for the inevitable exceptions that cutting edge research brings?
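To make the checklist idea concrete, here is a minimal sketch of how plain-language questions might map onto candidate legal bases for a panel to confirm. Everything here is hypothetical: the questions, the suggested bases and the function name are our own illustrations, not any data centre’s actual wording, and certainly not legal advice.

```python
# Hypothetical sketch: a plain-language checklist mapping a researcher's
# yes/no answers to candidate legal bases that an access panel could then
# confirm. Questions and mappings are illustrative only, not legal advice.

CHECKLIST = [
    ("Is the project academic or government-sponsored research?",
     "public task / research purposes"),
    ("Has this kind of use been approved for earlier applicants?",
     "precedent: reuse prior panel decision"),
    ("Will any output identify an individual?",
     "needs further review before any basis applies"),
]

def suggest_bases(answers):
    """Return the candidate basis for every question answered 'yes'."""
    return [basis for (question, basis), yes in zip(CHECKLIST, answers) if yes]

# A researcher ticks the first two boxes but not the third:
print(suggest_bases([True, True, False]))
```

The point is not the code but the division of labour: the researcher answers questions phrased in her own terms, and the mapping to legal language stays with the people who understand it.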

The gains of a cooperative approach are procedural and personal: knowing what information can reasonably be supplied, and designing processes around that, rather than designing processes for an unachievable standard of input.

Pulling things away from the researcher may seem to place a higher burden on the assessment panel: moving from “tell me why what you are doing is lawful and ethical” to “tell me what you are doing, and I’ll decide if it is lawful and ethical”. But the burden comes in two parts, procedure and accountability, and the accountability burden never went away. The potato always stopped with the ones making the decision; shifting responsibility onto applicants to give good information doesn’t change this.

This is one small area of the application process, but across the board there are substantial gains to be made, both in the efficiency of operations, and in the confidence that both applicants and assessment panels can have in the correctness of decisions. The potato of responsibility can be made digestible.

This blog post was written by Professor Felix Ritchie who leads the Data Research, Access and Governance Network (DRAGoN) at UWE Bristol and Amy Tilbrook from the University of Edinburgh.

Welcome to the Data Research, Access & Governance Network (DRAGoN) blog

Posted on

Welcome to the Data Research, Access and Governance Network (DRAGoN) blog where we will share the latest updates and projects we’re involved with.

Led by Professor Felix Ritchie, the Management Team also includes Dr Kyle Alves (Business & Management), Elizabeth Green (Economics), Dr Francesco Tava (Philosophy) and Damian Whittard (Economics). Formed in Autumn 2020, DRAGoN recognised that effective data use and governance requires contributions from many different professions: ethicists, statisticians, computer scientists, psychologists, economists, management scientists. Our aim is to create an environment for discourse which can bring differing perspectives together for the wider benefit.

The modern world is increasingly dependent on data. It is central to our lives, directly in our own experience and indirectly through the way organisations use data. Much of that data is personally confidential, either at the point of collection or when combined with other data. Often the confidentiality of data is unclear: are street observations by citizen scientists confidential? Photos of one’s family on social media? Facial recognition? Automatic number plate recognition? Data used to train machine learning systems? Is ‘sensitive’ or ‘personal’ the same as ‘confidential’? The confidentiality of data has a substantial effect on the way it is managed, perceived and exploited. This spills over into the management and use of open data, or of data which is confidential for other reasons, such as commercial confidentiality: ethics, public perceptions and data security can be just as important.

Data access, management and governance is a highly applied topic: decisions are made every day which affect our lives, our businesses and our governments, often in ways which are obscure or known only to specialists in the area. We see the application of theory to practice as essential to the ethos of the group.

But we also need to reflect on practice: decisions about data use are often highly political, based on psychological or institutional factors. Working with practitioners helps inform our research with operational insights, as well as allowing us to challenge accepted viewpoints. 

We look forward to sharing developments from this research cluster, but in the meantime you can find out more through our bi-weekly seminars by signing up to our mailing list below and following us on Twitter.

Read more on our website

Follow us on Twitter

Sign up to our email list

This research cluster is funded through the Expanding Research Excellence scheme at UWE Bristol. The scheme aims to support and develop interdisciplinary, challenge-led research across the University. It is designed to bring together research clusters or networks that will work together to respond to challenges (local, regional, national, global) aligned with major research themes.

Event Summary: Rules vs Principles-based Regulations, what can we learn from different professions?

Posted on

Bristol Centre for Economics and Finance hosted an online event on 28th May 2020: Rules vs. Principles-based Regulation: What can we learn from different professions? Below is a summary and recording of each session.

Session 1: Data regulation 

Lizzie Green at UWE introduced the principles-based approach to data governance. She noted that protecting data is easy: just hide it in a big metal box. More difficult is protecting it while simultaneously extracting value from it. A rules-based approach offers clarity and consistency, but it can run into problems: humans are good at interpreting ambiguity for their own benefit, sometimes just for the pleasure of getting round a restriction. A principles-based approach gets round this, but it introduces uncertainty which can be hard to manage. One way to do this is to map principles to accreditation procedures, using frameworks such as the Five Safes.

Felix Ritchie, also of UWE, described experiences of regulation in the UK and Australia. In the UK, this has been an evolutionary model, from a ‘default-closed’ perspective at the start of the century in a context set by a law from 1948, through two pieces of legislation and a shift in attitudes towards a default-open model. In contrast, the Australian federal government took a conscious decision to bring in the default-open principles-based models that had evolved slowly elsewhere. As public perceptions and discussion of data governance in these countries differ substantially, the contrast between the Australian and UK approach will be informative to a great many countries.  

Martin Hickley, Director of Martin Hickley Data Solutions, discussed private sector models of regulation. He focused on the Data Protection Impact Assessment (DPIA) carried out for the Covid-19 tracking app being trialled on the Isle of Wight. He argued that the DPIA is significantly flawed, and appears to have been completed as a check-box exercise rather than with an understanding of the risk-based context. This is one of the problems of rules-based regulation. In contrast, the principles-based approach calls for transparency, active scrutiny and debate – which are hard but necessary for robust solutions.

Finally, Luk Arbuckle, Chief Methodologist of Privacy Analytics, discussed US health data regulation. He demonstrated that the HIPAA guidance includes both rules-based and principles-based regulation. The safe harbour regulations are rules-based: comprehensive and easy to follow, but, most importantly, with a catch-all of “actual knowledge”. This brings in flexibility but can lead to uncertainty; overall, though, it means that the safe harbour rules are easily applied. In contrast, the ‘expert determination’ of the level of protection in the data is explicitly principles-based, relying on trained experts to make informed judgements based on “generally accepted statistical and scientific principles”. In contrasting the two models, he noted that the safe harbour model demonstrates one of the problems of rules-based regulation – that it is more likely to become out of date as it reflects the context in which it was written.

The ensuing discussion initially focused on the expertise needed in a principles-based environment: how, for example, do you enforce the principle “drive safely” without training people to understand what this means? More deeply, do we over-estimate the value of principles-based models because we are all ‘experts’ in this field in some way? Finally, how do we make sure that we have enough experts to run a principles-based system? Rules are very efficient in making sure a lot of people carry out a lot of activity adequately, and perhaps expertise isn’t needed all the time. Moreover, evidence suggests that a basic level of ‘expert knowledge’ can be instilled quite easily, in many different environments.

A number of participants also suggested that the implementation matters. Some organisations claim to be principles-based but are actually rules-based, and there is always an incentive to turn compliance into a tick-box exercise. Perhaps there is a need to accept that encouraging positive behaviour via checkboxes might be a less-bad option than over-estimating people’s willingness to become experts. Understanding the threat environment is key, because all options are a subjective balance of risks. A mix of rules implementing overarching principles may be the preferred outcome. Conceptual frameworks have an important role to play in developing the context.

Finally, the discussion considered whether there is a difference between the public and private sector. There do seem to be different incentives (what is important) as well as different disincentives (what punishments are being avoided), and perhaps also a different way of assessing costs and benefits. However, there wasn’t a consensus as to whether this limits the options for public-private co-operative projects. 

Overall, the session concluded that while the principles-based model has many advantages (principally, flexibility in context and application, and efficiency), it does pre-suppose an ability to agree on, and train individuals in, those principles. Moreover, a badly-designed principles-based system doesn’t avoid box-ticking, especially for untrained users. In practice, an element of rules within an overall principles-based approach can offer efficiency gains whilst not sacrificing the gains from a recognition of principles. Ultimately this is a balance-of-risks decision, and so understanding the risk environment (including human behaviour) is central to a well-designed system.


Session 2: Regulation in UK financial markets and accounting 

Paul Keenan, of Keenan Regulatory Consulting and visiting Professor at UWE, introduced the two-pronged approach in UK financial markets. Following on from an initial simple principle (‘My word is my bond’), an extensive rulebook has been developed, leading to the current system of higher-level principles backed up by rules. In practice, he explained, when the regulator takes action against market participants, it looks at rule breaches and at whether the interpretation of the rule led to a breach of a principle. Essentially, the regulator considers the firm’s understanding and interpretation of the rules to be in breach of the principles. So even if the rules have been broken, the action taken, i.e. the fine imposed, is based on the principles.

Bryan Foss, Digital Non-Executive Director, Risk & Audit Chair, and also visiting Professor at UWE, reflected on the need for, and implementation of, regulation. He argued that regulators should work with stakeholders to develop effective regulation, and that flexibility to change with circumstances over time is a key point for adequate regulation. Principles are therefore generally better suited, allowing scope for differences, innovation, and easier revision or withdrawal. Fundamentally, however, both approaches require transparency, accountability, and stakeholder oversight to make them work. He also noted that there tends to be a lot of social pressure at the moment to increase the rules, with the UK regulator looking to bring in aspects of the US rules-based approach, despite practitioners recognising the advantages of principles.

Florian Meier of UWE discussed the self-regulation and enforcement approach used by UK professional accounting bodies. Members are subject to both professional regulations and principles-based codes of ethics, with the key component being the ethical principles. Self-regulation, however, raises a number of challenges which cast some doubt on the effectiveness of enforcement. Ismail Adelopo, also of UWE, highlighted how corporate governance exhibits a clear geographical split: the UK uses principles and the US uses rules, each having evolved from its historical context to address specific situations and needs. The UK approach relies heavily on investors’ active involvement as a key factor in ensuring compliance and enforcement, but this leads to challenges such as: what if investors don’t play along and simply sell non-compliant firms instead of engaging with them? Who enforces compliance if everybody sells, or the market simply doesn’t care?

The discussion focused largely on questions surrounding the enforcement and effectiveness of the approaches. Starting with financial market regulation, the initial debate around the appropriateness of fines quickly turned to the broader aspect of penalties: as firms seem increasingly to treat fines as a cost of doing business, maybe the focus should be much more on the personal accountability of individuals? In this context the measure of imposing a ‘stop trading’ order on an individual or firm was brought up. These can be more important than a simple fine, since they may even lead to the closing of a firm or the end of a career. Given those potentially severe consequences, robust processes for defending yourself or the firm against the regulator (if the regulator is wrong) are therefore seen as essential.

Another interesting point brought up was that the market regulator’s approach seems to have shifted over the years from being rather heavy-handed and punitive in the past to a much more constructive approach: They are increasingly working with firms and affected individuals to help them improve and change to become better.  

In the area of corporate governance, the discussion touched upon shortcomings of the current UK approach and brought up ideas for improvement. For instance, it questioned the reliance of UK enforcement on investors and pointed to significant shortcomings. For one, the fact that it is essentially left to the major shareholders to hold firms to account or take them to court was raised as a concern. Unless their interests align with the majority’s, minority shareholders are disregarded; they lack the resources to fight for their interests, so what recourse do they have other than accepting this or selling their shares?

Another concern of growing future significance was raised about pension funds increasingly making big investments in private firms. The corporate governance code does not apply to unlisted firms, and as such firms are not easy to divest from, pension funds are probably even more dependent on good corporate governance. The question then becomes: how effectively can the interests of those investors, who are ultimately future pensioners, be protected? Further, the issue of what constitutes an appropriate penalty was raised, and whether the UK has reached an appropriate balance. Especially as firms are often repeat offenders, doubts were expressed as to whether this can be solved without a major overhaul to implement a robust regime. On that note, a suggestion was made to learn from other countries, e.g. Australia, where (unlike in the UK) the regulator has powers with regard to corporate governance and can intervene.


Session 3: Legal perspective and non-financial regulation 

Nicholas Ryder of UWE introduced the area of terrorism financing and the successful UK approach to combating it. He first described the current Anti-Money Laundering (AML) regulations as fundamentally flawed as a means of dealing with terrorism financing, since the legal framework and international banks’ practice (‘soft law recommendations’) target the proceeds of crime, whereas terrorism financing is ‘reverse money laundering’ in which no profits are made. By contrast, the more recent UK Joint Money Laundering Intelligence Task Force, a public/private partnership (PPP) with the financial sector, has been quite successful in detecting illegal fund flows and identifying funding patterns. Having been recognised as one of the best international examples of public and private cooperation, other countries have now adopted a similar model. Nicholas suggested that a PPP of this kind, rather than a legal, principles-based approach, could be the way forward.

Jaya Chakrabarti, CEO of Semantrica (tiscreport), introduced the TISC report (Transparency In Supply Chains) as a repository for measuring compliance with the UK Modern Slavery Act, along with numerous other financial risk and compliance datasets. She pointed out that, despite the level of compliance required being very low, a lot of companies still don’t provide a statement, and only a fraction of all organisations meet all of the minimum compliance criteria. The frequently observed low quality of data provided by firms poses a challenge to effective reporting. It makes acting on it difficult and thereby enables continued corporate misbehaviour. Further, enforcement seems to be largely non-existent despite potentially severe consequences for non-compliance, thus giving firms no ‘incentive’ to comply. As a way forward, while proper enforcement would be a key pillar for better effectiveness, she also presented some suggestions for modifying corporate behaviour that do not require government regulations and enforcement.  

The discussion mainly centred around enforcement and the detection of illegal behaviour. The initial debate on the potential future role of blockchain applications in certifying and tracing supply chains to aid transparency quickly turned to the key importance of getting the public sector and the key stakeholders on board to actively pursue enforcement. The public sector was lauded for already actively tracking its suppliers and ensuring compliance, with local governments in particular being very active and frequently working with their suppliers to increase levels of compliance. It was argued that a stumbling block to better enforcement was public bodies’ frequent inaction, even when they have the data, because they don’t know how to use it in their enforcement. Further, a general lack of enforcement and the disinterest shown by the major stakeholders in various areas of regulation was flagged as a key problem. Using insurers as leverage to enforce better compliance was floated as an idea: that is, refusing professional indemnity insurance in cases of illegal company behaviour, although doubts were also expressed about insurers’ willingness to get involved.

The discussion then moved on to financial crime and the detection of illegal behaviour. First, the growing problem of so-called ‘micro-terrorism’, which relies on very simple methods and small amounts and so makes identifying individuals and preventing small attacks almost impossible, was pointed out. Regardless of approach (rules or principles), the view was that you can never stop all money laundering or financial crime, comparing it to ‘plugging a hole in a dam with plasticine’. The ‘risk-based approach’ embodied in international laws and international best practice, which tries to identify which businesses are more susceptible to fraud or money laundering, was seen as the best option. On fraud detection, the inadequacy of the current self-reporting approach to verifying compliance with both the slavery act and the bribery act was flagged as a major weakness, with ample evidence from banking regulation showing that the approach is not working. Examples from financial services were suggested as a way to introduce accountability as a potential solution to the problem: in some roles, such as money laundering officers, individuals are accountable for self-reporting, so they take it very seriously. Hence non-reporting by the firm puts heavy pressure on that person, which may turn them into ‘whistle-blowers’.


‘Five Safes’ or ‘One Plus Four Safes’? Musing on project purpose

Posted on

by Felix Ritchie and Francesco Tava

A recent working paper discusses the ‘Five Safes’ framework for confidential data governance and management. This splits planning into a number of separate but related topics:

  • safe project: is this an appropriate use of the data? Is there a public benefit, or excessive risk?
  • safe people: who will be using the data? What skills do they have?
  • safe setting: how will the data be accessed? Are there limits on transferring it?
  • safe data: can the detail in the data be reduced without excessively limiting its usefulness?
  • safe outputs: is confidentiality protected in products such as tables of statistics?
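As a rough illustration of how such a framework can be used operationally, the sketch below records the five questions as a simple checklist and flags the ‘safes’ a project has not yet addressed. The field names and wording are our own paraphrase, not an official encoding of the framework.

```python
# Illustrative only: the Five Safes as a simple assessment record.
# The questions paraphrase the list above; the field names are our own.

FIVE_SAFES = {
    "safe_project": "Is this an appropriate use of the data?",
    "safe_people":  "Who will be using the data, and with what skills?",
    "safe_setting": "How will the data be accessed?",
    "safe_data":    "Can detail be reduced without losing usefulness?",
    "safe_outputs": "Is confidentiality protected in published products?",
}

def unresolved(assessment):
    """List the safes not yet marked as addressed for a project."""
    return [safe for safe in FIVE_SAFES if not assessment.get(safe, False)]

# A project with the setting and outputs questions still open:
print(unresolved({"safe_project": True, "safe_people": True, "safe_data": True}))
# prints ['safe_setting', 'safe_outputs']
```

Even this toy version makes the point of the next paragraph visible: the structure says nothing about which of the five should carry the most weight.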

This framework has been widely adopted, particularly in government, both as a practical guide (eg this one) and as a basis for legislation (eg the UK Digital Economy Act or the South Australia data sharing legislation).

As a practical guide, there is one obvious limitation. There is no hierarchy among the ‘safes’, and they are all interrelated; so which should you put most emphasis on?

We use the Five Safes to structure courses in confidential data management. One of the exercises asks the attendees to rank them as ‘what should we be most/least concerned with?’ The point of the exercise is not to come up with a definitive ranking, but to get the attendees to think about how different elements might matter in different circumstances.

This exercise generates much discussion. Over the years, we have had participants putting forward good arguments for each of the Five Safes as being the most important. Traditionally, and in the academic literature, Safe Data is seen as the most important: reduce the inherent risk in the data, and all your problems go away. In contrast, in the ‘user-centred’ planning we now advocate (eg here), Safe People is key: know who your users are, and design ethical processes, IT systems, training and procedures for them.

When training, this is the line we usually take, because we are training people to use systems which have already been designed. The aim of the training is to help people understand the community they are part of. Our views are therefore coloured by the need to work within existing systems.

Our thinking on this has been challenged by the developments in Australia. The Australian federal government is proposing a cross-government data sharing strategy based on the ‘Australian Data Sharing Principles’ (ADSPs). The ADSPs are based on the Five Safes but designed as a detailed practical guide to Australian government departments looking to share data for analysis. As part of the legislative process, the Australian government has engaged in an extensive consultation since 2018, including public user groups, privacy advocates, IT specialists, the security services, lawyers, academic researchers, health services, the Information Commissioner, and the media.

Most of the concerns about data sharing arising in the consultation centre on the ‘safe project’ aspect. Typical questions that cropped up frequently included:

  • How do we know the data sharing will be legal/appropriate/ethical?
  • Who decides what is in the ‘public interest’?
  • How do you prevent shared data, approved for one purpose, being passed on or re-used for another purpose without approval?
  • What sort of people will we allow to use the data? Should we trust them?
  • What will happen to the data once the sharing is no longer necessary? How is legacy data managed?
  • Do we need to lay down detailed rules, or can we allow for flexible adherence to principles?
  • Where are the checks and balances for all these processes?

These are all questions which need to be addressed at the design stage: define the project scope, users and duration, and then assess whether the likely benefits outweigh costs and reasonable risks. If this can’t be done… why would you take the project any further?

Similarly, in recent correspondence with a consulting firm, it emerged that a key part of their advice to firms on data sharing is about use: the lawfulness of the data sharing is relatively easy to establish – once you have established the uses to which that shared data will be put. Some organisations have argued that there should be an additional ‘safe’ just to highlight the legal obligations.

This is particularly pertinent for data sharing in the public sector, where organisations face continual scrutiny over the appropriate use of public money. A clear statement of purpose and net benefits at the beginning of any project can make a substantial difference to the acceptability of the project. And whilst well-designed and well-run projects tend to be ignored by people not involved, failures in public data sharing (eg Robodebt) tend to have negative repercussions far beyond the original problems.

This is not the only concern facing data holders in a digital age of multi-source data. Handling confidential data always involves costs and benefits. Traditional approaches that focus on Safe Data identify the data holder as the relevant metric for these costs and benefits. A recent paper shows how this vision is at odds with the most recent developments in the information society that we live in. Consider the use of social media in research: is any one of the actions by the author, the distributor or the researcher sufficient in itself to establish the moral authority of an end use? In this modified context, traditional ethical notions such as individual agency and moral responsibility are gradually substituted by a framework of distributed morality, whereby multiagent systems (multiple human interactions, filtered and possibly extended by technology) are responsible for the large morally-loaded actions that take place in today’s society.

In this complex scenario, taking the data holder as the only arbiter of data governance might be counterproductive, insofar as practices that are morally neutral for the data holder (for example, refusing to consider data sharing) could damage the multiagent infrastructure which that data holder is part of (eg by limiting incentives to participate). On the other hand, practices that cause minor damage to one of the agents (such as reputational risk for the data holder) could lead to major collective advantages, whose attainment would justify that minor damage and make it acceptable on a societal basis.

In order to minimise the risks, an innovative data management approach should look at the web of collective and societal bonds that links together data owners and users. In practice, this means that decision-making on confidential data management will not be grounded in the agency and responsibility of individual agents, but will rather correspond to a balance of subjective probabilities. On these premises, focusing on the Safe Project makes pre-eminent the notion that data should be made available for research purposes if the expected benefit to society outweighs the potential loss of privacy for the individual. The most challenging question is, of course, how to calculate this benefit, when so many of the costs and benefits are unmeasurable.
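As a toy formalisation of that decision rule: the function, parameter names and numbers below are entirely illustrative (as noted above, most of these quantities are unmeasurable in practice), but they show the shape of a ‘balance of subjective probabilities’ calculation.

```python
# Toy sketch of the Safe Project decision rule: share data for research
# if the expected societal benefit outweighs the expected privacy loss.
# All values are illustrative; real costs and benefits are largely unmeasurable.

def approve_project(benefit, benefit_prob, privacy_loss, loss_prob):
    """Compare the expected benefit with the expected privacy cost."""
    return benefit * benefit_prob > privacy_loss * loss_prob

# A project with a large, likely benefit and a small, unlikely privacy cost:
print(approve_project(benefit=100, benefit_prob=0.8, privacy_loss=500, loss_prob=0.05))
# prints True
```

The hard part is not the comparison but the inputs: both sides are subjective estimates, which is exactly why the question of who makes them matters.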

And this is the difference between Safe Projects and the others. ‘Safe projects’ addresses the big conceptual questions. Safe people, safe settings and safe outputs are about the systems and procedures needed to implement those concepts, whilst Safe Data is the residual (select an appropriate level of detail once the context is defined). So rather than Five Safes, perhaps there should be One Plus Four Safes…

About the authors

Felix Ritchie is Professor of Applied Economics in the Department of Accounting, Economics and Finance

Francesco Tava is Senior Lecturer in Philosophy in the Department of Health and Applied Social Sciences

Back to top