Past DRAGoN Webinars: The importance of solidarity in data sharing


DRAGoN runs a very popular webinar series that brings academics together with practitioners and members of the public, facilitating conversations that flow from theory to practice. The next run of these webinars is due to start at the end of February 2023.

In the first of three blogs looking back on the Data Ethics and Governance webinar series so far, Francesco Tava reflects on past seminars and the impact they have had on his work. In this post, Francesco focuses on the webinar closest to one of his main research areas: data ethics.

Francesco starts by explaining: “The idea of this series is to overcome the typical academic boundaries by involving professionals working in various sectors of data governance in a discussion around ethical concepts and problems arising from the use, access and sharing of data. From the webinars we have held so far, one that aligns with my current research interests was the discussion on solidarity-based data governance with Barbara Prainsack (University of Vienna).”

Francesco goes on to say that often the first and only consideration underpinning data governance is how to defend privacy. However, there are other principles which are equally important when it comes to data access and sharing, but which tend to fall by the wayside. One of these is solidarity. How can we rebalance this approach to include principles such as solidarity when assessing the practicalities and risks involved in data management?

“Data is not just something completely impersonal, objective material… it somehow describes who we are”

The talk that Barbara Prainsack gave and the discussion that followed were very valuable. Not only did this discussion investigate a series of issues stemming from data management, but it also envisaged how a solidarity-based approach can provide possible solutions to these problems.

You can listen to the whole conversation for a deeper dive into Francesco’s reflections on this webinar here.

You can listen to past DRAGoN talks here.

New linked dataset available to provide insights into earnings and employment in Britain


The ADR UK-funded Wage and Employment Dynamics (WED) initiative, which aims to provide new insights into the dynamics of earnings and employment in Great Britain, has made a new dataset available. Accredited researchers can apply to use the de-identified Annual Survey of Hours and Earnings (ASHE) – 2011 Census linked dataset from the Office for National Statistics (ONS) Secure Research Service.

Using the ASHE – 2011 Census linked dataset, researchers can explore how factors such as gender, ethnicity, disability and migration affect individuals’ wage levels and pay progression. Access to this data will provide fresh insights into areas such as the experiences of young people entering the labour market, job mobility, and career progression to retirement. This will enable policymakers to make better-informed decisions to improve the experiences of people in employment.

About the data

The ASHE dataset is an annual survey based on a 1% sample of employee jobs, drawn from the Pay As You Earn (PAYE) register, and conducted by the ONS. This de-identified dataset contains information on employees’ earnings, paid hours and occupation, as well as a limited number of personal characteristics: gender, age, and residential location.

To expand the number of de-identified, personal characteristics that can be observed for employees, the ASHE dataset has been linked to the 2011 Census. The 2011 Census dataset includes characteristics such as educational and vocational qualifications, health, disability, and household circumstances for individuals in England and Wales.

The 2011 Census has been matched to all employees observed in ASHE in either 2010, 2011 or 2012. The ASHE – 2011 Census linked dataset therefore covers the period 1997-2020, but only includes individuals who were in the 2011 Census and could be matched to individuals in ASHE in 2010, 2011 or 2012. It also includes some enhancements to the basic ASHE data, such as minimum wage rates and survey dates.
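The linkage logic described above can be sketched in a few lines of code. This is a purely illustrative sketch, not the actual ONS matching process: the identifiers and column names (person_id, year, hourly_pay, qualification) are invented for the example, and the real linkage uses de-identified keys inside the Secure Research Service.

```python
import pandas as pd

# Toy stand-ins for the two datasets. Column names are hypothetical.
ashe = pd.DataFrame({
    "person_id":  [1, 1, 2, 3, 4],
    "year":       [2010, 2015, 2011, 2012, 2015],
    "hourly_pay": [9.5, 11.0, 12.3, 8.7, 10.1],
})
census = pd.DataFrame({
    "person_id":     [1, 2, 3, 5],
    "qualification": ["degree", "gcse", "a_level", "degree"],
})

# Step 1: the "anchor" years -- people observed in ASHE in 2010, 2011 or
# 2012 who also appear in the 2011 Census are the matched individuals.
anchor = ashe[ashe["year"].isin([2010, 2011, 2012])]
matched_ids = set(anchor["person_id"]) & set(census["person_id"])

# Step 2: keep ALL ASHE years (1997-2020 in the real data) for matched
# individuals, and attach their Census characteristics via a join.
linked = ashe[ashe["person_id"].isin(matched_ids)].merge(census, on="person_id")

print(linked)
```

Note how person 4, observed in ASHE only in 2015, drops out entirely (no anchor-year observation to match against the Census), while person 1’s later 2015 observation is retained because they were matched in 2010.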

The linked ASHE – 2011 Census dataset allows researchers to examine pay and career progression of a cohort of individuals who were in employment around the time of the 2011 Census. This data is linked and made available under the provisions of the 2017 Digital Economy Act, which provides a legal gateway for researchers to access government data in a secure way. For more information about the linked dataset, see the ASHE – 2011 Census user guide.

Huge potential for this dataset

This dataset will help to fill a crucial evidence gap in information about the experiences of individuals in the labour market. For example, the ASHE – 2011 Census dataset will enable researchers to understand the underlying causes of intersectional pay gaps and provide a deeper understanding of how demographic factors inform labour market transitions.

The dataset could help answer a range of research questions to inform policy, such as:

  • How do personal characteristics such as household structure affect wage progression?
  • How does wage progression differ depending on characteristics such as gender, disability, or ethnicity?  
  • What role do employers play in wage inequality?  
  • What are the returns on investment from education?
  • What is the relationship between migration and the labour market?

Tim Butcher, Chief Economist at the Low Pay Commission, said:

The WED project addresses weaknesses in our evidence base – improving the quality of longitudinal earnings data and extending coverage to a broader range of characteristics – that should enable researchers to give new and innovative insights into the wage and employment dynamics of the lowest paid.

How to access the dataset

The ASHE – 2011 Census dataset is available from the ONS Secure Research Service. Access to the dataset is managed securely in line with the Five Safes framework. Accredited researchers can apply to access the data by submitting a project application. For more guidance on submitting project applications, visit the ONS Secure Research Service.

The ASHE – 2011 Census linked dataset was produced as a collaboration between ADR UK, the ONS and the Wage and Employment Dynamics project team. The project team is led by Professor Felix Ritchie and Damian Whittard of the University of the West of England, and involves partners from UCL, Bayes Business School, the University of Reading and National Institute of Economic and Social Research.

Teaching researchers about data protection law: a terrible idea


Written by Elizabeth Green (UWE Bristol), Felix Ritchie (UWE Bristol) and Amy Tilbrook (University of Edinburgh)

Many researchers in the UK working with confidential data attend the ‘Safe Researcher Training’ course (SRT). This training course was commissioned by the Office for National Statistics (ONS) in 2017 and is (so far) the only accredited training programme for researchers under the Digital Economy Act 2017. Attendance is compulsory for all those using the ‘safe havens’ or ‘trusted research environments’ run by the ONS, the UK Data Archive, Health and Social Care NI, the Northern Ireland Statistics and Research Agency, HM Revenue and Customs, and various big data research centres, and for those using certain public sector datasets in Scotland.

The SRT was designed using novel principles of confidentiality training, recognising that researchers are human: intrinsically motivated (i.e. threats of fines and punishment don’t work); self-interested but also well-intentioned; mostly quite bright but occasionally very foolish; and, crucially, able to engage in nuanced discussions about the safe and effective use of confidential data. Over three thousand researchers (from academia, government and the private sector) have been through the training and taken the test since 2017, and the SRT has become a reference point for basic training in research good practice, with materials being adapted for use in Europe, North America and Australasia. The training is delivered by several UK organisations; each has taken a slightly different approach to delivery of the core material but the learning outcomes (as measured by test results) are broadly similar, suggesting the core training material is robust to the presentation style of different trainers.

One omission from the SRT is a detailed discussion of data protection laws: which laws govern access, what the researcher’s formal rights and responsibilities are, and what penalties can be incurred. This is in direct contrast to many data management and governance training courses; indeed, the SRT’s predecessor included a before-and-after ‘quiz’ on the parts of the Data Protection Act 1998 relevant to research. This omission of a detailed exposition of the applicable law(s) causes concern amongst organisations who require users of their data to take the SRT and pass the test.

The rationale of this position is simple: how can you expect researchers to obey the law if they don’t know what the law is? Researchers should know:

  • which laws are relevant, including common law duties of confidentiality
  • the lawful basis of access
  • the specific limitations of each law in relation to their data
  • the consequences of breaches (for which read: fines and jail)

By outlining these legal specifics, researchers should have no doubt as to their responsibility. This also transfers responsibility from the data holder: those who breach laws (intentionally or, more likely, accidentally) cannot attribute the breach to a lack of knowledge around these laws.

This approach arises naturally from the ‘defensive’ approach to data governance typically taken by data holders, which aims to ensure that all potential risks are covered before release is considered. Intellectually, the foundation of this idea is the economic model of rational decision-making: data has value and so may be misused, but only if the benefits of misuse exceed the costs. By providing data users with a clear legal basis, evidence of monitoring and control, and knowledge of the severe penalties for transgressing the limits, data holders keep researchers on the right side of the line. Moreover, if researchers do transgress, the data holder has a solid foundation for civil or criminal prosecution, which in itself should increase compliance.

The trouble with this approach is that it lacks any substantive evidence to support it. In contrast, there is a great deal of well-founded evidence, particularly from psychologists, to suggest the opposite.

Alongside this fundamental misunderstanding of human nature, simplistic training assumptions about legal liability are also likely to be misaligned with real-life case law, and may miss nuances of legal rules and pathways, resulting in further confusion. Most importantly, focusing on legal liability ignores the fact that genuine breaches of confidentiality in research are vanishingly rare and very difficult to prove. In contrast, breaches of operating procedures (but not law) are not unusual, and are generally easy to prove.

Some data controllers see this communication as part of their responsibility – even if attendees don’t register the detail, they remember the message that legal conditions are important and costly to break. So what is the harm in including this in the course?

The main argument against this is that it is counter-productive in the context of the SRT. The SRT is designed to build trust and community engagement. An assumption of lawlessness and the highlighting of inappropriate behaviour disrupt that message, by implying “I don’t trust you, so here’s what will happen if you put a finger wrong”. This weakens the community message.

In contrast, the SRT is designed so researchers know what constitutes safe behaviour with data in most research cases. Researchers are shown how operating procedures serve to protect them from accidentally breaking the law, and how to actively engage with those procedures. Researchers are encouraged to discuss the compromises involved in designing data access systems, and so develop a sense of shared responsibility. The SRT is not a textbook on how to complete a successful data application or project for each data controller or research centre, but it is a way to approach such tasks so that both parties are satisfied that all risks have been covered. The SRT approach to legal aspects is therefore grounded in three questions:

  • What do researchers need to know to behave safely?
  • How and what do they learn?
  • How do we build a community identity so that when things do go wrong we co-operate to resolve them?

Focusing on the above three questions allows researchers to actively reflect on their own actions and conceptualise their responsibility to the project and the data. Moreover, by not examining and outlining specific laws, the material retains relevance even if the law changes, or if the material is used in different countries or for international projects (as it has been). There are evidence-based answers to these three questions, and in our next blog we explore them further.

Finally, for those who still believe that threats are helpful, it is worth noting that criminal sanctions are not seen as credible. The lack of successful prosecutions, researchers’ own self-belief that they are not law-breakers, and the obvious disincentive for data holders to publicise a data breach all mean that criminal sanctions become a straw man, and the teacher’s authority is damaged. In contrast, the SRT focuses on ‘soft’ negatives (reputation, kudos, access to funding, access to data, employability), and emphasises the difference between honest mistakes and selfish behaviour. As well as being more meaningful to researchers, these also align with the ‘community spirit’ being developed. The consistency of the message on this topic is as important as the contents.

Legal bases and using secure data for research


Researchers requesting microdata (individual records) from data centres or data access panels are usually required to describe the legal basis for their use of the data. This is because data controllers and processors need to have a legal basis documented for each use of data under UK GDPR. 

However, researchers are usually unfamiliar with legal bases. The form fields may be left blank or, more often, filled with vague answers that might not fulfil what the access panel is requesting. This creates an inefficient process where support teams, panel managers and researchers are engaged in a back-and-forth to get the required information in the right box.

Underlying the current application-form dance is the hot potato of responsibility. Someone needs to decide whether, and under which legal basis, the data can be legally processed. For those who have to put their name to a decision, one question always hovers in the background: “If something goes wrong, will I/we be blamed?” This encourages shifting the responsibility for providing evidence onto the applicants, as the requestors of the data: you want to do this new thing with the data, so you have to show it’s safe and legal. But few researchers, data access panel members or data centre staff are legal experts, and so the responsibility starts its journey. All players want the same thing – confirmed safe and efficient use of data – but can’t always agree on the best way of getting there.

A popular solution is that researchers are requested to go and speak to their institution’s Data Protection Officer (DPO) or legal team to decipher which legal basis fits for their use of the data. But this shifts the problem; it doesn’t solve it. Institutional guardians face the same concerns about taking responsibility. Often stock answers are copied and pasted into forms based on previous experience of what has “passed”.

If the applicant is an academic researcher requesting data for academic or government-sponsored research, is it worth sending them to DPOs or expensive lawyers to get the same answer as the 10 researchers before them, for something the panels likely already know the answer to? Do researchers now need to be experts in GDPR and data sharing, as well as project managers, grant writers, statistical experts, public speakers and all of the other currently required skills?

Most importantly, does this encourage the data sharing community to work together to use data safely? Or is it an example of misunderstanding and division?

From the data controller/support team/access panel point of view, an obvious solution seems to be training researchers in what legal bases are and how to find out what applies. This is the “tell them what they need to do” approach. Guidance documents can be written; if the forms are not completed appropriately, this is down to the applicants not reading or using the guidance.

The trouble is that applicants and the assessors of applications don’t necessarily share the same language, interests or understanding. To the assessor, ‘Show how this project supports organisation X’s public function’ has a clear context, purpose and meaning, and directly provides a legal basis for access. To the applicant, the question is gibberish unless she happens to be familiar with the legislation; even then, it is not clear how to answer it.

Is there a better solution?

Pedagogical evidence shows that researchers/applicants can understand and apply complex data protection issues if couched in language and examples that have meaning for them. Instead of telling people what they need to know, decide what you need to get out of them, what they can reasonably be expected to give you that fills that need, and make it interesting and easy for them to give you that information – as Mary Poppins would say “snap, the job’s a game!”.

This encourages a more cooperative frame of mind, a more compliant researcher, a sharing rather than shedding of responsibility. It reflects a broader movement towards the ‘community’ model of data access, where emphasis is placed on shared understanding and joint responsibility rather than separation of duties/risks.

This is not straightforward. Is there a way to ask researchers to describe what they’re going to do with the data, so that data access panels are comfortable enough to categorise a legal basis? Could it be a joint conversation? Could a checklist be used in the first instance to help researchers understand what answers might be acceptable? Could the data centre community create and publish a consensus on what is appropriate, acceptable and will be used as standard – allowing for the inevitable exceptions that cutting-edge research brings?

The gains of a cooperative approach are procedural and personal: knowing what information can reasonably be supplied, and designing processes around that, rather than designing processes for an unachievable standard of input.

Pulling things away from the researcher may seem to place a higher burden on the assessment panel: moving from “tell me why what you are doing is lawful and ethical” to “tell me what you are doing, and I’ll decide if it is lawful and ethical”. But the burden comes in two parts, procedure and accountability, and the accountability burden never went away. The potato always stopped with the ones making the decision; shifting responsibility onto applicants to give good information doesn’t change this.

This is one small area of the application process, but across the board there are substantial gains to be made, both in the efficiency of operations, and in the confidence that both applicants and assessment panels can have in the correctness of decisions. The potato of responsibility can be made digestible.

This blog post was written by Professor Felix Ritchie who leads the Data Research, Access and Governance Network (DRAGoN) at UWE Bristol and Amy Tilbrook from the University of Edinburgh.

Welcome to the Data Research, Access & Governance Network (DRAGoN) blog


Welcome to the Data Research, Access and Governance Network (DRAGoN) blog where we will share the latest updates and projects we’re involved with.

Led by Professor Felix Ritchie, the Management Team also includes Dr Kyle Alves (Business & Management), Elizabeth Green (Economics), Dr Francesco Tava (Philosophy) and Damian Whittard (Economics). Formed in autumn 2020, DRAGoN recognised that effective data use and governance requires contributions from many different professions: ethicists, statisticians, computer scientists, psychologists, economists, management scientists. Our aim is to create an environment for discourse which can bring differing perspectives together for the wider benefit.

The modern world is increasingly dependent on data. It is central to our lives, directly in our own experience and indirectly through the way organisations use data. Much of this data is personally confidential, either at the point of collection or when combined with other data. Often the confidentiality of data is unclear: are street observations by citizen scientists confidential? Photos of one’s family on social media? Facial recognition? Automatic number plate recognition? Data used to train machine learning systems? Is ‘sensitive’ or ‘personal’ the same as ‘confidential’? The confidentiality of data has a substantial effect on the way it is managed, perceived and exploited. This spills over into the management and use of open data, or of data which is confidential for other reasons, such as commercial confidentiality: ethics, public perceptions and data security can be just as important.

Data access, management and governance is a highly applied topic; decisions are made every day which affect our lives, our businesses and our government, often in ways which are obscure or known only to specialists in that area. We see the application of theory to practice as essential to the ethos of the group.

But we also need to reflect on practice: decisions about data use are often highly political, based on psychological or institutional factors. Working with practitioners helps inform our research with operational insights, as well as allowing us to challenge accepted viewpoints. 

We look forward to sharing developments from this research cluster, but in the meantime you can find out more through our bi-weekly seminars by signing up to our mailing list below and following us on Twitter.

Read more on our website

Follow us on Twitter

Sign up to our email list

This research cluster is funded through the Expanding Research Excellence scheme at UWE Bristol. The scheme aims to support and develop interdisciplinary, challenge-led research across the University. It is designed to bring together research clusters or networks that will work together to respond to challenges (local, regional, national, global) aligned with major research themes.
