EVENT: How can councils enhance their utilisation of data for research, evaluation, and decision-making?

ripleywk | 3 May 2024

In the last webinar of the season, Abdinasir Kowdan and Orla Dunn will present a talk on “How can councils enhance their utilisation of data for research, evaluation, and decision-making?” This webinar will examine the crucial role of data in informing decision-making processes within local councils.

Their research aims to explore strategies and best practices that can empower councils to leverage data more effectively, fostering evidence-based decision-making, robust evaluation, and impactful research initiatives. With their expertise and insights, this talk will undoubtedly shed light on the transformative potential of data-driven governance.

Whether you’re a researcher, a council member, or simply someone passionate about data-driven policymaking, this event is an excellent opportunity to broaden your understanding and gain valuable insights from this work.

Date: 08/05/24

Time: 6pm

April 24: Save the date for the next instalment of the DRAGoN Seminar Series!

ripleywk | 19 April 2024

We’re excited to announce our upcoming session, “Data Governance and Religion”, led by Juan Carlos. Join us on April 24th at 6:00 PM for what promises to be a thought-provoking discussion on the intersection of data governance and religious perspectives.

Seminar Details:

Date and Time: April 24, 2024, at 6:00 PM

Location: The seminar will be hosted on Microsoft Teams. Joining instructions will be provided upon registration.

Registration: As always, our seminars are free and open to all who are interested. Secure your spot by registering at this link.

Bridging the Gap: Governance of Confidential Research Data in Low- and Middle-Income Countries

ripleywk | 28 July 2023

DRAGoN is hosting a week-long international workshop reviewing the current gap between data governance in high-income countries (HICs) and low- and middle-income countries (LMICs). The workshop will run with two parallel cohorts to accommodate both the Eastern and Western hemispheres.

What Gap?

When it comes to research and policy development on the governance of confidential research data, high-income countries (HICs) have taken centre stage. However, this has left a significant knowledge gap for low- and middle-income countries (LMICs), where three-quarters of the global population resides. The existing guidelines and practices, although robust, fail to address the unique circumstances and challenges faced by LMICs. To address this disparity, DRAGoN is organizing a workshop that works directly with people in these countries to explore these issues. This article gives an overview of what the workshop aims to achieve and why it matters to anyone involved in data research.

The workshop seeks to delve into the current state of research data governance in LMICs and uncover the challenges these countries face. Unlike their HIC counterparts, LMICs operate within distinct institutional, legal, historical, and cultural contexts, necessitating tailored data governance models.

The lack of evidence regarding the effectiveness of transferring HIC data governance models to LMICs further emphasizes the need for comprehensive understanding. Additionally, certain data governance issues specific to LMICs have yet to be addressed. By identifying the gaps in knowledge and evidence, the workshop aims to pave the way for data governance practices that are more inclusive and effective.

Exploring Good Practices & Opportunities for Supporting Good Data Governance:

Despite the existing disparities, the workshop will also explore examples of good practice that could be applied more widely in LMICs. These success stories will provide inspiration and practical insights for data researchers, regulators, academics, governments, and NGOs involved in making confidential data available for research use. By sharing these experiences, the workshop aims to encourage the adoption of effective data governance practices across LMICs.

One of the main goals of the workshop is to identify opportunities for supporting good data governance in LMICs. By bringing together stakeholders from different regions, the event aims to foster collaboration and the exchange of ideas. Over the course of a week small-group discussions will take place virtually, accommodating participants from both the Eastern and Western Hemispheres. These discussions will help generate valuable insights and recommendations for addressing the challenges and improving data governance practices.

To gather additional evidence and insights, a short survey will be distributed to participants and others involved in data research. The survey aims to capture current practices and perceptions regarding research data governance in LMICs. The information collected can then contribute to a comprehensive summary report, outlining the current landscape, trends, disparities, and highlighting any potential ways forward. This report will serve as a snapshot of the current landscape and a foundation for future exploration by academics and policymakers, stimulating further research and policy development in this vital area.

Bottom line: why is this workshop so important?

The governance of confidential research data is a critical aspect of ensuring ethical and effective data research practices worldwide. By acknowledging the unique circumstances of LMICs and addressing the challenges they face, we can foster more inclusive and sustainable data governance models. The workshop organized by DRAGoN is an important step toward bridging the gap between HIC-dominated research and the specific needs of LMICs. It provides an opportunity for stakeholders to come together, exchange knowledge, and collectively work towards advancing data governance practices. By participating in this workshop, you can contribute to shaping the future of research data governance in LMICs and make a meaningful impact in the field of data research.

To register for the workshop or for further information, please visit the event page or contact the workshop planning group at dragon@uwe.ac.uk. We look forward to your participation!

What are the output disclosure control issues associated with qualitative data?

jyotishaw | 23 November 2021 / 24 November 2021

Green paper

Elizabeth Green¹, Felix Ritchie¹, Libby Bishop², Deborah Wiltshire², Simon Parker³, Allyson Flaster⁴ and Maggie Levenstein⁴

¹The University of the West of England, ²GESIS, ³DKFZ German Cancer Research Center, ⁴University of Michigan

Context

When carrying out research with confidential quantitative data, there is much support for researchers. There is ample advice on best practice for collecting data (remove identifiers as soon as possible, only collect statistically useful information), a vast literature on how to reduce the risk in microdata (swapping, top coding, local suppression, rounding, perturbation, …), and a small but effective literature on how to protect statistical outputs from residual disclosure risk (eg a combination of tables showing that a one-legged miner in Bristol earns £50k a year).
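Two of the microdata techniques named above, top coding and rounding, can be sketched in a few lines of Python. This is only an illustration: the function names, the income values, the cap of 100,000 and the rounding base of 1,000 are all assumptions made for the example, not recommended parameters.

```python
def top_code(values, cap):
    """Top coding: replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


def round_to_base(values, base):
    """Rounding: move each value to the nearest multiple of `base`."""
    return [base * round(v / base) for v in values]


# Hypothetical annual incomes; the last record is an extreme, easily spotted value.
incomes = [18_200, 23_750, 31_400, 50_000, 212_000]

capped = top_code(incomes, cap=100_000)          # hide the extreme value
protected = round_to_base(capped, base=1_000)    # blur exact amounts

print(protected)
```

Both steps trade detail for protection, so in practice the cap and base would be chosen with the intended analysis in mind.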

For qualitative data, there is much less guidance. At the collecting/storing stage there is clear good practice (such as removing direct identifiers), although it may be hard to separate out analytically vital information from contextual information. In particular, the trade-off between anonymization and fitness for use may be much sharper for qualitative than quantitative data. Improvements in natural language processing (NLP) have enabled the development of anonymization tools in the UK and in Germany (e.g., QualiAnon) for qualitative data. However, when producing analyses there appears to be little or no general guidance on output disclosure control (ODC), and researchers are required to rely on informal advice and rules of thumb for good practices. This challenge is exacerbated by the wide variety of genres of qualitative data which makes guidance difficult to generalise.

Why the lack of practical guidance for output checking of qualitative data when there is a well-established set of guidelines for quantitative data? From one perspective, the lack of guidelines is not surprising. Guidelines for quantitative data were almost exclusively developed to meet the needs of national statistics institutes (NSIs), and thence filtered down to trusted research environments (TREs, secure research facilities usually specialising in quantitative data). Outside of NSIs and TREs, knowledge of output disclosure control is very limited, not even making it onto the syllabi of research methods courses. In this context, perhaps it is not surprising that there are no guidelines for qualitative data: guidelines for quantitative research only appeared because of economies of scale and scope, and have remained largely in the environment in which they were developed.

The need for qualitative data ODC guidelines has several drivers. First, there is a greater awareness of the need to maintain confidentiality, driven by legislation and regulation. Every journal has to trust that researchers have anonymized enough/not too much, but no metrics exist for how to assess this. Second, the lack of consistent guidelines means each generation of researchers must develop their own rules, which is inefficient and increases the likelihood of error. Third, the increased use of NLP tools has increased the number and types of researchers who are working with qualitative data. Finally, the development of TREs offers great opportunities for very detailed, unredacted qualitative data to be shared easily whilst maintaining security; but this must be supported by clear disclosure guidance for outputs. Whilst most TREs have a policy of checking outputs for residual risk, it is not clear whether the skills, resources and processes exist to do this for qualitative data.

Literature

UKDA provides some guidelines and also a tool for anonymising qualitative outputs; however the approach focuses on removal of direct identifiers and does not address nuance or contextual identifiers. The CESSDA Data Management Expert Guide provides a worked example of transcript anonymisation.

Kaiser (2009) outlines deductive disclosure in which the contextual nuance of a situation allows for an individual to be identified. Kaiser suggests that in order to address this issue, researchers should discuss the use of the research with participants, including describing in the consent to participate how the data will be made available to researchers, the types of research that are permissible, and the protections that will be in place for both the original research team and secondary analysts. However, this may not be possible with some types of data, and this also assumes that the discussion leads to genuinely informed consent rather than meeting a procedural tickbox.

The problem

ODC of quantitative data is conceptually straightforward. While quantitative data may be very highly structured (eg multilevel multiperiod data on GPs, patients and hospitals) or highly unstructured (eg quantitative textual analysis), all quantitative data can be seen, ultimately, as tables of numbers used to produce summary data. The same ODC rules can be applied in all cases.
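As a rough illustration of one rule applying to any table of counts, the sketch below uses a minimum cell count check, a common kind of output rule. The function names, the example records and the threshold of 3 are assumptions for illustration only, not values taken from any particular guideline.

```python
from collections import Counter


def frequency_table(records, keys):
    """Count how many records fall into each combination of `keys`."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def suppress_small_cells(table, threshold):
    """Replace any count below `threshold` with None (suppressed)."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}


# Hypothetical records: one cell describes a single, easily identified person.
records = (
    [{"occupation": "teacher", "city": "Bristol"}] * 3
    + [{"occupation": "miner", "city": "Bristol"}] * 1
    + [{"occupation": "teacher", "city": "Bath"}] * 4
)

table = frequency_table(records, ["occupation", "city"])
safe = suppress_small_cells(table, threshold=3)

for cell, count in safe.items():
    print(cell, "suppressed" if count is None else count)
```

Suppressing small cells alone may not be enough if row or column totals are also published, since a hidden count can sometimes be recovered by subtraction.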

In contrast, qualitative data are varied in both content and structure; examples could be:

Interview recordings/transcripts
Written responses in surveys
Psychiatric case studies
Videos and images
Ethnographic studies
Court records
Social media text

In each of these cases, protecting confidentiality may require different solutions. In psychiatric case studies information may remain identifiable when published, but informed consent is used to agree to the higher level of re-identification risk. In interview responses, redaction may be a very effective response; in videos, pixelation. In court records and social media, the semi-public nature of the source data may cause difficulties particularly around de-identification. Future technology is likely to throw up more data types, such as digital behaviour data, or others that are currently unimaginable.

Approaches to solutions

Given the range of qualitative data types, it seems unlikely that universal rules could be developed. However, there may be ways to develop general solutions (frameworks?):

Method-specific solutions
Data type-specific solutions

There may also be solutions which involve both input controls (eg consent) as well as output (redaction) methods. This may allow us to sidestep the question of what is permissible, but it does not address what is ethical to disclose.

If considering types of qualitative data output, it may be useful to consider where the value is generated. For example, in a video recording, nuances of the subject’s body language may be more important than the words; if so, redaction of text is possible, but pixelation isn’t. Understanding the research value may direct outputs of the same type towards different solutions. Thus it is important to distinguish between disclosure of “raw” data that researchers interact with and disclosure of outputs that are available to the general public in an unregulated environment. It is also important to focus on fitness for use, as certain kinds of disclosure control may degrade data quality in ways that do not affect its use for some types of analysis, while making the data essentially worthless for others.

Guidelines should address the different types of data, the accessibility of the data, and the intended use of the data in order to develop broader organising principles.

There may also be a need to create some definitions to provide a common language for discussing risks and solutions.

Next steps

Given the lack of consensus as to how to ensure safe outputs from qualitative analysis, the most productive first step may be a webinar with a credible global audience of interested parties to explore some of the issues raised here. Ideally, the audience would include both researchers working with a variety of qualitative data types and some data protection and confidentiality specialists.

The aim of the webinar would be to develop a programme of work – the initial step in this programme could be the formation of study groups each focused on a particular type of data. These groups could then report back during a second workshop. In addition, the workshop could consider how to share guidelines amongst the research community, for example by embedding good practice into Research Methods courses or Data Management Plans, with the intent of avoiding the quantitative route of concentrating disclosure control training in limited environments. It may also be helpful to explore funding opportunities if this looks to be a significant programme of work.

The workshop will take place on Friday 10th December, 15:00-17:30, via Microsoft Teams.

To register for the event, please click here

For any questions or queries, please contact Lizzie Green at elizabeth7.green@uwe.ac.uk

Legal bases and Using Secure Data for Research

jyotishaw | 10 November 2021

Researchers requesting microdata (individual records) from data centres or data access panels are usually required to describe the legal basis for their use of the data. This is because data controllers and processors need to have a legal basis documented for each use of data under UK GDPR.

However, researchers are usually unaware of legal bases. The form fields may be left blank or, more often, filled with vague answers that might not fulfil what the access panel are requesting. This creates an inefficient process where support teams, panel managers and researchers are engaged in a back-and-forth to get the required information in the right box.

Underlying the current application form dance is the hot potato of responsibility. Someone needs to decide whether the data can be legally processed, and under which legal basis. For those who have to put their name to a decision, one question always hovers in the background: “If something goes wrong, will I/we be blamed?” This encourages shifting responsibility for providing evidence onto the applicants, as the requestors of the data – you want to do this new thing with the data, you have to show it’s safe and legal. But few researchers, data access panels or data centre staff are legal experts, and the responsibility starts its journey. All players want the same thing – confirmed safe and efficient use of data – but can’t always agree the best way of getting there.

A popular solution is that researchers are requested to go and speak to their institution’s Data Protection Officer (DPO) or legal team to decipher which legal basis fits for their use of the data. But this shifts the problem; it doesn’t solve it. Institutional guardians face the same concerns about taking responsibility. Often stock answers are copied and pasted into forms based on previous experience of what has “passed”.

If this is an academic researcher, requesting data to do academic/government sponsored research, is it worth sending them to DPOs or expensive lawyers to get the same answer as 10 researchers before them, for something the panels are likely to know the answer for? Do researchers now need to be experts in GDPR/data sharing as well as project managers, grant writers, statistical experts, public speakers and all of the other currently required skills?

Most importantly, does this encourage the data sharing community to work together to use data safely? Or is it an example of misunderstanding and division?

From the data controller/support team/access panel point of view, an obvious solution seems to be training researchers in what legal bases are and how to find out what applies. This is the “tell them what they need to do” approach. Guidance documents can be written; if the forms are not completed appropriately, this is down to the applicants not reading or using the guidance.

The trouble is that applicants and the assessors of applicants don’t necessarily have the same language, interests or understanding. To the assessor, ‘Show how this project supports organisation X’s public function’ has a clear context, purpose and meaning, and directly provides a legal basis for access. To the applicant, the question is gibberish unless she happens to be familiar with the legislation; even then, it is not clear how to answer it.

Is there another, better solution?

Pedagogical evidence shows that researchers/applicants can understand and apply complex data protection issues if couched in language and examples that have meaning for them. Instead of telling people what they need to know, decide what you need to get out of them, what they can reasonably be expected to give you that fills that need, and make it interesting and easy for them to give you that information – as Mary Poppins would say “snap, the job’s a game!”.

This encourages a more cooperative frame of mind, a more compliant researcher, a sharing rather than shedding of responsibility. It reflects a broader movement towards the ‘community’ model of data access, where emphasis is placed on shared understanding and joint responsibility rather than separation of duties/risks.

This is not straightforward. Is there a way to ask researchers to describe what they’re going to do with the data, to allow data access panels to be comfortable enough to categorise a legal basis? Could it be a joint conversation? Could a checklist be used in the first instance to help researchers understand what answers MIGHT be acceptable? Could the data centre community create and publish a consensus on what is appropriate, acceptable and will be used as standard – allowing for the inevitable exceptions that cutting-edge research brings?

The gains of a cooperative approach are procedural and personal: knowing what information can reasonably be supplied, and designing processes around that, rather than designing processes for an unachievable standard of input.

Pulling things away from the researcher may seem to place a higher burden on the assessment panel: moving from “tell me why what you are doing is lawful and ethical” to “tell me what you are doing, and I’ll decide if it is lawful and ethical”. But the burden comes in two parts, procedure and accountability, and the accountability burden never went away. The potato always stopped with the ones making the decision; shifting responsibility onto applicants to give good information doesn’t change this.

This is one small area of the application process, but across the board there are substantial gains to be made, both in the efficiency of operations, and in the confidence that both applicants and assessment panels can have in the correctness of decisions. The potato of responsibility can be made digestible.

This blog post was written by Professor Felix Ritchie who leads the Data Research, Access and Governance Network (DRAGoN) at UWE Bristol and Amy Tilbrook from the University of Edinburgh.

Welcome to the Data Research, Access & Governance Network (DRAGoN) blog

Anna Jones | 9 November 2021 / 6 September 2022

Welcome to the Data Research, Access and Governance Network (DRAGoN) blog where we will share the latest updates and projects we’re involved with.

Led by Professor Felix Ritchie, the Management Team also includes Dr Kyle Alves (Business & Management), Elizabeth Green (Economics), Dr Francesco Tava (Philosophy) and Damian Whittard (Economics). Formed in Autumn 2020, DRAGoN recognised that effective data use and governance requires contributions from many different professions: ethicists, statisticians, computer scientists, psychologists, economists, management scientists. Our aim is to create an environment for discourse which can bring differing perspectives together for the wider benefit.

The modern world is increasingly dependent on data. It is central to our lives, directly in our own experience and indirectly through the way organisations use data. Much of the data is personally confidential, at the point of collection or when combined with other data. Often the confidentiality of data is unclear: are street observations by citizen scientists confidential? Photos of one’s family on social media? Facial recognition? Automatic number plate recognition? Data used to train machine learning systems? Is ‘sensitive’ or ‘personal’ the same as ‘confidential’? The confidentiality of data has a substantial effect on the way it is managed, perceived and exploited. This spills over into the management and use of open data, or data which is confidential for other reasons, such as commercial confidentiality: ethics, public perceptions, data security can also be just as important.

Data access, management and governance is a highly applied topic; decisions are being made every day which affect our lives, our business, our government, often in ways which are obscure or known only to specialists in that area. We see the application of theory to practice as essential to the ethos of the group.

But we also need to reflect on practice: decisions about data use are often highly political, based on psychological or institutional factors. Working with practitioners helps inform our research with operational insights, as well as allowing us to challenge accepted viewpoints.

We look forward to sharing developments from this research cluster, but in the meantime you can find out more through our bi-weekly seminars by signing up to our mailing list below and following us on Twitter.