‘Five Safes’ or ‘One Plus Four Safes’? Musing on project purpose

Posted on

by Felix Ritchie and Francesco Tava

A recent working paper discusses the ‘Five Safes’ framework for confidential data governance and management. This splits planning into a number of separate but related topics:

  • safe project: is this an appropriate use of the data? Is there a public benefit, or excessive risk?
  • safe people: who will be using the data? What skills do they have?
  • safe setting: how will the data be accessed? Are there limits on transferring it?
  • safe data: can the detail in the data be reduced without excessively limiting its usefulness?
  • safe outputs: is confidentiality protected in products such as tables of statistics?
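The ‘safe data’ question above is, in practice, often answered by coarsening detail. As a purely illustrative sketch (the variable and the ten-year band width are hypothetical, not taken from any particular agency’s rules), replacing an exact age with a band might look like:

```python
def band_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band, e.g. 37 -> '30-39'."""
    lo = (age // width) * width  # lower bound of the band containing this age
    return f"{lo}-{lo + width - 1}"
```

The detail lost (ten possible years collapse into one band) is the price paid for lower re-identification risk; the framework’s question is whether that price excessively limits the data’s usefulness.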

This framework has been widely adopted, particularly in government, both as a practical guide (eg this one) and as a basis for legislation (eg the UK Digital Economy Act or the South Australia data sharing legislation).

As a practical guide, there is one obvious limitation. There is no hierarchy among the ‘safes’, and they are all interrelated; so which should you put most emphasis on?

We use the Five Safes to structure courses in confidential data management. One of the exercises asks the attendees to rank them as ‘what should we be most/least concerned with?’ The point of the exercise is not to come up with a definitive ranking, but to get the attendees to think about how different elements might matter in different circumstances.

This exercise generates much discussion. Over the years, we have had participants putting forward good arguments for each of the Five Safes as being the most important. Traditionally, and in the academic literature, Safe Data is seen as the most important: reduce the inherent risk in the data, and all your problems go away. In contrast, in the ‘user-centred’ planning we now advocate (eg here), Safe People is key: know who your users are, and design ethical processes, IT systems, training and procedures for them.

When training, this is the line we usually take, because we are training people to use systems which have already been designed. The aim of the training is to help people understand the community they are part of. Our views are therefore coloured by the need to work within existing systems.

Our thinking on this has been challenged by the developments in Australia. The Australian federal government is proposing a cross-government data sharing strategy based on the ‘Australian Data Sharing Principles’ (ADSPs). The ADSPs are based on the Five Safes but designed as a detailed practical guide to Australian government departments looking to share data for analysis. As part of the legislative process, the Australian government has engaged in an extensive consultation since 2018, including public user groups, privacy advocates, IT specialists, the security services, lawyers, academic researchers, health services, the Information Commissioner, and the media.

Most of the concerns about data sharing arising in the consultation centre on the ‘safe project’ aspect. Questions that cropped up frequently included:

  • How do we know the data sharing will be legal/appropriate/ethical?
  • Who decides what is in the ‘public interest’?
  • How do you prevent shared data, approved for one purpose, being passed on or re-used for another purpose without approval?
  • What sort of people will we allow to use the data? Should we trust them?
  • What will happen to the data once the sharing is no longer necessary? How is legacy data managed?
  • Do we need to lay down detailed rules, or can we allow for flexible adherence to principles?
  • Where are the checks and balances for all these processes?

These are all questions which need to be addressed at the design stage: define the project scope, users and duration, and then assess whether the likely benefits outweigh costs and reasonable risks. If this can’t be done… why would you take the project any further?

Similarly, in recent correspondence with a consulting firm, it emerged that a key part of their advice to firms on data sharing is about use: the lawfulness of the data sharing is relatively easy to establish – once you have established the uses to which that shared data will be put. Some organisations have argued that there should be an additional ‘safe’ just to highlight the legal obligations.

This is particularly pertinent for data sharing in the public sector, where organisations face continual scrutiny over the appropriate use of public money. A clear statement of purpose and net benefits at the beginning of any project can make a substantial difference to the acceptability of the project. And whilst well-designed and well-run projects tend to be ignored by people not involved, failures in public data sharing (eg Robodebt or care.data) tend to have negative repercussions far beyond the original problems.

This is not the only concern facing data holders in a digital age of multi-source data. Handling confidential data always involves costs and benefits. Traditional approaches that focus on Safe Data identify the data holder as the relevant arbiter of these costs and benefits. A recent paper shows how this vision is at odds with the most recent developments in the information society we live in. Consider the use of social media in research: is any single action by the author, the distributor or the researcher sufficient in itself to establish the moral authority of an end use? In this modified context, traditional ethical notions such as individual agency and moral responsibility are gradually substituted by a framework of distributed morality, whereby multiagent systems (multiple human interactions, filtered and possibly extended by technology) are responsible for the big, morally loaded actions that take place in today’s society.

In this complex scenario, taking the data holder as the only arbiter of data governance might be counterproductive, insofar as practices that are morally neutral for the data holder (for example, refusing to consider data sharing) could damage the multiagent infrastructure that the data holder is part of (eg by limiting incentives to participate). On the other hand, practices that cause minor damage to one of the agents (such as reputational risk for the data holder) could lead to major collective advantages, whose attainment would justify that minor damage and make it acceptable on a societal basis.

In order to minimise the risks, an innovative data management approach should look at the web of collective and societal bonds that links together data owners and users. In practice, this means that decision-making on confidential data management will not be grounded in the agency and responsibility of individual agents, but will rather correspond to a balance of subjective probabilities. On these premises, focusing on the Safe Project foregrounds the principle that data should be made available for research purposes if the expected benefit to society outweighs the potential loss of privacy for the individual. The most challenging question is, of course, how to calculate this benefit, when so many of the costs and benefits are unmeasurable.

And this is the difference between Safe Projects and the others. ‘Safe projects’ addresses the big conceptual questions. Safe people, safe settings and safe outputs are about the systems and procedures to implement those concepts, whilst Safe Data is the residual (select an appropriate level of detail once the context is defined). So rather than Five Safes perhaps there should be One Plus Four Safes…

About the authors

Felix Ritchie is Professor of Applied Economics in the Department of Accounting, Economics and Finance.

Francesco Tava is Senior Lecturer in Philosophy in the Department of Health and Applied Social Sciences.

Australia’s bold proposals for government data sharing

Posted on

By Felix Ritchie.

In August I spent a week in Australia working with the new Office of the National Data Commissioner (ONDC). The ONDC, set up at the beginning of July, is barely two months old but has been charged with the objective of getting a whole-of-government approach to data sharing ready for legislation early in 2019.

This is a mammoth undertaking, not least because the approach set out in the ONDC’s Issues Paper proposes a new way of regulating data management. Rather than the traditional approach of trying to specify in legislation exactly what may or may not be allowed, the ONDC is proposing a principles-based approach: this focuses on setting out the objectives of any data sharing and the appropriate mechanisms by which access is governed and regulated.

In this model, the function of legislation is to provide the ground rules for data sharing and management within which operational decisions can be made efficiently. This places the onus on data managers and those wanting to share data to ensure that their solutions are demonstrably ethical, fair, appropriate and sensible. On the other hand, it also frees up planners to respond to changing circumstances: new technologies, new demands, shifts in attitudes, the unexpected…

The broad idea of this is not completely novel. In recent years, the principles-based approach to data management in government has increasingly come to be seen as operational best practice, allowing as it does for flexibility and efficiency in response to local conditions. It has even been brought into some legislation, including the UK’s Digital Economy Act 2017 and the European General Data Protection Regulation. Finally, the monumental Australian Productivity Commission report of 2017 laid out much of the groundwork, by providing an authoritative evidence base and a detailed analysis of core concepts and options.

In pulling these strands together, the ONDC proposals move well beyond current legislation but into territory which is well supported by evidence. Because of the unfamiliarity with some of the concepts, the ONDC has been carrying out an extensive consultation, some of which I was able to observe and participate in.

A key proposal is to develop five ‘Data Sharing Principles’, based on the Five Safes framework (why, who, how, with what detail, with what outcomes) as the overarching structure. The Five Safes is the most widely used model for government data access but has only been used twice before to frame legislation, in the South Australia Public Sector (Data Sharing) Act 2016 and the UK Digital Economy Act 2017.

The most difficult issues facing the ONDC arise from the ‘why’ domain: what is the public benefit in sharing data and the concomitant risk to an individual’s privacy? How will ‘need-to-know’ for data detail be assessed? What are the mechanisms to prevent unauthorised on-sharing of data? How will shared data be managed over its lifecycle, including disposal? To what uses can shared data be put? Can data be shared for compliance purposes? How can proposals be challenged?

These are all good questions, but they are not new: any ethics or approvals board worth its salt asks similar questions, and would expect good answers before it allows data collection, sharing or analysis to proceed. A good ethics board also knows that this is not a checklist: ethical approval should be a constructive conversation to ensure a rock-solid understanding of what you’re trying to achieve and the risks you’re accepting to do so.

This is also the crux of the principles-based approach being taken by the ONDC: it is not for the law to specify how things should be done, nor to specify what data sources can be shared. But the law does provide the mechanisms to ensure that any proposals put forward can be assessed against a clear purpose test around when data may and may not be shared, and that appropriate safeguards are in place…

Finally, the law will require transparency; this has to be done in sunlight. A public body, using public money and resources for the public benefit, should be able to answer the hard questions in the public arena; otherwise, where is the accountability? The ONDC will require data sharing agreements to be publicly available, so people can see for what purpose (and with what associated protections) their data are being used.

To some, this need to justify activities on a case-by-case basis, rather than having a black-and-white yes/no rule, might seem like an extra burden. The aim of the consultation is to ensure that this isn’t the case. In fact, a transparent, multi-dimensional assessment is any project’s best friend: it provides critical input at the design stage and helps to spot gaps in planning or potential problems, as well as giving opponents a clear opportunity to raise objections.

Of course, even if the legislation is put in place, there is still no guarantee that it will turn out as planned. As I have written many times (for example in 2016), attitudes are what matter. The best legislation or regulation in the world can be derailed by individuals unwilling to accept the process. This is why the consultation process is so important. This is also why the ONDC has been charged with the broader role of changing the Australian public sector culture around data sharing, which tends to be risk-averse. The ONDC also has a role to build and maintain trust with the public through better engagement to hear their concerns.

From my perspective, this is a fascinating time. The ONDC’s proposals are bold but built on a solid foundation of evidence. In theory, they propose a ground-breaking way to offer a holy trinity of flexibility, accountability, and responsibility. If the legislation ultimately reflects the initial proposals, then I suspect many other governments will be beating a path to Australia’s door.

All opinions expressed are those of the author.

Measuring non-compliance with minimum wages

Posted on

By Professor Felix Ritchie

When a minimum wage is set, ensuring that employees do get at least that minimum is a basic requirement of regulators. Compliance with the minimum wage can vary wildly: amongst richer countries, around 1%-3% of wages appear to fall below the minimum but in developing countries non-compliance rates can be well over 50%.

As might be expected, much non-compliance exists in the ‘informal’ economy: family businesses using relatives on an ad hoc basis, cash-only payments for casual work, agricultural labouring, or simply the use of illegal workers. However, there is also non-compliance in the formal economy. This is analysed by regulators using large surveys of employers and employees which collect detailed information on hours and earnings. This analysis allows them to identify broad characteristics and the overall scale of non-compliance in the economy.

In the UK, enforcement of the minimum wage is carried out by HM Revenue and Customs, supported by the Low Pay Commission. With 30 million jobs in the UK, and 99% of them paying at or above the minimum wage, effective enforcement means knowing where to look for infringements (for example, retail and hospitality businesses tend to pay low, but compliant, wages; personal services are more likely to pay low wages below the minimum; small firms are more likely to be non-compliant than large ones, and so on). Ironically, the high rate of compliance in the UK can bring problems, as measurement becomes sensitive to the way it is calculated.

A new paper by researchers at UWE and the University of Southampton looks at how non-compliance with minimum wages can be accurately measured, particularly in high-income countries. It shows how the quantitative measurement of non-compliance can be affected by definitions, data quality, data collection methods, processing and the choice of non-compliance measure.

The paper shows that small variations in these can have disproportionate effects on estimates of the amount of non-compliance. As a case study, it analyses the earnings of UK apprentices to show, for example, that even something as simple as the number of decimal places allowed on a survey form can have a significant effect on the non-compliance rates.
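The paper itself is the authority on these measurement effects, but the decimal-places point can be illustrated with a toy calculation (the pay figures and the minimum rate below are invented for illustration, not drawn from the paper): an hourly rate derived from weekly pay and hours may fall just below the minimum, yet become compliant once recorded to the two decimal places a survey form allows.

```python
MIN_WAGE = 7.83  # hypothetical hourly minimum, for illustration only

def is_compliant(weekly_pay, weekly_hours, decimals=None):
    """Classify a derived hourly rate against the minimum, optionally
    rounding it first (as a survey form with limited decimal places would)."""
    rate = weekly_pay / weekly_hours
    if decimals is not None:
        rate = round(rate, decimals)
    return rate >= MIN_WAGE

# 293.60 / 37.5 = 7.8293... is below 7.83 (non-compliant),
# but recorded to two decimal places it becomes 7.83 (compliant).
```

The same job is counted as non-compliant or compliant purely because of how the survey form stores the number, which is exactly why such small measurement choices can have disproportionate effects on estimated non-compliance rates.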

The study also throws light on the wider topic of data quality. Much research is focused on marginal analyses: looking at the relative relationships between different factors. These don’t tend to be obviously sensitive to very small variations in data quality, but that is partly because it can be harder to identify sensitive values.

In contrast, non-compliance with the minimum wage is a binary outcome: a wage is either compliant or it is not. This makes tiny variations (just above or just below the line) easier to spot, compared to marginal analysis. Whilst this study focuses on compliance with the minimum wage, it highlights how an understanding of all aspects of the data collection process, including operational factors such as limiting the number of significant digits, can help to improve confidence in results.

Ritchie F., Veliziotis M., Drew H., and Whittard D. (2018) “Measuring compliance with minimum wages”. Journal of Economic and Social Measurement, vol. 42, no. 3-4, pp. 249-270. https://content.iospress.com/articles/journal-of-economic-and-social-measurement/jem448

Training Researchers to Work with Confidential Data: A New Approach

Posted on

Prof Felix Ritchie of UWE’s Business School has recently spent time with the Northern Ireland Statistics and Research Agency and makes the following analysis.

I’ve just spent two days at the Northern Ireland Statistics and Research Agency (NISRA), working with them to develop training for researchers who need access to the confidential data held by NISRA for research. This training is jointly being developed by the statistical agencies of the UK (NISRA, the General Register Office for Scotland, and the Office for National Statistics in England and Wales), as well as HMRC, the UK Data Archive and academic partners. The project is being led by ONS as part of its role to accredit researchers under the new Digital Economy Act, with UWE providing key input; other statistical agencies, such as INSEE in France and the Australian Bureau of Statistics, are being consulted and are trialling some of the material.

Training researchers in the use of confidential data is common across statistical agencies around the world, particularly when those researchers need access to the most sensitive data only available through Controlled Access Facilities (CAFs). The growth in CAFs in recent years has mostly come from virtual desktops which allow researchers to run unlimited analyses while still operating in an environment controlled by the data holder. There are now six of these in the UK, and many countries in continental Europe, North America and Oceania operate at least one. The existence of CAFs has led to an explosion in social science research as many things that were not previously allowed because it was too risky to send out data (such as use of non-public business data, or detailed personal data) have now become feasible and cost-effective.

All agencies running CAFs provide some training for researchers; around half of these use ‘passive’ training such as handouts or web pages, but the other half require face-to-face training. Much of this training has evolved from a programme developed at ONS in the UK in the 2000s and this training was recommended as an example of ‘best practice’ for face-to-face training by a Eurostat expert group.

However, this style of training is showing its age. Such training typically has two components: firstly how to behave in the CAFs and secondly how to prevent confidential data from mistakenly showing up in research outputs (‘statistical disclosure control’, or SDC). Both are typically taught mechanistically, in the form of dos and don’ts, explanations of laws and penalties and lots of SDC exercises. Overall the aim of the courses is to impart information to the researcher.

The new training is radically different from the old training. It starts from the premise that researchers are both the biggest risk and the biggest advantage to any CAF: the biggest risk because a poorly-trained or malcontented researcher can negate any security mechanism put in place; the biggest advantage because highly-motivated researchers mean cheaper system design, better and more robust security and the chance for the data holder to exploit the goodwill of researchers in methodological research, for example.

In this world the main aim of the training is to encourage the researcher to see himself or herself as part of the data community. If this can be established then the rest of the training follows as a consequence. For example, knowledge of the legal environment or SDC is shared not because it keeps you out of jail but because everyone needs to understand this so the community as a whole works. This gives the course quite a different feel to more traditional courses: much of the day is spent in open-ended facilitated discussions exploring concepts of data access.

The training was designed from the ground up in order to take advantage of recent developments in thinking about data access and SDC. This was also done to avoid being restricted by having to ‘fit’ preconceived ideas about what worked or not; material was included on its own merits, not whether “this was what we used to do…”. For example, the previous SDC component had a large number of numerical examples, developed over many years, leading to attendees remarking on afternoons spent “doing Sudoku”. We reviewed every example to identify the minimum set of principles needing to be explored and then wrote a small number of new examples based on this minimum set. On the other hand, the previous training had relatively little to say about the context for checking outputs for confidentiality breaches; this has now been expanded as it fits with the ethos of understanding why things are done.
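One of the classic principles those SDC exercises explore is the minimum cell-count (threshold) rule for frequency tables: any published cell describing too few people risks identifying them. As a minimal, hypothetical sketch (the threshold of 10 is illustrative only; real rules vary by agency and data type), an automated first pass over a table might be:

```python
THRESHOLD = 10  # illustrative minimum cell count; real rules vary by agency

def unsafe_cells(table):
    """Return (row, col) positions of non-zero cells below the threshold,
    i.e. counts small enough to risk identifying individuals."""
    return [(r, c)
            for r, row in enumerate(table)
            for c, n in enumerate(row)
            if 0 < n < THRESHOLD]
```

For example, `unsafe_cells([[12, 3], [0, 25]])` flags only the cell containing 3. A mechanical check like this is just the starting point; the course’s emphasis is on understanding why the rule exists, so that researchers can reason about the cases the rule alone does not catch.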

Of course, this was not all plain sailing. The original structure, trialled in June 2017, had just one presentation before being comprehensively abandoned. Modules have dropped in and out and been moved around. The initial test for the course has been completely rewritten (a topic for a later blog). Various sections have been inserted as ‘options’ to take account of regional variations in operating practices. Throughout this, multiple organisations have been able to feed into the process so that the final product itself has a sense of community ownership.

We are now at the stage of training-the-trainers to enable independent delivery around the UK. This is already generating much feedback for the future development of the course: for example, a need has arisen for ‘crib sheets’ to help in the facilitation of certain exercises. Overall, however, we are confident that we have a well-structured, informative course that meets the needs of 21st century data training.

Further reading: for more information on the evidential and conceptual basis for the course, see Ritchie F., Green E., Newman J. and Parker T. (2017) “Lessons Learned in Training ‘Safe Users’ of Confidential Data”. UNECE work session on Statistical Data Confidentiality 2017. Eurostat.