by Felix Ritchie and Francesco Tava
A recent working paper discusses the ‘Five Safes’ framework for confidential data governance and management. This splits planning into a number of separate but related topics:
- safe project: is this an appropriate use of the data? Is there a public benefit, or excessive risk?
- safe people: who will be using the data? What skills do they have?
- safe setting: how will the data be accessed? Are there limits on transferring it?
- safe data: can the detail in the data be reduced without excessively limiting its usefulness?
- safe outputs: is confidentiality protected in products such as tables of statistics?
This framework has been widely adopted, particularly in government, both as a practical guide (eg this one) and as a basis for legislation (eg the UK Digital Economy Act or the South Australia data sharing legislation).
As a practical guide, however, the framework has one obvious limitation: there is no hierarchy among the ‘safes’, and they are all interrelated. So which should you put most emphasis on?
We use the Five Safes to structure courses in confidential data management. One of the exercises asks attendees to rank the safes: ‘what should we be most/least concerned with?’ The point of the exercise is not to come up with a definitive ranking, but to get the attendees to think about how different elements might matter in different circumstances.
This exercise generates much discussion. Over the years, participants have put forward good arguments for each of the Five Safes as the most important. Traditionally, and in the academic literature, Safe Data is seen as the most important: reduce the inherent risk in the data, and all your problems go away. In contrast, in the ‘user centred’ planning we now advocate (eg here), Safe People is key: know who your users are, and design ethical processes, IT systems, training and procedures for them.
This is the line we usually take when training, because we are training people to use systems which have already been designed. The aim of the training is to help people understand the community they are part of. Our views are therefore coloured by the need to work within existing systems.
Our thinking on this has been challenged by developments in Australia. The Australian federal government is proposing a cross-government data sharing strategy based on the ‘Australian Data Sharing Principles’ (ADSPs). The ADSPs are based on the Five Safes but designed as a detailed practical guide for Australian government departments looking to share data for analysis. As part of the legislative process, the Australian government has engaged in an extensive consultation since 2018, including public user groups, privacy advocates, IT specialists, the security services, lawyers, academic researchers, health services, the Information Commissioner, and the media.
Most of the concerns about data sharing arising in the consultation centre on the ‘safe project’ aspect. Questions that cropped up frequently included:
- How do we know the data sharing will be legal/appropriate/ethical?
- Who decides what is in the ‘public interest’?
- How do you prevent shared data, approved for one purpose, being passed on or re-used for another purpose without approval?
- What sort of people will we allow to use the data? Should we trust them?
- What will happen to the data once the sharing is no longer necessary? How is legacy data managed?
- Do we need to lay down detailed rules, or can we allow for flexible adherence to principles?
- Where are the checks and balances for all these processes?
These are all questions which need to be addressed at the design stage: define the project scope, users and duration, and then assess whether the likely benefits outweigh costs and reasonable risks. If this can’t be done… why would you take the project any further?
Similarly, in recent correspondence with a consulting firm, it emerged that a key part of their advice to firms on data sharing is about use: the lawfulness of the data sharing is relatively easy to establish – once you have established the uses to which that shared data will be put. Some organisations have argued that there should be an additional ‘safe’ just to highlight the legal obligations.
This is particularly pertinent for data sharing in the public sector, where organisations face continual scrutiny over the appropriate use of public money. A clear statement of purpose and net benefits at the beginning of any project can make a substantial difference to the acceptability of the project. And whilst well-designed and well-run projects tend to be ignored by people not involved, failures in public data sharing (eg Robodebt or care.data) tend to have negative repercussions far beyond the original problems.
This is not the only concern facing data holders in a digital age of multi-source data. Handling confidential data always involves costs and benefits. Traditional approaches that focus on Safe Data treat the data holder as the sole arbiter of these costs and benefits. A recent paper shows how this vision is at odds with the most recent developments in the information society we live in. Consider the use of social media in research: is any single action by the author, the distributor or the researcher sufficient in itself to establish the moral authority of an end use? In this modified context, traditional ethical notions such as individual agency and moral responsibility are gradually replaced by a framework of distributed morality, whereby multiagent systems (multiple human interactions, filtered and possibly extended by technology) are responsible for the big morally-loaded actions that take place in today’s society (see here).
In this complex scenario, taking the data holder as the only arbiter of data governance might be counterproductive, insofar as practices that are morally neutral for the data holder (for example, refusing to consider data sharing) could damage the multiagent infrastructure which that data holder is part of (eg by limiting incentives to participate). On the other hand, practices that cause minor damage to one of the agents (such as reputational risk for the data holder) could lead to major collective advantages, whose attainment would justify that minor damage and make it acceptable on a societal basis.
To minimise the risks, an innovative data management approach should look at the web of collective and societal bonds that links data owners and users. In practice, this means that decision-making on confidential data management will not be grounded in the agency and responsibility of individual agents, but will instead correspond to a balance of subjective probabilities. On these premises, focusing on the Safe Project gives pre-eminence to the notion that data should be made available for research purposes if the expected benefit to society outweighs the potential loss of privacy for the individual. The most challenging question is, of course, how to calculate this benefit, when so many of the costs and benefits are unmeasurable.
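As a purely illustrative sketch (the notation is ours, not part of the Five Safes literature or the ADSPs), that decision rule might be written as an expected-value comparison:

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Illustrative only: approve a project if and only if the expected societal
% benefit exceeds the expected privacy loss, modelled here (our assumption)
% as the probability of a disclosure event times the harm it would cause.
\[
  \text{approve} \iff
  \mathbb{E}[B_{\text{society}}] \;>\;
  \underbrace{p_{\text{disclosure}} \cdot H_{\text{privacy}}}_{\mathbb{E}[L_{\text{privacy}}]}
\]
\end{document}
```

Of course, neither side of this inequality can be estimated with any precision; the point of the formalisation is only to show where the subjective probabilities sit, not to suggest the judgement can be mechanised.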
And this is the difference between Safe Projects and the others. Safe Projects addresses the big conceptual questions. Safe People, Safe Settings and Safe Outputs are about the systems and procedures to implement those concepts, whilst Safe Data is the residual (select an appropriate level of detail once the context is defined). So rather than Five Safes, perhaps there should be One Plus Four Safes…
About the authors
Felix Ritchie is Professor of Applied Economics in the Department of Accounting, Economics and Finance.
Francesco Tava is Senior Lecturer in Philosophy in the Department of Health and Applied Social Sciences.