Australia’s bold proposals for government data sharing

By Felix Ritchie.

In August I spent a week in Australia working with the new Office of the National Data Commissioner (ONDC). The ONDC, set up at the beginning of July, is barely two months old but has already been charged with getting a whole-of-government approach to data sharing ready for legislation in early 2019.

This is a mammoth undertaking, not least because the approach set out in the ONDC’s Issues Paper proposes a new way of regulating data management. Rather than the traditional approach of trying to specify in legislation exactly what may or may not be allowed, the ONDC is proposing a principles-based approach: this focuses on setting out the objectives of any data sharing and the appropriate mechanisms by which access is governed and regulated.

In this model, the function of legislation is to provide the ground rules for data sharing and management within which operational decisions can be made efficiently. This places the onus on data managers and those wanting to share data to ensure that their solutions are demonstrably ethical, fair, appropriate and sensible. On the other hand, it also frees up planners to respond to changing circumstances: new technologies, new demands, shifts in attitudes, the unexpected…

The broad idea is not completely novel. In recent years, the principles-based approach to data management in government has increasingly come to be seen as operational best practice, allowing as it does for flexibility and efficiency in response to local conditions. It has even been brought into some legislation, including the UK’s Digital Economy Act 2017 and the European General Data Protection Regulation. Finally, the monumental Australian Productivity Commission report of 2017 laid out much of the groundwork, providing an authoritative evidence base and a detailed analysis of core concepts and options.

In pulling these strands together, the ONDC proposals move well beyond current legislation, but into territory that is well supported by evidence. Because some of the concepts will be unfamiliar, the ONDC has been carrying out an extensive consultation, some of which I was able to observe and participate in.

A key proposal is to develop five ‘Data Sharing Principles’, using the Five Safes framework (why, who, how, with what detail, with what outcomes) as the overarching structure. The Five Safes is the most widely used model for government data access, but it has only been used twice before to frame legislation: in the South Australia Public Sector (Data Sharing) Act 2016 and the UK Digital Economy Act 2017.

The most difficult issues facing the ONDC arise from the ‘why’ domain: what is the public benefit in sharing data and the concomitant risk to an individual’s privacy? How will ‘need-to-know’ for data detail be assessed? What are the mechanisms to prevent unauthorised on-sharing of data? How will shared data be managed over its lifecycle, including disposal? To what uses can shared data be put? Can data be shared for compliance purposes? How can proposals be challenged?

These are all good questions, but they are not new: any ethics or approvals board worth its salt asks similar questions, and would expect good answers before it allows data collection, sharing or analysis to proceed. A good ethics board also knows that this is not a checklist: ethical approval should be a constructive conversation to ensure a rock-solid understanding of what you’re trying to achieve and the risks you’re accepting to do so.

This is also the crux of the principles-based approach being taken by the ONDC: it is not for the law to specify how things should be done, nor to specify what data sources can be shared. But the law does provide the mechanisms to ensure that any proposals put forward can be assessed against a clear purpose test around when data may and may not be shared, and that appropriate safeguards are in place.

Finally, the law will require transparency; this has to be done in sunlight. A public body, using public money and resources for the public benefit, should be able to answer the hard questions in the public arena; otherwise, where is the accountability? The ONDC will require data sharing agreements to be publicly available, so people can see for what purpose (and with what associated protections) their data are being used.

To some, this need to justify activities on a case-by-case basis, rather than having a black-and-white yes/no rule, might seem like an extra burden. The aim of the consultation is to ensure that this isn’t the case. In fact, a transparent, multi-dimensional assessment is any project’s best friend: it provides critical input at the design stage and helps to spot gaps in planning or potential problems, as well as giving opponents a clear opportunity to raise objections.

Of course, even if the legislation is put in place, there is still no guarantee that it will turn out as planned. As I have written many times (for example in 2016), attitudes are what matter. The best legislation or regulation in the world can be derailed by individuals unwilling to accept the process. This is why the consultation process is so important. This is also why the ONDC has been charged with the broader role of changing the Australian public sector culture around data sharing, which tends to be risk-averse. The ONDC also has a role in building and maintaining public trust through better engagement with people’s concerns.

From my perspective, this is a fascinating time. The ONDC’s proposals are bold but built on a solid foundation of evidence. In theory, they propose a ground-breaking way to offer a holy trinity of flexibility, accountability, and responsibility. If the legislation ultimately reflects the initial proposals, then I suspect many other governments will be beating a path to Australia’s door.

All opinions expressed are those of the author.

Measuring non-compliance with minimum wages

By Professor Felix Ritchie

When a minimum wage is set, ensuring that employees do get at least that minimum is a basic requirement of regulators. Compliance with the minimum wage can vary wildly: amongst richer countries, around 1%-3% of wages appear to fall below the minimum, but in developing countries non-compliance rates can be well over 50%.

As might be expected, much non-compliance exists in the ‘informal’ economy: family businesses using relatives on an ad hoc basis, cash-only payments for casual work, agricultural labouring, or simply the use of illegal workers. However, there is also non-compliance in the formal economy. This is analysed by regulators using large surveys of employers and employees which collect detailed information on hours and earnings. This analysis allows them to identify broad characteristics and the overall scale of non-compliance in the economy.
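To make the calculation concrete, here is a minimal sketch of how a non-compliance rate might be derived from survey microdata. The field names, the sample records and the hourly minimum are illustrative assumptions, not the regulators’ actual methodology:

```python
# Minimal sketch: derive hourly wages from reported pay and hours,
# then flag jobs paying below the minimum. All names and values are
# illustrative assumptions, not the regulators' actual method.

MINIMUM_WAGE = 7.83  # assumed hourly minimum in GBP

survey_records = [
    {"weekly_pay": 320.00, "weekly_hours": 40.0},  # 8.00/hour: compliant
    {"weekly_pay": 290.00, "weekly_hours": 38.0},  # ~7.63/hour: non-compliant
    {"weekly_pay": 400.00, "weekly_hours": 40.0},  # 10.00/hour: compliant
]

def derived_hourly_wage(record):
    """Derive an hourly wage from reported weekly pay and hours."""
    return record["weekly_pay"] / record["weekly_hours"]

non_compliant = [r for r in survey_records
                 if derived_hourly_wage(r) < MINIMUM_WAGE]
rate = len(non_compliant) / len(survey_records)
print(f"Estimated non-compliance rate: {rate:.1%}")  # 33.3%
```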

In the UK, enforcement of the minimum wage is carried out by HM Revenue and Customs, supported by the Low Pay Commission. With 30 million jobs in the UK, and 99% of them paying at or above the minimum wage, effective enforcement means knowing where to look for infringements (for example, retail and hospitality businesses tend to pay low, but compliant, wages; personal services are more likely to pay wages below the minimum; small firms are more likely to be non-compliant than large ones, and so on). Ironically, the high rate of compliance in the UK can bring problems, as measurement becomes sensitive to the way it is calculated.

A new paper by researchers at UWE and the University of Southampton looks at how non-compliance with minimum wages can be accurately measured, particularly in high-income countries. It shows how the quantitative measurement of non-compliance can be affected by definitions, data quality, data collection methods, processing and the choice of non-compliance measure.

The paper shows that small variations in these can have disproportionate effects on estimates of the amount of non-compliance. As a case study, it analyses the earnings of UK apprentices to show, for example, that even something as simple as the number of decimal places allowed on a survey form can have a significant effect on measured non-compliance rates.
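As an illustration of the rounding point, the sketch below shows how a wage genuinely below the minimum can appear compliant once it is recorded to two decimal places. The £7.83 minimum and the wage values are invented for illustration, not figures from the paper:

```python
# Sketch of the rounding effect: the minimum rate and wage values here
# are illustrative assumptions, not figures from the paper.

MINIMUM_WAGE = 7.83

true_hourly_wage = 156.56 / 20.0          # 7.828: genuinely below the minimum
as_recorded = round(true_hourly_wage, 2)  # form only allows 2 decimal places

print(true_hourly_wage < MINIMUM_WAGE)  # True: the job is non-compliant
print(as_recorded < MINIMUM_WAGE)       # False: recorded as 7.83, looks compliant
```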

The study also throws light on the wider topic of data quality. Much research is focused on marginal analyses: looking at the relative relationships between different factors. These don’t tend to be obviously sensitive to very small variations in data quality, but that is partly because sensitive values can be harder to identify.

In contrast, non-compliance with the minimum wage is a binary outcome: a wage is either compliant or it is not. This makes tiny variations (just above or just below the line) easier to spot, compared to marginal analysis. Whilst this study focuses on compliance with the minimum wage, it highlights how an understanding of all aspects of the data collection process, including operational factors such as limiting the number of significant digits, can help to improve confidence in results.

Ritchie F., Veliziotis M., Drew H., and Whittard D. (2018) “Measuring compliance with minimum wages”. Journal of Economic and Social Measurement, vol. 42, no. 3-4, pp. 249-270. https://content.iospress.com/articles/journal-of-economic-and-social-measurement/jem448

Training Researchers to Work with Confidential Data: A New Approach

Prof Felix Ritchie of UWE’s Business School has recently spent time with the Northern Ireland Statistics and Research Agency and offers the following reflections.

I’ve just spent two days at the Northern Ireland Statistics and Research Agency (NISRA), working with them to develop training for researchers who need access to the confidential data held by NISRA for research. This training is being developed jointly by the statistical agencies of the UK (NISRA, the General Register Office for Scotland, and the Office for National Statistics in England and Wales), as well as HMRC, the UK Data Archive and academic partners. The project is being led by ONS as part of its role to accredit researchers under the new Digital Economy Act, with UWE providing key input; other statistical agencies, such as INSEE in France and the Australian Bureau of Statistics, are being consulted and are trialling some of the material.

Training researchers in the use of confidential data is common across statistical agencies around the world, particularly when those researchers need access to the most sensitive data only available through Controlled Access Facilities (CAFs). The growth in CAFs in recent years has mostly come from virtual desktops, which allow researchers to run unlimited analyses while still operating in an environment controlled by the data holder. There are now six of these in the UK, and many countries in continental Europe, North America and Oceania operate at least one. The existence of CAFs has led to an explosion in social science research: many analyses that were previously not allowed because it was too risky to send out the data (such as those using non-public business data or detailed personal data) have now become feasible and cost-effective.

All agencies running CAFs provide some training for researchers; around half of these use ‘passive’ training such as handouts or web pages, but the other half require face-to-face training. Much of this training has evolved from a programme developed at ONS in the UK in the 2000s and this training was recommended as an example of ‘best practice’ for face-to-face training by a Eurostat expert group.

However, this style of training is showing its age. Such training typically has two components: firstly how to behave in the CAFs and secondly how to prevent confidential data from mistakenly showing up in research outputs (‘statistical disclosure control’, or SDC). Both are typically taught mechanistically, in the form of dos and don’ts, explanations of laws and penalties and lots of SDC exercises. Overall the aim of the courses is to impart information to the researcher.

The new training is radically different from the old training. It starts from the premise that researchers are both the biggest risk and the biggest advantage to any CAF: the biggest risk because a poorly-trained or malcontented researcher can negate any security mechanism put in place; the biggest advantage because highly-motivated researchers mean cheaper system design, better and more robust security, and the chance for the data holder to exploit the goodwill of researchers in methodological research, for example.

In this world the main aim of the training is to encourage researchers to see themselves as part of the data community. If this can be established then the rest of the training follows as a consequence. For example, knowledge of the legal environment or SDC is shared not because it keeps you out of jail but because everyone needs to understand it so that the community as a whole works. This gives the course quite a different feel to more traditional courses: much of the day is spent in open-ended facilitated discussions exploring concepts of data access.

The training was designed from the ground up to take advantage of recent developments in thinking about data access and SDC. This was also done to avoid being restricted by having to ‘fit’ preconceived ideas about what did or did not work; material was included on its own merits, not on whether “this was what we used to do…”. For example, the previous SDC component had a large number of numerical examples, developed over many years, leading to attendees remarking on afternoons spent “doing Sudoku”. We reviewed every example to identify the minimum set of principles needing to be explored and then wrote a small number of new examples based on this minimum set. On the other hand, the previous training had relatively little to say about the context for checking outputs for confidentiality breaches; this has now been expanded, as it fits with the ethos of understanding why things are done.
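To give a flavour of the kind of principle the SDC component focuses on, here is a minimal sketch of a ‘threshold rule’ check on a frequency table: cells built on very few observations risk identifying individuals and should be suppressed or aggregated. The threshold of 10 and the example counts are illustrative assumptions, not material from the course:

```python
# Minimal sketch of a 'threshold rule' output check: frequency-table
# cells built on very few observations risk identifying individuals.
# The threshold of 10 and the counts below are illustrative assumptions.

THRESHOLD = 10  # assumed minimum cell count for safe release

# (region, industry) -> number of businesses in a hypothetical output
cell_counts = {
    ("North", "Retail"): 142,
    ("North", "Mining"): 3,   # unsafe: a reader could identify firms
    ("South", "Retail"): 87,
    ("South", "Mining"): 12,
}

unsafe_cells = {cell: n for cell, n in cell_counts.items() if n < THRESHOLD}

for cell, n in unsafe_cells.items():
    print(f"Cell {cell} has only {n} observations: suppress or aggregate")
```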

Of course, this was not all plain sailing. The original structure, trialled in June 2017, had just one presentation before being comprehensively abandoned. Modules have dropped in and out and been moved around. The initial test for the course has been completely rewritten (a topic for a later blog). Various sections have been inserted as ‘options’ to take account of regional variations in operating practices. Throughout this, multiple organisations have been able to feed into the process so that the final product itself has a sense of community ownership.

We are now at the stage of training-the-trainers to enable independent delivery around the UK. This is already generating much feedback for the future development of the course: for example, a need has arisen for ‘crib sheets’ to help in the facilitation of certain exercises. Overall, however, we are confident that we have a well-structured, informative course that meets the needs of 21st century data training.

Further reading: for more information on the evidential and conceptual basis for the course, see Ritchie F., Green E., Newman J. and Parker T. (2017) “Lessons Learned in Training ‘Safe Users’ of Confidential Data”. UNECE work session on Statistical Data Confidentiality 2017. Eurostat.