Using Randomised Control Trials to Evaluate Public Policy

[Originally published on the Australian Government Public Sector Innovation Network under a Creative Commons 3.0 BY AU licence]

How can we in the public service best assess whether a specific intervention has worked – or has worked better than what is already available and is also cost-effective?

On Thursday 31 January, the Department of Industry, Innovation, Science, Research and Tertiary Education and the Department of Education, Employment and Workplace Relations hosted a joint workshop looking at how randomised control trials can be used to evaluate public policy. We were joined at the workshop by two expert facilitators.

What is a randomised control trial (RCT)? An example might be where a new program is being introduced and there is a pool of possible candidates that could be covered by this new initiative. An RCT might randomly divide the possible candidates into two groups – those that will have the new program apply to them, and another group that will continue with the status quo. The progress of the two groups will then be monitored, and the effectiveness of the program assessed by comparing their results.
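The two-group design described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`assign_groups`, `estimated_effect`), the candidate labels and the even 50/50 split are assumptions made for the example, not anything presented at the workshop.

```python
import random
import statistics


def assign_groups(candidates, seed=0):
    """Randomly split candidates into a program group and a status quo group.

    A fixed seed makes the (hypothetical) allocation reproducible and auditable.
    """
    rng = random.Random(seed)
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(program_outcomes, status_quo_outcomes):
    """Simplest possible estimate: difference in mean outcomes between groups."""
    return statistics.mean(program_outcomes) - statistics.mean(status_quo_outcomes)


# Hypothetical pool of candidates for the new program.
candidates = [f"candidate_{i}" for i in range(10)]
program_group, status_quo_group = assign_groups(candidates, seed=42)
```

Once both groups have been monitored, `estimated_effect` would be called with their measured outcomes. A real evaluation would also report the uncertainty around that difference, which this sketch leaves out.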

In this way an RCT can address an issue common to many evaluations: it gives you evidence not only of what happens with the intervention, but also of what happens without it. It also helps overcome biases (intentional and unintentional) that can affect other forms of evaluation – such as in the selection of comparison groups. To some, an RCT can be seen as the highest or best form of evidence.

The workshop discussion revealed that there are many ways this process can be applied to take account of different scenarios and factors. Our facilitators presented some examples from their own experience where RCTs provided valuable evidence (and sometimes unexpected results) about the value of an intervention.

Some of the specific issues that were covered in the discussion included:

    • Sometimes it is possible to do a quasi-experiment where you start with an existing program and then find a control group (people who were not covered by the introduction of the program because of an unrelated factor)
    • A key advantage of randomisation of participants is that it allows you to control for both observable and non-observable characteristics
    • It also allows for the testing of a sub-component of the program. A standard after-the-fact evaluation usually only allows you to consider the effect of the whole program, not specific elements in isolation
    • RCTs are easy to do only if you know what you are doing. It is valuable to have a pilot phase for the RCT and then scale up, as it allows you to identify and then address potential issues that might arise
    • Therefore the design of an RCT requires a fair degree of investment and rigour. It should be considered early on in the design of not just the evaluation, but of the intervention/initiative to be evaluated
    • An RCT is not always the most appropriate form of evaluation – for instance it may not be cost-effective or there may be enough pre-existing knowledge about what does or does not work
    • RCTs can help provide more precise and relevant answers to the questions underpinning the specific intervention – so not just whether something works, but also the factors that lead to something working, which can then allow you to ask why that might be so
    • While there can be important ethical considerations, it should be remembered that the point of the exercise is to find out whether something does work. There may be just as many ethical risks in including someone in the test group as there are in leaving them out
    • RCTs can be expensive – but so can introducing a program and then keeping it going without really knowing whether it works as hoped
    • There can be some difficult methodological questions – in some cases the program may not be the major intervention, so it can be difficult to assess whether it is the thing making the difference, or there may be difficulty in capturing evidence about the performance of the non-participants in the control group. For these and other reasons, using an RCT for a specific intervention should be considered on a case-by-case basis (though that is good advice with any form of evaluation).
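To illustrate the point in the list above that randomisation controls for non-observable characteristics, here is a small simulation sketch in Python. Everything in it is an assumption made up for illustration: the unobserved trait (called `motivation` here), the outcome formula and the true effect of 2.0 do not come from the workshop. It compares the estimate from a comparison group formed by self-selection with the estimate from one formed by a coin flip.

```python
import random
import statistics


def simulate(n=10_000, true_effect=2.0, seed=1):
    """Return (self-selected estimate, randomised estimate) of a program effect."""
    rng = random.Random(seed)
    # Hypothetical trait the evaluator cannot observe, but which raises outcomes.
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(m, in_program):
        # Assumed outcome model: baseline + trait + program effect + noise.
        return 5.0 + 3.0 * m + (true_effect if in_program else 0.0) + rng.gauss(0, 1)

    def diff_in_means(flags):
        ys = [outcome(m, f) for m, f in zip(motivation, flags)]
        program = [y for y, f in zip(ys, flags) if f]
        comparison = [y for y, f in zip(ys, flags) if not f]
        return statistics.mean(program) - statistics.mean(comparison)

    # Self-selection: the more motivated people opt in to the program.
    self_selected = [m > 0 for m in motivation]
    # Randomisation: allocation is unrelated to motivation.
    randomised = [rng.random() < 0.5 for _ in motivation]

    return diff_in_means(self_selected), diff_in_means(randomised)


naive_estimate, rct_estimate = simulate()
```

Under these assumptions the self-selected comparison overstates the effect, because the program group was more motivated to begin with, while the randomised comparison lands close to the true effect without the trait ever being measured.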

It was a very interesting discussion and I found it valuable in getting a better understanding of how RCTs might apply in the public sector, where any one intervention can be operating amid a mix of many other factors.

One of the questions I have remaining after the workshop is how RCTs might best fit with innovation and design. Innovation and design require a lot of iteration and evolution (a specific initiative may start out being less effective than what is already in place) – so does that mean it would be best to only apply an RCT once the innovation is more developed? But then that might complicate the options for a control group. And what if the innovation is a system-level intervention – e.g. you either have the intervention or not, and it is not possible to divide the system into two parts or to quarantine the intervention from other parallel systems?

Overall though, it is easy to see that there are a number of situations where RCTs might be a better form of evaluation than what has previously been used.

If you’d like to find out more about RCTs, the papers from the workshop included:

And there were also some case study papers:

Also provided are copies of the write-up of the workshop and of the presentations:

  • Workshop presentations (please note that the content of these presentations is not covered by Commonwealth Copyright or Creative Commons)