Algorithm ‘backlash’ may deter future use, says regulator
A statistics watchdog has warned that fear of a ‘backlash’ from the public may discourage public bodies from using statistical models to support decisions in future. The warning comes after the use of algorithms was perceived to be largely responsible for the exams grading fiasco in 2020. The Office for Statistics Regulation (OSR), which is the regulatory arm of the UK Statistics Authority, has reviewed the approach that was taken by governments and exam regulators across the four nations of the UK to developing models for awarding grades in key examinations last year.
When the grades determined by those models were released in August 2020, they met with widespread public dissatisfaction, and ultimately the grades in all four nations were re-issued based on teacher-assessed grades instead. The reluctance to rely on statistical models that the OSR fears may already be evident in the approach taken by the DfE this year, which, in marked contrast to 2020, emphasises teacher assessment to determine grades. Education secretary Gavin Williamson commented when announcing the new approach that ‘This year, we will put our trust in teachers rather than algorithms’.
The OSR report does identify weaknesses in the approaches taken in 2020. Despite regulators drawing on relevant expertise and conducting ‘extensive analysis’, there was ‘limited professional statistical consensus’ on the methods which were adopted. Those methods should have been subjected to scrutiny by a wider range of experts, the report suggests. The OSR also says that regulators focused too much on the aggregate results produced by the model, paying insufficient attention to the impact on individuals. Better communication by governments about the nature of the models being used and their limitations might have bolstered public confidence and understanding, it says. However, the report acknowledges that given the unprecedented situation, the time constraints and the lack of established best practice to draw upon, it would have been ‘very difficult to deliver exam grades in a way that commanded public confidence’ in 2020.
The authors also note that the approaches taken by the exam regulators in all four nations had ‘many strengths’, and that in designing their models the regulators were responding to directions and priorities set by their respective governments. For example, regulators were asked to ensure attainment gaps between different groups of students did not widen, and met this aim. Regulators, the OSR concludes, ‘worked with integrity to develop the best method in the time available’. Despite these strengths, the results delivered by the models were badly received by the public, and the OSR concludes that achieving public confidence is not just about delivering the key technical aspects of a statistical model, or having a good communications strategy, but rather about embedding considerations of public confidence throughout the process.
Other recommendations made by the report include that a comprehensive directory of guidance be created to aid government bodies which are deploying statistical models, and that guidance should also be drawn up to help public bodies test the public acceptability of their models. It also suggests that policy makers who commission statistical models need a greater understanding that such models are more than just automated processes, and should also know the strengths and limitations of any proposed model. Commenting on the report, Ed Humpherson, director general for regulation at the OSR, said: ‘Public debate about algorithms veers between blind optimism in their potential and complete rejection. Both extremes have been very evident, at different times, in the discussion of the use of models to award exam grades. Our report offers a more balanced perspective: algorithms and broader model-driven decisions can operate in the public interest. But only when there is a rounded approach that goes beyond the technical coding – and considers the public impact of the models and is open about their limitations’.
Full report: https://tinyurl.com/z7m7m8ft