More prejudicial than probative?
The past decade has witnessed the irresistible rise of violence risk assessment. Calls for risk assessment reports are more frequent than ever before: from minor cases of domestic breaches of the peace, through all indictable cases of sexual offending, to the ultimate in risk assessments – in terms of time, cost and detail – those prepared in relation to orders of lifelong restriction (Criminal Procedure (Scotland) Act 1995, s 210B).
The increasing pervasiveness of violence risk assessments raises several questions: are they credible; how much weight should the decision-maker place on such assessments; indeed, what challenges should defence agents pose? In this brief paper I want to focus specifically on the use of actuarial risk scales, including the Risk Matrix 2000, Stable and Acute 2007, the LSI-R and the Static-99 – names with which courts are becoming increasingly familiar.
The use of these actuarial procedures is perhaps surprising, given that both Government reports and professional best practice guidelines support the use of different approaches, based on structured professional judgment not on actuarial methods.1-4 However, surprise may be lessened when it is considered that they are quick to use and require relatively little training. They are attractive to organisations under pressure to respond to the burgeoning demand for risk assessments.
The actuarial paradigm is apparently straightforward. A group of offenders, usually prisoners, is assessed, often in terms of characteristics that are easy to measure (age, marital status, history of offending, type of victims etc); they are followed up and new criminal convictions identified from criminal records. Statistical methods are applied to link the assessed characteristics to the observed probability of reconviction. This information about group relationships is used to make a prognostication about a new individual: guidance is given to decision-makers about his likelihood of reoffending.
The use of actuarial tests is now so pervasive that their validity appears to have been accepted; they are not subject to sufficient challenge. A further concern is that poor practice based on actuarial scales will devalue the currency of properly conducted assessments of violence risk using the structured professional judgment approach, as recommended by both the Cosgrove and MacLean reports2;5. As will be seen, it is my opinion that the application of actuarial tests to make decisions in the individual case is more prejudicial than probative.
A disquieting case
My disquiet about actuarial approaches was confirmed by a referral from Glasgow Sheriff Court. A social worker opined that the individual accused of a sexual offence was “high risk”; a psychologist opined he was “low risk”. I suspect the sheriff was bemused: he contacted me. The social worker had applied the RM2000 correctly and in line with the manual; he implied that “high risk” equated to a likelihood of sexual reoffending of 26% over five years and 36% over 15 years, figures that would raise concern. How valid were these figures?
There are two stages in calculating the level of risk using RM2000. First, three risk factors are considered: the individual’s age at commencement of risk, his number of court appearances for sexual offences, and the number of appearances for other types of offences. The accused scored zero on the latter two items, but he obtained a positive score merely because he was aged between 18 and 24. On that basis he was assessed as “medium risk” for sexual reoffending.
At the second stage, “aggravating factors” are considered. These include convictions for a contact offence against a male, convictions for a sex offence against a stranger, any convictions for a non-contact sex offence, and finally, whether the accused is single, i.e. has never lived with an adult partner for at least two years. Two of these aggravating factors applied; in terms of RM2000 this is sufficient to raise the risk category by one level, i.e. “medium risk” becomes “high risk”.
First, the court heard, and accepted, that while the accused had not met the victim before, she had pursued him by text and phone messages for four or five weeks prior to the offence; nonetheless, she was deemed a stranger victim. She consented to sexual intercourse and as she was only 12 years and 11 months at the time, he committed an offence. Secondly, the accused, a 19-year-old student, had not married or cohabited for two years or more, i.e. he was deemed to have difficulty forming intimate relationships. This seems tenuous: it is not usual for 19-year-old students to have cohabited for two years or more; indeed, the contrary is the case.
This evaluation appeared to me to confirm Mencken’s observation “There is a simple solution to every human problem – neat, plausible and wrong.” The conclusion that this accused posed a “high risk” was based on three pieces of information. In my experience people are more complex than that – as are the risks that they pose.
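How little information drives the final label can be seen by writing the second stage out. The following is a minimal sketch in Python, based only on the description given in this paper and not on the RM2000 manual itself: the function name and the data layout are hypothetical, and the manual's full rules (for instance, what happens when more than two factors apply) are not reproduced.

```python
# Risk categories in ascending order, as named in this paper.
CATEGORIES = ["low", "medium", "high", "very high"]


def apply_aggravating_factors(stage_one_category, aggravating_count):
    """Raise the stage-one category by one level when two or more
    aggravating factors apply (simplified from the text above)."""
    index = CATEGORIES.index(stage_one_category)
    if aggravating_count >= 2:
        index = min(index + 1, len(CATEGORIES) - 1)
    return CATEGORIES[index]


# The two factors held to apply in the case described above.
case_factors = {
    "contact offence against a male": False,
    "sex offence against a stranger": True,
    "non-contact sex offence": False,
    "never lived with an adult partner for two years": True,
}

result = apply_aggravating_factors("medium", sum(case_factors.values()))
print(result)
```

Setting either of the two contested factors to False leaves the accused at “medium risk”, which shows how much of the final category rests on two yes/no judgments.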
Actuarial approaches: a false analogy
Actuarial methods are compelling because they appear to be scientific, they are based on data, they are based on statistical analyses, and their product is a number. Unfortunately, this appearance of science is very misleading. There are at least three lines of argument challenging the utility of these devices for making prognostications about an accused individual: logical, statistical and empirical.
The (il)logic of actuarial approaches
From the logical perspective the reasoning inherent in the actuarial approach commits the fallacy of division.6 This fallacy rests on drawing a conclusion about an individual member of a group based on the collective properties of that group. For example, it is obviously fallacious to argue that if, in general, intelligent people earn more than less intelligent people, then Jules, with an IQ of 120, will earn more than Jim with an IQ of 100. Equally, it is fallacious to argue that since people who score highly on an actuarial risk scale generally reoffend more than people who do not score highly, Bill in the “high risk” group will reoffend more often – or more quickly – than Brian in the “low risk” group.
A common defence of the actuarial approach is founded upon a related fallacy: “If it is alright for life insurance companies, it should be alright for psychology.” Indeed, a sheriff made such a remark during one of my lectures on risk assessment. The analogy is false. The actuary makes a profit by predicting the proportion of insured lives that will end in a particular time period; they have no interest in predicting the deaths of particular individuals. By way of contrast, the decision maker in court is only interested in the accused in front of them, not the properties of any statistical group from which they may be derived. This has been long recognised; Sherlock Holmes knew it!7
The illusion of certainty
Numerical statements, e.g. there is a 36% likelihood that this individual will reoffend sexually in the next 15 years, are powerful. Numbers stick in the mind. It is difficult for the decision maker to disregard them and alter their evaluation based even on detailed, credible and contradictory information. This is the anchoring bias – a well-established cognitive bias that influences all human judgment. Judges in court are not immune.8;9
This problem is compounded by the tendency to predict rather than forecast, to provide a single value of the likelihood that someone will offend, without any indication of the confidence that should be placed on that single value, such as the range of possible values which that likelihood may take. Is the range narrow or wide? Deterministic predictions create the illusion of certainty in the judge’s mind and may lead to sub-optimal action: a lenient sentence when more control is required, or equally, a disproportionate sentence when such is not required.
It is possible to use statistical methods to quantify the degree of (un)certainty that is associated with any estimate, and this includes predictions. Unfortunately, the manuals for actuarial scales generally do not provide the information necessary to determine uncertainty.
The problem of making predictions for individuals using statistical models is now recognised in other disciplines: it is not merely a function of the complexity of assessing their psychological characteristics. (See Rose13, p 48 in relation to medical risks.) As Stephen Hawking wryly observed, “Thirty years ago I was diagnosed with motor neurone disease, and given two and a half years to live. I have always wondered how they could be so precise about the half.” (It would be interesting to know whether those making decisions concerning the release of Mr al-Megrahi appreciated this uncertainty.)
This view that we cannot predict for individuals has been regarded as controversial in the field of violence risk.12 A thought experiment using a non-psychological example may clarify the point. If I tell you the height of the next man to enter the court, how accurately can you predict his weight? The precision of the measurement of height and weight should be substantially greater than for the measurement of risk factors for violence or violent reoffending; and the correlation between the two is stronger. The prediction is also immediate and not degraded by the passage of time, as is the case with some actuarial scales. This should make prediction easier. We have shown elsewhere that for Scottish men whose height is 1.7m the best estimate is 78 kg; however, the prediction interval (the range within which 95% of men will lie) is between 61 and 95 kg.11 Thus predicting the weight of the next individual into the court based on knowledge of his height is a hit or miss activity. Therefore, how can high precision be expected in predictions about complex and changing risk potential over many years to come?
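The gap between knowing a group average precisely and predicting one man's weight can be reproduced in a few lines. The following is a minimal sketch in Python using synthetic data, not the Scottish data referred to above: the sample size, the slope of 60 kg per metre and the residual spread of 8.7 kg are assumptions chosen only so that the result resembles the figures quoted, and the helper names are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    residual_sd = math.sqrt(rss / (n - 2))
    return a, b, residual_sd, x_mean, sxx


def intervals(x0, xs, ys, z=1.96):
    """Return the fitted value at x0, the 95% confidence interval for
    the group mean, and the 95% prediction interval for one new person.
    z = 1.96 is a large-sample stand-in for the exact t quantile."""
    a, b, s, x_mean, sxx = fit_line(xs, ys)
    n = len(xs)
    fit = a + b * x0
    leverage = 1 / n + (x0 - x_mean) ** 2 / sxx
    ci_half = z * s * math.sqrt(leverage)      # uncertainty about the mean
    pi_half = z * s * math.sqrt(1 + leverage)  # plus individual scatter
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Synthetic sample: every number here is an assumption for illustration.
rng = random.Random(0)
heights = [rng.gauss(1.75, 0.07) for _ in range(2000)]
weights = [78 + 60 * (h - 1.70) + rng.gauss(0, 8.7) for h in heights]

fit, ci, pi = intervals(1.70, heights, weights)
print(f"best estimate: {fit:.1f} kg")
print(f"95% confidence interval (group mean): {ci[0]:.1f} to {ci[1]:.1f} kg")
print(f"95% prediction interval (individual): {pi[0]:.1f} to {pi[1]:.1f} kg")
```

Collecting more data shrinks the confidence interval towards zero width, but the prediction interval stays close to plus or minus two residual standard deviations, because the variation between individuals does not go away.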
If we return to the disquieting case, the accused was said to be in the “high risk” group, with an estimated 26% probability of reconviction within five years. The 95% confidence interval, which is concerned with the average for a group, can be conservatively estimated to be between 19% and 34% (see Hart et al10 for a description of a method). However, to assess the confidence about the probability of reconviction for an individual not in the development sample requires the prediction interval. For this case, the prediction interval was conservatively estimated as lying between 2% and 88%.11 None of the manuals for the actuarial scales provides this information; indeed, many actuarialists do not appear to appreciate the relevance of this consideration.12
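The group-level interval depends heavily on how many offenders were in the risk category when the scale was developed. The following is a minimal sketch in Python of the Wilson score interval, one common way of placing a 95% confidence interval around an observed proportion; it is offered as an illustration and is not necessarily the method described by Hart et al. The group sizes are hypothetical, since the size of the “high risk” development group is not given in this paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion
    (approximately 95% coverage when z = 1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical group sizes, each with 26% of the group reconvicted.
for n, reconvicted in ((100, 26), (1000, 260)):
    low, high = wilson_interval(reconvicted, n)
    print(f"n={n}: {low:.3f} to {high:.3f}")
```

A larger group narrows this interval, but it remains a statement about the proportion of the group reconvicted. It does not narrow the uncertainty about whether any one member of that group will be reconvicted.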
In other areas of life it has been long recognised that forecasts should entail an estimate of the degree of certitude that the forecaster holds about their prognostication.14;15 From a scientific and professional perspective it is more honest to communicate the degree of (un)certainty with which the expert holds their opinion. This assists the decision maker to make rational decisions about the management of any risk. Relevant information about uncertainty is not made available for any of the actuarial scales in common use.
Actuarial risk assessments as screening tools
Within Scotland and beyond, actuarial instruments are becoming institutionalised. Under multi-agency public protection arrangements (MAPPA), police officers and social workers, for example, are being trained in the use of RM2000. A growing scepticism amongst certain practitioners may have led to a shift in position: “we only use the actuarial as a screen”. This sounds amiable, tolerant, and evenhanded; unfortunately there is no compelling empirical evidence to support such a use.
Perhaps alarmingly, despite the clear limitations of actuarial approaches, the Risk Management Authority (RMA) argues for the use of RM2000 as a screening tool for the Scottish population of sexual offenders to identify those who require further (and state-of-the-art) risk assessment. “The RMA continues to work with the Scottish Government in supporting and developing an integrated multidisciplinary approach to risk assessment in which the RM2000 plays a useful role as a screening instrument” (RMA, 2007; www.rmascotland.gov.uk/ViewFile.aspx?id=363).
There are two problems with this position. First, in practice this rarely happens: the social worker and police officer do not have the time – nor probably the training – to provide the systematic risk assessment required if the offender is caught in the screen. The decision maker in court is provided with the results of the actuarial scale without any consideration of certitude or risk formulation.
Secondly, and perhaps more critically, what is the scientific credibility of this position? Has it been demonstrated that these instruments are effective screens? The contrary is the case. Screens are used in medicine in asymptomatic individuals to identify the risk of future disease. It is not generally appreciated that to be effective as a screen, risk factors (or sets of risk factors) must be very strongly associated with the disorder being screened for.16 The best calculation that can be achieved for RM2000 gives a result several orders of magnitude below that which is required for an efficient screen.
To evaluate the effectiveness of a screening tool, it is necessary to compare the relationships between the distributions of the risk factors, e.g. RM2000 scores, for those who reoffend and those who do not. To the best of my knowledge this has not been done. Regrettably a request for access to the data derived from publicly funded research, in order to carry out these and other relevant analyses, has been declined. It is perhaps noteworthy that of the four offenders in Grubin’s 2008 study17 who received life sentences for their new convictions, one was in the “low risk” category; three were “medium risk”; none were “high” or “very high” risk. At the very least, to be effective, a screen should identify all, or nearly all, cases, i.e. it should have a low false negative rate. In particular, it should identify serious cases such as those who receive life sentences.
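The point about false negatives can be put in numbers using only the four life-sentence cases mentioned above. The following is a minimal sketch in Python; the function name is hypothetical, and the choice of which categories count as “caught in the screen” is an assumption made for illustration, since no cut-off is specified here.

```python
# Risk categories of the four offenders who received life sentences
# for their new convictions, as reported above.
life_sentence_cases = {"low": 1, "medium": 3, "high": 0, "very high": 0}


def sensitivity(counts, flagged_categories):
    """Proportion of true cases falling in the flagged categories."""
    total = sum(counts.values())
    detected = sum(counts[c] for c in flagged_categories)
    return detected / total


# Assumed cut-off 1: only "high" and "very high" trigger further assessment.
strict = sensitivity(life_sentence_cases, {"high", "very high"})

# Assumed cut-off 2: "medium" and above trigger further assessment.
lenient = sensitivity(life_sentence_cases, {"medium", "high", "very high"})

print(f"high or above: sensitivity {strict:.2f}, false negative rate {1 - strict:.2f}")
print(f"medium or above: sensitivity {lenient:.2f}, false negative rate {1 - lenient:.2f}")
```

Four cases are far too few to estimate a screening rate with any precision; the sketch only shows the calculation that a proper evaluation would need to perform on the full data.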
Challenges to decisions
Actuarial scales have been the subject of consideration in a number of appeal cases. It is perhaps surprising – and somewhat concerning – that the scientific basis of the conclusions based on actuarial scales including the RM2000, Static-99 and LSI-R has not been subject to scrutiny and challenge. The results of these tests are accepted at face value. From a public policy perspective it should be noted that the application of these instruments can, and does, lead to errors in both directions: individuals who are assessed by more comprehensive procedures to be “low risk” may be deemed “high risk”, or vice versa. The public is poorly served by such errors.
A number of Scottish appeal cases illustrate both the influence of, and lack of critical appreciation directed at, these procedures. In HMA v Currie [2008] HCJAC 67 a ground of appeal was that “The learned trial judge erred in failing to obtain a full risk assessment.” In their decision (at [11]) their Lordships concluded: “The Risk Matrix 2000 Assessment Tool is regularly and widely used for the purposes of assessing the risk presented by an offender to the public…. In our view [the trial judge] was entitled to proceed upon the basis of the outcome of the risk assessment carried out using Risk Matrix 2000.” Would their Lordships come to this view if they appreciated the lack of certitude associated with opinions based on the RM2000?
In Robertson v HMA, 17 February 2004, it was accepted by their Lordships that use of RM2000 provided a valid opinion that the convicted person was high risk. The application of another actuarial instrument, Static-99, was part of the evidence used to argue controversially that an individual convicted, amongst other things, of raping a baby girl, was “low risk”: HMA v JT, 24 September 2004. It was used in another case to argue for “high risk”: Jordan v HMA [2008] HCJAC 24.
One exception that I am aware of is the case of Lord Watson: the appeal court accepted that a report I prepared “casts doubt on the validity of the risk assessment”. There were a number of difficulties in the use of an actuarial instrument (LSI-R) in addition to those alluded to above. For example, the procedure was developed on Canadian prisoners with an average age of 26.89, in a sample of general offenders with no reference to fireraising. Lord Watson was not Canadian; he had not been to prison before; he was convicted of fireraising; he was aged 56 when the assessment was carried out (statistically it was very unlikely that there would have been anyone of his age in the development sample).
This case raised a general point: even if the actuarial approach were considered to be appropriate, it is axiomatic that any individual being assessed should be similar to those with whom they are being compared. In statistical language they should be drawn from the same population. Such inappropriate comparisons are common. In recent cases I have seen RM2000 being used with first offenders even though the procedure was developed using data from prisoners (data from the Cosgrove report suggests that fewer than 50% of those convicted of a sexual offence receive custodial sentences); first offenders are likely to be different from recidivists. I have seen the actuarial scales used to assess internet offenders, even though the internet was of limited availability when the development studies were carried out.
Actuarial assessments and expert testimony
Should evidence based on actuarial scales be the basis for expert testimony? Lord Wheatley has recently provided a clear and detailed restatement of the role, responsibility and privileges of the expert witness (Wilson and Murray v HMA [2009] HCJAC 58). In brief, the evidence must contribute to the proper resolution of the dispute and provide relevant information from an area of knowledge or experience that a judge or jury would not generally have access to. Critically, Lord Wheatley noted, “the witness must demonstrate a sufficiently authoritative understanding of the theory and practice of the subject” (at [58]).
As argued above, the scientific basis for actuarial scales, as applied to individuals, may be more illusory than real. In the United States, in relation to scientific evidence, the theories and procedures on which the expert testimony is based should be accepted within the appropriate scientific community (e.g. Frye v United States, 1923); in addition, theory and procedures should be testable and have been subjected to peer review, and error rates should be established (Daubert v Merrell Dow Pharmaceuticals, 1993). If criteria such as these were to be applied it is difficult to see how actuarial procedures would be deemed to be admissible given that the uncertainty of individual predictions is large, unknown, or indeed perhaps unknowable.
Given the complexity of the issues discussed above, are the usual witnesses required to provide evidence on risk – criminal justice social workers – in a position, by dint of their training or experience, to provide “a sufficiently authoritative understanding of the theory and practice of the subject”? I suspect not.
In conclusion, I would urge decision makers and others to be cautious in the weight they place on opinions derived from actuarial risk assessments. From a scientific rather than a legal perspective it appears to me that the application of these tests is more prejudicial than probative. As Niels Bohr remarked, “Prediction is difficult, particularly about the future.” I would be interested in the answers to two questions. Are defence agents who do not challenge assessments based on these tests failing their clients? Are organisations that require their employees to use these flawed procedures at corporate risk?
David J Cooke is Professor of Forensic Clinical Psychology at Glasgow Caledonian University and the University of Bergen, and was a member of the MacLean Committee
A slightly fuller version of this paper can be found at www.journalonline.co.uk/extras
References
(1) Department of Health. Best practice in managing risk: Principles and evidence for best practice in the assessment and management of risk to self and others in mental health services. 2007. London, Department of Health.
(2) Lord MacLean. A report of the committee on serious violent and sexual offenders. 2000. Edinburgh, Scottish Executive.
(3) Risk Management Authority: Standards and Guidelines for Risk Assessment. 2006. Paisley, Risk Management Authority.
(4) Royal College of Psychiatrists. Rethinking risk to others in mental health services. 2008. London, Royal College of Psychiatrists.
(5) Lady Cosgrove. Reducing the Risk: Improving the response to sex offending. The Report of the Expert Panel on Sex Offending. 2001. Edinburgh, Scottish Government.
(6) Rorer, L. “Personality assessment: A conceptual survey”. In: Pervin, L A (ed), Handbook of personality: Theory and research. New York: Guilford; 1990; 693-720.
(7) Doyle, A C. The sign of the four. 1994 ed. Oxford: World’s Classics, 1890: “You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to.”
(8) Englich, B, Mussweiler, T. “Sentencing under uncertainty: Anchoring effects in the courtroom”. Journal of Applied Social Psychology 2001; 31:1535-1551.
(9) Englich, B, Soder, K. “Moody experts – How mood and expertise influence judgmental anchoring”. Judgment and Decision Making 2009; 4:41-50.
(10) Hart, S D, Michie, C, Cooke, D J. “The precision of actuarial risk assessment instruments: Evaluating the ‘Margins of Error’ of group versus individual predictions of violence”. British Journal of Psychiatry 2007; 170:60-65.
(11) Cooke, D J, Michie, C. “Limitations of diagnostic precision and predictive utility in the individual case: A challenge for forensic practice”. Law and Human Behavior. In press. See this work for a technical discussion of the distinction between confidence and prediction intervals.
(12) Craig, L, Beech, A R. “Best practice in conducting actuarial risk assessments with adult sexual offenders”. Journal of Sexual Aggression 2009; 15:193-211.
(13) Rose, G. The strategy of preventive medicine. Oxford: Oxford Medical Publications, 1992.
(14) Krzysztofowicz, R. “The case for probabilistic forecasting in hydrology”. Journal of Hydrology 2001; 249:2-9.
(15) Cooke, W E. “Forecasts and verifications in Western Australia”. Monthly Weather Review 1906; 34:23-24.
(16) Wald, N J, Hackshaw, A K, Frost, C D. “When can a risk factor be used as a worthwhile screening test?” British Medical Journal 1999; 319:1562-1565.
(17) Grubin, D. “Validation of Risk Matrix 2000 for Use in Scotland.” Report Prepared for the Risk Management Authority. 2008. Paisley, Risk Management Authority.