This paper was written for the course "Governing Emerging Technologies."
Ethical Issues Raised by Predictive Algorithms in Criminal Justice
One of the less savory, but perhaps more intellectually interesting, questions that advanced algorithmic systems have raised in technology ethics is how to program, in advance, an algorithm that will make decisions with negative consequences. This is unavoidable for any algorithm operating in a dangerous or potentially harmful situation, and it exposes a very human discomfort with “designing” moral outcomes ahead of time rather than letting them play out in the heat of the moment. Take autonomous vehicles, for example. The designer has to pre-program how the vehicle should act in any given moment, including during an accident. That means deciding how to prioritize saving certain people over others (passengers versus pedestrians, more people versus fewer, and so on). Human drivers in the same situation rarely have this luxury. But if autonomous vehicles do, is there a moral imperative (not to mention a practical necessity) to program these decisions into the algorithm? And why does this make us more uncomfortable than leaving the outcome to instinct and chance in the moment? Does it go back to the dilemma of the trolley problem, in which the act of “intervening,” even if it saves more lives, feels like actively killing someone rather than failing to prevent their death? As algorithms have become more advanced and more ubiquitous, these ethical questions have taken center stage and exposed fundamental questions about who gets to make such decisions for the general public.
Similarly, as more human tasks and decisions are offloaded onto computers and algorithms, researchers and designers must increasingly contend with the underlying structure of how society operates, and with how individual decision-making fits into that larger system. To make intelligence “artificial,” you must first understand “natural” intelligence; and to properly situate the decision-making of an “artificial” intelligence, you must understand the context into which you are placing it. Algorithms used in the criminal justice system are therefore not external to and separate from the situations in which they are applied, but part of the overall sociotechnical system. They also happen to highlight quite starkly the underlying injustice and inequality on top of which they are placed, elemental problems that cannot be smoothed over with technological innovation. Any ethical evaluation of predictive algorithms in criminal justice must take these conditions into account, and must not fetishize or idealize data-driven algorithms as impartial, objective silver bullets.
This paper will examine two major applications of algorithmic decision-making in the criminal justice system: recidivism predictors used in bail, sentencing, and parole decisions; and predictive policing tools that link risk of crime to specific “hot spots” or people in order to prevent anticipated crime. More specifically, it will lay out the ethical implications of both, focusing on the interplay between the algorithms and the societal systems beneath them. In the case of recidivism prediction, it will illustrate the competing concepts of “fairness” inherent to the algorithm’s design. The larger question is whether an algorithm can ever really be “fair” if our society’s institutions and systems are unfair; and, even if it cannot, whether that means we should not use such algorithms and should instead rely on what is probably even more “unfair” human judgment. The ethics of complex sociotechnical systems requires weighing options that all have their own downsides, often settling for the “least bad” or “least unfair” one. It is imperative that algorithms’ veneer of objectivity not let us ignore underlying injustice, just as any solution meant to correct historical injustices should not become a cosmetic fix, an excuse to ignore the broader underlying issues.
I. Recidivism Prediction
One of the most notable examples of algorithmic decision-making in the criminal justice system is the COMPAS model developed by the company Northpointe, which predicts the likelihood that an individual who has already been charged with a crime will commit another. COMPAS is used to inform decisions regarding bail, sentencing, and parole, rating a person’s likelihood of re-offending based on a collection of data about them. The inputs, though not entirely known because of the model’s proprietary nature, include factors such as criminal history, but never explicitly discriminatory categories like race. However, as a noted ProPublica report illustrated, discrimination was still hard-wired into the algorithm. After examining the risk scores of over 7,000 people arrested in Broward County, Florida, and following up to see how many were charged with new crimes over the next two years, ProPublica found that Northpointe’s algorithm was more likely to falsely rate black defendants as high risk than white defendants, and more likely to falsely rate white defendants as low risk.
If race was not explicitly taken into account, how could this be? After ProPublica’s revelation, researchers pointed out that the difference in risk scores could be attributed to differences in recidivism rates between races. When the same calibrated score is applied to two groups with different underlying re-arrest rates, the higher-rate group has more of its members placed in the high-risk categories, and so more of its members who never re-offend end up wrongly labeled high risk. Because blacks on average have a higher recidivism rate than whites, there will also be a higher false positive rate for black defendants. Thus, two different views of “fairness” emerged in relation to the model. Northpointe’s conception of fairness—which we’ll call predictive parity—gives people the same risk score when they have the same likelihood of re-offending. In other words, at each risk level, the chance of re-offending is the same for both whites and blacks: a white defendant with a risk score of 7 has a 60% chance of re-offending, while a black defendant with a risk score of 7 has a 61% chance (a statistically insignificant difference). However, because blacks have a higher recidivism rate than whites overall, black defendants are more likely to receive a high score even when they do not go on to re-offend. The opposing view—which we’ll call equalizing group error rates—would correct the discrepancy in false positives between whites and blacks that ProPublica uncovered, but would violate the principle of equal treatment, assigning equally risky white defendants a higher score than their black counterparts. As long as recidivism rates differ between races, there will be a discrepancy on one measure or the other. A risk score can either be equally predictive or equally wrong for all races, but not both.
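To make the tension concrete, the sketch below works through synthetic confusion matrices. The numbers are invented for illustration, not drawn from Northpointe’s data or ProPublica’s analysis; the point is only that a score can be equally predictive for two groups while being “wrong” about them at very different rates once their underlying re-arrest rates differ.

```python
# Illustrative sketch only: synthetic confusion matrices, not Northpointe's data.
# It shows how a score can be equally predictive for two groups (same positive
# predictive value) while producing very different false positive rates when
# the groups' underlying re-offense rates differ.

def rates(tp, fp, tn, fn):
    """Return (positive predictive value, false positive rate, base rate)."""
    ppv = tp / (tp + fp)                    # of those labeled high risk, share who re-offend
    fpr = fp / (fp + tn)                    # of those who do NOT re-offend, share labeled high risk
    base = (tp + fn) / (tp + fp + tn + fn)  # overall re-offense rate in the group
    return ppv, fpr, base

# Hypothetical group with a higher base rate of re-arrest.
group_a = rates(tp=400, fp=200, tn=300, fn=100)
# Hypothetical group with a lower base rate of re-arrest.
group_b = rates(tp=200, fp=100, tn=600, fn=100)

for name, (ppv, fpr, base) in [("group A", group_a), ("group B", group_b)]:
    print(f"{name}: base rate {base:.0%}, PPV {ppv:.0%}, false positive rate {fpr:.0%}")
```

Both hypothetical groups come out with a positive predictive value of about 67%, yet their false positive rates are roughly 40% versus 14%: the shape of the disparity ProPublica reported, arising here purely from the difference in base rates.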
At the end of the day, this is an uncomfortable ethical decision to explicitly build into a program (to say nothing of the potentially serious consequences of the outcome). Northpointe, the company making the software, decided its way was more ethical—but why did it get to choose which conception of “fairness” was appropriate? While its model is likely less biased than leaving the judgment entirely to a human official, there is something about hard-wiring decisions like this into algorithms that feels ethically precarious. It is also easy for such decisions to hide behind algorithms, which lend a veneer of objectivity. The reality is that these algorithms are merely one component placed into a sociotechnical system that already has its own processes and biases—and much as creating “artificial” intelligence unveils “natural” intelligence, such algorithms reflect the true nature of the system in which they are placed.
The issue of higher recidivism among blacks, and the discriminatory pressure this input exerts on black defendants within the COMPAS algorithm, are part of a self-reinforcing system, which is why the context into which the algorithm is placed matters so much. A number of historical, environmental, and systemic factors drive the racial gap in recidivism rates—policing practices that target black neighborhoods, unfair drug sentencing laws, and the social consequences of economic marginalization, to name a few. Basing sentencing decisions on risk of recidivism hard-wires these injustices into the algorithm itself and potentially perpetuates the underlying problems. If blacks are more likely to be re-arrested, due to a combination of factors, and the algorithm is calibrated to punish those more likely to be re-arrested, a feedback loop forms in which blacks are continuously and disproportionately subject to the criminal justice system. In the parlance of Bernard Harcourt, risk becomes a proxy for race. On the other hand, ignoring an individual’s risk of re-offending in sentencing and parole decisions makes little sense from a public safety standpoint, and carries the risk of less safe neighborhoods. Would it be “fair” to the communities into which repeat offenders are released—often majority-black neighborhoods—which would then disproportionately bear the cost of crime? Would it be “fair” for someone who committed one minor crime to be treated the same in sentencing as a serial offender?
In debating the ethics of algorithms such as this, and trying to determine which conception of “fairness” is more fair, there is a risk of forgetting the underlying, perhaps more serious, issues at play. Developers of predictive algorithms who claim objectivity, believing their models to transcend ethical and moral questions, are naïve about how their technology interacts with existing systems. To them, the problem is not with their models but with society. Of course, this attitude is unhelpful—their models are part of society. But there is a kernel of truth in it: any algorithm that operates within an unfair society is going to be unfair to some degree. While we cannot ignore the disparate impact of Northpointe’s algorithm, we also cannot ignore the problem from which its discrimination stems—the higher recidivism rates of blacks. A history of economic exclusion, discriminatory policing, housing discrimination, and other forms of oppression fuels this inequality, and is the source of injustice for blacks both within the criminal justice system and beyond it. That said, as part of a wider sociotechnical system, a predictive algorithm can itself be an agent that disrupts the cycle of oppression. Because harsh punitive jail sentences can exacerbate the social and economic problems that lead to higher recidivism rates in the first place (through disrupted communities, torn-apart families, lower economic activity, and so on), equalizing treatment between groups may change this cycle.
But what do we do when there is no obviously “fair” option? Given that the alternative to a predictive algorithm is human judgment, which is probably even more biased and unfair (and harder to correct), perhaps we should stick with algorithms and try to improve them. In the absence of consensus on a truly “fair” option, it makes sense to look for the least unfair one, which requires examining the available options and evaluating which has the fewest overall negative effects. Perhaps Alexandra Chouldechova’s re-working of Northpointe’s model can provide some insight: she was able to rearrange the scores so that the algorithm was wrong equally often about black and white defendants (thus equalizing group error rates), while predictive accuracy for black defendants increased to 69% and remained unchanged for white defendants. It may also be useful to consider the context of an algorithm’s application, much as former Attorney General Eric Holder favored using risk scores for parole decisions but not for sentencing (where they may more directly perpetuate systemic injustice). And considering the history of police targeting black communities, along with the other factors contributing to higher recidivism rates among blacks, it may make sense to alter the algorithm so that criminal history counts for less in determining risk. Of course, as mentioned before, that option carries public safety trade-offs. Ultimately, it remains debatable whether these predictive algorithms are sufficiently accurate in the first place. They may be more accurate, and less biased, than the alternative, which may be enough to justify their use; but that should by no means halt the conversation about how to improve them.
II. Predictive Policing
Another major application of algorithmic decision-making in the criminal justice system is predictive policing, in which an algorithm draws on data such as crime reports and criminal profiles to anticipate where and when crime is likely to occur. Short of the psychic capabilities of the “precogs” in Minority Report, predictive policing looks at environmental factors that may make a particular location or person susceptible to crime and attempts to prevent it before it happens. It borrows principles from seismology and network science to identify “hot spot” locations where crime might break out based on prior trends, as well as “at risk” individuals who may commit crimes. Like recidivism prediction, predictive policing has the potential to reduce human bias, since this aspect of policing is usually based on instinct (which tends to be biased and/or inaccurate). But it could also perpetuate racial prejudice by hiding behind the “objectivity” of science.
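To make the seismology analogy concrete, the sketch below implements a minimal “self-exciting” rate estimate, in which each recorded incident temporarily raises the predicted rate of new incidents in the same map cell before decaying away. The functional form, parameter values, and the predicted_rate function are illustrative assumptions for this paper, not any vendor’s actual model.

```python
import math

# Illustrative sketch of a seismology-style "self-exciting" hot spot estimate:
# each past incident temporarily raises the expected rate of new incidents
# nearby, decaying over time. All parameters below are made-up assumptions.

def predicted_rate(now, past_incident_times, background=0.2, boost=0.5, decay=0.1):
    """Expected incident rate for one map cell at time `now` (in days)."""
    rate = background  # long-run baseline for this cell
    for t in past_incident_times:
        if t <= now:
            # each prior incident adds a contribution that fades exponentially
            rate += boost * math.exp(-decay * (now - t))
    return rate

# A cell with a recent burst of recorded incidents scores far higher than
# one with the same total history spread over a longer period.
recent_burst = [95, 97, 99]   # days on which incidents were recorded
old_history = [10, 40, 70]
print(predicted_rate(100, recent_burst))  # elevated "hot spot" rate
print(predicted_rate(100, old_history))   # close to the background rate
```

The cell with a recent cluster of recorded incidents scores several times higher than one with the same number of incidents spread over months, which is how such systems flag short-lived “hot spots.” Note that the input is recorded incidents, a point that matters for what follows.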
Once again, the models are built on statistics that reflect the inherent biases of the criminal justice system as it has existed: black communities are policed more heavily, certain types of crime are targeted more aggressively, and the social fallout of economic deprivation feeds a vicious cycle of crime—all of which gets incorporated into an algorithm that perpetuates the status quo. There is particular concern that these models may be built around the broken-windows theory of crime—namely, that strictly enforcing laws against petty offenses like vandalism and public drinking (those typically associated with poverty) will prevent more serious crimes. This would indirectly target poorer neighborhoods, and often communities of color. The model anticipates crime in a given area and sends police there to prevent it; the heightened police presence increases the likelihood that officers uncover other crime, which is fed back into the algorithm, which again flags the area as potentially dangerous—repeating the cycle over and over.
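This feedback dynamic can be made visible with a toy simulation. In the sketch below, every number is an invented assumption: two neighborhoods have identical underlying crime, but patrols go wherever recorded crime is highest, and crime is more likely to be recorded where police are present.

```python
import random

# Toy simulation (illustrative assumptions only) of the feedback loop described
# above: patrols are allocated to the neighborhood with the most *recorded*
# crime, but recording depends on police presence, so extra patrols in one
# area inflate its statistics and attract still more patrols.

random.seed(0)
TRUE_CRIME_RATE = {"A": 10, "B": 10}                 # identical underlying crime per week
DETECTION = {"patrolled": 0.9, "unpatrolled": 0.3}   # share of crime that gets recorded

recorded = {"A": 1, "B": 0}  # area A starts with one extra recorded incident
for week in range(10):
    # the "model": patrol wherever recorded crime has been highest so far
    patrolled = max(recorded, key=recorded.get)
    for area, rate in TRUE_CRIME_RATE.items():
        p = DETECTION["patrolled"] if area == patrolled else DETECTION["unpatrolled"]
        incidents = sum(random.random() < p for _ in range(rate))
        recorded[area] += incidents

print(recorded)  # area A's recorded crime pulls far ahead despite equal true crime
```

After ten rounds, the area that began with a single extra recorded incident shows roughly three times the recorded crime of the other, despite identical true rates; the statistics appear to confirm the model’s “prediction” only because the model shaped where the statistics were collected.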
There are also concerns about privacy: will these models give rise to, and be used to justify, surveillance? This would be especially concerning if a person were tracked before ever committing a crime. Because police using such a model are primed to expect a criminal, and thus potential violence, they may enter the situation overly ready to use force themselves, making escalation more likely. It is also unclear whether predictive policing is even effective at reducing crime in the first place. And because the algorithms are typically proprietary, they cannot be evaluated by external parties. This is problematic from an ethical perspective, since we do not know exactly which factors go into the algorithm, and from a practical one, since the model can only really be improved with transparency.
Despite its questionable current use, some principles of predictive policing could be retained and reformed for better outcomes. For instance, the concepts the practice borrows from network science—namely, thinking about people’s behavior as part of networks—can be used to stage interventions that are preventative rather than punitive. If an algorithm predicts that a particular neighborhood might experience crime because of a recent spike in unemployment, it could alert the relevant authorities to deliver needed social services rather than trigger more heavy-handed policing. It makes sense to analyze and understand behavior in terms of environmental factors—this is key to how humans operate. But instead of responding by anticipating the potential crime, it may be more ethical and effective to respond by anticipating the need that leads to the crime. The same idea could be applied to individuals: identify “at risk” people and target them with needed services pre-emptively.
III. Conclusion
In general, a good rule of thumb is not to fetishize data and its potential applications. Data can be a powerful tool, offloading human decisions to a program that is more efficient and incisive, and often less biased than humans. But it is imperative to remember that data, and the algorithms built on them, reflect an unfair society, and therefore can never be truly “objective.” Predictive algorithms alone cannot solve social problems, and they must be continuously and critically evaluated to ensure they are as fair as possible (by as many criteria for “fair” as possible). The first step toward this goal is moving the development of these models out of the private sector and into the public, where they are not proprietary and can be evaluated more rigorously. Keeping them open source brings a badly needed level of transparency to the process, and it seems likely that a non-profit, civil society organization, or dedicated researcher would develop such a tool, motivated not by profit but by a belief in justice. As the Northpointe algorithm showed, it is sometimes impossible to develop a truly “fair” algorithm when the underlying conditions are not fair. But keeping the process transparent and open is the best way to arrive at the least unfair option, or as close to truly fair as possible.
It is also possible to use the same principles behind these predictive algorithms to stage more restorative or rehabilitative interventions. As previously mentioned, it might be more beneficial to use data anticipating crime in a given area, or by a particular person, to deliver needed social services. The mere act of using a model to justify police presence primes officers to enter the situation expecting violence. Re-working the application so that it anticipates which neighborhoods and people need social services or other interventions is a more humane method of crime prevention worth experimenting with. By mitigating some of the underlying systemic issues that disproportionately affect black communities, it may also be a way to use the power and promise of big data without reinforcing the injustice that such data often express. It may even be a first step in breaking the self-perpetuating cycle of poverty, crime, violence, and incarceration. There will never be a truly “fair” algorithm if society itself is not fair. But, given the right models and applications, it is possible to arrive at the best available option, and to use data for more restorative purposes that work to end the underlying injustice.
Works Consulted
Angwin, Julia. “Bias in Criminal Risk Scores Is Mathematically Inevitable, Researchers Say.” ProPublica, 30 Dec 2016, https://www.propublica.org/article/bias-in-criminal-risk-scores-is-mathematically-inevitable-researchers-say.
Angwin, Julia et al. “Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks.” ProPublica, 23 May 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Chouldechova, Alexandra. “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.” arXiv preprint, Oct 2016, https://arxiv.org/abs/1610.07524.
Corbett-Davies, Sam et al. “A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear.” The Washington Post, 17 Oct 2016, https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas/.
Doleac, Jennifer L., and Megan Stevenson. “Are Criminal Justice Risk Assessment Scores Racist?” The Brookings Institution, 22 Aug 2016, https://www.brookings.edu/blog/up-front/2016/08/22/are-criminal-risk-assessment-scores-racist/.
Harcourt, Bernard E. “Risk as a Proxy for Race.” Criminology and Public Policy, forthcoming; University of Chicago Law & Economics Olin Working Paper No. 535 and Public Law Working Paper No. 323, 16 Sep 2010. Available at SSRN: https://ssrn.com/abstract=1677654.
Hvistendahl, Mara. “Can ‘predictive policing’ prevent crime before it happens?” Science, 28 Sep 2016, http://www.sciencemag.org/news/2016/09/can-predictive-policing-prevent-crime-it-happens.
Mayer-Schönberger, Viktor, and Kenneth Cukier. “Probability and Punishment.” Big Data: A Revolution That Will Transform How We Live, Work, and Think, Houghton Mifflin Harcourt, 2013, pp. 157-163.
Shapiro, Aaron. “Reform predictive policing.” Nature, 25 Jan 2017, http://www.nature.com/news/reform-predictive-policing-1.21338.