By Rayid Ghani

We keep hearing, and saying, that machine learning and predictive models must be transparent and interpretable if they are to be implemented and used correctly. That makes sense. You don’t want a black-box model making important decisions, although one could argue that the guts and intuitions of many human beings are just as opaque as a black box, and often perform worse. Domain experts and policymakers need to know what the predictive models are doing, and what they will do in the future, in order to trust and deploy them.

Transparency-Gameability Tradeoff?

But then a new wrinkle comes along. For certain problems, it turns out that people don’t want the system to be too transparent because that allows the predictions and outcomes to be gamed. This has traditionally been true in fraud detection systems, but let’s take two examples we’ve worked on recently that ran into this issue:

  1. Building predictive models for on-time high school graduation
  2. Building early intervention systems to prevent adverse police interactions

Building predictive models for on-time high school graduation

In this example, it’s obvious why you want the predictive models to be transparent and interpretable. The model assigns a score to each student according to their probability of dropping out or not graduating on time. This risk score allows high schools to provide additional targeted support and interventions efficiently and increase graduation rates. When we worked with several school districts and developed predictive models that were both highly accurate and interpretable, one surprising comment we received was that “we don’t want to expose the model to the teachers because then they might game the system to reduce the risk scores.”

The worry was that teachers may feel that they are (or actually will be) evaluated on the percentage of students in their class who are above a certain risk threshold. If the model were transparent (for example, heavily reliant on math GPA), a teacher could inflate math grades and reduce the intermediate risk scores of their students. We say intermediate because, ultimately, a student with an inflated GPA will probably still not graduate on time, but by then it may be too late to assign the blame to the teacher they had in 9th grade. You could argue that the real problem here is not the interpretability of the model but the incentive structure and the lack of good, affordable, and rational teacher evaluation models. However, the school district’s concern about the model being “gameable” remained, and it reduced the likelihood of the model being implemented effectively.
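
To make the mechanism concrete, here is a toy sketch of a transparent risk score that leans heavily on math GPA. The feature names, weights, and student values are all hypothetical and chosen only for illustration; they are not taken from any model we built.

```python
import math

# Hypothetical weights for illustration only; not the coefficients of any real model.
INTERCEPT = 2.0
WEIGHTS = {"math_gpa": -1.2, "absence_rate": 6.0}

def risk_score(student):
    """Score between 0 and 1 for not graduating on time (logistic function)."""
    z = INTERCEPT + sum(WEIGHTS[name] * student[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical student, then the same student with the math grade inflated by one point.
student = {"math_gpa": 1.8, "absence_rate": 0.15}
inflated = {**student, "math_gpa": 2.8}

before = risk_score(student)
after = risk_score(inflated)
print(f"risk before: {before:.2f}, after grade inflation: {after:.2f}")
```

Nothing about the student has changed except the recorded grade, yet the score drops. Anyone who can read the weights can see which input to push.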

Building early intervention systems to prevent adverse police interactions

This example is perhaps clearer. Existing early intervention systems used by many police departments to flag officers who may go on to have adverse interactions with the public (such as unjustified uses of force, violence, or weapons) are threshold-based. The departments select a handful of indicators, such as the number of previous uses of force or the number of citizen complaints, and select a threshold as well as a time window that results in a warning flag. For example, one rule could be that if an officer has had more than three uses of force in a 90-day window, an “early warning flag” is raised and the officer must meet with a supervisor.
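
A rule like the one above can be sketched in a few lines. This is a minimal illustration, not any department's actual system; the function name `early_warning_flag`, its parameters, and the incident dates are all made up for the example.

```python
from datetime import date, timedelta

def early_warning_flag(incident_dates, as_of, window_days=90, threshold=3):
    """Return True if more than `threshold` incidents fall in the window ending at `as_of`."""
    window_start = as_of - timedelta(days=window_days)
    recent = [d for d in incident_dates if window_start < d <= as_of]
    return len(recent) > threshold

# Hypothetical record: four uses of force, the earliest on January 5.
incidents = [date(2024, 1, 5), date(2024, 2, 1), date(2024, 2, 20), date(2024, 3, 15)]

# All four incidents are inside the 90-day window, so the flag is raised.
flag_in_march = early_warning_flag(incidents, as_of=date(2024, 3, 20))

# Three weeks later the January incident has aged out of the window, so it is not.
flag_in_april = early_warning_flag(incidents, as_of=date(2024, 4, 10))

print(flag_in_march, flag_in_april)
```

The whole rule fits in one comparison against a count, which is exactly what makes it easy to understand.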

This type of system has a lot of problems, including ineffectiveness at providing accurate and timely warnings (see our work on improving these systems by making them data-driven and predictive). Here we want to focus on a specific shortcoming highlighted by several police departments: too much transparency allows officers to easily game the system. Some of the police departments we talked to complained that these systems are very easy to understand and interpret, which also makes them easy to game. An officer who has had two uses of force in the past 80 days may choose to be a bit more careful over the next 10 days, until the count rolls over to zero again. While this short-term behavioral effect is beneficial, over the long term it defeats the whole purpose of having these systems in the first place.

So what can we do?

Make our systems opaque? No, we don’t think so. We do want explanations and interpretability in our models. There are far too many dangers and risks in building machine learning systems that are not understandable. This is especially critical when dealing with public policy since important aspects of people’s lives are being affected.

Can we make machine learning systems that are transparent and interpretable but not gameable? We don’t know, but we think it’s worth exploring further. We should at least explore whether it’s a real tradeoff or if we can get all the interpretability, accuracy, and other performance measures we want, while still protecting against gaming.

Here are some initial thoughts:

  • Gameability could be fine as long as gaming the system means doing the right thing and increasing behaviors that reduce the risk of negative outcomes. If gaming the system means inflating everyone’s grades, that’s bad. If it means reducing the number of unjustified uses of force, that’s a good thing.
  • One way to make that happen is by creating features (variables) that are interpretable but difficult to game, or even better, designing them such that “gaming” would mean doing the right thing. Instead of a feature for “math GPA in the past semester,” what if we only use the GPA percentile? This is desirable anyway for most machine learning models, but the goal here would be specifically to make the model more difficult to game. What about features that are deltas from a previous time period (the slope of a raw count), such as “increase in GPA from previous semester,” and what if we normalized that as well, so that it becomes a student’s slope compared to everyone else’s slope? We can keep getting more and more complex, making it difficult for people to game these features without going deeper into the model and understanding it better. Of course, some of these features would still be gameable, and some of them would also make the model more opaque.
  • As we make our features more and more complex, we’ll need to translate the predictions into something end users can still understand. Can we make the model complicated but focus on making the translations more easily interpretable, instead of transparent? Can we take individual predictions and use case/instance-based explanations? Those types of explanations are used in many domains (law and medicine, for example) to explain predictions, and at the same time are difficult to use for gaming the model.
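
The percentile and normalized-delta features mentioned above could look something like the sketch below. The helper `percentile_rank` and the GPA values are hypothetical, and the comparison group here is just five made-up students.

```python
def percentile_rank(values):
    """For each entry, the fraction of the group with a strictly lower value."""
    n = len(values)
    return [sum(other < v for other in values) / n for v in values]

# Hypothetical math GPAs for five students, last semester and this semester.
previous = [2.0, 2.5, 3.0, 3.5, 1.5]
current = [2.2, 2.4, 3.1, 3.6, 1.9]

# Feature 1: where each student's GPA sits relative to the group.
gpa_percentile = percentile_rank(current)

# Feature 2: change from the previous semester, ranked against everyone else's change.
deltas = [c - p for c, p in zip(current, previous)]
delta_percentile = percentile_rank(deltas)

# Raising every grade by the same amount changes the raw GPAs but not the ranks.
inflated = [g + 0.3 for g in current]
inflated_percentile = percentile_rank(inflated)

print(gpa_percentile)
print(inflated_percentile)
```

Across-the-board inflation leaves the percentile feature unchanged in this sketch. Inflating grades for only some students would still shift their ranks, which is one way these features remain gameable.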


These are just some early thoughts. The goal of this blog post is not to give solutions but to highlight this potential tradeoff and ask for other people’s thoughts on this topic. So what do you think? Is this a real issue? Are there good ways to deal with it? Have people done work on this topic?