The Dark Matter of Public Policy Data (Part 1) – Data Science for Social Good Fellowship

Nick Mader and Rob Mitchum

Imagine you are asked to compare patient outcomes at area hospitals. In minutes, you can pull the Medicare data for 30-day mortality rates after a heart attack, heart failure or pneumonia, and start crunching numbers.

But say Hospital A is in a low-income neighborhood with high rates of diabetes, cardiovascular disease and other chronic conditions. Hospital B is in an affluent suburb where patients have access to healthy food and walkable streets. Hospital C serves a diverse population, but has a world-famous cardiologist on staff who specializes in caring for complex patients with poor prognoses.

Simply comparing each hospital’s average mortality rate, without taking all this context into account, could leave you with the incorrect conclusion that Hospital B is the best for patients and that Hospital A or C is the worst.
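To make the problem concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and is not real Medicare data, and the risk groups and function names are hypothetical. It compares two hospitals' crude mortality rates with rates standardized to a shared patient mix, assuming patient risk was actually recorded.

```python
# Hypothetical counts, invented purely for illustration: (patients, deaths)
# per hospital and per risk group. None of these figures are real data.
counts = {
    "A": {"low_risk": (100, 2), "high_risk": (400, 80)},
    "B": {"low_risk": (400, 12), "high_risk": (100, 25)},
}


def crude_rate(strata):
    """Overall mortality rate, ignoring the hospital's patient mix."""
    patients = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths / patients


def standardized_rate(strata, reference_mix):
    """Direct standardization: weight each group's rate by a shared mix."""
    return sum(
        reference_mix[group] * (deaths / n)
        for group, (n, deaths) in strata.items()
    )


# Reference mix: the share of each risk group across both hospitals combined.
total = sum(n for strata in counts.values() for n, _ in strata.values())
reference_mix = {
    group: sum(counts[h][group][0] for h in counts) / total
    for group in ("low_risk", "high_risk")
}

crude = {h: crude_rate(s) for h, s in counts.items()}
adjusted = {h: standardized_rate(s, reference_mix) for h, s in counts.items()}

for h in counts:
    print(f"Hospital {h}: crude {crude[h]:.3f}, standardized {adjusted[h]:.3f}")
```

In this made-up example, Hospital A looks worse on the crude rate but better once both hospitals are weighted by the same patient mix, because A treats far more high-risk patients. The adjustment only works here because risk is written down in the data. When the factor that matters is never recorded, no amount of reweighting can correct for it.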

So how can you more accurately compare each hospital’s quality of care, or the skill of its doctors? What can the data tell you, and what does it miss?

The Illusion of Data

Health care organizations and other entities that receive federal or state dollars are increasingly required to demonstrate their effectiveness to continue receiving funds. Both policymakers and the organizations themselves are motivated to scale up what’s working and leave behind what isn’t.

But how can data science produce meaningful evaluations of social programs while avoiding the pitfalls of false comparisons and misleading correlations that factor into high-stakes decision-making?

Some of the Data Science for Social Good projects will need to grapple with this question as they evaluate programs designed to help at-risk school children and first-time mothers. Many factors influence a student’s academic performance or a family’s health – not all of which are reflected in the data that the program collects.

For example, as the Chicago Teachers Union has argued, a student’s standardized test scores may not include adequate information about a difficult home situation, violent conditions on their route to school, or psychological problems that influence learning.

Because of this missing data, statistical methods can produce misleading answers that sound more rigorous and concrete than they are in reality.

This illusion can have severe consequences.

An effective program that targets kids who are at risk in ways we can’t account for may look ineffective, leaving the program vulnerable to loss of funding or complete shutdown.

How To See Invisible Data

To avoid these mistakes, we’re training Data Science for Social Good fellows to worry about what the data doesn’t show. They are asked to consider how they can combine the insight of data with on-the-ground expertise in their projects.

This week and next, the fellows are meeting with their partner organizations – either at our downtown space, or, ideally, out in the field – so partners can communicate their needs, priorities and concerns. These interactions are meant to help the fellows understand the data the partners are sharing, but also the significant information that’s missing.

Drawing from the partners’ expertise in their fields will make our program evaluations more accurate, useful, and considerate of the high stakes involved for these programs and the individuals they serve. By recognizing the gap between the numbers and the real world, the fellows can build the appropriate caveats into their conclusions.

Beyond understanding the qualitative context of the organization, analysts can also turn to quantitative methods for detecting these “dark matter” factors. Economists use a number of statistical techniques, including instrumental variables and control function methods, to filter out the hidden influence of unobserved factors and reach more accurate conclusions about an intervention’s effectiveness.
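As a rough illustration of the instrumental variables idea, the sketch below runs on simulated data only. It assumes a hypothetical program whose participants are more at risk in ways the data never records, and a randomly assigned offer that nudges participation but affects outcomes only through participation. The variable names, effect sizes and the random offer are all assumptions made for this example, not a description of any real project or of the specific methods the fellows use.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n=20000, true_effect=1.0, seed=0):
    """Simulate a program whose participants carry unrecorded extra risk."""
    rng = random.Random(seed)
    offers, participation, outcomes = [], [], []
    for _ in range(n):
        risk = rng.gauss(0, 1)     # unobserved "dark matter" factor
        offer = rng.randint(0, 1)  # instrument: random, unrelated to risk
        # Riskier kids take part more, and an offer also raises participation.
        dose = offer + risk + rng.gauss(0, 1)
        # The program helps (true_effect > 0), but risk drags outcomes down.
        outcome = true_effect * dose - 2.0 * risk + rng.gauss(0, 1)
        offers.append(offer)
        participation.append(dose)
        outcomes.append(outcome)
    return offers, participation, outcomes


z, d, y = simulate()

# Naive regression slope of outcome on participation, which ignores risk.
naive_estimate = covariance(d, y) / covariance(d, d)

# Instrumental variables estimate for one instrument and one treatment:
# the ratio cov(instrument, outcome) / cov(instrument, treatment).
iv_estimate = covariance(z, y) / covariance(z, d)

print(f"true effect: 1.0, naive: {naive_estimate:.2f}, IV: {iv_estimate:.2f}")
```

In this simulation the naive slope lands far below the true effect of 1.0, so a helpful program looks nearly useless, while the instrumental variables ratio lands close to 1.0. The catch is that the method depends entirely on having a valid instrument, which is an assumption that cannot be checked from the data alone.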

The goal is to improve care in the city by learning what works and spreading those ideas, rather than just rewarding the good and abandoning the bad. Finding the most accurate path to that goal is as much about understanding the data that’s missing as it is about analyzing the data that’s there.

In the next two parts of this series, we’ll go into more depth on the statistical methods used to address the dark matter problem, the same techniques we’ll be using and improving on to evaluate nonprofits on several of our projects. And for the qualitative side, we’ll follow the fellows as they visit one of our project partners and chat with workers on the front line of their organization.

Nick Mader is an economist at Chapin Hall at the University of Chicago and a Data Science for Social Good mentor. Rob Mitchum is a science writer for the Computation Institute.