By Rob Mitchum

Around the world, cities are showing signs of old age. For an urban area, one of the first signs of advancing years appears in its circulatory system, the water infrastructure, as pipes laid underground more than a century ago start to break down. Beyond the direct cost of repairing water main breaks and replacing the broken components, these incidents cause major headaches for nearby residences, businesses, and traffic. But short of the very expensive project of replacing hundreds or thousands of miles of pipes, there is little cities can do but wait for the next main to burst.

Data Science for Social Good 2016 partnered with the City of Syracuse on this problem, testing whether data from a variety of sources — including, in some cases, hand-drawn diagrams from as far back as the 19th century — could help predict where future water main breaks would occur. Working with the Syracuse Office of Innovation and Water Department, fellows Syed Ali Asad Rizvi, Benjamin Brooks and Avishek Kumar, mentors Ali Vanderveld and Kevin Wilson, and project manager Chad Kenney sought to find a new solution for the city’s water main headaches.

Water main breaks in Syracuse from 2004 to 2015.

Syracuse deals with hundreds of water main breaks, leaks, and other issues that require attention each year, distributed without a clear pattern across the entire city. City engineers know that age is the primary driver of water main failure, but also suspect that additional factors, such as soil content, pipe material, and weather, contribute to the chance of a break. For the summer project, they asked the DSSG team to build a predictive model incorporating these data types and additional sources to identify specific water mains at the highest risk of rupturing soon.

But gathering that data in the first place was a significant challenge. Syracuse has 500 miles of water mains running through the city, but age data only covered a quarter of those mains, and only 1 percent of pipes had both age and pipe material data assigned. Working with their partners, the DSSG team used a combination of document-digging (including property tax data and decades-old field notebooks) and statistical imputation to fill in the missing data, then set about finding an accurate predictive model.

Example of a field notebook page containing data on pipe installation and materials.
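The article does not say which imputation method the team used. As a minimal illustration of the general idea of filling gaps in pipe age data, the sketch below fills a missing installation year with the median year of other pipes in the same neighborhood. The field names (`install_year`, `neighborhood`) and the group-median rule are assumptions made for this example, not details from the project.

```python
from statistics import median


def impute_install_year(pipes):
    """Fill missing install years with a neighborhood median (illustrative rule only).

    `pipes` is a list of dicts with hypothetical keys 'neighborhood' and
    'install_year' (None when the year is unknown).
    """
    # Collect the known years for each neighborhood.
    known_by_area = {}
    for pipe in pipes:
        if pipe["install_year"] is not None:
            known_by_area.setdefault(pipe["neighborhood"], []).append(pipe["install_year"])

    # Fallback pool for neighborhoods with no known years at all.
    all_known = [year for years in known_by_area.values() for year in years]

    filled = []
    for pipe in pipes:
        year = pipe["install_year"]
        imputed = year is None
        if imputed:
            candidates = known_by_area.get(pipe["neighborhood"], all_known)
            year = median(candidates) if candidates else None
        # Keep a flag so imputed values can be told apart from recorded ones.
        filled.append({**pipe, "install_year": year, "imputed": imputed})
    return filled
```

Keeping the `imputed` flag means a later model can treat recorded and estimated ages differently if that turns out to matter.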

While this project was the first at DSSG to focus on an infrastructure issue, in a broad sense, the problem was similar to others taken on by DSSG in the past. Just as other projects looked to identify relatively rare adverse events, such as students not graduating high school or properties descending into blight, the Syracuse team’s model would attempt to predict the pipes most likely to break in the near future. To test different models, they trained them on past data and attempted to “predict the past”: for example, using data from 2010-2012 to predict water main breaks in 2013-2014.
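The “predict the past” setup can be sketched as a split by year: records from the earlier window are used for training, and breaks in the later window become the labels to predict. The record layout (`block_id`, `year`) and the function names below are hypothetical, chosen only to illustrate the idea; this is not the team's code.

```python
def temporal_split(break_events, train_years, test_years):
    """Split break records by year so training never sees the test window."""
    if max(train_years) >= min(test_years):
        raise ValueError("training window must end before the test window starts")
    train = [e for e in break_events if e["year"] in train_years]
    test = [e for e in break_events if e["year"] in test_years]
    return train, test


def label_blocks(block_ids, test_events):
    """Label each block 1 if it had at least one break in the test window, else 0."""
    broken = {e["block_id"] for e in test_events}
    return {block: int(block in broken) for block in block_ids}


# Toy records; the windows mirror the example in the text.
events = [
    {"block_id": "A", "year": 2011},
    {"block_id": "B", "year": 2013},
    {"block_id": "C", "year": 2014},
]
train, test = temporal_split(events, range(2010, 2013), range(2013, 2015))
labels = label_blocks(["A", "B", "C"], test)
```

Splitting on time rather than at random matters here because a random split would let information from later years leak into training.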

With the best-performing model, the team then used all of the data to assign risk scores to the water mains currently beneath every block in the City of Syracuse. A higher risk score for an individual block didn’t guarantee that a main would break (no model is that good), but by delivering a ranked list, Syracuse could concentrate preventative efforts on the most precarious pipes. Based on their model’s accuracy at predicting past years’ incidents, the team estimated that 32 of the 50 highest-risk water mains would break in the next three years.
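A ranked list like this is commonly judged by the fraction of its top entries that actually go on to break, often called precision at k. The sketch below is a generic version of that calculation; the function name and the input shapes are assumptions for illustration, and the article does not state that the team computed the figure this way.

```python
def precision_at_k(risk_scores, broke, k):
    """Fraction of the k highest-scoring mains that actually broke.

    risk_scores: dict mapping a main's id to its risk score (higher = riskier).
    broke: set of ids for mains that broke during the evaluation window.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Take the ids of the k mains with the highest scores.
    top_k = sorted(risk_scores, key=risk_scores.get, reverse=True)[:k]
    hits = sum(1 for main_id in top_k if main_id in broke)
    return hits / k
```

By this measure, 32 breaks among the 50 highest-risk mains corresponds to a precision at 50 of 32/50, or 64 percent.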

The new model also performed better than simple “rule of thumb” calculations for predicting where breaks might occur in the future. If you simply used the age of pipes to prioritize which city blocks should have their mains replaced first, only 5 percent of the top 50 water mains on your list would go on to break in the next 3 years. If you used the history of breaks at different locations, looking at the number of breaks per city block, only half of your top 50 riskiest mains would break. But most water main breaks are “first-time offenders,” with no prior breaks at that location. Going by past breaks alone, you would never flag a location that has previously had fewer than three water main breaks, and replacement efforts would focus on only a handful of neighborhoods.

Comparing the DSSG model to common heuristics for predicting water main breaks.
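The two rule-of-thumb rankings described above can be sketched in a few lines: one orders blocks by pipe age, the other by the number of past breaks. The field names (`install_year`, `past_breaks`) and the toy values are hypothetical and serve only to show how a block with no break history drops to the bottom of the second list.

```python
def oldest_first(blocks, k):
    """Heuristic 1: rank blocks by pipe age, oldest installation first."""
    ranked = sorted(blocks, key=lambda b: b["install_year"])
    return [b["block_id"] for b in ranked[:k]]


def most_breaks_first(blocks, k):
    """Heuristic 2: rank blocks by how many breaks they have already had."""
    ranked = sorted(blocks, key=lambda b: b["past_breaks"], reverse=True)
    return [b["block_id"] for b in ranked[:k]]


# Toy data: block "A" is the oldest but has never broken before.
blocks = [
    {"block_id": "A", "install_year": 1890, "past_breaks": 0},
    {"block_id": "B", "install_year": 1950, "past_breaks": 4},
    {"block_id": "C", "install_year": 1920, "past_breaks": 1},
]
by_age = oldest_first(blocks, 2)
by_history = most_breaks_first(blocks, 2)
```

In this toy example the history-based list leaves out block "A" entirely, which is the “first-time offender” blind spot the article describes.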

In the real world, the DSSG model quickly proved itself, at least anecdotally. In the two weeks after the DSSG team delivered the risk scores to their Syracuse partners, two of the water mains listed in the top 50 ruptured. The Syracuse Office of Innovation has since integrated the model into its work, using it to guide an infrastructure planning process and decide where to do “dig once” combinations of water main replacement and road resurfacing. The office is also working with students at Syracuse University’s School of Information Studies to continue developing the model as new data is collected.

“Many of the water main breaks that have occurred in 2016 have been on the risky water mains that our DSSG team identified,” said Adria Finch, innovation project manager for the City of Syracuse. “Countless others were within a couple blocks of risky mains, thereby showing us neighborhoods that we should potentially prioritize instead of just specific mains. We are continuing to use the model for preventative maintenance or planned water infrastructure tasks or projects.”

The project could also help other cities facing challenges with aging water infrastructure. All code from the summer’s work is open source and available on GitHub, and the model can be adapted to work with data from other urban areas and make predictions about their water systems. Hopefully, this tool will help Syracuse and many other cities address this particular aging issue while minimizing expense and inconvenience.