Improving Traffic Safety in Jakarta Through Video Analysis

João Caldeira, Alex Fout, Aniket Kesari, Raesetje Sefala

UPDATE: We are pleased to announce that this project team won a Highlighted Paper Award at the AI For Social Good NIPS2018 Workshop! Congratulations to the Jakarta Fellows!

The World Health Organization (WHO) estimates that over 1.25 million people die each year in traffic accidents. Nearly 2000 such fatalities occur annually in Jakarta, Indonesia alone, making it one of the most dangerous cities in the world for traffic safety. These deaths are tragic, but many of them are preventable through effective city planning. This summer, our team at the Data Science for Social Good Fellowship (DSSG) at the University of Chicago set out to help the city of Jakarta bring this figure down and protect its residents from traffic-related injury and death.

Jakarta is a city in transition, having experienced explosive population growth over the last few decades. In 1970, just over 2 million people inhabited Jakarta. Today, the city’s population exceeds 10 million and continues to grow. With this growth comes a rise in vehicle ownership and congestion, which in turn leads to more traffic incidents. Our goal was to enable the city to reduce the number of these incidents through city planning.

The most important resource that any city needs when planning is good information. From a data science perspective, though, one of the core challenges in addressing traffic safety problems is that it is difficult to collect high-quality traffic data. This problem is especially acute in megacities with bustling commuting patterns and millions of inhabitants, and Jakarta is no exception. However, the city also has a promising resource: thousands of traffic cameras posted throughout the city, with many more to come.

Figure 1: A satellite map of Jakarta with traffic cameras marked

Machine learning tools provide the means to use traffic video to its maximum potential, primarily by permitting the processing of video at scale. One could imagine hiring a technician to continually watch a video feed of a particular intersection and count the number and types of vehicles they see, as well as noteworthy events such as a car driving down the wrong side of the road. This approach would be cumbersome even for one video feed, and it becomes infeasible when scaling to hundreds or thousands of cameras. It simply would not be worth the human resources involved to glean information from raw videos manually.

Figure 2: One might imagine hiring a technician to watch a video and transcribe events of interest

Figure 3: The problem is, having humans watch videos doesn’t scale well at all!

Figure 4: Computers are much better at handling tasks at scale, thus freeing up humans to do the hard work of analysis and planning

We saw this problem as an opportunity to showcase the potential for data science by building a system where computers do the work of processing video, freeing humans to do analysis and planning. We sought to design a pipeline that could take in raw video and output a structured database. This database would contain information about activity on the roadways, and thus facilitate policymakers’ work by providing them with rich information about vehicle behaviors throughout the city.
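To make the idea of a "structured database" concrete, here is a minimal sketch of what one row per detected object might look like. It is an illustration only, not the schema the project used: the table name `detections`, every column name, and the helper functions are hypothetical, and SQLite is assumed purely because it ships with Python.

```python
import sqlite3

# Hypothetical schema: one row per object detected in a video frame.
SCHEMA = """
CREATE TABLE detections (
    camera_id     TEXT NOT NULL,     -- which traffic camera produced the frame
    frame_time    TEXT NOT NULL,     -- ISO 8601 timestamp of the frame
    object_class  TEXT NOT NULL,     -- e.g. 'car', 'motorcycle', 'bus'
    direction_deg REAL,              -- estimated heading; NULL if stationary
    surface       TEXT,              -- e.g. 'road' or 'sidewalk'
    wrong_way     INTEGER NOT NULL DEFAULT 0  -- 1 if flagged as a wrong-way event
)
"""


def create_database(path=":memory:"):
    """Open a SQLite database and create the (assumed) detections table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_detection(conn, camera_id, frame_time, object_class,
                     direction_deg, surface, wrong_way=False):
    """Store one detected object as a structured row."""
    conn.execute(
        "INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?)",
        (camera_id, frame_time, object_class, direction_deg, surface,
         int(wrong_way)),
    )
    conn.commit()


def count_wrong_way(conn, camera_id):
    """Count flagged wrong-way events for one camera."""
    row = conn.execute(
        "SELECT COUNT(*) FROM detections WHERE camera_id = ? AND wrong_way = 1",
        (camera_id,),
    ).fetchone()
    return row[0]
```

With rows like these, a planner could ask simple questions of the data (for example, how many wrong-way events a given camera recorded) without ever watching the underlying video.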

However, building this pipeline was not simply a technical challenge. Successful development relied on a deep understanding of the social context in Jakarta. We could not begin making decisions about pipeline design, computer vision techniques, and validation without understanding our partners’ priorities, and their definition of “traffic safety.” Behaviors such as having more than two people on a motorcycle, carrying a food cart through a street, and weaving through traffic were surprising to some members of our team, and we had to learn the cultural context that was specific to Jakarta. This was a critical step because importing our own preconceptions and biases about traffic patterns in the U.S. and other parts of the world would have ultimately not served Jakarta well. Through several conversations with our partners at Jakarta Smart City and Pulse Lab Jakarta, we gradually defined the scope of the problem that we were facing.

Because traffic safety is such a broad topic, much of our work was driven by the eventual interventions that our partners hoped to inform. From our perspective, several technical decisions could ultimately only be made after we understood the policy interventions that particular techniques were meant to facilitate. We therefore sought to understand the medium- and long-term interventions that our partners had in mind, so that we could focus our work on collecting the data that would help direct those interventions most effectively.

Together with our partners, we homed in on several useful priorities. In the medium term, we learned that Jakarta was interested in better management of its traffic signals and deployment of “traffic stewards.” More efficient interventions in these two areas would help the city manage heavy congestion, and possibly prevent accidents arising from poor traffic flow management. In the long term, the city was interested in understanding what types of infrastructure changes could improve safety. For instance, installing a median might prevent vehicles from crossing into oncoming traffic, or building a bus lane might reduce congestion along a popular route.

Once we understood the scope of interventions, we were then able to develop a plan for the data science techniques that could best address the general problem of improving traffic safety. In particular, we identified three central goals:

Detect and classify objects in a frame
Estimate the direction of movement for objects across frames
Understand the “context” of a scene (i.e., what is a road, what is a sidewalk, etc.)

We settled on these goals because achieving these would enable the interventions that Jakarta was ultimately interested in deploying. Consider, for example, that one of the main problems that our partners were concerned with was vehicles driving on the wrong side of the road, and therefore creating an immediate safety hazard. For a computer to successfully record such an event, it needs to understand that there is a vehicle, understand that it is moving in a particular direction, and know what the “wrong” direction is on a roadway. Alternatively, another issue might be motorcycles and scooters driving on sidewalks and endangering pedestrians. Again, the computer would need to understand what a motorcycle is, know that it is moving and not parked, and recognize that it is traveling on a sidewalk and not a road.
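The two examples above can be sketched as simple rules that combine the three signals: what the object is, whether and where it is moving, and what surface it is on. This is a minimal sketch under stated assumptions, not the project's implementation: the `TrackedObject` structure, the label sets, the pixel threshold, and the idea of giving each road an `expected_direction` vector are all hypothetical, and the object detector, tracker, and scene segmentation that would produce these inputs are assumed to exist upstream.

```python
import math
from dataclasses import dataclass

# Assumed label sets; a real detector's class names may differ.
VEHICLE_LABELS = {"car", "bus", "truck", "motorcycle", "scooter"}
TWO_WHEELER_LABELS = {"motorcycle", "scooter"}


@dataclass
class TrackedObject:
    label: str        # output of the object classifier, e.g. "car"
    positions: list   # (x, y) pixel centroids, one per frame, in time order
    surface: str      # scene context under the object: "road" or "sidewalk"


def displacement(positions):
    """Net movement (dx, dy) between the first and last observed frame."""
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    return (x1 - x0, y1 - y0)


def is_moving(obj, min_pixels=5.0):
    """Treat an object as moving if its net displacement exceeds a threshold."""
    dx, dy = displacement(obj.positions)
    return math.hypot(dx, dy) >= min_pixels


def is_wrong_way(obj, expected_direction, min_pixels=5.0):
    """A vehicle on the road moving against the expected direction of travel."""
    if obj.label not in VEHICLE_LABELS or obj.surface != "road":
        return False
    if not is_moving(obj, min_pixels):
        return False
    dx, dy = displacement(obj.positions)
    ex, ey = expected_direction
    # A negative dot product means the motion opposes the expected direction.
    return dx * ex + dy * ey < 0


def is_sidewalk_riding(obj, min_pixels=5.0):
    """A motorcycle or scooter that is moving (not parked) on a sidewalk."""
    return (obj.label in TWO_WHEELER_LABELS
            and obj.surface == "sidewalk"
            and is_moving(obj, min_pixels))
```

Note that each rule fails if any one signal is missing: without the surface, a parked motorcycle and a sidewalk rider look the same, and without the direction, a wrong-way car looks like any other car.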

[Figure 5 panels, top to bottom: Raw Video Footage; This is a car…; …that is moving…; …on the road]

Figure 5: Our task was to take raw video footage (pictured top) and extract information such as what type of vehicle is being detected, what direction it is moving in, and what surface it is traveling on.

Once we had defined our goals, we turned to pulling together the best techniques for each goal into a cohesive pipeline. Combining this information allowed us to create a powerful tool that we hope will aid Jakarta in its efforts to improve traffic safety throughout the city. By the end of the summer, our pipeline was able to determine when a vehicle was moving on the wrong side of the road and flag this event in the database. We expect that in the future, Jakarta will be able to identify a range of events of interest, and build a database that contains rich information about traffic behaviors throughout the city.

Figure 6: To demonstrate the power of our pipeline, notice how in this one intersection we see multiple examples of vehicles traveling on the wrong side of the road. Our pipeline automatically flagged these events within the same three-day span, and in fact all of these except the top left occurred within the same two hours. Policymakers will be able to use the knowledge that people tend to drive on the wrong side of the road at this particular time to better understand when to deploy traffic stewards, or what types of infrastructure to build.
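As a rough illustration of how flagged events could inform when to deploy traffic stewards, the sketch below counts events by hour of day. It is an assumption-laden example rather than part of the pipeline: the function names are hypothetical, and the input is assumed to be a list of Python `datetime` objects, one per flagged event at a single intersection.

```python
from collections import Counter
from datetime import datetime


def events_per_hour(timestamps):
    """Count flagged events by hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def busiest_hours(timestamps, top_n=2):
    """Return the hours of day with the most flagged events, busiest first."""
    return [hour for hour, _ in events_per_hour(timestamps).most_common(top_n)]


# Usage with made-up timestamps (illustrative only, not project data).
example = [
    datetime(2018, 7, 1, 17, 5),
    datetime(2018, 7, 1, 17, 40),
    datetime(2018, 7, 2, 17, 15),
    datetime(2018, 7, 2, 18, 2),
    datetime(2018, 7, 3, 18, 30),
    datetime(2018, 7, 3, 8, 10),
]
print(busiest_hours(example))
```

Grouping by hour is the simplest possible summary; day of week or camera location would be equally natural groupings once the events are in a structured table.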

We want our work this summer to provide a blueprint for how cities around the world might approach the task of developing effective smart city initiatives. Traffic cameras are a prime example of the sensors that several cities have experimented with in the past few years. Cities now have an unprecedented capacity to gather high-quality information about how people move through an urban landscape. Our code and technical approaches should illustrate the potential for using such sensors in 21st-century urban planning.

Beyond our technical approach, we hope our work teaches policymakers, academics, and other stakeholders about best practices in applying data science to social problems. Our biggest challenges were not necessarily technical in nature. Instead, we spent much of our time understanding the parameters of the social problem. Once we understood this aspect, we were able to map technical solutions to policy interventions. This exercise was indispensable, and we cannot emphasize enough the importance of proper scoping of social context in successfully designing a technical system.

Over the summer, we gained a strong understanding of why traffic safety is a difficult problem to solve, what the core challenges are in addressing it, and how data science might help. Going forward, we hope these insights will pave the way for cities to eliminate traffic safety hazards altogether. We hope that our work moves us closer to a world where traffic accidents are not considered an inevitability, and we give people the security of knowing that they can safely move about their cities for work and play.