By Brian McInnis, Ivana Petrovic, and Jan Vlachy

Every year, some 1 million people die because they do not have access to hygienic sanitation. Among other problems, poor sanitation causes diarrhoea, which is the second largest cause of death worldwide for children under 5 years and the seventh largest cause of death overall. Poor sanitation goes beyond dirty toilets; it’s characterized by an entire system that does not allow for the safe collection, removal, transport, and treatment of human waste. Some 2.4 billion people lack access to improved sanitation solutions, almost a billion have no toilet at all, and over 4 billion live in communities where waste is not safely contained and removed. This problem is particularly acute in growing urban areas in Sub-Saharan Africa and Southern Asia. Despite the breadth of this problem, the United Nations Millennium Development Goal aiming for improved sanitation was the largest failure of the MDGs. However, global efforts to address the sanitation crisis have been redoubled under the Sustainable Development Goals, which run through 2030.


The sanitation crisis is particularly grave in the urban slums of Sub-Saharan Africa. Two million people live in the slums of Nairobi, which are some of the most densely populated areas on Earth, with some parts reaching 125,000 slum-dwellers per km²; by comparison, Hong Kong has a “mere” 6,690 inhabitants per km², nearly 19 times fewer. On average, there are 57 slum-dwellers for every toilet in Nairobi, and 6% of slum-dwellers have no toilet access at all.


The Nairobi-based startup Sanergy seeks to address this problem through a full value chain approach, developing a financially sustainable model for the provision of safe sanitation in urban areas, for everyone, forever.

Sanergy started in 2011 as a startup from a university project and has since expanded rapidly to provide about 750 toilets (as of August 2016) in Nairobi, shown in the figure on the left. These toilets are branded as Fresh Life. Sanergy sells the toilets to franchisees, who pay Sanergy a yearly waste collection fee. The franchisees are responsible for maintenance and cleaning, while Sanergy provides marketing and operational support and safely collects the waste (feces and urine). For example, in 2015, Sanergy collected about 1.5M kg of feces, equivalent to the weight of 13 blue whales! Sanergy consolidates the collected waste in a processing plant and converts it to fertilizer and other end products. Sanergy is growing rapidly, with plans to expand into more Nairobi slums and possibly beyond. To handle this anticipated growth, Sanergy needs to improve the efficiency of its logistics processes and reduce the cost of collecting toilet waste from the communities.

Our DSSG team consists of the fellows Brian McInnis, Ivana Petrovic, and Jan Vlachy, technical mentors Joe Walsh and Jennifer Helsby, and project manager Kevin Lo. This summer, we hope to help Sanergy improve their logistics operations, enabling future expansion of their services.

Our specific objective this summer is to improve the collection operations, particularly the collection schedule, which is currently not as efficient as it could be. For example, Sanergy employees currently collect waste from every toilet every day, regardless of usage or volume. But as the figure below suggests, many of Sanergy’s roughly 700 toilets are far from full when collected, so Sanergy may consider collecting them less often. A more efficient use of resources might be to spread out collections of less-used toilets instead of collecting them daily. But it’s a complex problem: we must also ensure that the schedules do not conflict with operational constraints such as smell, time, and weight limits per worker, or other staffing constraints. If time allows this summer, we will explore further operational improvements such as data-driven staffing (crew scheduling) and routing.


Sanergy currently fixes its collection schedules on a weekly basis. Our team will initially try to create a smart schedule that accommodates this practice. In contrast to the current practice, the smart schedule will avoid wasteful daily collections; instead, it can vary from day to day and from toilet to toilet, as shown in the figure below.


The variation in toilet usage means that individual toilets will need different collection schedules. Our aim is to provide an algorithm that produces flexible collection schedules, such that Sanergy picks up low-fill toilets less often, perhaps once every two or three days, and high-fill toilets, such as those in marketplaces, on a daily basis. To that end, we’re using several different data science methods to predict daily toilet usage and to infer the most efficient collection schedules.

Our team has so far considered the following models:

  • A baseline static model that groups toilets by their average fill rate per day and schedules collections accordingly (e.g., if a toilet fills less than 30% per day on average, collect it every third day; if it fills more than 50% per day, collect it every day).
  • Time series models that, for each toilet, estimate the likely fill rate from the fill rates of recent days. We have initially tested the autoregressive model. Given these estimates, the collection schedule can be determined naively:
    1. Predict how much waste each toilet will accumulate on each day of the next week.
    2. For each toilet, iterate over the days and track how the (predicted) waste accumulates. Schedule a collection whenever the predicted accumulated waste would exceed 100% (probably allowing for uncertainty, e.g., collect if the predicted 90th percentile exceeds 100%), and in any case no later than after 3 days because of the “smell constraints”. At that point, reset the counter and continue iterating.
  • Machine learning models from the scikit-learn suite: using these models, we could try to directly predict which toilets will be full on which days, bypassing the need to predict fill rates. Initially, we have tried the random forest model.
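To make the first two approaches concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the team's actual code: the function names are hypothetical, fill levels are expressed as fractions of toilet capacity, a collection is assumed to happen at the end of a day, and the "every second day" rule for toilets between the 30% and 50% thresholds is our guess, since only the two outer bands are described above.

```python
from typing import List


def static_interval(avg_daily_fill: float) -> int:
    """Baseline rule: days between collections, from average daily fill (fraction of capacity)."""
    if avg_daily_fill > 0.5:
        return 1  # fills quickly: collect every day
    if avg_daily_fill < 0.3:
        return 3  # fills slowly: collect every third day
    return 2  # ASSUMPTION: the middle band is not specified in the text


def naive_schedule(predicted_fill: List[float],
                   capacity: float = 1.0,
                   max_gap_days: int = 3) -> List[bool]:
    """Turn per-day fill predictions into a collection schedule.

    predicted_fill[d] is the predicted waste added on day d (for example a
    90th-percentile forecast). Entry d of the result is True if the toilet
    is emptied at the end of day d.
    """
    schedule = []
    accumulated = 0.0
    days_since_collection = 0
    for day, fill in enumerate(predicted_fill):
        accumulated += fill
        days_since_collection += 1
        # Look one day ahead; beyond the forecast horizon we assume no new waste.
        next_fill = predicted_fill[day + 1] if day + 1 < len(predicted_fill) else 0.0
        would_overflow = accumulated + next_fill > capacity
        smell_limit_hit = days_since_collection >= max_gap_days
        collect = would_overflow or smell_limit_hit
        schedule.append(collect)
        if collect:
            # Reset the counters and keep iterating.
            accumulated = 0.0
            days_since_collection = 0
    return schedule
```

A toilet predicted to gain 20% per day is then collected only when the three-day limit is reached, while one gaining 40% per day is collected every second day because a third day would overflow it. The sketch does not look past the end of the forecast week, so the last day is only collected if the smell limit requires it.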

Sanergy also wants to ensure that various constraints are satisfied: for instance, that no collectors are assigned excessive workload (in terms of weight or distance) and proper teams are scheduled to collect all routes. We will build these constraints into our algorithm as well, to make sure that the schedules it suggests can actually work in the real world, with real people.
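As one illustration of such a constraint, the sketch below flags collectors whose predicted load for a day exceeds a weight limit. The function name, the data layout, and the example limit are all hypothetical; Sanergy's real constraints (distance, team composition, time) would need their own checks.

```python
from typing import Dict, List


def overloaded_collectors(assignments: Dict[str, List[str]],
                          predicted_weight_kg: Dict[str, float],
                          max_kg_per_collector: float) -> List[str]:
    """Return collectors whose total predicted load exceeds the limit.

    assignments maps a collector to the toilet IDs on that collector's route;
    predicted_weight_kg maps a toilet ID to its predicted waste weight.
    """
    overloaded = []
    for collector, toilets in assignments.items():
        total = sum(predicted_weight_kg[toilet] for toilet in toilets)
        if total > max_kg_per_collector:
            overloaded.append(collector)
    return sorted(overloaded)


# Hypothetical example: collector "A" carries 55 kg against a 50 kg limit.
example = overloaded_collectors(
    {"A": ["t1", "t2"], "B": ["t3"]},
    {"t1": 30.0, "t2": 25.0, "t3": 20.0},
    max_kg_per_collector=50.0,
)
```

A candidate schedule would be rejected or adjusted whenever this list is non-empty.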

Our team will provide not only the smart schedule algorithm itself, but the entire pipeline needed to produce the schedule. An outline of this pipeline is in the figure below. We developed a module to load, transform, and clean the data from Sanergy databases. This input data is then used to create features that might predict toilet usage, including weights of the waste collected in the past, weather, toilet and franchise type, season and day of the week, and the number and type of nearby toilets. Next in the pipeline, these features are used to select the best prediction model using temporal cross-validation, after which the best model is trained on all available data to create the next week’s smart schedule for Sanergy to use.
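Temporal cross-validation means that each model is always evaluated on days that come after the days it was trained on. The sketch below shows one common variant, an expanding training window, in plain Python. The function names, the two toy forecasters, the window sizes, and the choice of mean absolute error are assumptions for illustration, not a description of the actual pipeline.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Forecaster = Callable[[List[float], int], List[float]]


def temporal_splits(n_days: int, min_train_days: int,
                    test_days: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) day ranges; the test block always follows the training days."""
    start = min_train_days
    while start + test_days <= n_days:
        yield range(0, start), range(start, start + test_days)
        start += test_days


def mean_abs_error(actual: List[float], predicted: List[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(series: List[float], models: Dict[str, Forecaster],
                      min_train_days: int = 14, test_days: int = 7):
    """Score each forecaster on every temporal split; return (best name, all scores)."""
    splits = list(temporal_splits(len(series), min_train_days, test_days))
    if not splits:
        raise ValueError("series is too short for the requested split sizes")
    scores = {}
    for name, forecast in models.items():
        errors = []
        for train_idx, test_idx in splits:
            history = [series[i] for i in train_idx]
            actual = [series[i] for i in test_idx]
            errors.append(mean_abs_error(actual, forecast(history, len(actual))))
        scores[name] = sum(errors) / len(errors)
    return min(scores, key=scores.get), scores


# Two toy forecasters standing in for the real candidate models.
def last_value(history: List[float], horizon: int) -> List[float]:
    return [history[-1]] * horizon


def historical_mean(history: List[float], horizon: int) -> List[float]:
    return [sum(history) / len(history)] * horizon


daily_fill = [0.2, 0.4] * 14  # hypothetical 28 days of fill fractions for one toilet
best_name, scores = select_best_model(
    daily_fill, {"last_value": last_value, "historical_mean": historical_mean}
)
```

The winning model would then be refit on the full history before producing the next week's predictions.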


Sanergy and DSSG hope that our team’s project to improve Sanergy operations will be piloted soon, and that the realized efficiencies will be used to expand access to hygienic sanitation solutions to residents in Nairobi and beyond.

Hopefully, these changes will help Sanergy with their logistics operations, allowing them to expand more rapidly and provide more residents with access to hygienic sanitation. As a second, optional summer objective, we would like to build models to support the siting of new toilets and to analyze how toilets added in the past affected operations and the demand for existing toilets. Subsequently, we will try to extend the models to estimate demand and operational changes for potential new toilets.