Introductions and Definitions – Data Science for Social Good Fellowship

Rob Mitchum

“Data science is not a real field, it’s just statistics done by people with weird hair.”

You might not expect a fellowship called Data Science for Social Good to begin their summer with a debate over the legitimacy of data science. But before the fellows even had a chance to introduce themselves to the group, they were given provocative claims and forced to take a stand…somewhere between a white board marked “AGREE” and another marked “DISAGREE.”

In the now-traditional opening activity for the DSSG fellowship, the new fellows and mentors participated in a “human statistical distribution” game. Fed a statement such as “Privacy is dead, get over it.” or “Stephen Colbert was the best replacement for David Letterman,” the participants rearranged themselves along the Agree/Disagree spectrum based on their belief. They then defended their views, arguing for the minimum wage or the value of maintaining some measure of privacy in the increasingly digital world.

Even the data science statement created a wide distribution, with some arguing that most of the methods data scientists use were developed by statisticians long ago, while others argued for technical, cultural, and even narrative differences that define data science as a new discipline. But the best definition was provided by Rayid Ghani in the first session of the day, a broad overview of the summer’s purpose:

“It’s not about the data, it’s about the problems you’re solving.”

Most of the day was filled with the routines of moving into a new space: unwinding the bubble wrap from monitors, finding the nearest outlets and the most comfortable chair, trying to figure out what was causing the strange smell from the sink. Teams set up camp around the third floor space on Chicago’s State Street, bringing out personal totems to personalize their desks.

In small groups, the fellows and mentors burned through stacks of post-it notes. First they wrote down their goals for the summer, ranging from learning new programming languages and how to use cloud and parallel computing to more casual activities such as karaoke, swimming in Lake Michigan, watching the World Cup, and sampling Chicago cuisine: deep dish pizza, hot dogs, and Indian food on Devon Avenue. Then, they ran a “manual” k-means analysis on the goals, organizing the post-its into related clusters.

A second round of post-its covered the supply and demand of expertise in specific techniques — what methods fellows want to learn, and which methods they are willing to teach to others. A third round of notes proposed names for the various meeting rooms, classrooms, and strangle-shaped nooks around the space, drawing from these including math terms, superheroes, Chicago neighborhoods, coffee brands, and imprisoned Chicago politicians.

When the day finally wheeled their chairs into a big circle for the traditional introductions, less technical goals emerged. Nearly every fellow and mentor stated their desire to not just learn new skills, but build relationships with other people within and without the fellowship, creating a new community of data scientists interested in real world problems. From this network of similarly-minded by diversely-talented people, a clearer definition of the work they do may spontaneously emerge, answering perhaps the most poignant post-it goal of the day:

“Learn what the hell data science is!”