Data Science and Public Policy Seminar Series
April 27, 2016 @ 1:00 pm - 2:00 pm
In recent years, researchers have mined social media streams to forecast and “nowcast” a wide variety of population statistics, including the rates of influenza, obesity, drug consumption, and even roadkill. While our ability to discover such correlational knowledge is well established, the question I pose in this talk is whether it might also be possible to extract causal knowledge from these noisy data. Along with the typical difficulties of making causal inferences from observational data, social media present many additional threats to validity, most notably that the variables of interest are rarely observed and so must be estimated by text classification models. In this talk, I will review some progress we have made in this area, including (1) identifying synthetic control groups, (2) inferring user demographics with distantly supervised models, and (3) designing text classification algorithms that are robust to confounding variables.
Aron Culotta is an Assistant Professor of Computer Science at the Illinois Institute of Technology in Chicago, where he leads the Text Analysis in the Public Interest lab (http://tapilab.github.io/). He obtained his Ph.D. in Computer Science from the University of Massachusetts, Amherst in 2008, where he developed machine learning algorithms for natural language processing. He was a Microsoft Live Labs Fellow from 2006 to 2008 and completed research internships at IBM, Google, and Microsoft Research. His work has received best paper awards at AAAI and CSCW.