Spark for Social Scientists

Presenter: Alex Engler (Center for Data Science and Public Policy)

Level: All

This training will introduce attendees to working with massive datasets in Spark, a cutting-edge framework for statistical analysis of big data. Students will follow along and write their own code on a big data cluster using SparkR, the R interface to Spark. The session will also give an overview of the Urban Institute’s open-source approach to launching Spark clusters using Amazon Web Services. Students will develop skills in using cloud services and in manipulating and analyzing big data.

Audience: No technical skills are required, though introductory knowledge of R or Python would be helpful.

This work is generously funded by the Alfred P. Sloan Foundation.