Pew Research Center is pleased to host two parallel training workshops on Tuesday, August 27, at its office in downtown D.C. These workshops will focus on two key areas: 1) natural language processing (NLP) techniques and their application to social science questions; and 2) deep learning techniques with applications to image analysis for social scientists. Participants will get hands-on experience building and training models in these subject areas and will also be able to meet members of Pew Research Center’s Data Labs team.

Workshops will run from 2pm to 5pm on Tuesday, August 27. Light snacks and coffee will be provided. Cost per participant is $25.

Natural Language Processing (NLP) 

This workshop will offer a soup-to-nuts overview of text-as-data methods for political, social, and policy applications using R, split into two sections. The first section will provide both a practical and a theoretical introduction to text-as-data approaches, including how to acquire text data, how to clean and organize it, and the challenges and considerations of combining text sources with other types of data. In particular, the first section will introduce dictionary-based techniques (and discuss their pitfalls and alternatives), as well as visualization strategies for summarizing corpora. The second section will focus on more advanced approaches to analyzing text data, including clustering, topic modeling, and word embeddings.
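The workshop teaches these methods in R. Purely as an illustration, here is a minimal Python sketch of a dictionary-based approach, assuming two small hypothetical word lists (real studies use validated lexicons). Each document is scored as its count of positive-dictionary tokens minus its count of negative-dictionary tokens, divided by the document's length.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries for illustration only.
POSITIVE = {"good", "great", "support", "benefit"}
NEGATIVE = {"bad", "poor", "oppose", "harm"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_score(text):
    """Return (positive hits - negative hits) / number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return (pos - neg) / len(tokens)


docs = [
    "Voters support the plan and see a great benefit.",
    "Critics oppose the bill and warn of harm.",
    "This policy is not good for workers.",
]
for d in docs:
    print(round(dictionary_score(d), 3), d)
```

The third example scores as positive because "good" is in the positive list, even though "not good" expresses a negative view. This is one of the well-known pitfalls of context-free dictionary counts.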

This workshop will be led by Sarah Bouchat, an Assistant Professor at Northwestern University and core faculty in the Department of Political Science and the Northwestern Institute on Complex Systems (NICO). At Northwestern, Bouchat teaches courses in machine learning, text-as-data, Bayesian statistics, and research design for social science. They completed their PhD in the Department of Political Science at the University of Wisconsin–Madison in 2017. Bouchat’s research interests include political methodology, comparative political economy, and authoritarian politics, with a regional focus on Southeast Asia. Their research has focused on elicited priors, as well as machine learning and Bayesian statistical applications for the study of low-information, authoritarian regimes such as Myanmar.

Image Processing

As the volume and accessibility of media content have increased in recent years, computational methods for content analysis at scale have gained popularity. Until recently, however, most such studies limited their use of computational methods to text analysis. As a result, current innovations and developments in deep learning for computer vision, or automated image content analysis, remain unfamiliar and inaccessible to most social science researchers. This workshop aims to bridge this gap by providing a brief theoretical background along with hands-on experience with deep learning for image analysis. The workshop focuses on feature extraction and image classification. It will be of particular interest to scholars studying social media or any domain where large numbers of images (e.g., tens of thousands or more) need to be labeled. It also serves as a useful introduction for scholars interested in studying video data.

The first part of the workshop introduces deep learning and Convolutional Neural Networks (CNNs). The second part is a hands-on tutorial using Python. It will cover how to 1) use pre-trained CNNs to extract features of interest from an image (e.g., identifying particular objects), and 2) train CNNs to classify images into user-determined categories using labeled examples. Experience with Python will be helpful but is not required.
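The workshop itself works with pre-trained CNNs. As a library-free illustration of what a single convolutional layer computes, the sketch below applies one hand-picked 3×3 vertical-edge filter (a hypothetical stand-in for a learned filter), a ReLU activation, and 2×2 max pooling to a toy 6×6 image. In a real CNN, the filter weights are learned from labeled data, and many such filters are stacked across layers.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding), as in a CNN conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def relu(fmap):
    """Elementwise ReLU: negative activations become 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Toy 6x6 grayscale image: left half dark (0), right half bright (1).
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked vertical-edge filter (hypothetical; a CNN would learn its filters).
EDGE_KERNEL = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

feature_map = relu(conv2d(IMAGE, EDGE_KERNEL))
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)
```

On this toy image, the feature map is nonzero only in the columns where the 3×3 window straddles the dark/bright boundary, so the filter responds to the vertical edge and ignores the flat regions.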

This workshop will be led by Nora Webb Williams, a PhD Candidate in the University of Washington Department of Political Science studying comparative politics, methods, and political economy. Her dissertation research addresses economic resilience and the long-term impacts of colonialism on social trust, with a regional focus on the former Soviet Union. She also writes about the impact of social media and images on protest mobilization, examining diverse cases such as the 2010 revolution in Kyrgyzstan and the Black Lives Matter movement in the United States. Her primary methodological interest is in images as data for social science research, with related interests in machine learning, text as data, and causal inference. Notable experiences outside the university setting include serving as a Peace Corps volunteer in Kazakhstan and Liberia. Her work has been published or is forthcoming in Political Research Quarterly, Central Asian Survey, and Nationalities Papers.