More and more unstructured, organic data related to human behavior, beliefs, and opinions are being shared online. Because of their availability and richness, these data are an important source of information for social scientists attempting to characterize and predict human and societal dynamics. They give insights that traditional survey data can miss and are less costly to collect. To broaden the reach of text analytic methods and tools across social, behavioral, and economic research, we are creating a community of social scientists across disciplines who will work on different data blending projects to advance their research.

Contact: Lisa Singh, Professor of Computer Science & Research Professor (MDI)

Current Projects

The Social Science and Social Media Collaborative (S3MC)
The Social Science and Social Media Collaborative (S3MC) aims to incorporate three parallel projects on survey methodology, political communication patterns, and parenting information and misinformation to address issues concerning the use of data from social media. This study explores whether data from social media are representative, whether users are honest about their thoughts, and whether the collection and processing of the data are unbiased and accurate.

Forced Migration
This interdisciplinary research project develops algorithms, methodologies, models, and tools to help policy makers better understand potential forced migration. Specifically, this project explores how the use of ‘big data’ can inform the development of near real-time models for forced displacement decision-making.

The #metoo Movement

The MDI Data Blending Portal

Given the range of data maintained at the MDI, we have developed a portal that integrates data from different text sources to create variables that social scientists can use within their traditional research portfolios. This allows researchers to blend knowledge obtained from unstructured text data, including social media data, with more structured variables. The portal gives researchers the flexibility to generate variables at different time scales (daily, monthly, annually) and for different subsets of data, using different data matching, data mining, and machine learning algorithms. The portal is being used by researchers analyzing the 2016 U.S. Presidential Election and by researchers investigating movement patterns in Iraq and Syria.
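To make the idea of generating a variable at different time scales concrete, here is a minimal Python sketch. It is not the portal's implementation: the sample posts, the `keyword_counts` function, and the simple substring match are all assumptions made for illustration. It only shows how one text-derived variable (the number of posts mentioning a keyword) can be produced at daily, monthly, or annual resolution.

```python
from collections import Counter
from datetime import date

# Hypothetical sample of timestamped posts; real portal data would differ.
POSTS = [
    (date(2016, 11, 7), "Election day is tomorrow, go vote"),
    (date(2016, 11, 8), "Long lines to vote this morning"),
    (date(2016, 11, 8), "Watching the election results tonight"),
    (date(2016, 12, 1), "Still talking about the election"),
    (date(2017, 1, 20), "Inauguration coverage all day"),
]

# Map each time scale to a function that turns a date into a bucket label.
BUCKETS = {
    "daily": lambda d: d.isoformat(),
    "monthly": lambda d: f"{d.year:04d}-{d.month:02d}",
    "annually": lambda d: f"{d.year:04d}",
}


def keyword_counts(posts, keyword, scale):
    """Count posts mentioning `keyword` per time bucket at the given scale.

    Matching is a case-insensitive substring test, chosen for simplicity;
    it stands in for whatever matching algorithm a real system would use.
    """
    bucket = BUCKETS[scale]
    needle = keyword.lower()
    counts = Counter()
    for day, text in posts:
        if needle in text.lower():
            counts[bucket(day)] += 1
    # Return buckets in chronological order (labels sort lexicographically).
    return dict(sorted(counts.items()))
```

With the sample data above, `keyword_counts(POSTS, "election", "monthly")` returns `{'2016-11': 2, '2016-12': 1}`. The resulting counts could then be joined to a structured dataset on the same time key, which is the kind of blending the portal is described as supporting.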

The Expandable Open-Source Database (EOS)

The EOS database consists of more than 700 million publicly available, open-source media articles and blog posts actively compiled since 2006. Currently, data are collected from over 20,000 Internet sources at a rate of approximately 100,000 articles per day. EOS includes a web-based search engine that 1) searches the archive by keyword or by using geospatial maps, 2) saves searches, 3) publishes searches, and 4) exports articles in bulk for more detailed analysis. Researchers in a number of disciplines use EOS to collect raw data, frequency counts, topic buzz levels, and other variables that support their research goals. Project domains currently using the archive include forced migration, human trafficking, infectious disease, and religious conflict.