
Massive Data Institute Scholars Showcase Research Bridging Big Data and Public Policy

Scholars at the McCourt School’s Massive Data Institute leveraged big data research on timely topics, like online misinformation and gun violence, to advance evidence-based policy solutions.

The McCourt School’s Massive Data Institute Scholar program, also known as MDI Scholars, is designed to increase interdisciplinary public policy research and engage students in projects analyzing big data to inform evidence-based public policy solutions. Every semester, Georgetown undergraduate and graduate students apply to the program, and selected students work with leading experts to study data on timely topics ranging from forced migration, social movements, and gun violence to elections, misinformation, data privacy, and more.

“The MDI Scholars program demonstrates the collective power of collaboration across disciplines,” explains Mike Bailey, McCourt School professor and director of the Massive Data Institute. “Students gain networking and mentoring opportunities while working closely with Georgetown experts to unearth data-driven insights that help shape the future of evidence-based public policy.”

This fall, 19 scholars jumped right into bold, large-scale data analysis projects to advance knowledge on several pressing public policy issues. The Fall 2020 cohort represented nearly every school at Georgetown, with ten undergraduate students and nine graduate students, including four McCourt students.

At the end of the semester, the Scholars showcased their research and outlined key findings and important next steps.

Learn more about several of the projects below:

What Does it Mean to be a Moderate?

Abhi Ramaswamy (SFS’22) examined Twitter timeline data to review how moderate Congressional candidates have differentiated themselves from liberals and conservatives on key topics. Arjun Ravi, a Fritz Family Fellow (C’22), and Luke Dascoli (C’21) examined both official Congressional websites and campaign websites to understand if partisanship affected how members of Congress communicate and if their campaign language could predict their electoral outcome.

Detecting COVID-19 Misinformation

Alexander Chen (COL’22) developed an iterative machine learning approach that uses 2020 tweets related to COVID-19 to improve data labels and eventually build toward a “myth detector” that identifies the spread of COVID-19 misinformation.

Forecasting Forced Migration

Rashid Haddad (COL’21) and Douglas Post (G’22) explored how big data can improve upon current models for forced displacement. Their project aims to provide close to real-time analysis and develop better predictive methods to aid humanitarian relief efforts.

Analyzing Police Use of Force Data in New Jersey

McCourt student Matt Ring (MS-DSPP’22) worked with McCourt Professor Andrea Headley to format New Jersey’s Police Use of Force Data so that it could be merged with U.S. Census data. The resulting dataset provides more granular information on use-of-force incidents and allows researchers to analyze the data and identify trends.

Building a Sensor to Predict Instability in India

Ruchika Bhatia (MPP’21) worked with McCourt assistant teaching professor Eric Dunford to review datasets such as internet search history, Google Trends, and national-level surveys and build a model that predicts instability in India using time-varying conceptual metrics.

Tracking U.S. Gun-Related Deaths and Gun Ownership

Sonya Hu (C’22) analyzed Twitter data to determine whether we can improve gun ownership estimates from social media data. She also sought to better understand the relationship between mass-shooting events and gun-related death estimates in America.

Universal Web Crawler Pipeline Construction

Andrew Chen (C’22) and Yang Chen (G’21) worked with MDI Postdoctoral Fellow Jaren Haber, Ph.D., to improve “Spiders,” which crawl thousands of websites to extract sub-links and items (text, images, pdf, doc, etc.). Such information can be difficult to extract with existing web index software. The research aims to help standardize a data crawling design and big data pipeline compatible with any data crawling project.

Online Misinformation Related to the 2020 Election Candidates

In collaboration with MDI research professor Lisa Singh, Meena Balan (SFS’22), Ryan Callahan (G’20), and Brandon Herren (C’21) set out to characterize misinformation emerging from the presidential debates and understand Twitter, newspaper, and cable television mentions.

#MeToo Online Discussions

Samantha Dies (C’22) worked with Dr. Lisa Singh to analyze Twitter data to better understand the relationship between #MeToo Twitter conversations and offline actions and policy change.

Creating a Civil Justice Data Commons

Madeline Pfister (MSB’23) worked with MDI research professor Amy O’Hara and Georgetown University Law Center professor Tanina Rostain to create a civil justice data commons. The project’s ultimate goal is to create an open-source database covering all 50 states, the territories, and the District of Columbia to aid civil justice researchers.

Carceral Ideology and Public Education Reform

Martha Curi Saveedra’s (G’21) research focuses on quantifying carceral language and the degree to which schools use it in student handbooks. Carceral language is a type of language used to justify and maintain a state of detention or imprisonment, such as strict behavioral norms, diminished autonomy, and harsh disciplinary practices.

For more information about the MDI Scholars program, contact MDI Program Manager Jonathan Beam. Applications for the Fall 2021 MDI Scholars Cohort will open late summer.
