In 2017, the Commission on Evidence-Based Policymaking reported that administrative data should be used to inform evidence-based policy decisions. The Commission unanimously recommended that inter-agency sharing of administrative data for this purpose should be accompanied by enhanced privacy protections. Subsequently, the Federal Data Strategy and the Evidence Act of 2018 doubled down on this push for inter-agency data sharing to promote informed decision making.
Last spring, in a demonstration at the Department of Education, Georgetown’s Massive Data Institute (MDI) illustrated how one specific privacy-preserving technology (PPT) could make this kind of data sharing possible without giving up privacy protections.
MPC as a solution to the privacy conundrum
One type of PPT, called secure multiparty computation (MPC), can link and analyze disparate datasets while they remain encrypted. MPC ensures that no party sees another party’s underlying sensitive data. Instead, the data owners enter their encrypted datasets into a secure computing application, which has been programmed to carry out the queries the data owners specify. Each participating data owner sees only its own data and the aggregate output of the joint computation.
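To make the idea concrete, here is a minimal sketch of additive secret sharing, one common building block of MPC. It is an illustration only: the article does not say which MPC protocol the prototype used, and the modulus, function names, and input values below are all hypothetical.

```python
import secrets

# Hypothetical modulus for this toy example; all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def share(value, n_parties=2):
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; any proper subset alone reveals nothing about the value."""
    return sum(shares) % MODULUS


# Hypothetical private inputs held by two data owners.
owner_a_total = 12_500
owner_b_total = 7_300

a_shares = share(owner_a_total)
b_shares = share(owner_b_total)

# Each compute party adds only the shares it holds; neither sees a raw input.
party0_sum = (a_shares[0] + b_shares[0]) % MODULUS
party1_sum = (a_shares[1] + b_shares[1]) % MODULUS

# Only the aggregate is revealed when the partial sums are combined.
joint_total = reconstruct([party0_sum, party1_sum])
print(joint_total)  # 19800
```

Each individual share is a uniformly random number, so a party holding one share learns nothing about the underlying input. Only the combined output, here a sum, is ever reconstructed.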
Because of these features, MPC offers a level of security beyond what many standard protocols offer today. It fits well with the Commission’s recommendation that agencies increase their administrative data sharing with one another without giving up privacy protections. Identities of individuals and other sensitive information are never revealed during the process. That is a departure from the typical data sharing model used in many agencies, in which a trusted party is exposed to the decrypted data, rife with identifying information, in order to perform the necessary research and evaluation work.
Window of Opportunity at Department of Education
The National Center for Education Statistics (NCES), the statistical research arm of the Department of Education, is proactively pursuing the Commission’s recommendations. NCES aims to accelerate responsible sharing of Federal data by exploring new applications that protect the privacy of sensitive data during computation, while preserving the utility of the data. With support from the leadership at the Department of Education, as well as funding from the Alfred P. Sloan Foundation, Georgetown’s MDI became part of that NCES effort: using MPC to reproduce commonplace statistics that require sharing of sensitive data among Department of Education activities.
Our prototype demonstration reproduced a portion of the annual 2015–16 National Postsecondary Student Aid Study (NPSAS:16) report, showing statistics on average federal Title IV aid received by undergraduates for the 2015–16 academic year.
In this setting, data comes from two activities within the Department: the National Postsecondary Student Aid Study group (NPSAS) at NCES and the National Student Loan Data System (NSLDS). Today, preparing the NPSAS reports requires NPSAS to share students’ Social Security numbers with NSLDS, and NSLDS to share sensitive student financial information with NPSAS. These exchanges appear as the Request File and Results File, respectively, in Figure 1.
Figure 1. Simplified Diagram of Current (Non-Privacy-Preserving) Computation of NPSAS:16 Report Table
To avoid those disclosures while still providing the same statistics efficiently, our prototype performed the same data linkage and statistical analysis without exposing that sensitive information. Instead, our approach used MPC to share the data and compute the necessary statistics while the data remained encrypted.
MPC offers cryptographically proven security at levels comparable to those typically used for encrypting federal government data today. Employees from NPSAS and NSLDS never saw one another’s sensitive data. Instead, the MPC application linked the two datasets and performed the statistical calculations, outputting only average aid amounts, to be sent to NCES (Figure 2).
Figure 2. Privacy Preserving Computation Protocol for NPSAS:16 Report Table
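The computation itself can be pictured as a single function that links the two datasets and releases only an average. The sketch below writes that function in plaintext purely to show what is computed and what is released. In an MPC run the same logic would be evaluated over encrypted inputs, so no party would see the records. The identifiers, aid amounts, and function name are hypothetical, and this is not the prototype's code.

```python
from statistics import mean


def average_linked_aid(sample_ids, aid_records):
    """Link sampled IDs to aid records and return only the average aid amount.

    Plaintext stand-in for the joint computation; under MPC the inputs
    would stay encrypted and only the returned average would be revealed.
    """
    aid_by_id = dict(aid_records)
    matched = [aid_by_id[i] for i in sample_ids if i in aid_by_id]
    return round(mean(matched), 2) if matched else None


# Hypothetical study sample (one data owner) and aid records (the other).
study_sample = ["id-001", "id-002", "id-004"]
loan_system_aid = [("id-001", 5500.0), ("id-002", 3500.0), ("id-003", 9000.0)]

print(average_linked_aid(study_sample, loan_system_aid))  # 4500.0
```

Only two of the three sampled identifiers match an aid record, so the average is taken over those two amounts. The unmatched records on either side never appear in the output.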
In our prototype, we carried out 123 experiments to re-create the NPSAS:16 statistics, taking 4.8 hours of CPU time on two cloud-based virtual machines, typical of those available to the Department of Education. All experiments were conducted by a single statistics analyst from MDI without significant programming or cryptography background, within the Department of Education’s trusted network.
Lessons Learned, and Future Directions
The outcome of this project shows that PPTs can be efficiently and effectively applied to privacy-preserving sharing of sensitive data among distinct activities, while also enabling analyses that provide valuable evidence for policy-making.
Specifically, our prototype:
- produced accurate results for average Title IV federal aid amounts, when compared to results from the NPSAS:16 report;
- operated efficiently, with computation and network burdens well within practical limits;
- demonstrated that users without significant programming or cryptographic experience can use PPTs to protect data privacy in a production-like environment;
- and ensured that the sensitive data used were kept cryptographically private to the organizations providing them.
However, PPTs are only useful when put into practice. A planned, well-supported introduction of PPTs that addresses certification, legal, technical, and training concerns will be key to successful deployment. In the interim, it would be useful to further demonstrate the applicability of PPTs in NCES production workflows, and to conduct further demonstration projects across agencies.