Category: Research

Title: Who is missing from administrative data?

Authors: Amy O’Hara, Izzy Youngs, and Lahy Amman
Date Published: August 6, 2020

Government programs and systems keep detailed records of eligibility and other participant information in order to provide benefits, track budgets, comply with laws, and assess program outcomes.

Individuals who pay taxes and get a driver’s license to comply with federal and state laws are included in revenue agency records and DMV databases. Those who enroll in Medicare or apply for food stamps have their benefits tracked in federal and state databases. The primary use of these records is to administer the government programs for which they were collected.

These “administrative data” also serve important secondary uses, such as research and evaluation for policy analysis and performance measurement. Administrative data are also critical for statistical uses, such as measuring the population and economy.

The economic census, conducted every five years, relies extensively on administrative data from the Internal Revenue Service (IRS). Demographers also regularly use tax returns and Medicare enrollment files, along with counts of births, deaths, and new legal permanent residents, to measure population changes.

The 2020 Census is currently underway, and it will use more administrative data than any past census, including data from sources such as the Department of Housing and Urban Development, the IRS, the Social Security Administration, and state Supplemental Nutrition Assistance Programs. These large, wide-reaching government programs capture most people. But do administrative data systems include everyone? Who is missing?

Administrative data may be missing people who do not participate in government programs, including people who do not have Social Security numbers, people with housing instability, or people working in the informal economy. Even young children may be difficult to observe in administrative data. Coverage and accuracy also vary for many of these groups, especially noncitizens.

Government data

There is no single database of the United States population built from administrative data. Rather, administrative data are held by various state and federal agencies. With legal, secure access and effective linkage methods, the vast majority of the population can be observed in administrative records.

The 2010 Census Match Study compared population coverage in administrative data with the results of the 2010 census, finding that 92.6% of addresses and 98.0% of people counted in the census were also observed in administrative data. However, the study had several limitations.

The study included only a fraction of federal administrative data sources. It did not include Veterans Affairs, Medicaid, or SSA Supplemental Security Income data, potentially missing important hard-to-count populations. Matching was also stronger in areas of the country with city-style addresses, suggesting that people living in rural areas, where addresses were harder to match, may be missing from administrative data.
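The coverage figures above are, at heart, match rates: the share of census records that can also be found in administrative records. As a rough illustration only, the sketch below computes such a rate from made-up identifiers and shows how adding a second source can raise it. It assumes each person has a single, exact, shared identifier, which real linkage work cannot take for granted; the function name `coverage_rate` and all of the data are hypothetical, and this is not the method used in the Match Study.

```python
def coverage_rate(census_ids, admin_ids):
    """Return the share of census records that also appear in admin records.

    Assumes exact, unique identifiers (a simplification; real linkage
    usually relies on names, addresses, and dates of birth).
    """
    census = set(census_ids)
    if not census:
        raise ValueError("census_ids must not be empty")
    matched = census & set(admin_ids)
    return len(matched) / len(census)


# Made-up identifiers for illustration only.
census_people = {"p1", "p2", "p3", "p4", "p5"}
federal_tax = {"p1", "p2", "p3"}   # hypothetical federal source
state_snap = {"p3", "p4"}          # hypothetical state source

tax_only = coverage_rate(census_people, federal_tax)
combined = coverage_rate(census_people, federal_tax | state_snap)

print(f"Tax data only: {tax_only:.1%}")
print(f"Tax plus SNAP: {combined:.1%}")
```

In this toy example the person labeled `p5` appears in neither source, so no combination of these two files would count them.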

The study demonstrated the need for state-level administrative data to complement federal sources. Including SNAP, WIC, and TANF data could improve coverage of adults and children missing from other sources (especially tax data), but administrative data are only as expansive as the state and federal programs that generate them. A combination of sources will be needed to accurately count everyone.

For instance, counting children in administrative data is challenging. Nearly all children get a birth certificate, but some children fail to appear in administrative systems until they are school-aged. Linking birth records with immunization registries, preschool programs, and K-12 education systems would give the most accurate picture of the youth population, but coverage would be uneven, depending on each state’s program eligibility requirements and the Census Bureau’s ability to access these sources. The Census Bureau does not acquire linkable data from vital records, immunization registries, preschools, or school districts.

Due to restrictions in access and program eligibility, administrative data frequently miss minorities, residents in group quarters, immigrants, recent movers, young children, and un- or underemployed individuals. The likelihood of someone being in administrative records is tied to various processes of assimilation, including English proficiency, educational attainment, and full-time employment.

Commercial data

Many nonprofits, universities, and private corporations also collect and maintain a wide range of data, which can be reused through data collaboratives or data intermediaries. These data also reach more of the population than traditional government data, because people who do not meet eligibility requirements for state or federal programs may still be included.

In the 2010 Census Match Study, commercial data produced more matches to 2010 census addresses than federal administrative data did. The Census Bureau is exploring new strategies for the 2030 census, including the use of sensor and commercial data to enhance its statistical products, and other statistical agencies have purchased commercial data to augment their traditional survey and administrative data approaches. For example, the Economic Research Service relies on commercial data for its Consumer Food and Nutrition Data Infrastructure, obtaining commercial nutrition data to track families’ food purchases and product prices.

Social media data have been hard to access and carry unknown bias for population measurement, but they are also a promising source for secondary research use. Facebook data have been made available for independent research on elections and democracy, and even public social media data have been used to study disaster-related migration.

Seeking more sources and better data governance

Despite the promise of expanded coverage of undocumented, homeless, and child populations, federal, state, and local sources are often inaccessible due to legal or policy concerns. Access is further complicated because many datasets reside in legacy data systems at the state and federal level. Additional investment in infrastructure and capacity is needed.

Fortunately, such investments are called for in the Evidence Act and were recommended by the Commission on Evidence-Based Policymaking. The Federal Data Strategy also calls for better data governance. In addition, any ethical framework for facilitating data sharing must include privacy-centered design. Privacy-protecting methods, including differential privacy, federated data systems, and secure multiparty computation, must be tested.
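To give a sense of what one of these methods involves, the sketch below applies the Laplace mechanism, a standard building block of differential privacy, to a single count. It is a minimal illustration under stated assumptions, not a description of any agency's system: the function name `noisy_count`, the choice of `epsilon`, and the count itself are all hypothetical, and a sensitivity of 1 assumes that adding or removing one person changes the count by at most one.

```python
import random


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon.

    Smaller epsilon means more noise and stronger privacy protection.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon / sensitivity
    # The difference of two independent exponential draws with the same
    # rate follows a Laplace distribution centered at zero.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Hypothetical example: a made-up count of program participants in one area.
rng = random.Random(2020)  # fixed seed so the illustration is repeatable
released = noisy_count(100, epsilon=1.0, rng=rng)
print(f"Released count: {released:.1f}")
```

The released figure is close to, but not exactly, the true count, which is the trade-off between accuracy and privacy that testing would need to evaluate.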

Administrative and commercial data can be powerful tools for making evidence-based decisions, but without good governance models, these data can become at best a vacuum and at worst a tool of actual or perceived surveillance that violates civil liberties and human rights.