Health Economics Data Sources & Databases
A curated directory of the datasets that power health economics research, with coverage, access, and use-case notes.
Most health economics research draws on a small set of public datasets: federal administrative records (CMS claims, AHRQ's HCUP), national health surveys (BRFSS, NHIS, NHANES, MEPS), and economic and demographic series (Census American Community Survey, the Bureau of Labor Statistics, the Bureau of Economic Analysis). California-specific work adds the CHHS Open Data Portal, and cross-country comparison adds the WHO Global Health Observatory. The directory below groups these sources, notes what each covers, and points to the official portal for each.
Last reviewed: June 2026
Choosing a dataset is the first methodological decision in any applied health economics study, and it usually constrains everything that follows: the population that can be studied, the outcomes that can be measured, the granularity of geography and time, and the identification strategies that are feasible. A claims database supports detailed cost and utilization analysis but says little about behavior or biomarkers; a household survey captures self-reported behavior and coverage but rarely the full cost of an episode of care. The sources below are organized by what they were built to measure so that the right match is easier to find.
| Dataset | Source / Owner | Coverage | Access | Typical use |
|---|---|---|---|---|
| CMS Data Portal | Centers for Medicare & Medicaid Services | National; Medicare and Medicaid | Free (public files); restricted (identifiable claims via ResDAC) | Spending, utilization, provider participation |
| HCUP | Agency for Healthcare Research and Quality | National and state; hospital encounters | Purchase / restricted; free summary tools (HCUPnet) | Inpatient, ED, and ambulatory-surgery costs |
| BRFSS | Centers for Disease Control and Prevention | National, state, many counties | Free | Health behaviors, chronic conditions, prevention |
| NHIS | CDC, National Center for Health Statistics | National; households | Free | Coverage, access, self-reported health |
| NHANES | CDC, National Center for Health Statistics | National; interview + exam + labs | Free | Clinical and biomarker measures of health |
| MEPS | Agency for Healthcare Research and Quality | National; household panel | Free (public); restricted (linked files) | Health care cost, use, and insurance |
| American Community Survey | U.S. Census Bureau | National to tract level | Free | Income, insurance, demographics, denominators |
| Bureau of Labor Statistics | U.S. Department of Labor | National, metro, some state | Free | Price indices, wages, employment adjustments |
| Bureau of Economic Analysis | U.S. Department of Commerce | National, state, county | Free | Income, health spending accounts, deflators |
| CHHS Open Data | California Health and Human Services | California; state and program level | Free | State administrative and program data |
| WHO Global Health Observatory | World Health Organization | Cross-country | Free | International comparison of health indicators |
Federal administrative data
Administrative datasets record transactions as they happen, so they capture cost and utilization at a level of detail that surveys cannot match. The CMS Data Portal publishes Medicare and Medicaid spending, provider, and quality files; identifiable claims for research are available separately through ResDAC under a data use agreement. The Healthcare Cost and Utilization Project (HCUP) is the largest collection of longitudinal hospital encounter data in the United States, covering inpatient stays, emergency department visits, and ambulatory surgery; its free HCUPnet tool returns national and state summary statistics without a data purchase.
National health surveys
Surveys are the workhorse of population health measurement because they capture behavior, coverage, and self-reported status that administrative records omit. The Behavioral Risk Factor Surveillance System is the largest continuously conducted health survey in the world and supports state and, for many measures, county estimates of risk factors and preventive care. The National Health Interview Survey, conducted continuously since 1957, is the oldest household health survey and a standard source for coverage and access. The National Health and Nutrition Examination Survey adds physical examinations and laboratory tests, making it the reference for clinical and biomarker outcomes. The Medical Expenditure Panel Survey follows households over two years and is the most complete source on the cost and use of care and on insurance coverage.
Economic and demographic data
Economic evaluation needs denominators, prices, and adjustments that come from outside the health system. The American Community Survey provides income, insurance, and demographic detail down to the census-tract level, which is essential for population denominators and for small-area analysis. The Bureau of Labor Statistics publishes the price indices and wage series used to inflate costs to a common year and to value time. The Bureau of Economic Analysis produces personal income accounts, the Health Care Satellite Account, and the deflators used in cost studies; the National Health Expenditure Accounts themselves are published by CMS.
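Inflating costs to a common year comes down to a ratio of index values. The following is a minimal sketch in Python, assuming a hypothetical annual price index; the `PRICE_INDEX` values and the `inflate_cost` helper are illustrative placeholders, not published BLS figures or an official method.

```python
# Hypothetical annual price index values; placeholders, not published BLS figures.
PRICE_INDEX = {2018: 100.0, 2019: 102.5, 2020: 105.0, 2021: 110.0}


def inflate_cost(cost: float, from_year: int, to_year: int,
                 index: dict[int, float] = PRICE_INDEX) -> float:
    """Express a cost observed in from_year in to_year dollars."""
    if from_year not in index or to_year not in index:
        raise KeyError(f"index has no value for {from_year} or {to_year}")
    # Scale by the ratio of the target-year index to the source-year index.
    return cost * index[to_year] / index[from_year]


# Costs recorded in different years, all expressed in 2021 dollars.
costs = [(2018, 1000.0), (2020, 1000.0)]
adjusted = [round(inflate_cost(cost, year, 2021), 2) for year, cost in costs]
```

In practice the index would be read from a downloaded series, and the choice of index (general versus medical-care prices, for example) is an analytic decision that the sketch leaves open.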
State and local data
For California-focused work, the California Health and Human Services Open Data Portal publishes state administrative and program datasets, from licensing and facility data to program enrollment. State-level estimates from BRFSS and the ACS complement these portal datasets when a national survey instrument is needed for comparability across states.
International data
Cross-country comparison relies on harmonized indicators. The WHO Global Health Observatory compiles health indicators across countries and themes, supporting benchmarking and comparative cost-effectiveness work.
How to choose a dataset
The right source follows from the research question rather than from familiarity. A study of what care costs and who pays for it points to MEPS, HCUP, or CMS claims, because those records measure spending directly. A study of how common a condition is, or how a behavior varies across counties, points to BRFSS or NHIS, which are designed for prevalence estimation with known sampling weights. A study that needs a clinical measurement, such as measured blood pressure or a lab value rather than a self-report, points to NHANES, the only national source that examines participants in person. A study that needs income, insurance status, or population counts to build a rate points to the American Community Survey.
Three questions usually settle the choice. First, what is the unit of analysis: the patient, the encounter, the household, or the county? Administrative files are organized around encounters and claims, while surveys are organized around people and households. Second, what geography is required? National estimates are available from every source here, but tract-level detail comes only from the ACS, and county detail in the health surveys is limited to the larger counties or to modeled small-area estimates. Third, what time structure is needed? Repeated cross-sections such as BRFSS describe trends in the population, while panel data such as MEPS follow the same households over time and support within-person comparisons.
Understanding access tiers
Public-use files are de-identified and downloadable without an agreement, and they support most of the analyses described on this page. Restricted-use files contain more detail, such as exact geography, dates, or identifiable claims, and require a data use agreement and often a secure enclave. CMS identifiable claims are the clearest example: summary and provider files are open on the CMS portal, while research-identifiable claims are released through ResDAC under a data use agreement. HCUP follows a similar split, with free HCUPnet summaries alongside purchasable encounter-level databases. Planning for the access tier early matters, because approvals and secure computing for a restricted file can add months to a project timeline.
Linking and combining sources
Most applied work combines a primary dataset with one or more supporting series. A cost analysis built on MEPS or HCUP typically pulls price indices from the Bureau of Labor Statistics to express dollars in a common year, and population denominators from the ACS to convert counts into rates. A county-level study in California often pairs a CHHS Open Data program file with ACS demographics and a BRFSS risk-factor estimate. When linking at the area level, confirm that the geographies align, because survey small-area estimates, administrative service areas, and Census tracts do not share boundaries, and a careless join can attribute one area's denominator to another's numerator.
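One way to guard against a careless area-level join is to refuse to compute rates unless both sources cover exactly the same set of area codes. The sketch below assumes two hypothetical lookups keyed by county code; the `rates_per_1000` helper and all counts are made up for illustration.

```python
def rates_per_1000(numerators: dict[str, int],
                   denominators: dict[str, int]) -> dict[str, float]:
    """Join event counts to population counts by area code and return rates.

    Raises ValueError when the two sources do not cover the same areas,
    rather than silently dropping or mismatching them.
    """
    # Areas that appear in one source but not the other.
    unmatched = set(numerators) ^ set(denominators)
    if unmatched:
        raise ValueError(f"areas present in only one source: {sorted(unmatched)}")
    return {area: 1000 * numerators[area] / denominators[area]
            for area in sorted(numerators)}


# Hypothetical county codes and counts, for illustration only.
visits = {"06001": 120, "06013": 45}
population = {"06001": 40000, "06013": 30000}
rates = rates_per_1000(visits, population)
```

This check only catches missing or extra keys. Two files can share a code and still describe different boundaries or vintages, so the geography definitions in each source's documentation still need to be compared by hand.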
Common pitfalls
- Ignoring survey weights. BRFSS, NHIS, NHANES, MEPS, and the ACS are complex samples; estimates and standard errors require the published weights and design variables, or they will be biased and overconfident.
- Treating claims as clinical truth. Administrative records capture what was billed, not what was measured, so coding changes and reimbursement incentives can move a series independently of underlying health.
- Mismatched price years. Costs drawn from different years are not comparable until they are inflated to a common year with an appropriate index.
- Over-reading small-area estimates. Modeled county estimates from national surveys carry wide intervals; presenting them as precise point values overstates what the data can support.
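The first pitfall is easy to see with a toy example. The sketch below compares an unweighted and a weighted prevalence estimate; the `weighted_prevalence` helper and the respondent data are hypothetical, and the code covers the point estimate only. Correct standard errors also need the survey's strata and cluster variables, typically handled by dedicated survey software.

```python
def weighted_prevalence(responses: list[tuple[int, float]]) -> float:
    """Weighted share of respondents with outcome == 1.

    Each item is (outcome, weight), where weight stands in for the
    survey's published sampling weight for that respondent.
    """
    total_weight = sum(weight for _, weight in responses)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(outcome * weight for outcome, weight in responses) / total_weight


# Hypothetical respondents: (has_condition, sampling_weight).
# People with the condition were oversampled, so they carry smaller weights.
sample = [(1, 100.0), (1, 100.0), (0, 400.0), (0, 400.0)]
unweighted = sum(outcome for outcome, _ in sample) / len(sample)
weighted = weighted_prevalence(sample)
```

Here the unweighted share is 0.5 while the weighted share is 0.2, which shows how an oversampled group can inflate a naive estimate.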
Frequently asked questions
- What are the best free datasets for health economics research?
  - Widely used free sources include the CMS public-use and provider files, AHRQ's MEPS and the HCUPnet summary tool, the CDC surveys (BRFSS, NHIS, NHANES), and the Census ACS for demographic and income detail. Each is public, documented, and maintained by a federal agency. Identifiable CMS claims and the encounter-level HCUP databases are restricted or purchased rather than free.
- What is the difference between MEPS and NHANES?
  - MEPS measures health care use, spending, and insurance coverage for a panel of households, making it the standard source for cost and expenditure analysis. NHANES combines interviews with physical examinations and laboratory tests, making it the standard source for clinical and biomarker measures of population health.
- Where can I find California-specific health data?
  - The CHHS Open Data Portal publishes state administrative and program datasets. National sources such as BRFSS and the Census ACS also release state and, for some products, county-level estimates for California.
- Which datasets support cost-effectiveness analysis?
  - Cost and utilization come from MEPS, HCUP, and CMS claims; population denominators and risk factors come from BRFSS, NHIS, and the ACS; and price and wage adjustments come from BLS and BEA series. The Methods Lab cost-effectiveness tutorials walk through how these pieces combine.
Pair these datasets with the CAPHE Methods Lab tutorials to move from raw data to a defensible causal estimate, or apply them in the ROI Calculator and Access Explorer.