The Data

California's Department of Public Health is evaluating a Community Health Worker (CHW) program. County-level data show a clear association: counties with more CHW hours have fewer preventable hospitalizations, that is, better health outcomes. (Data are simulated for illustration.)

[Figure: scatter plot of CHW Program Intensity vs Preventable Hospitalizations. Each dot is one California county.]

Next: Two analysts will interpret this data. Both are competent professionals. Both will reach confident conclusions. Yet they will make opposite recommendations.

Analyst A's Interpretation

Analyst A approaches the data with a focus on statistical association. The question: Is the relationship between CHW programs and health outcomes real and meaningful?

Analyst A: Adjustment-Based Perspective

Reasoning Process

1. The correlation is strong (r = -0.72) and statistically significant. This is unlikely due to chance.
2. I control for observable confounders: income, insurance rates, urban/rural status. The association persists.
3. CHWs provide evidence-based services (care coordination, patient education) known to improve outcomes.
4. The dose-response relationship (more CHW hours = better outcomes) supports causality.

Recommendation: Expand CHW funding statewide

Next: Analyst B looks at the exact same data and controls for the same variables. But Analyst B asks a different question entirely.

Analyst B's Interpretation

Analyst B asks: Why do some counties have CHW programs while others don't? The answer to this question changes everything.

Analyst B: Design-Based Perspective

Reasoning Process

1. Which counties adopted CHW programs? Ones with grant-writing capacity, health department infrastructure, and political will.
2. These same factors also improve health outcomes directly, regardless of CHWs. They're unmeasured confounders.
3. The association may reflect "counties that have their act together" rather than CHW effectiveness.
4. Mandating CHW programs for struggling counties won't give them the underlying capacity that drives success.

Recommendation: Don't expand until effect is identified
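
Analyst B's concern can be made concrete with a small simulation. The sketch below is illustrative only: the variable names, coefficients, and sample size are assumptions, not the data behind the chart. It sets the true CHW effect to zero, lets an observed confounder (income) and an unmeasured one (county capacity) drive both CHW hours and hospitalizations, and then adjusts for income the way Analyst A would.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def slope(xs, ys):
    """OLS slope from regressing ys on xs (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def residualize(ys, xs):
    """Residuals of ys after regressing ys on xs (with an intercept)."""
    b = slope(xs, ys)
    mx, my = mean(xs), mean(ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]

# Hypothetical data-generating process; every coefficient is an assumption.
rng = random.Random(0)
income, hours, hosp = [], [], []
for _ in range(5000):
    inc = rng.gauss(0, 1)            # observed confounder
    cap = rng.gauss(0, 1)            # unmeasured county capacity
    h = inc + cap + rng.gauss(0, 1)  # capacity and income both drive CHW hours
    # The true effect of CHW hours on hospitalizations is set to zero.
    y = -1.0 * inc - 2.0 * cap + 0.0 * h + rng.gauss(0, 1)
    income.append(inc)
    hours.append(h)
    hosp.append(y)

# Adjust for income by partialling it out of both variables
# (the Frisch-Waugh-Lovell route to a regression coefficient).
adjusted_slope = slope(residualize(hours, income), residualize(hosp, income))
print(f"Income-adjusted slope: {adjusted_slope:.2f} (true effect: 0.00)")
```

Under these assumptions the income-adjusted slope stays close to -1 even though CHW hours do nothing, because unmeasured capacity is still doing the work. Controlling for what can be observed does not remove what cannot.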

Next: Same data. Same statistical methods. Opposite conclusions. What explains the difference?

The Difference

Both analysts agree on the data. They disagree on what the data can tell us. The difference lies in one word: identification.

Analyst A's Question

Asks: "Is the association between CHWs and outcomes real?"
Assumes: "If we control for observable confounders, the remaining association reflects causation."
Ignores: "Why did some counties get CHW programs while others did not?"

Analyst B's Question

Asks: "Is the variation in CHWs independent of factors that affect outcomes?"
Assumes: "Counties that adopted CHWs differ in ways we cannot fully measure."
Requires: "Find variation in CHW adoption that is unrelated to underlying county capacity."

What Is Identification?

A causal effect is identified when the variation in treatment (here, CHW program intensity) is independent of the factors that also affect outcomes.

In this case, identification would require:

  • Variation in CHW programs driven by something other than county capacity or political will
  • For example: a policy that randomly assigned CHW funding, or a budget cutoff that affected otherwise-similar counties differently

Without identification, we can't distinguish:

  • "CHWs caused better outcomes" from
  • "Counties that adopt CHWs would have better outcomes anyway"
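
To see why variation that is independent of county capacity matters, the following sketch compares a naive regression slope under self-selected adoption with the slope under randomly assigned CHW hours. It is a toy simulation under stated assumptions: the true effect of -0.5, the coefficients, and all names are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """OLS slope from regressing ys on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

TRUE_EFFECT = -0.5  # assumed effect of CHW hours on hospitalizations

def simulate(randomized, n=5000, seed=1):
    """Return (chw_hours, hospitalizations) for n hypothetical counties."""
    rng = random.Random(seed)
    hours, hosp = [], []
    for _ in range(n):
        capacity = rng.gauss(0, 1)  # unmeasured county capacity
        if randomized:
            # Hours assigned by lottery: independent of capacity.
            h = rng.gauss(0, 2 ** 0.5)
        else:
            # Hours self-selected: high-capacity counties adopt more.
            h = capacity + rng.gauss(0, 1)
        y = -2.0 * capacity + TRUE_EFFECT * h + rng.gauss(0, 1)
        hours.append(h)
        hosp.append(y)
    return hours, hosp

naive_slope = ols_slope(*simulate(randomized=False))
randomized_slope = ols_slope(*simulate(randomized=True))
print(f"Self-selected adoption: {naive_slope:.2f}")
print(f"Randomized assignment:  {randomized_slope:.2f}")
print(f"True effect:            {TRUE_EFFECT:.2f}")
```

With these assumed coefficients, the self-selected slope lands near -1.5, about three times the true effect, while the randomized slope lands near -0.5. The regression is the same in both cases. Only the source of variation in treatment differs.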

Next: Why does this distinction matter so much? Because policies based on unidentified effects may fail spectacularly when scaled.

Key Insight

The question "Is the association real?" is necessary but insufficient for policy. The question "Is the effect identified?" determines whether we can act on that association.

[Figure: causal diagram. County Capacity affects both CHW Program Adoption and Health Outcomes; a dashed arrow from CHW Programs to Health Outcomes indicates the unidentified causal effect.]

County capacity affects both program adoption and health outcomes. The CHW effect cannot be separated from selection effects.
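
One standard way to formalize this diagram is the textbook omitted-variable bias expression. As a sketch, assume a linear model where $Y$ is preventable hospitalizations, $D$ is CHW program intensity, and $C$ is unmeasured county capacity. The symbols and the linear form are assumptions introduced here for illustration.

```latex
\begin{align*}
Y &= \alpha + \beta D + \gamma C + \varepsilon,
  \qquad \operatorname{Cov}(D,\varepsilon) = \operatorname{Cov}(C,\varepsilon) = 0 \\
\operatorname{plim}\,\hat{\beta}_{\text{naive}}
  &= \frac{\operatorname{Cov}(D,Y)}{\operatorname{Var}(D)}
   = \beta + \gamma\,\frac{\operatorname{Cov}(D,C)}{\operatorname{Var}(D)}
\end{align*}
```

If capacity lowers hospitalizations ($\gamma < 0$) and raises adoption ($\operatorname{Cov}(D,C) > 0$), the second term is negative. The naive estimate then makes CHWs look more beneficial than they are, even when $\beta = 0$. Identification amounts to finding variation in $D$ for which $\operatorname{Cov}(D,C) = 0$.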

Questions Economists Ask That Others Might Not

Selection: Why did some units get treatment? Could those reasons also affect outcomes?
Counterfactual: What would have happened to treated units if they hadn't been treated?
Identification: Is there variation in treatment that's independent of outcome determinants?
External validity: Even if the effect is real here, will it work the same when scaled elsewhere?

Key Takeaway

Statistical significance doesn't equal causal identification. An association can be real, robust, and replicated while still failing to identify a causal effect. The crucial question is whether treatment variation is independent of outcome determinants. Policy changes, randomization, and natural experiments can provide this independence. This is what economists mean by "identification."