The Data

Each dot represents a California county. Counties with higher food insecurity rates tend to have higher diabetes hospitalization rates. Is this evidence that food insecurity worsens diabetes outcomes? (Data are simulated for illustration.)

Scatter plot of Food Insecurity Rate vs Diabetes Hospitalizations, with counties grouped as Low Poverty or High Poverty

Next: The correlation is clear. But can we conclude that food insecurity causes worse diabetes outcomes?

The Problem

Counties with high food insecurity also tend to have high poverty rates. That's not a coincidence. Poverty limits access to food. But poverty also restricts access to healthcare, medications, and disease management resources. And here is the deeper problem: illness itself can cause poverty, creating a vicious cycle.

Causal diagram showing Poverty affecting both Food Insecurity and Diabetes Hospitalization, with a dashed arrow from Food Insecurity to Diabetes Hospitalization indicating uncertain causal effect

Arrows show potential causal relationships. The dashed arrow is what we want to know about.
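To see why the dashed arrow is hard to pin down, here is a minimal simulation sketch of the diagram. The coefficients (0.8, 0.6), the sample size, and the function names are illustrative assumptions, not the values behind the lab's data. The dashed arrow is set to zero on purpose, so any correlation that appears comes from poverty alone.

```python
import random

def pearson(xs, ys):
    """Pearson correlation, written out so no third-party package is needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_counties(n, true_effect, seed=0):
    """Hypothetical data-generating process matching the diagram."""
    rng = random.Random(seed)
    food_insecurity, hospitalization = [], []
    for _ in range(n):
        poverty = rng.gauss(0, 1)                       # standardized poverty
        fi = 0.8 * poverty + rng.gauss(0, 0.6)          # Poverty -> Food Insecurity
        hosp = (0.8 * poverty                           # Poverty -> Hospitalization
                + true_effect * fi                      # the dashed arrow
                + rng.gauss(0, 0.6))
        food_insecurity.append(fi)
        hospitalization.append(hosp)
    return food_insecurity, hospitalization

# Dashed arrow switched off: food insecurity has no effect at all.
food_insecurity, hospitalization = simulate_counties(n=5000, true_effect=0.0)
r_no_effect = pearson(food_insecurity, hospitalization)
print(f"correlation with zero causal effect: {r_no_effect:.2f}")
```

Under these assumptions the correlation comes out clearly positive even though food insecurity does nothing to hospitalization in the simulated world.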

Bidirectional Causation

Bidirectional causation occurs when:

  • X causes Y (food insecurity may worsen diabetes)
  • Y causes X (diabetes complications may cause job loss, medical debt, and poverty)

With cross-sectional data, we cannot distinguish these directions. We see both X and Y at the same moment in time.
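A small simulation sketch makes the point concrete. It builds two hypothetical worlds, one where X causes Y and one where Y causes X, and compares the cross-sectional correlation in each. The coefficient 0.6 and the noise level are arbitrary assumptions chosen so that both worlds have the same variances.

```python
import random

def pearson(xs, ys):
    """Pearson correlation from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def world_x_causes_y(n, rng):
    """Food insecurity (X) is set first, then drives hospitalization (Y)."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [0.6 * x + rng.gauss(0, 0.8) for x in xs]
    return xs, ys

def world_y_causes_x(n, rng):
    """Illness (Y) is set first, then drives food insecurity (X)."""
    ys = [rng.gauss(0, 1) for _ in range(n)]
    xs = [0.6 * y + rng.gauss(0, 0.8) for y in ys]
    return xs, ys

rng = random.Random(42)
r_forward = pearson(*world_x_causes_y(20000, rng))
r_reverse = pearson(*world_y_causes_x(20000, rng))
print(f"X causes Y: r = {r_forward:.2f}")
print(f"Y causes X: r = {r_reverse:.2f}")
```

The two correlations are nearly identical, so a single snapshot gives no way to tell which world produced it.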

Next: Can we isolate the food insecurity effect by comparing counties with similar poverty levels?

Seeing the Confounder

If poverty drives both food insecurity and hospitalization, then comparing counties within the same poverty level should show a weaker relationship. Use the filters below to stratify the data.

Filter by Poverty Level:
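The same stratification can be sketched in code. The example below simulates hypothetical counties in two poverty groups, then compares the pooled slope of hospitalization on food insecurity with the slope inside each group. All numbers (group shifts, a within-group slope of 0.3, noise levels) are assumptions for illustration only.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
counties = []
for _ in range(4000):
    high_poverty = 1 if rng.random() < 0.5 else 0
    # High-poverty counties start from higher levels of both variables.
    fi = 10 + 6 * high_poverty + rng.gauss(0, 2)
    hosp = 20 + 8 * high_poverty + 0.3 * fi + rng.gauss(0, 3)
    counties.append((high_poverty, fi, hosp))

# Pooled comparison: ignores poverty entirely.
pooled_slope = slope([c[1] for c in counties], [c[2] for c in counties])

# Stratified comparison: one slope per poverty level.
stratum_slopes = {}
for level, label in ((0, "Low Poverty"), (1, "High Poverty")):
    rows = [c for c in counties if c[0] == level]
    stratum_slopes[label] = slope([r[1] for r in rows], [r[2] for r in rows])

print(f"pooled slope: {pooled_slope:.2f}")
for label, s in stratum_slopes.items():
    print(f"{label} slope: {s:.2f}")
```

In this sketch the pooled slope is several times larger than the slope within either group, because most of the pooled relationship reflects the gap between low- and high-poverty counties.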

Next: The association weakens when we account for poverty. But what about factors we couldn't measure?

What We Can't Measure

Stratification helped with what we measured. But some factors that influence both food insecurity and diabetes hospitalization don't appear in any dataset. And the direction of causation itself remains uncertain.

Causal diagram showing both measured (Poverty) and unmeasured (Health Literacy, Transportation Access) confounders affecting food insecurity and diabetes hospitalization

Health Literacy & Transportation Access

Counties with low health literacy may have populations who struggle both to navigate food assistance programs AND to manage complex chronic diseases—creating a spurious link between food insecurity and hospitalization.

Similarly, limited transportation affects both grocery store access (driving food insecurity) AND access to outpatient diabetes care (driving hospitalizations).

These aren't in any standard dataset. We can't stratify by them. We can't adjust for them. Yet they could be driving both the "exposure" and the "outcome."

This is the problem of unmeasured confounding. No matter how carefully we adjust for what we can see, there may be hidden factors we can't account for.

The implication: Even if we had perfect data on poverty, we still couldn't be sure the remaining association between food insecurity and hospitalization is causal. Unmeasured confounding can't be fixed by collecting more of the same type of data. It requires a different approach entirely.
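A simulation sketch shows what is left over after adjusting only for what we can see. Here poverty is treated as measured and limited transportation as unmeasured, and the true effect of food insecurity is set to zero. The variable names and effect sizes are hypothetical. In a simulation we can also peek at the hidden variable, which real data would not allow.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(2)
rows = []
for _ in range(8000):
    poverty = 1 if rng.random() < 0.5 else 0          # measured
    low_transport = 1 if rng.random() < 0.5 else 0    # unmeasured in practice
    fi = 2 * poverty + 2 * low_transport + rng.gauss(0, 1)
    hosp = 2 * poverty + 2 * low_transport + rng.gauss(0, 1)  # no fi term: true effect is 0
    rows.append((poverty, low_transport, fi, hosp))

def stratum_slope(keep):
    """Slope of hospitalization on food insecurity among rows where keep(row) is true."""
    subset = [r for r in rows if keep(r)]
    return slope([r[2] for r in subset], [r[3] for r in subset])

# What an analyst can do: stratify on poverty only.
poverty_only = [stratum_slope(lambda r, p=p: r[0] == p) for p in (0, 1)]

# What only the simulation allows: stratify on the hidden variable too.
both = [stratum_slope(lambda r, p=p, t=t: r[0] == p and r[1] == t)
        for p in (0, 1) for t in (0, 1)]

print("slopes within poverty strata:", [round(s, 2) for s in poverty_only])
print("slopes within poverty x transport strata:", [round(s, 2) for s in both])
```

Within poverty strata the slope stays well above zero despite the absence of any causal effect. It only collapses toward zero once the hidden variable is also held fixed.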

Unmeasured Confounding

Unmeasured confounding occurs when a variable that affects both treatment and outcome is not observed in the data. Unlike measured confounders (which we can adjust for), unmeasured confounders remain invisible threats to causal claims.

Adjusting for observed variables cannot solve this problem. The solution requires finding variation in treatment that is independent of the confounder, often through study design rather than statistical adjustment.

Next: What questions should we ask to identify these hidden threats?

Questions to Consider

These questions help identify confounding threats that statistical adjustment cannot fix. They won't prove causation, but they'll reveal where the analysis is most vulnerable to bias.

What else differs between these counties?

Counties with high food insecurity might differ from other counties in many ways beyond just food access.

What other characteristics might explain why some counties have both high food insecurity AND high diabetes hospitalization?

Is this a fair comparison?

Cross-sectional data shows us a snapshot, not a story. We're comparing different counties at one point in time.

Are food-insecure counties truly comparable to food-secure counties? What would make them more comparable?

Which came first?

In this data, we see food insecurity and hospitalization rates at the same time. We don't know the sequence.

Did food insecurity lead to higher hospitalizations, or do counties with more hospitalizations have sicker populations who then face food insecurity?

What study design would help?

This observational data has limitations. A different approach might give stronger evidence.

What kind of study would give you more confidence that food insecurity actually causes diabetes hospitalization?

Concepts Demonstrated in This Lab

Confounding: when a third variable affects both treatment and outcome
Bidirectional causation: when X may cause Y and Y may also cause X
Unmeasured confounding: hidden factors we can't adjust for
Cross-sectional data: a single snapshot in time, not a before/after comparison
Identification: finding treatment variation that is independent of confounders

Key Takeaway

No amount of statistical adjustment can fix a flawed comparison. When unmeasured factors drive both treatment and outcomes, the solution isn't better adjustment—it's finding sources of treatment variation that operate independently of unmeasured confounders. Policy changes, eligibility cutoffs, and timing differences can provide this. This is what economists mean by "identification."
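As a rough sketch of what identification buys us, the simulation below adds a hypothetical policy change that lowers food insecurity in some counties and is assigned independently of an unmeasured confounder. It then compares the naive slope with a simple ratio estimate: the policy's effect on hospitalization divided by its effect on food insecurity. Everything here is an assumption for illustration: the policy, the effect sizes, and especially the premise that the policy affects hospitalization only through food insecurity.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

TRUE_EFFECT = 0.5  # assumed causal effect of food insecurity on hospitalization

rng = random.Random(3)
rows = []
for _ in range(20000):
    hidden = rng.gauss(0, 1)                  # unmeasured confounder
    policy = 1 if rng.random() < 0.5 else 0   # assigned independently of `hidden`
    fi = 1.0 * hidden - 2.0 * policy + rng.gauss(0, 1)
    hosp = TRUE_EFFECT * fi + 1.5 * hidden + rng.gauss(0, 1)
    rows.append((policy, fi, hosp))

# Naive comparison: contaminated by the hidden confounder.
naive_estimate = slope([r[1] for r in rows], [r[2] for r in rows])

# Policy-based comparison: change in outcome per unit change in exposure,
# using only the variation that the policy created.
treated = [r for r in rows if r[0] == 1]
control = [r for r in rows if r[0] == 0]
diff_hosp = mean([r[2] for r in treated]) - mean([r[2] for r in control])
diff_fi = mean([r[1] for r in treated]) - mean([r[1] for r in control])
policy_estimate = diff_hosp / diff_fi

print(f"true effect:     {TRUE_EFFECT:.2f}")
print(f"naive slope:     {naive_estimate:.2f}")
print(f"policy estimate: {policy_estimate:.2f}")
```

In this simulated world the naive slope overstates the effect, while the policy-based ratio lands close to the true value. With real data, whether a given policy satisfies these assumptions has to be argued case by case.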