The Data
We analyzed how bank branch closures affect SNAP (food stamp) participation across 1,408 US counties. Bank closures might affect SNAP enrollment by increasing transaction costs for benefit delivery. Our parallel trends test passed with a remarkably high p-value. (Based on real analysis, simplified for illustration.)
Event Study: SNAP Participation After Bank Closures
| Event Time | ATT (pp) | 95% CI |
|---|---|---|
| e = -3 | -0.06 | [-1.27, +1.16] |
| e = -2 | -0.03 | [-1.17, +1.11] |
| e = -1 | +0.00 | [-1.08, +1.08] |
| e = 0 | -0.08 | [-0.90, +0.74] |
| e = 3 | -0.47 | [-0.90, -0.04] |
| e = 6 | -0.90 | [-1.64, -0.16] |
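The methods table at the end of this lab lists Callaway & Sant'Anna (2021) as the estimator. As a simplified illustration only, here is what the event-study regression might look like as a plain two-way fixed-effects OLS; the file name and column names (county, year, snap_rate, event_time) are hypothetical.

```python
# Sketch of a two-way fixed-effects event study. The real analysis uses
# the Callaway & Sant'Anna (2021) estimator; this simpler version only
# shows the shape of the regression. All names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("county_panel.csv")  # hypothetical county-year panel

# Bin event time to [-3, 6]; crudely lump never-treated counties (NaN
# event_time) into the e = -1 reference category for illustration.
df["e"] = df["event_time"].fillna(-1).clip(-3, 6).astype(int)

model = smf.ols(
    "snap_rate ~ C(e, Treatment(reference=-1)) + C(county) + C(year)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["county"]})
print(model.summary())  # event-time coefficients as in the table above
```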
The crucial question: A p-value of 0.9997 seems like strong validation. Why might this be misleading?
The Problem: Statistical Power
The parallel trends test asks whether pre-treatment coefficients are jointly different from zero. With wide standard errors and few pre-periods, even meaningful violations can fail to be detected. A "passing" test may simply reflect low statistical power, not satisfied assumptions.
Pre-Treatment Coefficients
| Event Time | Coefficient (pp) | Standard Error | 95% CI |
|---|---|---|---|
| e = -3 | -0.056 | 0.620 | [-1.27, +1.16] |
| e = -2 | -0.030 | 0.582 | [-1.17, +1.11] |
| e = -1 | +0.000 | 0.549 | [-1.08, +1.08] |
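The joint statistic can be reproduced directly from these three estimates. A minimal sketch, assuming the coefficients are uncorrelated (the real test uses the full estimated covariance matrix):

```python
# Joint Wald test that all pre-treatment coefficients are zero, using
# the point estimates and standard errors from the table above and
# assuming zero correlation between coefficients.
import numpy as np
from scipy import stats

beta = np.array([-0.056, -0.030, 0.000])  # e = -3, -2, -1
se = np.array([0.620, 0.582, 0.549])

wald = float(np.sum((beta / se) ** 2))    # chi-squared, 3 df
p = stats.chi2.sf(wald, df=len(beta))
print(f"Wald = {wald:.4f}, p = {p:.4f}")  # p = 0.9997, as reported
```

Even under this simplification the statistic is tiny, which is exactly the point: near-zero point estimates with large standard errors are uninformative.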
Why Low Power Matters
The confidence intervals are roughly 2.2 to 2.4 percentage points wide, while the treatment effect we estimated is only -0.47 pp. A pre-trend of similar magnitude would be undetectable.
With standard errors this large, pre-treatment coefficients between -1.0 and +1.0 would all "pass" the test. This is not evidence of parallel trends. It is evidence that we lack the power to detect violations.
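A back-of-envelope power calculation makes this concrete, using the standard rule of thumb that the minimum detectable effect at 80% power in a two-sided 5% test is about 2.8 standard errors:

```python
# Smallest pre-trend violation the test could detect with 80% power
# at the 5% level, given a typical pre-treatment SE from the table.
from scipy import stats

se_pre = 0.58                       # typical pre-period SE (pp)
z_alpha = stats.norm.ppf(0.975)     # 1.96
z_power = stats.norm.ppf(0.80)      # 0.84

mde = (z_alpha + z_power) * se_pre
print(f"Minimum detectable violation: {mde:.2f} pp")  # about 1.6 pp
```

A violation of roughly 1.6 pp, more than triple the -0.47 pp treatment effect, would be needed before the test could reliably flag it.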
Beyond the p-value: If the standard parallel trends test lacks power, how can we assess whether our causal claims are robust?
We need methods that explicitly characterize how sensitive our results are to violations.
Rambachan-Roth Sensitivity Analysis
Instead of assuming parallel trends hold exactly, Rambachan and Roth (2023) parameterize potential violations. The key parameter M measures how large violations can be relative to observed pre-treatment movement; as M grows, the identified bounds around the ATT widen.
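The real analysis would use Rambachan and Roth's implementation (their HonestDiD software), which handles estimation uncertainty properly. As a heavily simplified sketch of the relative-magnitudes idea: the benchmark delta_bar below is a hypothetical per-period violation bound, chosen so this toy calculation roughly reproduces the breakdown value M ≈ 0.35 reported in the summary.

```python
# Toy illustration of relative-magnitudes bounds: allow post-treatment
# parallel trends violations of up to M times a per-period benchmark,
# then attach a crude +/- 1.96*SE band. NOT the HonestDiD procedure.
import numpy as np

att, se_att = -0.47, 0.22   # e = 3 estimate from the event study
delta_bar = 0.11            # HYPOTHETICAL violation benchmark (pp),
                            # picked so breakdown lands near M = 0.35

for M in (0.0, 0.25, 0.35, 0.5, 1.0):
    lo = att - M * delta_bar - 1.96 * se_att
    hi = att + M * delta_bar + 1.96 * se_att
    verdict = "rejects zero" if hi < 0 else "includes zero"
    print(f"M = {M:4.2f}: [{lo:+.2f}, {hi:+.2f}] pp  ->  {verdict}")
```

At M = 0.35 the upper bound just touches zero: the effect survives only if post-treatment violations stay barely a third as large as the benchmark.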
Sensitivity of ATT to Parallel Trends Violations
One more check: Sensitivity analysis shows our result is fragile, breaking down at M ≈ 0.35. Can we directly test whether county-specific trends explain the effect?
County-Specific Trends
An alternative robustness check: add county-specific linear time trends to absorb pre-existing trajectories. If the treatment effect survives, it is identified off deviations from each county's own trend. If it disappears, pre-existing trajectories explain the result.
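A minimal sketch of this check, reusing the hypothetical panel from above and assuming a post-treatment indicator column treated; C(county):t gives every county its own linear slope.

```python
# County-specific linear trends: compare the ATT with and without a
# separate time slope for each county. Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("county_panel.csv")      # hypothetical panel
df["t"] = df["year"] - df["year"].min()   # linear time index

specs = {
    "baseline":      "snap_rate ~ treated + C(county) + C(year)",
    "county trends": "snap_rate ~ treated + C(county) + C(year) + C(county):t",
}
for name, formula in specs.items():
    res = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["county"]}
    )
    print(f"{name:14s} ATT = {res.params['treated']:+.3f} "
          f"(SE {res.bse['treated']:.3f})")
```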
Effect Under Different Specifications
| Specification | ATT (pp) | SE | p-value | Interpretation |
|---|---|---|---|---|
| Baseline (County + Year FE) | -0.47 | 0.22 | 0.036 | Significant |
| With County Trends | +0.003 | 0.016 | 0.87 | Effect disappears |
What This Tells Us
The entire "effect" is absorbed by county-specific trends. Counties that experienced bank closures were already on declining SNAP trajectories before treatment.
The treatment did not cause the decline. It happened in places that were already declining. This is selection into treatment, not a causal effect.
The lesson: A parallel trends test with p = 0.9997 did not protect us from bias. What should we do differently?
Key Insight
Passing a parallel trends test is not validation. It is one piece of evidence. These questions help assess whether your causal identification is credible.
Before Claiming Causality: A Checklist
- **How many pre-treatment periods do you have?** Three or fewer pre-periods often mean low power to detect violations. Consider whether your test can actually detect meaningful pre-trends.
- **How large are your pre-treatment standard errors?** Wide confidence intervals around pre-treatment coefficients mean even large violations would not be detected. Compare the SE magnitude to your treatment effect.
- **What is your Rambachan-Roth breakdown M?** If M < 1, your effect is sensitive to violations smaller than the observed pre-treatment movement. M > 1 suggests more robustness; M < 0.5 is a warning sign.
- **Does your effect survive unit-specific trends?** Adding unit-specific linear (or quadratic) trends absorbs pre-existing trajectories. If the effect disappears, you may have selection into treatment.
- **Do alternative diagnostic tests agree?** Fake timing tests, placebo treatments, and pre-trend extrapolation provide additional evidence; if they disagree with the joint test, investigate why. A fake-timing sketch follows this checklist.
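A hedged sketch of the fake-timing test: pretend each treated county's closure happened two years earlier, keep only truly pre-treatment observations, and re-estimate. The column close_year (the actual closure year, NaN for never-treated counties) is hypothetical.

```python
# Fake-timing placebo: a significant "effect" of a treatment that has
# not yet happened signals differential pre-trends, not causation.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("county_panel.csv")               # hypothetical panel

# Keep only observations before each county's real closure (all years
# for never-treated counties, whose close_year is NaN).
cutoff = df["close_year"].fillna(df["year"].max() + 1)
pre = df[df["year"] < cutoff].copy()

# Pretend treatment began two years before the real closure.
pre["fake_treated"] = (pre["year"] >= pre["close_year"] - 2).astype(int)

res = smf.ols(
    "snap_rate ~ fake_treated + C(county) + C(year)", data=pre
).fit(cov_type="cluster", cov_kwds={"groups": pre["county"]})
print(res.params["fake_treated"], res.pvalues["fake_treated"])
# The summary below reports p = 0.04 for this kind of test: a warning.
```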
Summary: What We Learned
| Test | Result | What It Actually Tells Us |
|---|---|---|
| Parallel trends joint test | p = 0.9997 | Low power with 3 pre-periods and SEs of 0.5-0.6 |
| Fake timing test | p = 0.04 | Correct warning signal that pre-trends exist |
| Rambachan-Roth sensitivity | M = 0.35 | Effect only robust to small violations |
| County-specific trends | Effect disappears | Pre-existing trajectories explain the result |
Key Takeaway
A high p-value on a parallel trends test often reflects low statistical power, not satisfied assumptions. When pre-treatment periods are limited and standard errors are wide, even large violations will not be detected. Sensitivity analysis (Rambachan-Roth) and alternative specifications (unit-specific trends) provide more informative evidence about whether your causal claims are robust. In this case, all three additional checks pointed in the same direction: the effect was not robust. Reporting the association is appropriate. Claiming causation is not.
References & Data Sources
The analysis in this lab draws on publicly available banking data and builds on established methods in the causal inference literature.
Data Source
Federal Reserve Bank of New York
Bank balance sheet and income statement data used in this analysis come from the NY Fed's Banking Research Data repository. This comprehensive dataset tracks the financial health and branch networks of US banking institutions over time.
Successful Application
Correia, Luck & Verner (2026)
For an example of how banking data can support credible causal claims when the identification strategy is sound, see "Failing Banks" in the Quarterly Journal of Economics (Volume 141, Issue 1, pp. 147–204).
Methods References
| Method | Citation | Key Contribution |
|---|---|---|
| Staggered DiD Estimator | Callaway & Sant'Anna (2021) | Handles heterogeneous treatment timing without negative weights |
| Sensitivity Analysis | Rambachan & Roth (2023) | Parameterizes violations to assess robustness of causal claims |
| Parallel Trends Testing | Roth (2022) | Documents low power of pre-trend tests in typical settings |