A Python library for Difference-in-Differences (DiD) causal inference - sklearn-like estimators with statsmodels-style outputs, built for econometricians, marketing analysts, and data scientists running campaign-lift, policy, and staggered-rollout analyses.
```
pip install diff-diff
```

For development:

```
git clone https://github.com/igerber/diff-diff.git
cd diff-diff
pip install -e ".[dev]"
```

```python
import pandas as pd
from diff_diff import DifferenceInDifferences  # or: DiD

data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post': [0, 0, 1, 1, 0, 0, 1, 1],
})

did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
print(results)           # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583)
results.print_summary()  # full statsmodels-style table
```

- Quickstart - basic 2x2 DiD with column-name and formula interfaces, covariates, fixed effects, cluster-robust SEs
- Choosing an Estimator - decision flowchart for picking the right estimator
- Tutorials - hands-on Jupyter notebooks covering every estimator and design pattern
- Troubleshooting - common issues and solutions
- R Comparison | Python Comparison | Benchmarks - validation results vs `did`, `synthdid`, `fixest`
- API Reference - full API for all estimators, results classes, diagnostics, utilities
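As a sanity check on the quickstart numbers, the 2x2 ATT can be reproduced by hand with plain pandas. This is a minimal sketch of the difference-in-differences arithmetic on the same toy data, not library code:

```python
import pandas as pd

data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post': [0, 0, 1, 1, 0, 0, 1, 1],
})

# Cell means of the 2x2 design, indexed by (treated, post)
means = data.groupby(['treated', 'post'])['outcome'].mean()

# ATT = (treated post - treated pre) - (control post - control pre)
att = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])
print(att)  # 3.0 — matches DiDResults(ATT=3.0000)
```

The estimator adds what this arithmetic cannot: standard errors, inference options, covariates, and diagnostics.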
If you are an AI agent or LLM using this library, call diff_diff.get_llm_guide() for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis - testing assumptions, running sensitivity analysis, and checking robustness, not just calling fit().
```python
from diff_diff import get_llm_guide

get_llm_guide()                # concise API reference
get_llm_guide("practitioner")  # 8-step workflow (Baker et al. 2025)
get_llm_guide("full")          # comprehensive documentation
get_llm_guide("autonomous")    # autonomous-agent variant
```

The guides are bundled in the wheel, so they are accessible from a pip install with no network access. After estimation, call `practitioner_next_steps(results)` for context-aware guidance on the remaining diagnostic steps.
Measuring campaign lift? Evaluating a product launch? Rolling out a policy in waves? diff-diff handles the causal inference so you can focus on the business question.
- Which method fits my problem? - start from your business scenario (campaign in some markets, staggered rollout, survey data) and find the right estimator
- Getting started for practitioners - end-to-end walkthrough from marketing campaign to causal estimate to stakeholder-ready result
- Brand awareness survey tutorial - full example with complex survey design, brand funnel analysis, and staggered rollouts
- Have BRFSS/ACS/CPS individual records? Use `aggregate_survey()` to roll respondent-level microdata into a geographic-period panel with inverse-variance precision weights for second-stage DiD
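The idea behind that aggregation step can be sketched in plain pandas. This is a conceptual illustration of inverse-variance precision weighting, not the `aggregate_survey()` implementation, and the column names are made up:

```python
import pandas as pd

# Toy respondent-level microdata (columns are illustrative)
micro = pd.DataFrame({
    'state': ['CA', 'CA', 'CA', 'NY', 'NY', 'NY'],
    'period': [0, 0, 0, 0, 0, 0],
    'y': [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# Collapse to one row per (state, period): the cell mean plus an
# inverse-variance precision weight 1 / Var(mean) = n / s^2
cells = micro.groupby(['state', 'period'])['y'].agg(['mean', 'var', 'count'])
cells['weight'] = cells['count'] / cells['var']
panel = cells.reset_index()
```

Noisier cells (NY here, with the larger within-cell variance) get less weight in the second-stage DiD; the real function layers survey design information on top of this logic.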
BusinessReport and DiagnosticReport are experimental preview classes that produce plain-English output and a structured to_dict() schema from any fitted result - wording and schema will evolve. See docs/methodology/REPORTING.md for usage and stability notes.
For rigorous DiD analysis, follow these 8 steps. Skipping diagnostic steps produces unreliable results.
- Define target parameter - ATT, group-time ATT(g,t), or event-study ATT_es(e). State whether weighted or unweighted.
- State identification assumptions - which parallel trends variant (unconditional, conditional, PT-GT-Nev, PT-GT-NYT), no-anticipation, overlap.
- Test parallel trends - simple 2x2: `check_parallel_trends()`, `equivalence_test_trends()`; staggered: inspect CS event-study pre-period coefficients (generic PT tests are invalid for staggered designs). Insignificant pre-trends do NOT prove PT holds.
- Choose estimator - staggered adoption -> CS/SA/BJS (NOT plain TWFE); few treated units -> SDiD; factor confounding -> TROP; simple 2x2 -> DiD. Run `BaconDecomposition` to diagnose TWFE bias.
- Estimate - `estimator.fit(data, ...)`. Always print the cluster count first and choose the inference method based on the result (cluster-robust if >= 50 clusters, wild bootstrap if fewer).
- Sensitivity analysis - `compute_honest_did(results)` for bounds under PT violations (MultiPeriodDiD, CS, or dCDH), `run_all_placebo_tests()` for 2x2 falsification, specification comparisons for staggered designs.
- Heterogeneity - CS: `aggregate='group'`/`'event_study'`; SA: `results.event_study_effects` / `to_dataframe(level='cohort')`; subgroup re-estimation.
- Robustness - compare 2-3 estimators (CS vs SA vs BJS), report with and without covariates (shows whether conditioning drives identification), present pre-trends and sensitivity bounds.
Full guide: diff_diff.get_llm_guide("practitioner").
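The falsification logic behind placebo timing tests can be illustrated without the library: re-estimate the 2x2 contrast on pre-period data only, pretending treatment started earlier — under parallel trends the placebo effect should be near zero. A minimal sketch on synthetic data (not the `run_all_placebo_tests()` implementation):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
units, periods = 50, 4
df = pd.DataFrame([
    {'unit': i, 'period': t,
     'treated': int(i < 25),
     # parallel trends in t, level shift for treated, true effect of 5 from period 2
     'y': 1.0 * t + 2.0 * (i < 25) + 5.0 * ((i < 25) and t >= 2)
          + rng.normal(0, 0.1)}
    for i in range(units) for t in range(periods)
])

def did_2x2(d, post_start):
    """Simple 2x2 difference-in-means DiD with post defined by post_start."""
    d = d.assign(post=(d['period'] >= post_start).astype(int))
    m = d.groupby(['treated', 'post'])['y'].mean()
    return (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])

real = did_2x2(df, post_start=2)                       # close to the true 5.0
placebo = did_2x2(df[df['period'] < 2], post_start=1)  # close to 0 under PT
```

A placebo estimate far from zero is evidence against parallel trends, regardless of how significant the real estimate looks.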
- DifferenceInDifferences - basic 2x2 DiD with robust/cluster-robust SEs, wild bootstrap, formula interface, and fixed effects
- TwoWayFixedEffects - panel data DiD with unit and time fixed effects via within-transformation or dummies
- MultiPeriodDiD - event study design with period-specific treatment effects for dynamic analysis
- CallawaySantAnna - Callaway & Sant'Anna (2021) group-time ATT estimator for staggered adoption
- ChaisemartinDHaultfoeuille - de Chaisemartin & D'Haultfœuille (2020/2022) for reversible (non-absorbing) treatments with multi-horizon event study, normalized effects, cost-benefit delta, sup-t bands, and dynamic placebos. The only library option for treatments that switch on AND off. Alias `DCDH`.
- SunAbraham - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
- ImputationDiD - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
- TwoStageDiD - Gardner (2022) two-stage estimator with GMM sandwich variance
- SyntheticDiD - Synthetic DiD combining standard DiD and synthetic control for few treated units
- TripleDifference - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- ContinuousDiD - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
- HeterogeneousAdoptionDiD - de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where no unit remains untreated; local-linear estimator at the dose support boundary returning the Weighted Average Slope (WAS) on Design 1' (`d̲ = 0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲ > 0`, continuous-near-`d̲` or mass-point), with a multi-period event-study extension (last-treatment cohort, pointwise CIs). Panel-only in this release - repeated cross-sections are rejected by the validator. Alias `HAD`.
- StackedDiD - Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments
- EfficientDiD - Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs
- TROP - Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment
- StaggeredTripleDifference - Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
- WooldridgeDiD - Wooldridge (2023, 2025) ETWFE: saturated OLS, logit/Poisson QMLE (ASF-based ATT). Alias `ETWFE`.
- BaconDecomposition - Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings
- Parallel Trends Testing - simple and Wasserstein-robust parallel trends tests, equivalence testing (TOST)
- Placebo Tests - placebo timing, group, permutation, leave-one-out
- Honest DiD - Rambachan & Roth (2023) sensitivity analysis: robust CI under PT violations, breakdown values
- Pre-Trends Power Analysis - Roth (2022) minimum detectable violation and power curves
- Power Analysis - analytical and simulation-based MDE, sample size, power curves for study design
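The analytical side of a power analysis rests on the standard formula MDE = (z_{1-α/2} + z_{power}) × SE. A stdlib-only sketch of that formula (not the library's Power Analysis API):

```python
from statistics import NormalDist

def analytical_mde(se, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sided test at the given
    significance level and power: MDE = (z_{1-alpha/2} + z_{power}) * SE."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * se

# With the quickstart SE of 1.7321, effects below ~4.85 are unlikely
# to be detected at 80% power — consistent with the insignificant p=0.1583
print(round(analytical_mde(se=1.7321), 2))
```

Simulation-based power analysis replaces the normal approximation with repeated estimation on resampled or synthetic data, which matters for small cluster counts.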
Most estimators accept an optional survey_design parameter (or survey= / weights= for HeterogeneousAdoptionDiD) for design-based variance estimation. Coverage and supported weight types vary by estimator - see the Survey Design Support compatibility matrix for the per-estimator support table.
- Design elements available across the supported set: strata, PSU, FPC, lonely PSU handling, nest. Weight types vary by estimator: some surfaces (e.g. CallawaySantAnna, StackedDiD, the HAD continuous path) accept `pweight` only; others accept `pweight`/`fweight`/`aweight`.
- Variance methods: Taylor Series Linearization (TSL via Binder 1983), replicate weights (BRR / Fay / JK1 / JKn / SDR), survey-aware bootstrap
- Diagnostics: DEFF per coefficient, effective n, subpopulation analysis, weight trimming, CV on estimates
- Repeated cross-sections: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
No other Python or R DiD package offers design-based variance estimation for modern heterogeneity-robust estimators.
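Some of the weight diagnostics above have simple closed forms. For instance, Kish's approximate design effect from unequal weights is DEFF ≈ 1 + cv²(w), with effective n = n / DEFF; a sketch of that textbook formula (not necessarily how the library computes its per-coefficient DEFF):

```python
import numpy as np

def kish_deff(weights):
    """Kish's approximate design effect from unequal weighting:
    DEFF = 1 + cv^2(w) = n * sum(w^2) / (sum(w))^2."""
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w**2) / np.sum(w)**2

w = np.array([1.0, 1.0, 2.0, 4.0])
deff = kish_deff(w)    # 4 * 22 / 64 = 1.375
n_eff = len(w) / deff  # roughly 2.9 effective observations out of 4
```

Highly variable weights inflate variance, which is why weight trimming and effective-n diagnostics sit alongside the variance estimators.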
- Python 3.9 - 3.14
- numpy >= 1.20
- pandas >= 1.3
- scipy >= 1.7
```
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black diff_diff tests
ruff check diff_diff tests
```

This library implements methods from a wide body of econometric and causal-inference research. See the full bibliography on Read the Docs for citations spanning DiD foundations, modern staggered estimators, sensitivity analysis, and synthetic controls.
If you use diff-diff in your research, please cite it:
```bibtex
@software{diff_diff,
  title = {diff-diff: Difference-in-Differences Causal Inference for Python},
  author = {Gerber, Isaac},
  year = {2026},
  url = {https://github.com/igerber/diff-diff},
  doi = {10.5281/zenodo.19646175},
  license = {MIT},
}
```

The DOI above is the Zenodo concept DOI - it always resolves to the latest release. To cite a specific version, look up its versioned DOI on the Zenodo project page.
See CITATION.cff for the full citation metadata.
Note on authorship: academic citation (CITATION.cff, the BibTeX above) lists individual authors with ORCIDs per scholarly convention. Package metadata surfaces (pyproject.toml, Sphinx docs) list "diff-diff contributors" to acknowledge the collective - see CONTRIBUTORS.md for the full list.
MIT License
