A Python library for Difference-in-Differences (DiD) causal inference - sklearn-like estimators with statsmodels-style outputs, built for econometricians, marketing analysts, and data scientists running campaign-lift, policy, and staggered-rollout analyses.
```
pip install diff-diff
```

For development:

```
git clone https://github.com/igerber/diff-diff.git
cd diff-diff
pip install -e ".[dev]"
```

```python
import pandas as pd
from diff_diff import DifferenceInDifferences  # or: DiD

data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post': [0, 0, 1, 1, 0, 0, 1, 1],
})

did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
print(results)           # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583)
results.print_summary()  # full statsmodels-style table
```

- Quickstart - basic 2x2 DiD with column-name and formula interfaces, covariates, fixed effects, cluster-robust SEs
- Choosing an Estimator - decision flowchart for picking the right estimator
- Tutorials - hands-on Jupyter notebooks covering every estimator and design pattern
- Troubleshooting - common issues and solutions
- R Comparison | Python Comparison | Benchmarks - validation results vs `did`, `synthdid`, `fixest`
- API Reference - full API for all estimators, results classes, diagnostics, utilities
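As a sanity check on the quickstart numbers, the 2x2 ATT can be reproduced by hand with plain pandas. This is a minimal sketch of the difference-in-differences arithmetic on the same toy data, not library code:

```python
import pandas as pd

data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post': [0, 0, 1, 1, 0, 0, 1, 1],
})

# Cell means of the 2x2 design, indexed by (treated, post)
means = data.groupby(['treated', 'post'])['outcome'].mean()

# ATT = (treated post - treated pre) - (control post - control pre)
att = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])
print(att)  # 3.0 — matches DiDResults(ATT=3.0000)
```

The estimator adds what this arithmetic cannot: standard errors, inference options, covariates, and diagnostics.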
If you are an AI agent or LLM using this library, call diff_diff.get_llm_guide() for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis - testing assumptions, running sensitivity analysis, and checking robustness, not just calling fit().
```python
from diff_diff import get_llm_guide

get_llm_guide()                # concise API reference
get_llm_guide("practitioner")  # 8-step workflow (Baker et al. 2025)
get_llm_guide("full")          # comprehensive documentation
get_llm_guide("autonomous")    # autonomous-agent variant
```

The guides are bundled in the wheel, so they are accessible from a pip install with no network access. After estimation, call `practitioner_next_steps(results)` for context-aware guidance on the remaining diagnostic steps.
Measuring campaign lift? Evaluating a product launch? Rolling out a policy in waves? diff-diff handles the causal inference so you can focus on the business question.
- Which method fits my problem? - start from your business scenario (campaign in some markets, staggered rollout, survey data) and find the right estimator
- Getting started for practitioners - end-to-end walkthrough from marketing campaign to causal estimate to stakeholder-ready result
- Brand awareness survey tutorial - full example with complex survey design, brand funnel analysis, and staggered rollouts
- Have BRFSS/ACS/CPS individual records? Use `aggregate_survey()` to roll respondent-level microdata into a geographic-period panel with inverse-variance precision weights for second-stage DiD
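The idea behind that aggregation step can be sketched in plain pandas. This is a conceptual illustration of inverse-variance precision weighting, not the `aggregate_survey()` implementation, and the column names are made up:

```python
import pandas as pd

# Toy respondent-level microdata (columns are illustrative)
micro = pd.DataFrame({
    'state': ['CA', 'CA', 'CA', 'NY', 'NY', 'NY'],
    'period': [0, 0, 0, 0, 0, 0],
    'y': [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# Collapse to one row per (state, period): the cell mean plus an
# inverse-variance precision weight 1 / Var(mean) = n / s^2
cells = micro.groupby(['state', 'period'])['y'].agg(['mean', 'var', 'count'])
cells['weight'] = cells['count'] / cells['var']
panel = cells.reset_index()
```

Noisier cells (NY here, with the larger within-cell variance) get less weight in the second-stage DiD; the real function layers survey design information on top of this logic.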
BusinessReport and DiagnosticReport are experimental preview classes that produce plain-English output and a structured to_dict() schema from any fitted result - wording and schema will evolve. See docs/methodology/REPORTING.md for usage and stability notes.
For rigorous DiD analysis, follow these 8 steps. Skipping diagnostic steps produces unreliable results.
- Define target parameter - ATT, group-time ATT(g,t), or event-study ATT_es(e). State whether weighted or unweighted.
- State identification assumptions - which parallel trends variant (unconditional, conditional, PT-GT-Nev, PT-GT-NYT), no-anticipation, overlap.
- Test parallel trends - simple 2x2: `check_parallel_trends()`, `equivalence_test_trends()`; staggered: inspect CS event-study pre-period coefficients (generic PT tests are invalid for staggered designs). Insignificant pre-trends do NOT prove PT holds.
- Choose estimator - staggered adoption -> CS/SA/BJS (NOT plain TWFE); few treated units -> SDiD; factor confounding -> TROP; simple 2x2 -> DiD. Run `BaconDecomposition` to diagnose TWFE bias.
- Estimate - `estimator.fit(data, ...)`. Always print the cluster count first and choose the inference method based on the result (cluster-robust if >= 50 clusters, wild bootstrap if fewer).
- Sensitivity analysis - `compute_honest_did(results)` for bounds under PT violations (MultiPeriodDiD, CS, or dCDH), `run_all_placebo_tests()` for 2x2 falsification, specification comparisons for staggered designs.
- Heterogeneity - CS: `aggregate='group'`/`'event_study'`; SA: `results.event_study_effects` / `to_dataframe(level='cohort')`; subgroup re-estimation.
- Robustness - compare 2-3 estimators (CS vs SA vs BJS), report with and without covariates (shows whether conditioning drives identification), present pre-trends and sensitivity bounds.
Full guide: diff_diff.get_llm_guide("practitioner").
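The falsification logic behind placebo timing tests can be illustrated without the library: re-estimate the 2x2 contrast on pre-period data only, pretending treatment started earlier — under parallel trends the placebo effect should be near zero. A minimal sketch on synthetic data (not the `run_all_placebo_tests()` implementation):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
units, periods = 50, 4
df = pd.DataFrame([
    {'unit': i, 'period': t,
     'treated': int(i < 25),
     # parallel trends in t, level shift for treated, true effect of 5 from period 2
     'y': 1.0 * t + 2.0 * (i < 25) + 5.0 * ((i < 25) and t >= 2)
          + rng.normal(0, 0.1)}
    for i in range(units) for t in range(periods)
])

def did_2x2(d, post_start):
    """Simple 2x2 difference-in-means DiD with post defined by post_start."""
    d = d.assign(post=(d['period'] >= post_start).astype(int))
    m = d.groupby(['treated', 'post'])['y'].mean()
    return (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])

real = did_2x2(df, post_start=2)                       # close to the true 5.0
placebo = did_2x2(df[df['period'] < 2], post_start=1)  # close to 0 under PT
```

A placebo estimate far from zero is evidence against parallel trends, regardless of how significant the real estimate looks.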
- DifferenceInDifferences - basic 2x2 DiD with robust/cluster-robust SEs, wild bootstrap, formula interface, and fixed effects
- TwoWayFixedEffects - panel data DiD with unit and time fixed effects via within-transformation or dummies
- MultiPeriodDiD - event study design with period-specific treatment effects for dynamic analysis
- CallawaySantAnna - Callaway & Sant'Anna (2021) group-time ATT estimator for staggered adoption
- ChaisemartinDHaultfoeuille - de Chaisemartin & D'Haultfœuille (2020/2022) for reversible (non-absorbing) treatments with multi-horizon event study, normalized effects, cost-benefit delta, sup-t bands, and dynamic placebos. The only library option for treatments that switch on AND off. Alias `DCDH`.
- SunAbraham - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
- ImputationDiD - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
- TwoStageDiD - Gardner (2022) two-stage estimator with GMM sandwich variance
- SyntheticDiD - Synthetic DiD combining standard DiD and synthetic control for few treated units
- TripleDifference - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- ContinuousDiD - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
- HeterogeneousAdoptionDiD - de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where no unit remains untreated; local-linear estimator at the dose support boundary returning the Weighted Average Slope (WAS) on Design 1' (`d̲ = 0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲ > 0`, continuous-near-`d̲` or mass-point), with a multi-period event-study extension (last-treatment cohort, pointwise CIs). Panel-only in this release - repeated cross-sections are rejected by the validator. Alias `HAD`.
- StackedDiD - Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments
- EfficientDiD - Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs
- TROP - Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment
- StaggeredTripleDifference - Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
- WooldridgeDiD - Wooldridge (2023, 2025) ETWFE: saturated OLS, logit/Poisson QMLE (ASF-based ATT). Alias `ETWFE`.
- BaconDecomposition - Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings
- Parallel Trends Testing - simple and Wasserstein-robust parallel trends tests, equivalence testing (TOST)
- Placebo Tests - placebo timing, group, permutation, leave-one-out
- Honest DiD - Rambachan & Roth (2023) sensitivity analysis: robust CI under PT violations, breakdown values
- Pre-Trends Power Analysis - Roth (2022) minimum detectable violation and power curves
- Power Analysis - analytical and simulation-based MDE, sample size, power curves for study design
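The analytical side of a power analysis rests on the standard formula MDE = (z_{1-α/2} + z_{power}) × SE. A stdlib-only sketch of that formula (not the library's Power Analysis API):

```python
from statistics import NormalDist

def analytical_mde(se, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sided test at the given
    significance level and power: MDE = (z_{1-alpha/2} + z_{power}) * SE."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * se

# With the quickstart SE of 1.7321, effects below ~4.85 are unlikely
# to be detected at 80% power — consistent with the insignificant p=0.1583
print(round(analytical_mde(se=1.7321), 2))
```

Simulation-based power analysis replaces the normal approximation with repeated estimation on resampled or synthetic data, which matters for small cluster counts.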
Most estimators accept an optional survey_design parameter (or survey= / weights= for HeterogeneousAdoptionDiD) for design-based variance estimation. Coverage and supported weight types vary by estimator - see the Survey Design Support compatibility matrix for the per-estimator support table.
- Design elements available across the supported set: strata, PSU, FPC, lonely PSU handling, nest. Weight types vary by estimator: some surfaces (e.g. CallawaySantAnna, StackedDiD, the HAD continuous path) accept `pweight` only; others accept `pweight`/`fweight`/`aweight`.
- Variance methods: Taylor Series Linearization (TSL via Binder 1983), replicate weights (BRR / Fay / JK1 / JKn / SDR), survey-aware bootstrap
- Diagnostics: DEFF per coefficient, effective n, subpopulation analysis, weight trimming, CV on estimates
- Repeated cross-sections: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
No other Python or R DiD package offers design-based variance estimation for modern heterogeneity-robust estimators.
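Some of the weight diagnostics above have simple closed forms. For instance, Kish's approximate design effect from unequal weights is DEFF ≈ 1 + cv²(w), with effective n = n / DEFF; a sketch of that textbook formula (not necessarily how the library computes its per-coefficient DEFF):

```python
import numpy as np

def kish_deff(weights):
    """Kish's approximate design effect from unequal weighting:
    DEFF = 1 + cv^2(w) = n * sum(w^2) / (sum(w))^2."""
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w**2) / np.sum(w)**2

w = np.array([1.0, 1.0, 2.0, 4.0])
deff = kish_deff(w)    # 4 * 22 / 64 = 1.375
n_eff = len(w) / deff  # roughly 2.9 effective observations out of 4
```

Highly variable weights inflate variance, which is why weight trimming and effective-n diagnostics sit alongside the variance estimators.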
- Python 3.9 - 3.14
- numpy >= 1.20
- pandas >= 1.3
- scipy >= 1.7
```
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black diff_diff tests
ruff check diff_diff tests
```

This library implements methods from a wide body of econometric and causal-inference research. See the full bibliography on Read the Docs for citations spanning DiD foundations, modern staggered estimators, sensitivity analysis, and synthetic controls.
If you use diff-diff in your research, please cite it:
```bibtex
@software{diff_diff,
  title = {diff-diff: Difference-in-Differences Causal Inference for Python},
  author = {Gerber, Isaac},
  year = {2026},
  url = {https://github.com/igerber/diff-diff},
  doi = {10.5281/zenodo.19646175},
  license = {MIT},
}
```

The DOI above is the Zenodo concept DOI - it always resolves to the latest release. To cite a specific version, look up its versioned DOI on the Zenodo project page.
See CITATION.cff for the full citation metadata.
Note on authorship: academic citation (CITATION.cff, the BibTeX above) lists individual authors with ORCIDs per scholarly convention. Package metadata surfaces (pyproject.toml, Sphinx docs) list "diff-diff contributors" to acknowledge the collective - see CONTRIBUTORS.md for the full list.
MIT License
