# did_multiplegt_stat

- **Repository Path**: econometric/did_multiplegt_stat
- **Description**: https://github.com/chaisemartinPackages/did_multiplegt_stat
- **License**: MIT
- **Default Branch**: main
- **Created**: 2025-04-24
- **Last Updated**: 2025-04-24

did_multiplegt_stat -- Estimation of heterogeneity-robust difference-in-differences (DID) estimators, with a binary, discrete, or continuous treatment or instrument, in designs with stayers, assuming that past treatments do not affect the current outcome ([de Chaisemartin, C, D'Haultfoeuille, X, Pasquier, F, Sow, D, Vazquez-Bare, G, 2024](https://ssrn.com/abstract=4011782)).

[Description](#Description) | [Setup](#Setup) | [Syntax](#Syntax) | [Options](#Options) | [FAQ](#FAQ) | [Example](#Example) | [References](#References) | [Authors](#Authors) | [Contact](#Contact)

# Description

**did_multiplegt_stat** estimates difference-in-differences estimators for continuous treatments with heterogeneous effects, assuming that between consecutive periods, the treatment of some units, the switchers, changes, while the treatment of other units does not change. It computes the three estimators (including an IV-related estimator) introduced in [de Chaisemartin, C, D'Haultfoeuille, X, Pasquier, F, Sow, D, Vazquez-Bare, G (2024)](https://ssrn.com/abstract=4011782). The estimators computed by the command assume static effects and rely on a parallel trends assumption.

+ **Data and design.** The command uses panel data at the (G,T) level to estimate heterogeneity-robust DID estimators, with a binary, discrete, or continuous treatment (or instrument).
The command can be used in designs where there is at least one pair of consecutive time periods between which the treatment of some units, the switchers, changes, while the treatment of some other units, the stayers, does not change.
+ **Target parameters.** The command can estimate the Average Slope (AS) and the Weighted Average Slope (WAS) parameters introduced in de Chaisemartin et al. (2022). The AS is the average, across switchers, of (Y_t(D_t)-Y_t(D_{t-1}))/(D_t-D_{t-1}), the effect on their period-t outcome of moving their period-t treatment from its period-(t-1) to its period-t value, scaled by the difference between these two values. The WAS is a weighted average of switchers' slopes (Y_t(D_t)-Y_t(D_{t-1}))/(D_t-D_{t-1}), where slopes receive a weight proportional to |D_t-D_{t-1}|, switchers' absolute treatment change from period-(t-1) to period-t. The variance of the WAS estimator is often smaller than that of the AS estimator, especially when there are switchers that experience a small treatment change. The WAS estimator is also amenable to doubly-robust estimation, unlike the AS estimator.
+ **Assumptions.** When the data has more than two time periods, the command assumes a static model: units' outcome at period t only depends on their period-t treatment, not on their lagged treatments. See the did_multiplegt_dyn command for estimators allowing for dynamic effects. The command also makes a parallel trends assumption: the counterfactual outcome evolution switchers would have experienced if their treatment had not changed is assumed to be equal to the outcome evolution of stayers with the same baseline treatment. Importantly, this parallel-trends assumption is conditional on the baseline treatment: comparing switchers and stayers with different baseline treatments would implicitly amount to assuming that the treatment's effect is constant over time.
To test the parallel trends assumption underlying the estimators, the command can compute placebo estimators comparing the outcome evolution of switchers and stayers with the same baseline treatment before switchers' treatment changes.
+ **Estimators, when the exact_match option is specified.** With a binary or discrete treatment, if the exact_match option is specified, the estimators computed by the command compare the outcome evolution of switchers and stayers with the same period-(t-1) treatment. Then, the WAS estimator computed by did_multiplegt_stat is numerically equivalent to the DID_M estimator proposed by de Chaisemartin and D'Haultfoeuille (2020), and already computed by the did_multiplegt_old command. did_multiplegt_stat uses an analytic formula to compute the estimator's variance, while did_multiplegt_old uses the bootstrap. Thus, the run time of did_multiplegt_stat is typically much lower. The exact_match option can only be specified when the treatment is binary or discrete: with a continuously distributed treatment, one cannot find switchers and stayers with the exact same period-(t-1) treatment. With a discrete treatment taking a large number of values, specifying this option may be undesirable: then, there may only be few switchers that can be matched to a stayer with the exact same period-(t-1) treatment, thus restricting the estimation sample.
+ **Estimators, when the exact_match option is not specified.** When the exact_match option is not specified, the command can use a regression adjustment to recover switchers' counterfactual outcome evolution: for all t, it runs an OLS regression of Y_t-Y_{t-1} on a polynomial in D_{t-1} in the sample of (t-1)-to-t stayers, and uses that regression to predict switchers' counterfactual outcome evolution. Alternatively, when it estimates the WAS, the command can also use propensity-score reweighting to recover switchers' counterfactual outcome evolution.
First, for all t, it estimates a logistic regression of an indicator for (t-1)-to-t switchers on a polynomial in D_{t-1}, to predict units' probability of being a switcher. Then, it computes a weighted average of stayers' outcome evolution, upweighting stayers with a large probability of being switchers, and downweighting stayers with a low probability of being switchers. Finally, when it estimates the WAS, the command can also combine regression adjustment and propensity-score reweighting, thus yielding a doubly-robust estimator.
+ **Instrumental-variable case.** There may be instances where the parallel-trends assumption fails, but one has at hand an instrument satisfying a similar parallel-trends assumption. For instance, one may be interested in estimating the price-elasticity of a good's consumption, but prices respond to supply and demand shocks, and the counterfactual consumption evolution of units experiencing and not experiencing a price change may therefore not be the same. On the other hand, taxes may not respond to supply and demand shocks and may satisfy a parallel-trends assumption. In such cases, the command can compute the IV-WAS estimator introduced in de Chaisemartin et al. (2022). The IV-WAS estimator is equal to the WAS estimator of the instrument's reduced-form effect on the outcome controlling for D_{t-1}, divided by the WAS estimator of the instrument's first-stage effect on the treatment controlling for D_{t-1}. See de Chaisemartin et al. (2024) for some explanations as to why controlling for D_{t-1} is desirable in IV estimation.
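
For reference, the target parameters described above can be written compactly for a single pair of consecutive periods (t-1, t). This is only a sketch: the switcher indicator S_t and the conditional-expectation notation are assumptions introduced here for illustration, and the aggregation across pairs of periods performed by the command is not shown.

```latex
% Sketch for one pair of periods (t-1, t).
% S_t = 1\{D_t \neq D_{t-1}\} is notation assumed here, not taken from the README.
\begin{align*}
\text{AS}_t &= E\left[ \frac{Y_t(D_t) - Y_t(D_{t-1})}{D_t - D_{t-1}} \,\middle|\, S_t = 1 \right], \\
\text{WAS}_t &= E\left[ \frac{|D_t - D_{t-1}|}{E\left[\,|D_t - D_{t-1}| \mid S_t = 1\right]} \cdot \frac{Y_t(D_t) - Y_t(D_{t-1})}{D_t - D_{t-1}} \,\middle|\, S_t = 1 \right] \\
&= \frac{E\left[\operatorname{sgn}(D_t - D_{t-1})\,\bigl(Y_t(D_t) - Y_t(D_{t-1})\bigr) \mid S_t = 1\right]}{E\left[\,|D_t - D_{t-1}| \mid S_t = 1\right]}, \\
\text{IV-WAS}_t &= \frac{\text{WAS}_t \text{ of } Z \text{ on } Y \text{ (reduced form)}}{\text{WAS}_t \text{ of } Z \text{ on } D \text{ (first stage)}}.
\end{align*}
```

In the second expression for the WAS, the weight |D_t-D_{t-1}| cancels the denominator of each slope, so no individual slope is divided by a treatment change close to zero. This is consistent with the remark above that the WAS estimator often has a smaller variance than the AS estimator when some switchers experience a small treatment change.
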

# Setup

### Stata

```s
net install did_multiplegt_stat, from("https://raw.githubusercontent.com/chaisemartinPackages/did_multiplegt_stat/main/STATA") replace
```

### R

```s
library(devtools)
install_github("chaisemartinPackages/did_multiplegt_stat/R", force = TRUE)
```

# Syntax

## Stata

[**bysort varlist:**] **did_multiplegt_stat Y G T D** [**Z**] [*if*] [*in*] [, **estimator**(*string*) **as_vs_was exact_match estimation_method**(*string*) **order**(*#/####/########*) **controls**(*varlist*) **weights**(*varname*) **cluster**(*varlist*) **noextrapolation by_fd**(*#*) **by_baseline**(*#*) **other_treatments**(*varlist*) **switchers**(*string*) **placebo**(*#*) **disaggregate graph_off bys_graph_off bootstrap**(*#*) **seed**(*#*) **cross_validation**(*cv_suboptions*) **twfe**(*twfe_suboptions*)]

## R

did_multiplegt_stat(df, Y, ID, Time, D, Z = NULL, estimator = NULL, estimation_method = NULL, order = 1, noextrapolation = FALSE, placebo = NULL, weight = NULL, switchers = NULL, disaggregate = FALSE, aoss_vs_waoss = FALSE)

# Options

+ **Main options:**
  - **estimator**(*string*) gives the name(s) of the estimator(s) to be estimated. The allowed arguments are: (1) as, (2) was, and (3) iv-was.
  - **exact_match:** with this option, the DID estimators computed by the command compare the outcome evolution of switchers and stayers with the same period-(t-1) treatment (or instrument) value. This option can only be used when the treatment (or instrument) is binary or discrete: with a continuously distributed treatment (or instrument), one cannot find switchers and stayers with the exact same period-(t-1) treatment (or instrument). With a discrete treatment taking a large number of values, specifying this option may be undesirable: then, there may only be few switchers that can be matched to a stayer with the exact same period-(t-1) treatment, thus restricting the estimation sample.
  - **estimation_method**(*string*): when the exact_match option is not specified and estimation of the WAS or IV-WAS is requested, this option specifies which estimation method to use. The allowed arguments are: (1) ra (regression adjustment), (2) ps (propensity-score reweighting), and (3) dr (doubly-robust). By default, the doubly-robust estimator is used when the WAS or IV-WAS is requested, and the regression-adjustment estimator is used when the AS is requested.
  - **order**(*#/####/########*): when the exact_match option is not specified, this option specifies the polynomial orders to be used in the OLS regressions of Y_t-Y_{t-1} on a polynomial in D_{t-1} and/or in the logistic regressions of an indicator for (t-1)-to-t switchers on a polynomial in D_{t-1}. This option takes either 1, 4, or 8 arguments, with 8 arguments only allowed when the IV-WAS is requested. E.g.: order(1), order(1 4 3 2) or order(1 4 3 2 1 2 3 4) are allowed (the last one only if the IV-WAS is requested), but order(1 2 3) is not allowed. With 4 arguments, arguments 1, 2, 3, and 4 are the orders used to estimate E(Y_t-Y_{t-1}|D_{t-1}), P(S_{t}=0|D_{t-1}), P(S_{+, t}=1|D_{t-1}), and P(S_{-, t}=1|D_{t-1}), respectively. With 8 arguments, the same logic applies, but the first 4 arguments are for the first stage, and the next 4 for the reduced form. Finally, if the IV-WAS is requested but order has 4 arguments, the same orders are applied to the first stage and the reduced form. By default, a polynomial of order 1 is used.
  - **placebo**(*#*): when this option is specified, the command computes the placebo version of each estimator requested. Actual estimators compare the t-1-to-t outcome evolution of period t-1-to-t switchers and stayers with the same baseline treatment. When # is equal to 1, placebo estimators compare the t-2-to-t-1 outcome evolution of period t-1-to-t switchers and stayers with the same baseline treatment, restricting attention to t-2-to-t-1 stayers.
Thus, placebos assess whether switchers and stayers were on parallel trends just before switchers switched treatment. When # is strictly larger than 1, placebos comparing the outcome evolutions of t-1-to-t switchers and stayers from t-3 to t-2, from t-4 to t-3, ..., and from t-#-1 to t-# are also reported, always restricting attention to stayers between those pairs of periods.
  - **as_vs_was**: shows a test that the AS and WAS are equal. This option can only be used when estimation of both the AS and the WAS is requested.
  - **controls**(*varlist*): the command can compute estimators with control variables. They rely on a conditional parallel trends assumption: the counterfactual outcome evolution switchers would have experienced if their treatment had not changed is assumed to be equal to the outcome evolution of stayers with the same baseline treatment, and with the same value of varlist. When time-varying control variables are passed to the command, the command compares the t-1-to-t outcome evolution of switchers and stayers with the same baseline treatment, and with the same controls at period t-1. Specifying too many control variables may lead to noisy estimators. If placebo estimators are small, insignificant, and precisely estimated without control variables, including control variables may not be necessary.
  - **weights**(*varname*): this option computes estimators weighted by varname.
+ **Options to estimate heterogeneous treatment effects**
  - **switchers**(*string*): if the argument up is specified, the command estimates the AS, WAS, or IV-WAS for switchers-up only, i.e. for units whose treatment (or instrument) increases from period (t-1) to t. If the argument down is specified, the command estimates the AS, WAS, or IV-WAS for switchers-down only, i.e. for units whose treatment (or instrument) decreases from period (t-1) to t. By default, the command estimates those parameters for all switchers.
  - **by_fd**(*#*): This option can be used to assess the heterogeneity of the effect according to the absolute value of the treatment's (or instrument's) change. For example, if by_fd(5) is specified, the command splits switchers into 5 groups: the 20% with the lowest |Delta D_t| (or |Delta Z_t|), then the next 20%, etc. Then the command estimates treatment effects for each subsample. If |Delta D_t| has mass points, the command splits switchers into groups with as-equal-as-possible sizes.
  - **by_baseline**(*#*): This option is similar to the option by_fd(#), except that switchers are split into subsamples according to their values of D_{t-1} (or Z_{t-1}).
  - **[bysort varlist:]** makes did_multiplegt_stat byable. See [D] by. Only time-invariant variables are allowed in varlist.
+ **Standard-error options**
  - **bootstrap**(*#*): If the number of switchers or the number of stayers is low, one may want to check if the analytic standard errors produced by the command are close to bootstrap standard errors. If they are not, this may indicate that the asymptotic approximation underlying the analytic standard errors is not reliable. In that case, there is of course no guarantee that bootstrap standard errors are valid: this comparison is just a diagnostic check researchers may use to assess if they need to resort to inference methods that do not rely on asymptotic approximations, like permutation tests. The bootstrap option takes as argument the number of replications. Currently, it is only allowed when the IV-WAS is requested, as failures of asymptotic approximations are more likely to arise with IV estimators, when the first stage is weak.
  - **seed**(*#*): This option is only needed when one is using the bootstrap. It sets the seed so as to ensure replicability of the results.
  - **cluster**(*varlist*): This option clusters standard errors at the level of varlist.
+ **Advanced options**
  - **other_treatments**(*varlist*): This option allows controlling for other treatments (in varlist) that may also change over the panel; see de Chaisemartin and D'Haultfoeuille (2021) for further details.
  - **noextrapolation**: when this option is specified, the command only keeps switchers whose period-(t-1) treatment (or instrument) is between the minimum and the maximum values of the period-(t-1) treatment (or instrument) of stayers, thus enforcing the overlap condition.
+ **TWFE Comparison** The command can compare the estimator specified in estimator() to a TWFE-estimator, computed via a regression of Y_{i,t} on D_{i,t} and unit and year fixed effects. If iv-was is specified in estimator(), a 2SLS-TWFE is used, with Z_{i,t} as the instrument. With the option twfe, did_multiplegt_stat displays, on top of the main results, a table showing the difference between the estimator requested and the TWFE-estimator, the p-value of the test of the difference, and the corresponding confidence interval. By default, the command runs the test using 100 bootstrap replications. To increase the number of replications, use the option bootstrap(#) of did_multiplegt_stat. For replicability, also consider using the seed(#) option. To use the twfe(**twfe_suboptions**) option, you need to specify which sample the TWFE regression should be estimated on, so you should always specify either the full_sample or the same_sample suboption.
  - **same_sample:** Sometimes, did_multiplegt_stat might not use all time periods in the estimation. For instance, one might have panel data where at a particular period (say p) there is no switcher. In that case, did_multiplegt_stat does not use the pair of periods (p-1, p). Then, the (as, was, or iv-was) estimator and the TWFE-estimator will rely on different samples.
To avoid this discrepancy, the option same_sample estimates the TWFE-estimator using the same sample as did_multiplegt_stat.
  - **full_sample:** Counterpart to the same_sample option. Use this if you do not want to impose the sample restrictions described in same_sample and want to estimate the TWFE regression in the full sample instead.
  - **percentile:** By default, did_multiplegt_stat computes the s.e. and p-value of the test of the difference between the (as, was, or iv-was) estimator and the TWFE-estimator using a t-test and a normal approximation. Alternatively, one can specify the option percentile to compute p-values and confidence intervals using the percentile bootstrap method.
+ **Cross-validation**: If the treatment is continuous (or if the option exact_match is not specified), and the doubly-robust WAS estimator is used, instead of specifying the order of the polynomial series that did_multiplegt_stat uses to estimate E(Y_t-Y_{t-1}|D_{t-1}), P(S_{t}=0|D_{t-1}), P(S_{+, t}=1|D_{t-1}), and P(S_{-, t}=1|D_{t-1}), one may use cross-validation. Then, the command will choose the polynomial order with the best fit. This option can only be used to compute the doubly-robust WAS estimator: cross-validation does not have proven theoretical guarantees for the other estimators. This option also cannot be used together with the by_fd and by_baseline options. *To use cross-validation you have to specify cross_validation(algorithm(string) cv_suboptions). The algorithm(string) suboption is required for cross_validation(cv_suboptions) to function and therefore has to be specified in any case.*
  - **algorithm**(*string*): This option specifies which cross-validation algorithm to use. The allowed arguments are loocv (leave-one-out) and kfolds. By default, loocv is used in linear regressions and kfolds in logit regressions. The leave-one-out method can only be used in linear regressions.
  - **tolerance**(*#*): This option sets a stopping criterion based on the gain in prediction power. By default, tolerance is set to 0.01, i.e., the cross-validation stops when the gain in prediction power from increasing the polynomial order is less than 1%.
  - **max_k**(*#*): This is another stopping criterion, based on the maximum order to test (the grid search over the hyperparameter). By default, the value is set to 5, meaning that the algorithm will look for the best model from a polynomial of order 1 up to a polynomial of order 5, as long as the tolerance is not reached.
  - **kfolds**(*#*): If kfolds is specified in algorithm(), this option specifies the number of folds to consider. By default, the number of folds is set to 5.
  - **same_order_all_logits**: When this option is specified, the cross-validation is done only for P(S_{t}=0|D_{t-1}), and the optimal order found is used to predict P(S_{+, t}=1|D_{t-1}) and P(S_{-, t}=1|D_{t-1}).
  - **seed**(*#*): This option sets the seed so as to ensure replicability of the results.
+ **Display**
  - **disaggregate**: when this option is specified, the command shows the estimated AS, WAS, or IV-WAS effects for each pair of consecutive time periods, on top of the effects aggregated across all time periods. By default, the command only shows effects aggregated across all time periods.
  - **graph_off**: By default, the program displays a graph of the aggregated results (coefficients, effects and placebos, and their confidence intervals). If graph_off is specified, the graph is not displayed.
  - **bys_graph_off**: If the program is by'd (i.e. run with **bysort varlist:**), or used with the option by_fd(#) or by_baseline(#), it automatically displays a graph of the aggregated results (coefficients and confidence intervals) by level of varlist, or by quantiles, respectively. If **bys_graph_off** is specified, the graph is not displayed.

# FAQ

TBD

# Example

In the following example, we use data from Li et al. (2014).
The dataset can be downloaded from the ApplicationData GitHub Repository. We first estimate the effect of gasoline taxes on gasoline consumption and prices. Then, we estimate the price-elasticity of gasoline consumption using taxes as an instrument.

## Stata

```s
use "https://github.com/chaisemartinPackages/ApplicationData/raw/main/data_gazoline.dta", clear

// Example 1 //
did_multiplegt_stat lngca id year tau, or(2) estimator(aoss waoss) estimation_method(dr) aoss_vs_waoss placebo noextra

// Example 2 //
did_multiplegt_stat lngpinc id year tau, or(2) estimator(aoss waoss) estimation_method(dr) aoss_vs_waoss placebo noextra

// Example 3 //
did_multiplegt_stat lngca id year lngpinc tau, or(2) estimator(iwaoss) estimation_method(ra) placebo noextra
```

## R

```s
library(haven)
gazoline <- haven::read_dta("https://github.com/chaisemartinPackages/ApplicationData/raw/main/data_gazoline.dta")

# Example 1
summary(did_multiplegt_stat(df = gazoline, Y = "lngca", ID = "id", T = "year", D = "tau", order = 2, estimator = c("aoss", "waoss"), estimation_method = "dr", aoss_vs_waoss = TRUE, placebo = TRUE, noextrapolation = TRUE))

# Example 2
summary(did_multiplegt_stat(df = gazoline, Y = "lngpinc", ID = "id", T = "year", D = "tau", order = 2, estimator = c("aoss", "waoss"), estimation_method = "dr", aoss_vs_waoss = TRUE, placebo = TRUE, noextrapolation = TRUE))

# Example 3
summary(did_multiplegt_stat(df = gazoline, Y = "lngca", ID = "id", T = "year", D = "lngpinc", Z = "tau", order = 2, estimator = "iwaoss", estimation_method = "ra", placebo = TRUE, noextrapolation = TRUE))
```

## Disclaimer

The final results may differ between R and Stata (especially for the IWAOSS estimation) due to the different conventions adopted for logistic regressions by the glm and logit functions, respectively.

# References

de Chaisemartin, C, D'Haultfoeuille, X, Pasquier, F, Sow, D, Vazquez-Bare, G (2024).
[Difference-in-Differences for Continuous Treatments and Instruments with Stayers](https://ssrn.com/abstract=4011782)

The development of this package was funded by the European Union (ERC, REALLYCREDIBLE, GA N°101043899).

# Authors

+ Clément de Chaisemartin, Economics Department, Sciences Po, France.
+ Diego Ciccia, Sciences Po, France.
+ Xavier D'Haultfoeuille, CREST-ENSAE, France.
+ Felix Knau, Sciences Po, France.
+ Felix Pasquier, CREST-ENSAE, France.
+ Doulo Sow, Sciences Po, France.
+ Gonzalo Vazquez-Bare, UCSB, USA.

# Contact

chaisemartin.packages@gmail.com