Randomized experiments are increasingly used to study political phenomena because they can credibly estimate the average effect of a treatment on a population of interest. But political scientists are often interested in how effects vary across subpopulations (heterogeneous treatment effects) and how differences in the content of the treatment affect responses (the response to heterogeneous treatments). Several new methods have been introduced to estimate heterogeneous effects, but it is difficult to know whether a method will perform well for a particular data set. Rather than relying on a single method, we show how an ensemble of methods (a weighted average of estimates from individual models, an approach increasingly used in machine learning) accurately measures heterogeneous effects. Building on a large literature on ensemble methods, we show how the weighting of methods contributes to accurate estimation of heterogeneous treatment effects, and we demonstrate that pooling models yields better performance than individual methods across diverse problems. We apply the ensemble method to two experiments, illuminating how it facilitates exploratory analysis of heterogeneous treatment effects.
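The weighting idea can be illustrated with a minimal Python sketch that uses only NumPy. It assumes, in the spirit of the super learner, that (1) each candidate model predicts the outcome from a covariate and the treatment indicator, (2) the ensemble weights are chosen to minimize cross-validated squared prediction error subject to being non-negative and summing to one, and (3) each model's conditional average treatment effect (CATE) is its predicted outcome under treatment minus its prediction under control. The three OLS candidates, the helper names (`DESIGNS`, `fit_ols`, `simplex_weights`, `ensemble_cate`), and the simulated data are hypothetical illustrations. They are not the authors' exact procedure or the HetSL interface.

```python
import numpy as np

# Hypothetical candidate outcome models: OLS fits that differ only in regressors.
DESIGNS = {
    "diff_in_means": lambda x, t: np.column_stack([np.ones_like(x), t]),
    "additive": lambda x, t: np.column_stack([np.ones_like(x), t, x]),
    "interaction": lambda x, t: np.column_stack([np.ones_like(x), t, x, t * x]),
}


def fit_ols(design, x, t, y):
    """Fit OLS on the given design; return a prediction function f(x, t)."""
    beta, *_ = np.linalg.lstsq(design(x, t), y, rcond=None)
    return lambda x_new, t_new: design(x_new, t_new) @ beta


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def simplex_weights(Z, y, n_iter=5000):
    """Minimize mean squared error of Z @ w against y over the simplex
    by projected gradient descent with step size 1 / Lipschitz constant."""
    n, m = Z.shape
    w = np.full(m, 1.0 / m)
    lipschitz = 2.0 * np.linalg.norm(Z, 2) ** 2 / n
    for _ in range(n_iter):
        grad = 2.0 * Z.T @ (Z @ w - y) / n
        w = project_to_simplex(w - grad / lipschitz)
    return w


def ensemble_cate(x, t, y, n_folds=5, seed=0):
    """Weight candidate models by out-of-fold outcome prediction, then
    average their CATE estimates f(x, 1) - f(x, 0) with those weights."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    names = list(DESIGNS)
    Z = np.zeros((len(y), len(names)))  # out-of-fold predictions, one column per model
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        for j, name in enumerate(names):
            f = fit_ols(DESIGNS[name], x[train], t[train], y[train])
            Z[test_idx, j] = f(x[test_idx], t[test_idx])
    w = simplex_weights(Z, y)
    cate = np.zeros(len(y))
    for j, name in enumerate(names):
        f = fit_ols(DESIGNS[name], x, t, y)  # refit on all data
        cate += w[j] * (f(x, np.ones_like(x)) - f(x, np.zeros_like(x)))
    return dict(zip(names, w)), cate


# Simulated experiment (illustrative only): the true CATE is 1 + 2x.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n).astype(float)
y = 1 + 0.5 * x + t * (1 + 2 * x) + rng.normal(size=n)
weights, cate = ensemble_cate(x, t, y)
print(weights)
```

Because the simulated effect varies linearly with `x`, the cross-validated weights should concentrate on the interaction model. With real data the candidates would typically be more flexible learners (for example, penalized regressions or tree ensembles), while the weighting step stays the same.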
- [Replication materials](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BQMLQW)
- [Software implementation](https://github.com/SolomonMg/HetSL) (under development)