Decision Theory for the Archetype Discovery Problem
Joint with José Luis Montiel Olea, Zhuoheng Xu, Haomin Yu, and Shunqi Zhang
Abstract: In the \emph{archetype discovery problem}, a researcher wants to summarize $N$ heterogeneous policy effects of interest that vary over a discrete set of covariates. The goal is to partition the set of covariates into $K<N$ groups, the \emph{archetype sets}, and to provide a summary of the policy effects for each group. We use decision theory to show that, under a weighted mean-squared-error criterion, a procedure analogous to the \emph{Sorted Group Average Treatment Effects} (GATES) solves the archetype discovery problem. The key difference is that, in the optimal procedure, archetype sets are obtained by weighted $K$-means clustering of the $N$ heterogeneous policy effects, instead of relying on $K$ equally spaced quantiles. We show that the procedure that minimizes average risk for a given prior can be obtained by clustering the posterior mean estimates of the policy effects of interest. Similarly, an approximately minimax procedure in large samples can be obtained by clustering a consistent estimator of the policy effects. In both cases, an exact solution to the weighted $K$-means clustering problem can be found using a simple and well-known dynamic programming algorithm.
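
To make the last point concrete, here is a minimal sketch of exact weighted $K$-means for scalar inputs by dynamic programming. It relies on the fact that, for scalar values, optimal clusters are contiguous blocks of the sorted values. The abstract does not say which dynamic programming variant is used, so this is an illustrative $O(KN^2)$ version and not necessarily the one in the paper. The function name `weighted_kmeans_1d` and its arguments are hypothetical, and the weights are assumed to be nonnegative. The inputs would be the estimated policy effects (for example, posterior means or a consistent estimator) and their weights.

```python
from typing import List, Sequence, Tuple


def weighted_kmeans_1d(
    values: Sequence[float], weights: Sequence[float], k: int
) -> Tuple[float, List[int]]:
    """Exact weighted K-means for scalar values (illustrative sketch).

    Returns the minimal weighted within-group sum of squares and a group
    label (0..k-1, ordered by value) for each input, in input order.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Sort once: optimal scalar clusters are contiguous in sorted order.
    order = sorted(range(n), key=lambda i: values[i])
    x = [values[i] for i in order]
    w = [weights[i] for i in order]

    # Prefix sums of w, w*x, w*x^2 give any block's cost in O(1).
    pw, pwx, pwxx = [0.0], [0.0], [0.0]
    for xi, wi in zip(x, w):
        pw.append(pw[-1] + wi)
        pwx.append(pwx[-1] + wi * xi)
        pwxx.append(pwxx[-1] + wi * xi * xi)

    def cost(a: int, b: int) -> float:
        """Weighted squared deviation of x[a:b] from its weighted mean."""
        sw = pw[b] - pw[a]
        if sw <= 0.0:
            return 0.0
        swx = pwx[b] - pwx[a]
        return max(pwxx[b] - pwxx[a] - swx * swx / sw, 0.0)

    inf = float("inf")
    # best[g][j]: minimal cost of splitting the first j sorted points
    # into g groups; cut[g][j]: where the last group starts.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j] = c
                    cut[g][j] = i

    # Backtrack through the cut points to recover group labels.
    labels = [0] * n
    j = n
    for g in range(k, 0, -1):
        i = cut[g][j]
        for pos in range(i, j):
            labels[order[pos]] = g - 1
        j = i
    return best[k][n], labels


if __name__ == "__main__":
    # Made-up effect estimates with equal weights, for illustration only.
    effects = [5.0, 0.0, 10.0, 0.1, 5.2]
    total_cost, groups = weighted_kmeans_1d(effects, [1.0] * 5, k=3)
    print(total_cost, groups)
```

Each group's summary is then the weighted mean of the values assigned to it. Unlike iterative heuristics such as Lloyd's algorithm, this recursion returns a global minimizer of the weighted within-group sum of squares, at the price of quadratic time in $N$.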