On the Asymptotic Properties of Debiased Machine Learning Estimators
Abstract: This paper studies debiased machine learning (DML) under a novel asymptotic framework, providing insights that inform applied practice and explain simulation findings. DML is a two-step estimation method applicable to many econometric models in which the parameter of interest depends on unknown nuisance functions. It uses $K$-fold sample splitting to estimate the nuisance functions and attains standard asymptotic properties under weaker conditions than classical semiparametric methods, accommodating flexible machine-learning estimators in the first step. Practitioners implementing DML confront two main decisions: whether to use DML1 or DML2 (the two variants of DML estimators), and how to choose $K$. Existing practice favors DML2 with large $K$ based on simulation evidence, but these recommendations lack theoretical justification, as existing theory shows both variants are asymptotically equivalent for any fixed $K$. Under an asymptotic framework in which $K$ grows with the sample size $n$, we demonstrate that DML2 offers theoretical advantages over DML1 in terms of bias, mean squared error, and inference. We provide conditions under which increasing $K$ reduces DML2's second-order asymptotic bias and MSE. These results support using DML2 with $K$ as large as feasible, and in particular with $K = n$, for which we propose a computationally simple procedure.
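To make the DML1/DML2 distinction concrete, the sketch below implements both variants for the partially linear model $Y = \theta_0 D + g_0(X) + \varepsilon$, $D = m_0(X) + v$, with the partialling-out orthogonal score. This is a minimal illustration, not the paper's code: the function name `dml_plm`, the random-forest nuisance learners, and the simulated data are all assumptions made for the example.

```python
# A minimal sketch, assuming the partially linear model
#   Y = theta0 * D + g0(X) + eps,   D = m0(X) + v,
# estimated with the partialling-out orthogonal score. The helper name
# dml_plm and the random-forest learners are illustrative choices.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plm(y, d, X, K=5, seed=0):
    """Return (theta_DML1, theta_DML2) from K-fold cross-fitting."""
    n = len(y)
    y_res = np.empty(n)   # out-of-fold residuals Y - E_hat[Y|X]
    d_res = np.empty(n)   # out-of-fold residuals D - E_hat[D|X]
    fold_thetas = []      # per-fold solutions of the moment condition (DML1)
    for train, test in KFold(n_splits=K, shuffle=True, random_state=seed).split(X):
        # First step: fit both nuisance regressions on the training folds only.
        ell_hat = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_hat = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - ell_hat.predict(X[test])
        d_res[test] = d[test] - m_hat.predict(X[test])
        # DML1: solve the empirical moment condition within each held-out fold.
        fold_thetas.append(d_res[test] @ y_res[test] / (d_res[test] @ d_res[test]))
    # DML1 averages the K per-fold estimates ...
    theta_dml1 = float(np.mean(fold_thetas))
    # ... while DML2 pools all out-of-fold residuals into one moment condition.
    theta_dml2 = float(d_res @ y_res / (d_res @ d_res))
    return theta_dml1, theta_dml2

# Simulated example with theta0 = 0.5 and nonlinear nuisance functions.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
d = np.sin(X[:, 0]) + rng.normal(size=n)
y = 0.5 * d + X[:, 1] ** 2 + rng.normal(size=n)
print(dml_plm(y, d, X, K=5))
```

The sketch also makes plain why the two variants diverge as $K$ grows: at $K = n$ (leave-one-out), DML1's per-fold moment condition rests on a single observation and becomes erratic whenever that observation's residual $D - \hat{m}(X)$ is near zero, whereas DML2's pooled moment condition remains well defined, consistent with the recommendation above.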