Seminar information

Location: Roma

Date: 11/05/2023, 11:30 - 12:30

Speaker: Andrea Manno

On randomized Decision Trees: new training strategies

Recently, a new paradigm has been proposed in the literature for building randomized multivariate classification and regression trees. Such trees are characterized by soft probabilistic splitting rules, which make them more general and informative. We first investigate a nonlinear continuous formulation proposed by Blanquero et al. (EJOR Vol. 284, 2020) for building sparse optimal randomized classification trees, where sparsity is induced by the l1 and l∞ norms. We consider alternative methods to sparsify such trees based on concave approximations of the l0 "norm", we derive bounds on the VC dimension of multivariate randomized classification trees, and we propose a simple decomposition training method suited to reducing training times on high-dimensional instances without compromising accuracy.

Then, we present a variant of the multivariate randomized regression tree formulation proposed by Blanquero et al. (EJOR Vol. 299, 2022). The formulation lends itself not only to decomposition but also to the inclusion of fairness measures. The proposed decomposition training algorithm includes a specific initialization strategy and a heuristic for reassigning the input vectors along the branching nodes of the tree. Under mild assumptions, we also establish asymptotic convergence guarantees. Computational results are reported and compared with those of the original version and of an alternative mixed-integer linear optimization method.
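To illustrate the soft probabilistic splitting rules mentioned in the abstract, here is a minimal sketch of a depth-1 randomized multivariate tree: a sigmoid of a linear function of all features gives the probability of routing an input to the right leaf, and the prediction is the probability-weighted average of the leaf values. The function names and parameter values are illustrative, not taken from the talk or the cited formulations.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a routing probability."""
    return 1.0 / (1.0 + math.exp(-z))

def soft_tree_predict(x, a, b, leaf_left, leaf_right):
    """Depth-1 randomized tree with one multivariate soft split.

    x          : input vector
    a, b       : weights and intercept of the (multivariate) splitting rule
    leaf_left  : prediction stored at the left leaf
    leaf_right : prediction stored at the right leaf
    """
    # Probability of routing the input to the right branch:
    # a soft version of the hard rule "go right if a.x + b > 0".
    p_right = sigmoid(sum(ai * xi for ai, xi in zip(a, x)) + b)
    # The output is the expectation over the random routing.
    return (1.0 - p_right) * leaf_left + p_right * leaf_right

# Example with a 2-dimensional input; the split uses both features,
# which is what makes the rule multivariate (oblique) rather than axis-aligned.
x = [1.0, -2.0]
a = [0.5, 0.3]
b = 0.1
print(soft_tree_predict(x, a, b, leaf_left=0.0, leaf_right=1.0))
```

Because the routing probability is a smooth function of (a, b), the whole tree is differentiable in its parameters, which is what allows the continuous nonlinear formulations discussed in the talk, in contrast to the combinatorial training of classical hard-split trees.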