A Branch and Bound Algorithm for Bilevel Optimization Utilizing Reinforcement Learning

Mixed integer bilevel optimization is a language for describing game-theoretic problems in which one decision-maker, called the leader, must decide on a course of action, taking into account the optimal response of a second decision-maker, called the follower. The most successful algorithms to date for solving such problems are variants of branch and bound that require the solution of a mixed integer linear problems (MILP) subproblems in order to determine the optimal response of the follower to various strategies of the leader. We devise a branch and bound that uses a reinforcement learning algorithm to derive approximate solutions to these MILP subproblems. The objective is to accelerate the computation of primal bounds, hopefully resulting in a reduction in the size of the search tree and an improvement in the solution time for larger instances. Since reinforcement learning does not produce solutions with a guarantee of optimality, the algorithm proceeds in two phases. In the first phase, the reinforcement learning is treated as an exact oracle, with nodes that would have been pruned instead saved for an optional second phase in which they are re-processed as needed. We test the algorithm on an interdiction binary knapsack bilevel problem