Queen Mary University of London
My background is in physics and statistical mathematics, with a later specialization in optimization within the fields of Reinforcement Learning (RL) and Causal Inference. My first encounters with RL occurred during my Master's degree, when I studied how to construct strong policies in perfect-information games using algorithms such as MinMax, MCTS, DQN, and later AlphaZero variants. My favorite game application remains the board game ‘Stratego’. For my Master's thesis, I investigated the estimation of the causal parents influencing a target variable from interventional datasets; specifically, how well Deep Learning estimators can replace exponentially scaling graph-search methods with approximations requiring only polynomial runtime.
A description of Michael's research:
My research focuses on the state of the art in game-playing solutions for imperfect-information games (think of games such as Poker, Stratego, or Liar's Dice). I am particularly interested in the application of No-Regret (and related) methods, which seek to learn the actions that provided the most benefit (or least regret) relative to the average benefit over all possible actions. Through iterative play, these methods converge to a Nash Equilibrium (NE), a game-theoretic concept comparable to the optimal policy of single-agent RL, but defined for all participating players at once. In particular, variants of Counterfactual Regret Minimization (CFR) remain the state-of-the-art algorithms for computing NEs in two-player zero-sum games, owing to their success in tabular form so far. Yet prohibitive computational and memory scaling bars them from large-scale applications. Hence, recent research seeks to couple CFR (and other No-Regret methods) with function approximation, such as Deep Learning, to scale up the size of tractable games, with notable successes already (DeepStack, Libratus, Pluribus, DeepNash). My research seeks to contribute to this endeavour by first analyzing the specifics of established methods and then finding ways to introduce Hierarchical RL concepts into No-Regret learning.
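The core idea behind these No-Regret methods can be illustrated with regret matching, the update rule underlying CFR. The sketch below is a minimal, illustrative self-play loop on Rock-Paper-Scissors (not an implementation from any particular system described above; all names such as `PAYOFF` and `regret_matching` are my own for this example). Each player accumulates, per action, how much better that action would have done than the action actually played, then plays in proportion to the positive accumulated regrets. The *average* strategy over iterations, not the last one, converges to the NE, which for Rock-Paper-Scissors is the uniform mixed strategy.

```python
import numpy as np

# Row player's payoff matrix for Rock-Paper-Scissors (zero-sum, so the
# column player receives the negation). Rows/columns: Rock, Paper, Scissors.
PAYOFF = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])

def regret_matching(regrets):
    """Play proportionally to positive cumulative regret; uniform if none."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))

def self_play(iterations=20000, seed=0):
    rng = np.random.default_rng(seed)
    n = PAYOFF.shape[0]
    regrets = [np.zeros(n), np.zeros(n)]        # cumulative regrets per player
    strategy_sums = [np.zeros(n), np.zeros(n)]  # to form the average strategy
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        actions = [rng.choice(n, p=s) for s in strategies]
        realized = PAYOFF[actions[0], actions[1]]
        payoffs = [realized, -realized]
        for p in range(2):
            # Counterfactual value of every action against the opponent's move.
            if p == 0:
                action_values = PAYOFF[:, actions[1]]
            else:
                action_values = -PAYOFF[actions[0], :]
            # Regret = how much better each action would have done.
            regrets[p] += action_values - payoffs[p]
            strategy_sums[p] += strategies[p]
    # The average strategy profile approximates the Nash Equilibrium.
    return [s / s.sum() for s in strategy_sums]

if __name__ == "__main__":
    avg = self_play()
    print(avg[0])  # close to the uniform NE [1/3, 1/3, 1/3]
```

CFR extends this same regret-matching update from a single matrix game to every information set of a sequential imperfect-information game, which is where the prohibitive memory scaling mentioned above comes from.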