Primal-Dual Pi Learning Using State and Action Features
We survey recent advances on the complexity and methods for solving Markov decision problems (MDP) and Reinforcement Learning (RL) with finitely many states and actions - a basic mathematical model for reinforcement learning.
For model reduction of large scale MDP in reinforcement learning, we propose a bilinear primal-dual pi learning method that utilizes given state and action features. The method is motivated from a saddle point formulation of the Bellman equation. The sample complexity of bilinear pi learning depends only on the number of parameters and is variant with respect to the dimension of the problem.
In the second part we study the statistical state compression of general Markov processes. We propose a spectral state compression method for learning the state features from data. The state compression method is able to "sketch" a black-box Markov process from its empirical data and output state features, for which we provide both minimax statistical guarantees and scalable computational tools.