Approximation theory and regularization for deep learning
This talk introduces new approximation theories for deep learning in the context of parallel computing and high-dimensional problems. We will explain the power of function composition in deep neural networks and characterize the approximation capacity of shallow and deep neural networks for various functions on a high-dimensional compact domain. Taking parallel computing into account, our analysis leads to an important point of view for choosing network architectures, one that has received little attention in the approximation theory literature and is especially relevant to large-scale deep learning training in parallel computing: deep is good, but too deep might be less attractive. Our analysis also inspires a new regularization method that achieves state-of-the-art performance across most kinds of network architectures.