CS-ECE Colloquium: Scalable Learning Over Distributions
A great deal of attention has been applied to studying new and better ways to perform learning tasks involving static finite vectors. Indeed, over the past century the fields of statistics and machine learning have amassed a vast understanding of various learning tasks like clustering, classification, and regression using simple real valued vectors. However, we do not live in a world of simple objects. From the contact lists we keep, the sound waves we hear, and the distribution of cells we have, complex objects such as sets, distributions, sequences, and functions are all around us. Furthermore, with ever-increasing data collection capacities at our disposal, not only are we collecting more data, but richer and more bountiful complex data are becoming the norm.
In this presentation we analyze regression problems where input covariates, and possibly output responses, are probability distribution functions from a nonparametric function class. Such problems cover a large range of interesting applications including learning the dynamics of cosmological particles and general tasks like parameter estimation.
However, previous nonparametric estimators for functional regression problems scale badly computationally with the number of input/output pairs in a data-set. Yet, given the complexity of distributional data it may be necessary to consider large data-sets in order to achieve a low estimation risk.