Intrinsic geometry and latent low dimensionality in data: manifolds and beyond
Many modern datasets exhibit low-dimensional organization despite living in high-dimensional ambient spaces, yet this structure is often implicit, noisy, and not explicitly modeled. In this talk, I discuss statistical inference in such settings through two complementary examples. I first present recent theoretical results on Gaussian process regression for data supported on unknown low-dimensional structures. Using a real-domain small-bandwidth analysis, I show how intrinsic geometry governs approximation and posterior contraction without relying on spectral or Laplacian machinery, yielding adaptive rates that depend on the intrinsic rather than the ambient dimension. I then turn to a biological application in single-cell genomics, where local neighborhood graphs are used to define optimal-transport-based gene affinities that reveal gene trajectories and the underlying biological processes. These examples illustrate how latent geometric structure, even when only weakly revealed by the data, can be harnessed to enable reliable statistical inference in high dimensions.





