My research is motivated by the challenge of extracting useful information from large-scale and high-dimensional data in sample-starved or resource-starved environments. I am interested in exploiting and developing low-dimensional data representations that reveal geometric structures, in order to reduce the sampling cost of signal acquisition and improve the performance of decision making. Using tools from statistical signal processing, machine learning, optimization, and information theory, I develop provably efficient algorithms, both statistically and computationally, for sensing and imaging applications broadly spanning network inference, inverse problems, optical imaging, and streaming data processing.

Some particular problems of current interest are:

  • inverse methods for high-dimensional mixture and bilinear models: identifiability, fundamental limits, parameter estimation under interference and uncertainties, blind deconvolution and dictionary learning;

  • statistical inference of high-dimensional streaming data: covariance estimation, subspace tracking, sketching and sparse sampling, online learning, handling missing data and corruptions;

  • anomaly detection, detection and estimation for network inference and radar applications;

  • biomedical signal processing and bioinformatics: super-resolution microscopy imaging, healthcare data.

High-Resolution Parameter Estimation

The goal of source localization is to invert for the source locations of the wavefields that produced the image acquired at the sensor suite, where the image broadly takes the form of a time series, a space series, a space-time series, a 2-D image, and so on. Typically, one is interested in super-resolution, where the desired resolution for source locations exceeds the temporal or spatial resolution of the image itself. Conventional methods assume a separable linear model: a sparse parametric modal representation for the source signal is posited, and estimates of the linear parameters (complex amplitudes of modes) and nonlinear mode parameters (frequency, wavenumber, delay, and/or Doppler) are extracted.
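To make the separable structure concrete, below is a minimal sketch in Python/NumPy (the function name modal_fit and the toy frequencies are illustrative, not drawn from any specific paper): once candidate nonlinear mode parameters are fixed, the complex amplitudes follow by linear least squares, and the residual scores the candidate.

```python
import numpy as np

def modal_fit(y, t, freqs):
    """Fix the nonlinear parameters (frequencies) and solve for the linear
    parameters (complex amplitudes) by least squares; the residual norm
    scores this candidate set of frequencies."""
    V = np.exp(2j * np.pi * np.outer(t, freqs))  # Vandermonde-type model matrix
    amps, *_ = np.linalg.lstsq(V, y, rcond=None)
    return amps, np.linalg.norm(y - V @ amps)

# Toy usage: two noisy modes; in practice the nonlinear parameters are found
# by searching or refining over candidates and keeping the smallest residual.
rng = np.random.default_rng(0)
t = np.arange(64)
y = 1.0 * np.exp(2j * np.pi * 0.12 * t) + 0.5 * np.exp(2j * np.pi * 0.31 * t)
y = y + 0.05 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
amps, resid = modal_fit(y, t, np.array([0.12, 0.31]))
```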

Compressive sensing (CS) shows that if a signal can be represented in some basis with a few nonzero coefficients (i.e., it admits a sparse representation), then it can be reconstructed from far fewer samples than the ambient dimension. This opens up exciting possibilities for using sparsity as a prior in algorithm design to increase resolution and reduce complexity. However, the success of CS relies on the assumption that the signal is sparse in an a priori known basis; in spectrum analysis and in parameter estimation problems in radar and sonar, there is an inevitable mismatch between the basis assumed in CS and the true basis imposed by the physics of scattering. The performance of CS degrades considerably in the presence of this basis mismatch, no matter how fine the grid of the assumed basis.
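A small NumPy experiment (illustrative only) makes the mismatch visible: a sinusoid whose frequency lies on the DFT grid is exactly 1-sparse in that basis, while one falling between grid points leaks energy across the entire basis.

```python
import numpy as np

n = 64
t = np.arange(n)
on_grid  = np.exp(2j * np.pi * (8.0 / n) * t)   # frequency on the DFT grid
off_grid = np.exp(2j * np.pi * (8.5 / n) * t)   # frequency between grid points

for name, x in [("on-grid", on_grid), ("off-grid", off_grid)]:
    coeffs = np.sort(np.abs(np.fft.fft(x)) / n)[::-1]
    print(f"{name:8s} largest |coefficients|:", np.round(coeffs[:4], 3))

# on-grid : one coefficient equals 1, all others are 0 (exactly sparse)
# off-grid: the largest coefficients are about 0.64, 0.64, 0.21, 0.21, and
#           they decay only slowly, so the signal is not sparse in the
#           assumed basis -- refining the grid does not remove the leakage
```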

To address this problem, we develop novel nonparametric algorithms for spectrum estimation from incomplete data, based on structured matrix completion and atomic norm minimization. The proposed algorithms discover and take advantage of structured low-rank matrices embedded in the data, including Hankel and Toeplitz structures arising from the shift invariance of complex harmonics. Many emerging applications in applied science and engineering generate signals with multiple modes, where each mode is governed by the physical field that produced it, e.g., a Green's function, a point spread function, or a signature waveform. Motivating applications range from three-dimensional super-resolution single-molecule imaging, spike sorting in neural recordings, and DNA sequencing, to multi-user multi-path channel identification in communication systems and blind calibration of multi-channel samplers, among many others. We have developed efficient convex optimization algorithms with guaranteed near-optimal performance for stably inverting such parametric mixture models. Moreover, the physical field is typically not known a priori and needs to be jointly calibrated or estimated, resulting in the so-called blind deconvolution problem.
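As a minimal illustration of the structure these algorithms exploit (a sketch only; the actual methods complete such matrices from partial, noisy observations via convex optimization), the Hankel matrix formed from samples of a sum of r complex exponentials has rank exactly r, owing to shift invariance:

```python
import numpy as np

def hankel(x, L):
    """L x (len(x) - L + 1) Hankel matrix with x along the anti-diagonals."""
    n = len(x)
    return np.array([x[i:i + n - L + 1] for i in range(L)])

t = np.arange(128)
freqs = np.array([0.11, 0.38, 0.40])   # three modes, two closely spaced
amps = np.array([1.0, 0.8, 0.6])
x = (amps * np.exp(2j * np.pi * np.outer(t, freqs))).sum(axis=1)

H = hankel(x, 64)
print(np.linalg.matrix_rank(H))        # 3: the rank reveals the number of modes
```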

Related projects:

  • Super-Resolution Off the Grid
  • Bilinear Inverse Problems and Blind Super-Resolution
  • Compressed Sensing and Basis Mismatch
  • Single-Molecule Localization Microscopy Imaging

Reduced-Rank Signal Processing

One major challenge in the data deluge is identifying patterns of activity from the flow of information in a network, with examples including sensor networks, social networks, and biological networks, motivated by the prospect of systematically learning and predicting network behavior. Network inference is hard because of the massive scale of the network and the time dynamics of the information flow. How can we reduce network dimensionality without compromising the potential for learning and control, within limited processing capabilities?

What saves the day is the geometry of information flow: the fact that it can be viewed as a low-dimensional manifold in a very high-dimensional space. However, there is still the problem of tapping into that manifold. When the manifold is linear, we developed a low-complexity online algorithm, PETRELS (Parallel Subspace Estimation and Tracking by Recursive Least Squares), that can estimate and track the evolution of a low-rank subspace from partial observations, and demonstrated superior performance on the problem of delay estimation in array processing. What is truly exciting is the recognition that the geometry of information flow, expressed in terms of the top principal components of the data covariance matrix, is preserved even under highly incomplete observations.
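Below is a simplified sketch of a PETRELS-style iteration (illustrative; the published algorithm handles further details such as initialization and rank selection): each partially observed vector is projected onto the current subspace estimate using only its observed rows, and each observed row of the subspace is then refreshed by a parallel recursive-least-squares update.

```python
import numpy as np

def petrels_step(U, Rinv, x, mask, lam=0.98):
    """One simplified PETRELS-style update.
    U    : (n, r) current subspace estimate
    Rinv : (n, r, r) per-row inverse correlation matrices
    x    : (n,) data vector, observed only where mask is True"""
    Uo, xo = U[mask], x[mask]
    w, *_ = np.linalg.lstsq(Uo, xo, rcond=None)  # weights from observed rows
    resid = xo - Uo @ w
    for j, i in enumerate(np.flatnonzero(mask)):
        P = Rinv[i] / lam                        # exponential forgetting
        Pw = P @ w
        Rinv[i] = P - np.outer(Pw, Pw) / (1.0 + w @ Pw)  # Sherman-Morrison
        U[i] += resid[j] * (Rinv[i] @ w)         # RLS update, observed rows only
    return U, Rinv

# Toy usage: track a planted rank-2 subspace from 40%-observed vectors.
rng = np.random.default_rng(0)
n, r = 50, 2
Ustar = np.linalg.qr(rng.standard_normal((n, r)))[0]
U = rng.standard_normal((n, r))
Rinv = np.tile(100.0 * np.eye(r)[None], (n, 1, 1))
for _ in range(2000):
    x = Ustar @ rng.standard_normal(r)
    mask = rng.random(n) < 0.4
    U, Rinv = petrels_step(U, Rinv, x, mask)
```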

To further overcome the barrier between high data generation rates and the limited processing capabilities facing many modern data-intensive applications, we realized that, by leveraging structural assumptions on the covariance, a single sketch per sample suffices to accurately reconstruct the covariance matrix, rather than the original data set. This offers a substantial saving for learning covariance structures, requiring far less storage of the data than conventional methods. The key is to develop efficient sketching strategies for streaming data, tailored to specific inference tasks, that recognize the embedded low-dimensional structures. This idea can be extended to sketch and analyze graph-structured data, as well as to detect anomalies and change-points. Our framework has the potential to advance the paradigm of high-dimensional data analysis from simple post-processing toward an integration of data priors, agile acquisition, and adaptive processing.
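To convey the flavor of the model, here is a moment-based sketch assuming Gaussian sketching vectors (an illustration, not the convex recovery program developed in this line of work): each streaming sample x_t is kept only through the single scalar z_t = (a_t . x_t)^2, and since E[z a a'] = tr(Sigma) I + 2 Sigma, a debiased average followed by rank-r truncation recovers a low-rank covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 50000

# Ground-truth rank-r covariance; each sample x_t is seen only through one
# scalar quadratic sketch z_t = (a_t . x_t)^2.
B = rng.standard_normal((n, r))
Sigma = B @ B.T / r
X = rng.standard_normal((m, r)) @ B.T / np.sqrt(r)  # x_t ~ N(0, Sigma)
A = rng.standard_normal((m, n))                     # Gaussian sketching vectors
z = np.einsum('ij,ij->i', A, X) ** 2

# For Gaussian a: E[z a a'] = tr(Sigma) I + 2 Sigma and E[z] = tr(Sigma),
# so debias the weighted average and truncate to rank r.
Y = (A * z[:, None]).T @ A / m
Sigma_hat = (Y - z.mean() * np.eye(n)) / 2
evals, evecs = np.linalg.eigh(Sigma_hat)
Sigma_r = evecs[:, -r:] @ np.diag(evals[-r:]) @ evecs[:, -r:].T

print(np.linalg.norm(Sigma_r - Sigma) / np.linalg.norm(Sigma))
```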

Related projects:

  • Covariance Sketching using Quadratic Sampling
  • Statistical Inference for Streaming Data
  • Subspace Learning from Bits
  • Sparsity for Signal Processing and Machine Learning

Nonconvex Optimization

Related projects:

  • Solving Quadratic Systems of Equations

Earlier Work

Related projects:

  • Waveform Design for Radars and Wireless Communications

Research Support

I gratefully acknowledge ongoing and past support for my research group from NSF (CAREER ECCS-1650449, CCF-1527456, ECCS-1462191, CCF-1422966), AFOSR (YIP), ONR (YIP), the Center for Surveillance Research, ORAU, the Simons Foundation, Google, and Microsoft.