Overview

SMLR-AMP is a collection of AMP-based algorithms that perform multinomial logistic regression for multiclass linear classification and feature selection.

In linear classification, one is given training data of the form $$\{y_m , \boldsymbol{a}_m \}_{m=1}^M$$, where $$y_m \in \{1,...,D\}$$ are labels and $$\boldsymbol{a}_m \in \mathbb{R}^N$$ are feature vectors, and the goal is to design a weight matrix $$\boldsymbol{X} \in \mathbb{R}^{N \times D}$$ that best predicts the unknown label $$y_0$$ corresponding to a test feature vector $$\boldsymbol{a}_0$$ via the linear prediction $$\widehat{y}_0 = \text{argmax}_d [\boldsymbol{z}_0 ]_d$$, where $$\boldsymbol{z}_0 = \boldsymbol{X}^{\text{T}}\boldsymbol{a}_0 \in \mathbb{R}^D$$ is known as the "score" vector. In feature selection, the goal is to identify the indices of the most discriminatory features in the training data. Linear feature selection can be performed by extracting the indices of the largest-norm rows of the learned weight matrix $$\boldsymbol{X}$$.
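As a concrete illustration of the prediction and feature-selection rules above, here is a short Python sketch. The problem sizes, the random placeholder for the learned weight matrix $$\boldsymbol{X}$$, and the choice of the Euclidean norm for ranking rows are illustrative assumptions, not part of SMLR-AMP:

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 100, 4                  # feature dimension and number of classes (illustrative)
X = rng.normal(size=(N, D))    # stand-in for a learned weight matrix
a0 = rng.normal(size=N)        # a test feature vector

# Linear prediction: compute the score vector z0 = X^T a0 and pick the
# class with the largest score.
z0 = X.T @ a0
y0_hat = int(np.argmax(z0)) + 1          # labels live in {1, ..., D}

# Linear feature selection: rank features by the Euclidean norm of the
# corresponding row of X and keep the indices of the largest rows.
row_norms = np.linalg.norm(X, axis=1)
top_features = np.argsort(row_norms)[::-1][:10]
```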

Many modern "big-data" applications (e.g., text mining, micro-array gene expression analysis, multi-voxel pattern analysis) are complicated by the fact that the number of training samples $$M$$ is much smaller than the dimensionality $$N$$ of the feature vectors. However, accurate classification may still be possible if relatively few of the $$N$$ features are discriminatory. In the latter case, the optimal weight matrix $$\boldsymbol{X}$$ should be row-sparse (or approximately row-sparse)---a property that can be exploited in weight-matrix design.

Multinomial logistic regression (MLR) is a well-known approach to multiclass linear classification and feature selection. In MLR, the probability of the label $$y_m$$ given the score vector $$\boldsymbol{z}_m$$ is modeled using the multinomial logistic activation function, i.e., $p(y_m=y~|~ \boldsymbol{z}_m) = \frac{\exp([\boldsymbol{z}_m]_y)}{\sum_{d=1}^D \exp ([ \boldsymbol{z}_m]_d)}.$ We propose two new algorithms for sparse MLR.
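The multinomial logistic activation is the familiar softmax function, which can be evaluated stably by first subtracting the maximum score. This is a generic Python sketch, not code from SMLR-AMP:

```python
import numpy as np

def multinomial_logistic(z):
    """p(y = d | z) for d = 1..D under the multinomial logistic model."""
    z = z - np.max(z)          # shift by the max for numerical stability
    e = np.exp(z)              # (the shift cancels in the ratio)
    return e / e.sum()

# Example: a 3-class score vector; probabilities follow the score ordering.
p = multinomial_logistic(np.array([2.0, 1.0, 0.1]))
```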

• Our first algorithm solves the convex optimization problem typically associated with sparse MLR, i.e., $\boldsymbol{\widehat{X}} = \text{argmin}_{\boldsymbol{X}} -\sum_{m=1}^M \log p(y_m ~|~ \boldsymbol{z}_m=\boldsymbol{X}^{\text{T}} \boldsymbol{a}_m) + \lambda \|\boldsymbol{X}\|_1 ,$ using an approximate message passing (AMP) technique. In particular, it is based on a simplification of the min-sum algorithm (MSA) variant of the Hybrid Generalized Approximate Message Passing (HyGAMP) algorithm, and so we call it MSA-SHyGAMP. Furthermore, it tunes $$\lambda$$ using a Stein's unbiased risk estimate (SURE) method, avoiding the need for cross-validation.

• Our second algorithm computes a weight matrix $$\boldsymbol{X}$$ that (approximately) minimizes the test-error rate $$\text{Pr}(\widehat{y}_t\neq y_t)$$. To do this, it performs joint marginal inference of the test label(s) and weight-matrix coefficients via belief-propagation on the factor graph associated with the joint probability model $p(\boldsymbol{y}, \boldsymbol{X};\boldsymbol{A}) \propto \prod_{m=1}^{M+T} p(y_m ~|~ \boldsymbol{z}_m=\boldsymbol{X}^{\text{T}}\boldsymbol{a}_m)\prod_{n=1}^N p(\boldsymbol{x}_n),$ where $$p(\boldsymbol{x}_n)$$ is a spike-and-slab prior, $$\{y_m,\boldsymbol{a}_m\}_{m=1}^{M}$$ are known training data, and $$\{y_t,\boldsymbol{a}_t\}_{t=M+1}^{M+T}$$ are test data for which only the features $$\boldsymbol{a}_t$$ are known. In particular, it is based on a simplification of the sum-product algorithm (SPA) variant of the HyGAMP algorithm, and so we call it SPA-SHyGAMP. Furthermore, it tunes the spike-and-slab model parameters using the expectation-maximization algorithm, avoiding the need for cross-validation.
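For reference, the convex problem in the first bullet can also be attacked with a plain proximal-gradient (ISTA) iteration: alternate a gradient step on the negative log-likelihood with elementwise soft-thresholding for the $$\lambda \|\boldsymbol{X}\|_1$$ penalty. The Python sketch below is a generic stand-in for illustration only, not the MSA-SHyGAMP algorithm; the step size, $$\lambda$$, and iteration count are arbitrary assumptions:

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise multinomial logistic activation."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def sparse_mlr_ista(A, y, D, lam=0.1, step=None, iters=200):
    """Proximal-gradient (ISTA) sketch of l1-penalized MLR.

    A    : (M, N) matrix whose rows are the feature vectors a_m^T
    y    : (M,) labels in {1, ..., D}
    lam  : l1 penalty weight (illustrative default)

    A plain convex-optimization stand-in, NOT MSA-SHyGAMP itself.
    """
    M, N = A.shape
    Y = np.eye(D)[y - 1]                    # one-hot labels, (M, D)
    if step is None:
        # conservative step size from a Lipschitz bound on the NLL gradient
        step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    X = np.zeros((N, D))
    for _ in range(iters):
        P = softmax_rows(A @ X)             # p(y_m = d | z_m), (M, D)
        G = A.T @ (P - Y)                   # gradient of the negative log-likelihood
        X = X - step * G
        # soft-threshold: proximal operator of the elementwise l1 penalty
        X = np.sign(X) * np.maximum(np.abs(X) - step * lam, 0.0)
    return X
```

Unlike this generic iteration, MSA-SHyGAMP also tunes $$\lambda$$ automatically via SURE rather than taking it as an input.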
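The spike-and-slab prior $$p(\boldsymbol{x}_n)$$ in the second bullet promotes row-sparsity: each row of $$\boldsymbol{X}$$ is exactly zero with some probability (the spike) and drawn from a Gaussian otherwise (the slab). Below is a minimal Python sketch of sampling from such a row-wise prior; the sparsity rate `beta` and slab standard deviation `sigma` are illustrative names and values (SPA-SHyGAMP tunes the corresponding model parameters via EM):

```python
import numpy as np

def sample_spike_slab_rows(N, D, beta=0.1, sigma=1.0, rng=None):
    """Draw a row-sparse (N, D) weight matrix from a spike-and-slab prior:
    each row x_n is all-zero with prob. 1 - beta (the spike) and drawn
    from N(0, sigma^2 I_D) with prob. beta (the slab).
    """
    rng = np.random.default_rng() if rng is None else rng
    active = rng.random(N) < beta                 # which rows are "on"
    X = np.zeros((N, D))
    X[active] = sigma * rng.standard_normal((int(active.sum()), D))
    return X, active
```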