Probabilistic Sparse Matrix Factorization (PSMF) [Dueck2005], [Dueck2004].
PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to explain the data, and uncertainty as to which prototypes are selected to explain each data vector stacked in the target matrix (V).
This technique explicitly maximizes a lower bound on the log-likelihood of the data under a probability model. The resulting sparse encoding can be used for a variety of tasks, such as functional prediction, capturing functionally relevant hidden factors that explain gene expression data, and visualization. Because this algorithm computes probabilities rather than making hard decisions, it can be shown to obtain a higher data log-likelihood than variants (e.g. iterated conditional modes) that make hard decisions [Srebro2001].
Given a target matrix (V [n, m]) containing n m-dimensional data points, a basis matrix (factor loading matrix) (W) and a mixture matrix (matrix of hidden factors) (H) are found under the structural sparseness constraint that each row of W contains at most N (out of the possible factorization rank) nonzero entries. Intuitively, this corresponds to explaining each row vector of V as a linear combination (weighted by the corresponding row in W) of a small subset of the factors given by the rows of H. This framework includes simple clustering (N = 1) and ordinary low-rank approximation (N = factorization rank) as special cases.
A probability model is constructed that presumes Gaussian sensor noise in V (V = WH + noise) and uniformly distributed factor assignments. A factorized variational inference method is used to perform tractable inference on the latent variables and to account for noise and uncertainty. The number of factors, r_g, contributing to each data point is multinomially distributed such that P(r_g = n) = v_n, where v is a user-specified N-vector. PSMF model estimation using factorized variational inference has greater computational complexity than basic NMF methods [Dueck2004].
An example of using PSMF to identify gene transcriptional modules from gene expression data is described in [Li2007].
import numpy as np
import nimfa

# Run PSMF on a random 40x100 target matrix with a random prior over
# the number of contributing factors per data point.
V = np.random.rand(40, 100)
psmf = nimfa.Psmf(V, seed=None, rank=10, max_iter=12, prior=np.random.rand(10))
psmf_fit = psmf()
W = psmf_fit.basis()
H = psmf_fit.coef()
Bases: nimfa.models.nmf_std.Nmf_std
Stopping criterion
Factorization terminates if any of the specified criteria is satisfied.
Return the matrix of basis vectors.
Return the matrix of mixture coefficients.
Parameters:  idx (None) – Used in the multiple NMF model. In standard NMF idx is always None. 

Compute the connectivity matrix for the samples based on their mixture coefficients.
The connectivity matrix C is a symmetric matrix which shows the shared membership of the samples: entry C_ij is 1 iff sample i and sample j belong to the same cluster, 0 otherwise. Sample assignment is determined by its largest metagene expression value.
Return connectivity matrix.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In factorizations following standard NMF model or nonsmooth NMF model idx is always None. 
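As a minimal numpy sketch (the mixture matrix H below is hypothetical), the connectivity matrix can be derived from the column-wise maxima of the mixture coefficients:

```python
import numpy as np

# Hypothetical mixture matrix H (rank x samples): columns are samples.
H = np.array([[0.9, 0.1, 0.8],
              [0.1, 0.9, 0.2]])

# Assign each sample to the row (metagene) with the largest coefficient.
labels = H.argmax(axis=0)

# C[i, j] = 1 iff samples i and j share the same cluster.
C = (labels[:, None] == labels[None, :]).astype(int)
```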

Compute consensus matrix as the mean connectivity matrix across multiple runs of the factorization. It has been proposed by [Brunet2004] to help visualize and measure the stability of the clusters obtained by NMF.
Tracking of matrix factors across multiple runs must be enabled for computing consensus matrix. For results of a single NMF run, the consensus matrix reduces to the connectivity matrix.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In factorizations following standard NMF model or nonsmooth NMF model idx is always None. 
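A sketch of the averaging step, assuming three hypothetical connectivity matrices collected from separate factorization runs:

```python
import numpy as np

# Hypothetical connectivity matrices from three factorization runs.
runs = [np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]]),
        np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]]),
        np.array([[1, 0, 0], [0, 1, 1], [0, 1, 1]])]

# Consensus matrix: entry-wise mean of the connectivity matrices.
consensus = np.mean(runs, axis=0)
```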

Compute cophenetic correlation coefficient of consensus matrix, generally obtained from multiple NMF runs.
The cophenetic correlation coefficient is a measure that indicates the dispersion of the consensus matrix and is based on the average of connectivity matrices. It measures the stability of the clusters obtained from NMF. It is computed as the Pearson correlation of two distance matrices: the first is the distance between samples induced by the consensus matrix; the second is the distance between samples induced by the linkage used in the reordering of the consensus matrix [Brunet2004].
Return a real number. In a perfect consensus matrix, the cophenetic correlation equals 1. When the entries in the consensus matrix are scattered between 0 and 1, the cophenetic correlation is < 1. We observe how this coefficient changes as the factorization rank increases and select the first rank where its magnitude begins to fall [Brunet2004].
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In factorizations following standard NMF model or nonsmooth NMF model idx is always None.
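The two induced distances can be compared with scipy. The consensus matrix below is hypothetical, and average linkage is an assumption made for illustration (the linkage actually used to reorder the consensus matrix may differ):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

# Hypothetical consensus matrix for four samples.
consensus = np.array([[1.0, 0.9, 0.1, 0.0],
                      [0.9, 1.0, 0.2, 0.1],
                      [0.1, 0.2, 1.0, 0.8],
                      [0.0, 0.1, 0.8, 1.0]])

# Distance between samples induced by the consensus matrix.
dist = 1.0 - consensus
condensed = squareform(dist, checks=False)

# Hierarchical clustering used to reorder the consensus matrix.
Z = linkage(condensed, method='average')

# Pearson correlation between the consensus-induced and
# linkage-induced (cophenetic) distances.
coph, _ = cophenet(Z, condensed)
```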

Return a triple containing the dimensions of the target matrix and the matrix factorization rank.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In factorizations following standard NMF model or nonsmooth NMF model idx is always None. 

Compute the dispersion coefficient of the consensus matrix.
Dispersion coefficient [Park2007] measures the reproducibility of clusters obtained from multiple NMF runs.
Return a real value in [0, 1]. Dispersion is 1 for a perfect consensus matrix and approaches 0 for a scattered consensus matrix.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In standard NMF model or nonsmooth NMF model idx is always None. 
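A numpy sketch of the dispersion coefficient as defined in [Park2007], on a hypothetical consensus matrix:

```python
import numpy as np

# Hypothetical consensus matrix (averaged over several runs).
consensus = np.array([[1.0, 0.9, 0.0],
                      [0.9, 1.0, 0.1],
                      [0.0, 0.1, 1.0]])

n = consensus.shape[0]
# Dispersion [Park2007]: mean of 4 * (C_ij - 0.5)^2 over all entries;
# equals 1 exactly when every entry is 0 or 1.
dispersion = np.sum(4.0 * (consensus - 0.5) ** 2) / n ** 2
```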

Return the loss function value.
Compute the entropy of the NMF model given a priori known groups of samples [Park2007].
The entropy is a measure of performance of a clustering method in recovering classes defined by an a priori known list of true class labels.
Return a real number. The smaller the entropy, the better the clustering performance.
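A sketch of the entropy computation, following [Park2007], on hypothetical hard cluster assignments and true labels:

```python
import numpy as np

# Hypothetical hard cluster assignments and a priori known true labels.
predicted = np.array([0, 0, 1, 1, 1, 2])
true = np.array([0, 0, 1, 1, 2, 2])

n = len(true)
classes = np.unique(true)
entropy = 0.0
for k in np.unique(predicted):
    members = true[predicted == k]          # true labels inside cluster k
    for c in classes:
        n_kc = np.sum(members == c)
        if n_kc > 0:
            # Accumulate -n_kc * log2(n_kc / cluster size).
            entropy -= n_kc * np.log2(n_kc / len(members))
# Normalize by n * log2(number of true classes).
entropy /= n * np.log2(len(classes))
```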
Choosing factorization parameters carefully is vital for the success of a factorization. The most critical parameter is the factorization rank. This method tries different rank values, performs factorizations, computes quality measures of the results, and chooses the best rank according to [Brunet2004] and [Hutchins2008].
Note
The process of rank estimation can be lengthy.
Note
Matrix factors are tracked during rank estimation. This is needed for computing cophenetic correlation coefficient.
Return a dict (keys are rank values from the range, values are `dict`s of measures) of quality measures for each value in the rank's range. This can be passed to the visualization model, from which the estimated rank can be established.
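A sketch of how such returned measures might be inspected to pick a rank; the cophenetic values and the 0.01 drop threshold below are hypothetical:

```python
# Hypothetical quality measure per rank, e.g. the cophenetic
# correlation coefficient collected during rank estimation.
cophenetic = {2: 0.998, 3: 0.996, 4: 0.935, 5: 0.902}

ranks = sorted(cophenetic)
# Select the first rank at which the coefficient begins to fall
# noticeably (hypothetical threshold of 0.01).
chosen = ranks[-1]
for lo, hi in zip(ranks, ranks[1:]):
    if cophenetic[lo] - cophenetic[hi] > 0.01:
        chosen = lo
        break
```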
Compute the explained variance of the NMF estimate of the target matrix.
This measure can be used to compare the ability of different models to accurately reproduce the original target matrix. Some methods specifically aim at minimizing the RSS and maximizing the explained variance while others do not; one should keep this in mind when using this measure.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In factorizations following standard NMF model or nonsmooth NMF model idx is always None. 
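One common way to compute explained variance is one minus the ratio of the RSS to the target's total sum of squares; the matrices below are random placeholders, not PSMF output:

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical target matrix and a random low-rank estimate of it.
V = rng.rand(6, 5)
W = rng.rand(6, 2)
H = rng.rand(2, 5)
est = W @ H

# Residual sum of squares of the estimate.
rss = np.sum((V - est) ** 2)
# Explained variance: share of the target's total sum of squares
# reproduced by the estimate (at most 1; can be negative for a bad fit).
evar = 1.0 - rss / np.sum(V ** 2)
```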

Compute matrix factorization.
Return fitted factorization model.
Compute the estimated target matrix according to the NMF algorithm model.
Parameters:  idx (None) – Used in the multiple NMF model. In standard NMF idx is always None. 

Compute the satisfiability of the stopping criteria based on stopping parameters and the objective function value.
Return a logical value denoting whether the factorization should continue.
Compute the squared Frobenius norm of the difference between a target matrix and its NMF estimate.
Compute the dominant basis components. The dominant basis component is computed as the row index at which the entry is the maximum within the column.
If prob is not specified, a list is returned containing the computed index for each sample (feature). Otherwise, a tuple is returned: the first element is the list described above and the second is a list of associated probabilities, the relative contribution of the maximum entry within each column.
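A numpy sketch on a hypothetical mixture matrix, taking column-wise maxima and their relative contributions:

```python
import numpy as np

# Hypothetical mixture matrix H (rank x samples).
H = np.array([[0.8, 0.1, 0.5],
              [0.2, 0.9, 0.5]])

# Dominant component per column: row index of the column maximum
# (ties resolve to the first row).
dominant = H.argmax(axis=0)

# Relative contribution of the maximum entry within each column.
prob = H.max(axis=0) / H.sum(axis=0)
```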
Compute the purity given a priori known groups of samples [Park2007].
The purity is a measure of performance of a clustering method in recovering classes defined by an a priori known list of true class labels.
Return a real number in [0, 1]. The larger the purity, the better the clustering performance.
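A sketch of the purity computation on hypothetical assignments: each cluster contributes the size of its dominant true class:

```python
import numpy as np

# Hypothetical hard cluster assignments and a priori known true labels.
predicted = np.array([0, 0, 1, 1, 1, 2])
true = np.array([0, 0, 1, 1, 2, 2])

total = 0
for k in np.unique(predicted):
    members = true[predicted == k]
    # Size of the dominant true class within cluster k.
    total += np.bincount(members).max()
purity = total / len(true)
```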
Return residuals matrix between the target matrix and its NMF estimate.
Parameters:  idx (None) – Used in the multiple NMF model. In standard NMF idx is always None. 

Compute Residual Sum of Squares (RSS) between NMF estimate and target matrix [Hutchins2008].
This measure can be used to estimate the optimal factorization rank. [Hutchins2008] suggested choosing the first value where the RSS curve presents an inflection point. [Frigyesi2008] suggested using the smallest value at which the decrease in the RSS is lower than the decrease of the RSS obtained from random data.
RSS tells us how much of the variation in the dependent variables our model did not explain.
Return a real value.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In factorizations following standard NMF model or nonsmooth NMF model idx is always None. 
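On a small hypothetical example, the RSS is the squared Frobenius norm of the residual V - WH:

```python
import numpy as np

# Hypothetical target matrix and factor matrices.
V = np.array([[1.0, 2.0],
              [3.0, 4.0]])
W = np.array([[1.0],
              [2.0]])
H = np.array([[1.0, 2.0]])

# Residual Sum of Squares: squared Frobenius norm of V - WH.
rss = np.sum((V - W @ H) ** 2)
```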

Score features in terms of their specificity to the basis vectors [Park2007].
A row vector of the basis matrix (W) indicates the contributions of a feature to the r (i.e. columns of W) latent components. It can be informative to investigate features that have strong component-specific membership values to the latent components.
Return an array of feature scores. Feature scores are real values from the interval [0, 1]; a higher value indicates greater feature specificity.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In standard NMF model or nonsmooth NMF model idx is always None. 
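A sketch of the [Park2007] feature score on a hypothetical basis matrix: each row of W is normalized to a probability vector and scored by one minus its normalized entropy, so a component-specific feature scores near 1 and a uniform one scores 0:

```python
import numpy as np

# Hypothetical basis matrix W (features x rank), rank r = 2.
W = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])

r = W.shape[1]
# p[i, q]: relative contribution of feature i to basis vector q.
p = W / W.sum(axis=1, keepdims=True)

# Score: 1 + (1 / log2 r) * sum_q p log2 p, i.e. 1 - normalized entropy.
score = 1.0 + (p * np.log2(p)).sum(axis=1) / np.log2(r)
```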

Compute the most basisspecific features for each basis vector [Park2007].
The scoring schema and feature selection method of [Park2007] are used. The features are first scored using score_features(); then only the features that fulfill both selection criteria from [Park2007] are retained.
Return a boolean array indicating whether features were selected.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In standard NMF model or nonsmooth NMF model idx is always None. 

Compute the sparseness of a matrix (the basis vectors matrix or the mixture coefficients matrix) [Hoyer2004].
Sparseness of a vector quantifies how much energy is packed into its components. The sparseness of a vector is a real number in [0, 1], where a sparser vector has a value closer to 1. Sparseness is 1 iff the vector contains a single nonzero component and 0 iff all components of the vector are equal.
Sparseness of a matrix is the mean sparseness of its column vectors.
Return a tuple containing the sparseness of the basis and mixture coefficients matrices.
Parameters:  idx (None or str with values ‘coef’ or ‘coef1’ (int value of 0 or 1, respectively)) – Used in the multiple NMF model. In standard NMF model or nonsmooth NMF model idx is always None. 
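The [Hoyer2004] sparseness of a vector can be sketched directly from its l1 and l2 norms; the vectors and matrix below are hypothetical:

```python
import numpy as np

def hoyer_sparseness(x):
    # Hoyer (2004): (sqrt(n) - l1/l2) / (sqrt(n) - 1), a value in [0, 1].
    n = x.size
    return (np.sqrt(n) - np.abs(x).sum() / np.sqrt((x ** 2).sum())) / (np.sqrt(n) - 1)

single = np.array([0.0, 0.0, 3.0])   # one nonzero component -> sparseness 1
flat = np.array([2.0, 2.0, 2.0])     # all components equal -> sparseness 0

# Matrix sparseness: mean sparseness of the column vectors.
W = np.array([[3.0, 2.0],
              [0.0, 2.0],
              [0.0, 2.0]])
matrix_sparseness = np.mean([hoyer_sparseness(W[:, j]) for j in range(W.shape[1])])
```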

Return the target matrix to estimate.
Parameters:  idx (None) – Used in the multiple NMF model. In standard NMF idx is always None. 

Update basis and mixture matrix.