Welcome to Nimfa

Nimfa is a Python library for nonnegative matrix factorization. It includes implementations of several factorization methods, initialization approaches, and quality scoring. Both dense and sparse matrix representation are supported.

The sample script using Nimfa on medulloblastoma gene expression data is given below. It uses alternating least squares nonnegative matrix factorization with projected gradient method for subproblems [Lin2007] and Random Vcol [Albright2006] initialization algorithm. An object returned by nimfa.mf_run is fitted factorization model through which user can access matrix factors and estimate quality measures.

import nimfa

V = nimfa.examples.medulloblastoma.read(normalize=True)

lsnmf = nimfa.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100)
lsnmf_fit = lsnmf()

print('Rss: %5.4f' % lsnmf_fit.fit.rss())
print('Evar: %5.4f' % lsnmf_fit.fit.evar())
print('K-L divergence: %5.4f' % lsnmf_fit.distance(metric='kl'))
print('Sparseness, W: %5.4f, H: %5.4f' % lsnmf_fit.fit.sparseness())

Running this script produces the following output, where slight differences in reported scores across different runs can be attributed to randomness of the Random Vcol initialization method:

Rss: 0.2668
Evar: 0.9997
K-L divergence: 38.8744
Sparseness, W: 0.7297, H: 0.8796

See also Examples, sample scripts on this page and a short presentation about nimfa.


Matrix Factorization Methods

  • BD - Bayesian nonnegative matrix factorization Gibbs sampler [Schmidt2009]
  • BMF - Binary matrix factorization [Zhang2007]
  • ICM - Iterated conditional modes nonnegative matrix factorization [Schmidt2009]
  • LFNMF - Fisher nonnegative matrix factorization for learning local features [Wang2004], [Li2001]
  • LSNMF - Alternating nonnegative least squares matrix factorization using projected gradient method for subproblems [Lin2007]
  • NMF - Standard nonnegative matrix factorization with Euclidean / Kullback-Leibler update equations and Frobenius / divergence / connectivity cost functions [Lee2001], [Brunet2004]
  • NSNMF - Nonsmooth nonnegative matrix factorization [Montano2006]
  • PMF - Probabilistic nonnegative matrix factorization [Laurberg2008], [Hansen2008]
  • PSMF - Probabilistic sparse matrix factorization [Dueck2005], [Dueck2004], [Srebro2001], [Li2007]
  • SNMF - Sparse nonnegative matrix factorization based on alternating nonnegativity constrained least squares [Park2007]
  • SNMNMF - Sparse network-regularized multiple nonnegative matrix factorization [Zhang2011]
  • PMFCC - Penalized Matrix Factorization for Constrained Clustering [FWang2008]

Initialization Methods

Quality Measures

  • Distance
  • Residuals
  • Connectivity matrix
  • Consensus matrix
  • Entropy of the fitted NMF model [Park2007]
  • Dominant basis components computation
  • Explained variance
  • Feature score computation representing its specificity to basis vectors [Park2007]
  • Computation of most basis specific features for basis vectors [Park2007]
  • Purity [Park2007]
  • Residual sum of squares (rank estimation) [Hutchins2008], [Frigyesi2008]
  • Sparseness [Hoyer2004]
  • Cophenetic correlation coefficient of consensus matrix (rank estimation) [Brunet2004]
  • Dispersion [Park2007]
  • Factorization rank estimation
  • Selected matrix factorization method specific


  • Fitted factorization model tracker across multiple runs
  • Residuals tracker across multiple factorizations / runs


Nimfa is compatible with Python 2 and Python 3 versions. The recommended way to install Nimfa is by issuing:

pip install nimfa

from the command line.

Nimfa makes extensive use of SciPy and NumPy libraries for fast and convenient dense and sparse matrix manipulation and some linear algebra operations. There are not any additional prerequisites.

Alternatively, you can download source code from Github.

Unzip the archive. To build and install run:

python setup.py install


Methods configuration goes through:

  1. runtime specific options (e. g. tracking fitted model across multiple runs, tracking residuals across iterations, etc.);
  2. algorithm specific options (e. g. prior information with PSMF, type of update equations with NMF, initial value for noise variance with ICM, etc.).

For details and descriptions on algorithm specific options see specific algorithm documentation. For details on runtime specific options and explanation of the general model parameters see mf_run.


Following are basic usage examples that employ different implemented factorization algorithms.

Standard NMF - Divergence on scipy.sparse matrix with matrix factors estimation.

import numpy as np
import scipy.sparse as spr

import nimfa

V = spr.csr_matrix([[1, 0, 2, 4], [0, 0, 6, 3], [4, 0, 5, 6]])
print('Target:\n%s' % V.todense())

nmf = nimfa.Nmf(V, max_iter=200, rank=2, update='euclidean', objective='fro')
nmf_fit = nmf()

W = nmf_fit.basis()
print('Basis matrix:\n%s' % W.todense())

H = nmf_fit.coef()
print('Mixture matrix:\n%s' % H.todense())

print('Euclidean distance: %5.3f' % nmf_fit.distance(metric='euclidean'))

sm = nmf_fit.summary()
print('Sparseness Basis: %5.3f  Mixture: %5.3f' % (sm['sparseness'][0], sm['sparseness'][1]))
print('Iterations: %d' % sm['n_iter'])
print('Target estimate:\n%s' % np.dot(W.todense(), H.todense()))

LSNMF on numpy dense matrix with quality and performance measures.

import numpy as np

import nimfa

V = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
print('Target:\n%s' % V)

lsnmf = nimfa.Lsnmf(V, max_iter=10, rank=3)
lsnmf_fit = lsnmf()

W = lsnmf_fit.basis()
print('Basis matrix:\n%s' % W)

H = lsnmf_fit.coef()
print('Mixture matrix:\n%s' % H)

print('K-L divergence: %5.3f' % lsnmf_fit.distance(metric='kl'))

print('Rss: %5.3f' % lsnmf_fit.fit.rss())
print('Evar: %5.3f' % lsnmf_fit.fit.evar())
print('Iterations: %d' % lsnmf_fit.n_iter)
print('Target estimate:\n%s' % np.dot(W, H))

LSNMF with Random VCol initialization and error tracking.

import numpy as np

import nimfa

V = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
print('Target:\n%s' % V)

lsnmf = nimfa.Lsnmf(V, seed='random_vcol', max_iter=10, rank=3, track_error=True)
lsnmf_fit = lsnmf()

W = lsnmf_fit.basis()
print('Basis matrix:\n%s' % W)

H = lsnmf_fit.coef()
print('Mixture matrix:\n%s' % H)

# Objective function value for each iteration
print('Error tracking:\n%s' % lsnmf_fit.fit.tracker.get_error())

sm = lsnmf_fit.summary()
print('Rss: %5.3f' % sm['rss'])
print('Evar: %5.3f' % sm['evar'])
print('Iterations: %d' % sm['n_iter'])

LSNMF with Random VCol initialization and rank estimation.

import numpy as np

import nimfa

V = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print('Target:\n%s' % V)

lsnmf = nimfa.Lsnmf(V, seed='random_vcol', max_iter=10, rank=3, track_error=True)
lsnmf_fit = lsnmf()

W = lsnmf_fit.basis()
print('Basis matrix:\n%s' % W)

H = lsnmf_fit.coef()
print('Mixture matrix:\n%s' % H)

r = lsnmf.estimate_rank(rank_range=[2,3,4], what=['rss'])
pp_r = '\n'.join('%d: %5.3f' % (rank, vals['rss']) for rank, vals in r.items())
print('Rank estimate:\n%s' % pp_r)

ICM with Random C initialization and passed callback initialization function.

import nimfa

import numpy as np

V = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
print('Target:\n%s' % V)

# Initialization callback function
def init_info(model):
    print('Initialized basis matrix:\n%s' % model.basis())
    print('Initialized  mixture matrix:\n%s' % model.coef())

# Callback is called after initialization and prior to factorization in each run
icm = nimfa.Icm(V, seed='random_c', max_iter=10, rank=3, callback_init=init_info)
icm_fit = icm()

W = icm_fit.basis()
print('Basis matrix:\n%s' % W)

H = icm_fit.coef()
print('Mixture matrix:\n%s' % H)

sm = icm_fit.summary()
print('Rss: %5.3f' % sm['rss'])
print('Evar: %5.3f' % sm['evar'])
print('Iterations: %d' % sm['n_iter'])
print('KL divergence: %5.3f' % sm['kl'])
print('Euclidean distance: %5.3f' % sm['euclidean'])

BMF with default parameters, multiple runs and factor tracking for consensus matrix computation.

import numpy as np

import nimfa

V = np.random.rand(23, 200)

# Factorization will be run 3 times (n_run) and factors will be tracked for computing
# cophenetic correlation. Note increased time and space complexity
bmf = nimfa.Bmf(V, max_iter=10, rank=30, n_run=3, track_factor=True)
bmf_fit = bmf()

print('K-L divergence: %5.3f' % bmf_fit.distance(metric='kl'))

sm = bmf_fit.summary()
print('Rss: %5.3f' % sm['rss'])
print('Evar: %5.3f' % sm['evar'])
print('Iterations: %d' % sm['n_iter'])
print('Cophenetic correlation: %5.3f' % sm['cophenetic'])

Standard NMF - Euclidean update equations and fixed initialization (passed matrix factors).

import numpy as np

import nimfa

V = np.random.rand(30, 20)

init_W = np.random.rand(30, 4)
init_H = np.random.rand(4, 20)

# Fixed initialization of latent matrices
nmf = nimfa.Nmf(V, seed="fixed", W=init_W, H=init_H, rank=4)
nmf_fit = nmf()

print("Euclidean distance: %5.3f" % nmf_fit.distance(metric="euclidean"))
print('Initialization type: %s' % nmf_fit.seeding)
print('Iterations: %d' % nmf_fit.n_iter)


[Schmidt2009](1, 2) Mikkel N. Schmidt, Ole Winther, and Lars K. Hansen. Bayesian non-negative matrix factorization. In Proceedings of the 9th International Conference on Independent Component Analysis and Signal Separation, pages 540-547, Paraty, Brazil, 2009.
[Zhang2007]Zhongyuan Zhang, Tao Li, Chris H. Q. Ding and Xiangsun Zhang. Binary Matrix Factorization with applications. In Proceedings of 7th IEEE International Conference on Data Mining, pages 391-400, Omaha, USA, 2007.
[Wang2004]Yuan Wang, Yunde Jia, Changbo Hu and Matthew Turk. Fisher non-negative matrix factorization for learning local features. In Proceedings of the 6th Asian Conference on Computer Vision, pages 27-30, Jeju, Korea, 2004.
[Li2001]Stan Z. Li, Xinwen Huo, Hongjiang Zhang and Qian S. Cheng. Learning spatially localized, parts-based representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 207-212, Kauai, USA, 2001.
[Lin2007](1, 2) Chin J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10): 2756-2779, 2007.
[Lee2001]Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Proceedings of the Neural Information Processing Systems, pages 556-562, Vancouver, Canada, 2001.
[Lee1999]Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755): 788-791, 1999.
[Brunet2004](1, 2) Jean-P. Brunet, Pablo Tamayo, Todd R. Golub and Jill P. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. In Proceedings of the National Academy of Sciences of the USA, 101(12): 4164-4169, 2004.
[Montano2006]Alberto Pascual-Montano, J. M. Carazo, Kieko Kochi, Dietrich Lehmann and Roberto D. Pascual-Marqui. Nonsmooth nonnegative matrix factorization (nsnmf). In IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3): 403-415, 2006.
[Laurberg2008]Hans Laurberg, Mads G. Christensen, Mark D. Plumbley, Lars K. Hansen and Soren H. Jensen. Theorems on positive data: on the uniqueness of NMF. Computational Intelligence and Neuroscience, doi: 10.1155/2008/764206, 2008.
[Hansen2008]Lars K. Hansen. Generalization in high-dimensional factor models. Web: http://www.stanford.edu/group/mmds/slides2008/hansen.pdf, 2008.
[Dueck2005]Delbert Dueck, Quaid D. Morris and Brendan J. Frey. Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics, 21(Suppl 1): 144-151, 2005.
[Dueck2004]Delbert Dueck and Brendan J. Frey. Probabilistic sparse matrix factorization. University of Toronto Technical Report PSI-2004-23, Probabilistic and Statistical Inference Group, University of Toronto, 2004.
[Srebro2001]Nathan Srebro and Tommi Jaakkola. Sparse matrix Factorization of gene expression data. Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 2001.
[Li2007]Huan Li, Yu Sun and Ming Zhan. The discovery of transcriptional modules by a two-stage matrix decomposition approach. Bioinformatics, 23(4): 473-479, 2007.
[Park2007](1, 2, 3, 4, 5, 6) Hyuonsoo Kim and Haesun Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23(12): 1495-1502, 2007.
[Zhang2011]Shihua Zhang, Qingjiao Li and Xianghong J. Zhou. A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics, 27(13): 401-409, 2011.
[Boutsidis2007]Christos Boutsidis and Efstratios Gallopoulos. SVD-based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 41(4): 1350-1362, 2008.
[Albright2006](1, 2, 3) Russell Albright, Carl D. Meyer and Amy N. Langville. Algorithms, initializations, and convergence for the nonnegative matrix factorization. NCSU Technical Report Math 81706, NC State University, Releigh, USA, 2006.
[Hoyer2004]Patrik O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5: 1457-1469, 2004.
[Hutchins2008]Lucie N. Hutchins, Sean P. Murphy, Priyam Singh and Joel H. Graber. Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics, 24(23): 2684-2690, 2008.
[Frigyesi2008]Attila Frigyesi and Mattias Hoglund. Non-negative matrix factorization for the analysis of complex gene expression data: identification of clinically relevant tumor subtypes. Cancer Informatics, 6: 275-292, 2008.
[FWang2008]Fei Wang, Tao Li, Changshui Zhang. Semi-Supervised Clustering via Matrix Factorization. SDM 2008, 1-12, 2008.
[Schietgat2010]Leander Schietgat, Celine Vens, Jan Struyf, Hendrik Blockeel, Dragi Kocev and Saso Dzeroski. Predicting gene function using hierarchical multi-label decision tree ensembles. BMC Bioinformatics, 11(2), 2010.
  1. Schachtner, D. Lutter, P. Knollmueller, A. M. Tome, F. J. Theis, G. Schmitz, M. Stetter, P. Gomez Vilda and E. W. Lang. Knowledge-based gene expression classification via matrix factorization. Bioinformatics, 24(15): 1688-1697, 2008.


We would like to acknowledge support for this project from the Google Summer of Code 2011 program and from the Slovenian Research Agency grants P2-0209, J2-9699, and L2-1112.

The nimfa - A Python Library for Nonnegative Matrix Factorization Techniques was part of the Google Summer of Code 2011. It is authored by Marinka Zitnik and Blaz Zupan.


  author    = {Marinka Zitnik and Blaz Zupan},
  title     = {NIMFA: A Python Library for Nonnegative Matrix Factorization},
  journal   = {Journal of Machine Learning Research},
  volume    = {13},
  year      = {2012},
  pages     = {849-853},


This software and data is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!


nimfa - A Python Library for Nonnegative Matrix Factorization Techniques Copyright (C) 2011-2015 Marinka Zitnik and Blaz Zupan.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.



Related Topics

Fork me on GitHub