nimfa  A Python Library for Nonnegative Matrix Factorization Techniques
Nimfa is an opensource Python library that provides a unified interface to nonnegative matrix factorization algorithms.
It includes implementations of stateoftheart factorization methods, initialization approaches, and quality scoring.
Both dense and sparse matrix representation are supported.
The sample script using Nimfa on medulloblastoma gene expression data is given below. It uses alternating least squares nonnegative matrix
factorization with projected gradient method for subproblems [Lin2007] and Random Vcol [Albright2006] initialization algorithm. An object returned by nimfa.mf_run is
fitted factorization model through which user can access matrix factors and estimate quality measures.
import nimfa
V = nimfa.examples.medulloblastoma.read(normalize = True)
fctr = nimfa.mf(V, seed = 'random_vcol', method = 'lsnmf', rank = 40, max_iter = 65)
fctr_res = nimfa.mf_run(fctr)
print 'Rss: %5.4f' % fctr_res.fit.rss()
print 'Evar: %5.4f' % fctr_res.fit.evar()
print 'KL divergence: %5.4f' % fctr_res.distance(metric = 'kl')
print 'Sparseness, W: %5.4f, H: %5.4f' % fctr_res.fit.sparseness()
Running this script produces the following output, where slight differences in reported scores across different
runs can be attributed to randomness of the Random Vcol initialization method:
Rss: 0.2668
Evar: 0.9997
KL divergence: 38.8744
Sparseness, W: 0.7297, H: 0.8796
Further realworld examples and demonstrations of library capabilities are provided in the Examples section and in sample scripts on this page.
Content
Matrix Factorization Methods
 BD  Bayesian nonnegative matrix factorization Gibbs sampler [Schmidt2009]
 BMF  Binary matrix factorization [Zhang2007]
 ICM  Iterated conditional modes nonnegative matrix factorization [Schmidt2009]
 LFNMF  Fisher nonnegative matrix factorization for learning local features [Wang2004], [Li2001]
 LSNMF  Alternating nonnegative least squares matrix factorization using projected gradient method for subproblems [Lin2007]
 NMF  Standard nonnegative matrix factorization with Euclidean / KullbackLeibler update equations and Frobenius / divergence / connectivity cost functions [Lee2001], [Brunet2004]
 NSNMF  Nonsmooth nonnegative matrix factorization [Montano2006]
 PMF  Probabilistic nonnegative matrix factorization [Laurberg2008], [Hansen2008]
 PSMF  Probabilistic sparse matrix factorization [Dueck2005], [Dueck2004], [Srebro2001], [Li2007]
 SNMF  Sparse nonnegative matrix factorization based on alternating nonnegativity constrained least squares [Park2007]
 SNMNMF  Sparse networkregularized multiple nonnegative matrix factorization [Zhang2011]
 PMFCC  Penalized Matrix Factorization for Constrained Clustering [FWang2008]
Quality Measures
 Distance
 Residuals
 Connectivity matrix
 Consensus matrix
 Entropy of the fitted NMF model [Park2007]
 Dominant basis components computation
 Explained variance
 Feature score computation representing its specificity to basis vectors [Park2007]
 Computation of most basis specific features for basis vectors [Park2007]
 Purity [Park2007]
 Residual sum of squares (rank estimation) [Hutchins2008], [Frigyesi2008]
 Sparseness [Hoyer2004]
 Cophenetic correlation coefficient of consensus matrix (rank estimation) [Brunet2004]
 Dispersion [Park2007]
 Factorization rank estimation
 Selected matrix factorization method specific
Utils
 Fitted factorization model tracker across multiple runs
 Residuals tracker across multiple factorizations / runs
Installation
Source
No special installation procedure is specified. However, the nimfa library makes extensive use of SciPy and NumPy
libraries for fast and convenient dense and sparse matrix manipulation and some linear
algebra operations. There are not any additional prerequisites.
Download source code (zipped archive) from Github repository.
Unzip the archive. To build and install run:
Documentation
For building the documentation use Sphinx 1.0 or newer. Sphinx is available at Sphinx home page and
nimfa library documentation sources are available at Github repository. Before building documentation,
please install the nimfa library.
Documentation can be built by issuing:
sphinxbuild b html E <source dir [docs/source]> <build dir [html]>
from the nimfa root directory. After completion, the documentation should reside in the html directory. The built documentation
should resemble closely the one you are currently reading.
Note
The nimfa library documentation is contained in docs source directory and and scripts are in nimfa directory.
Configuration
Methods configuration goes through:
 runtime specific options (e. g. tracking fitted model across multiple runs, tracking residuals across iterations, etc.);
 algorithm specific options (e. g. prior information with PSMF, type of update equations with NMF, initial value for noise variance with ICM, etc.).
For details and descriptions on algorithm specific options see specific algorithm documentation. For details on runtime specific options and explanation of the general model parameters see mf_run.
Usage
Following are basic usage examples that employ different implemented factorization algorithms. For examples with real world
applications see Examples section and methods’ documentation.
Note
Consider these examples as Hello World examples for the nimfa library.
Standard NMF  Divergence on scipy.sparse matrix with matrix factors estimation.
# Import nimfa library entry point for factorization
import nimfa
# Construct sparse matrix in CSR format, which will be our input for factorization
from scipy.sparse import csr_matrix
from scipy import array
from numpy import dot
V = csr_matrix((array([1,2,3,4,5,6]), array([0,2,2,0,1,2]), array([0,2,3,6])), shape=(3,3))
# Print this tiny matrix in dense format
print V.todense()
# Run Standard NMF rank 4 algorithm
# Update equations and cost function are Standard NMF specific parameters (among others).
# If not specified the Euclidean update and Frobenius cost function would be used.
# We don't specify initialization method. Algorithm specific or random initialization will be used.
# In Standard NMF case, by default random is used.
# Returned object is fitted factorization model. Through it user can access quality and performance measures.
# The fctr_res's attribute `fit` contains all the attributes of the factorization.
fctr = nimfa.mf(V, method = "nmf", max_iter = 30, rank = 4, update = 'divergence', objective = 'div')
fctr_res = nimfa.mf_run(fctr)
# Basis matrix. It is sparse, as input V was sparse as well.
W = fctr_res.basis()
print "Basis matrix"
print W.todense()
# Mixture matrix. We print this tiny matrix in dense format.
H = fctr_res.coef()
print "Coef"
print H.todense()
# Return the loss function according to KullbackLeibler divergence. By default Euclidean metric is used.
print "Distance KullbackLeibler: %5.3e" % fctr_res.distance(metric = "kl")
# Compute generic set of measures to evaluate the quality of the factorization
sm = fctr_res.summary()
# Print sparseness (Hoyer, 2004) of basis and mixture matrix
print "Sparseness Basis: %5.3f Mixture: %5.3f" % (sm['sparseness'][0], sm['sparseness'][1])
# Print actual number of iterations performed
print "Iterations: %d" % sm['n_iter']
# Print estimate of target matrix V
print "Estimate"
print dot(W.todense(), H.todense())
LSNMF on numpy dense matrix with quality and performance measures.
# Import nimfa library entry point for factorization
import nimfa
# Here we will work with numpy matrix
import numpy as np
V = np.matrix([[1,2,3],[4,5,6],[6,7,8]])
# Print this tiny matrix
print V
# Run LSNMF rank 3 algorithm
# We don't specify any algorithm specific parameters. Defaults will be used.
# We don't specify initialization method. Algorithm specific or random initialization will be used.
# In LSNMF case, by default random is used.
# Returned object is fitted factorization model. Through it user can access quality and performance measures.
# The fctr_res's attribute `fit` contains all the attributes of the factorization.
fctr = nimfa.mf(V, method = "lsnmf", max_iter = 10, rank = 3)
fctr_res = nimfa.mf_run(fctr)
# Basis matrix.
W = fctr_res.basis()
print "Basis matrix"
print W
# Mixture matrix.
H = fctr_res.coef()
print "Coef"
print H
# Print the loss function according to KullbackLeibler divergence. By default Euclidean metric is used.
print "Distance KullbackLeibler: %5.3e" % fctr_res.distance(metric = "kl")
# Compute generic set of measures to evaluate the quality of the factorization
sm = fctr_res.summary()
# Print residual sum of squares (Hutchins, 2008). Can be used for estimating optimal factorization rank.
print "Rss: %8.3f" % sm['rss']
# Print explained variance.
print "Evar: %8.3f" % sm['evar']
# Print actual number of iterations performed
print "Iterations: %d" % sm['n_iter']
# Print estimate of target matrix V
print "Estimate"
print np.dot(W, H)
LSNMF with Random VCol initialization and error tracking.
# Import nimfa library entry point for factorization
import nimfa
# Here we will work with numpy matrix
import numpy as np
V = np.matrix([[1,2,3],[4,5,6],[6,7,8]])
# Print this tiny matrix
print V
# Run LSNMF rank 3 algorithm
# We don't specify any algorithm specific parameters. Defaults will be used.
# We specify Random V Col initialization algorithm.
# We enable tracking the error from each iteration of the factorization, by default only the final value of objective function is retained.
# Perform initialization.
fctr = nimfa.mf(V, seed = "random_vcol", method = "lsnmf", max_iter = 10, rank = 3, track_error = True)
# Returned object is fitted factorization model. Through it user can access quality and performance measures.
# The fctr_res's attribute `fit` contains all the attributes of the factorization.
fctr_res = nimfa.mf_run(fctr)
# Basis matrix.
W = fctr_res.basis()
print "Basis matrix"
print W
# Mixture matrix.
H = fctr_res.coef()
print "Coef"
print H
# Error tracking.
print "Error tracking"
# A list of objective function values for each iteration in factorization is printed.
# If error tracking is enabled and user specifies multiple runs of the factorization, get_error(run = n) return a list of objective values from nth run.
# fctr_res.fit.tracker is an instance of Mf_track  isinstance(fctr_res.fit.tracker, nimfa.models.mf_track.Mf_track)
print fctr_res.fit.tracker.get_error()
# Compute generic set of measures to evaluate the quality of the factorization
sm = fctr_res.summary()
# Print residual sum of squares (Hutchins, 2008). Can be used for estimating optimal factorization rank.
print "Rss: %8.3f" % sm['rss']
# Print explained variance.
print "Evar: %8.3f" % sm['evar']
# Print actual number of iterations performed
print "Iterations: %d" % sm['n_iter']
ICM with Random C initialization and passed callback initialization function.
# Import nimfa library entry point for factorization
import nimfa
# Here we will work with numpy matrix
import numpy as np
V = np.matrix([[1,2,3],[4,5,6],[6,7,8]])
# Print this tiny matrix
print V
# This will be our callback_init function called prior to factorization.
# We will only print the initialized matrix factors.
def init_info(model):
print "Initialized basis matrix\n", model.basis()
print "Initialized mixture matrix\n", model.coef()
# Run ICM rank 3 algorithm
# We don't specify any algorithm specific parameters. Defaults will be used.
# We specify Random C initialization algorithm.
# We specify callback_init parameter by passing a init_info function
# This function is called after initialization and prior to factorization in each run.
fctr = nimfa.mf(V, seed = "random_c", method = "icm", max_iter = 10, rank = 3, callback_init = init_info)
# Returned object is fitted factorization model. Through it user can access quality and performance measures.
# The fctr_res's attribute `fit` contains all the attributes of the factorization.
fctr_res = nimfa.mf_run(fctr)
# Basis matrix.
W = fctr_res.basis()
print "Resulting basis matrix"
print W
# Mixture matrix.
H = fctr_res.coef()
print "Resulting mixture matrix"
print H
# Compute generic set of measures to evaluate the quality of the factorization
sm = fctr_res.summary()
# Print residual sum of squares (Hutchins, 2008). Can be used for estimating optimal factorization rank.
print "Rss: %8.3e" % sm['rss']
# Print explained variance.
print "Evar: %8.3e" % sm['evar']
# Print actual number of iterations performed
print "Iterations: %d" % sm['n_iter']
# Print distance according to KullbackLeibler divergence
print "KL divergence: %5.3e" % sm['kl']
# Print distance according to Euclidean metric
print "Euclidean distance: %5.3e" % sm['euclidean']
BMF with default parameters, multiple runs and factor tracking for consensus matrix computation.
# Import nimfa library entry point for factorization.
import nimfa
# Here we will work with numpy matrix.
import numpy as np
V = np.random.random((23, 200))
# Run BMF.
# We don't specify any algorithm parameters or initialization method. Defaults will be used.
# Factorization will be run 3 times (n_run) and factors will be tracked for computing
# cophenetic correlation. Note increased time and space complexity.
fctr = nimfa.mf(V, method = "bmf", max_iter = 10, rank = 30, n_run = 3, track_factor = True)
fctr_res = nimfa.mf_run(fctr)
# Print the loss function according to KullbackLeibler divergence.
print "Distance KullbackLeibler: %5.3e" % fctr_res.distance(metric = "kl")
# Compute generic set of measures to evaluate the quality of the factorization.
sm = fctr_res.summary()
# Print residual sum of squares.
print "Rss: %8.3f" % sm['rss']
# Print explained variance.
print "Evar: %8.3f" % sm['evar']
# Print actual number of iterations performed.
print "Iterations: %d" % sm['n_iter']
# Print cophenetic correlation. Can be used for rank estimation.
print "cophenetic: %8.3f" % sm['cophenetic']
Standard NMF  Euclidean update equations and fixed initialization (passed matrix factors).
# Import nimfa library entry point for factorization.
import nimfa
# Here we will work with numpy matrix.
import numpy as np
# Generate random target matrix.
V = np.random.rand(30, 20)
# Generate random matrix factors which we will pass as fixed factors to Nimfa.
init_W = np.random.rand(30, 4)
init_H = np.random.rand(4, 20)
# Obviously by passing these factors we want to use rank = 4.
# Run NMF.
# We don't specify any algorithm parameters. Defaults will be used.
# We specify fixed initialization method and pass matrix factors.
fctr = nimfa.mf(V, method = "nmf", seed = "fixed", W = init_W, H = init_H, rank = 4)
fctr_res = nimfa.mf_run(fctr)
# Print the loss function (Euclidean distance between target matrix and its estimate).
print "Euclidean distance: %5.3e" % fctr_res.distance(metric = "euclidean")
# It should print 'fixed'.
print fctr_res.seeding
# By default, max 30 iterations are performed.
print fctr_res.n_iter
References
[Schmidt2009]  (1, 2) Mikkel N. Schmidt, Ole Winther, and Lars K. Hansen. Bayesian nonnegative matrix factorization. In Proceedings of the 9th International Conference on Independent Component Analysis and Signal Separation, pages 540547, Paraty, Brazil, 2009. 
[Zhang2007]  Zhongyuan Zhang, Tao Li, Chris H. Q. Ding and Xiangsun Zhang. Binary Matrix Factorization with applications. In Proceedings of 7th IEEE International Conference on Data Mining, pages 391400, Omaha, USA, 2007. 
[Wang2004]  Yuan Wang, Yunde Jia, Changbo Hu and Matthew Turk. Fisher nonnegative matrix factorization for learning local features. In Proceedings of the 6th Asian Conference on Computer Vision, pages 2730, Jeju, Korea, 2004. 
[Li2001]  Stan Z. Li, Xinwen Huo, Hongjiang Zhang and Qian S. Cheng. Learning spatially localized, partsbased representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 207212, Kauai, USA, 2001. 
[Lin2007]  (1, 2) Chin J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10): 27562779, 2007. 
[Lee2001]  Daniel D. Lee and H. Sebastian Seung. Algorithms for nonnegative matrix factorization. In Proceedings of the Neural Information Processing Systems, pages 556562, Vancouver, Canada, 2001. 
[Lee1999]  Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401(6755): 788791, 1999. 
[Brunet2004]  (1, 2) JeanP. Brunet, Pablo Tamayo, Todd R. Golub and Jill P. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. In Proceedings of the National Academy of Sciences of the USA, 101(12): 41644169, 2004. 
[Montano2006]  Alberto PascualMontano, J. M. Carazo, Kieko Kochi, Dietrich Lehmann and Roberto D. PascualMarqui. Nonsmooth nonnegative matrix factorization (nsnmf). In IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3): 403415, 2006. 
[Laurberg2008]  Hans Laurberg, Mads G. Christensen, Mark D. Plumbley, Lars K. Hansen and Soren H. Jensen. Theorems on positive data: on the uniqueness of NMF. Computational Intelligence and Neuroscience, doi: 10.1155/2008/764206, 2008. 
[Dueck2005]  Delbert Dueck, Quaid D. Morris and Brendan J. Frey. Multiway clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics, 21(Suppl 1): 144151, 2005. 
[Dueck2004]  Delbert Dueck and Brendan J. Frey. Probabilistic sparse matrix factorization. University of Toronto Technical Report PSI200423, Probabilistic and Statistical Inference Group, University of Toronto, 2004. 
[Srebro2001]  Nathan Srebro and Tommi Jaakkola. Sparse matrix Factorization of gene expression data. Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 2001. 
[Li2007]  Huan Li, Yu Sun and Ming Zhan. The discovery of transcriptional modules by a twostage matrix decomposition approach. Bioinformatics, 23(4): 473479, 2007. 
[Park2007]  (1, 2, 3, 4, 5, 6) Hyuonsoo Kim and Haesun Park. Sparse nonnegative matrix factorizations via alternating nonnegativityconstrained least squares for microarray data analysis. Bioinformatics, 23(12): 14951502, 2007. 
[Zhang2011]  Shihua Zhang, Qingjiao Li and Xianghong J. Zhou. A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNAgene regulatory modules. Bioinformatics, 27(13): 401409, 2011. 
[Boutsidis2007]  Christos Boutsidis and Efstratios Gallopoulos. SVDbased initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 41(4): 13501362, 2008. 
[Albright2006]  (1, 2, 3) Russell Albright, Carl D. Meyer and Amy N. Langville. Algorithms, initializations, and convergence for the nonnegative matrix factorization. NCSU Technical Report Math 81706, NC State University, Releigh, USA, 2006. 
[Hoyer2004]  Patrik O. Hoyer. Nonnegative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5: 14571469, 2004. 
[Hutchins2008]  Lucie N. Hutchins, Sean P. Murphy, Priyam Singh and Joel H. Graber. Positiondependent motif characterization using nonnegative matrix factorization. Bioinformatics, 24(23): 26842690, 2008. 
[Frigyesi2008]  Attila Frigyesi and Mattias Hoglund. Nonnegative matrix factorization for the analysis of complex gene expression data: identification of clinically relevant tumor subtypes. Cancer Informatics, 6: 275292, 2008. 
[FWang2008]  Fei Wang, Tao Li, Changshui Zhang. SemiSupervised Clustering via Matrix Factorization. SDM 2008, 112, 2008. 
[Schietgat2010]  Leander Schietgat, Celine Vens, Jan Struyf, Hendrik Blockeel, Dragi Kocev and Saso Dzeroski. Predicting gene function using hierarchical multilabel decision tree ensembles. BMC Bioinformatics, 11(2), 2010. 
[Schachtner2008] 
 Schachtner, D. Lutter, P. Knollmueller, A. M. Tome, F. J. Theis, G. Schmitz, M. Stetter, P. Gomez Vilda and E. W. Lang. Knowledgebased gene expression classification via matrix factorization. Bioinformatics, 24(15): 16881697, 2008.

Acknowledgements
We would like to acknowledge support for this project from the Google Summer of Code 2011 program and
from the Slovenian Research Agency grants P20209, J29699, and L2 1112.
The nimfa  A Python Library for Nonnegative Matrix Factorization Techniques is part of the Google Summer of Code 2011 program by Marinka Zitnik
under supervision of the Orange organization and mentor Prof. Blaz Zupan, PhD.
Citation
@article{ZitnikZ12,
author = {Marinka Zitnik and Blaz Zupan},
title = {NIMFA: A Python Library for Nonnegative Matrix Factorization},
journal = {Journal of Machine Learning Research},
volume = {13},
year = {2012},
pages = {849853},
}
Disclaimer
This software and data is provided asis, and there are no guarantees
that it fits your purposes or that it is bugfree. Use it at your own
risk!
License
nimfa  A Python Library for Nonnegative Matrix Factorization Techniques
Copyright (C) 20112012 Marinka Zitnik and Blaz Zupan
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.