Pmfcc (methods.factorization.pmfcc)
Penalized Matrix Factorization for Constrained Clustering
(PMFCC) [FWang2008].
PMFCC is used for semi-supervised co-clustering. Intra-type information is
represented as constraints to guide the factorization process. The constraints
are of two types: (i) must-link: two data points belong to the same class,
(ii) cannot-link: two data points cannot belong to the same class.
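As an illustration, the two constraint types can be encoded in a square matrix with one row and one column per data point. The sketch below follows the sign convention given for the Theta parameter on this page (must-link negative, cannot-link positive). The helper name `build_theta`, the symmetric fill, and the `weight` argument are assumptions made for the example and are not part of nimfa.

```python
import numpy as np


def build_theta(n_points, must_link=(), cannot_link=(), weight=1.0):
    """Build a constraint matrix from index pairs (hypothetical helper).

    Assumptions: constraints are symmetric, must-link pairs get -weight,
    cannot-link pairs get +weight, and unconstrained pairs stay at zero.
    """
    theta = np.zeros((n_points, n_points))
    for i, j in must_link:
        theta[i, j] = theta[j, i] = -weight
    for i, j in cannot_link:
        theta[i, j] = theta[j, i] = weight
    return theta


# Points 0 and 1 must share a cluster; points 2 and 4 must not.
theta = build_theta(5, must_link=[(0, 1)], cannot_link=[(2, 4)])
```

The Theta parameter is documented as a numpy.matrix, so a conversion (for example with `np.asmatrix`) may be needed before passing such an array to the class.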
PMFCC solves the following problem. Given a target matrix
V = [v_1, v_2, ..., v_n], it produces W = [f_1, f_2, ..., f_rank], containing
cluster centers, and a matrix H of data point cluster membership values.
The cost function includes centroid distortions and any associated constraint
violations. Compared to the traditional NMF cost function, the only difference
is the inclusion of the penalty term.
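A minimal NumPy sketch of such a cost is given here for orientation. It assumes the penalty takes the form trace(H Theta H^T), with H holding one column per data point; this form and the function name `penalized_cost` are assumptions for illustration, not taken from the nimfa source.

```python
import numpy as np


def penalized_cost(V, W, H, Theta):
    """Squared Frobenius distortion plus a constraint penalty.

    Assumption: the penalty term is trace(H @ Theta @ H.T).
    """
    distortion = np.sum((V - W @ H) ** 2)
    penalty = np.trace(H @ Theta @ H.T)
    return distortion + penalty


rng = np.random.default_rng(0)
V = rng.random((4, 6))
W = rng.random((4, 2))
H = rng.random((2, 6))

# Without constraints the penalty vanishes and the plain distortion remains.
plain_cost = penalized_cost(V, W, H, np.zeros((6, 6)))

# A cannot-link constraint (positive entry) between points 0 and 1 adds a
# cost whenever both points have membership in the same cluster.
Theta = np.zeros((6, 6))
Theta[0, 1] = Theta[1, 0] = 1.0
constrained_cost = penalized_cost(V, W, H, Theta)
```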
import numpy as np
import nimfa
V = np.random.rand(40, 100)
pmfcc = nimfa.Pmfcc(V, seed="random_vcol", rank=10, max_iter=30,
                    Theta=np.random.rand(V.shape[1], V.shape[1]))
pmfcc_fit = pmfcc()
-
class nimfa.methods.factorization.pmfcc.Pmfcc(V, seed=None, W=None, H=None, rank=30, max_iter=30, min_residuals=1e-05, test_conv=None, n_run=1, callback=None, callback_init=None, track_factor=False, track_error=False, Theta=None, **options)
Bases: nimfa.models.smf.Smf
Parameters: |
- V (Instance of the scipy.sparse sparse matrices types,
numpy.ndarray, numpy.matrix or tuple of instances of
the latter classes.) – The target matrix to estimate.
- seed (str naming the method or methods.seeding.nndsvd.Nndsvd
or None) – Specify the method used to seed the computation of a factorization. If
specified, W and H must be None. If neither a seeding
method nor an initial fixed factorization is specified, random initialization is
used.
- W (scipy.sparse or numpy.ndarray or
numpy.matrix or None) – Specify initial factorization of basis matrix W. Default is None.
When specified, seed must be None.
- H (Instance of the scipy.sparse sparse matrices types,
numpy.ndarray, numpy.matrix, tuple of instances of the
latter classes or None) – Specify initial factorization of mixture matrix H. Default is None.
When specified, seed must be None.
- rank (int) – The factorization rank to achieve. Default is 30.
- n_run (int) – The number of runs of the algorithm. Default is
1. If multiple runs are performed, the fitted factorization model with the
lowest objective function value is retained.
- callback (function) – Pass a callback function that is called after each run when
performing multiple runs. This is useful if one wants to save summary
measures or process the result before it gets discarded. The callback
function is called with only one argument models.mf_fit.Mf_fit that
contains the fitted model. Default is None.
- callback_init (function) – Pass a callback function that is called after each
initialization of the matrix factors. In case of multiple runs the function
is called before each run (more precisely after initialization and before
the factorization of each run). In case of single run, the passed callback
function is called after the only initialization of the matrix factors.
This is useful if one wants to obtain the initialized matrix factors for
further analysis or additional info about initialized factorization model.
The callback function is called with only one argument
models.mf_fit.Mf_fit that (among others) contains also initialized
matrix factors. Default is None.
- track_factor (bool) – When track_factor is specified, the fitted
factorization model is tracked during multiple runs of the algorithm. This
option is taken into account only when multiple runs are executed
(n_run > 1). From each run of the factorization all matrix factors
are retained, which can be very space consuming. If space is a concern,
setting a callback function with callback is advised; it is
executed after each run. Tracking is useful for computing quality or
performance measures (e.g. cophenetic correlation, consensus matrix,
dispersion). By default the fitted model is not tracked.
- track_error (bool) – Track the residual error. Only the residuals from
each iteration of the factorization are retained. Error tracking is not
space consuming. By default residuals are not tracked and only the final
residuals are saved. It can be used for plotting the trajectory of the
residuals.
- Theta (numpy.matrix) – Constraint matrix (dimension: V.shape[1] x V.shape[1]). It
contains known must-link (negative) and cannot-link (positive) constraints.
|
Stopping criterion
Factorization terminates if any of the specified criteria is satisfied.
Parameters: |
- max_iter (int) – Maximum number of factorization iterations. Note that the
number of iterations depends on the speed of method convergence. Default
is 30.
- min_residuals (float) – Minimal required improvement of the residuals from the
previous iteration. They are computed between the target matrix and its MF
estimate using the objective function associated with the MF algorithm.
Default is 1e-05.
- test_conv (int) – How often the convergence test is performed. By
default convergence is tested every iteration.
|
-
basis()
Return the matrix of basis vectors (factor 1 matrix).
-
coef(idx=None)
Return the matrix of mixture coefficients (factor 2 matrix).
Parameters: | idx (None) – Used in the multiple MF model. In standard MF idx is always None. |
-
distance(metric='euclidean', idx=None)
Return the loss function value.
Parameters: |
- metric (str with values ‘euclidean’ or ‘kl’) – Specify the distance metric to be used. Possible values are Euclidean distance and
Kullback-Leibler (KL) divergence. Strictly, KL is not a metric.
- idx (None) – Used in the multiple MF model. In standard MF idx is always None.
|
-
factorize()
Compute matrix factorization.
Return fitted factorization model.
-
fitted(idx=None)
Compute the estimated target matrix according to the MF algorithm model.
Parameters: | idx (None) – Used in the multiple MF model. In standard MF idx is always None. |
-
is_satisfied(p_obj, c_obj, iter)
Check the stopping criteria against the stopping
parameters and the objective function value.
Return a logical value indicating whether the factorization should continue.
Parameters: |
- p_obj (float) – Objective function value from previous iteration.
- c_obj (float) – Current objective function value.
- iter (int) – Current iteration number.
|
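The interplay of max_iter, min_residuals and test_conv described above might look roughly like the following standalone function. It is a sketch written from the parameter descriptions on this page, not a copy of the nimfa implementation; the name `should_continue` and the exact order of the checks are assumptions.

```python
def should_continue(p_obj, c_obj, iteration, max_iter=30,
                    min_residuals=1e-5, test_conv=None):
    """Return True if the factorization should run another iteration.

    Hypothetical helper: mirrors the documented stopping parameters,
    but the check order is an assumption.
    """
    # Stop once the iteration budget is used up.
    if max_iter is not None and iteration >= max_iter:
        return False
    # Skip the convergence test on iterations where it is not scheduled.
    if test_conv and iteration % test_conv != 0:
        return True
    # Stop when the objective improved by less than the required amount.
    if iteration > 0 and min_residuals is not None:
        if p_obj - c_obj < min_residuals:
            return False
    return True
```

For example, `should_continue(10.0, 5.0, 1)` keeps going because the objective dropped by 5.0, while `should_continue(10.0, 5.0, 30)` stops because the default iteration limit has been reached.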
-
objective()
Compute Frobenius distance cost function with penalization term.
-
residuals(idx=None)
Return residuals matrix between the target matrix and its MF estimate.
Parameters: | idx (None) – Used in the multiple MF model. In standard MF idx is always None. |
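For a standard two-factor model, the estimate and the residuals relate to the factors as sketched below in plain NumPy. The sketch assumes the estimate is the product of the basis matrix W and the mixture matrix H and that the residuals are the elementwise difference from V; it uses synthetic data and does not call nimfa.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((4, 2))   # basis matrix (factor 1)
H = rng.random((2, 6))   # mixture matrix (factor 2)

# Synthetic target: an exact product plus a small perturbation.
V = W @ H + 0.01 * rng.standard_normal((4, 6))

estimate = W @ H                # what fitted() is assumed to correspond to
residual_matrix = V - estimate  # what residuals() is assumed to correspond to
```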
-
target(idx=None)
Return the target matrix to estimate.
Parameters: | idx (None) – Used in the multiple MF model. In standard MF idx is always None. |
-
update()
Update basis and mixture matrix.