MDS (sa.mds)

Multidimensional scaling.

class sa.mds.MDS(n_components=2, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=1, random_state=None)[source]

Bases: sklearn.base.BaseEstimator

Multidimensional scaling

metric : boolean, optional, default: True
If True, run the metric SMACOF (Scaling by Majorizing a Complicated Function) algorithm; if False, run the nonmetric variant.
n_components : int, optional, default: 2
Number of dimensions in which to immerse the similarities. Overridden if an initial array is provided.
n_init : int, optional, default: 4
Number of times the SMACOF algorithm will be run with different initialisations. The final result will be the best output of the n_init consecutive runs in terms of stress.
max_iter : int, optional, default: 300
Maximum number of iterations of the SMACOF algorithm for a single run
verbose : int, optional, default: 0
level of verbosity
eps : float, optional, default: 1e-3
Relative tolerance w.r.t. stress to declare convergence
n_jobs : int, optional, default: 1

The number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them in parallel.

If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used. Thus for n_jobs = -2, all CPUs but one are used.
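As a rough illustration of the negative n_jobs convention, the sketch below maps n_jobs to a worker count so that -1 gives all CPUs and -2 gives all CPUs but one. The helper name effective_n_jobs and its n_cpus argument are hypothetical and not part of sa.mds; the clamping to at least one worker is also an assumption.

```python
import os


def effective_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper: translate n_jobs into a number of workers."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, and so on.
        # Clamping to 1 is an assumption of this sketch.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs
```

On an 8-CPU machine this would give 8 workers for n_jobs = -1 and 7 for n_jobs = -2.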

random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
embedding_ : array-like, shape [n_samples, n_components]
Stores the position of the dataset in the embedding space
stress_ : float
The final value of the stress (sum of squared distance of the disparities and the distances for all constrained points)

“Modern Multidimensional Scaling - Theory and Applications” Borg, I.; Groenen P. Springer Series in Statistics (1997)

“Nonmetric multidimensional scaling: a numerical method” Kruskal, J. Psychometrika, 29 (1964)

“Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis” Kruskal, J. Psychometrika, 29, (1964)

fit(X, init=None, y=None)[source]

Computes the position of the points in the embedding space

X: array, shape=[n_samples, n_samples], symmetric
Proximity matrix
init: {None or ndarray, shape (n_samples,)}
If None, randomly chooses the initial configuration. If ndarray, initializes the SMACOF algorithm with this array.
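To make the expected input shape concrete, here is a minimal NumPy sketch that builds a symmetric [n_samples, n_samples] matrix from raw coordinates. It assumes that proximities are dissimilarities (larger values mean points are farther apart) and uses Euclidean distance; the helper name pairwise_euclidean is hypothetical and not part of sa.mds.

```python
import numpy as np


def pairwise_euclidean(points):
    """Return the symmetric matrix of Euclidean distances between rows."""
    points = np.asarray(points, dtype=float)
    # Broadcasting gives an (n, n, n_features) array of coordinate differences.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
proximities = pairwise_euclidean(points)  # shape (3, 3), zero diagonal
```

A matrix built this way is symmetric with a zero diagonal, which is the form the X argument describes.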
fit_transform(X, init=None, y=None)[source]

Fits the data from X and returns the embedded coordinates

X: array, shape=[n_samples, n_samples], symmetric
Proximity matrix
init: {None or ndarray, shape (n_samples,)}
If None, randomly chooses the initial configuration. If ndarray, initializes the SMACOF algorithm with this array.
get_params(deep=True)

Get parameters for the estimator

deep: boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

self
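The <component>__<parameter> naming can be illustrated with a small sketch that separates an estimator's own parameters from those addressed to a nested component. This is not the actual set_params implementation; the function split_nested_params and the component name "mds" are hypothetical.

```python
def split_nested_params(params):
    """Split a params dict into own parameters and per-component parameters."""
    own, nested = {}, {}
    for key, value in params.items():
        # "mds__eps" -> component "mds", parameter "eps"
        name, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({"n_init": 8, "mds__eps": 1e-4})
```

Here own holds the plain parameter n_init, while nested groups eps under the hypothetical component "mds".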

sa.mds.pool_adjacent_violators(distances, similarities, max_iter=300, verbose=0)[source]

Pool adjacent violators

Computes an isotonic regression of distances on similarities.

distances: ndarray, shape (n, 1)
array to fit
similarities: ndarray, shape (n, 1)
array on which to fit
max_iter: int, optional, default: 300
Maximum number of iterations
verbose: int, optional, default: 0
set the level of verbosity

distances: ndarray, shape (n, 1)
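The idea behind pool adjacent violators can be sketched as follows: order the distances by their similarities, then merge neighbouring blocks whose means decrease until the fitted values are nondecreasing. This is a minimal sketch, not the sa.mds implementation: the name pava_sketch is hypothetical, it works on flattened 1-D arrays rather than shape (n, 1), and it omits the max_iter and verbose arguments.

```python
import numpy as np


def pava_sketch(distances, similarities):
    """Isotonic (nondecreasing) fit of distances, ordered by similarities."""
    distances = np.asarray(distances, dtype=float).ravel()
    similarities = np.asarray(similarities, dtype=float).ravel()
    order = np.argsort(similarities, kind="stable")

    # Each block is stored as a running sum and a count of pooled values.
    sums, counts = [], []
    for value in distances[order]:
        sums.append(value)
        counts.append(1)
        # Pool the last two blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            last_sum, last_count = sums.pop(), counts.pop()
            sums[-1] += last_sum
            counts[-1] += last_count

    fitted_sorted = np.concatenate(
        [np.full(count, total / count) for total, count in zip(sums, counts)]
    )
    # Undo the sort so the output lines up with the input order.
    fitted = np.empty_like(distances)
    fitted[order] = fitted_sorted
    return fitted
```

For example, with similarities already in increasing order, the distances 1, 3, 2, 4 have one violating pair (3 followed by 2), which is pooled to its mean 2.5.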

“Modern Multidimensional Scaling - Theory and Applications” Borg, I.; Groenen P. Springer Series in Statistics (1997)

sa.mds.smacof(similarities, metric=True, n_components=2, init=None, n_init=8, n_jobs=1, max_iter=300, verbose=0, eps=0.001, random_state=None)[source]

Computes multidimensional scaling using SMACOF (Scaling by Majorizing a Complicated Function) algorithm

The SMACOF algorithm is a multidimensional scaling algorithm: it minimizes an objective function, the stress, using a majorization technique. Stress majorization, also known as the Guttman transform, guarantees monotone convergence of the stress and is more powerful than traditional techniques such as gradient descent.

The SMACOF algorithm for metric MDS can be summarized by the following steps:

  1. Set an initial start configuration, randomly or not.
  2. Compute the stress
  3. Compute the Guttman Transform
  4. Iterate 2 and 3 until convergence.

The nonmetric algorithm adds a monotonic regression step before computing the stress.
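The metric steps listed above can be sketched in NumPy as follows. This is a minimal single-run sketch under stated assumptions, not the sa.mds implementation: it uses unit weights, treats the input as dissimilarities, counts each pair once in the stress, and stops when the relative decrease in stress falls to eps or below. The names smacof_sketch, raw_stress, guttman_transform and the seed argument are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(X, delta):
    """Sum of squared residuals, each unordered pair counted once."""
    return ((pairwise_distances(X) - delta) ** 2).sum() / 2.0


def guttman_transform(X, delta):
    """One majorization update with unit weights: X <- B(X) X / n."""
    n = X.shape[0]
    dist = pairwise_distances(X)
    safe = np.where(dist == 0, 1.0, dist)  # avoid division by zero
    ratio = np.where(dist == 0, 0.0, delta / safe)
    B = -ratio
    B[np.diag_indices(n)] += ratio.sum(axis=1)
    return B.dot(X) / n


def smacof_sketch(delta, n_components=2, max_iter=300, eps=1e-3, seed=0):
    delta = np.asarray(delta, dtype=float)
    rng = np.random.RandomState(seed)
    X = rng.rand(delta.shape[0], n_components)  # step 1: random start
    stress = raw_stress(X, delta)               # step 2: compute the stress
    for _ in range(max_iter):
        X = guttman_transform(X, delta)         # step 3: Guttman transform
        new_stress = raw_stress(X, delta)
        converged = (stress - new_stress) <= eps * stress
        stress = new_stress
        if converged:                           # step 4: iterate until convergence
            break
    return X, stress
```

Because each Guttman update cannot increase the stress, running more iterations from the same start gives a stress that is no larger than after a single iteration. Running the sketch from several seeds and keeping the lowest-stress result would mimic the role of n_init.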

similarities : symmetric ndarray, shape (n_samples, n_samples)
similarities between the points
metric : boolean, optional, default: True
If True, run the metric SMACOF algorithm; if False, run the nonmetric variant.
n_components : int, optional, default: 2
Number of dimensions in which to immerse the similarities. Overridden if an initial array is provided.
init : {None or ndarray of shape (n_samples, n_components)}
If None, randomly chooses the initial configuration. If ndarray, initializes the SMACOF algorithm with this array.
n_init : int, optional, default: 8
Number of times the SMACOF algorithm will be run with different initialisations. The final result will be the best output of the n_init consecutive runs in terms of stress.

n_jobs : int, optional, default: 1

The number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them in parallel.

If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used. Thus for n_jobs = -2, all CPUs but one are used.

max_iter : int, optional, default: 300
Maximum number of iterations of the SMACOF algorithm for a single run
verbose : int, optional, default: 0
level of verbosity
eps : float, optional, default: 1e-3
Relative tolerance w.r.t. stress to declare convergence
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
X : ndarray (n_samples,n_components)
Coordinates of the n_samples points in a n_components-space
stress : float
The final value of the stress (sum of squared distance of the disparities and the distances for all constrained points)

“Modern Multidimensional Scaling - Theory and Applications” Borg, I.; Groenen P. Springer Series in Statistics (1997)

“Nonmetric multidimensional scaling: a numerical method” Kruskal, J. Psychometrika, 29 (1964)

“Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis” Kruskal, J. Psychometrika, 29, (1964)
