analysis (sa.analysis)

This module performs statistical analysis of strains within a single plate and across plates.

sa.analysis.fss(data_del, data_ts, data_sg, res_path)

Feature subset selection (FSS) for unsupervised learning. Observations are clustered in the feature subspace with the highest score, and a low-dimensional representation (MDS) of the best clustering is saved to the directory res_path.

Parameters:
  • data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
  • data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
  • data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
  • res_path (str) – Full path to the directory where results are to be saved.
sa.analysis.fss_post_cluster(data_del, data_ts, data_sg, fss_subset_path, fss_cluster_path, res_path)

Read the clustering predictions and the description of the feature subspace. Run MDS optimization and save the plotted coordinates.

Parameters:
  • data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
  • data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
  • data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
  • fss_subset_path (str) – Full path to the file with feature space description as obtained by analysis.fss.
  • fss_cluster_path (str) – Full path to the file with predictions for observations as obtained by analysis.fss.
  • res_path (str) – Full path to the directory where results are to be saved.

See also

See also functions sa.methods.decompose_MDS() and sa.utilities.std_prep().

sa.analysis.strains_1p_WT(meta, plate, res_path, plot_attr_hist=True)

Analyze WT strains from one plate. First, filter out MT strains, i.e. retain only strains with ORF equal to YOR202W. Then standardize the features and detect outliers using the elliptic envelope method. Detected outliers are saved to a file named <plate-title>_outliers.csv.

Euclidean distances between WT strains are computed and plotted as a heat map. Additionally, mean distances between all strains (WT strains without outliers and MT strains) are plotted in a heat map laid out as the strains are located on the plate.

WT strains are clustered to reveal possible structure and to assess their homogeneity. The intersection between clusters and WT outliers is printed to the screen. PCA is computed and the explained variance is printed.

Standardized plate data are saved in Orange and CSV format.

Parameters:
  • meta (tuple) – Meta data for one plate, (file_name, attr_names), as returned from utilities.read.
  • plate (list) – Plate data as returned from utilities.read.
  • res_path (str) – Full path to the directory where results are to be saved.
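The filtering, standardization, and distance steps described for strains_1p_WT can be sketched as follows. This is a minimal illustration using only the standard library, not the module's implementation: the helper names `standardize` and `wt_distance_matrix`, the toy ORF labels, and the two-feature rows are all assumptions, and outlier detection is omitted.

```python
import math
import statistics

WT_ORF = "YOR202W"  # wild-type ORF named in the description above


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def wt_distance_matrix(orfs, rows):
    """Return pairwise Euclidean distances between standardized WT strains."""
    wt_rows = [row for orf, row in zip(orfs, rows) if orf == WT_ORF]
    scaled = standardize(wt_rows)
    return [[math.dist(a, b) for b in scaled] for a in scaled]


# Hypothetical toy plate: one ORF label per well and two features per well.
orfs = ["YOR202W", "YAL001C", "YOR202W", "YOR202W"]
rows = [[1.0, 10.0], [5.0, 50.0], [2.0, 20.0], [3.0, 30.0]]
distances = wt_distance_matrix(orfs, rows)
```

The resulting square matrix is the kind of input a heat map of WT-to-WT distances would be drawn from.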
sa.analysis.strains_Np_MT(data_del, data_ts, data_sg, repeats_path, res_path, repeats_keys=['RT', '37'], standardize=True)

Find mutants whose profiles differ significantly from those of wild-type cells by estimating distances among WT strains, among MT strains, and between WT and MT strains, and assess significance with a permutation test.

Parameters:
  • data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
  • data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
  • data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
  • repeats_path (str) – Full path to the file specifying mutant strains that occur multiple times.
  • res_path (str) – Full path to the directory where results are to be saved.
  • repeats_keys (list) – Names of the extensions of TS (temperature-sensitive mutant strain) plates. Defaults to ['RT', '37'].
  • standardize (bool) – Indicator whether to work with standardized or original features. By default, data set is standardized.
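A permutation test of the kind mentioned for strains_Np_MT can be sketched as below. The test statistic (absolute difference of mean distances), the function name `permutation_p_value`, and the sample values are assumptions chosen for illustration; the statistic actually used by the module is not stated in this documentation.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical distances: WT-to-WT versus WT-to-MT for one mutant.
wt_distances = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
mt_distances = [2.9, 3.1, 3.0, 3.3, 2.8, 3.2]
p_separated = permutation_p_value(wt_distances, mt_distances)
p_same = permutation_p_value(wt_distances, wt_distances)
```

A small p-value indicates that the observed separation would rarely arise if the WT and MT labels were interchangeable.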
sa.analysis.strains_Np_WT(meta, plates, res_path)

Analyze WT strains from many plates by combining them in one set.

Parameters:
  • meta (list) – Meta data, [(file_name1, attr_names1), (file_name2, attr_names2), ...], as returned from utilities.read.
  • plates (list) – Plates data as returned from utilities.read.
  • res_path (str) – Full path to the directory where results are to be saved.
sa.analysis.strains_Np_novelty_MT(data_del, data_ts, data_sg, res_path)

Find mutants whose profiles differ significantly from those of wild-type cells by novelty detection using a one-class SVM and GMM.

Parameters:
  • data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
  • data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
  • data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
  • res_path (str) – Full path to the directory where results are to be saved.
sa.analysis.strains_coll(data_del, data_ts, data_sg, res_path)

Preprocess plates from each collection (standardize features and remove outlier WT strains) and compute distances between strains from the same collection and from different collections. Histograms of distances between collections are saved to the directory res_path.

Parameters:
  • data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
  • data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
  • data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
  • res_path (str) – Full path to the directory where results are to be saved.

See also

See also function sa.plotting.plot_hist_coll().
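The within-collection and between-collection distances described for strains_coll can be illustrated with a short sketch. It assumes the rows are already standardized and cleaned of outliers; the helper names, the fixed-width binning, and the toy data are hypothetical and are not taken from sa.plotting.plot_hist_coll().

```python
import math
from itertools import combinations, product


def within_distances(rows):
    """Euclidean distances between all pairs of strains in one collection."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def between_distances(rows_a, rows_b):
    """Euclidean distances between every strain in A and every strain in B."""
    return [math.dist(a, b) for a, b in product(rows_a, rows_b)]


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for value in values:
        k = int(value // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical standardized profiles from two collections.
deletion = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
ts = [[3.0, 4.0], [3.0, 5.0]]
within = within_distances(deletion)
between = between_distances(deletion, ts)
hist = histogram(between, bin_width=1.0)
```

Comparing the two distance lists (or their histograms) shows whether strains from different collections lie farther apart than strains from the same collection.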

sa.analysis.strains_repl(data_del, data_ts, data_sg, repeats_path, res_path, repeats_keys=['RT', '37'])

Analyze mutants that occur multiple times in the data set. First standardize the data, then analyze the distribution of distances among replicate observations and among all observations.

Parameters:
  • data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
  • data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
  • data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
  • repeats_path (str) – Full path to the file specifying mutants that occur multiple times.
  • res_path (str) – Full path to the directory where results are to be saved.
  • repeats_keys (list) – Names of the extensions of TS (temperature-sensitive mutant) plates. Defaults to ['RT', '37'].

See also

See also function sa.methods.analyze_repl().
