analysis (sa.analysis)
This script performs statistical analysis of strains within a single plate and across plates.
-
sa.analysis.fss(data_del, data_ts, data_sg, res_path)[source]
Feature subset selection (FSS) for unsupervised learning. Clustering is performed on the
feature subspace with the highest score. A low-dimensional representation (MDS) of the best
clustering is saved to the directory res_path.
Parameters: |
- data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
- data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
- data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
- res_path (str) – Full path to the directory where results are to be saved.
|
-
sa.analysis.fss_post_cluster(data_del, data_ts, data_sg, fss_subset_path, fss_cluster_path, res_path)[source]
Read clustering predictions and description of feature subspace. Run MDS optimization
and save plotted coordinates.
Parameters: |
- data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
- data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
- data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
- fss_subset_path (str) – Full path to the file with feature space description
as obtained by analysis.fss.
- fss_cluster_path (str) – Full path to the file with predictions for observations
as obtained by analysis.fss.
- res_path (str) – Full path to the directory where results are to be saved.
|
-
sa.analysis.strains_1p_WT(meta, plate, res_path, plot_attr_hist=True)[source]
Analyze WT strains from one plate. First, filter out MT strains, i.e. retain only strains with ORF equal to YOR202W.
Continue by standardizing the features and detecting outliers using the elliptic envelope method. Detected outliers are saved
to a file named <plate-title>_outliers.csv.
Euclidean distances are computed between WT strains and plotted as a heat map. Additionally, mean distances between
all strains (WT strains without outliers and MT strains) are plotted in a heat map laid out as the strains are located on the plate.
WT strains are clustered to reveal possible structure and to assess their homogeneity. The intersection between clusters and
WT outliers is printed to the screen. PCA is computed and the explained variance is printed.
Standardized plate data are saved in Orange and CSV format.
Parameters: |
- meta (tuple) – Meta data for one plate, (file_name, attr_names), as returned from utilities.read.
- plate (list) – Plate data as returned from utilities.read.
- res_path (str) – Full path to the directory where results are to be saved.
|
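The first steps described for strains_1p_WT (keep WT strains, standardize features, compute pairwise Euclidean distances) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code behind sa.analysis: the plate layout, the helper names standardize and pairwise_euclidean, and the toy values are all hypothetical, and the elliptic envelope outlier step is left out.

```python
import math
import statistics

WT_ORF = "YOR202W"  # ORF that marks WT strains, as stated in the description above


def standardize(rows):
    """Z-score every feature (column) of a list of equal-length rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column gets divisor 1.0
    # so that it maps to zeros instead of raising ZeroDivisionError.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def pairwise_euclidean(rows):
    """Return the full symmetric matrix of Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical plate: one (ORF, feature vector) pair per strain.
plate = [
    ("YOR202W", [1.0, 10.0]),
    ("YAL001C", [5.0, 30.0]),  # a mutant strain, dropped by the filter
    ("YOR202W", [2.0, 12.0]),
    ("YOR202W", [3.0, 14.0]),
]

wt_rows = [features for orf, features in plate if orf == WT_ORF]
wt_std = standardize(wt_rows)
dist = pairwise_euclidean(wt_std)
```

The matrix dist is what a heat map of WT-to-WT distances would be drawn from; any plotting library can consume it as a nested list.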
-
sa.analysis.strains_Np_MT(data_del, data_ts, data_sg, repeats_path, res_path, repeats_keys=['RT', '37'], standardize=True)[source]
Find mutants whose profiles differ significantly from those of wild-type cells by estimating
distances among WT strains, among MT strains, and between WT and MT strains, and by assessing
significance with a permutation test.
Parameters: |
- data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
- data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
- data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
- repeats_path (str) – Full path to file with multi-occurring mutant strains specification.
- res_path (str) – Full path to the directory where results are to be saved.
- repeats_keys (list) – Names of TS (temperature sensitive mutant strains) plates' extensions. By default these are ['RT', '37'].
- standardize (bool) – Indicator whether to work with standardized or original features. By default, data set is standardized.
|
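The permutation test mentioned for strains_Np_MT can be illustrated as follows. This is a minimal sketch of a generic one-sided permutation test on the difference of mean distances; the test statistic, the function name permutation_test, the number of permutations and the toy distances are assumptions and may differ from what sa.analysis actually does.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided test: is mean(group_b) larger than mean(group_a)?

    Returns the observed difference of means and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_b) - statistics.fmean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[n_a:]) - statistics.fmean(pooled[:n_a])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up distances: WT-to-WT pairs versus WT-to-MT pairs.
wt_wt = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
wt_mt = [2.9, 3.1, 3.0, 3.3, 2.7, 3.2]

observed, p_value = permutation_test(wt_wt, wt_mt)
```

A small p_value indicates that a difference as large as the observed one rarely arises when group labels are assigned at random.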
-
sa.analysis.strains_Np_WT(meta, plates, res_path)[source]
Analyze WT strains from many plates by combining them in one set.
Parameters: |
- meta (list) – Meta data, [(file_name1, attr_names1), (file_name2, attr_names2), ...], as returned from utilities.read.
- plates (list) – Plates data as returned from utilities.read.
- res_path (str) – Full path to the directory where results are to be saved.
|
-
sa.analysis.strains_Np_novelty_MT(data_del, data_ts, data_sg, res_path)[source]
Find mutants with significantly different profiles than wild-type cells by novelty
detection using one-class SVM and GMM.
Parameters: |
- data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
- data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
- data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
- res_path (str) – Full path to the directory where results are to be saved.
|
-
sa.analysis.strains_coll(data_del, data_ts, data_sg, res_path)[source]
Preprocess plates from each collection (standardize features and remove outlier WT strains)
and compute distances between strains from the same and from different collections. Histograms
of distances between collections are saved to the directory res_path.
Parameters: |
- data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
- data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
- data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
- res_path (str) – Full path to the directory where results are to be saved.
|
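The grouping of distances by collection pair described for strains_coll might look like the sketch below. The collection labels, the data layout, the histogram helper and its bin width are hypothetical; the strains are assumed to be standardized already, and the outlier removal step is not shown.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical, already standardized strains: (collection label, feature vector).
strains = [
    ("DEL", [0.0, 0.0]),
    ("DEL", [0.3, 0.4]),
    ("TS", [1.0, 1.0]),
    ("TS", [1.0, 2.0]),
    ("SG", [4.0, 5.0]),
]

# Collect Euclidean distances per unordered pair of collections, so that
# ("DEL", "DEL") holds within-collection distances and ("DEL", "TS")
# holds between-collection distances.
distances = defaultdict(list)
for (coll_a, x_a), (coll_b, x_b) in itertools.combinations(strains, 2):
    key = tuple(sorted((coll_a, coll_b)))
    distances[key].append(math.dist(x_a, x_b))


def histogram(values, bin_width=1.0):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = defaultdict(int)
    for value in values:
        counts[int(value // bin_width)] += 1
    return dict(counts)


histograms = {pair: histogram(vals) for pair, vals in distances.items()}
```

Each entry of histograms holds the bin counts for one pair of collections and could be passed to a plotting routine to draw the corresponding histogram.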
-
sa.analysis.strains_repl(data_del, data_ts, data_sg, repeats_path, res_path, repeats_keys=['RT', '37'])[source]
Analyze mutants that occur multiple times in the data set. First standardize data and then analyze
distance distribution of replicate observations and all observations.
Parameters: |
- data_del (tuple (meta_data, plates_data)) – Deletion collection plates data as returned from utilities.read.
- data_ts (tuple (meta_data, plates_data)) – TS collection plates data as returned from utilities.read.
- data_sg (tuple (meta_data, plates_data)) – SG collection plates data as returned from utilities.read.
- repeats_path (str) – Full path to file with multi-occurring mutants specification.
- res_path (str) – Full path to the directory where results are to be saved.
- repeats_keys (list) – Names of TS (temperature sensitive mutants) plates' extensions. By default these are ['RT', '37'].
|
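To make the comparison in strains_repl concrete, the sketch below contrasts distances between replicate observations of the same mutant with distances between all observations. The mutant names, feature values and the helper split_distances are hypothetical, and the observations are assumed to be standardized beforehand.

```python
import itertools
import math
import statistics

# Hypothetical standardized observations: (mutant name, feature vector).
observations = [
    ("mutA", [0.0, 0.0]),
    ("mutA", [0.1, 0.0]),
    ("mutB", [2.0, 2.0]),
    ("mutB", [2.0, 2.2]),
    ("mutC", [-3.0, 1.0]),
]


def split_distances(obs):
    """Return (replicate_distances, all_distances) over all unordered pairs."""
    replicate, everything = [], []
    for (name_a, x_a), (name_b, x_b) in itertools.combinations(obs, 2):
        d = math.dist(x_a, x_b)
        everything.append(d)
        if name_a == name_b:
            # Both observations belong to the same mutant: a replicate pair.
            replicate.append(d)
    return replicate, everything


replicate_d, all_d = split_distances(observations)
mean_replicate = statistics.fmean(replicate_d)
mean_all = statistics.fmean(all_d)
```

If replicates are reproducible, mean_replicate is expected to be clearly smaller than mean_all, as it is for this toy data.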