Working with Cohorts

The Cohort class provides functions to create a cohort, add or remove samples, merge the metadata and data matrix of the samples/datasets in a cohort, and edit or delete a cohort.

Args:
  token (str): Authentication token from Polly.

Usage:

from polly.cohort import Cohort

cohort = Cohort(token)

`add_to_cohort(repo_key, dataset_id=None, sample_id=None)`

This function is used to add datasets or samples to a cohort.

Parameters:

Name	Type	Description	Default
`repo_key`	`str`	repo_key (repo_name or repo_id) of the OmixAtlas to which the datasets or samples belong.	required
`dataset_id`	`list / str`	A list of dataset_ids (for repositories where each dataset has one sample) or a single dataset_id string (for repositories where each dataset has many samples).	`None`
`sample_id`	`list`	List of samples to add to the cohort; applicable only for repositories where each dataset has many samples.	`None`

Returns:

Type	Description
`None`	A message will be displayed on the status of the operation.

Raises:

Type	Description
`InvalidParameterException`	Empty or invalid parameters.
`InvalidCohortOperationException`	This operation is not valid as no cohort has been instantiated.
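The parameter table above implies a simple rule: pass `dataset_id` as a list for repositories with one sample per dataset, or as a single string (optionally with `sample_id`) for repositories with many samples per dataset. A minimal sketch of a client-side check for that rule follows. `check_cohort_ids` is a hypothetical helper, not part of polly; the library performs its own validation and reports problems through `InvalidParameterException`.

```python
def check_cohort_ids(dataset_id=None, sample_id=None):
    """Hypothetical pre-flight check mirroring the parameter table above.

    Returns "single-sample" when dataset_id is a list of ids, or
    "multi-sample" when dataset_id is one id string.
    """
    if isinstance(dataset_id, list):
        # One sample per dataset: a list of dataset ids, and no sample_id.
        if not dataset_id or not all(isinstance(d, str) for d in dataset_id):
            raise ValueError("dataset_id must be a non-empty list of strings")
        if sample_id:
            raise ValueError("sample_id applies only when dataset_id is a single string")
        return "single-sample"
    if isinstance(dataset_id, str) and dataset_id:
        # Many samples per dataset: one dataset id, optional list of samples.
        if sample_id is not None and not isinstance(sample_id, list):
            raise ValueError("sample_id must be a list")
        return "multi-sample"
    raise ValueError("dataset_id must be a list of ids or a single id string")
```

Such a check could run just before `cohort.add_to_cohort(repo_key, dataset_id, sample_id)` so that a malformed argument is caught before any data is fetched.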

`create_cohort(local_path, cohort_name, description, repo_key=None, dataset_id=None, sample_id=None)`

This function is used to create a cohort. Call it on a Cohort object after instantiating the class.

Parameters:

Name	Type	Description	Default
`local_path`	`str`	local path to instantiate the cohort.	required
`cohort_name`	`str`	identifier name for the cohort.	required
`description`	`str`	description about the cohort.	required
`repo_key`	`str`	repo_key (repo_name or repo_id) of the OmixAtlas from which the datasets or samples are to be added.	`None`
`dataset_id`	`list / str`	A list of dataset_ids (for repositories where each dataset has one sample) or a single dataset_id string (for repositories where each dataset has many samples).	`None`
`sample_id`	`list`	List of samples to add to the cohort; applicable only for repositories where each dataset has many samples.	`None`

Returns:

Type	Description
`None`	A message will be displayed on the status of the operation.

Raises:

Type	Description
`InvalidParameterException`	Empty or Invalid Parameters
`InvalidCohortNameException`	The cohort_name does not represent a valid cohort name.
`InvalidPathException`	Provided path does not represent a file or a directory.

`create_merged_gct(file_path, file_name='')`

This function is used to merge all the gct files in a cohort into a single gct file.

Parameters:

Name	Type	Description	Default
`file_path`	`str`	the system path where the gct file is to be written.	required
`file_name`	`str`	Identifier for the merged file name, cohort name will be used by default.	`''`

`delete_cohort()`

This function is used to delete a cohort.

Returns:
        A confirmation message on deletion of the cohort.

`edit_cohort(new_cohort_name=None, new_description=None)`

This function is used to edit cohort-level metadata such as the cohort name and description. At least one of the arguments must be provided.

Args:
        new_cohort_name (str): new identifier name for the cohort.
        new_description (str): new description of the cohort.

Returns:
        A confirmation message on update of the cohort.

Raises:
        InvalidCohortOperationException: This operation is not valid as no cohort has been instantiated.
        CohortEditException: No parameter specified for editing in cohort
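Because `edit_cohort` needs at least one of its two arguments, a caller may want to forward only the fields that actually changed. The sketch below assumes `cohort` is an instantiated Cohort object; `rename_or_redescribe` is a hypothetical wrapper, not a polly function, and it raises a plain `ValueError` instead of the library's `CohortEditException`.

```python
def rename_or_redescribe(cohort, new_name=None, new_description=None):
    """Hypothetical wrapper: forward only the fields that were supplied."""
    changes = {}
    if new_name is not None:
        changes["new_cohort_name"] = new_name
    if new_description is not None:
        changes["new_description"] = new_description
    if not changes:
        # The library reports this case itself via CohortEditException;
        # failing early here just avoids the call.
        raise ValueError("provide new_name, new_description, or both")
    return cohort.edit_cohort(**changes)
```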

`is_valid()`

This function is used to check the validity of a cohort.

Returns:

Type	Description
`bool`	A boolean result based on the validity of the cohort.

Raises:

Type	Description
`InvalidPathException`	Cohort path does not represent a file or a directory.
`InvalidCohortOperationException`	This operation is not valid as no cohort has been instantiated.

`load_cohort(local_path)`

Function to load an existing cohort into an object. Once loaded, the functions described in the documentation can be used for the object where the cohort is loaded.

Parameters:

Name	Type	Description	Default
`local_path`	`str`	local path of the cohort.	required

Returns:

Type	Description
	A confirmation message on instantiation of the cohort.

Raises:

Type	Description
`InvalidPathException`	This path does not represent a file or a directory.
`InvalidCohortPathException`	This path does not represent a Cohort.
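Loading and validating are separate calls, so one reasonable pattern is to run `is_valid()` immediately after `load_cohort()`. The following is a minimal sketch under the assumption that `cohort` is a Cohort object; `load_and_check` is a hypothetical helper name, and the `RuntimeError` is this sketch's choice, not a polly exception.

```python
def load_and_check(cohort, local_path):
    """Hypothetical helper: load a cohort, then confirm it is valid."""
    # load_cohort raises on a bad path (see the Raises entries above).
    cohort.load_cohort(local_path)
    # is_valid is documented to return a bool.
    if not cohort.is_valid():
        raise RuntimeError(f"cohort at {local_path!r} failed validation")
    return cohort
```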

`merge_data(data_level)`

Function to merge the metadata (dataset, sample and feature level) or the data matrix of all the samples/datasets in the cohort.

Parameters:

Name	Type	Description	Default
`data_level`	`str`	identifier to specify the data to be merged - "dataset", "sample", "feature" or "data_matrix"	required

Returns:

Type	Description
	A pandas dataframe containing the merged data which is ready for analysis
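Since `data_level` accepts exactly four values, the merged results for several levels can be collected in one pass. This is a sketch only: `merge_levels` and `DATA_LEVELS` are hypothetical names introduced here, and `cohort` is assumed to be a Cohort object that already contains data.

```python
# The four data_level values listed in the parameter table above.
DATA_LEVELS = ("dataset", "sample", "feature", "data_matrix")

def merge_levels(cohort, levels=DATA_LEVELS):
    """Hypothetical helper: call merge_data once per requested level."""
    unknown = [lvl for lvl in levels if lvl not in DATA_LEVELS]
    if unknown:
        raise ValueError(f"unsupported data_level value(s): {unknown}")
    # Map each level name to whatever merge_data returns for it.
    return {lvl: cohort.merge_data(lvl) for lvl in levels}
```

For instance, `merge_levels(cohort, ["dataset", "sample"])` would return a dictionary with one entry per requested level.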

`remove_from_cohort(dataset_id=None, sample_id=[])`

This function is used for removing datasets or samples from a cohort.

Parameters:

Name	Type	Description	Default
`dataset_id`	`list / str`	A list of dataset_ids (for repositories where each dataset has one sample) or a single dataset_id string (for repositories where each dataset has many samples).	`None`
`sample_id`	`list`	List of samples to remove from the cohort; applicable only for repositories where each dataset has many samples.	`[]`

Returns:

Type	Description
`None`	A message will be displayed on the status of the operation.

Raises:

Type	Description
`InvalidParameterException`	Empty or Invalid Parameters
`InvalidCohortOperationException`	This operation is not valid as no cohort has been instantiated.

`summarize_cohort()`

Function to return cohort level metadata and dataframe with datasets or samples added in the cohort.

Returns:

Type	Description
`tuple`	A tuple whose first value is the cohort metadata (name, description and number of dataset(s) or sample(s) in the cohort) and whose second value is a dataframe containing the source, dataset_id/sample_id and data type available in the cohort.

Raises:

Type	Description
`InvalidCohortOperationException`	This operation is not valid as no cohort has been instantiated.
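The two-element return value can be unpacked directly. As a small illustration, the sketch below gives the two parts names; `CohortSummary` and `summarize` are hypothetical, and the exact type of the metadata value is not specified here, so it is passed through unchanged.

```python
from collections import namedtuple

# Hypothetical container naming the two documented return values.
CohortSummary = namedtuple("CohortSummary", ["metadata", "contents"])

def summarize(cohort):
    """Hypothetical helper: name the two values returned by summarize_cohort."""
    metadata, contents = cohort.summarize_cohort()
    return CohortSummary(metadata=metadata, contents=contents)
```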

Examples

In TCGA

query = <someSQLquery>
results = omixatlas.query_metadata(query)

Query execution succeeded (time taken: 2.13 seconds, data scanned: 0.244 MB)
Fetched 123 rows

dataset_ids = results['dataset_id'].tolist()
cohort1 = Cohort(token)  # instantiate the Cohort object, as shown under Usage
cohort1.create_cohort("/import", "tcga_data", "Proteomics datasets", "tcga", dataset_ids)

INFO:root:Cohort Created !


Initializing process...


Verifying Data: 100%|██████████| 123/123 [00:11<00:00, 10.71it/s]
Adding data to cohort: 100%|██████████| 123/123 [00:14<00:00,  8.72it/s]
Adding metadata to cohort: 100%|██████████| 123/123 [00:11<00:00, 10.25it/s]
INFO:root:'123' dataset/s added to Cohort!

dataset_metadata = cohort1.merge_data("dataset")
display(dataset_metadata.head())

All_Metadata_col = cohort1.merge_data("sample")
print("\nColumns/Datasets information")
display(All_Metadata_col.head())

df_real = cohort1.merge_data("data_matrix")
print("\nData Matrix")
display(df_real.head())

In GEO

query = <someSQLquery>
results = omixatlas.query_metadata(query)
dataset_ids = results['dataset_id'].tolist()
cohort1.create_cohort("/import", "geo_data", "Transcriptomics datasets", "geo", dataset_ids[0])

for i in dataset_ids[1:]:
    cohort1.add_to_cohort("geo", i)

INFO:root:Cohort Created !


Initializing process...
Adding data to cohort...
Adding metadata to cohort...


INFO:cmap_logger:Reading GCT: /import/geo_data.pco/geo_GSE120746_GPL18573.gct
INFO:root:'18' sample/s added to Cohort!


Initializing process...
Adding data to cohort...
Adding metadata to cohort...


INFO:cmap_logger:Reading GCT: /import/geo_data.pco/geo_GSE62642_GPL16791.gct
INFO:root:'14' sample/s added to Cohort!


Initializing process...
Adding data to cohort...
Adding metadata to cohort...


INFO:cmap_logger:Reading GCT: /import/geo_data.pco/geo_GSE68719_GPL11154.gct
INFO:root:'73' sample/s added to Cohort!

dataset_metadata = cohort1.merge_data("dataset")
display(dataset_metadata.head())

All_Metadata_col = cohort1.merge_data("sample")
print("\nColumns/Datasets information")
display(All_Metadata_col.head())

df_real = cohort1.merge_data("data_matrix")
print("\nData Matrix")
display(df_real.head())
