Working with Cohorts
Cohort
The Cohort class provides functions to create a cohort, add or remove datasets or samples, merge the metadata and data matrices of the datasets or samples in a cohort, and edit or delete a cohort.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
token | str | Authentication token copied from Polly. | required |
Usage
```python
from polly.cohort import Cohort

# token: your authentication token copied from Polly
cohort = Cohort(token)
```
add_to_cohort(repo_key, dataset_id=None, sample_id=None)
This function is used to add datasets or samples to a cohort.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
repo_key | str/int | repo_key (repo_name or repo_id) of the OmixAtlas to which the datasets or samples belong. | required |
dataset_id | str/list | Dataset ID, or list of dataset IDs, to be added to the cohort. | None |
sample_id | list | List of sample IDs to be added to the cohort. | None |
Returns:
Type | Description |
---|---|
None | A confirmation message stating the number of datasets or samples added to the cohort. |
Raises:
Type | Description |
---|---|
InvalidParameterException | Empty or invalid parameters. |
InvalidCohortOperationException | This operation is not valid as no cohort has been instantiated. |
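
The signature of `add_to_cohort` takes `dataset_id` and `sample_id`, and the examples on this page pass either a single dataset ID (GEO) or a list of IDs (TCGA). A minimal sketch of how such arguments could be normalized is shown below, assuming that exactly one of the two must be given. `resolve_entity_ids` and `InvalidParameterError` are hypothetical names used only for illustration; polly's own checks may differ and raise `InvalidParameterException` instead.

```python
from typing import List, Optional, Tuple, Union

IdArg = Optional[Union[str, List[str]]]


class InvalidParameterError(ValueError):
    """Hypothetical stand-in for polly's InvalidParameterException."""


def resolve_entity_ids(dataset_id: IdArg = None, sample_id: IdArg = None) -> Tuple[str, List[str]]:
    """Return ("dataset" or "sample", list of IDs) for one add_to_cohort-style call.

    Assumption: exactly one of dataset_id / sample_id is supplied, either as a
    single ID string or as a list of ID strings.
    """
    if (dataset_id is None) == (sample_id is None):
        raise InvalidParameterError("Provide exactly one of dataset_id or sample_id.")
    entity_type = "dataset" if dataset_id is not None else "sample"
    ids = dataset_id if dataset_id is not None else sample_id
    if isinstance(ids, str):
        ids = [ids]  # a single ID is treated as a one-element list
    # Reject an empty list, non-string IDs and empty strings.
    if not ids or not all(isinstance(i, str) and i for i in ids):
        raise InvalidParameterError(f"Empty or invalid {entity_type} IDs: {ids!r}")
    return entity_type, list(ids)
```

Accepting a bare string keeps one-off calls such as `add_to_cohort("geo", i)` short while still giving the rest of the code a single list type to work with.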
create_cohort(local_path, cohort_name, description, repo_key=None, dataset_id=None, sample_id=None)
This function is used to create a cohort. A Cohort object must be instantiated before a cohort can be created.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_path | str | Local path at which to instantiate the cohort. | required |
cohort_name | str | Identifier name for the cohort. | required |
description | str | Description of the cohort. | required |
repo_key | str/int | repo_key (repo_name or repo_id) of the OmixAtlas from which datasets or samples are to be added. | None |
dataset_id | str/list | Dataset ID, or list of dataset IDs, to be added to the cohort. | None |
sample_id | list | List of sample IDs to be added to the cohort. | None |
Returns:
Type | Description |
---|---|
None | A confirmation message on creation of the cohort. |
Raises:
Type | Description |
---|---|
InvalidParameterException | Empty or invalid parameters. |
InvalidCohortNameException | The cohort_name does not represent a valid cohort name. |
InvalidPathException | The provided path does not represent a file or a directory. |
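
The log lines in the GEO example (`Reading GCT: /import/geo_data.pco/geo_GSE120746_GPL18573.gct`) hint at how `create_cohort` lays a cohort out under `local_path`: a `<cohort_name>.pco` directory holding one GCT file per dataset, prefixed with the repository name. The sketch below only reproduces that naming pattern. It is inferred from the example output rather than documented, the helper names are hypothetical, and the pattern may not hold when `repo_key` is a numeric repo_id.

```python
import posixpath

# Assumption: layout inferred from the example logs, e.g.
# /import/geo_data.pco/geo_GSE120746_GPL18573.gct


def cohort_directory(local_path: str, cohort_name: str) -> str:
    """Hypothetical: directory holding the cohort, <local_path>/<cohort_name>.pco."""
    return posixpath.join(local_path, f"{cohort_name}.pco")


def dataset_gct_path(local_path: str, cohort_name: str, repo_name: str, dataset_id: str) -> str:
    """Hypothetical: one GCT file per dataset, named <repo_name>_<dataset_id>.gct."""
    return posixpath.join(cohort_directory(local_path, cohort_name), f"{repo_name}_{dataset_id}.gct")
```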
create_merged_gct(file_path, file_name='')
This function is used to merge all the GCT files in a cohort into a single GCT file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | System path where the merged GCT file is to be written. | required |
file_name | str | Identifier for the merged file name. If omitted, the cohort name is used. | '' |
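
The page does not say how the GCT files are combined. As a hedged sketch, assuming each file is a plain GCT 1.2 matrix (rows are features, columns are samples) and that merging means an outer join on feature name, the core operation could look like the following. The cohort's files may instead use GCT 1.3 with row and column metadata, which this sketch ignores; `parse_gct12` and `merge_gct12` are hypothetical helpers, not part of polly.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, Dict[str, str]]  # (description, {sample name: value})


def parse_gct12(text: str) -> Tuple[List[str], Dict[str, Row]]:
    """Parse GCT 1.2 text into (sample names, {row name: (description, values)})."""
    lines = text.strip("\n").split("\n")
    if lines[0].strip() != "#1.2":
        raise ValueError("this sketch only handles GCT version 1.2")
    samples = lines[2].split("\t")[2:]  # columns after Name and Description
    rows: Dict[str, Row] = {}
    for line in lines[3:]:
        name, description, *values = line.split("\t")
        rows[name] = (description, dict(zip(samples, values)))
    return samples, rows


def merge_gct12(texts: List[str], missing: str = "NA") -> str:
    """Outer-join several GCT 1.2 matrices on row name and return one GCT 1.2 text."""
    all_samples: List[str] = []
    merged: Dict[str, Row] = {}
    for text in texts:
        samples, rows = parse_gct12(text)
        all_samples += [s for s in samples if s not in all_samples]
        for name, (description, values) in rows.items():
            # Keep the first description seen for a feature; collect all sample values.
            _, kept_values = merged.setdefault(name, (description, {}))
            kept_values.update(values)  # a repeated sample name overwrites earlier values
    names = sorted(merged)
    out = ["#1.2", f"{len(names)}\t{len(all_samples)}",
           "\t".join(["Name", "Description"] + all_samples)]
    for name in names:
        description, values = merged[name]
        out.append("\t".join([name, description] + [values.get(s, missing) for s in all_samples]))
    return "\n".join(out) + "\n"
```

An outer join keeps every feature that appears in any file and fills the gaps with a placeholder; an inner join would instead keep only the features shared by all datasets.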
delete_cohort()
This function is used to delete a cohort.
Returns:
Type | Description |
---|---|
None | A confirmation message on deletion of the cohort. |
edit_cohort(new_cohort_name=None, new_description=None)
This function is used to edit cohort-level metadata such as the cohort name and description.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_cohort_name | str | Optional. New identifier name for the cohort. | None |
new_description | str | Optional. New description of the cohort. | None |
|
Returns:
Type | Description |
---|---|
| | A confirmation message on updating the cohort. |
Raises:
Type | Description |
---|---|
InvalidCohortOperationException | This operation is not valid as no cohort has been instantiated. |
CohortEditException | No parameter was specified for editing the cohort. |
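
`edit_cohort` accepts two optional arguments but raises `CohortEditException` when neither is given. That rule, together with the "change only what was passed" behavior implied by the `None` defaults, can be sketched as a pure function over a metadata dictionary. `apply_cohort_edit`, `CohortEditError` and the dictionary keys are hypothetical and only illustrate the contract described above.

```python
from typing import Dict, Optional


class CohortEditError(ValueError):
    """Hypothetical stand-in for polly's CohortEditException."""


def apply_cohort_edit(metadata: Dict[str, str],
                      new_cohort_name: Optional[str] = None,
                      new_description: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the cohort metadata with only the supplied fields replaced."""
    if new_cohort_name is None and new_description is None:
        raise CohortEditError("No parameter specified for editing in cohort")
    updated = dict(metadata)  # leave the caller's dictionary untouched
    if new_cohort_name is not None:
        updated["cohort_name"] = new_cohort_name
    if new_description is not None:
        updated["description"] = new_description
    return updated
```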
is_valid()
This function is used to check the validity of a cohort.
Returns:
Type | Description |
---|---|
bool | True if the cohort is valid, False otherwise. |
Raises:
Type | Description |
---|---|
InvalidPathException | Cohort path does not represent a file or a directory. |
InvalidCohortOperationException | This operation is not valid as no cohort has been instantiated. |
load_cohort(local_path)
Function to load an existing cohort into an object. Once the cohort is loaded, the functions described in this documentation can be called on that object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_path | str | Local path of the cohort. | required |
Returns:
Type | Description |
---|---|
| | A confirmation message on instantiation of the cohort. |
Raises:
Type | Description |
---|---|
InvalidPathException | This path does not represent a file or a directory. |
InvalidCohortPathException | This path does not represent a Cohort. |
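
The Raises tables of `load_cohort` and `is_valid` distinguish a path that does not exist from a path that exists but is not a cohort. A sketch of that two-step check follows. It assumes, based only on the `.pco` directories in the example logs, that a cohort is a directory whose name ends in `.pco`; the real check may inspect the directory contents instead. The function and exception names are hypothetical.

```python
import os


class InvalidPathError(ValueError):
    """Hypothetical stand-in for polly's InvalidPathException."""


class InvalidCohortPathError(ValueError):
    """Hypothetical stand-in for polly's InvalidCohortPathException."""


def check_cohort_path(local_path: str) -> str:
    """Validate a path before loading it as a cohort and return the normalized path."""
    # Step 1: the path must exist as a file or a directory.
    if not (os.path.isfile(local_path) or os.path.isdir(local_path)):
        raise InvalidPathError(f"{local_path} does not represent a file or a directory")
    path = os.path.normpath(local_path)  # also drops a trailing separator
    # Step 2 (assumption): a cohort is a directory whose name ends in ".pco".
    if not (os.path.isdir(path) and path.endswith(".pco")):
        raise InvalidCohortPathError(f"{local_path} does not represent a Cohort")
    return path
```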
merge_data(data_level)
Function to merge metadata (dataset, sample and feature level metadata) or data-matrix of all the samples or datasets in the cohort.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_level | str | Identifier specifying the data to be merged: "dataset", "sample", "feature" or "data_matrix". | required |
Returns:
Type | Description |
---|---|
pandas.DataFrame | A pandas DataFrame containing the merged data, ready for analysis. |
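
For the metadata levels, merging presumably means stacking each dataset's rows and taking the union of their columns, as `pandas.concat` does with its default outer join. The dependency-free sketch below shows that behavior on plain dictionaries and tags each row with the dataset it came from. `merge_records` and the `src_dataset_id` column are hypothetical; the DataFrame returned by `merge_data` may name and order its columns differently.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def merge_records(per_dataset: Dict[str, List[Record]]) -> List[Record]:
    """Stack per-dataset metadata rows into one table using the union of all columns."""
    columns: List[str] = ["src_dataset_id"]
    for records in per_dataset.values():
        for record in records:
            columns += [key for key in record if key not in columns]
    merged: List[Record] = []
    for dataset_id, records in per_dataset.items():
        for record in records:
            row: Record = {col: record.get(col) for col in columns}  # absent columns become None
            row["src_dataset_id"] = dataset_id
            merged.append(row)
    return merged
```

Columns that only some datasets provide (for example a disease annotation) end up empty for the others, which is why merged metadata often needs a cleaning pass before analysis.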
remove_from_cohort(dataset_id=None, sample_id=[])
This function is used for removing datasets or samples from a cohort.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_id | list | List of dataset IDs to be removed from the cohort. | None |
sample_id | list | List of sample IDs to be removed from the cohort. | [] |
Returns:
Type | Description |
---|---|
None | A confirmation message on removal of datasets or samples from the cohort. |
Raises:
Type | Description |
---|---|
InvalidParameterException | Empty or invalid parameters. |
InvalidCohortOperationException | This operation is not valid as no cohort has been instantiated. |
summarize_cohort()
Function to return cohort-level metadata and a dataframe of the datasets or samples added to the cohort.
Returns:
Type | Description |
---|---|
tuple | A tuple whose first value is the cohort metadata (name, description and number of datasets or samples in the cohort) and whose second value is a dataframe containing the source, dataset_id or sample_id, and data type of each entry in the cohort. |
Raises:
Type | Description |
---|---|
InvalidCohortOperationException | This operation is not valid as no cohort has been instantiated. |
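
Because `summarize_cohort` returns a two-element tuple, its result is naturally unpacked into two names. The sketch below builds a value of the same shape from an in-memory list, with a list of plain dictionaries standing in for the DataFrame. The function `summarize`, its field names and the sample entry are all hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, str]


def summarize(name: str, description: str, entries: List[Entry]) -> Tuple[Dict[str, object], List[Entry]]:
    """Return (cohort metadata, per-entry table), mirroring the two-part summarize_cohort() result."""
    metadata: Dict[str, object] = {
        "cohort_name": name,
        "description": description,
        "number_of_entries": len(entries),
    }
    # One row per dataset or sample: where it came from, its ID and its data type.
    table = [{"source": e["source"], "entity_id": e["entity_id"], "data_type": e["data_type"]}
             for e in entries]
    return metadata, table


metadata, table = summarize(
    "geo_data", "Transcriptomics datasets",
    [{"source": "geo", "entity_id": "GSE62642_GPL16791", "data_type": "Transcriptomics"}],
)
```

With a real cohort the unpacking is the same: `metadata, df = cohort1.summarize_cohort()`.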
Examples
In TCGA
```text
Query execution succeeded (time taken: 2.13 seconds, data scanned: 0.244 MB)
Fetched 123 rows
```
```python
# `results` holds the 123 rows fetched by the metadata query whose output is shown above;
# cohort1 is a Cohort object, e.g. cohort1 = Cohort(token)
dataset_ids = results['dataset_id'].tolist()
cohort1.create_cohort("/import", "tcga_data", "Proteomics datasets", "tcga", dataset_ids)
```
```text
INFO:root:Cohort Created !
Initializing process...
Verifying Data: 100%|██████████| 123/123 [00:11<00:00, 10.71it/s]
Adding data to cohort: 100%|██████████| 123/123 [00:14<00:00, 8.72it/s]
Adding metadata to cohort: 100%|██████████| 123/123 [00:11<00:00, 10.25it/s]
INFO:root:'123' dataset/s added to Cohort!
```
```python
from IPython.display import display  # available by default in Jupyter notebooks

All_Metadata_col = cohort1.merge_data("sample")
print("\nColumns/Datasets information")
display(All_Metadata_col.head())
```
In GEO
```python
# `results` holds the rows returned by a metadata query on GEO
dataset_ids = results['dataset_id'].tolist()
# Create the cohort with the first dataset, then add the remaining datasets one at a time
cohort1.create_cohort("/import", "geo_data", "Transcriptomics datasets", "geo", dataset_ids[0])
for i in dataset_ids[1:]:
    cohort1.add_to_cohort("geo", i)
```
```text
INFO:root:Cohort Created !
Initializing process...
Adding data to cohort...
Adding metadata to cohort...
INFO:cmap_logger:Reading GCT: /import/geo_data.pco/geo_GSE120746_GPL18573.gct
INFO:root:'18' sample/s added to Cohort!
Initializing process...
Adding data to cohort...
Adding metadata to cohort...
INFO:cmap_logger:Reading GCT: /import/geo_data.pco/geo_GSE62642_GPL16791.gct
INFO:root:'14' sample/s added to Cohort!
Initializing process...
Adding data to cohort...
Adding metadata to cohort...
INFO:cmap_logger:Reading GCT: /import/geo_data.pco/geo_GSE68719_GPL11154.gct
INFO:root:'73' sample/s added to Cohort!
```
```python
from IPython.display import display  # available by default in Jupyter notebooks

All_Metadata_col = cohort1.merge_data("sample")
print("\nColumns/Datasets information")
display(All_Metadata_col.head())
```