Atlas
How does an Atlas work?
An atlas is a collection of tables. Each table contains data about a particular clinical factor or other criterion. An atlas can contain multiple tables.
You can access it through the Atlas class.
from polly.auth import Polly
from polly.atlas import Atlas, Table, Column
Polly.auth("<access_key>")
atlas = Atlas(atlas_id="atlas_1")
Here's how you can list all your atlases.
Atlas.list_atlases()
# [
# Atlas(atlas_id=test_polly_atlas),
# Atlas(atlas_id=data_model_demo),
# Atlas(atlas_id=test_atlas)
# ]
Here's how you create a new atlas.
Here's how you list the tables inside the atlas.
atlas.list_tables()
#[
# Table(
# name='gene_table',
# columns=[
# Column(name='gene', col_type='string', constraint='PRIMARY KEY'),
# Column(name='basemean', col_type='float', constraint='None'),
# Column(name='log2foldchange', col_type='float', constraint='None'),
# Column(name='lfcse', col_type='float', constraint='None'),
# Column(name='stat', col_type='float', constraint='None'),
# Column(name='pvalue', col_type='float', constraint='None'),
# Column(name='padj', col_type='float', constraint='None'),
# Column(name='negative_log10_padj', col_type='float', constraint='None'),
# Column(name='data_type', col_type='string', constraint='None'),
# Column(name='dataset_id', col_type='string', constraint='None')
# ]
# ),
# Table(
# name='patient',
# columns=[
# Column(name='curated_patient_id', col_type='string', constraint='PRIMARY KEY'),
# Column(name='alcohol_history', col_type='boolean', constraint='None'),
# Column(name='alcohol_intensity', col_type='string', constraint='None'),
# Column(name='tobacco_smoking', col_type='integer', constraint='None')
# ]
# )
#]
Each table is uniquely identified by its atlas_id and name, and has other attributes associated with it.
table = atlas.get_table(table_name='patient')
print(table)
#Table(
# name='patient',
# columns=[
# Column(name='curated_patient_id', col_type='string', constraint='PRIMARY KEY'),
# Column(name='alcohol_history', col_type='boolean', constraint='None'),
# Column(name='alcohol_intensity', col_type='string', constraint='None'),
# Column(name='tobacco_smoking', col_type='integer', constraint='None')
# ]
#)
Use the query function to run a SQL query against any table in the atlas.
atlas.query("SELECT * FROM patient LIMIT 10;")
# alcohol_history alcohol_intensity tobacco_smoking
# curated_patient_id
# patient_1718265078961285000 False None 4
# patient_1718265078961325000 True Moderate 2
# patient_1718265078961336000 False Moderate 2
# patient_1718265078961344000 False Low 9
# patient_1718265078961350000 False Low 4
# patient_1718265078961358000 False Low 7
# patient_1718265078961364000 False None 0
# patient_1718265078961371000 True Low 1
# patient_1718265078961377000 True None 7
# patient_1718265078961385000 True Moderate 4
To add a new table to an atlas
columns = [
Column(name="curated_patient_id", col_type="string", constraint="PRIMARY KEY"),
Column(name="alcohol_history", col_type="boolean", constraint=None),
Column(name="alcohol_intensity", col_type="string"),
Column(name="tobacco_smoking", col_type="integer"),
]
new_table = atlas.create_table(table_name="patient_exposure", columns=columns)
print(new_table)
# Table(
# name='patient_exposure',
# columns = [
# Column(name="curated_patient_id", col_type="string", constraint="PRIMARY KEY"),
# Column(name="alcohol_history", col_type="boolean", constraint=None),
# Column(name="alcohol_intensity", col_type="string"),
# Column(name="tobacco_smoking", col_type="integer"),
# ]
# )
To add a table from a dataframe to an Atlas, first set the dataframe's index. The index is used as the primary key of the table
import pandas as pd
# Data cleaning
df = pd.read_csv('path/to/exposure_file.csv')
df.set_index("curated_patient_id", inplace=True)
# The curated_patient_id index will be used as the primary key of the table `exposure`
atlas.create_table_from_df(table_name="exposure", df=df)
Here's how you can delete an existing table from an atlas
How do Tables work?
A table represents a database table and stores the user's data.
You can access it through the Table class.
from polly.auth import Polly
from polly.atlas import Atlas, Table, Column
Polly.auth("<access_key>")
exposure_table = Table(atlas_id="atlas_1", name="patient_exposure")
print(exposure_table)
# Table(
# name='patient_exposure',
# columns = [
# Column(name="curated_patient_id", col_type="string", constraint="PRIMARY KEY"),
# Column(name="alcohol_history", col_type="boolean", constraint=None),
# Column(name="alcohol_intensity", col_type="string"),
# Column(name="tobacco_smoking", col_type="integer"),
# ]
# )
To view the first 5 rows of the table
df = exposure_table.head()
print(df)
# curated_patient_id alcohol_history alcohol_intensity tobacco_smoking
# P0031 True High 10
# P0032 False None 0
# P0033 True Moderate 5
# P0034 True Low 2
# P0035 False None 0
To add a new column to the table
bmi_column = exposure_table.add_column(Column(name="bmi", col_type="integer"))
print(bmi_column)
# Column(name='bmi', col_type='integer', constraint='NONE')
To delete an existing column
To iterate over the rows of the table (rows are returned in batches of 500 records)
for page in exposure_table.iter_rows():
    for record in page:
        print(record)
#{'curated_patient_id': 'P0031', 'alcohol_history': True, 'alcohol_intensity': 'High', 'tobacco_smoking': 10, 'bmi': None}
#{'curated_patient_id': 'P0032', 'alcohol_history': False, 'alcohol_intensity': 'None', 'tobacco_smoking': 0, 'bmi': None}
#{'curated_patient_id': 'P0033', 'alcohol_history': True, 'alcohol_intensity': 'Moderate', 'tobacco_smoking': 5, 'bmi': None}
#{'curated_patient_id': 'P0034', 'alcohol_history': True, 'alcohol_intensity': 'Low', 'tobacco_smoking': 2, 'bmi': None}
#{'curated_patient_id': 'P0035', 'alcohol_history': False, 'alcohol_intensity': 'None', 'tobacco_smoking': 0, 'bmi': None}
To load the entire table data into a dataframe
df = exposure_table.to_df()
# curated_patient_id alcohol_history alcohol_intensity tobacco_smoking bmi
# P0031 True High 10 None
# P0032 False None 3 None
# P0033 True Moderate 5 None
# P0034 True Low 2 None
# P0035 False None 1 None
# P0036 True None 2 None
# P0037 False None 0 None
API Reference
Atlas
get_name
list_tables
Retrieves the list of tables associated with an Atlas.
Returns:
Type | Description |
---|---|
List[Table] | A list of Table objects representing the tables associated with an Atlas. |
Examples:
get_table
Retrieves a specific table object by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name | str | The name of the table to retrieve. | required |
Returns:
Type | Description |
---|---|
Table | The Table object representing the specified table. |
Notes
This loads the table object, not the table data. Use the to_df() function to load the data.
Examples:
create_table
Creates a new table with the specified name and columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name | str | The name of the new table to create. | required |
columns | List[Column] | A list of Column objects representing the columns of the new table. | required |
rows | list | A list of key-value pairs representing the table data. | None |
Returns:
Type | Description |
---|---|
Table | The newly created Table object. |
Examples:
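A minimal sketch of preparing the columns argument for create_table. It assumes the col_type strings that appear elsewhere on this page ('string', 'integer', 'float', 'boolean') and an assumed mapping from Python value types to them. The helper infer_column_specs is hypothetical and not part of polly. It only builds plain dictionaries that mirror the name, col_type, and constraint arguments of Column.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping from Python types to the col_type strings used on this page.
# bool comes first because bool is a subclass of int in Python.
_TYPE_NAMES = [(bool, "boolean"), (int, "integer"), (float, "float"), (str, "string")]


def infer_column_specs(sample: Dict[str, Any], primary_key: str) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: derive column specs from one sample record."""
    if primary_key not in sample:
        raise ValueError(f"primary key {primary_key!r} is not a key of the sample record")
    specs = []
    for name, value in sample.items():
        for py_type, col_type in _TYPE_NAMES:
            if isinstance(value, py_type):
                break
        else:
            raise TypeError(f"unsupported value type for column {name!r}: {type(value).__name__}")
        constraint = "PRIMARY KEY" if name == primary_key else None
        specs.append({"name": name, "col_type": col_type, "constraint": constraint})
    return specs


sample_row = {
    "curated_patient_id": "P0031",
    "alcohol_history": True,
    "alcohol_intensity": "High",
    "tobacco_smoking": 10,
}
specs = infer_column_specs(sample_row, primary_key="curated_patient_id")
for spec in specs:
    print(spec)

# With polly installed and authenticated, each spec could be turned into a Column:
# columns = [Column(**spec) for spec in specs]
# atlas.create_table(table_name="patient_exposure", columns=columns)
```

Inferring types from a single record is only a starting point. A column whose sample value is None cannot be typed this way and raises TypeError, so such columns need an explicit col_type.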
create_table_from_df
Creates a new table with the specified table name and a schema derived from the Pandas DataFrame.
Optionally loads the data into the table.
Raises a validation error if a column's datatype is not supported.
Supported column types are int, float, bool, and object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name | str | The name of the new table to create. | required |
df | DataFrame | A Pandas DataFrame representing the data and schema for the new table. | required |
Returns:
Type | Description |
---|---|
Table | The newly created Table object, showing the first 5 rows of the table. |
Examples:
delete_table
query
Executes a query on the Atlas tables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | The SQL query to execute. | required |
Returns:
Type | Description |
---|---|
Union[DataFrame, List[Dict]] | The result of the query execution. |
Examples:
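A rough way to prototype a query string locally before sending it to an atlas. This sketch uses Python's built-in sqlite3 module as a stand-in. The engine and SQL dialect behind Atlas are not stated on this page, so treat the local result only as a sanity check of the query's shape. The table layout and the three rows mirror the patient table shown earlier.

```python
import sqlite3

# Local stand-in for an Atlas table. sqlite3 is an assumption for illustration only;
# it is not claimed to be what Atlas uses.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like mappings keyed by column name
conn.execute(
    """
    CREATE TABLE patient (
        curated_patient_id TEXT PRIMARY KEY,
        alcohol_history INTEGER,   -- sqlite has no boolean type; 0/1 stands in
        alcohol_intensity TEXT,
        tobacco_smoking INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?)",
    [
        ("patient_1718265078961285000", 0, None, 4),
        ("patient_1718265078961325000", 1, "Moderate", 2),
        ("patient_1718265078961336000", 0, "Moderate", 2),
    ],
)

sql = "SELECT * FROM patient WHERE alcohol_intensity = 'Moderate' LIMIT 10;"
records = [dict(row) for row in conn.execute(sql)]
for record in records:
    print(record)
conn.close()

# Against a real atlas, the same string would be passed to the documented method:
# result = atlas.query(sql)
```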
Table
Attributes:
Name | Type | Description |
---|---|---|
atlas_id | str | The unique identifier for the Atlas |
name | str | The name of the table |
columns | List[Column] | List of columns in the table |
__init__
Initializes an instance of a Table with the unique identifier atlas_id, the table name, and an optional list of columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
atlas_id | str | The unique identifier for the Atlas. | required |
name | str | The name of the table to be initialized. | required |
columns | List[Column] | List of column objects representing the columns in the table. | None |
Examples:
list_columns
Retrieve the list of columns associated with the table.
Returns:
Type | Description |
---|---|
List[Column] | A list of Column objects representing the columns in the table. |
Examples:
get_column
Retrieves a specific column from the table based on its name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | The name of the column to retrieve. | required |
Returns:
Type | Description |
---|---|
Column | The Column object representing the specified column. |
Raises:
Type | Description |
---|---|
ValueError | If no column with the specified name is found in the table. |
Examples:
add_column
delete_column
add_rows
Adds new rows to the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rows | List[dict] | A list of key-value pairs representing rows to be added. | required |
Examples:
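A small sketch of checking row dictionaries before passing them to add_rows. The helper check_rows is hypothetical and not part of polly. It only confirms that every key is a known column name, using the columns of the patient_exposure table from earlier on this page.

```python
from typing import Any, Dict, Iterable, List


def check_rows(rows: List[Dict[str, Any]], column_names: Iterable[str]) -> None:
    """Hypothetical pre-flight check: every key in every row must be a known column."""
    known = set(column_names)
    for position, row in enumerate(rows):
        unknown = sorted(set(row) - known)
        if unknown:
            raise KeyError(f"row {position} has unknown columns: {unknown}")


# Column names of the patient_exposure table used earlier on this page.
column_names = ["curated_patient_id", "alcohol_history", "alcohol_intensity", "tobacco_smoking"]

rows = [
    {"curated_patient_id": "P0036", "alcohol_history": True, "alcohol_intensity": None, "tobacco_smoking": 2},
    {"curated_patient_id": "P0037", "alcohol_history": False, "alcohol_intensity": None, "tobacco_smoking": 0},
]
check_rows(rows, column_names)  # raises KeyError on a misspelled column name

# With polly installed and authenticated, the checked rows would then be added:
# exposure_table.add_rows(rows)
```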
delete_rows
Deletes rows from the table based on their primary key values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rows | List[dict] | A list of key-value pairs representing rows to delete, where the key is the primary key column name and value is the corresponding entry. | required |
Examples:
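A minimal sketch of building the rows argument described in the parameter table: one dictionary per row, keyed by the primary key column name. The helper primary_key_rows is hypothetical and not part of polly.

```python
from typing import Any, Dict, Iterable, List


def primary_key_rows(pk_column: str, values: Iterable[Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: one {primary key column: value} dict per row to delete."""
    return [{pk_column: value} for value in values]


rows = primary_key_rows("curated_patient_id", ["P0031", "P0032"])
print(rows)
# [{'curated_patient_id': 'P0031'}, {'curated_patient_id': 'P0032'}]

# With polly installed and authenticated:
# exposure_table.delete_rows(rows)
```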
update_rows
Updates rows in the table based on provided row data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rows | List[dict] | A list of dictionaries representing the rows to update. | required |
Examples:
head
iter_rows
Iterates over the rows of the table in a paginated manner.
Yields:
Type | Description |
---|---|
List[Dict[str, Any]] | A list of dictionaries representing rows of the table, with column names as keys and corresponding values. |
Examples:
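A sketch of consuming paginated output of this shape. Because a live table is needed to call iter_rows, the generator fake_iter_rows below is a local stand-in that yields lists of dictionaries in pages of 500, the batch size mentioned earlier on this page. The records are made up purely to exercise the paging.

```python
from itertools import chain
from typing import Any, Dict, Iterator, List

PAGE_SIZE = 500  # batch size mentioned earlier on this page for iter_rows


def fake_iter_rows(records: List[Dict[str, Any]], page_size: int = PAGE_SIZE) -> Iterator[List[Dict[str, Any]]]:
    """Stand-in for Table.iter_rows(): yields the records one page at a time."""
    for start in range(0, len(records), page_size):
        yield records[start:start + page_size]


# 1,200 made-up records, purely to exercise the paging.
records = [{"curated_patient_id": f"P{i:04d}", "tobacco_smoking": i % 10} for i in range(1200)]

page_sizes = [len(page) for page in fake_iter_rows(records)]
print(page_sizes)  # [500, 500, 200]

# Flatten the pages into a single stream of rows without building one big list.
smokers = sum(1 for row in chain.from_iterable(fake_iter_rows(records)) if row["tobacco_smoking"] > 0)
print(smokers)  # 1080
```

Processing page by page keeps memory use bounded, which is the main reason to prefer this style over to_df() for large tables.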
Column
Attributes:
Name | Type | Description |
---|---|---|
name | str | The name of the column |
col_type | str | The type of the column |
constraint | Optional[str] | The constraint on the column (optional). Can be one of ["PRIMARY KEY", None]. |
__init__
Initializes a Column instance with a given name, type, and optional constraint.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the column | required |
col_type | str | The type of the column | required |
constraint | Optional[str] | The constraint on the column. If not provided, it will be set to None. | None |
Examples: