Data Management
Data
The OmixAtlas class contains functions to add, update, or delete datasets in an omixatlas, or save datasets from polly notebooks to workspace.
add_datasets(repo_id, source_folder_path, destination_folder_path='', priority='low')
This function is used to add a new data into an OmixAtlas. Once user runs this function successfully, they should be able to see the ingestion status on the data ingestion monitoring dashboard after ~15 mins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
repo_id |
str/int
|
repo_id for that Omixatlas |
required |
source_folder_path |
dict
|
source folder paths from data and metadata files are fetched. In this dictionary, there should be two keys called "data" and "metadata" with value consisting of folders where data and metadata is stored respectively. |
required |
destination_folder_path |
str
|
Destination folder structure in s3. |
''
|
priority |
str
|
Priority at which this data has to be ingested into the OmixAtlas. |
'low'
|
Raises:
Type | Description |
---|---|
paramError
|
If Params are not passed in the desired format or value not valid. |
RequestException
|
If there is issue in data ingestion. |
Returns:
Type | Description |
---|---|
pd.DataFrame
|
pd.DataFrame: DataFrame showing Upload Status of Files |
dataset_metadata_template(repo_key, source='all', data_type='all')
This function is used to fetch the template of dataset level metadata in a given OmixAtlas. In order to ingest the dataset level metadata appropriately in the OmixAtlas, the user needs to ensure the metadata json files contains the keys as per the dataset level schema of the OmixAtlas.
In order to ingest the dataset level metadata appropriately in the OmixAtlas, the user needs to ensure the json files contains the keys as per the dataset level schema of the OmixAtlas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
repo_key |
str/int
|
repo_key(repo_name/repo_id) for that Omixatlas |
required |
source |
all, optional)
|
Source/Sources present in the schema. Default value is "all" |
'all'
|
data_type |
all, optional)
|
Datatype/Datatypes present in the schema. Default value is "all" |
'all'
|
delete_datasets(repo_id, dataset_ids)
This function is used to delete datasets from an OmixAtlas. Once user runs this function successfully, they should be able to see the ingestion status on the data ingestion monitoring dashboard after ~15 mins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
repo_id |
str/int
|
repo_id for that Omixatlas |
required |
dataset_ids |
list
|
list of dataset_ids that users want to delete. |
required |
Raises:
Type | Description |
---|---|
paramError
|
If Params are not passed in the desired format or value not valid. |
RequestException
|
If there is issue in data ingestion. |
save_to_workspace(repo_id, dataset_id, workspace_id, workspace_path)
Function to download a dataset from OmixAtlas and save it to Workspaces.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
repo_id |
str
|
repo_id of the Omixatlas |
required |
dataset_id |
str
|
dataset id that needs to be saved |
required |
workspace_id |
int
|
workspace id in which the dataset needs to be saved |
required |
workspace_path |
str
|
path where the workspace resides |
required |
Returns:
Name | Type | Description |
---|---|---|
json |
json
|
Info about workspace where data is saved and of which Omixatlas |
update_datasets(repo_id, source_folder_path, destination_folder_path='', priority='low')
This function is used to update an existing data into an OmixAtlas. Once user runs this function successfully, they should be able to see the ingestion status on the data ingestion monitoring dashboard after ~15 mins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
repo_id |
str/int
|
repo_id for that Omixatlas |
required |
source_folder_path |
dict
|
source folder paths from data and metadata files are fetched. In this dictionary, there should be two keys called "data" and "metadata" with value consisting of folders where data and metadata is stored respectively. |
required |
destination_folder_path |
str
|
Destination folder structure in s3. Users should use this only when they want to manage the folder structure in the backend. It is advised to not not give any value for this, by default the data goes in root folder. |
''
|
priority |
str
|
Priority at which this data has to be ingested into the OmixAtlas. The default value is "low". Acceptable values are "medium" and "high". |
'low'
|
Raises:
Type | Description |
---|---|
paramError
|
If Params are not passed in the desired format or value not valid. |
RequestException
|
If there is issue in data ingestion. |
Returns:
Type | Description |
---|---|
pd.DataFrame
|
pd.DataFrame: DataFrame showing Upload Status of Files |