Release Notes

May 30, 2025

Releases

Polly Knowledge Graph (Polly KG)
- Polly KG is a cutting-edge knowledge graph-based solution that organizes biological data into a structured, interconnected framework. Using this, users can generate hypotheses and gain a comprehensive understanding of data related to specific targets, indications, and drug classes.
  - Key Features
    - Unified Knowledge Base: Combines internal and public biomedical data into a harmonized graph model.
    - Entity Linking & Relationship Mapping: Connects genes, proteins, diseases, and drugs in a queryable network.
    - Multi-Modal Support: Integrates evidence from omics, clinical trials, and literature.
    - Querying using Natural Language:
      - Polly Co-Scientist is an AI-driven assistant that simplifies complex biomedical data analysis by converting natural language queries into Cypher commands that run against Polly's Knowledge Graph, so users do not need to write any code.

April 8, 2025

Releases

Atlas
- This release brings major improvements to the Atlas service of Polly-Python, enhancing flexibility, data integrity, and relational modeling capabilities. The focus is on enabling better schema design and supporting a wider range of data types, making Atlas more powerful for structured data use cases.
  - Key Features
    - Expanded Data Type Support: Atlas now supports a broader range of data types: Numeric, Character, Boolean, Date/Time, JSON, and Monetary. This enables more expressive and flexible data modeling.
    - Foreign Key Relationships: Support for one-to-one, many-to-one, and many-to-many relationships enables more powerful relational querying and ensures consistency across tables.
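
The consistency guarantee that foreign keys provide can be illustrated without the Atlas API itself. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. The table and column names (`patients`, `samples`, `patient_id`, and so on) are hypothetical, and the sketch makes no claim about how Atlas implements or exposes these relationships.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a many-to-one relationship."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE patients (patient_id TEXT PRIMARY KEY, enrolled_on TEXT)"
    )
    conn.execute(
        """
        CREATE TABLE samples (
            sample_id  TEXT PRIMARY KEY,
            patient_id TEXT NOT NULL REFERENCES patients(patient_id),
            is_tumor   INTEGER NOT NULL CHECK (is_tumor IN (0, 1)),
            read_count INTEGER
        )
        """
    )
    conn.execute("INSERT INTO patients VALUES ('P1', '2025-01-15')")
    # Many samples can point at one patient (many-to-one).
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [("S1", "P1", 1, 1200), ("S2", "P1", 0, 950)],
    )
    return conn


def insert_orphan_sample(conn: sqlite3.Connection) -> bool:
    """Try to insert a sample for a missing patient; return True if rejected."""
    try:
        conn.execute("INSERT INTO samples VALUES ('S3', 'P404', 0, 10)")
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_demo_db()
samples_per_patient = conn.execute(
    "SELECT p.patient_id, COUNT(s.sample_id) FROM patients p "
    "JOIN samples s ON s.patient_id = p.patient_id GROUP BY p.patient_id"
).fetchall()
orphan_rejected = insert_orphan_sample(conn)
```

In this sketch the join works because every sample row is guaranteed to reference an existing patient, and the insert of a sample with an unknown `patient_id` is refused rather than silently stored.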

November 29, 2024

Releases

Atlas
- Atlas combines the flexibility and user experience of spreadsheets with the scalability, data integrity, and queryability of relational databases, designed to handle data management and analysis needs at scale.
- Atlas is a powerful tool within the Polly-Python ecosystem that allows you to create and manage collections of tables with user-defined schemas.
  - Key Features
    - Flexible Table Schemas: Customize your data structure to fit your organization's specific workflows.
    - Programmatic Data Ingestion: Easily import data into your tables using Python code.
    - Advanced Querying: Perform complex queries across multiple tables within an atlas for granular results.
    - Multi-omics data: Atlas is particularly useful for handling omics data, clinical data, patient data, and non-omics assay data.

September 2, 2024

Update

We've added CITE-Seq to our repertoire of offerings. Polly can now import CITE-Seq data from diverse public sources including Gene Expression Omnibus, Human Cell Atlas, Single Cell Portal, CZI, and publications. We can cater to data from these public sources and data generated in-house by research groups. The AI-ready CITE-Seq datasets are processed with our standardized processing pipeline, annotated with metadata fields, and quality-checked against a comprehensive checklist of metrics defined specifically for single-cell and CITE-Seq data.
- Similar to Single-cell RNAseq data, CITE-Seq data can also be requested in two formats: Raw unfiltered counts and custom-processed counts.
- CITE-Seq data include two counts matrices (for protein and RNA modalities) and any control antibodies included in the antibody panel used for measuring the surface proteins.
- These AI-ready datasets are ideal for several downstream use cases, including accurately understanding cell types and cell populations, immune profiling, or Integrative Multi-omics analysis.

July 5, 2024

Releases

Polly Design System
- We are thrilled to announce the release of Polly's new design system. This update focuses on improving the user experience without changing existing features, providing a more polished visual interface.
  - Key Features
    - Responsive and Interactive Design: We have enhanced the responsiveness and interactivity of our platform, ensuring that it adapts seamlessly to various screen sizes. This helps to create an optimal viewing experience for all users.
    - Symmetrical and Readable Layout: Our new symmetrical layout creates a balanced and harmonious visual composition, making the interface design feel more organized and easier to navigate. We have also improved readability through typography, font size, and color choices, creating a clear and concise hierarchy of information.
    - Helpful and Guiding User Experience: We have implemented smart onboarding, guides, help content, and documentation to provide users with a clear path to follow. This helps them understand the basic features and functions of our platform, making it easier for them to get started and achieve their goals.
  - Improvements
    - Enhanced User Experience: The new design system is designed to enhance the overall user experience, making it easier and more intuitive for users to navigate and interact with the platform.
    - Consistency and Usability: By refining the size, state, and variation of UI components, the design system ensures consistency and improves usability across the platform.
    - Improved Legibility: Our well-structured text hierarchy improves the overall legibility of the design, making it easier for users to read and comprehend the content.
    - Improved Visual Appearance: The design system provides a more refined visual appearance, updating the platform's look and feel to a more modern and sophisticated design.

May 31, 2024

Releases

Polly Notebooks
- We are excited to announce the integration of SageMaker into our platform, replacing the existing Polly notebook infrastructure. SageMaker is a cloud-based platform that enhances the user experience of notebooks. This update aims to improve the stability, performance, and usability of the notebooks.
  - Major changes are:
    - Machine Options: Machine options on notebooks have been renamed for better categorization.
    - Docker integration: Build, run, and develop Docker containers directly within Polly Notebooks.
    - Built-in package installers for Notebook packages and library installation in Python.
    - The terminal and file explorer are now directly available on the toolbox.

For details, refer to the Polly notebooks section in the documentation.

April 1, 2024

Releases

Polly Data Model
- This release introduces a structured framework that integrates disparate data sources, making them interoperable and analysis-ready.
- The Data Model preserves vital connections between projects, cases, clinical data, and molecular data, ensuring precise linkage to the respective data file objects. It accommodates any number of entities based on complexity and needs.
- This well-designed data model optimizes data storage and retrieval, enhancing performance and reducing the time required to access and integrate relevant information.
  - Data types: We support transcriptomics, proteomics, mutation, copy number, structural variants, single cell, and assay data.
  - Intended users: Organizations that generate multiomics data and perform integrative analysis with multiple data types to generate insights.

Update

We've added spatial transcriptomics to our repertoire of offerings. Polly can now import Spatial Transcriptomics (SRT) data from diverse public sources including GEO, Single Cell Portal, Zenodo, CZI-CellxGene, and publications. We can cater to data from these public sources as well as data generated in-house by research groups. To import SRT data successfully, access to essential components such as the raw counts matrix, spatial coordinates, imaging data, and metadata is required.
- Similar to Single cell RNAseq data, Spatial data can also be requested in two formats: Raw unfiltered counts and Custom processed counts.

February 26, 2024

Updates

Users can now select file formats for downloading datasets based on their requirements. This enhancement allows users to choose the format that best aligns with their downstream analysis. For example: Users can now download single-cell datasets in multiple file formats. With this update, users have the flexibility to directly download either or both h5ad and seurat files from the UI.

Users can now access enhanced cell type visualization details in the single-cell datasets, where they can overlay one metadata attribute on another. In this update, we have included curated fields alongside curated cell type and the number of cells. Within each of these curated disease fields, users can view the corresponding cell numbers and overlay additional metadata attributes.

November 27, 2023

Releases

Polly Verified
- Polly ensures that all available datasets have undergone standardization checks, establishing that every dataset adheres to uniform formats, ontologies, and controlled vocabularies, eliminating inconsistencies.
- Polly’s verified QA ensures that each dataset undergoes rigorous quality checks.
- Metadata Quality Checks include:
  - Schema Compliance and Lexical Errors
  - Ontology-backed fields validation
  - Logical Errors reconciliation
  - Completeness Check ensuring no empty fields

Update

Users can now track curation metrics to identify inefficiencies at different points of the curation journey.

October 20, 2023

Releases

CellxGene VIP
- Users can access the open-source CellxGene visualization plug-in (VIP) tool directly on the Polly platform.
- Users can now maximize the benefits of the interactive web browser-based interface for setting analysis parameters, visualizing data, and utilizing the command line interface functionalities.
- Users can use the Spatial Transcriptomics Visualization Analytical Tool for interactive visualization features like Embedding plots, Multi-tSNE/UMAP plots, Dual-gene plots, Sankey Diagram, Trajectory Analysis Plots, and Stacked violin plots.

Polly Discover
- Polly Discover is a suite of modules enabling users to unravel complex biology with expression signatures derived from omics data.
- It features interactive visualizations for the seamless interpretation of expression results, allowing users to enrich these findings with existing knowledge bases or integrate them into meta-analysis methods, ML applications, and more.

Update

Polly-Python:
- Users can sync the data between a workspace and a local directory, with the added functionality of automatic folder creation if the designated path does not exist.
The monitoring dashboard now features an upgraded Status filter, allowing users to allocate statuses to runs marked as partially completed or rejected.
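
The automatic folder creation mentioned for the sync feature can be sketched with the standard library alone. This is an illustration under stated assumptions, not the polly-python implementation: a local directory stands in for the workspace, and `sync_to_local` and `demo` are hypothetical helper names.

```python
import shutil
import tempfile
from pathlib import Path


def sync_to_local(source_dir: Path, target_dir: Path) -> list[str]:
    """Copy files from source_dir into target_dir, creating target_dir if needed."""
    # Create the designated path (and any missing parents) if it does not exist.
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if src.is_file():
            dest = target_dir / src.relative_to(source_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(dest.relative_to(target_dir).as_posix())
    return copied


def demo() -> list[str]:
    """Sync a small stand-in 'workspace' into a path that does not exist yet."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        workspace = root / "workspace_copy"
        (workspace / "results").mkdir(parents=True)
        (workspace / "results" / "counts.csv").write_text("gene,count\n")
        return sync_to_local(workspace, root / "local" / "project_a")
```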

August 14, 2023

Releases

Polly-python:
- Users can now perform meta-analysis using Polly-python and a shiny app derived from it.

Update

Users can now preview reports and PDFs in the workspace GUI.
Datasets can now be moved instantly for consumption on the UI.
Polly-python:
- Authentication is getting more standard and secure with bearer token-based authentication.
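
For readers unfamiliar with bearer tokens, the sketch below shows the general HTTP convention: the token travels in an `Authorization: Bearer <token>` header. It does not show polly-python's internals. The URL and token are placeholders, `build_authorized_request` is a hypothetical helper, and no request is actually sent.

```python
import urllib.request


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Return a request carrying the token in a Bearer Authorization header."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder URL and token; nothing is sent over the network here.
request = build_authorized_request("https://api.example.com/datasets", "MY_TOKEN")
auth_header = request.get_header("Authorization")
```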

July 17, 2023

Releases

Cell-type distribution numbers are available for single-cell datasets on the revamped details page under the Cell Type Visualization tab. The Metadata Table tab contains a schema-driven and customizable metadata table that can be used to interact with cell-level metadata, while the Dataset Overview tab provides general dataset details.
The search results on OmixAtlases have been enhanced, and the keywords found will now be ranked higher. This is because the search bar is equipped with a new NER model that can recognize disease, tissue, drugs, and cell lines.
GEO accession numbers and PubMed IDs will be highlighted in yellow in the search results when searching for datasets using them. Other text keywords will be highlighted with bold text as they appear in the different description tabs on the cards.

Update

Switching between card view and table view will be faster than before.
Polly-python:
- The `query_metadata` function has been optimized for memory utilization.
- The `identify_cohorts` function has been improved. It will:
  - Give distribution of factors to users
  - Show number of samples in all cohorts
  - Show users how to plot sunburst with custom columns as code in docs
  - Print a message when user calls the function with 1 sample and the sunburst is empty
- polly-python installation will be smoother than before, as it has been tested for compatibility with Python versions >3.6.

June 19, 2023

Releases

A product tour is available in the help section of the navigation bar on the left side. A series of popup messages highlighting the different features and steps will be offered to the users for their onboarding on Polly.
Users now have the choice to log out of all currently open sessions when they change their passwords.

Update

Datasets that users shortlist in the Public Data Atlases now persist across sessions, so the shortlist remains available after the user logs out and logs back in.
The total number of datasets in the Public Data Atlases that are available is now reflected accurately in the dataset number on the OmixAtlas homepage.
When applications are opened, more precise timing and the dataset size are displayed.
The quality of data with respect to differentiating empty fields has been improved by ensuring consistency between ‘Normal’ inputs in the disease field and 'none' inputs in other fields.

May 22, 2023

Releases

The Details page of datasets in the Bulk RNAseq OmixAtlas has been revamped to offer a cleaner interface with information divided into three sections:
- Dataset Overview: This section provides basic details about the dataset such as PubMed ID, link to the source, authors, summary, etc.
- Metadata Table: A table with cleaner and more harmonized metadata columns compared to the data source.
- Metadata Charts: A section where you can create interactive sunburst plots with up to 4 metadata fields plotted at a time. By default, experimental factors are represented in the sunburst plot.
Experimental variables that vary within the samples of a dataset are available as a separate field on Polly called `experimental factors`. This field will contain a list of variables. The list of variables can be found on the table view of the Bulk RNAseq OmixAtlas, in the metadata tables of datasets, and in the dataset overview section of the Details page of datasets.
Interactive filters and more relevant graphs are available in the summary pages of OmixAtlases.
The metadata fields available on Polly’s Bulk RNAseq OmixAtlas have been given cleaner, more intuitive, and more harmonized names to make them easier to understand.
- All the source metadata at the sample level will be available under cleaner and harmonized column names. E.g., `characteristics_ch1_2_treatment` → `treatment`, `characteristics_ch1_1_genotype` → `genotype`, `strain_ch1` → `strain`
- Columns whose information was spread across two or more columns have been merged into a single, uniquely named column. E.g., `characteristics_ch1_0_cell_type`, `characteristics_ch1_1_cell_type` → `cell_type`
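
The renaming and merging described above can be approximated in a few lines of Python. This is a minimal sketch, not Polly's curation pipeline: the two regular expressions are assumptions inferred from the example column names, and `harmonize_name` and `harmonize_sample` are hypothetical helpers.

```python
import re

# Assumed GEO-style patterns: a "characteristics_ch1_<n>_" prefix or a "_ch1" suffix.
_PREFIX = re.compile(r"^characteristics_ch\d+_\d+_")
_SUFFIX = re.compile(r"_ch\d+$")


def harmonize_name(column: str) -> str:
    """Strip the source-specific prefix/suffix from a column name."""
    return _SUFFIX.sub("", _PREFIX.sub("", column))


def harmonize_sample(sample: dict[str, str]) -> dict[str, str]:
    """Rename columns and merge those that map to the same harmonized name."""
    merged: dict[str, list[str]] = {}
    for column, value in sample.items():
        if value:  # skip empty cells so they do not pollute the merged value
            values = merged.setdefault(harmonize_name(column), [])
            if value not in values:
                values.append(value)
    return {name: "; ".join(values) for name, values in merged.items()}


example = harmonize_sample(
    {
        "characteristics_ch1_0_cell_type": "T cell",
        "characteristics_ch1_1_cell_type": "",
        "strain_ch1": "C57BL/6",
    }
)
```

Here the two `cell_type` source columns collapse into one `cell_type` entry, and distinct non-empty values would be joined with `"; "`.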

Update

When users return to the Monitoring Dashboard, the filters that they had previously set will be retained.
When users attempted to transfer datasets from the public data atlas to the user's data atlas, the ingestion process used to be triggered 15 minutes after the attempt. This gap has been removed, and the ingestion process begins immediately after the transfer has been triggered.
OmixAtlas landing pages will load faster than before, almost instantly.
The majority of the single cell datasets load on Cellxgene within 25-40 seconds, much faster than before.

April 24, 2023

Releases

With cell-type ontology recommendations on the OmixAtlas filters, users receive matching and related cells for the cell types they enter. These recommendations contain cells that are hypernyms, hyponyms, or a part of the given cell type. If the user enters the name of a tissue, all the cells that are part of that tissue will be recommended to the user.
New datasets on GEO will be added to the Bulk RNASeq OmixAtlas every two weeks as they come on to GEO.
The Monitoring Dashboard has been enhanced with optimized filters, and more accurate statuses for runs.
While querying on polly-python, users can now query based on ‘views’ available in the OmixAtlas. This will help generate query responses specific to the dataset source and/or data type mentioned in the schema.

Update

The failure issues related to transferring datasets from the Public Data Atlases to the User Data Atlases have been fixed.

March 27, 2023

Releases

The Bulk RNA-Seq and Single Cell OmixAtlases have a shortlist page that can be accessed using the shortlist icon on the top left corner of the View Datasets screen. Datasets that users want to shortlist before buying will appear here.
Multiple datasets can be selected for shortlisting in one go using the check box feature on each dataset both in Card View and Table View of OmixAtlases.
Academic users can sign up directly on Polly, for the Play with Polly project, using their .edu or .ac email addresses.
Polly-python supports data ingestion of .CSV files which may include: datasets, metadata files, analyses results and so on.
The metadata fields ‘year’ and ‘authors’ have been added to all the datasets in the Bulk RNAseq OmixAtlas.
Polly is now SOC-2 certified. This means that users' data is secure and protected against data breaches. This certification also overlaps with other industry standards such as ISO 27001, HIPAA, etc.

Update

UI improvements have been made to the side navigation bar on Polly, the OmixAtlas page and the OmixAtlas cards.
Users will be notified if they are using an older version of polly-python.

February 27, 2023

Releases

When users search for keywords on the OmixAtlas search bar, they will also get results that match all the available sample-level metadata.
Abstracts are now available for datasets in the Bulk RNA-seq OmixAtlas.
.gct files can now be directly downloaded from Phantasus when accessed through the user’s Destination OmixAtlas.
Restricted files, that can only be accessed by users who have access to the workspace in which the file is present, can now be downloaded.

Polly-python

The move_data function can be used to swiftly move data from the source to the destination OmixAtlases without having to download and re-upload the data.
The entire sample level metadata for a dataset can be obtained using the get_metadata function.
To download datasets easily, users can use the new download_dataset() function.

Update

On **polly-python**, upon changing the schema of an OmixAtlas, users will be alerted of the impact of the specific change.
On the OmixAtlases, the performance of omixatlas summaries, dataset details, datasets search and search filters has been improved to facilitate a smoother experience.

January 30, 2023

Releases

For base dockers, python, R and pollyglot, memory & CPU utilization will be displayed within the notebooks.
Users will be informed when they utilize 70% of the memory in their notebooks.
The notebook loading page will show the time it will take to load notebooks.
Drug ontology recommendation is now available on the OmixAtlas UI. Structurally similar drugs will be recommended when users search for drugs, and drugs that target a gene will be suggested when users search for that gene.
OmixAtlas cards on the OmixAtlas home page have tags that differentiate source and destination OmixAtlases.
The Single-Cell OmixAtlas has been created with 1840 cell-type curated datasets from multiple sources, including high value publications.
Manually curated labels for age, gender, donor and sampling site have been added to the Single-Cell OmixAtlas.
Files, notebooks and analyses can be pinned on workspaces for easy accessibility.
The link_report_url function on polly-python can be used to link a dataset_id to a URL.

Update

Reports linked to datasets can now be previewed instead of being downloaded and then viewed.
The schema is automatically validated before inserting a new schema or updating the existing schema of an OmixAtlas.
In the download metadata function, users now have the option to select the keys to be field_name or original_name.

January 6, 2023

Releases

For HTML files in the workspaces, users can download the dependencies (images, folders, CSS files, linked HTML files, etc.) via the UI.
Deleted files/folders from workspaces can be retrieved for up to 15 days.
Drug ontology recommendation will be available via polly-python. Structurally similar drugs will be recommended when users search for drugs, and drugs that target a gene will be suggested when users search for that gene.
The following fields have been updated for all GEO single-cell datasets:
- Datasets level: Tissue, Disease, Cell Line, Organism, Gene, Strain, Drug.
- Sample Level: Tissue, Strain, Disease, Drug, Genotype, Cell Line, Control Perturbation.
Sample-level cancer stage and dataset-level treatment and clinical labels were added to the new BulkRNASeq OmixAtlas with >42k datasets (Kallisto raw counts).
Users now have an option to move datasets from the Bulk RNA-Seq and Single Cell OmixAtlases to their organization’s OmixAtlases by themselves.
Users can request additional services such as curation and QC directly from the OmixAtlas UI.

Update

The workspace search is faster, offering better results based on the user’s query.
New single-cell datasets from Broad SC Portal, ExpressionAtlas and Covid19 Atlas have been added to the Single-Cell OmixAtlas.
The speed of dataset ingestion for Bulk RNA-seq data has been improved 30x. 100 datasets are currently being added per day.
Higher anndata version (>=0.8) is supported while indexing .h5ad files.
Arrays can now be stored in sample level metadata (.gct files).
The visibility of sample level metadata in the Details Page has been improved for understanding a dataset better.

December 5, 2022

Releases

.h5seurat files are now supported on OmixAtlases.
3 new fields have been curated for 101k GEO datasets on the platform - cancer stage, chemical treatment and clinical labels.
On the OmixAtlas, the filter result counts for each category are visible and clickable to enable easy navigation to relevant datasets. For instance, on filtering diseases, users will get 18414 normal samples, 2063 neoplasms and 1374 obesity samples which users can access by just clicking on the respective filter entries.
The OmixAtlas table view can be expanded now since the metadata fields visible on the view can be customized through the schema. This offers flexibility for locating useful datasets.

Update

Ontology recommendation on OmixAtlases has been improved. Now, the recommendations will be more relevant, accurate and with more unique suggestions.
40k RNA-seq datasets from GEO were added to a new OmixAtlas.

November 7, 2022

New

Using the cost dashboard on the Polly Admin panel, an organization admin can track the compute cost for every user based on several parameters - machine type, usage of apps, jobs and notebooks, etc.
Users can now reorganize the notebooks within workspaces, at their convenience, by creating and cloning notebooks inside folders.
Users will receive notifications detailing the data sources and types of the new datasets ingested into the OmixAtlas that they have subscribed to.
The free text search bar on OmixAtlases enables search for keywords across dataset-level metadata fields. E.g., title, description, summary, tissue, drug, disease, etc., depending on the schema of the OmixAtlas. In this release, it supports even more precise and advanced search operations using logical operators:
- And &
- Or |
- Not ~
- Group ()
- Exact match ""
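
To make the semantics of these operators concrete, the sketch below models each search term as a predicate that can be combined with `&`, `|`, `~`, and parentheses. It is an illustration only: `Query`, `word`, and `exact` are hypothetical names, whole-word and substring matching are assumptions, and operator precedence here follows Python's rules, which may differ from the search bar's.

```python
import re
from typing import Callable


class Query:
    """A predicate over text that can be combined with &, | and ~."""

    def __init__(self, matches: Callable[[str], bool]):
        self.matches = matches

    def __and__(self, other: "Query") -> "Query":
        return Query(lambda text: self.matches(text) and other.matches(text))

    def __or__(self, other: "Query") -> "Query":
        return Query(lambda text: self.matches(text) or other.matches(text))

    def __invert__(self) -> "Query":
        return Query(lambda text: not self.matches(text))


def word(term: str) -> Query:
    """Match a single keyword as a whole word, ignoring case."""
    return Query(lambda text: term.lower() in re.findall(r"\w+", text.lower()))


def exact(phrase: str) -> Query:
    """Match an exact phrase, ignoring case."""
    return Query(lambda text: phrase.lower() in text.lower())


titles = [
    "Liver fibrosis in human donors",
    "Kidney injury in mouse models",
    "Breast cancer cell lines",
]

# Equivalent in spirit to the search string: (liver | kidney) & ~mouse
query = (word("liver") | word("kidney")) & ~word("mouse")
hits = [title for title in titles if query.matches(title)]
phrase_hits = [title for title in titles if exact("breast cancer").matches(title)]
```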

Update

Using the schema management module of Polly-python, users can customize the following on the OmixAtlas:
- columns for table view
- filters
- availability of ontology recommendations
- search fields
Curated metadata from the curation app can also be exported as a . file apart from .json.
New users can accept the End User License Agreement (EULA) directly on the Polly platform during their first login.

October 28, 2022

New

Users can now run jobs, cancel jobs and fetch status of jobs.
There is a new function for schema validation.
Users can download dataset level metadata.
Users can generate a merged GCT file from a cohort.

October 10, 2022

New

A card view layout page is available where you can sort and filter workspace cards. Tags can be attached with workspaces for better findability.
Users can mark workspaces as favorites and also watch workspaces to start receiving notifications specific to them. They can be sorted by their creation date for easier access.
The curation app can be accessed through the side navigation bar on Polly. Users can now switch between reviewer view and curator view directly from the UI.
With the newly deployed ingestion dashboard, users can track all ingestion runs made to OmixAtlas in real-time, view logs in the event of a failure, and list the completed ingestion runs.
- Tracking of running time and logs is also possible for failed/rescheduled jobs.
Compute jobs and apps (notebooks and Shiny) are now scheduled on separate nodes to reduce cost for both groups and increase stability.
Support for multiple docker images in the same Nextflow workflow is available.
The drug labels for Tier 2 datasets on GEO OA are now more accurate and relevant.
- All datasets and samples are now annotated with Pubchem identifiers (a change from CHEBI) and encompass a wider range of drug classes (including monoclonal antibodies).
- The more accurate disease labels for Tier 2 datasets on GEO OA will lead to an improved cohorting and search experience.

Polly-python:

The curation library is integrated with polly-python which will enable users to recognize entities in a given text, standardize them and tag them in a text with standardized nomenclature/ontology.
Schema management-related functions are now upgraded to enable users to update and replace schema. Update schema is to be used to make minor edits in the existing schema, and replace schema is to be used for completely replacing the existing schema of an OmixAtlas.
Schema validation functions have been released to enable users to validate the schema prior to inserting/updating or replacing the schema.
Results of query_metadata are now sorted alphabetically across the columns to improve the UX of browsing in the data frame.
The following file types/extensions can be ingested - biom, zip, tar.gz, gct.bz, vcf.bgz, fcs, fs.
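
The distinction drawn above between updating a schema (minor edits to what exists) and replacing it (discarding what exists) can be shown with plain dictionaries. This sketch does not call polly-python. The schema layout, the `is_filter` attribute, and the helper names `merge_schema_changes` and `swap_schema` are assumptions made for illustration.

```python
import copy


def merge_schema_changes(existing: dict, changes: dict) -> dict:
    """Update: merge `changes` into a copy of `existing`, keeping untouched fields."""
    result = copy.deepcopy(existing)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_schema_changes(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def swap_schema(existing: dict, new_schema: dict) -> dict:
    """Replace: ignore `existing` entirely and return a copy of `new_schema`."""
    return copy.deepcopy(new_schema)


# Hypothetical schema fragment.
schema = {
    "disease": {"type": "text", "is_filter": True},
    "tissue": {"type": "text", "is_filter": True},
}

updated = merge_schema_changes(schema, {"disease": {"is_filter": False}})
replaced = swap_schema(schema, {"drug": {"type": "text"}})
```

After the update, `tissue` and the `type` of `disease` are still present. After the replacement, only `drug` remains.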

Update

Data Addition:
- 30k datasets were added from HugeAMP and OpenGWAS.
- HTAN single cell studies were added.

September 19, 2022

Update

.nf files can now be uploaded to workspaces.
New machines are available to run jobs:
- mix5xlarge: 16 vCPUs, 512.0 GiB RAM & 2TB storage
- mix6xlarge: 24 vCPUs, 768.0 GiB RAM & 2TB storage
- mix7xlarge: 64 vCPUs, 1024.0 GiB RAM & 3TB storage
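
As a rough guide to choosing among these machines, the sketch below picks the smallest listed machine that satisfies a job's vCPU and memory needs. The vCPU and RAM figures come from the list above. The `Machine` type and `smallest_fit` helper are hypothetical and are not part of Polly CLI.

```python
from typing import NamedTuple, Optional


class Machine(NamedTuple):
    name: str
    vcpus: int
    ram_gib: float


# Figures taken from the machine list in these release notes.
MACHINES = [
    Machine("mix5xlarge", 16, 512.0),
    Machine("mix6xlarge", 24, 768.0),
    Machine("mix7xlarge", 64, 1024.0),
]


def smallest_fit(min_vcpus: int, min_ram_gib: float) -> Optional[Machine]:
    """Return the smallest machine meeting both requirements, or None."""
    candidates = [
        m for m in MACHINES if m.vcpus >= min_vcpus and m.ram_gib >= min_ram_gib
    ]
    return min(candidates, key=lambda m: (m.ram_gib, m.vcpus), default=None)
```

For example, `smallest_fit(20, 600)` selects `mix6xlarge`, because `mix5xlarge` has too few vCPUs and too little memory for that request.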
A new version of Polly-CLI is available. It includes performance improvements and bug fixes related to job logs.
If the OmixAtlas has multiple sources and/or datatypes, users can now define a schema specific to each source and/or datatype. They can also query tables by source and/or datatype, as defined in the schema.
1500 curated CPTAC datasets have been added to Polly.
Cohort creation is now enabled for all types of Public and Enterprise OmixAtlas.
Cell line ontology recommendations are now available on polly-python and the front-end of the OmixAtlases on Polly.
Card View on the OmixAtlas now allows sorting via recency of data ingestion.
A View Details Page is now available for the new Expression Atlas datasets.
Polly-python:
- The data matrix within h5ad files (single cell data) can now be queried.
- The curation library is integrated with polly-python, which will enable users to recognize entities in a given text, standardize them, and tag them in the text with standardized nomenclature/ontology.
- The replace_schema feature allows users to replace the existing schema entirely.
- Schema validation functions have been released to enable users to validate the schema prior to inserting/updating or replacing the schema.
- Older reports linked to datasets in any OmixAtlas can now be deleted and newer ones can be linked to the same datasets.
- The rows and columns that appear in the results of query_metadata are now sorted alphabetically to make browsing through the dataframe easier.
- Ingestion of the following file types/extensions is enabled in OmixAtlases - biom, zip, tar.gz, gct.bz, vcf.bgz, fcs, fs.

September 5, 2022

New

Microbiome data is now supported on OmixAtlases. The data can be searched and queried on Polly and can be downloaded in the form of a BIOM file.
‘Cancer Stage’ is the newest metadata field and it has been curated for 974 datasets on GEO.
Reports present in workspaces can now be linked to datasets in OmixAtlases.

Update

‘Select All’ is now an option for the filter results while searching and filtering datasets on the OmixAtlas UI.
Identify datasets that do not contain data matrices on the OmixAtlases through the new ‘Metadata-only’ labels.
Send requests to obtain data matrices for specific datasets.

August, 2022

New

The Nextflow integration with Polly CLI allows users to run any Nextflow bioinformatics pipeline with parallel processing on Polly’s scalable computational infrastructure.
OmixAtlases are now available as shareable links through Workspaces.
797 clinical datasets from MIMIC are now available on Polly, curated for four metadata fields “Drug, Dose, Frequency, and Strength.”
With the recent Polly Python version release, users can now:
- Copy files/folders in workspaces from one workspace to another.
- Add datasets to and delete datasets from OmixAtlases.
- Link and fetch reports to a dataset on any OmixAtlas.
- Get auto-generated metadata summaries for datasets present on the GEO OmixAtlas by giving the GEO accession ID as an input. This helps to improve findability and estimate the relevance of the dataset.
- Cell line recommendations are now available, to select multiple related cell lines. Users can start the search with a disease, tissue or cell line and receive recommendations for related or matching cell lines.
- The recommend functionality is available for disease, tissue and cell lines at sample level metadata queries as well.
New datasets from the following repositories have been added to Polly OmixAtlases in the last month:
- DepMap OmixAtlas - 4312
- GEO OmixAtlas - 922
- Single-Cell OmixAtlas - 240

Update

These are the major Polly Python updates to existing functionalities:

The complete schema for tables in an OmixAtlas can be fetched in the form of a dictionary.
Schema for feature level metadata is now available.
Schema for single cell and GWAS data can be retrieved.
Ontology recommendations for sample-level queries have been enabled.

March, 2022

New

Users can now host docker based applications and add their own notebook environments using Polly CLI.
Users can launch opensource notebooks from GitHub directly on Polly’s compute environment.
Users can save dataset from OmixAtlas to workspace as well as upload/download files and folders to/from workspace using polly-python.
Users can filter the schema in an Omixatlas specific to the source & data_type using polly-python.
Users can access the installed version of polly-python inside python shell or jupyter notebook cell.
Users can create cohorts for TCGA Transcriptomics and Mutation data using polly-python.
Users can now create an OmixAtlas using polly-python.
Users can query datasets, samples and features on polly-python across multiple OmixAtlases at once.
Recommended disease ontologies are displayed when a user queries the disease field in an OmixAtlas.
Users can now copy/move files (except analysis files) or folders across workspaces and folders on the Polly frontend.
Users can now launch notebooks situated within folders or sub-folders.
Users can also filter folders from workspace contents.
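
The installed polly-python version mentioned in the list above can be checked with the Python standard library alone. The sketch below is one possible approach, not the library's documented method: `installed_version` is a hypothetical helper, and the distribution name `polly-python` is an assumption that it matches the name used when installing with pip.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "polly-python") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Reads the version from the installed package metadata.
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "polly-python is not installed")
```

Returning None instead of raising keeps the check usable in a notebook cell that runs before the package is installed.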

Update

polly-python users can now access schema functions via both repo_id and repo_name.
polly-python users can easily convert .gct file format to .maf file format in TCGA and cBioPortal repositories using a file format converter function.
211 Single Cell datasets were added to OmixAtlas with cluster-level cell type annotation.
The platform field was updated for 3.5k GEO datasets on Polly.
Users can now also preview tsv files along with other file types on Polly.
Cell types were added for the fetal single cell atlas in Polly.

January, 2022

New

Users can create workspaces and fetch the list of workspaces using polly-python.
Change in authentication process: until now, users had to authenticate each class separately. This release adds a global authentication mechanism that lets users authenticate multiple classes (such as OmixAtlas, Workspaces) in a single authentication step.
Users can publish personal notebooks on the Polly workspace environment and analyse vcf files using the Hail docker.
Preview of all standard file types is available on Polly UI. Different file types like xls/xlsx, pdf, html, csv, png/jpg/gif, ipynb can be opened directly on Polly without a third-party service.

Update

2792 datasets and 74713 samples have been annotated for tissue and cell line tags.
Strain has been added as a queryable field at dataset-level for GEO datasets.
A bug in Sort by Relevance when searching over description in OmixAtlas table view was fixed. Other sort related bugs have also been fixed.
360k datasets from the UK Biobank were added on Polly.
149k ImmPort lab datasets were added on Polly.
59k RCSB datasets were added on Polly.
Curated cell types for 229 Single Cell datasets were added to OmixAtlas.
Sample level age labels were added to all datasets for TCGA and GEO.

November, 2021

New

Workspace contents like files, analyses, reports can now be sorted in workspaces based on name, created date, last modified, author, and type.
15,000 Microarray and WES (Whole Exome Sequencing) datasets from PPMI have been curated and added to Polly.
Dataset Overview (containing Title, Publication, Abstract, Tags for the Data, sample information as summary plots and table, processed data as a table) for every dataset can be viewed using the “View Details” option beside datasets in OmixAtlases.
Dataset files (gct, h5ad, vcf) can be downloaded from the Options Menu in the Card view or from the View Details page.
The Request for Dataset option is available at multiple places within the OmixAtlases.

Update

Resolved a bug in El-MAVEN instance termination.
Curation information of 45,000 datasets from gnomAD and DepMap has been updated.

October, 2021

New

In addition to the Liver OmixAtlas, Polly now has GDC, GEO, cBioPortal, PharmacoDB, LINCS and Metabolomics OmixAtlases.

Update

Added 17,500 datasets on Polly.
Genotype, age and gender annotations were added to 3,900,000 GEO samples.

September, 2021

New

New compute machines Mi5xlarge (32 vCPU, 250GB RAM), Mi6xlarge (64 vCPU, 500GB RAM), Mi7xlarge (64 vCPU, 970GB RAM), GPUsmall (1 GPU, 8 vCPU, 60GB RAM) and GPUmedium (4 GPU, 32 vCPU, 240GB RAM) were added to El-MAVEN.
Introduced Polly Files (beta version), a desktop application for transferring files between a local computer and Polly Workspaces.

Update

Resolved a bug that caused an error in SQL query on “’” expansion.
gnomAD was enriched with 96,000 new datasets of WES and WGS type.
Fixed table view bugs and enhanced UI on OmixAtlas.
New datasets totalling 48,000 were added to ImmPort, HPA, CPTAC and GTEx.
Auto-curated tags totalling 770,000 were added to Polly datasets.
A bug affecting folder deletion was fixed.

August, 2021

New

Introduced the Polly Python library, which enables search across dataset, sample, and feature level metadata from any computational environment through code.
Introduced “View Only Access” on Polly Workspaces – an enterprise-grade permission giving more control to admins.
Enabled Voila Dashboards within Polly Notebooks.
Introduced an application resource monitor on El-MAVEN, enabling users to monitor the progress of a job and decide whether a bigger machine is required.

Update

Over 155,000 datasets were added to LINCS OmixAtlas on Polly.

July, 2021

New

Liver OmixAtlas released.

June 18th, 2021

New

Added 12,200 new curated transcriptomics and single cell datasets to various Data Lakes.

June 4th, 2021

New

Made Dual Mode Visualization application and Untargeted Pipeline Application lighter for heavy datasets to avoid memory leakage.
Added 3,350 new curated transcriptomics and single cell datasets to various Data Lakes.

Update

Resolved issue in Lipidomics, Dual Mode and Polly El-MAVEN Applications.

May 21st, 2021

New

Introduced Google Slide Integration with Polly Notebooks.
Added Reporting feature using Markdown in Dual Mode Visualization Application.
Added 23,350 new curated transcriptomics and single cell datasets to various Data Lakes.

Update

Added Scree Plot under Quality Check in Dual Mode Visualization Application.
Resolved forgot password issue.

May 7th, 2021

New

El-MAVEN latest beta version is now available on Polly.
Added 25,120 new curated transcriptomics and single cell datasets to various Data Lakes.

Update

Added two-way ANOVA capability in Dual Mode Visualization Application, along with the ability to combine multiple conditions or cohorts while performing differential expression.

April 23rd, 2021

New

Introduced an Admin Dashboard to provide account administrators the convenience to manage their accounts.
Added 10,120 new curated transcriptomics and single cell datasets to various Data Lakes.

Update

HTML as a data file can now be opened through Workspace itself.
Resolved an issue to make Dropbox work seamlessly with Workspaces.

April 9th, 2021

New

Created Shiny and Studio applications for feature level search of GEO Datasets.
Added 11,177 new curated transcriptomics datasets to TCGA Data Lake.

Update

PDF as a data file can now be opened through Workspace itself.
Resolved an issue to make Google Drive work seamlessly with Workspaces.

March 26th, 2021

New

TEDDY (The Environmental Determinants of Diabetes in the Young) and DEPMAP (Dependency Map) Data Lakes have been added on Polly.
Introduced option to directly export data to the workspace from a Studio Preset.
Added 37,177 new curated datasets corresponding to various omics to different Data Lakes.

Update

Updated Polly Login User Interface.
Added additional filters to TEDDY Data Lake.
Resolved issue with app hosting infrastructure to increase stability of apps for better user experience.

March 12th, 2021

New

Introduced Docker building feature on Polly CLI which enables users to build dockers, check their build status and logs, and push dockers to Polly.
Added 11,470 new curated transcriptomics and single cell datasets to different Data Lakes.

Update

Improved access to datasets within OmixWiki, with metadata filtering options.

February 26th, 2021

New

Introduced the functionality that enables the users to host their own application on Polly by using Polly CLI.
Enabled feature level querying for GEO Data Lake.
Added Genomics docker for variant calling and annotation.
Added a new notebook environment for Genomics Variant Analysis.
Enabled partial string search for dataset id in the search bar.
Added 20,096 new curated transcriptomics and single cell datasets to different Data Lakes.

February 12th, 2021

New

Introduced the status page for real time updates on Polly’s status, downtime, incidents, and maintenance.
Added auto-run feature for selected Studio Presets.
Enabled component updating and versioning by component creator.
Added 11,580 new curated transcriptomics datasets to GEO and LINCS Data Lakes.

Update

Updated the UI of visualization dashboard of Data Studio for better visibility.
Updated all notebook dockers with the latest version of discoverpy (0.0.10).
Added finer error and warning messages to CLI.
Removed the 1000 row limit on query results in CLI.

January 29th, 2021

New

Public sharing of the reports created within any Studio session is now available on Polly.
Added 14,727 new curated transcriptomics and metabolomics datasets with 9,513 transcriptomics datasets being added to the LINCS Data Lake.

Update

Added specific error message to indicate presence of multiple groups with the same compound name in Labeled LC-MS Workflow.
Added specific error message in Labeled LC-MS Workflow if isotopologues of the compound are spread over different metagroups in El-MAVEN output.

January 15th, 2021

New

GTEx Correlation and Enrichment Analysis preset is now available which can be used to identify enriched pathways based on the gene correlations.
Added TraceFinder Downstream Analysis preset with additional feature of translating the analytical insights into shareable dashboards.
Added 1,836 new curated transcriptomics and proteomics datasets to different Data Lakes.

Update

Enabled use of retention time information for metabolite identification and updated Untargeted Pipeline library to handle already identified metabolites.

January 1st, 2021

Update

Updated Untargeted Pipeline to be compatible with El-MAVEN's peakML output.

December 18th, 2020

New

LINCS (Library of Integrated Network-Based Cellular Signatures) repository with 19,520 curated datasets has been added in Data Lake.

Update

Added ANOVA Test and updated Limma Test with extra filters for volcano plot and Heatmap for the differentially expressed results in the Dual Mode Data Visualization.

December 4th, 2020

New

We now support reactions from Chinese Hamster Ovary (CHO) for integrated pathway analysis in IntOmix.

Update

Resolved timeout error for opening a folder containing a large number of files within a Workspace.
Resolved issue with Workspace root directory redirection on selection.

November 20th, 2020

New

Improved OmixWiki UI for better consumption.
Added the ability to clone Notebooks within Workspaces.

Update

Added granular error messages for Notebook functions and CLI jobs.
Resolved the issue with renaming large data files.
Resolved the issue with folder breadcrumb in Workspaces.
Fixed involuntary logout issue.

November 6th, 2020

New

Data transfer time limit has been extended to 8 hours, enabling transfer of 1TB of data through CLI at once.

Update

Updated user interface of Discover and Data Studio.
Added filtering interface to GEO data lake.
Added search functionality on Discover interface.
Added highlight and cumulative size feature on multiselection in Workspaces.
Updated collaborators icon to show number of collaborators.
Resolved inconsistent log₂FC values for multiple comparisons in IntOmix.
Resolved sample name discrepancy in concentration plot of QuantFit.
Fixed table column resizing error on filtering interface.
Resolved a bug in Polly Docker Domain.

October 23rd, 2020

New

Hosted our first User Group Meeting.
Introduced our public platform OmixWiki, showcasing top 100 cited COVID-19 publications with end to end omics analysis.
Released the newest version of El-MAVEN v0.12.0.

Update

Updated Workspaces user interface.
Added filtering interface to COVID-19 data lake.
Updated datasets searchability on dataset ID and description.
Fixed incorrect memory error in CLI.

October 9th, 2020

New

Introduced the option to make dockers on Polly public by adding public docker domain.
Welcome screen now displays the username.
Decreased launch time for applications and notebooks through horizontal pod scaling and buffering.

Update

Fixed an error with landing on Discover after logging in.
Fixed an error in priority assignment of automated jobs.
Fixed an error with renaming files after upload.
Fixed 404 error in Metabolomics Data Lake.
Integrated documentation to every application.

September 25th, 2020

New

Introduced Labeled LC-MS Analysis preset for natural abundance correction and visualization for single or dual labeled LC-MS data combined with an interactive, customizable and shareable reporting dashboard.
Integrated pathway visualization in Labeled LC-MS Workflow.
Added dilution factor and protein normalization in the Lipidomics Visualization Dashboard.

Update

Added warning message to prevent duplicate folder creation in Workspaces.
Fixed nested folder creation and notebook renaming error in Workspaces.
Fixed 503 error in Metabolomics Data Lake.
Fixed a bug associated with notebooks and shiny apps opening to a blank screen.
Fixed error occurring in automated jobs.

September 11th, 2020

New

Introduced Data Studio that brings the tools you need to create, customize, and share your analysis effortlessly with your team across the world.
Introduced CCLE Correlation Analysis for identification of features correlated with a gene mutation such as mutations in other genes, expression and sample level metadata.

Update

Updated the version of scanpy to 1.6.0 in single cell docker.
Fixed a bug in notebook giving error with CLI commands.

August 28th, 2020

New

Introduced a metabolomics docker equipped with packages for analysis of metabolomics data.
Added restore functionality to all the Data Lake applications.
Added boxplots for lipids in Lipidomics application.

Update

Updated discoverpy package in all the dockers to the latest version.
Fixed CellxGene visualization loading for specific datasets.
Fixed duplicate metabolite generation issue within the Dual Mode Data Visualization application.
Fixed minor UI issues in Workspaces.
Decreased Workspaces loading time.

August 14th, 2020

New

Introduced Workspaces on Polly, which is a new and improved version of Polly Projects.
Added GTEx app to process the filtered datasets from GTEx data lake.
Added a filtering interface for GTEx data lake that allows filtering of the data on the basis of fields within the curated dataset.
Integrated Discover and Dual Mode Visualization for processing and further analysis of transcriptomic, metabolomic and single cell filtered datasets.
Integrated Notebook to process the filtered datasets.
Hosted CellxGene for processing and visualization of single cell datasets.

Update

Enabled logs access functionality through Polly CLI.
Added the Python package discoverpy to all the dockers.

Deprecated

The Project Management Dashboard has been deprecated and replaced by Workspaces.

July 31st, 2020

New

Added dot plot for Gene Ontology in the Discover application.
Added an extra layer of security in authentication.

Update

Allowed internal standards and unlabeled data to pass through the Labeled LC-MS Workflow to generate output.
Added Phantasus, Boxplot & Whisker plot along with the bar plot in the Discover application.
Fixed Polly CLI auto login error in notebooks.
Fixed unresponsive notebook with infinite loading.

July 17th, 2020

New

We have released the newest version of Polly CLI v0.1.18, enabling you to run a CLI job without needing a "secret" key if the private docker is on Polly.

Update

Labeled LC-MS Workflow has N and C as indistinguishable isotopes.
Improved the stability of both Shiny and Desktop Applications.
Communication within the infrastructure is now through encrypted keys.
Shiny apps as well as shiny states are encrypted during transit as well as storage.
Added encryption for the disks running the computations.
Encrypted buckets containing credentials.

July 3rd, 2020

New

We have released the newest version of El-MAVEN v0.11.0.
Polly now provides its own docker repository for easy management of dockers.

Update

Added Si as an indistinguishable isotope in Labeled LC-MS Workflow.
Introduced pre-processing functionalities along with updated selections and heatmap for visualization in Lipidomics Visualization Dashboard.

Deprecated

Deprecated El-MAVEN FirstView Integration.

June 19th, 2020

New

We now support reactions from Drosophila melanogaster for integrated pathway analysis in IntOmix.
Introduced pathway enrichment and pathway view feature along with comparative analysis in Dual Mode Data Visualization.
DEPMAP CCLE (DEPMAP Cancer cell line expression data and dependency scores for genes) repository has been added in Data Lake.
Implemented input file access from the sub-folders of a project for applications.

Update

The Single Cell Downstream docker is updated with these new packages: rpy2, anndata2ri (Python packages), ExperimentHub (R package).
Added a GPU instance for Polly CLI.

June 5th, 2020

New

Introduced visualization of labels in stacked plot within Labeled LC-MS Workflow.
Enabled least privilege access for stringent access policies.
Encryption of data in transit and at rest.

Update

Improved access logs throughout the platform.
Enhanced security using a secrets management service.
Implemented regular backups and versioning of data.

May 22nd, 2020

New

Introduced Polly QuantFit node in Compound Discoverer™ that allows peak picking and absolute quantification on raw data obtained from a Thermo Scientific™ Mass Spec instrument.

May 8th, 2020

New

We now host our desktop application, El-MAVEN, on Polly.
Phi calculation feature has been added to Labeled LC-MS/MS Workflow.

Update

Changed the optimized color palette in IntOmix from a red-yellow-green scale to a more intuitive red-green scale. All upregulated metabolites or genes are represented by a shade of red and downregulated metabolites or genes as a shade of green.
Changed the non-optimized color palette in IntOmix from a pink-purple scale to a red-green scale to remove ambiguity.

April 24th, 2020

New

COVID-19 (Transcriptional datasets for SARS viruses, viral infections, and therapeutics for novel coronavirus) repository has been added in Data Lake.