About Atlas

What is an Atlas?

Vast and diverse biological data are generated every year and deposited in repositories by academic labs and organizations worldwide. These data hold tremendous potential for reuse and drug discovery, but they are scattered across multiple, disparate sources and lack standardization. As a result, the availability of data does not equate to its usability.

Atlas is a comprehensive collection of curated biomolecular data intended to advance the understanding of human physiology and disease pathology. It takes an integrated approach, providing access to ML-ready, biology-centric data from public and proprietary repositories so that users can generate new insights. Curated metadata makes the datasets easier to search and query.

What are the FAIR principles Atlas is based upon?

Atlas is a repository of FAIR (Findable, Accessible, Interoperable, Reusable) data. It is a collection of millions of datasets from public, proprietary, and licensed sources that have been curated, harmonized, and made ready for downstream machine learning and analytical applications.

Findable - The data is harmonized to make it searchable with machine-readable metadata.
Accessible - The data is available and attainable by both humans and machines.
Interoperable - The data is semantically understandable, allowing data exchange and reuse across teams within the organization.
Reusable - The data can be shared and reused among different teams and seamlessly integrated with other data sources.

All datasets on Polly go through a 2-step process:

1. Data Engineering: Data is transformed to fit a uniform, proprietary schema that spans several data types. Bringing data into one consistent schema lets users query multiple data types on a single data infrastructure.

2. Metadata Harmonization: Each sample and dataset is tagged with terms from a uniform ontology.
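The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Polly's curation pipeline: the source names, field mappings, the `Sample` schema, and the small synonym table are all hypothetical. Only the UBERON identifiers for liver and lung are real ontology terms. Step 1 maps differently named source fields onto one schema; step 2 maps free-text tissue labels onto ontology IDs.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


# Hypothetical uniform schema that every source record is mapped onto.
@dataclass(frozen=True)
class Sample:
    dataset_id: str
    sample_id: str
    tissue_label: str
    tissue_ontology_id: Optional[str] = None


# Step 1 (data engineering): per-source field mappings onto the uniform schema.
# Source and field names are invented for illustration.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"dataset_id": "series", "sample_id": "gsm", "tissue_label": "source_name"},
    "source_b": {"dataset_id": "study", "sample_id": "id", "tissue_label": "organ"},
}


def to_uniform(source: str, record: Dict[str, str]) -> Sample:
    """Rename a source record's fields so it fits the uniform schema."""
    mapping = FIELD_MAPS[source]
    return Sample(**{field: record[key] for field, key in mapping.items()})


# Step 2 (metadata harmonization): map free-text labels to ontology IDs.
# A toy synonym table; a real pipeline would draw on full ontologies.
TISSUE_SYNONYMS: Dict[str, str] = {
    "liver": "UBERON:0002107",
    "hepatic tissue": "UBERON:0002107",
    "lung": "UBERON:0002048",
    "lung tissue": "UBERON:0002048",
}


def harmonize(sample: Sample) -> Sample:
    """Tag a sample with an ontology ID; unknown labels stay untagged (None)."""
    key = sample.tissue_label.strip().lower()
    return replace(sample, tissue_ontology_id=TISSUE_SYNONYMS.get(key))


records: List[Tuple[str, Dict[str, str]]] = [
    ("source_a", {"series": "DS1", "gsm": "S1", "source_name": "Liver"}),
    ("source_b", {"study": "DS2", "id": "S7", "organ": "hepatic tissue "}),
    ("source_b", {"study": "DS2", "id": "S8", "organ": "Lung tissue"}),
]
samples = [harmonize(to_uniform(src, rec)) for src, rec in records]
for s in samples:
    print(s.dataset_id, s.sample_id, s.tissue_ontology_id)
```

Here "Liver" and "hepatic tissue " come from different sources with different field names, yet both end up tagged with the same ontology ID.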

What are the user benefits and key features of Atlas?

User benefits

Challenge when working with multi-omics data	How our Atlases can help
Finding relevant multi-omics data can take weeks because the data is semi-structured and scattered across several sources.	Access to thousands of tissue-derived or disease-specific RNA-seq datasets from multiple sources in one place.
Enriching data and preparing it for machine learning applications takes time and effort that could be better spent on insight discovery.	Atlas data is processed through a standard pipeline and enriched with harmonized, scientifically relevant metadata, ready for machine learning applications.
Users want access to curated data without switching to another computational platform.	Atlas data can be accessed and analyzed on your existing computational infrastructure.

Key features

Centralized access to thousands of RNA-seq datasets from various sources.
ML-ready data with curated and harmonized metadata.
Compatibility with existing computational infrastructure.
Continuously evolving and updated with the latest data.

How is Atlas different from other biomedical data repositories?

The data in Atlas are curated through Polly's ML-based curation workflow, which structures different types of data, harmonizes metadata, and makes the data analysis-ready. All data available in Atlas can be queried and used directly in downstream statistical or ML-based analyses. With Atlas, you can:

Leverage curation infrastructure to standardize data: Build machine learning applications faster with analysis-ready data. Use Polly's curation models to process public and proprietary data through standard ingestion pipelines.
Query heterogeneous biomolecular data with code: Perform integrative analysis with powerful code-based querying capabilities across the Atlas data catalog. Use Polly Libraries for exploring data in depth through code.
Access and analyze data in any computational environment: Stream data from Atlas to your compute infrastructure. Focus on analysis while Atlas takes care of data storage and management.
Manage in-house data at scale with Enterprise Atlas: Harmonize proprietary data using the scalable curation infrastructure of Polly, making it ML-ready for discovering new insights.
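To illustrate why harmonized metadata matters for code-based querying, the sketch below compares a query on raw source labels with a query on ontology IDs. The catalog entries and the helper functions `datasets_by_raw_label` and `datasets_by_term` are hypothetical and do not represent the Polly Libraries API. The point is only that one ontology ID matches datasets whose original labels differ.

```python
from typing import List, Set, Tuple

# Hypothetical harmonized catalog: (dataset_id, source type, raw tissue label, ontology ID).
CATALOG: List[Tuple[str, str, str, str]] = [
    ("DS1", "public", "Liver", "UBERON:0002107"),
    ("DS2", "proprietary", "hepatic tissue", "UBERON:0002107"),
    ("DS3", "licensed", "lung", "UBERON:0002048"),
]


def datasets_by_raw_label(label: str) -> Set[str]:
    """Exact match on the label as submitted by each source."""
    return {ds for ds, _, raw, _ in CATALOG if raw == label}


def datasets_by_term(term_id: str) -> Set[str]:
    """Match on the harmonized ontology ID instead of the raw label."""
    return {ds for ds, _, _, onto in CATALOG if onto == term_id}


print(sorted(datasets_by_raw_label("Liver")))      # ['DS1']
print(sorted(datasets_by_term("UBERON:0002107")))  # ['DS1', 'DS2']
```

The raw-label query misses the proprietary dataset labeled "hepatic tissue". The ontology-based query finds both liver datasets regardless of their source.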