Curating OmixAtlas

1. What is Curation?

Curation is the process of organizing, annotating, and standardizing biological data so that it becomes a FAIR (Findable, Accessible, Interoperable, and Reusable) resource.

Findable - the data is assigned a persistent identifier that makes it easy to find for both humans and machines
Accessible - the data can be retrieved by its identifier using a standard protocol
Interoperable - the data uses standardized terms, references other data, and is machine-actionable
Reusable - the data is described well enough for both computers and humans to understand it
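
As a rough illustration of the four principles, the sketch below checks a dataset record for one minimal signal per principle. It is only a sketch under stated assumptions: the DatasetRecord fields, the fair_gaps helper, the identifier, and the URL are hypothetical and are not part of Polly or OmixAtlas, and a real FAIR assessment involves far more than these four checks.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    # Hypothetical record layout, used only for illustration.
    identifier: str = ""                                  # persistent identifier
    access_url: str = ""                                  # where the data can be retrieved
    ontology_terms: list = field(default_factory=list)    # standardized terms
    description: str = ""                                 # human- and machine-readable description


def fair_gaps(record: DatasetRecord) -> list:
    """Return the FAIR principles for which the record shows no minimal signal."""
    checks = {
        "Findable": bool(record.identifier),
        "Accessible": record.access_url.startswith(("http://", "https://")),
        "Interoperable": len(record.ontology_terms) > 0,
        "Reusable": bool(record.description.strip()),
    }
    return [name for name, ok in checks.items() if not ok]


# Example record with a made-up identifier and URL; the description is missing.
record = DatasetRecord(
    identifier="DATASET-0001",
    access_url="https://example.org/datasets/DATASET-0001",
    ontology_terms=["UBERON:0002107"],
)
gaps = fair_gaps(record)
print(gaps)
```

A record that leaves a principle unaddressed is reported by name, which makes it easy to see what a curator would still need to supply.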

Using a structured, FAIR database can fast-track the drug development process because it makes it easier for researchers to test their own hypotheses.

2. How is data curated on OmixAtlas?

Curated datasets stored on Polly are the result of a multi-step process that ensures the overall quality and reusability of the data. In summary, all datasets on Polly go through the following two-step process:

Data Engineering - Data is transformed to fit a proprietary data schema that is uniform across data types. The transformation brings all data into one consistent schema and lets users query multiple data types on a single data infrastructure.
Harmonizing Metadata - Each sample and dataset is tagged with metadata that is harmonized to a uniform ontology.
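
The two steps can be sketched in a few lines of Python. This is not Polly's actual pipeline or schema: the field names, the TISSUE_SYNONYMS table, and the harmonize_sample helper are hypothetical and chosen only for illustration. The sketch assumes tissue labels are mapped to UBERON anatomy terms; the identifiers shown should be verified against the ontology before any real use.

```python
# Hypothetical synonym table: free-text tissue labels -> (ontology ID, preferred label).
TISSUE_SYNONYMS = {
    "liver": ("UBERON:0002107", "liver"),
    "hepatic tissue": ("UBERON:0002107", "liver"),
    "lung": ("UBERON:0002048", "lung"),
    "pulmonary tissue": ("UBERON:0002048", "lung"),
}


def harmonize_sample(raw: dict) -> dict:
    """Map one raw sample record onto a single schema with a harmonized tissue term."""
    # Step 1 (data engineering): different sources use different keys for the
    # same information, so read them all into one consistent field.
    label = raw.get("tissue") or raw.get("Tissue type") or raw.get("source_name") or ""

    # Step 2 (metadata harmonization): normalize the free text and look it up
    # in the synonym table; unknown labels are left untagged rather than guessed.
    term_id, term_label = TISSUE_SYNONYMS.get(label.strip().lower(), (None, None))

    return {
        "sample_id": raw.get("sample_id"),
        "tissue_raw": label,
        "tissue_id": term_id,
        "tissue_label": term_label,
    }


# Three samples that describe tissue under three different source-specific keys.
samples = [
    {"sample_id": "S1", "Tissue type": "Hepatic tissue"},
    {"sample_id": "S2", "tissue": " Lung "},
    {"sample_id": "S3", "source_name": "unknown organ"},
]
harmonized = [harmonize_sample(s) for s in samples]
for row in harmonized:
    print(row)
```

After this step every sample exposes the same fields, so a single query on tissue_id finds matching samples regardless of how the source originally labelled them. Keeping the raw label alongside the harmonized term makes it possible to audit or revise the mapping later.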

3. Why is data curation important?

Manually curating vast volumes of unstructured or semi-structured biomedical data for drug development can be expensive, cumbersome, time-consuming, and resource-intensive. Automating the curation process would ensure the following:

High data quality
Data that is reusable for any kind of analysis
Quick curation of newly added datasets