Curating OmixAtlas
1. What is Curation?
Curation is the process of organizing, annotating, and standardizing biological data so that it becomes a FAIR (Findable, Accessible, Interoperable, and Reusable) resource.
- Findable - to assign a persistent identifier that makes data easy to find by both humans and machines
- Accessible - to be able to retrieve the data by their identifier using a standard protocol
- Interoperable - to use standardized terms, include references to other data, and be machine-actionable
- Reusable - to sufficiently describe data for both computers and humans to be able to understand the data
Using a structured FAIR database can fast-track the drug development process because it makes it easier for researchers to test their first-hand hypotheses.
2. How is data curated on OmixAtlas?
Curated datasets stored on Polly are the result of a multi-step process that ensures the overall quality and reusability of the data. In summary, all datasets on Polly go through the following two-step process:
- Data Engineering - This includes transforming data to fit a proprietary data schema that is uniform across several datatypes. The transformation streamlines data in one consistent schema and allows users to query multiple data types on a single data infrastructure.
- Harmonizing Metadata - This involves tagging each sample and dataset with metadata that is harmonized against a uniform ontology.
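The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not Polly's actual pipeline or schema: the `SampleRecord` fields, the `TISSUE_SYNONYMS` table, the field aliases, and the function names are all hypothetical, and a real harmonization step would map terms against a full ontology rather than a small hand-written lookup.

```python
from dataclasses import dataclass

# Hypothetical synonym table standing in for a real ontology lookup.
TISSUE_SYNONYMS = {
    "liver": "liver",
    "hepatic tissue": "liver",
    "lung": "lung",
    "pulmonary tissue": "lung",
}


@dataclass
class SampleRecord:
    """Hypothetical uniform schema shared by every data type."""
    sample_id: str
    data_type: str
    tissue: str  # harmonized label, or "unmapped" if no term matched


def harmonize_tissue(raw_label: str) -> str:
    """Map a free-text tissue label to a single standardized term."""
    key = raw_label.strip().lower()
    return TISSUE_SYNONYMS.get(key, "unmapped")


def to_uniform_record(raw: dict, data_type: str) -> SampleRecord:
    """Fit one source record into the uniform schema (data engineering step)."""
    # Sources name the same fields differently; accept a few illustrative aliases.
    sample_id = raw.get("sample_id") or raw.get("id") or raw.get("Sample")
    if sample_id is None:
        raise ValueError("record has no recognizable sample identifier")
    raw_tissue = raw.get("tissue") or raw.get("organ") or raw.get("Tissue Type") or ""
    return SampleRecord(str(sample_id), data_type, harmonize_tissue(raw_tissue))


# Made-up input records from two different data types.
raw_samples = [
    ({"id": "S1", "organ": "Hepatic tissue"}, "transcriptomics"),
    ({"Sample": "S2", "Tissue Type": " LUNG "}, "proteomics"),
    ({"sample_id": "S3", "tissue": "unknown blob"}, "transcriptomics"),
]

records = [to_uniform_record(raw, data_type) for raw, data_type in raw_samples]
for record in records:
    print(record)
```

Because every record ends up in the same shape with the same vocabulary, samples from different data types can be filtered with a single condition (for example, all records whose `tissue` is `"liver"`). Labels that do not match any known term are marked `"unmapped"` instead of being guessed, so they can be reviewed separately.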
3. Why is data curation important?
Manually curating vast volumes of unstructured or semi-structured biomedical data for drug development can be expensive, cumbersome, time-consuming, and resource-intensive. Automating the curation process ensures the following:
- High data quality
- Data that is reusable for any kind of analysis
- Quick curation of newly added datasets