All data ingested into or generated inside Polly is stored in AWS. This includes input/output files, logs, database transactional data, and static data. No data is currently stored outside the AWS account. Although AWS provides very high durability and reliability for all of these data stores, some risks remain, such as accidental deletion, natural hazards, and unauthorized access to the AWS account.

Datastores

Data is currently stored in five main data stores inside AWS:


| Datastore | Purpose | Backup | Frequency |
| --- | --- | --- | --- |
| RDS (Postgres) | For tabular data. Stores relational transactional data, including but not limited to users' and organizations' information, workspace data/metadata, etc. | Uses AWS Backup (managed service for automated backups) | Daily |
| DynamoDB | For JSON-based data. Works as an extension to Postgres for the core working of the platform. | Uses AWS Backup (managed service for automated backups) | Daily |
| Elasticsearch | For analysis-ready data. | Before data is ingested into Elasticsearch, it is loaded into Amazon S3 bucket(s). If Elasticsearch data is lost, it can be re-indexed from the S3 buckets. | Daily |
| EFS | For storing application state. Applications based on the Shiny architecture use EFS to store application state at a given point in time. | Uses AWS Backup (managed service for automated backups) | Daily |
| S3 buckets | For file storage. S3 contains static content (media, configurations, etc.), input/output files (files ingested into the platform and/or generated output files), and logs (application logs). | Custom cron job execution. The job copies the files every day into a separate account. | Daily |
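
The daily S3 copy job described in the table could look roughly like the sketch below. It is an illustration only, not the actual job: the function names and bucket names are hypothetical, and it assumes a boto3-style S3 client whose credentials can read the source bucket and write to a bucket in the separate backup account.

```python
from typing import Any, Iterator


def iter_keys(s3: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key in `bucket` under `prefix`, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def copy_bucket(s3: Any, source_bucket: str, backup_bucket: str, prefix: str = "") -> int:
    """Copy each object from source_bucket to backup_bucket; return the number copied."""
    copied = 0
    for key in iter_keys(s3, source_bucket, prefix):
        # copy_object is a server-side copy and handles objects up to 5 GB.
        s3.copy_object(
            Bucket=backup_bucket,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
        copied += 1
    return copied


# Hypothetical usage with a real client (bucket names are placeholders):
#   import boto3
#   copy_bucket(boto3.client("s3"), "polly-data", "polly-data-backup")
```

Passing the client in as a parameter keeps the copy logic independent of how credentials for the backup account are obtained, and makes it easy to exercise with a stub client.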

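Recovering Elasticsearch by re-indexing from S3 amounts to reading the stored documents back and sending them through the Elasticsearch _bulk API. The sketch below shows only the step that builds the bulk request body. It assumes the documents are available as JSON objects; the function name, index name, and `id` field are hypothetical.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[Mapping], id_field: str = "id") -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API from documents."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if id_field in doc:
            # Reusing the original ID means re-running the job overwrites
            # documents instead of duplicating them.
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(doc))     # document source line
    # The _bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

The returned string would be sent to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type, in batches sized to suit the cluster.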


Key take-aways

  • Backups are done at least once a day

  • The following datastores are backed up: RDS, DynamoDB, Elasticsearch, EFS, and S3 buckets.

  • In case of a disaster, a maximum of one day of data will be lost.
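
The one-day bound above holds only while every datastore's most recent backup is less than a day old. A minimal sketch of such a freshness check follows. The function name, the timestamps, and the way they are collected are assumptions for illustration, not part of the current setup.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Daily backups imply that at most one day of data can be lost.
MAX_BACKUP_AGE = timedelta(days=1)


def stale_datastores(
    last_backup: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_BACKUP_AGE,
) -> List[str]:
    """Return the datastores whose newest backup is older than max_age."""
    return sorted(name for name, ts in last_backup.items() if now - ts > max_age)


# Example with made-up timestamps.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
example = {
    "RDS": now - timedelta(hours=3),   # backed up 3 hours ago
    "EFS": now - timedelta(hours=30),  # backed up 30 hours ago
}
print(stale_datastores(example, now))  # ['EFS']
```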