Template:ClusterDataBackup

From Cheaha
Revision as of 20:20, 3 October 2019 by Jpr@uab.edu (talk | contribs) (Create template for data backup policy and guidance)

There is no automatic backup of any user data on the cluster in home, data, or scratch. At this time, all user data backup processes are defined and managed by each user and/or lab. Because data backup demands vary widely between users, groups, and research domains, this approach lets those who are most familiar with the data make appropriate decisions based on their specific needs.

For example, if a group is working with a large shared data set that is a local copy of a data set maintained authoritatively at a national data bank, maintaining a local backup is unlikely to be a productive use of limited storage resources, since the data could be restored from the authoritative source. If, however, yours is the only copy of a unique data set, then maintaining a backup is critical if you value that data. It's worth noting that while this "uniqueness" criterion may not apply to the data you analyze, it may readily apply to the codes that define your analysis pipelines.

An often-recommended backup policy is the 3-2-1 rule: maintain three copies of data, on two different media, with one copy off-site. You can read more about the 3-2-1 rule here. In the case of your application codes, using revision control tools during development provides an easy way to maintain a second copy, makes for a good software development process, and can help achieve reproducible research goals.
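As a rough illustration of the 3-2-1 rule, the following Python sketch checks whether a list of copies meets its three conditions. The `DataCopy` class, the `satisfies_3_2_1` function, and the example locations and media are hypothetical names chosen for illustration; they do not describe any storage service or tool offered on the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of a data set (all field values here are illustrative)."""
    location: str   # where the copy lives, e.g. "lab workstation"
    medium: str     # kind of storage medium the copy sits on
    offsite: bool   # True if the copy is kept at a different site


def satisfies_3_2_1(copies):
    """Return True if the copies meet the 3-2-1 rule:
    at least 3 copies, on at least 2 different media, at least 1 off-site."""
    distinct_media = {c.medium for c in copies}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(distinct_media) >= 2 and has_offsite


# Hypothetical inventory for a single data set.
copies = [
    DataCopy("cluster project storage", "parallel filesystem", False),
    DataCopy("lab workstation", "local disk", False),
    DataCopy("remote storage service", "object storage", True),
]

print(satisfies_3_2_1(copies))      # all three conditions hold
print(satisfies_3_2_1(copies[:2]))  # only two copies, none off-site
```

Writing out the inventory for each data set you care about, even informally, makes it easier to see which condition of the rule is not yet met.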

Please review the data storage options provided by UAB IT for maintaining copies of your data. In choosing among these options, you should also be aware of UAB's data classification rules and the security requirements for storing sensitive and restricted data. Given the importance of backup, Research Computing continues to explore options to facilitate data backup workflows from the cluster. Please contact us if you have questions or would like to discuss specific data backup scenarios.

A good guide for thinking about your backup strategy might be: "If you aren't managing a data backup process, then you have no backup data."