A Capability Maturity Model for Research Data Management

5.2 Ability to Perform

Last modified by Arden Kirkland on 2014/06/06 13:02

Ability to Perform describes the preconditions that must exist in the project or organization to implement the process competently. Ability to Perform typically involves resources, organizational structures, and training.

For data repository and presentation services, Ability to Perform includes enabling technologies, procedures, and business models that will sustain the services. 

5.2.1 Appraise and select enabling technologies

Projects need to select the hardware and software platforms on which they will store their data. The selection process should start early in the project to allow time to collect and evaluate information on the available options, such as system documentation or the experiences of other users. Larger projects may want to pilot several alternatives before making a choice. Relevant system features include functionality (in particular, support for multimedia data; DataONE, 2011f), fit to project needs (e.g., capabilities relative to the expected volume of data and number of users), ease of use, and support. Relevant hardware features include capacity, reliability, and expected lifetime (e.g., of hard drives) (DataONE, 2011d).
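
One way to make the comparison of candidate platforms explicit is a weighted scoring matrix. The Python sketch below is an illustration only, not a method prescribed by this model: the criteria mirror those listed above, but the weights, the 1-5 scoring scale, and the names `Candidate`, `weighted_score`, `rank`, "Platform A", and "Platform B" are hypothetical. Hardware criteria such as capacity, reliability, and expected lifetime could be added as further entries.

```python
from dataclasses import dataclass

# Hypothetical weights for the criteria named in the text; they should sum to
# 1.0 so that weighted scores stay on the same 1-5 scale as the inputs.
WEIGHTS = {
    "functionality": 0.30,  # e.g., support for multimedia data
    "fit_to_needs": 0.30,   # capacity vs. expected data volume and users
    "ease_of_use": 0.20,
    "support": 0.20,
}


@dataclass
class Candidate:
    name: str
    scores: dict[str, float]  # criterion -> score on a 1-5 scale


def weighted_score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Weighted sum of a candidate's scores; a missing criterion raises KeyError."""
    return sum(w * candidate.scores[c] for c, w in weights.items())


def rank(candidates: list[Candidate], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, score) pairs ordered from best to worst."""
    scored = [(c.name, weighted_score(c, weights)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores gathered from documentation, user reports, or a pilot.
options = [
    Candidate("Platform A", {"functionality": 4, "fit_to_needs": 3,
                             "ease_of_use": 5, "support": 4}),
    Candidate("Platform B", {"functionality": 5, "fit_to_needs": 5,
                             "ease_of_use": 3, "support": 3}),
]

for name, score in rank(options, WEIGHTS):
    print(f"{name}: {score:.2f}")
```

A weighted sum keeps the rationale for a choice visible and easy to revisit when a pilot reveals new information; the weights themselves remain a judgment call for the project.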

In addition to working stores for data in active use, projects may develop their own data archives. Alternatively, rather than archiving data themselves, projects may deposit data in an existing repository. Again, the process of selecting a repository should start early to allow enough time to identify and evaluate alternatives. Moreover, repositories may have particular requirements that will shape the project's data management plan (DataONE, 2011e). A further option for data preservation is joining a digital preservation network, that is, collaborating with other institutions or projects to archive data cooperatively (e.g., the Digital Preservation Network, http://dpn.org/, or Chronopolis, http://chronopolis.sdsc.edu).

5.2.2 Develop business models for preservation

Preserving data incurs costs that extend long past the end of the projects that generate the data. It is therefore critical to develop business models that fund the ongoing, long-term preservation of archived data.

Current data repositories are either funded by grants or self-supported. Funding agencies such as NSF and NIH have awarded a number of grants to initiate major data repositories (DataONE, Dataverse, and GenBank, to name a few) and to support the long-term preservation of some of them. Business models in the self-supported category include a wide variety of options: individual and institutional memberships, subscriptions, pay-per-submission charges, and voucher plans (Dryad, 2014). Generally, large reference collections of data (note 1), e.g., GenBank (http://www.ncbi.nlm.nih.gov/genbank/), the Knowledge Network for Biocomplexity (KNB) (https://knb.ecoinformatics.org/), and BioProject (http://www.ncbi.nlm.nih.gov/bioproject), are mostly supported by continued government funding, while resource collections of data (note 2), which are usually created by a disciplinary community for a more narrowly defined scope, tend to receive initial government funding but are increasingly required to become self-supporting. So far, the Dryad data repository has had a successful record in the self-supporting category.

The growth of the self-supported model makes it all the more important to plan early and to know which options are available. An institution or project considering a self-supported repository can compare the cost of building an in-house repository with the cost of subscribing to a repository service. Costs to be covered include maintenance and operation of the hardware and institutional infrastructure, as well as any necessary migration to new data formats and platforms.
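
The comparison between building and subscribing can be worked through as simple cumulative arithmetic. The sketch below assumes, for the in-house option, a one-time setup cost, fixed annual operating costs, and a format or platform migration every few years, and, for the subscription option, an annual fee plus a per-submission charge. All figures and function names are hypothetical placeholders, not prices from Dryad or any other repository.

```python
from functools import partial
from typing import Callable, Optional


def in_house_cost(years: int, setup: int, annual_ops: int,
                  migration_cost: int, migration_every: int) -> int:
    """Cumulative cost of running an in-house repository for `years` years.

    Assumes one format/platform migration at the end of every
    `migration_every`-year period.
    """
    migrations = years // migration_every
    return setup + annual_ops * years + migration_cost * migrations


def subscription_cost(years: int, annual_fee: int,
                      fee_per_deposit: int = 0, deposits_per_year: int = 0) -> int:
    """Cumulative cost of subscribing to a repository service for `years` years."""
    return years * (annual_fee + fee_per_deposit * deposits_per_year)


def first_year_in_house_cheaper(horizon: int,
                                in_house: Callable[[int], int],
                                subscription: Callable[[int], int]) -> Optional[int]:
    """First year (1..horizon) in which cumulative in-house cost is lower, else None."""
    for year in range(1, horizon + 1):
        if in_house(year) < subscription(year):
            return year
    return None


# Hypothetical figures in arbitrary currency units.
build = partial(in_house_cost, setup=20_000, annual_ops=6_000,
                migration_cost=8_000, migration_every=5)
subscribe = partial(subscription_cost, annual_fee=9_000,
                    fee_per_deposit=100, deposits_per_year=20)

for year in (1, 5, 10):
    print(year, build(year), subscribe(year))
print("in-house cheaper from year:", first_year_in_house_cheaper(10, build, subscribe))
```

With these made-up figures the in-house option becomes cheaper only in the sixth year, so a project with a shorter preservation horizon would favor subscribing; a real comparison would substitute the project's own cost estimates.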

5.2.3 Develop backup procedures and training

Projects should develop clear backup procedures. Documented procedures are necessary to ensure that data are backed up according to policy and that procedures for recovering from problems are established and widely known (DataONE, 2011c). Procedures should identify all data that are to be backed up. They should set a clear backup schedule tailored to the data collection process; for example, streaming data should be backed up at regularly scheduled points during collection (DataONE, 2011a).
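
A documented schedule can also be kept in a machine-readable form so that missed backups are easy to detect. The sketch below is a hypothetical illustration: the `BackupRule` structure, the `overdue` function, the data set names, and the intervals are assumptions, with a short interval standing in for the regularly scheduled points at which streaming data are backed up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRule:
    dataset: str           # hypothetical data set identifier
    interval: timedelta    # maximum time allowed between backups
    last_backup: datetime  # time of the most recent successful backup


def overdue(rules: list[BackupRule], now: datetime) -> list[str]:
    """Names of data sets whose latest backup is older than their interval."""
    return [r.dataset for r in rules if now - r.last_backup > r.interval]


# Hypothetical policy: weekly backups for periodically collected survey data,
# backups every 6 hours for continuously streaming sensor data.
policy = [
    BackupRule("field-survey-forms", timedelta(days=7), datetime(2014, 6, 2, 9, 0)),
    BackupRule("sensor-stream", timedelta(hours=6), datetime(2014, 6, 6, 3, 0)),
]

now = datetime(2014, 6, 6, 12, 0)
print(overdue(policy, now))
```

Running a check like this on a schedule is one way for the responsible person to confirm that automated backups are actually being made.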

Procedures should identify who is responsible for creating the backups, including alternates in case that person is unavailable (DataONE, 2011b). Backups may be automated, in which case someone should be responsible for regularly checking that they are actually being made. Different data sets may have different backup procedures (DataONE, 2011c). Multiple versions of backups should be kept, e.g., to allow recovery from file damage that is not detected immediately.
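
Keeping multiple versions does not require keeping every backup indefinitely. The sketch below shows one common rotation scheme, assumed here for illustration rather than taken from the sources cited: it keeps the most recent daily backups plus the newest backup from each of the last few calendar weeks, so that damage noticed weeks later can still be rolled back. The function name and the default retention counts are hypothetical; whether and when the remaining versions are deleted is left to the project's procedures.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates: list[date], keep_recent: int = 7,
                    keep_weekly: int = 4) -> list[date]:
    """Select which dated backup versions to retain (hypothetical rotation policy).

    Keeps the `keep_recent` newest backups plus the newest backup from each of
    the `keep_weekly` most recent ISO weeks that contain any backup.
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:keep_recent])
    weeks_seen: list[tuple[int, int]] = []
    for d in newest_first:
        week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
        if week not in weeks_seen:
            weeks_seen.append(week)
            if len(weeks_seen) > keep_weekly:
                break
            keep.add(d)  # newest backup in this week
    return sorted(keep)


# Usage: 60 consecutive daily backups ending on Friday, 6 June 2014.
daily = [date(2014, 6, 6) - timedelta(days=i) for i in range(60)]
kept = backups_to_keep(daily)
print(len(kept), kept[0], kept[-1])
```

Longer retention (for example, adding monthly copies) trades storage cost for a longer window in which undetected damage can still be repaired.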

The procedures should ensure that data backups are subject to the same protections as the original data (e.g., that confidential data are protected).

Finally, the procedures for recovering from a backup copy should be described (DataONE, 2011a), both for individual files and for catastrophic failures. Responsibility for recovery should be assigned. Further, in the event of a failure, the recovery procedure must ensure that the backups are not damaged by the same problem.
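
A recovery procedure can check a backup before relying on it and avoid writing to the backup copy itself. The sketch below assumes, as a hypothetical convention, that a SHA-256 checksum was recorded when each backup was made; the function names `sha256_of` and `restore_file` are illustrative. It verifies the backup against that checksum, copies it into a separate target directory (for example, on replacement hardware), and verifies the restored copy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_file(backup: Path, expected_sha256: str, target_dir: Path) -> Path:
    """Verify a backup, copy it into target_dir, and verify the copy.

    The backup itself is only read, never modified. Raises ValueError if
    either checksum differs from the one recorded at backup time.
    """
    if sha256_of(backup) != expected_sha256:
        raise ValueError(f"{backup} does not match its recorded checksum")
    target_dir.mkdir(parents=True, exist_ok=True)
    restored = target_dir / backup.name
    shutil.copy2(backup, restored)
    if sha256_of(restored) != expected_sha256:
        raise ValueError(f"restored copy {restored} failed verification")
    return restored


# Usage in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp, "backup", "survey.csv")
    backup.parent.mkdir()
    backup.write_bytes(b"site,count\nA,12\n")
    recorded = sha256_of(backup)  # in practice, stored when the backup was made
    restored = restore_file(backup, recorded, Path(tmp, "restore"))
    restored_ok = restored.read_bytes() == backup.read_bytes()
    try:
        restore_file(backup, "0" * 64, Path(tmp, "restore-2"))
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(restored_ok, mismatch_detected)
```

Restoring into a separate location, rather than over the failed storage, is one way to keep the same fault from also damaging the backup.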

Personnel involved with backups should be trained in the relevant policies and procedures, including those for data security.


Rubric for 5.2 - Ability to Perform
Level 0
 This process or practice is not being observed
No steps have been taken to provide for resources, structure, or training with regard to enabling technologies or business models for data preservation
Level 1: Initial
 Data are managed intuitively at project level without clear goals and practices
Resources, structure, and training with regard to enabling technologies or business models for data preservation have been considered minimally by individual team members, but not codified
Level 2: Managed
 DM process is characterized for projects and often reactive
Resources, structure, and training with regard to enabling technologies or business models for data preservation have been recorded for this project, but have not taken wider community needs or standards into account
Level 3: Defined
 DM is characterized for the organization/community and proactive
The project provides resources, structure, and training with regard to enabling technologies or business models for data preservation, as defined for the entire community or institution
Level 4: Quantitatively Managed
 DM is measured and controlled
Quantitative quality goals have been established for resources, structure, and training with regard to enabling technologies or business models for data preservation, and both data and practices are systematically measured for quality
Level 5: Optimizing
 Focus on process improvement
Processes regarding resources, structure, and training with regard to enabling technologies or business models for data preservation are evaluated on a regular basis, and necessary improvements are implemented


1. Reference collections are authored by (and serve) large segments of the science and engineering community and conform to robust, well-established and comprehensive standards, which often lead to a universal standard. Budgets are large and are often derived from diverse sources with a view to indefinite support. Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728_4.pdf, p. 23.

2. Resource collections are authored by a community of investigators, often within a domain of science or engineering, and are often developed with community-level standards. Budgets are often intermediate in size. Lifetime is between the mid- and long-term. Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728_4.pdf, p. 22.


DataONE. (2011a). Backup your data. Retrieved from https://www.dataone.org/best-practices/backup-your-data

DataONE. (2011b). Create and document a data backup policy. Retrieved from https://www.dataone.org/best-practices/create-and-document-data-backup-policy

DataONE. (2011c). Ensure integrity and accessibility when making backups of data. Retrieved from https://www.dataone.org/best-practices/ensure-integrity-and-accessibility-when-making-backups-data

DataONE. (2011d). Ensure the reliability of your storage media. Retrieved from https://www.dataone.org/best-practices/ensure-reliability-your-storage-media

DataONE. (2011e). Identify suitable repositories for the data. Retrieved from https://www.dataone.org/best-practices/identify-suitable-repositories-data

DataONE. (2011f). Plan for effective multimedia management. Retrieved from https://www.dataone.org/best-practices/plan-effective-multimedia-management

Dryad. (2014). Pricing plans and data publishing charges. Retrieved from http://datadryad.org/pages/pricing
