A Capability Maturity Model for Research Data Management

5.1 Commitment to Perform

Last modified by Arden Kirkland on 2014/05/18 23:10

Commitment to Perform describes the actions the organization must take to ensure that the process is established and will endure. Commitment to Perform typically involves establishing organizational policies and senior management sponsorship.

5.1.1 Develop data preservation policies

Projects should develop data preservation policies that specify the required level of access to data and the controls needed on viewing and changing data. The goal of developing data preservation policies is to guide the development of systems that operate as users expect.

Development of data preservation policies is necessary to ensure that data are preserved in a cost-effective way consistent with user expectations, while maintaining desired controls on accessing and changing data. 

Data preservation policies should be based on an analysis of the risks to which the data are exposed and the expectations of users. For example, a common risk facing all data systems is loss of data due to hardware failure or damage, so such events should be expected and planned for. On the other hand, while commercial data may have a financial value that makes them attractive to criminals, research data might not face such risks.

Risks can be classified by likelihood of occurrence and expected impact. Likely high-impact risks (e.g., a disk drive failing and destroying stored data) should be prevented (e.g., by using redundant storage so a single disk failure has no impact). Unlikely high-impact risks (e.g., the building burning down) should be planned for (e.g., by keeping off-site backups). Likely low-impact risks (e.g., a user error in editing a data item) should be controlled (e.g., by keeping an audit trail). Unlikely low-impact risks might just be ignored.

Risks should be considered broadly, including technical risks (e.g., hardware or software errors), human risks (e.g., operator errors) and institutional risks (e.g., a data repository ceasing operation). Based on the risk analysis, data preservation policies should state what data are being preserved and against what risks. Identifying the likelihood and impact of risks will help ensure that resources are directed to the most important risks and that risks are not overlooked.
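The likelihood-and-impact classification described above can be sketched as a small lookup table. This is a minimal illustration in Python, not part of the model itself: the `Risk` class, the `RESPONSES` table and the `response_for` function are hypothetical names, and treating likelihood and impact as simple yes/no flags is a simplifying assumption.

```python
from dataclasses import dataclass

# Response strategy for each (likely, high_impact) combination,
# following the four cases described in the text.
RESPONSES = {
    (True, True): "prevent",    # e.g., redundant storage
    (False, True): "plan for",  # e.g., off-site backups
    (True, False): "control",   # e.g., an audit trail
    (False, False): "ignore",
}


@dataclass(frozen=True)
class Risk:
    """One entry in a hypothetical project risk register."""
    name: str
    likely: bool
    high_impact: bool


def response_for(risk: Risk) -> str:
    """Return the response strategy for a classified risk."""
    return RESPONSES[(risk.likely, risk.high_impact)]


if __name__ == "__main__":
    register = [
        Risk("disk drive failure", likely=True, high_impact=True),
        Risk("building fire", likely=False, high_impact=True),
        Risk("user editing error", likely=True, high_impact=False),
    ]
    for risk in register:
        print(f"{risk.name}: {response_for(risk)}")
```

A real risk analysis would probably use graded scales rather than two flags, but even this coarse register makes it easy to check that no identified risk has been left without a stated response.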

User expectations regarding data should also be considered. For example, for a small project it may be acceptable to lose access to data for a few days while a failed server is replaced, while for other projects such an outage might be unacceptable, justifying the cost of maintaining redundant hardware. Again, identifying user needs will help ensure that resources are spent appropriately.

Finally, data preservation policies should state who is responsible for the preservation of the data and identify acceptable and unacceptable behaviors. For example, for data access, policies should state who can access data; for data integrity, they should state who can change data and under what circumstances.

5.1.2 Develop data backup policies

To back up data means to make a copy of the data that can be used in case the primary data store is damaged or lost. The goal of developing data backup policies is to provide guidance to data curators about how data should be backed up and to identify the roles and responsibilities of personnel for creating, maintaining and using backups (DataONE, 2011a).

It is important to define backup policies to ensure that data are being backed up appropriately, that backups are properly protected and that responsibilities are clearly delineated.

The backup policy should describe what data need to be backed up and how frequently, where backups are kept and for how long, and who can access them (DataONE, 2011b). The policy may also dictate the hardware and software to be used. If backups are not automatic, the policy should state who performs the backups. The policy should also state how and how often backups are validated and what metrics are used to evaluate backups.
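As an illustration, the items a backup policy should state can be recorded in a simple structure, and one possible validation method is to compare checksums of the original and the backup copy. The sketch below is in Python; the `BackupPolicy` fields and the `sha256_of` and `backup_matches` helpers are hypothetical names, and checksum comparison is an assumed validation technique, not one prescribed by the sources cited here.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical record of the items a backup policy should state."""
    datasets: tuple[str, ...]          # what data need to be backed up
    frequency_hours: int               # how frequently
    location: str                      # where backups are kept
    retention_days: int                # for how long
    authorized_roles: tuple[str, ...]  # who can access the backups
    operator: str                      # who performs non-automatic backups
    validation_interval_days: int      # how often backups are validated


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Return True if the backup is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)
```

The share of backups that pass such a check during each validation run is one example of a metric a policy could name for evaluating backups.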

5.1.3 Develop data curation policies

Projects create a variety of kinds of data, as well as data documentation and analysis scripts or tools. Data curation policies state what data should be preserved long-term and what data can be discarded. The goal of developing data curation policies is to provide guidance for data curators and users on deciding what data should be preserved.

Development of curation policies is necessary because data may have long-term value that should be preserved, but keeping all data is neither practical nor economically feasible (DataONE, 2011c). Only datasets that have significant long-term value and that cannot be recreated or that are costly to reproduce should be preserved. 

In developing curation policies, weigh the cost of preservation, which depends on the dataset size and repository policies, against the potential value of the data to the user community (Hook et al., 2010). Funding agencies or institutions may also have requirements and policies governing contribution to repositories (DataONE, 2011c).

DataONE suggests that "raw data are usually worth preserving" (DataONE, 2011d). Data that have undergone a quality control check may be costly to recreate and so should be preserved. On the other hand, intermediate products in an analysis might be voluminous and easy to recreate and so not worth preserving. Source code is generally small and so likely worth preserving.
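The selection rule stated earlier in this section (preserve datasets that have significant long-term value and that either cannot be recreated or are costly to reproduce) can be written as a short decision function. This is a minimal sketch in Python under that reading of the rule; `should_preserve` and its parameters are hypothetical names, and the sketch does not model storage cost, which is the reason small items such as source code are usually worth keeping.

```python
def should_preserve(long_term_value: bool,
                    can_be_recreated: bool,
                    costly_to_reproduce: bool) -> bool:
    """Apply the curation rule: valuable AND (irreplaceable OR costly to redo)."""
    hard_to_replace = (not can_be_recreated) or costly_to_reproduce
    return long_term_value and hard_to_replace


if __name__ == "__main__":
    # Raw observations: valuable and cannot be recreated.
    print(should_preserve(True, can_be_recreated=False, costly_to_reproduce=False))
    # Quality-controlled data: can be recreated, but only at high cost.
    print(should_preserve(True, can_be_recreated=True, costly_to_reproduce=True))
    # Intermediate analysis products: easy to recreate.
    print(should_preserve(True, can_be_recreated=True, costly_to_reproduce=False))
```

In practice each input is a judgment made by curators and users, so a function like this mainly serves to make the policy's criteria explicit and consistently applied.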

Rubric

Rubric for 5.1 - Commitment to Perform
Level 0
This process or practice is not being observed
No steps have been taken to establish organizational policies or senior management sponsorship for data preservation, curation, or backups

Level 1: Initial
Data are managed intuitively at project level without clear goals and practices
Data preservation, curation, and backups have been considered minimally by individual team members, but nothing has been codified or included in organizational policies or senior management sponsorship

Level 2: Managed
DM process is characterized for projects and often reactive
Data preservation, curation, and backups have been addressed for this project, but have not taken wider community needs or standards into account and have not resulted in organizational policies or senior management sponsorship

Level 3: Defined
DM is characterized for the organization/community and proactive
The project follows approaches to data preservation, curation, and backups that have been defined for the entire community or institution, as codified in organizational policies with senior management sponsorship

Level 4: Quantitatively Managed
DM is measured and controlled
Quantitative quality goals have been established regarding data preservation, curation, and backups, and are codified in organizational policies with senior management sponsorship; both data and practices are systematically measured for quality

Level 5: Optimizing
Focus on process improvement
Processes regarding data preservation, curation, and backups are evaluated on a regular basis, as codified in organizational policies with senior management sponsorship, and necessary improvements are implemented

References

DataONE. (2011a). Create and document a data backup policy. Retrieved from https://www.dataone.org/best-practices/create-and-document-data-backup-policy

DataONE. (2011b). Ensure integrity and accessibility when making backups of data. Retrieved from https://www.dataone.org/best-practices/ensure-integrity-and-accessibility-when-making-backups-data

DataONE. (2011c). Identify data with long-term value. Retrieved from https://www.dataone.org/best-practices/identify-data-long-term-value

DataONE. (2011d). Decide what data to preserve. Retrieved from https://www.dataone.org/best-practices/decide-what-data-preserve

Hook, L. A., Vannan, S. K. S., Beaty, T. W., Cook, R. B., & Wilson, B. E. (2010). Best Practices for Preparing Environmental Data Sets to Share and Archive. Oak Ridge National Laboratory Distributed Active Archive Center. Retrieved from http://daac.ornl.gov/PI/BestPractices-2010.pdf

Created by Jian Qin on 2013/06/15 01:33
