A Capability Maturity Model for Research Data Management
5.3 Activities Performed

Last modified by crowston on 2014/06/01 12:07

Activities Performed describes the roles and procedures necessary to implement a key process area. Activities Performed typically involve establishing plans and procedures (i.e., the specific actions that need to be performed), performing the work, tracking it, and taking corrective actions as necessary.

5.3.1 Store data

A key function in data management is storing the data both for current use and for long-term archiving. Earlier sections discussed logical formats for data storage; in this section, we focus on physical storage. All storage devices, locations and access accounts should be documented and accessible to team members (DataONE, 2011a). Data should be stored in non-proprietary hardware formats (Borer et al., 2009) so that they can be read even if the original hardware is not available (e.g., many hardware RAID devices use proprietary disk formats, so a failed RAID controller must be replaced with the same model). Media should be handled and stored carefully (DataONE, 2011d). Data discs should be routinely inspected and replaced as needed (DataONE, 2011d). Storing data solely on local hard drives or servers is not recommended: keeping multiple copies of the data files in separate locations is safer (DataONE, 2011e).

5.3.2 Provide data security

Confidential data must be stored in a way that restricts access to authorized personnel (Columbia Center for New Media Teaching and Learning, n.d.). Data should be secured in accordance with the developed data access policies. Possible access controls include physical security on the hardware and allowing only properly authenticated users access to the data. Users might have to sign license agreements governing how data are used and protected. Highly confidential data might be accessible only from particular locations, rather than being distributed to users.

5.3.3 Control changes to data files

The original data set should be preserved in its original state (Borer et al., 2009; DataONE, 2011f; Hook et al., 2010). Unaltered images should be preserved at the highest resolution possible (DataONE, 2011e).

Changes to data files should be controlled, that is, appropriate tools, such as version control tools, should be used to keep track of the history of changes to the data files (Hook et al., 2010). Changes should be made only by users authorized by the developed data access policies. The nature of and reasons for the changes should be recorded. In particular, users should be aware of, and document, any changes in the coding scheme (Hook et al., 2010). A further danger of using applications such as spreadsheets to store data is that these programs are designed to facilitate making changes to the data, while for scientific data, changes should be controlled.
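As an illustration of recording the nature of and reasons for changes, the following minimal sketch appends one entry per change to a log file. It is not a substitute for a version control tool. The function names (`sha256_of`, `record_change`), the log fields, and the one-JSON-object-per-line log format are assumptions made for this example and do not come from the cited sources.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_change(data_file: Path, log_file: Path, user: str, reason: str) -> dict:
    """Append an entry describing the current state of data_file to log_file.

    The entry layout is hypothetical; adapt the fields to local policy.
    """
    entry = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "changed_by": user,
        "reason": reason,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    # One JSON object per line keeps the log append-only and easy to compare.
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

Calling `record_change` after each authorized edit yields a history in which every state of the file can be identified by its checksum, together with who made the change and why.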

It may be appropriate to provide multiple versions of data products with defined identifiers for unambiguous reference, reflecting the state of the data at different points in time (DataONE, 2011g).

5.3.4 Back up data

Data, processing codes, and documentation should be regularly backed up (Hook et al., 2010) according to the defined procedures to ensure that there are at least two (and preferably more) copies of all important data. Backup devices should be selected for reliability and regularly checked for it. Backups should be regularly tested for completeness and correctness to ensure that backup copies have the same content as the original data files (DataONE, 2011c). Backups might include periodic full backups (i.e., all files) as well as more frequent incremental backups (i.e., backing up only data that have changed since the last backup). The backups should also be checked to ensure that they are secure and that only those who need access to them have it (DataONE, 2011c). Contact information should be available for the persons responsible for the backed up data (DataONE, 2011c).
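One way to test a backup for completeness and correctness is to compare checksums of the original files with those of the backup copies. The sketch below assumes that the backup mirrors the directory layout of the original; the function names (`sha256_of`, `verify_backup`) and the shape of the report are hypothetical choices for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir: Path, backup_dir: Path) -> dict:
    """Compare every file under original_dir with its copy under backup_dir.

    Returns relative paths of files that are missing from the backup
    (completeness) or whose content differs (correctness).
    """
    report = {"missing": [], "mismatched": []}
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            report["missing"].append(relative.as_posix())
        elif sha256_of(original) != sha256_of(copy):
            report["mismatched"].append(relative.as_posix())
    return report
```

A backup passes this check when both lists in the report are empty. The check says nothing about whether the backup is secure or stored off-site, which must be verified separately.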

A copy of the backup should be kept at a trusted off-site location (DataONE, 2011b). In addition, keeping backup copies of data off-line helps ensure that they are not affected by any system problems or software errors that damage the primary copy (Borer et al., 2009). Copies of physical data stores such as lab notebooks and samples should also be regularly stored off-site for safekeeping (Columbia Center for New Media Teaching and Learning, n.d.).

5.3.5 Curate data

Data should be selected for long-term storage according to the developed curation policies and copied to the appropriate repositories. Data that are not selected for long-term storage should be disposed of on a determined schedule. The disposition of datasets should be recorded.
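The curation steps can be sketched as a simple decision rule that produces a disposition record for every dataset. The `Dataset` fields, the 365-day default retention period, and the action labels ("archive", "dispose", "retain") are assumptions for illustration only; actual values would come from the developed curation policies.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    name: str
    created: date
    selected_for_archive: bool
    retention_days: int = 365  # hypothetical default retention period


def plan_dispositions(datasets, today):
    """Return one disposition record per dataset.

    Selected datasets are archived; unselected datasets are disposed of
    once their retention period has elapsed and retained until then.
    """
    records = []
    for ds in datasets:
        if ds.selected_for_archive:
            action = "archive"
        elif today >= ds.created + timedelta(days=ds.retention_days):
            action = "dispose"
        else:
            action = "retain"
        records.append(
            {"dataset": ds.name, "action": action, "decided_on": today.isoformat()}
        )
    return records
```

Keeping the returned records, for example in a log maintained alongside the data, gives the recorded disposition that this activity calls for.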

5.3.6 Perform data migrations

In a long-running project, it may be necessary to migrate data to newer hardware or software formats. Such migrations should be carefully planned so they are not disruptive to the research process. When new hardware is installed, it is prudent to keep the old hardware with its copy of the data until the new device “settles in” and is deemed reliable (DataONE, 2011d).

When new versions of software are released, it is prudent to continue using the version of the software that was originally used to create a data file to view and manipulate the file contents (DataONE, 2011f). If it is necessary to use a newer version of a software package to open files created with an older version of the application, first save a copy of the original file in case there are problems with the migration. Implementation of new versions of software should be coordinated across a research group to avoid compatibility problems.
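The advice to save a copy of the original file before opening it with newer software can be sketched as follows. The function name `preserve_original` and the choice to mark the copy read-only are assumptions for this example, not steps prescribed by the cited sources.

```python
import shutil
import stat
from pathlib import Path


def preserve_original(data_file: Path, archive_dir: Path) -> Path:
    """Copy data_file into archive_dir and mark the copy read-only.

    Refuses to overwrite an existing copy, so the first preserved
    version is never silently replaced.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / data_file.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(data_file, target)  # copy2 also preserves file timestamps
    # Clear all write permission bits so accidental edits fail.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

Running this before the first use of the new software version leaves an untouched copy to fall back on if the migration causes problems.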


Rubric for 5.3 - Activities Performed
Level 0
This process or practice is not being observed
No steps have been taken to establish procedures for the workflow of data preservation, including storage, security, version control, and migration
Level 1: Initial
Data are managed intuitively at project level without clear goals and practices
The workflow of data preservation, including storage, security, version control, and migration, has been considered minimally by individual team members, but not codified
Level 2: Managed
DM process is characterized for projects and often reactive
The workflow of data preservation, including storage, security, version control, and migration, has been addressed for this project, but has not taken wider community needs or standards into account and has not been codified
Level 3: Defined
DM is characterized for the organization/community and proactive
The project follows approaches to the workflow of data preservation, including storage, security, version control, and migration, that have been defined for the entire community or institution
Level 4: Quantitatively Managed
DM is measured and controlled
Quantitative quality goals have been established regarding the workflow of data preservation, including storage, security, version control, and migration, and both data and practices are systematically measured for quality
Level 5: Optimizing
Focus on process improvement
Processes regarding the workflow of data preservation, including storage, security, version control, and migration, are evaluated on a regular basis, and necessary improvements are implemented


Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some Simple Guidelines for Effective Data Management. Bulletin of the Ecological Society of America, 90(2), 205–214. http://dx.doi.org/10.1890/0012-9623-90.2.205

Columbia Center for New Media Teaching and Learning. (n.d.). Responsible conduct of research: Data acquisition and management: Foundation text. Retrieved from http://ccnmtl.columbia.edu/projects/rcr/rcr_data/foundation/index.html#3_B

DataONE. (2011a). Advertise your data using datacasting tools. https://www.dataone.org/best-practices/advertise-your-data-using-datacasting-tools

DataONE. (2011b). Backup your data. https://www.dataone.org/best-practices/backup-your-data

DataONE. (2011c). Ensure integrity and accessibility when making backups of data. https://www.dataone.org/best-practices/ensure-integrity-and-accessibility-when-making-backups-data

DataONE. (2011d). Ensure the reliability of your storage media. https://www.dataone.org/best-practices/ensure-reliability-your-storage-media

DataONE. (2011e). Plan for effective multimedia management. https://www.dataone.org/best-practices/plan-effective-multimedia-management

DataONE. (2011f). Preserve information: keep your raw data raw. https://www.dataone.org/best-practices/preserve-information-keep-your-raw-data-raw

DataONE. (2011g). Provide version information for use and discovery. https://www.dataone.org/best-practices/provide-version-information-use-and-discovery-0

Hook, L. A., Vannan, S. K. S., Beaty, T. W., Cook, R. B., & Wilson, B. E. (2010). Best Practices for Preparing Environmental Data Sets to Share and Archive. Oak Ridge National Laboratory Distributed Active Archive Center. Retrieved from http://daac.ornl.gov/PI/BestPractices-2010.pdf
