A Capability Maturity Model for Research Data Management

3.3 Activities Performed

Last modified by Arden Kirkland on 2014/06/06 12:59

Activities Performed describes the roles and procedures necessary to implement a key process area. Activities Performed typically involve establishing plans and procedures (i.e., the specific actions that need to be performed), performing the work, tracking it, and taking corrective actions as necessary.

3.3.1 Generate metadata according to agreed-upon procedures

Follow agreed-upon procedures for generating metadata for variables, files, and studies so that future users can find, identify, select, and obtain the data. No single set of metadata applies in all situations, but consider which elements are important both at lower levels of granularity and for higher-level description of the dataset as a whole.

Document variables

Document individual data items such as variables (columns in structured tabular data) with names, labels, and descriptions. Elements of variable documentation include data type; units of measurement; formats for date, time, and geography; method of measurement; coverage (e.g., geographic, temporal); and codes and classification schemes (e.g., codes for missing data, or flags for quality issues or qualifying values). ICPSR offers extensive guidelines for variable documentation based on the DDI standard for quantitative social science data. DataONE (2011) offers guidelines based on best practices in the natural and physical sciences.

Document variables both in the data file and in a separate file. Long (2009) offers guidelines for naming and describing variables and values (pp. 143-194). For structured, tabular data, a well-documented data dictionary provides a concise guide to understanding and using the data. An example of a data dictionary is available from the Colorado Clinical and Translational Sciences Institute: http://cctsi.ucdenver.edu/RIIC/Documents/Data-Management-Figure-3.pdf.
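As a minimal sketch of what a data dictionary kept in a separate file might look like, the Python below stores one entry per variable and reports any column in a data file that has no entry. The variable names, the entry fields (name, label, type, units, missing_code), and the helper functions are all hypothetical; they illustrate the elements of variable documentation listed above and are not drawn from ICPSR, DataONE, or Long (2009).

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable (column) in the data file.
DATA_DICTIONARY = [
    {"name": "site_id", "label": "Sampling site", "type": "string",
     "units": "", "missing_code": ""},
    {"name": "sample_date", "label": "Date of sample", "type": "date",
     "units": "YYYY-MM-DD", "missing_code": ""},
    {"name": "temp_c", "label": "Water temperature", "type": "float",
     "units": "degrees Celsius", "missing_code": "-999"},
]


def undocumented_columns(header, dictionary):
    """Return the data-file columns that have no data dictionary entry."""
    documented = {entry["name"] for entry in dictionary}
    return [column for column in header if column not in documented]


def dictionary_to_csv(dictionary):
    """Serialize the data dictionary as CSV text for storage in a separate file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(dictionary[0].keys()))
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


# Hypothetical header row read from a data file.
header = ["site_id", "sample_date", "temp_c", "ph"]
print(undocumented_columns(header, DATA_DICTIONARY))  # ['ph']
```

Running a check like this whenever the data file changes is one way to keep the separate documentation in step with the data.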

For qualitative data, offering structured contextual information in a separate data list provides users with a guide to the data. The UK Data Archive has examples and templates for data lists: http://www.data-archive.ac.uk/create-manage/document/data-level?index=2

Use a controlled (standardized) vocabulary. Sometimes a research community has a sufficiently high degree of standardization to make it possible to report data in standardized ways (for example, time or taxonomy). This promotes interoperability of metadata, which is desirable when possible. When this degree of standardization does not exist, documenting the language used in a study is the next best option.

Document files

Describe the contents of data files. It may be helpful to create a separate document describing how files are structured and technical information on the files (e.g. the version of the software).

Prefer file formats that are stable and interoperable with other systems.

Long (2009) offers extensive recommendations on file management best practices (pp. 18-30, 125-141). Long also offers templates for planning a directory structure and for creating a data registry here: http://www.indiana.edu/~jslsoc/web_workflow/wf_chapters.htm.

Document the study

Describe the research project. Common elements in study-level documentation are author (principal investigator, researchers); funding; rationale for the project; data sources used; context of data collection; data collection methods; information on confidentiality; access and use conditions; transformation of data; and the structure and format of the data. Examples of guidelines for study-level documentation are available from the UK Data Archive at http://www.data-archive.ac.uk/create-manage/document/study-level and from ICPSR (based on the Data Documentation Initiative (DDI) metadata schema) at http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3docs.html.
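The common elements listed above can be treated as a checklist. The following Python sketch records study-level documentation as a simple dictionary and reports which elements are still empty. The element names paraphrase the list above and are assumptions for illustration; they are not field names from DDI, the UK Data Archive, or ICPSR, and the sample record is invented.

```python
import json

# Element names paraphrase the common elements listed above;
# they are illustrative and do not follow any formal schema.
STUDY_ELEMENTS = [
    "authors",
    "funding",
    "rationale",
    "data_sources",
    "collection_context",
    "collection_methods",
    "confidentiality",
    "access_and_use_conditions",
    "data_transformations",
    "structure_and_format",
]


def missing_elements(record):
    """Return the study-level elements that are absent or empty in the record."""
    return [element for element in STUDY_ELEMENTS if not record.get(element)]


# Hypothetical, partially completed study-level record.
record = {
    "authors": ["Doe, J. (principal investigator)"],
    "funding": "",
    "rationale": "Monitor seasonal stream temperature.",
    "data_sources": ["Field measurements"],
    "collection_methods": "Hand-held probe, weekly readings.",
}

print(json.dumps(record, indent=2))
print("Still to document:", missing_elements(record))
```

A project might store such a record alongside the data and review the missing elements before deposit.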

When the dataset or collection is a complex object that consists of multiple files, describe their organization in an index, table of contents, or a readme file.
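One way to produce such an index is to generate it from the directory itself. The sketch below walks a dataset folder and builds a tab-separated table of contents. The column layout and the inclusion of file size and a SHA-256 checksum are assumptions added for illustration; the guidance above asks only that the organization of the files be described.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """List every file under root with its relative path, size, and checksum."""
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                # The checksum is an illustrative extra, not required by the text.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return entries


def format_manifest(entries):
    """Render the manifest as tab-separated text suitable for a readme or index."""
    lines = ["path\tbytes\tsha256"]
    lines += [f"{e['path']}\t{e['bytes']}\t{e['sha256']}" for e in entries]
    return "\n".join(lines) + "\n"
```

For example, `print(format_manifest(build_manifest("my_dataset")))` would print one line per file in a hypothetical `my_dataset` directory. Note that the sketch reads each file fully into memory, so very large files would need chunked hashing.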

Provide a mechanism for identity control that uniquely identifies the data in a machine-readable way. One system for providing identity control is the Digital Object Identifier (DOI) system of the International DOI Foundation (IDF).
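To illustrate what machine-readable means in practice, the Python sketch below checks that a string looks like a DOI and turns it into a resolvable link through the doi.org resolver. The regular expression is a deliberately loose check that covers common modern DOIs and is not a full validation of the DOI syntax; the function names and the sample DOI are hypothetical.

```python
import re
from urllib.parse import quote

# Loose syntactic check (an assumption, not the full DOI specification):
# the "10." directory indicator, a numeric registrant code, a slash,
# and a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_plausible_doi(value):
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(value))


def doi_to_url(doi):
    """Build a resolver URL for a DOI, percent-encoding unsafe characters."""
    if not is_plausible_doi(doi):
        raise ValueError(f"Not a plausible DOI: {doi!r}")
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical DOI used only for illustration.
print(doi_to_url("10.1234/example.dataset.v1"))
# https://doi.org/10.1234/example.dataset.v1
```

A check like this only confirms the form of the identifier. Whether the DOI is registered and resolves to the dataset still has to be confirmed with the registration agency or by following the link.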

Provide a citation. There is not complete consensus on the elements that make up a complete data citation. However, Brase et al. (2014) say the Digital Curation Centre's 11 elements of a data citation are well supported by the literature: http://www.dcc.ac.uk/resources/how-guides/cite-datasets#x1-5000. DataONE offers citation guidelines here: https://www.dataone.org/best-practices/provide-citation-and-document-provenance-your-dataset.
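Because the elements of a citation are not settled, it can help to store them as structured metadata and assemble the citation string from them. The sketch below uses a small subset of commonly cited elements (authors, year, title, publisher, identifier, and an optional version). The choice of elements, their order, and the punctuation are assumptions for illustration; they are not the Digital Curation Centre's list or the DataONE guidelines, and the sample values are invented.

```python
def format_citation(authors, year, title, publisher, identifier, version=None):
    """Assemble a data citation string from separately stored elements."""
    parts = [f"{'; '.join(authors)} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(identifier)
    return " ".join(parts)


# Hypothetical dataset metadata used only for illustration.
print(format_citation(
    authors=["Doe, J.", "Roe, R."],
    year=2014,
    title="Example stream temperature dataset",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.1234/example.dataset.v1",
))
```

Keeping the elements separate means the same metadata can be rendered in whatever citation style a repository or journal requires.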

Provide documentation of analysis when information for replication is desired (Long, 2009). Documentation of analysis is not necessarily required to support discovery and secondary use of a dataset, as secondary use may explore a completely different research question than the original analysis. Replication repositories or journal data sharing policies may require documentation of analysis. For example, Nature Publishing Group's data policy is here: http://www.nature.com/authors/policies/availability.html.


Rubric for 3.3 - Activities Performed

Level 0
This process or practice is not being observed.
No steps have been taken for managing the workflow of metadata creation during the research process.

Level 1: Initial
Data are managed intuitively at project level without clear goals and practices.
Workflow management for metadata creation during the research process has been considered minimally by individual team members, but not codified.

Level 2: Managed
DM process is characterized for projects and often reactive.
Workflow management for metadata creation during the research process has been recorded for this project, but has not taken wider community needs or standards into account.

Level 3: Defined
DM is characterized for the organization/community and proactive.
The project follows approaches to workflow for metadata creation during the research process as defined for the entire community or institution.

Level 4: Quantitatively Managed
DM is measured and controlled.
Quantitative quality goals have been established regarding workflow for metadata creation during the research process, and both metadata and practices are systematically measured for quality.

Level 5: Optimizing
Focus on process improvement.
Processes regarding workflow for metadata creation during the research process are evaluated on a regular basis, and necessary improvements are implemented.


Brase, J., Socha, Y., Callaghan, S., Borgman, C.L., Uhlir, P.F., Carroll, B. (2014). Data citation: Principles and practice. In J. Ray (Ed.), Research Data Management: Practical Strategies for Information Professionals (Charleston Insights in Library, Information, and Archival Sciences). West Lafayette, Indiana: Purdue University Press.

DataONE. (2011). Best Practices. Retrieved from https://www.dataone.org/best-practices

Long, J. S. (2009). The workflow of data analysis using Stata. College Station, Tex.: Stata Press.

Created by Jian Qin on 2013/06/14 05:44
