A Capability Maturity Model for Research Data Management

0. Introduction

Last modified by Jian Qin on 2017/03/11 08:14

Research in science, social science, and the humanities is increasingly data-intensive, highly collaborative, and computational at a large scale. The tools, content and social attitudes that support multidisciplinary collaborative research require “new methods for gathering and representing data, for improved computational support and for growth of the online community” (Murray-Rust, 2008). As a result, improved research data management (RDM) is now critical, and action is required across the data lifecycle: from data capture, analysis and visualization (Gray, 2007), through curation, sharing and preservation, to support for further discovery and reuse. To enable the assessment and improvement of RDM practices, and thereby make RDM more reliable, this document presents a capability maturity model (CMM) for RDM.

Currently, RDM practices vary greatly depending on the scale, discipline, funding and type of project. “Big science” research fields, such as astrophysics, geosciences, climate science and systems biology, generally have established well-defined RDM policies and practices, with supporting data repositories for data curation, discovery and reuse. RDM in these disciplines often has significant funding support for the necessary personnel and technology infrastructure. By contrast, in most “small science” or humanities research (i.e., projects typically involving a single PI and a few students), RDM is less well developed. Even in these fields, however, RDM practices are critical: the data generated by each project may be small, but collectively they can add up to a large volume (Carlson, 2006), and in aggregate they can be more complex and heterogeneous than the data generated by big research projects.

The importance of RDM has been raised to a new level, as demonstrated by the US National Science Foundation’s renewed mandate that proposals include a data management plan. However, low awareness of data management, or indeed its complete absence, is still common among research projects, especially small science projects. This lack of awareness is affected by factors such as the type and quantity of data produced, the heritage and practices of research communities, and the size of research teams (Key Perspectives, 2010). Further complicating the discussion of practices, RDM is an interdisciplinary field: communities of practice involve researchers, information technology professionals, librarians and graduate students, each bringing their domain-specific culture and practices to bear on RDM. As yet, the field lacks a conceptual model on which practices, policies, and performance and impact assessment can be based. Research projects need more concrete guidance to analyze and assess the processes of RDM. The goal of this document is to present the first steps towards the development of such a model, in the form of a Capability Maturity Model (CMM) for RDM.


Carlson, S. (2006). Lost in a sea of science data. The Chronicle of Higher Education, 52, A35. Retrieved from http://chronicle.com/weekly/v52/i42/42a03501.htm

Gray, J. (2007). Jim Gray on eScience: A transformed scientific method. In T. Hey, S. Tansley, & K. Tolle (Eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery, pp. 5-12. Redmond, WA: Microsoft Research. Retrieved from http://languagelog.ldc.upenn.edu/myl/JimGrayOnE-Science.pdf

Key Perspectives. (2010). Data dimensions: disciplinary differences in research data sharing, reuse and long term viability. SCARP Synthesis Study, Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/scarp

Murray-Rust, P. (2008). Chemistry for everyone. Nature, 451, 648-651. Retrieved from http://www.nature.com/nature/journal/v451/n7179/full/451648a.html
