A Capability Maturity Model for Research Data Management
CMM for RDM » 0. Introduction » 0.3 Research Data Management Maturity Levels
Last modified by Arden Kirkland on 2014/06/30 09:36
From version 82.5
edited by Arden Kirkland
on 2014/03/19 19:43
To version 83.1
edited by Arden Kirkland
on 2014/05/09 17:44
Change comment: minor changes to generalize "science" to "research"

Content changes

... ... @@ -6,7 +6,7 @@
6 6
7 7 Perhaps the most well-known aspect of the CMM is five levels of //process or capability maturity//, which describe the level of development of the practices in a particular organization, representing the “degree of process improvement across a predefined set of process areas” and corresponding to the generic goals listed in the previous section. The initial level describes an organization with no defined processes: software is developed (i.e., the specific software related goals are achieved), but in an ad hoc and unrepeatable way, making it impossible to plan or predict the results of the next development project. As the organization increases in maturity, processes become more refined, institutionalized and standardized, achieving the higher numbered generic processes. The CMM thus described an evolutionary improvement path from ad hoc, immature processes to disciplined, mature processes with improved software quality and organizational effectiveness ([[CMMI Product Team, 2006, p. 535>>||anchor="CMMI"]]).
8 8
9 -Our goal in this document is to lay out a similar path for the improvement of research data management. RDM practices as carried out in scientific projects similarly range from ad hoc to well-planned and well-managed processes ([[D’Ignazio & Qin, 2008>>||anchor="Dignazio"]]; [[Steinhart et al., 2008>>||anchor="Steinhart"]]). The generic practices described above provide a basis for mapping these maturity levels into the context of RDM, as illustrated in Figure 1 and described below.
9 +Our goal in this document is to lay out a similar path for the improvement of research data management. RDM practices as carried out in research projects similarly range from ad hoc to well-planned and well-managed processes ([[D’Ignazio & Qin, 2008>>||anchor="Dignazio"]]; [[Steinhart et al., 2008>>||anchor="Steinhart"]]). The generic practices described above provide a basis for mapping these maturity levels into the context of RDM, as illustrated in Figure 1 and described below.
10 10
11 11 [[image:maturity-level.jpg||style="display: block; margin-left: auto; margin-right: auto;"]]
12 12
... ... @@ -20,13 +20,13 @@
20 20
21 21 == 0.3.2 Level 2: Managed ==
22 22
23 -Maturity level 2 characterizes projects with processes that are managed through policies and procedures established within the project. At this level of maturity, the research group has discussed and developed a plan for RDM. For example, local data file naming conventions and directory organization structures may be documented. However, these policies and procedures are idiosyncratic to the project, meaning that the SDM capability resides at the project level rather than drawing from organizational or community process definitions. For example, in a survey of science, technology, engineering and mathematics (STEM) faculty, Qin and D’Ignazio ([[2010>>||anchor="Qin"]]) found that respondents predominantly used local sources to decide what metadata to create when representing their datasets, either through their own planning, in discussion with their lab groups or, somewhat less so, through the examples provided by peer researchers. Of far less impact were guidelines from research centers or discipline-based sources. Government requirements or standards also seemed to provide comparatively little help ([[Qin and D’Ignazio, 2010>>||anchor="Qin"]]). As a result, at this level, developing a new project requires redeveloping processes, with possible risks to the effectiveness of RDM. Individual researchers will likely have to learn new processes as they move from project to project. Furthermore, aggregating or sharing data across multiple projects will be hindered by the differences in practices across projects.
23 +Maturity level 2 characterizes projects with processes that are managed through policies and procedures established within the project. At this level of maturity, the research group has discussed and developed a plan for RDM. For example, local data file naming conventions and directory organization structures may be documented. However, these policies and procedures are idiosyncratic to the project, meaning that the RDM capability resides at the project level rather than drawing from organizational or community process definitions. For example, in a survey of science, technology, engineering and mathematics (STEM) faculty, Qin and D’Ignazio ([[2010>>||anchor="Qin"]]) found that respondents predominantly used local sources to decide what metadata to create when representing their datasets, either through their own planning, in discussion with their lab groups or, somewhat less so, through the examples provided by peer researchers. Of far less impact were guidelines from research centers or discipline-based sources. Government requirements or standards also seemed to provide comparatively little help ([[Qin and D’Ignazio, 2010>>||anchor="Qin"]]). As a result, at this level, developing a new project requires redeveloping processes, with possible risks to the effectiveness of RDM. Individual researchers will likely have to learn new processes as they move from project to project. Furthermore, aggregating or sharing data across multiple projects will be hindered by the differences in practices across projects.
24 24
25 25 == 0.3.3 Level 3: Defined ==
26 26
27 27 In the original CMM, “Defined” means that the processes are documented across the organization and then tailored and applied for particular projects. Defined processes are those with inputs, standards, work procedures, validation procedures and compliance criteria. At this level, an organization can establish new projects with confidence in stable and repeatable execution of processes. For example, projects at this level likely employ a metadata standard with best practice guidelines. Data sets/products are represented by some formal semantic structures (controlled vocabulary, ontology, or taxonomies), though these standards may be adapted to fit to the project. For example, the adoption of a metadata standard for describing datasets often involves modification and customization of standards in order to meet project needs.
28 28
29 -In parallel to the SEI CMM, the RDM process adopted might reflect institutional initiatives in which organizational members or task forces within the institution discuss policies and plans for data management, set best practices for technology and adopt and implement data standards. For example, the [[Purdue Distributed Data Curation Center>>url:http://d2c2.lib.purdue.edu||rel="__blank"]] (D2C2, [[http:~~/~~/d2c2.lib.purdue.edu/>>url:http://d2c2.lib.purdue.edu/||rel="__blank"]]) brings researchers together to develop optimal ways to manage data, which could lead to formally maintained descriptions of RDM practices. Level 3 organizations can also draw on research-community-based efforts to define processes. Examples include the [[Hubbard Brook Ecosystem Studies>>url:http://www.hubbardbrook.org||rel="__blank"]] ([[http:~~/~~/www.hubbardbrook.org/>>url:http://www.hubbardbrook.org/||rel="__blank"]]), the [[Long Term Ecological Research Network>>url:http://www.lternet.edu/||rel="__blank"]] (LTER, [[http:~~/~~/www.lternet.edu/>>url:http://www.lternet.edu/||rel="__blank"]]) and the [[Global Biodiversity Information Facility>>url:http://www.gbif.org/||rel="__blank"]] (GBIF, [[http:~~/~~/www.gbif.org/>>url:http://www.gbif.org/||rel="__blank"]]). Government requirements and standards in regard to scientific data are often targeted to a higher level of data management, e.g., the community level or discipline level.
29 +In parallel to the SEI CMM, the RDM process adopted might reflect institutional initiatives in which organizational members or task forces within the institution discuss policies and plans for data management, set best practices for technology and adopt and implement data standards. For example, the [[Purdue Distributed Data Curation Center>>url:http://d2c2.lib.purdue.edu||rel="__blank"]] (D2C2, [[http:~~/~~/d2c2.lib.purdue.edu/>>url:http://d2c2.lib.purdue.edu/||rel="__blank"]]) brings researchers together to develop optimal ways to manage data, which could lead to formally maintained descriptions of RDM practices. Level 3 organizations can also draw on research-community-based efforts to define processes. Examples include the [[Hubbard Brook Ecosystem Studies>>url:http://www.hubbardbrook.org||rel="__blank"]] ([[http:~~/~~/www.hubbardbrook.org/>>url:http://www.hubbardbrook.org/||rel="__blank"]]), the [[Long Term Ecological Research Network>>url:http://www.lternet.edu/||rel="__blank"]] (LTER, [[http:~~/~~/www.lternet.edu/>>url:http://www.lternet.edu/||rel="__blank"]]) and the [[Global Biodiversity Information Facility>>url:http://www.gbif.org/||rel="__blank"]] (GBIF, [[http:~~/~~/www.gbif.org/>>url:http://www.gbif.org/||rel="__blank"]]). Government requirements and standards in regard to research data are often targeted to a higher level of data management, e.g., the community level or discipline level.
30 30
31 31 == 0.3.4 Level 4: Quantitatively Managed ==
32 32
