A Capability Maturity Model for Research Data Management

Changes for document 1.1 Commitment to Perform

Last modified by Arden Kirkland on 2014/05/18 11:53
From version 22.3
edited by Arden Kirkland
on 2014/03/17 00:30
To version 23.1
edited by Arden Kirkland
on 2014/05/10 10:55
Change comment: generalizing science to research, minor proofreading

Content changes

... ... @@ -8,25 +8,25 @@
8 8
9 9 == 1.1.1 Identify stakeholders ==
10 10
11 -The goal of identifying stakeholders is to establish a shared understanding of who are the data owners, contributors, managers, and users affected by data management. Stakeholders include not only those who create and manage data but also entities that are data users, funding agencies, home institutions of contributing researchers ([[DataOne, 2011>>||anchor="DataONE"]]).
11 +The goal of identifying stakeholders is to establish a shared understanding of who the data owners, contributors, managers, and users affected by data management are. Stakeholders include not only those who create and manage data but also other entities, such as data users, funding agencies, and the home institutions of contributing researchers ([[DataOne, 2011>>||anchor="DataONE"]]).
12 12
13 -Explicit identification of stakeholders is important because scientific data management processes are increasingly complex and so involve entities with different roles, specializing in different aspects of data management. For example, data managers are responsible for data storage, management, backup, and access. Research team members need to document data collection and processing methods and parameter, validate and verify data quality, and maintain information on workflows and data flows for provenance and quality control purpose. Technology staff needs to assure that the infrastructure services are in good order to support the data management activities. However, organizations may not have all of these stakeholders and responsibilities can be differently distributed.
13 +Explicit identification of stakeholders is important because research data management processes are increasingly complex and so involve entities with different roles, each specializing in different aspects of data management. For example, data managers are responsible for data storage, management, backup, and access. Research team members need to document data collection and processing methods and parameters, validate and verify data quality, and maintain information on workflows and data flows for provenance and quality control purposes. Technology staff need to ensure that infrastructure services are in good order to support data management activities. However, not every organization has all of these stakeholders, and responsibilities may be distributed differently.
14 14
15 -Furthermore, the tasks and interests in data management of these different groups may or may not cross with one another. For example, Mullins ([[2007>>||anchor="Mullins"]]) reported that, after extensive interviews with scientist in biology, earth and atmospheric science, astronomy, chemistry, chemical engineering, plant science, ecological sciences, it became clear that no single method or process would suffice the needs for data management cross all disciplines. Their extensive conversations with stakeholders led them to identify the need to foster collaboration between domain scientists as well as librarians/archivists, computer scientists, infrastructure technologists. In addition to project level stakeholders, three types of data sharing intermediaries may have a role in supporting data management at various stages of the research data life cycle: data archives (all stages), institutional repositories (end of research life cycle), and virtual organizations.
15 +Furthermore, the data management tasks and interests of these groups may or may not overlap. For example, Mullins ([[2007>>||anchor="Mullins"]]) reported that extensive interviews with scientists in biology, earth and atmospheric science, astronomy, chemistry, chemical engineering, plant science, and ecological sciences made it clear that no single method or process would meet the data management needs of all disciplines. Their extensive conversations with stakeholders led them to identify the need to foster collaboration among domain scientists, librarians/archivists, computer scientists, and infrastructure technologists. In addition to project-level stakeholders, three types of data sharing intermediaries may have a role in supporting data management at various stages of the research data life cycle: data archives (all stages), institutional repositories (end of the research life cycle), and virtual organizations.
16 16
17 -As a result, explicit identification of stakeholders is necessary to ensure that the design of the processes meets their different needs and to ensure implementation efficiency and usefulness of data management. As in Mullins ([[2007>>||anchor="Mullins"]]) identification of stakeholders may start with discussion with key informants, such as researchers or sponsored program office staff and then use snowball sampling to identify additional stakeholders. The results of these efforts may be confirmed by a follow-up survey.
17 +As a result, explicit identification of stakeholders is necessary to ensure that the processes are designed to meet their different needs and that data management is implemented efficiently and proves useful. As in Mullins ([[2007>>||anchor="Mullins"]]), identification of stakeholders may start with discussions with key informants, such as researchers or sponsored program office staff, and then use snowball sampling to identify additional stakeholders. The results of these efforts may be confirmed by a follow-up survey.
18 18
19 19 == 1.1.2 Develop user requirements ==
20 20
21 -The goal of developing user requirements is to describe the goals the data management systems and practices achieve for various user groups without going into details about how those goals are to be achieved. For example, researchers may require that data management ensure that data are available for future analysis, while potential reusers of data may require effective data description to enable them to find and make sense of the data.
21 +The goal of developing user requirements is to describe the goals that data management systems and practices should achieve for various user groups, without going into details about how those goals are to be achieved. For example, researchers may require that data management ensures that data are available for future analysis, while potential reusers of data may require effective data description to enable them to find and make sense of the data.
22 22
23 -Developing user requirements for scientific data management must consider a wide array of factors because differences in disciplinary or research fields and types of research significantly affect the workflows, data flows, and data management and use practices. These differences in turn will affect the user requirements for data management services and tools and result in idiosyncrasies of the systems and services supporting the data management tasks. For example, the requirements for storing and describing real-time stream of data are different than for survey data. In a collaborative data management situation, user requirements must take into consideration the technical standards for data formats, sampling protocols, variable names, data discovery interfaces, among other things ([[Hale et al., 2003>>||anchor="Hale"]]).
23 +Developing user requirements for research data management must consider a wide array of factors because differences in disciplines or research fields and in types of research significantly affect workflows, data flows, and data management and use practices. These differences in turn affect the user requirements for data management services and tools and result in idiosyncrasies in the systems and services supporting data management tasks. For example, the requirements for storing and describing a real-time stream of data differ from those for survey data. In a collaborative data management situation, user requirements must take into consideration the technical standards for data formats, sampling protocols, variable names, and data discovery interfaces, among other things ([[Hale et al., 2003>>||anchor="Hale"]]).
24 24
25 -User requirements for scientific data management may be identified through analyzing data flows, workflows, leading data management problems, and researchers’ data practices. These requirements can be represented at a high level in use cases, user scenarios or personas ([[Cornell University Library, 2007>>||anchor="Cornell"]]; [[Lage, Losoff, & Maness, 2011>>||anchor="Lage"]]). A key point in this process is that user requirements mean not only clear-cut project objectives but also goals for the data management services to serve a longer term and wider scope of scientific data management.
25 +User requirements for research data management may be identified by analyzing data flows, workflows, leading data management problems, and researchers’ data practices. These requirements can be represented at a high level as use cases, user scenarios, or personas ([[Cornell University Library, 2007>>||anchor="Cornell"]]; [[Lage, Losoff, & Maness, 2011>>||anchor="Lage"]]). A key point in this process is that user requirements include not only clear-cut project objectives but also goals for data management services that serve research data management over a longer term and a wider scope.
26 26
27 27 == 1.1.3 Establish quantitative objectives for data management ==
28 28
29 -The goal of establishing quantitative objectives for data management is to provide a set of measures of the data management process and quantitative targets for those measures. For example, a simple metric is the quantity of data collected and the cost of the collection process. For instance, in doing a survey, a goal might be a certain sample size (number of surveys completed) and a target set based on the research needs and the project’s budget for data collection. An alternative metric is the quality of the data, with a target of a no more than a certain error rate. A goal for data privacy might be that there be no unintentional data releases. For data sharing, a goal might be that new users can gain access to the data within a certain time period.
29 +The goal of establishing quantitative objectives for data management is to provide a set of measures of the data management process and quantitative targets for those measures. For example, a simple metric is the quantity of data collected and the cost of the collection process. For a survey, a goal might be a certain sample size (number of surveys completed), with a target set based on the research needs and the project’s budget for data collection. An alternative metric is the quality of the data, with a target of no more than a certain error rate. A goal for data privacy might be that there be no unintentional data releases. For data sharing, a goal might be that new users can gain access to the data within a certain time period.
30 30
31 31 Establishing quantitative objectives is important to provide a basis for measuring the effectiveness of the data management process and for assessing improvements to the process. Picking inappropriate measures can be counterproductive if it leads people to focus on achieving the wrong goals. For example, if a data repository used only the number of datasets collected as a measure of the data archiving process, it might fail to ensure that the datasets are well documented or useful, resulting in a large collection of useless data. A portfolio of measures will likely need to be developed to address the different goals of the process.
32 32
... ... @@ -36,11 +36,11 @@
36 36
37 37 == 1.1.4 Develop communication policies ==
38 38
39 -Developing communication policies is developing communication channels and procedures among the constituencies. This makes communication efficient and clear. Communication channels are specific to organizational contexts, and can be facilitated by communication technologies such as websites, ticketing systems, discussion forum, mailings, wikis, social media, etc.
39 +Developing communication policies involves establishing communication channels and procedures among the constituencies, which makes communication efficient and clear. Communication channels are specific to organizational contexts and can be facilitated by communication technologies such as websites, ticketing systems, discussion forums, mailings, wikis, and social media.
40 40
41 -Developing communication policies is dependent on the scale and context of data management. For example, on a community level data management project needs to maintain proper channels to communicate with internal functional groups and external constituencies about the decisions, procedures, and policies about the process and products. These may be a call for comments and suggestions on a metadata schema, policy on data publication and use, or the approval process for contributed data sets. A research group may also install communication policies that will clearly specify the reporting channels for data management operations.
41 +Developing communication policies depends on the scale and context of data management. For example, a community-level data management project needs to maintain proper channels to communicate with internal functional groups and external constituencies about decisions, procedures, and policies concerning the process and products. These communications may include a call for comments and suggestions on a metadata schema, a policy on data publication and use, or the approval process for contributed data sets. A research group may also establish communication policies that clearly specify the reporting channels for data management operations.
42 42
43 -Whether a data management project is at a community level or research group level, the objectives and expectations should be clearly defined and communicated. This is especially important when multiple partners are involved because documenting the nature of collaborative partnership supports open communication ([[Hale et al., 2003>>||anchor="Hale"]]). Policies for data management, use, and services are an instrument of communication. Providing them on institution or project’s websites as separate documents offers open communication with the community members and constituencies. Data service providers should maintain open and effective communication venues for the community. For example, Cornell’s Research Data Management Service Group uses their website to provide communication channels for their community on different levels ([[https:~~/~~/confluence.cornell.edu/display/rdmsgweb/Home>>url:https://confluence.cornell.edu/display/rdmsgweb/Home||rel="__blank"]]).
43 +Whether a data management project is at the community level or the research group level, its objectives and expectations should be clearly defined and communicated. This is especially important when multiple partners are involved, because documenting the nature of the collaborative partnership supports open communication ([[Hale et al., 2003>>||anchor="Hale"]]). Policies for data management, use, and services are an instrument of communication. Publishing them as separate documents on an institution's or project's website supports open communication with community members and constituencies. Data service providers should maintain open and effective communication venues for the community. For example, Cornell’s Research Data Management Service Group uses its website to provide communication channels for its community at different levels ([[https:~~/~~/confluence.cornell.edu/display/rdmsgweb/Home>>url:https://confluence.cornell.edu/display/rdmsgweb/Home||rel="__blank"]]).
44 44
45 45
46 46 == References ==
... ... @@ -68,4 +68,3 @@
68 68
69 69
70 70
71 -
