A Capability Maturity Model for Research Data Management

1.2 Ability to Perform

Last modified by Arden Kirkland on 2014/05/18 11:56

Ability to Perform describes the preconditions that must exist in the project or organization to implement the process competently. Ability to Perform typically involves resources, organizational structures, and training.

1.2.1 Develop and implement a budget

Effective data management incurs costs (Hale et al., 2003). Budgeting for data management helps ensure that sufficient financial resources are allotted to support data management activities.

Budget considerations vary with the type, scope, scale, and timeframe of the data management context. Those who collect data need adequate financial resources to manage local data during the life cycle of the project (DataONE, 2011a; Hale et al., 2003). Local data management costs might include data management personnel, database systems, servers, networks, and security for project data shared over a network (Hale et al., 2003).

Another type of data management cost is the synthesis and integration of data, along with the collaboration necessary to support this synthesis (Hale et al., 2003). For data that are shared publicly beyond the scope of a research project, creating metadata in a standardized metadata format is an additional cost.

Organizations whose missions are to disseminate and preserve data must budget for data management beyond the timeframe of specific research projects. When data centers are underfunded, their focus narrows to managing their own data rather than addressing the broader needs of those they serve.

As new data management models emerge, data management budgets also need to account for memberships in or subscriptions to data repository services. This reflects a broader trend. On the one hand, disciplinary data repositories are seeking self-sustaining solutions by devising economic models that charge institutions for services (Sheaffer, 2012). On the other hand, institutions that are initiating or have established data management services need funding both to start up RDM services and to keep them operating once they become part of regular tasks.

Budgeting should cover not only hardware and software, but also near- and long-term payments for RDM services and staff with the appropriate technical expertise. In their ethnographic study of data and work practices across three science cyberinfrastructure projects in the environmental sciences, Mayernik et al. (2011) found that “human support is valuable in the development of data management plans, but is only available in institutions that specifically provide funding for it” (p. 421).

1.2.2 Staffing for data management

Staffing for data management refers to identifying the levels and types of expertise needed to achieve immediate and/or near-term data management objectives. The data management lifecycle involves different tasks at different stages, and these tasks demand a combination of expertise and skills at varying levels. For example, the Data Preservation Alliance for the Social Sciences (DATA-PASS, http://www.data-pass.org) is a broad-based partnership of data archives for acquiring, cataloging, and preserving social sciences data. The partnership involves existing data repositories, academic institutions, and government agencies, so communication among partners, the technical system architecture, and policies are inherently complex. A capable staff is essential for keeping up with constantly shifting data curation activities (Walters & Skinner, 2011).

Staffing needs should be reviewed carefully, and the responsibilities of each role or position should be specified clearly. This matters not only for hiring the right personnel but also for developing a suitable training program “to ensure that the staff and managers have the knowledge and skills required to fulfill their assigned roles” (Paulk et al., 1993, p. 12).

1.2.3 Develop collaborations and partnerships

Stakeholder involvement in data management processes often takes the form of collaboration and/or partnership. When resources can be effectively shared, partnerships can reduce hardware and software costs, lead to better data and data products, and reduce many technical barriers by agreeing on core data standards and the flow of data (Hale et al., 2003). Collaboration and partnership are often a process of community building that, if managed properly, can contribute to sustaining a community of RDM practice.  

Collaboration and partnership can be managed by creating agendas and schedules for collaborative activities, documenting issues, and developing recommendations for resolving relevant stakeholder issues. Collaborative activities may also include problem solving, sharing information and experience, reusing resources and assets, coordination, visits, and creating documentation. Over time, a community of RDM practice can be built, which in turn strengthens the collaboration and partnership.

1.2.4 Train researchers and data management personnel

A key indicator for mature data management processes is that training programs are provided so researchers and staff understand data management processes well and have the capability to perform data management activities. Examples of training programs include:

  • Providing online guidance and workshops for data management
  • Training in data documentation best practices
  • Training in the unique tools and methods used in a research field

Training programs serve two purposes. For researchers, they develop individuals' skills and knowledge so that researchers can adopt best practices in managing their data. For data managers, they build institutional capability by preparing personnel to perform infrastructural and technical services for data management.

Planning for training typically involves identifying training needs, training topics, requirements and quality standards for training materials, training tasks, roles and responsibilities, and required resources. The training program also needs to lay out schedules for training activities and their dependencies. Training may also be offered through conference workshops, professional development events, or educational programs outside one's institution; these venues are useful for training the trainers who will provide internal training programs and services.
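Laying out training activities together with their dependencies amounts to ordering tasks so that no activity is scheduled before the activities it relies on, which is a topological sort. The Python sketch below illustrates the idea with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The activity names and dependencies are hypothetical examples, not a prescribed training plan.

```python
from graphlib import TopologicalSorter

# Hypothetical training activities, each mapped to the set of activities
# that must be completed before it can start.
dependencies = {
    "identify training needs": set(),
    "select topics": {"identify training needs"},
    "prepare materials": {"select topics"},
    "train the trainers": {"prepare materials"},
    "run researcher workshops": {"train the trainers", "prepare materials"},
}

# static_order() yields every activity after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
for step, activity in enumerate(order, start=1):
    print(step, activity)
```

Because each activity in this example depends on the one before it, only one order is possible; when activities are independent, `static_order()` may return any valid order, and a circular dependency raises `graphlib.CycleError`.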

1.2.5 Develop RDM tools

Research data management tools are software programs that help researchers manage data effectively during the research lifecycle. The type of research determines the requirements for such tools. Computationally intensive fields such as astrophysics use workflow management systems to capture metadata for provenance and output management, a highly automated process (Brown et al., 2006). Geodynamics data, by contrast, often reside in spreadsheet files and are sometimes mixed with researchers' annotation text. Because data recording practices are inconsistent, this type of data is difficult to manage with fully automatic tools (Qin, D'Ignazio, & Baldwin, 2011). In a sense, developing RDM tools is also a process of developing and establishing best practices in RDM.
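To make automated provenance capture more concrete, here is a minimal Python sketch. It is not taken from Brown et al. (2006) or from any particular workflow management system. Each processing step is wrapped so that its parameters, start and finish timestamps, and SHA-256 checksums of its inputs and output are appended to a provenance log. The function names `run_step` and `filter_readings` and the record fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return a hex SHA-256 digest that identifies an input or output."""
    return hashlib.sha256(data).hexdigest()


def run_step(step_name, func, inputs, params, log):
    """Run one processing step and append a provenance record to `log`.

    `inputs` maps input names to raw bytes; `func` receives the inputs plus
    keyword parameters and returns its output as bytes.
    """
    started = datetime.now(timezone.utc).isoformat()
    output = func(inputs, **params)
    log.append({
        "step": step_name,
        "started": started,
        "finished": datetime.now(timezone.utc).isoformat(),
        "parameters": params,
        "inputs": {name: sha256_of(blob) for name, blob in inputs.items()},
        "output": sha256_of(output),
    })
    return output


# Hypothetical step: keep only comma-separated readings above a threshold.
def filter_readings(inputs, threshold):
    values = [float(v) for v in inputs["raw"].decode().split(",")]
    kept = [v for v in values if v > threshold]
    return ",".join(str(v) for v in kept).encode()


provenance = []
result = run_step("filter", filter_readings, {"raw": b"1.0,5.5,3.2,8.1"},
                  {"threshold": 3.0}, provenance)
print(result.decode())
print(json.dumps(provenance, indent=2))
```

Recording checksums instead of copies keeps the log small while still letting a later user confirm that a given file is the one a step actually consumed or produced.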

Tools for RDM include off-the-shelf applications, such as data repository management systems and metadata editors created for specific standards, as well as tools developed in-house. Before deciding whether to adopt an off-the-shelf tool or develop one in-house, a comprehensive analysis should be conducted to understand not only local requirements but also the need for links to community data management infrastructure and standards. Tools that are adopted or developed should therefore support key functions for immediate data management needs, such as storage, annotation, organization, and discovery, while also providing the "staging" functions needed for effective data deposit and dissemination in community, national, and international data repositories.
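One small example of a "staging" function is checking that a dataset's metadata record is complete before it is deposited. The Python sketch below assumes a hypothetical set of required fields loosely named after Dublin Core elements. Actual requirements differ by repository and metadata standard, so `REQUIRED_FIELDS` and the sample record are illustrative only.

```python
# Hypothetical fields a target repository might require at deposit time;
# real requirements vary by repository and metadata standard.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "rights")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank in `record`."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def ready_for_deposit(record: dict) -> bool:
    """A record is ready only when no required field is missing."""
    return not missing_fields(record)


record = {
    "title": "Soil moisture readings, plot A",
    "creator": "Example Lab",
    "date": "2013-06-01",
    "description": "",
}
print(missing_fields(record))
print(ready_for_deposit(record))
```

Reporting which fields are missing, not just a yes/no answer, lets a researcher fix the record locally before it reaches the repository.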

More often than not, suitable software tools for RDM have already been developed (Michener, 2006). Adopting such tools also means adopting their mechanisms for systematically capturing the data integration process (DataONE, 2011b). RDM projects vary in scope and nature because the data they handle differ from discipline to discipline and from project to project. Whether tools are adopted or developed for ad hoc or long-term needs, supporting researchers in using these tools should be an integral part of the tool adoption or development process (Mayernik et al., 2011).

1.2.6 Establish a data management plan

A data management plan (DMP) documents the definitions, procedures, methods, and best practices that a project or organization uses to maintain consistent RDM practice. Careful planning for data management before research begins and throughout the data's life cycle is essential (DataONE, 2011c) because it can increase project efficiency and improve the reliability of the collected data by minimizing errors.

The most common DMPs are those prepared as part of grant proposals in response to mandates from funding agencies such as the U.S. National Science Foundation (NSF), the Institute of Museum and Library Services (IMLS), or the National Endowment for the Humanities Office of Digital Humanities (NEH-ODH). Examples of this type of DMP can be found on funding agencies' websites as well as on many research universities' websites; e.g., the Research Cyberinfrastructure (RCI) at UC San Diego provides a list of sample DMPs for major NSF disciplines (http://rci.ucsd.edu/dmp/examples.html). The DMP Tool website also has a list of templates based on specific funder requirements (https://dmp.cdlib.org/pages/funder_requirements).

Resources for DMP development:

  1. Disciplinary-based NSF DMP templates: http://dmconsult.library.virginia.edu/dmp-templates/
  2. DMP Tool hosted at California Digital Library: https://dmp.cdlib.org/  

Rubric

Rubric for 1.2 - Ability to Perform

Level 0
  • This process or practice is not being observed
  • No steps have been taken to provide organizational structures or plans, training, or resources such as budgets, staffing, or tools

Level 1: Initial
  • Data are managed intuitively at project level without clear goals and practices
  • Structures or plans, training, and resources such as budgets, staffing, or tools have been considered minimally by individual team members, but not codified

Level 2: Managed
  • DM process is characterized for projects and often reactive
  • Structures or plans, training, and resources such as budgets, staffing, or tools have been recorded for this project, but have not taken wider community needs or standards into account

Level 3: Defined
  • DM is characterized for the organization/community and proactive
  • The project follows structures or plans, training, and resources such as budgets, staffing, or tools that have been defined for the entire community or institution

Level 4: Quantitatively Managed
  • DM is measured and controlled
  • Quantitative quality goals have been established regarding structures or plans, training, and resources such as budgets, staffing, or tools, and practices in these areas are systematically measured for quality

Level 5: Optimizing
  • Focus on process improvement
  • Processes regarding structures or plans, training, and resources such as budgets, staffing, or tools are evaluated on a regular basis, and necessary improvements are implemented
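The rubric can be applied as a simple self-assessment. The Python sketch below assumes, as in the CMM, that each level builds on the ones below it. Under that assumption, a project is rated at the highest level for which that level's criterion and every lower one are met. The criterion keys are hypothetical paraphrases of the rubric's second bullet at each level, and treating an unmet level as capping the rating is an assumption of this sketch.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """Rubric levels for 1.2 Ability to Perform."""
    NOT_OBSERVED = 0
    INITIAL = 1
    MANAGED = 2
    DEFINED = 3
    QUANTITATIVELY_MANAGED = 4
    OPTIMIZING = 5


# Hypothetical yes/no criteria, one per level above 0, paraphrasing the rubric.
CRITERIA = {
    Maturity.INITIAL: "considered_by_team_members",
    Maturity.MANAGED: "recorded_for_project",
    Maturity.DEFINED: "defined_for_institution_or_community",
    Maturity.QUANTITATIVELY_MANAGED: "quality_goals_measured",
    Maturity.OPTIMIZING: "evaluated_and_improved_regularly",
}


def assess(answers: dict) -> Maturity:
    """Return the highest level whose criterion, and every lower one, is met."""
    level = Maturity.NOT_OBSERVED
    for candidate in sorted(CRITERIA):
        if not answers.get(CRITERIA[candidate], False):
            break  # an unmet level caps the rating
        level = candidate
    return level


answers = {
    "considered_by_team_members": True,
    "recorded_for_project": True,
    "defined_for_institution_or_community": False,
    "quality_goals_measured": True,
}
print(assess(answers).name)  # MANAGED
```

In this example, measured quality goals do not lift the rating past Level 2, because the Level 3 criterion (structures defined for the institution or community) is not yet met.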

References

Brown, D. A., Brady, P. R., Dietz, A., Cao, J., Johnson, B., & McNabb, J. (2006). A case study on the use of workflow technologies for scientific analysis: Gravitational wave data analysis. In I. J. Taylor, E. Deelman, D. Gannon, & M. S. Shields (Eds.), Workflows for e-Science (chapter 5, pp. 41–61). Berlin: Springer-Verlag.

DataONE. (2011a). Define roles and assign responsibilities for data management. Retrieved from https://www.dataone.org/best-practices/define-roles-and-assign-responsibilities-data-management

DataONE. (2011b). Document the integration of multiple datasets. Retrieved from https://www.dataone.org/best-practices/document-integration-multiple-datasets

DataONE. (2011c). Plan data management early in your project. Retrieved from https://www.dataone.org/best-practices/plan-data-management-early-your-project

Hale, S. S., Miglarese, A. H., Bradley, M. P., Belton, T. J., Cooper, L. D., Frame, M. T., et al. (2003). Managing troubled data: Coastal data partnerships smooth data integration. Environmental Monitoring and Assessment, 81(1-3), 133–148. doi:10.1023/A:1021372923589. Retrieved from http://link.springer.com/article/10.1023%2FA%3A1021372923589

Mayernik, M. S., Batcheller, A. L., & Borgman, C. L. (2011). How institutional factors influence the creation of scientific metadata. In Proceedings of iConference 2011, February 8-11, 2011, Seattle, WA (pp. 417–425). New York: ACM Press.

Michener, W. K. (2006). Meta-information concepts for ecological data management. Ecological Informatics, 1(1), 3–7. doi:10.1016/j.ecoinf.2005.08.004. Retrieved from http://www.sciencedirect.com/science/article/pii/S157495410500004X

Paulk, M. C., Curtis, B., Chrissis, M. B., & Weber, C. V. (1993). Capability Maturity Model for Software, Version 1.1 (No. CMU/SEI-93-TR-024). Software Engineering Institute. Retrieved from http://resources.sei.cmu.edu/library/asset-view.cfm?assetID=11955

Qin, J., D’Ignazio, J., & Baldwin, S. (2011). A workflow-based knowledge management architecture for geodynamics data. A white paper submitted to the NSF GEO/OCI EarthCube Charrette meeting. Retrieved from http://earthcube.ning.com/group/user-requirements/forum/topics/white-paper-a-workflow-based-knowledge-management-architecture

Sheaffer, P. (2012). Creating a sustainable business model for a digital repository: The Dryad experience. ASIS&T Research Data Access and Preservation Summit 2012, Baltimore, MD. Retrieved from http://www.slideshare.net/asist_org/creating-a-sustainable-business-model-for-a-digital-repository-the-dryad-experience-peggy-schaeffer-rdap12

Van den Eynden, V., Corti, L., Woollard, M., & Bishop, L. (2011). Managing and sharing data: A best practice guide for researchers (3rd ed.). Essex, England: University of Essex. Retrieved from http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

Walters, T., & Skinner, K. (2011). New roles for new times: Digital curation for preservation. Retrieved from http://www.arl.org/focus-areas/workforce/1086
