A Capability Maturity Model for Research Data Management

3.2 Ability to Perform

Last modified by Arden Kirkland on 2014/06/06 12:56

Ability to Perform describes the preconditions that must exist in the project or organization to implement the process competently. Ability to Perform typically involves resources, organizational structures, and training.

The ability to perform in the data description and representation process area refers to the readiness of metadata artifacts and tools as well as the readiness of staff and procedures that are essential for performing data description and representation. 

3.2.1 Develop or adopt metadata specifications and schemas

A large number of metadata standards are available for adoption. Deciding whether to develop new metadata specifications or adopt an existing standard requires good knowledge of the standards relevant to the description needs. Metadata policies (see Section 3.1) provide guidelines for deciding what data should be described by agreed-upon metadata standards or schemas, and when. Metadata specifications define how data should be described, with the goal of helping future users find, identify, select, obtain, and appropriately understand and use information from a dataset. Metadata specifications usually comprise a collection of elements, controlled vocabularies, encoding schemas, and best practice guidelines.
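The relationship among these parts can be made concrete with a small sketch. The element names below follow the Dublin Core style, but the required-element set, the controlled vocabulary, and the validation rules are illustrative assumptions, not part of any published standard:

```python
# A minimal sketch of a metadata specification: a handful of Dublin
# Core-style elements, a small controlled vocabulary for one element,
# and a routine that checks a record against the specification.
# The vocabulary and rules are illustrative, not a published standard.

REQUIRED_ELEMENTS = {"title", "creator", "date", "subject"}
SUBJECT_VOCABULARY = {"ecology", "astrophysics", "geodynamics"}  # hypothetical

def validate_record(record):
    """Return a list of problems found in a metadata record (a dict)."""
    problems = []
    for element in REQUIRED_ELEMENTS - record.keys():
        problems.append(f"missing required element: {element}")
    subject = record.get("subject")
    if subject is not None and subject not in SUBJECT_VOCABULARY:
        problems.append(f"subject '{subject}' not in controlled vocabulary")
    return problems

record = {"title": "Stream temperature, 2013", "creator": "A. Researcher",
          "date": "2013-08-01", "subject": "ecology"}
print(validate_record(record))  # an empty list means the record conforms
```

In practice the best-practice guidelines of a real specification cover far more (cardinality, encoding, obligation levels), but the pattern of elements plus vocabularies plus validation rules is the same.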

Regardless of whether the work involves developing new specifications or adopting existing standards, careful analysis of data types and their status at different stages of the research lifecycle must be performed to understand description and user requirements. For example, active data files that may change by the minute may need only rudimentary metadata embedded in the file (descriptive file names, creator's name, time stamps, and other technical metadata), while a dataset that is the final data product of a project will need comprehensive metadata to describe the research context and key metadata values.
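Much of the rudimentary technical metadata for active files can be captured automatically from the file system. A minimal sketch in Python (the field names are illustrative, not drawn from any standard):

```python
import os
import time

def basic_file_metadata(path):
    """Capture rudimentary technical metadata for an active data file."""
    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),   # descriptive file name
        "size_bytes": stat.st_size,            # file size
        "modified": time.strftime("%Y-%m-%dT%H:%M:%S",
                                  time.localtime(stat.st_mtime)),  # time stamp
    }
```

A final data product would layer the comprehensive, human-authored metadata (research context, methods, variable definitions) on top of records like these.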

In practice, metadata standards are rarely followed exactly as published; modifications will most likely be necessary when adopting a metadata standard. The specifications that result from modifying one or more metadata standards are called metadata "application profiles" (APs). Zeng and Qin (2014) provide a detailed discussion of different approaches to designing metadata application profiles. Many projects and communities have created application profiles, and many of these can be located through metadata directories or registries; for example, the Digital Curation Centre in the UK (http://www.dcc.ac.uk/) hosts a metadata directory for science disciplines at http://www.dcc.ac.uk/resources/metadata-standards. Sometimes informal, "homegrown" metadata practices are used, which is better than using no metadata schema at all, but whenever possible, use a previously created schema that complies with an authoritative community standard. Use of directories and registries can help prevent "reinventing the wheel" when designing metadata specifications and schemas.

In addition to easing retrieval, standards make documentation more consistent in general, and the use of a shared schema greatly improves the interoperability of the information collected.

3.2.2 Select and acquire tools

Tools for producing metadata should be selected and evaluated for feasibility. Metadata standards often come with tools, and some standards have multiple tools; one example is the workflow management systems astrophysicists use to automate the capture of metadata. Automated tools typically cannot capture all of the necessary metadata, however. A best practice is to make use of tools already in use in a research community for generating metadata (Riley, 2014).

3.2.3 Develop strategies for generating metadata based on community practices

Metadata descriptions may be created for a collection of data, the study that generated the collection of data, or individual data sets and files. For computationally intensive research fields such as astrophysics, much of the required metadata may be captured automatically for data files and datasets, but in field and experimental research fields such as ecology and geodynamics, metadata creation requires a large amount of human intervention. A best practice for generating metadata is to leverage existing documentation practices within a community of researchers (Riley, 2014).

One strategy for generating metadata to facilitate discovery and long-term preservation is to rely on researchers to perform this activity themselves; due to limited resources, this is often the default approach. Thus far it has had limited success (Tenopir et al., 2011) and has inhibited the deposit of data in repositories with useful metadata (Riley, 2014).

There are efforts to automate the generation of metadata via software tools, though this capability is not fully realized for most research communities. An example of an ability to perform issue is ensuring flexible data services for virtual datasets (DataONE, 2011).

A best practice in many contexts is to conceptualize metadata creation as a shared responsibility facilitated by librarian support (Riley, 2014). For example, the ICPSR data repository asks researchers to provide descriptive study information, but also devotes significant staff resources to enhancing researcher metadata to make it more fully interoperable with DDI (Data Documentation Initiative, a social science metadata standard) and to transforming data into multiple formats (for three common statistical software platforms) to make it widely accessible.
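The shared-responsibility pattern can be sketched as a two-layer merge: the researcher supplies descriptive information, and repository staff enrich it afterward. The field names and merge rules below are hypothetical; they do not reflect ICPSR's actual workflow or the DDI element set:

```python
# A sketch of metadata creation as a shared responsibility: researchers
# supply descriptive study information, and repository staff enrich it
# later. Field names and rules are hypothetical illustrations only.

def enrich_metadata(researcher_record, staff_additions):
    """Merge staff enhancements into a researcher-supplied record.

    Staff values fill gaps and replace empty placeholders, but a
    researcher's substantive entries are kept when the records conflict.
    """
    merged = dict(staff_additions)   # start from the staff layer
    for field, value in researcher_record.items():
        if value:                    # keep non-empty researcher entries
            merged[field] = value
    return merged

researcher = {"title": "County health survey", "abstract": ""}
staff = {"abstract": "A survey of county health outcomes.",
         "subject_terms": ["public health"]}
print(enrich_metadata(researcher, staff))
```

The design choice worth noting is the precedence rule: staff enrichment adds discoverability (subject terms, fuller abstracts) without silently overwriting what the researcher actually said about the study.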

Researcher interest in documentation of data is greatest when it assists with everyday project data management (Jahnke & Asher, 2012). A best practice is to integrate metadata creation into researcher workflows during the active phase of research projects, leveraging researcher interest in project data management (Jahnke & Asher, 2012).

3.2.4 Arrange staffing for creating metadata

Roles in creating metadata vary with the scale and nature of the research context. Large, heavily funded projects often have internal infrastructure with dedicated data management personnel; smaller projects are more likely to benefit from data support services offered by an academic library (Ray, 2014).

Often two levels of metadata are of concern for research data: on-the-spot annotation that researchers do in the context of everyday data management, and high-level bibliographic metadata afforded by librarian expertise. When metadata is conceptualized as a shared responsibility, project researchers themselves might produce the on-the-spot metadata, and they need training in best practices; a librarian might later produce bibliographic metadata to facilitate discovery.

To support documentation of everyday data management it can be helpful for researchers to commit to putting aside time at the end of each work session, and at project milestones, to document project activities (Long, 2009). 

3.2.5 Provide training for researchers and librarians

When metadata creation is conceptualized as a shared responsibility, training can be helpful for both researchers and librarians (Riley, 2014). Training for researchers can take the form of general information, appropriate for a broad range of researchers, delivered at key points in the research life cycle. For example, DMPTool (https://dmp.cdlib.org/) offers guidelines for generating metadata at https://dmptool.org/dm_guidance as part of data management planning. For discipline-specific training on data management practices, the Colorado Clinical and Translational Sciences Institute (CCTSI) offers education in data management best practices (http://cctsi.ucdenver.edu/CommunityEngagement/Resources/DataSharingGuidelines/Pages/DataManagement.aspx) for translational biomedical research via a website with videos (http://cctsi.ucdenver.edu/RIIC/Pages/DataManagement.aspx).

A promising approach to researcher data management education is the TIER protocol developed by Ball and Medeiros at Haverford College (http://www.haverford.edu/TIER/). TIER teaches data management practices that produce replicable analysis experientially, through the structure of the deliverables required for student research projects. The rationale is that if budding researchers learn data management at the same time as they learn research methods, sound documentation practices are not perceived as a hardship.

When metadata support is offered as a service delivered by subject liaison librarians, training for librarians can come via online resources. Examples include the Digital Curation Centre's curation resources (http://www.dcc.ac.uk/resources) and training materials (http://www.dcc.ac.uk/training), and Purdue University's Data Curation Profiles Toolkit (http://datacurationprofiles.org/). Librarians can also pursue more in-depth professional development or formal education; five library schools in the United States offer data curation programs (Riley, 2014).

3.2.6 Assess community data and metadata practices

The provision of metadata services requires an understanding of existing research community metadata practices, in addition to the metadata structures associated with libraries (Ray, 2014). Purdue University's data curation profiles, which are generated via interviews, are one approach for librarians to increase their knowledge of existing practices. Another is to use small pilot studies early in the development of data curation services (Westra, 2014).


Rubric for 3.2 - Ability to Perform

Level 0
This process or practice is not being observed.
No steps have been taken to provide organizational structures or plans, training, or resources such as staffing and tools for metadata development.

Level 1: Initial
Data are managed intuitively at the project level without clear goals and practices.
Structures or plans, training, and resources such as staffing and tools for metadata development have been considered minimally by individual team members, but not codified.

Level 2: Managed
The DM process is characterized for projects and is often reactive.
Structures or plans, training, and resources such as staffing and tools for metadata development have been recorded for the project, but have not taken wider community needs or standards into account.

Level 3: Defined
DM is characterized for the organization/community and is proactive.
The project follows structures or plans, training, and resources such as staffing and tools for metadata development that have been defined for the entire community or institution.

Level 4: Quantitatively Managed
DM is measured and controlled.
Quantitative quality goals have been established regarding structures or plans, training, and resources such as staffing and tools for metadata development, and practices in these areas are systematically measured for quality.

Level 5: Optimizing
The focus is on process improvement.
Processes regarding structures or plans, training, and resources such as staffing and tools for metadata development are evaluated on a regular basis, and necessary improvements are implemented.


DataONE. (2011). Ensure flexible data services for virtual datasets. Retrieved from https://www.dataone.org/best-practices/ensure-flexible-data-services-virtual-datasets

Jahnke, L., Asher, A., & Keralis, S. D. (2012). The problem of data. Council on Library and Information Resources (CLIR) Report, pub. #154. ISBN 978-1-932326-42-0 Retrieved from http://digitalcommons.bucknell.edu/fac_pubs/52/

Long, J. S. (2009). The workflow of data analysis using Stata. College Station, Texas: Stata Press Books.

Ray, J. M.  (2014). Introduction to research data management. In J. Ray (Ed.), Research Data Management: Practical Strategies for Information Professionals (Charleston Insights in Library, Information, and Archival Sciences). West Lafayette, Indiana: Purdue University Press.

Riley, J. (2014). Metadata services. In J. Ray (Ed.), Research Data Management: Practical Strategies for Information Professionals (Charleston Insights in Library, Information, and Archival Sciences). West Lafayette, Indiana: Purdue University Press.

Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., Manoff, M., & Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS ONE, 6(6), e21101. doi:10.1371/journal.pone.0021101. Retrieved from http://www.plosone.org/article/info:doi/10.1371/journal.pone.0021101

Westra, B. (2014). Developing data management services for researchers at the University of Oregon. In J. Ray (Ed.), Research Data Management: Practical Strategies for Information Professionals (Charleston Insights in Library, Information, and Archival Sciences). West Lafayette, Indiana: Purdue University Press.

Zeng, M. L. & Qin, J. (2014). Metadata. Chicago, IL: ALA Neal Schuman. 
