A Capability Maturity Model for Research Data Management

3. Data description and representation

Last modified by Arden Kirkland on 2014/06/03 19:04

Overall goal: Describe and represent data to facilitate future discovery and use.

Data description and representation is the process of capturing information that enables users to find, understand, and use or reuse data. In a broad sense, even an email exchange between colleagues explaining how data can and cannot be used is a type of informal metadata (Edwards et al., 2011). This section of the CMM for RDM focuses on metadata process areas that involve adopting metadata standards, generating metadata descriptions for data, and following best practices.

Metadata can be applied to different levels of interrelated research data outputs, from those that are more granular to those that are more global, such as:

  • a variable, parameter, or column heading field in a database
  • a file 
  • a study
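
The three levels listed above can be pictured as nested records. The following is a minimal sketch, assuming hypothetical class and field names (`VariableMetadata`, `FileMetadata`, `StudyMetadata`, and their attributes) chosen only for illustration; it does not follow any particular metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariableMetadata:
    """Most granular level: one variable, parameter, or column heading."""
    name: str
    label: str
    units: str = ""


@dataclass
class FileMetadata:
    """Middle level: one data file and the variables it contains."""
    filename: str
    file_format: str
    variables: List[VariableMetadata] = field(default_factory=list)


@dataclass
class StudyMetadata:
    """Most global level: context about the study as a whole."""
    title: str
    creators: List[str]
    description: str
    files: List[FileMetadata] = field(default_factory=list)

    def variable_names(self) -> List[str]:
        # Walk down from study to file to variable.
        return [v.name for f in self.files for v in f.variables]


# Hypothetical example values, not taken from any real study.
study = StudyMetadata(
    title="Example stream temperature study",
    creators=["A. Researcher"],
    description="Hypothetical study used only to illustrate nesting.",
    files=[
        FileMetadata(
            filename="site_a.csv",
            file_format="text/csv",
            variables=[
                VariableMetadata("temp_c", "Water temperature", "degrees Celsius"),
                VariableMetadata("obs_date", "Observation date"),
            ],
        )
    ],
)

print(study.variable_names())
```

Nesting the records this way keeps the granular descriptions attached to the study-level context that gives them meaning.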

During the active phase of a research project, researchers might be most attuned to documenting and managing data at granular levels (i.e., variables and files). However, the metadata in a data archive also needs contextual information about the study as a whole, because that context is not common knowledge outside the project in which the data were produced.

Metadata serves different functions, and each can carry its own requirements. In general, the closer one is to the context of data creation, the less immediate the need for metadata. A researcher who has just taken a measurement has the units of measurement in her head, and researchers on collaborative projects have informal opportunities to communicate about data. As data move farther from the context of creation, documenting contextual details becomes increasingly important. In this sense, documentation of contextual information has a life cycle of its own, whose stages roughly correspond to the different functions metadata serves:

  • active management of data during a project,
  • preservation and discovery once data have been shared in an archive,  
  • reuse of data or replication of analysis performed in a study, and
  • assessment of the impact of research outputs.

Different stakeholders might value different metadata functions. For example, researchers are typically concerned with active management of data during a project, while librarians tend to value preservation and discovery once data have been shared in an archive. Consequently, different stakeholders may have markedly different conceptions of metadata requirements. A life cycle approach to data management, which takes into account the functions metadata serves at each stage, can help reconcile these differences in perspective.

Fortunately, one metadata element can often serve multiple functions (Riley, 2014), and documenting data at one level of granularity can yield benefits at other levels. Practices that improve project-level data management (e.g., variable documentation) can also increase opportunities for discovery once the study data are archived (e.g., the ICPSR data archive offers a variable search capability). Similarly, practices that improve discovery for secondary users also facilitate self-discovery for data creators, who may not remember project details at a later date.
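
To illustrate how variable documentation written for project management can later support discovery, here is a minimal sketch of a variable search. The catalog contents, the `CATALOG` name, and the `search_variables` function are all hypothetical, and the sketch is not a description of how ICPSR or any other archive implements its search.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: study title -> {variable name: variable label}.
# These labels are the kind of documentation a project might keep
# for its own use while the data are being collected.
CATALOG: Dict[str, Dict[str, str]] = {
    "Household survey (hypothetical)": {
        "hh_inc": "Total household income",
        "hh_size": "Number of household members",
    },
    "Stream monitoring (hypothetical)": {
        "temp_c": "Water temperature in degrees Celsius",
    },
}


def search_variables(
    catalog: Dict[str, Dict[str, str]], term: str
) -> List[Tuple[str, str]]:
    """Return (study, variable) pairs whose name or label contains the term."""
    needle = term.lower()
    hits: List[Tuple[str, str]] = []
    for study, variables in catalog.items():
        for name, label in variables.items():
            # Case-insensitive substring match on both name and label.
            if needle in name.lower() or needle in label.lower():
                hits.append((study, name))
    return hits


print(search_variables(CATALOG, "income"))
```

The same labels that help a project team keep track of its columns are what make the variables findable by someone outside the project.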


Edwards, P. N., Mayernik, M. S., Batcheller, A. L., Bowker, G. C., & Borgman, C. L. (2011). Science friction: Data, metadata, and collaboration. Social Studies of Science, 41(5), 667–690. doi:10.1177/0306312711413314. Retrieved from http://pne.people.si.umich.edu/PDF/EdwardsEtAl2011ScienceFriction.pdf

Riley, J. (2014). Metadata services. In J. Ray (Ed.), Research Data Management: Practical Strategies for Information Professionals. West Lafayette, Indiana: Purdue University Press.



Created by Jian Qin on 2013/06/12 06:39
