A Capability Maturity Model for Research Data Management

4.3 Activities Performed

Last modified by crowston on 2014/06/01 12:01

Activities Performed describes the roles and procedures necessary to implement a key process area. Activities Performed typically involve establishing plans and procedures (i.e., the specific actions that need to be performed), performing the work, tracking it, and taking corrective actions as necessary.

Policies institutionalize data dissemination and demonstrate commitment to it, but enabling technologies provide the actual ability to carry out this process.

4.3.1 Identify and manage data products

Across the research lifecycle, data come in various forms and at different levels of processing. They can be categorized by the nature of the research as observational, experimental, derived (or compiled), or simulation data (DataONE, 2011e). The nature of the research determines what types of data will be produced and what formats these data will take (DataONE, 2011c). Before these data can be shared, they must be processed, "packaged," and registered in a repository or catalog of data products. By level of processing, data products range from raw data through calibrated data and derived or calculated data to visualized and interactive data. While data sharing policies define the classes of data to be shared, this process requires criteria and procedures for identifying the individual datasets that qualify as data products for sharing, along with any access and usage restrictions associated with each of them.

The identification and management of data products rely heavily on metadata descriptions (a key process area described in Chapter 3) and tools. Data products vary in content and complexity: a large collection of datasets and documentation files may be viewed as a data product, and so may a single data file. It is therefore essential to have clear guidelines for how data products may be grouped, packaged, or aggregated. It is also necessary that data packages be explicitly represented (Jones et al., 2001). The dissemination service interfaces should be based on open standards (DataONE, 2011d).
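
As a minimal sketch of such criteria, the Python below models a catalog record for a data product and applies a simple shareability check. The class, field names, restriction labels, and the check itself are all hypothetical illustrations; an actual repository would take its criteria from its own data sharing policy and metadata standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ProcessingLevel(Enum):
    """Processing levels named in the text above."""
    RAW = "raw"
    CALIBRATED = "calibrated"
    DERIVED = "derived"
    VISUALIZED = "visualized"


@dataclass
class DataProduct:
    """Hypothetical catalog record for one data package."""
    identifier: str
    title: str
    level: ProcessingLevel
    files: List[str] = field(default_factory=list)
    has_metadata: bool = False
    # Assumed labels: "none", "embargoed", "confidential"
    access_restriction: str = "none"


def is_shareable(product: DataProduct) -> bool:
    """Illustrative criteria: described by metadata, non-empty, unrestricted."""
    return (
        product.has_metadata
        and len(product.files) > 0
        and product.access_restriction == "none"
    )


# Example catalog with one shareable and one restricted product.
catalog = [
    DataProduct("dp-001", "Stream temperature, raw logger output",
                ProcessingLevel.RAW, ["temps.csv"], True),
    DataProduct("dp-002", "Survey responses",
                ProcessingLevel.DERIVED, ["survey.csv"], True, "confidential"),
]
shareable = [p.identifier for p in catalog if is_shareable(p)]
print(shareable)
```

Keeping the restriction on the record itself means the same catalog can list a product for discovery while still withholding access to its files.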

4.3.2 Encourage sharing

Shared data can improve research by providing greater spatial, temporal, and disciplinary coverage than individual organizations can offer. Data submitted to a data repository are integrated and provide a way for organizations to build repositories of cohesive, high-quality data (Hale et al., 2003). However, data sharing policies that follow from the institution's commitment to data dissemination do not always motivate researchers to share data. A variety of venues should be used to raise awareness among researchers of the benefits of sharing data and of the protections for data confidentiality and intellectual property rights. Incentives such as impact and usage metrics embedded in the dissemination service should be implemented as a reward mechanism to encourage sharing. Creating a shared need for data among partners can also encourage better data stewardship (Hale et al., 2003).

4.3.3 Enable data discovery

Data discovery is a key function of all data repository systems. Discovery services should take into consideration the needs of both domain experts and non-expert users. For data products that might be useful for interdisciplinary research, it is even more important that the discovery service support both search and browsing. Outputs should also be perceivable, that is, checked for print and web accessibility (DataONE, 2011b).

Discovery services should also allow community tagging, annotation, and comments (DataONE, 2011f). In addition, researchers can advertise and publish their data using web-based datacasting tools and services (DataONE, 2011a).
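
The sketch below illustrates how search, browsing, and community tagging might fit together in a discovery service. It is a toy in-memory index, assuming simple substring matching on titles and exact matching on tags; the class and method names are hypothetical, and a production service would normally rely on a full metadata catalog and search engine.

```python
from collections import defaultdict


class DiscoveryIndex:
    """Toy in-memory index over dataset titles and community tags."""

    def __init__(self):
        self._titles = {}              # dataset id -> title
        self._tags = defaultdict(set)  # dataset id -> set of lowercase tags

    def register(self, dataset_id, title):
        """Add a dataset to the index."""
        self._titles[dataset_id] = title

    def add_tag(self, dataset_id, tag):
        """Attach a community-supplied tag to a registered dataset."""
        if dataset_id not in self._titles:
            raise KeyError(f"unknown dataset: {dataset_id}")
        self._tags[dataset_id].add(tag.lower())

    def search(self, term):
        """Return ids whose title contains the term or whose tags include it."""
        term = term.lower()
        hits = []
        for dataset_id, title in self._titles.items():
            tags = self._tags.get(dataset_id, set())
            if term in title.lower() or term in tags:
                hits.append(dataset_id)
        return sorted(hits)

    def browse_tags(self):
        """Return a tag -> dataset count mapping for browsing."""
        counts = defaultdict(int)
        for tags in self._tags.values():
            for tag in tags:
                counts[tag] += 1
        return dict(counts)


index = DiscoveryIndex()
index.register("ds-1", "Coastal water quality 1990-2000")
index.register("ds-2", "Estuary sediment cores")
index.add_tag("ds-2", "Coastal")  # a community tag makes ds-2 findable too
results = index.search("coastal")
print(results)
print(index.browse_tags())
```

In this example the tag lets a non-expert user who searches for "coastal" find the sediment dataset even though its title uses a narrower term.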

4.3.4 Distribute data

Multiple channels can be established for data distribution to allow the widest possible coverage and timely dissemination. These channels include:

  • Linking data to publications: Dryad Digital Repository (http://datadryad.org/) and the Astrophysics Data System (ADS) (http://adsabs.harvard.edu/index.html) are two examples of this type of service. Linking services enable bi-directional discovery, i.e., finding and obtaining data through publications or vice versa.
  • Registering the data repository in a data union catalog: Examples include DataBib (http://databib.org/) and the Registry of Research Data Repositories (re3data, http://www.re3data.org/). The DataONE project has built a system for searching across multiple member data repositories. Joining a union catalog or data registry allows for federated and other broader searches, which lets the data reach much wider communities.
  • Distributing information on data products through Web services: Open standards for Web services include RSS/Atom and the Web Services Description Language (WSDL) (DataONE, 2011d). Users may subscribe to these services to receive timely updates on data product information.
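
As an illustration of the last channel, the following sketch builds a small Atom feed that announces data products, using only the Python standard library. The function name, the dictionary keys, and the example identifiers and URL are hypothetical, and the output omits elements (such as author) that a complete, validated Atom feed would need.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_id, feed_title, updated, products):
    """Serialize data product announcements as an Atom feed string.

    `products` is a list of dicts with the assumed keys
    "id", "title", "updated", and "url".
    """
    # Serialize Atom elements without a namespace prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = updated
    for product in products:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = product["id"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = product["title"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = product["updated"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=product["url"])
    return ET.tostring(feed, encoding="unicode")


# Placeholder identifiers and URL, for illustration only.
feed_xml = build_atom_feed(
    "urn:example:data-products",
    "New data products",
    "2014-06-01T12:00:00Z",
    [{
        "id": "urn:example:dp-001",
        "title": "Stream temperature, raw logger output",
        "updated": "2014-06-01T12:00:00Z",
        "url": "https://repository.example.org/dp-001",
    }],
)
print(feed_xml)
```

A repository could regenerate such a feed whenever a data product is registered or updated, so that subscribers learn of changes without polling the catalog.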

4.3.5 Ensure data citation

Data citation serves two purposes: crediting the data creator and enabling data reuse, verification, and impact tracking (DataCite, 2014). To enable consistent data citation practice, guidelines should specify what information a data citation should include (content) and how that information should be presented (style). The Socioeconomic Data and Applications Center (SEDAC), for example, provides guidelines for citing its data. These guidelines specify the required information for a data citation as:

  • Primary responsible party
  • Year of publication, issue, release
  • Edition/Version
  • Type of resource, format
  • Statement of responsibility for dynamically generated data and maps
  • Publisher and place of publication
  • Distributor
  • Availability and access
  • Retrieval statement
  • Unpublished data (SEDAC, 2014)

Adopting a data citation standard such as DataCite can be another way to ensure consistent data citation practice. 
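
To show how a guideline can fix both content and style, the sketch below checks that a citation record carries a set of required fields and then renders it in one fixed order. The field names and the element order (creator, year, title, version, publisher, identifier) are assumptions loosely modeled on common data citation recommendations; they are not the SEDAC guideline or an official DataCite format, and the DOI shown is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitationRecord:
    """Hypothetical minimal set of data citation fields."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None


def format_citation(record: CitationRecord) -> str:
    """Check required content, then render it in one consistent style."""
    required = ("creator", "title", "publisher", "identifier")
    missing = [name for name in required if not getattr(record, name)]
    if missing:
        raise ValueError("missing citation fields: " + ", ".join(missing))
    parts = [f"{record.creator} ({record.year}).", f"{record.title}."]
    if record.version:
        parts.append(f"Version {record.version}.")
    parts.append(f"{record.publisher}.")
    parts.append(record.identifier)
    return " ".join(parts)


# Placeholder values, for illustration only.
record = CitationRecord(
    creator="Example Lab",
    year=2014,
    title="Stream temperature observations",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.1234/example",
    version="2",
)
citation = format_citation(record)
print(citation)
```

Generating the citation from the repository's own metadata, rather than asking each user to assemble it, is one way to keep citations consistent across datasets.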


Rubric for 4.3 - Activities Performed

Level 0
This process or practice is not being observed.
No steps have been taken for managing the workflow of data dissemination, including sharing, discovery, and citation.

Level 1: Initial
Data are managed intuitively at the project level without clear goals and practices.
Workflow management for data dissemination, including sharing, discovery, and citation, has been considered minimally by individual team members, but not codified.

Level 2: Managed
The data management (DM) process is characterized for projects and is often reactive.
Workflow management for data dissemination, including sharing, discovery, and citation, has been recorded for this project, but has not taken wider community needs or standards into account.

Level 3: Defined
DM is characterized for the organization/community and is proactive.
The project follows approaches to workflow for data dissemination, including sharing, discovery, and citation, as defined for the entire community or institution.

Level 4: Quantitatively Managed
DM is measured and controlled.
Quantitative quality goals have been established regarding workflow for data dissemination, including sharing, discovery, and citation, and practices are systematically measured for quality.

Level 5: Optimizing
The focus is on process improvement.
Processes regarding workflow for data dissemination, including sharing, discovery, and citation, are evaluated on a regular basis, and necessary improvements are implemented.


DataCite. (2014). Why cite data? Retrieved from https://www.datacite.org/whycitedata

DataONE. (2011a). Advertise your data using datacasting tools. Retrieved from https://www.dataone.org/best-practices/advertise-your-data-using-datacasting-tools

DataONE. (2011b). Check data and other outputs for print and web accessibility. Retrieved from https://www.dataone.org/best-practices/check-data-and-other-outputs-print-and-web-accessibility

DataONE. (2011c). Define expected data outcomes and types. Retrieved from https://www.dataone.org/best-practices/define-expected-data-outcomes-and-types

DataONE. (2011d). Ensure flexible data services for virtual datasets. Retrieved from https://www.dataone.org/best-practices/ensure-flexible-data-services-virtual-datasets

DataONE. (2011e). Identify data with long-term value. Retrieved from https://www.dataone.org/best-practices/identify-data-long-term-value

DataONE. (2011f). Provide capabilities for tagging and annotation of your data by the community. Retrieved from https://www.dataone.org/best-practices/provide-capabilities-tagging-and-annotation-your-data-community

Hale, S. S., Miglarese, A. H., Bradley, M. P., Belton, T. J., Cooper, L. D., Frame, M. T., et al. (2003). Managing Troubled Data: Coastal Data Partnerships Smooth Data Integration. Environmental Monitoring and Assessment, 81(1-3), 133–148. doi:10.1023/A:1021372923589. Retrieved from http://link.springer.com/article/10.1023%2FA%3A1021372923589

Jones, M. B., Berkley, C., Bojilova, J., & Schildhauer, M. (2001). Managing scientific metadata. IEEE Internet Computing, 5(5), 59–68. doi:10.1109/4236.957896. Retrieved from http://www.computer.org/csdl/mags/ic/2001/05/w5059-abs.html

SEDAC. (2014). Citing our data. Retrieved from http://sedac.ciesin.columbia.edu/citations
