A Capability Maturity Model for Research Data Management

2. Data acquisition, processing and quality assurance

Last modified by Arden Kirkland on 2014/05/11 15:33

Overall goal: Reliably capture research data in a way that facilitates use, preservation and reuse.

The first stage in the data lifecycle is to collect the data along with data documentation. Data collection is the process of capturing observations of the world (physical, biological, behavioral or social) in a form that can be used for analysis. Observations are of some property or properties (e.g., presence or absence, mass, behavior, structure, attitude) of one or more units of observation (e.g., an organism, artifact, sample, group or organization). Data documentation is the researcher's description of how the data were collected (e.g., conditions, parameters, techniques), of any initial processing of the data, and of the data themselves (e.g., formats, units). An important subgoal of this stage is to ensure the quality of the data and the data documentation as they are captured and processed.
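As a concrete sketch, the documentation that travels with a dataset can be captured as structured metadata. The field names below are illustrative only, not drawn from any particular metadata standard:

```python
from dataclasses import dataclass, field

@dataclass
class DataDocumentation:
    """Minimal, hypothetical record of how a dataset was collected and processed."""
    collection_method: str                      # conditions, parameters, techniques
    processing_steps: list = field(default_factory=list)  # initial processing applied
    file_format: str = ""                       # format of the data themselves
    units: dict = field(default_factory=dict)   # variable name -> unit of measure

# Example documentation for a hypothetical field study.
doc = DataDocumentation(
    collection_method="field survey, single growing season",
    processing_steps=["outlier removal", "unit conversion to SI"],
    file_format="CSV",
    units={"mass": "g", "length": "mm"},
)
```

Recording this information at capture time, rather than reconstructing it later, is what makes the quality-assurance subgoal achievable.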

Given a phenomenon of interest, it may be possible to record the properties of all of the relevant observational units (e.g., the single case being studied in depth or all of the organisms in an experiment). However, as the scale and number of units in the study increase, it may not be feasible to record more than a fraction of the units, requiring some process for sampling, i.e., for choosing which units to measure. Temporally, data collection may be one-off, i.e., at a single point in time, or repeated at more or less regular intervals, with coarser or finer temporal spacing. Finally, data collection might capture multiple properties of each unit of observation simultaneously, or only a few.
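The sampling decision described above, choosing which units to measure when recording all of them is not feasible, can be sketched in a few lines. The population size and sample size here are hypothetical:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population of 10,000 observational units, identified by index.
population = range(10_000)

# Simple random sample of 100 units to measure; sampling is without
# replacement, so no unit is selected twice.
sample_ids = random.sample(population, k=100)
```

Simple random sampling is only one possible design; stratified or systematic schemes follow the same pattern of explicitly choosing which units enter the data.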

Observations can be recorded as verbal or textual reports, yielding qualitative data. Qualitative observations might be left free-form or coded into a fixed set of categories, e.g., the species of an observed organism or one particular behavior or structural characteristic from a set, with more or less formal rules for translating the observation into the categories. Often data from observations are recorded as quantitative measurements. Measurement is the process of converting the observed properties to numbers, that is, symbols representing points along a scale. While conceptually a measure might take on any value, in practice there are only a finite number of possible symbols available to represent the value. Measurements can be made on scales with different properties, from an ordinal scale that simply distinguishes ordered values (e.g., the life stage of an organism that could be represented as A, B, C and so on) to a ratio scale that imposes ordering, equal spacing and a zero point (e.g., a count, length or intensity). 
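A minimal sketch of coding free-form qualitative observations into a fixed category set, alongside an ordinal scale for life stages. The category codes, keywords, and stage labels are invented for illustration:

```python
# Hypothetical fixed category set: keyword in the free-form report -> species code.
CATEGORIES = {"oak": "QUE", "maple": "ACE", "birch": "BET"}

def code_observation(text):
    """Translate a free-form observation into a category code, or None
    when no coding rule matches."""
    lowered = text.lower()
    for keyword, code in CATEGORIES.items():
        if keyword in lowered:
            return code
    return None

# Ordinal scale: life stages A < B < C are ordered, but the spacing
# between them carries no meaning (unlike a ratio scale such as a count).
LIFE_STAGE_ORDER = {"A": 0, "B": 1, "C": 2}
```

In practice the coding rules would be written down as part of the data documentation, so that a second coder could reproduce the same categories.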

Adopting a realist perspective, a measurement can be thought of as the true value plus some amount of error. Error can arise from many different sources. Some error is inherent in the measurement process itself, e.g., quantization error due to the spacing of points on the measurement scale; such error is lower for a more precise measurement, i.e., one with a finer gradation of points on the scale. Error can also be introduced by the specific measurement process: the instruments used may have some inherent inaccuracy, or accidents may occur during measurement. Finally, if observations are aggregated, e.g., to create estimates of an average value in a population, then there will be statistical uncertainty in the estimate due to sampling.
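The realist model of measurement (true value plus error) can be simulated directly. The true value, instrument error, and scale resolution below are all hypothetical:

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

TRUE_VALUE = 12.345  # hypothetical true mass in grams

def measure(true_value, instrument_sd=0.05, resolution=0.1):
    """One measurement: the true value plus random instrument error,
    then quantized to the instrument's scale resolution."""
    reading = random.gauss(true_value, instrument_sd)
    return round(reading / resolution) * resolution  # quantization error <= resolution/2

# Aggregating repeated measurements to estimate the true value.
readings = [measure(TRUE_VALUE) for _ in range(1000)]
mean = statistics.mean(readings)

# Statistical uncertainty of the aggregate estimate: standard error of the mean.
sem = statistics.stdev(readings) / len(readings) ** 0.5
```

Each reading lands on a point of the 0.1-unit scale, so no single measurement can equal the true value exactly; averaging many readings reduces, but never eliminates, the uncertainty in the estimate.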


Created by Jian Qin on 2013/06/08 22:46
