A Capability Maturity Model for Research Data Management
Last modified by Arden Kirkland on 2014/06/06 12:54
  • charflynn
    charflynn, 2013/12/06 11:47

    For a good instructional video on the limitations of spreadsheet software (e.g. Excel) for data management, see:
    Colorado Clinical and Translational Sciences Institute (CCTSI). (2013). Managing your research data. http://cctsi.ucdenver.edu/RIIC/Pages/DataManagement.aspx#database

    Text from the CCTSI data management website (same URL as above):

    "Excel Is Not the Answer: Choosing a Database Structure

    While there are a number of other database management systems to choose from, the use of spreadsheets such as Excel for data entry and storage is never a good idea! Protect yourself by keeping original primary data in a robust database, which can be exported to a spreadsheet or statistical package for analysis without corrupting the underlying data. Among the many problems with Excel for data entry and storage are:

    It is much too easy to corrupt data in Excel. If you make the common error of sorting on a single column and forgetting to undo the change before saving the dataset, your dataset is now hopelessly corrupted and unrecoverable.

    Excel doesn’t provide facilities for storage of metadata

    Range checking/data validation is possible but cumbersome

    Keeping all the data on a single spreadsheet encourages PHI to be mixed with non-PHI, which can create privacy and security concerns

    While MS Access does solve some of the data entry and storage problems inherent to Excel, it does not meet HIPAA standards for security, including standards related to authorization, authentication and audit controls. Refer to URL: http://www.uchsc.edu/hipaa/internal/ for additional information."

    Instead of using Excel or Access, CCTSI encourages its researchers to use REDCap (Research Electronic Data Capture), a database system that CCTSI supports, for data entry and storage.
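
    To make the quoted points concrete, here is a minimal sketch of what "keep primary data in a database and export for analysis" could look like. It uses Python's built-in sqlite3 module purely for illustration; the table names, column names, and the 50-300 range are my own assumptions, not anything from CCTSI. SQLite is not REDCap and nothing here makes a dataset HIPAA compliant; the sketch only shows range checking enforced by the database and identifying fields kept in a separate table from the measurements.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with identifying data kept apart from measurements.

    All table and column names are hypothetical, chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clause below
    conn.executescript(
        """
        CREATE TABLE participant_phi (
            participant_id INTEGER PRIMARY KEY,
            full_name      TEXT NOT NULL
        );
        CREATE TABLE measurement (
            measurement_id INTEGER PRIMARY KEY,
            participant_id INTEGER NOT NULL
                REFERENCES participant_phi(participant_id),
            -- Range check lives in the schema, so every insert is validated.
            systolic_mmhg  INTEGER NOT NULL
                CHECK (systolic_mmhg BETWEEN 50 AND 300)
        );
        """
    )
    return conn


def add_measurement(conn, participant_id, systolic_mmhg):
    """Insert one reading; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO measurement (participant_id, systolic_mmhg) VALUES (?, ?)",
                (participant_id, systolic_mmhg),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def export_deidentified_csv(conn):
    """Export measurements only (no names) as CSV text for analysis elsewhere."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["participant_id", "systolic_mmhg"])
    writer.writerows(
        conn.execute(
            "SELECT participant_id, systolic_mmhg FROM measurement ORDER BY measurement_id"
        )
    )
    return buf.getvalue()


if __name__ == "__main__":
    conn = build_demo_db()
    with conn:
        conn.execute(
            "INSERT INTO participant_phi (participant_id, full_name) VALUES (1, 'Example Person')"
        )
    print(add_measurement(conn, 1, 120))   # in-range reading
    print(add_measurement(conn, 1, 1200))  # out-of-range reading (likely a typo)
    print(export_deidentified_csv(conn))
```

    The exported CSV can be sorted or edited freely in a spreadsheet or statistics package, since the rows in the database are untouched by anything done to the export.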

  • charflynn
    charflynn, 2013/12/06 11:49

    Note: PHI in the above comment refers to "protected health information" (the HIPAA term, sometimes informally called personal health information). This is a class of data biomedical researchers need to be cognizant of for HIPAA compliance.

  • crowston
    crowston, 2014/01/22 20:57

    something about training for data collection? 

    • charflynn
      charflynn, 2014/03/19 12:28

      Yes, they are training materials. 

      I was mentioning them as a source that addresses best practices for selecting file formats. CCTSI might articulate a best practice for selecting file formats for research data as: "Use data file formats suitable for managing research data." In the text I shared above they explain the limitations of using commercial spreadsheet software as a file format for research data.  

      Some of their reasons are the same as those you mention in the paragraph that begins "At the whole file level, electronic data files should..." They mention other issues as well (e.g. no facilities for storing metadata, and no way to store protected health information that is HIPAA compliant).

      Their beef with Excel isn't that it's a commercial product. The problem in their eyes is that it wasn't made to be a data management tool, so it doesn't do a good job of the things we want a data management tool to do, and this is especially true for research data.
