Skip to main content

Data management general guidance


Table of Contents


Introduction

What is a data management plan?

A data management plan is a formal document that outlines what you will do with your data during and after a research project. Most researchers collect data with some form of plan in mind, but it's often inadequately documented and incompletely thought out. Many data management issues can be handled easily or avoided entirely by planning ahead. With the right process and framework it doesn't take too long and can pay off enormously in the long run.

Who requires a plan?

In February of 2013, the White House Office of Science and Technology Policy (OSTP) issued a memorandum directing Federal agencies that provide significant research funding to develop a plan to expand public access to research. Among other requirements, the plans must:
"Ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans, as appropriate, describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified"

Most US Federally funded grants and many private foundations require some form of a data management plan. In the United States, most data management plans are 2-page documents submitted as part of the funding proposal process.

We can help

We have been working with internal and external partners to make data management plan development less complicated. By getting to know your research and data, we can match your specific needs with data management best practices in your field to develop a data management plan that works for you. If you do this work at the beginning of your research process, you will have a far easier time following through and complying with funding agency and publisher requirements.

We recommend that those applying for funding from US Federal agencies, such as the NSF or NIH, use the DMP Tool. The DMP Tool provides guidance for many of the Federal agencies’ requirements, along with links to additional resources, services, and help.


Types of Data

Research projects generate and collect countless varieties of data. To formulate a data management plan, it's useful to categorize your data in four ways: by source, format, stability, and volume.

What's the source of the data?

Although data comes from many different sources, they can be grouped into four main categories. The category(ies) your data comes from will affect the choices that you make throughout your data management plan.

Observational

Experimental

Simulation

Derived / Compiled

What's the form of the data?

Data can come in many forms, including:

How stable is the data?

Data can also be fixed or changing over the course of the project (and perhaps beyond the project's end). Do the data ever change? Do they grow? Is previously recorded data subject to correction? Will you need to keep track of data versions? With respect to time, the common categories of dataset are:

The answer to this question affects how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge so it is imperative that you begin with a plan to carry you through the entire data management process.

How much data will the project produce?

For instance, image data typically requires a lot of storage space, so you'll want to decide whether to retain all your images (and, if not, how you will decide which to discard) and where such large data can be housed. Be sure to know your archiving organization's capacity for storage and backups.

To avoid being under-prepared, estimate the growth rate of your data. Some questions to consider are:


File Formats

The file format you choose for your data is a primary factor in someone else's ability to access it in the future. Think carefully about what file format will be best to manage, share, and preserve your data. Technology continually changes and all contemporary hardware and software should be expected to become obsolete. Consider how your data will be read if the software used to produce it becomes unavailable. Although any file format you choose today may become unreadable in the future, some formats are more likely to be readable than others.

Formats likely to be accessible in the future are:

Examples of preferred format choices:

If you find it necessary or convenient to work with data in a proprietary/discouraged file format, do so, but consider saving your work in a more archival format when you are finished.

For more information on recommended formats, see the UK Data Service guidance on recommended formats.

Tabular data

Tabular data warrants special mention because it is so common across disciplines, mostly as Excel spreadsheets. If you do your analysis in Excel, you should use the "Save As..." command to export your work to .csv format when you are done. Your spreadsheets will be easier to understand and to export if you follow best practices when you set them up, such as:

Other risks to accessibility


Organizing Files

Basic Directory and File Naming Conventions

These are rough guidelines to follow to help manage your data files in case you don't already have your own internal conventions. When organizing files, the top-level directory/folder should include:

The sub-directory structure should have clear, documented naming conventions. Separate files or directories could apply, for example, to each run of an experiment, each version of a dataset, and/or each person in the group.

File Renaming

Tools to help you:


Metadata: Data Documentation

Why document data?

Clear and detailed documentation is essential for data to be understood, interpreted, and used. Data documentation describes the content, formats, and internal relationships of your data in detail and will enable other researchers to find, use, and properly cite your data.

Begin to document your data at the very beginning of your research project and continue throughout the project. Doing so will make the process much easier. If you have to construct the documentation at the end of the project, the process will be painful and important details will have been lost or forgotten. Don't wait to document your data!

What to document?

Research Project Documentation

Dataset documentation

How will you document your data?

Data documentation is commonly called metadata – "data about data". Researchers can document their data according to various metadata standards. Some metadata standards are designed for the purpose of documenting the contents of files, others for documenting the technical characteristics of files, and yet others for expressing relationships between files within a set of data. If you want to be able to share or publish your data, the DataCite metadata standard is of particular signficiance.

Below are some general aspects of your data that you should document, regardless of your discipline. At a minimum, store this documentation in a "readme.txt" file, or the equivalent, with the data itself.

General Overview

Content Description

Technical Description

Access


Persistent Identifiers

If you want to be able to share or cite your dataset, you'll want to assign a public persistent unique identifier to it. There are a variety of public identifier schemes, but common properties of good schemes are that they are:

Here are some identifier schemes:


Security and Storage

Data Security

Data security is the protection of data from unauthorized access, use, change, disclosure, and destruction. Make sure your data is safe in regards to:

Encryption and Compression

Unencrypted data will be more easily read by you and others in the future, but you may need to encrypt sensitive data.

Uncompressed data will be also be easier to read in the future, but you may need to compress files to conserve disk space.

Backups and storage

Making regular backups is an integral part of data management. You can backup data to your personal computer, external hard drives, or departmental or university servers. Software that makes backups for you automatically can simplify this process considerably. The UK Data Archive provides additional guidelines on data storage, backup, and security.

Backup Your Data

Test your backup system

Other data preservation considerations

Who is responsible for managing and controlling the data?

For what or whom are the data intended?

How long should the data be retained?

Beyond any externally imposed requirments, think about the long-term usefulness of the data. If the data is from an experiment that you anticipate will be repeatable more quickly, inexpensively, and accurately as technology progresses, you may want to store it for a relatively brief period. If the data consists of observations made outside the laborartory that can never be repeated, you may wish to store it indefinitely.


Sharing and Archiving

Why share your data?

Considerations when preparing to share data

What does it mean to share data? (adapted from the Support Your Data project)

Sharing data means making your data available so that they can be accessed and used—by yourself or by others—in the future. Here are three factors to consider when sharing data.

Format

Data should be shared in a usable format. This may mean sharing raw data instead of prepared data (or vice versa) or ensuring that data is saved in common or open file formats.

Completeness

Remember that notes, documentation, and other information about your data are part of your data. To ensure that your shared data is useful, make sure these elements are included.

Location

When choosing a method for sharing your data, consider how other researchers will find and use it. The storage options you use to save your data as you work on it will probably be different than the options you use to share it, especially over the longer term.

Requirements and how to meet them

Many research funders, publishers, institutions, and research communities have formal expectations about how data should be shared.

Things to think about

Finding a data repository

You should select a repository or archive for your data based on the long-term security offered and the ease of discovery and access by colleagues in your field. There are two common types of a repository to look for:

A searchable and browsable list of repositories can be found at these websites:


Citing Data

Citing data is important in order to:

Citation Elements

A dataset should be cited formally in an article's reference list, not just informally in the text. Many data repositories and publishers provide explicit instructions for citing their contents. If no citation information is provided, you can still construct a citation following generally agreed-upon guidelines from sources such as the Force 11 Joint Declaration of Data Citation Principles and the current DataCite Metadata Schema.

Core elements

Common additional elements

Example citations


Sharing data that you produced/collected yourself

Sharing data that you have collected from other sources

If you are uncertain as to your rights to disseminate data, UC researchers can consult with your campus Office of General Council. Note: Laws about data vary outside the U.S.

For a general discussion about publishing your data, applicable to many disciplines, see the ICPSR Guide to Social Science Data Preparation and Archiving.

Confidentiality and Ethical Concerns

It is vital to maintain the confidentiality of research subjects both as an ethical matter and to ensure continuing participation in research. Researchers need to understand and manage tensions between confidentiality requirements and the potential benefits of archiving and publishing the data.

To ethically share confidential data, you may be able to: