
Research Data Management (RDM): Finding and reusing data

Introduction

The research data life cycle begins with either the creation of new data or the reuse of existing data. Science also values reproducibility: reanalyzing the collected data is a way to confirm reported results and to make research processes more transparent. Reusing data can also lead to new advances in a field. Although data reuse is common practice, accessing research data for reuse is not always simple. Research data are often dispersed across different repositories or remain on researchers' own computers. This section offers recommendations for finding research data and strategies for using them properly.

Where to find data

Data can be found in many places, including:

  • In an article: within the text, in the supplementary materials, or by contacting the author to request the data
  • In data journals, which are dedicated specifically to publishing data
  • On federal and local government agency websites
  • On research project websites
  • In repositories, which can be subject-specific, institutional, or general

Tools

A few tools for working with data: 

Search engines for finding data

These are some search engines you may use to find data:

Repository data

Some general repositories are:

Some discipline-specific repositories are: 

  • The Cell Image Library™ by the Center for Research in Biological Systems
  • Protein Data Bank by the Research Collaboratory for Structural Bioinformatics (RCSB)
  • PubChem by the National Center for Biotechnology Information (NIH)
  • GenBank by the National Center for Biotechnology Information (NIH)
  • DOE, for projects sponsored by the U.S. Department of Energy
  • NEON (National Ecological Observatory Network), funded by the National Science Foundation (NSF)
  • PANGAEA by the Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research (AWI)
  • ICPSR by the Institute for Social Research at the University of Michigan
  • Qualitative Data Repository by the Center for Qualitative and Multi-Method Inquiry at Syracuse University
  • CERN (European Organization for Nuclear Research)
  • GitHub (for source code)

How to cite research data

If you use third-party data, you must cite it. Follow these recommendations:

  • At a minimum, include:
    • Identifier
    • Creator
    • Title
    • Publisher
    • Date of publication
    • Type of resource

The two templates and the citation example below are based on the recommendations in the DataCite document DataCite Metadata Schema Documentation for the Publication and Citation of Research Data:

  • Templates:
    • Creator (Publication date): Title. Publisher. (Type of resource). Identifier
    • Creator (Publication date): Title. Version. Publisher. (Type of resource). Identifier
  • Citation example:
    • Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127-797. V. 2.1. Geological Institute, University of Tokyo. (dataset). https://doi.org/10.1594/PANGAEA.726855
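Applying the templates is simple string assembly once the metadata is at hand. The following is a minimal Python sketch, assuming each element is available as a plain string; the DatasetCitation class, its field names, and its render method are hypothetical and not part of any DataCite tool. It refuses records that lack any of the six minimum elements and reproduces the citation example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical record holding the six minimum citation elements plus an optional version."""
    creator: str             # e.g. "Irino, T; Tada, R"
    publication_date: str    # e.g. "2009"
    title: str
    publisher: str
    resource_type: str       # e.g. "dataset"
    identifier: str          # ideally a resolvable DOI URL
    version: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject records that are missing any of the six minimum elements.
        required = {
            "creator": self.creator,
            "publication_date": self.publication_date,
            "title": self.title,
            "publisher": self.publisher,
            "resource_type": self.resource_type,
            "identifier": self.identifier,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if missing:
            raise ValueError(f"missing required citation elements: {', '.join(missing)}")

    def render(self) -> str:
        """Build the citation: Creator (Date): Title. [Version.] Publisher. (Type). Identifier"""
        parts = [f"{self.creator} ({self.publication_date}): {self.title}."]
        if self.version:
            parts.append(f"{self.version}.")  # only the second template includes a version
        parts.append(f"{self.publisher}.")
        parts.append(f"({self.resource_type}).")
        parts.append(self.identifier)
        return " ".join(parts)


if __name__ == "__main__":
    example = DatasetCitation(
        creator="Irino, T; Tada, R",
        publication_date="2009",
        title="Chemical and mineral compositions of sediments from ODP Site 127-797",
        publisher="Geological Institute, University of Tokyo",
        resource_type="dataset",
        identifier="https://doi.org/10.1594/PANGAEA.726855",
        version="V. 2.1",
    )
    print(example.render())
```

Leaving version unset produces the first, shorter template; setting it produces the second.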

Interpreting CC licenses

Creative Commons (CC) licenses are frequently assigned to data sets to specify how others may use them, so it is important to interpret these licenses correctly.

Figure: Creative Commons licenses, with an explanation of each of the six CC licenses (taken from the Creative Commons website).
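Each of the six CC licenses combines the attribution element (BY) with, optionally, share-alike (SA), noncommercial (NC), or no-derivatives (ND). As a sketch of how a reuser might decode a license label, the hypothetical interpret_cc_license function below turns space-separated labels such as "CC BY-NC-SA 4.0" into the conditions to check. It covers only the six licenses; CC0, a separate public-domain dedication, is rejected, and the license's own legal code always remains the authoritative source.

```python
from dataclasses import dataclass

# The six Creative Commons licenses, written without the "CC" prefix.
SIX_CC_LICENSES = {"BY", "BY-SA", "BY-NC", "BY-NC-SA", "BY-ND", "BY-NC-ND"}


@dataclass(frozen=True)
class LicenseConditions:
    attribution_required: bool         # BY: credit the creator
    commercial_use_allowed: bool       # NC forbids commercial use
    sharing_adaptations_allowed: bool  # ND forbids sharing adapted material
    share_alike_required: bool         # SA: adaptations must use the same license


def interpret_cc_license(label: str) -> LicenseConditions:
    """Decode a label such as "CC BY-NC-SA 4.0" into its conditions (hypothetical helper)."""
    tokens = label.upper().split()
    if tokens and tokens[0] == "CC":
        tokens = tokens[1:]  # drop the "CC" prefix
    if len(tokens) == 2 and tokens[1].replace(".", "").isdigit():
        tokens = tokens[:1]  # drop a version suffix such as "4.0"
    if len(tokens) != 1 or tokens[0] not in SIX_CC_LICENSES:
        raise ValueError(f"not one of the six CC licenses: {label!r}")
    elements = set(tokens[0].split("-"))
    return LicenseConditions(
        attribution_required="BY" in elements,
        commercial_use_allowed="NC" not in elements,
        sharing_adaptations_allowed="ND" not in elements,
        share_alike_required="SA" in elements,
    )


if __name__ == "__main__":
    for label in ["CC BY 4.0", "CC BY-SA 4.0", "CC BY-NC-ND 4.0"]:
        print(label, interpret_cc_license(label))
```

Because SA only governs how adaptations are shared, it never appears together with ND, which is why the six valid combinations are listed explicitly rather than derived.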

Open Data Commons Licenses


Public Domain Dedication and License (PDDL)

  • “Public Domain for data/databases”

Attribution License (ODC-By)

  • “Attribution for data/databases”

Open Database License (ODC-ODbL)

  • “Attribution and Share-Alike for data/databases”