McGoogan News

Search tips: searching for data sets

By Emily Glenn

Several discipline-specific resources can guide you through search facets to discover relevant data. However, if you are just getting started with a topic, your search for data sets may be more fruitful if you begin within a data bank. Data banks are large repositories of data sets that focus on specific topics, are funded by specific agencies, or cover a particular geographic area. Data banks can contain qualitative or quantitative data sets, assembled through research or mandatory reporting, that may or may not have been published.

Browse some of these existing data banks to locate data sets that match your research interests.

  • Data.gov (https://www.data.gov/): A catalog of data from agencies across the United States federal government. Over 193,100 data sets are available and can be filtered by location, format, topic, producer, and more. This is a good place to search for a topic that spans multiple agencies.
  • ICPSR (http://www.icpsr.umich.edu/): ICPSR, an international consortium of more than 750 academic institutions and research organizations, maintains an archive of more than 250,000 files of research data in the social and behavioral sciences. The vast majority of ICPSR data holdings are public-use files with no access restrictions.
  • UNdata (http://data.un.org/): UNdata is a single entry point for United Nations data sets (it replaces the UN Common Database). Data can be accessed via keyword search, hierarchical browse, or advanced search. UNdata contains official statistics produced by countries and compiled by United Nations data systems, as well as estimates and projections for agriculture, crime, education, energy, industry, labor, national accounts, population, and tourism. You can also find indicators such as those for the Millennium Development Goals.
  • Qualitative Data Repository (https://qdr.syr.edu/discover): The Qualitative Data Repository (QDR) is a dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences. The repository’s emphasis is on political science.

To cast a wider net, try searching for data sets via Google. Be aware, however, that data sets found this way may lack the original contextual information that a data bank would otherwise provide.

  • Use Google Advanced Search
  • Include search terms like data or table
  • Use OR (in all caps) to find similar or related terms
  • Search for a particular file type (e.g., filetype:xls)
  • Search for data on a particular site or domain (e.g., site:.gov)
  • Exclude words by placing a minus sign (-) directly in front of the word you wish to exclude

Sample search: “tobacco screening” OR “tobacco cessation” filetype:xls -site:.gov
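If you find yourself running the same kind of search repeatedly, the operators above can be assembled with a short script. The Python sketch below is illustrative only: `build_query` and its parameters are hypothetical names, and it assumes Google's OR, filetype:, site:, and minus operators behave as described in this article. It uses the -site: form to exclude a whole domain, and the https://www.google.com/search?q= pattern for the resulting search URL.

```python
from urllib.parse import urlencode


def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes so they are matched as exact phrases."""
    return f'"{term}"' if " " in term else term


def build_query(any_of, filetype=None, site=None, exclude_sites=(), exclude_words=()):
    """Hypothetical helper: combine Google search operators into one query string."""
    # Join alternative terms with OR (it must be in all caps).
    parts = [" OR ".join(quote(t) for t in any_of)]
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one document type
    if site:
        parts.append(f"site:{site}")  # restrict to a site or domain (no space after the colon)
    # A leading minus sign excludes a domain or a word.
    parts.extend(f"-site:{s}" for s in exclude_sites)
    parts.extend(f"-{quote(w)}" for w in exclude_words)
    return " ".join(parts)


query = build_query(["tobacco screening", "tobacco cessation"],
                    filetype="xls", exclude_sites=[".gov"])
print(query)
# prints: "tobacco screening" OR "tobacco cessation" filetype:xls -site:.gov

# Encode the query as a Google search URL (spaces become +, quotes become %22).
url = "https://www.google.com/search?" + urlencode({"q": query})
print(url)
```

Building the string in one place keeps the operator syntax consistent (for example, no stray space after site:), which is an easy mistake to make when typing searches by hand.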

Now that you have located a data set, how can you tell whether it is of high quality? As with other information sources, consider the completeness, accuracy, and timeliness of the data sets you are reviewing. Knowing the domain the data come from can help you gauge the study design and data collection methods used, and whether that design and those methods contributed to the data's reliability and validity. Do the format and file type lend themselves to downloading and interpreting the data? Are all variables named and clearly described? Completeness of the data and codebook, transparency in methods, and appropriate complexity are attributes of high-quality data sets (1). Finally, consider the full lifecycle of the research, reporting, and interpretation when evaluating the quality of a data set.
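If you download a data set as a CSV file, a quick script can help with two of the completeness questions above: how many values are missing for each variable, and whether every variable has a codebook description. The Python sketch below is a minimal illustration under stated assumptions: `completeness_report` is a hypothetical helper, the codebook is assumed to be a simple mapping of variable names to descriptions, and the missing-value codes ("", "NA", ".") are placeholders, since real codebooks define their own. The sample data are made up for demonstration.

```python
import csv
import io


def completeness_report(csv_text, codebook, missing_codes=("", "NA", ".")):
    """Hypothetical helper: summarize missing values and undocumented variables.

    missing_codes are placeholders; use the missing-value codes your codebook defines.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    variables = reader.fieldnames or []
    rows = list(reader)

    # Share of rows whose value for each variable is blank or a missing-value code.
    missing_share = {}
    for var in variables:
        # Short rows leave the value as None, so treat None as blank.
        missing = sum(1 for row in rows if (row[var] or "").strip() in missing_codes)
        missing_share[var] = missing / len(rows) if rows else 0.0

    # Variables in the file with no (or an empty) codebook description.
    undocumented = [v for v in variables if not codebook.get(v, "").strip()]
    # Codebook entries that never appear in the file.
    codebook_only = [v for v in codebook if v not in variables]

    return {"rows": len(rows), "missing_share": missing_share,
            "undocumented": undocumented, "codebook_only": codebook_only}


# Made-up example data, for illustration only.
sample_csv = "id,age,smoker\n1,34,yes\n2,,no\n3,51,NA\n"
sample_codebook = {"id": "Respondent ID", "age": "Age in years", "region": "Census region"}

report = completeness_report(sample_csv, sample_codebook)
print(report["undocumented"])   # ['smoker']
print(report["codebook_only"])  # ['region']
```

A high share of missing values or a list of undocumented variables does not by itself make a data set unusable, but it points to where the codebook and methods documentation deserve a closer read.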

(1) Chen, H., Hailey, D., Wang, N., & Yu, P. (2014). A Review of Data Quality Assessment Methods for Public Health Information Systems. International Journal of Environmental Research and Public Health, 11(5), 5170–5207. http://doi.org/10.3390/ijerph110505170