Databases Examples - spinningideas/resources GitHub Wiki
Free Dataset Sources & Example Databases
A collection of sources that host free, public-use datasets and sample databases. Most sources allow direct download; some require registration.
Dataset Repositories & Search Engines
- Google Dataset Search: A Google index across thousands of dataset repositories, including government, academic, and publisher sources.
- Data.gov: The U.S. government's open data portal with over 200,000 datasets covering climate, health, finance, and more.
- Kaggle Datasets: Community-curated datasets for data science, machine learning, and analytics projects.
- Data.world: Collaborative platform for discovering and sharing open datasets.
- FiveThirtyEight Data: Datasets behind FiveThirtyEight's articles and analyses (sports, politics, economics).
- GitHub: Awesome Public Datasets: A topic-centric list of high-quality public datasets across many domains.
- UCI Machine Learning Repository: Classic benchmark datasets (e.g., Iris, Wine, Car Evaluation) widely used in machine learning.
- Open Data Kit: Data Packaged Core Datasets: Curated, packaged reference datasets maintained as part of the Frictionless Data project.
Global Development & Economics
- World Bank Open Data: Free and open access to global development data, including the World Development Indicators (WDI).
- World Bank Data Catalog: Searchable catalog of World Bank datasets with thousands of entries.
- UN Data: Free access to a wide range of international statistical resources from UN agencies.
- Bureau of Labor Statistics (BLS) Data: U.S. employment, inflation, productivity, and wage statistics.
- U.S. Census Bureau: Population, housing, economic, and geographic data.
Government & Civic Data
- NYC Open Data: New York City public datasets.
- San Francisco Open Data: San Francisco city datasets.
- Data.gov.uk: UK government open data portal.
- Open Data Monitor (EU): Overview of open data resources in Europe.
Health & Social Impact
- WHO Data: Global health statistics from the World Health Organization.
- CDC Data: U.S. Centers for Disease Control and Prevention datasets.
- County Health Rankings: U.S. county health factor rankings.
- IHME Global Burden of Disease: Global disease burden estimates from the Institute for Health Metrics and Evaluation.
Climate, Environment & Energy
- NOAA Climate Data: U.S. climate and weather data from the National Oceanic and Atmospheric Administration.
- NASA Earth Data: Satellite and Earth science datasets.
- EPA Air Quality Data: U.S. air quality and pollution data.
- U.S. Energy Information Administration (EIA): Energy production, consumption, and price data.
- UN Greenhouse Gas Inventory Data: Greenhouse gas data from UN sources.
Machine Learning & Sample Databases
- Iris Dataset: Classic classification dataset from the UCI repository.
- Wine Quality Dataset: Wine quality ratings used for regression/classification benchmarks.
- Chinook Sample Database: A sample database modeling a digital media store, distributed for SQLite and several other database engines, and commonly used for practicing SQL.
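As a rough illustration of the kind of SQL practice a sample database such as Chinook supports, the sketch below builds a tiny in-memory SQLite database with Python's standard `sqlite3` module and runs a join with aggregation. The `Artist` and `Album` tables mirror the shape of the corresponding Chinook tables, but the rows are made-up placeholders, and the helper names `build_demo_db` and `albums_per_artist` are hypothetical. To run the same query against a real Chinook file, the in-memory connection would be swapped for a connection to the downloaded database.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small Chinook-style schema.

    The rows are illustrative placeholders, not data from the real Chinook file.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Artist (
            ArtistId INTEGER PRIMARY KEY,
            Name     TEXT
        );
        CREATE TABLE Album (
            AlbumId  INTEGER PRIMARY KEY,
            Title    TEXT,
            ArtistId INTEGER REFERENCES Artist(ArtistId)
        );
        INSERT INTO Artist VALUES (1, 'Artist A'), (2, 'Artist B'), (3, 'Artist C');
        INSERT INTO Album VALUES
            (1, 'First Album', 1),
            (2, 'Second Album', 1),
            (3, 'Third Album', 2);
        """
    )
    return conn


def albums_per_artist(conn: sqlite3.Connection) -> list:
    """Return (artist name, album count) pairs, most albums first."""
    query = """
        SELECT ar.Name, COUNT(al.AlbumId) AS AlbumCount
        FROM Artist AS ar
        LEFT JOIN Album AS al ON al.ArtistId = ar.ArtistId  -- keep artists with no albums
        GROUP BY ar.ArtistId, ar.Name
        ORDER BY AlbumCount DESC, ar.Name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for name, count in albums_per_artist(connection):
        print(f"{name}: {count}")
    connection.close()
```

The `LEFT JOIN` is a deliberate choice here: an inner join would silently drop artists that have no albums, which is a common surprise when exploring a new sample schema.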
References
- Tableau. "Free Public Data Sets For Analysis." https://www.tableau.com/learn/articles/free-public-data-sets
- awesomedata. "Awesome Public Datasets." GitHub. https://github.com/awesomedata/awesome-public-datasets
- Google. "Dataset Search." https://datasetsearch.research.google.com/
- University of California, Irvine. "UCI Machine Learning Repository." https://archive.ics.uci.edu/
- The World Bank. "World Bank Open Data." https://data.worldbank.org/
- U.S. General Services Administration. "Data.gov." https://www.data.gov/
- Kaggle. "Kaggle Datasets." https://www.kaggle.com/datasets
- data.world. https://data.world/
- FiveThirtyEight. "FiveThirtyEight Data." https://data.fivethirtyeight.com/
- United Nations. "UN Data." https://data.un.org/
- National Oceanic and Atmospheric Administration. "NOAA Climate Data." https://www.ncdc.noaa.gov/data-access/quick-links
- World Health Organization. "WHO Data." https://www.who.int/data/