Dataset Global Climate Data - Rostlab/DM_CS_WS_2016-17 GitHub Wiki


  • Proposer: Vasileios Magioglou - @magiob
  • Members
    1. @magiob
    2. @SinghAbhilasha
    3. @ashwarypande

Summary

The Daily Global Climate Dataset contains daily observations of meteorological elements from 1763 to 2016 worldwide. It is provided by the US National Oceanic and Atmospheric Administration (NOAA). We are all aware of climate change and its effects on the environment and our lives (http://climate.nasa.gov/): the average temperature of the planet is rising. It would be interesting to explore this claim, along with many others, using this dataset, which offers the opportunity to use scientific data from a trustworthy source to investigate these effects across the years and in multiple locations.

Prediction Goals

  • Predict climate behavior in the upcoming years
  • Combine with other sources to predict natural disasters and their impact on human life and health
  • Predict clean water supply based on precipitation
  • Predict droughts in the warmer zones of the planet
  • Use emissions and pollution statistics to correlate them with changes in the climate data
  • Identify the areas most affected by climate change
  • Explore the impact on areas that have taken measures to protect the environment / micro-climate
  • Correlate health problems with increased temperatures and other elements

Weekly Progress

  • Week 01 (W46 Nov16) Global Climate Dataset -- Main findings:

    • Identified feature types
    • Spotted many missing values, clustering of values, and a concentration of observations in recent years
    • Some missing values can cause faulty data
  • Week 02 (W47 Nov16) Global Climate Dataset -- Main findings:

    • PRCP, Tmin, Tmax patterns for Delhi and merged Indian cities
    • PCA for dataset of Delhi
    • WB %missing values, visualization of features
  • Week 03 (W48 Nov30) Global Climate Dataset -- Main findings:

    • Merging of GCD and WB data, interpolation of missing values
    • Analysis of Tmin, Tmax, PRCP for Mumbai, Bangalore and Chennai
    • Correlation of GCD and WB features
  • Weeks 04-05 (W50 Dec14) Global Climate Dataset -- Main findings:

    • Analysis of Austria data
    • Merging of all data related to Austria into one dataset
    • Correlation analysis
  • Week 06 (W51 Dec22) Global Climate Dataset -- Main findings:

    • Analysis of core environmental data
    • Correlation of Austria data with aggregated data and distinct data of neighboring countries
    • Identify which countries contribute most to Austria's pollution
  • Week 07 (W52 Jan11) Global Climate Dataset -- Main findings:

    • Detection of the biggest contributing countries and the highest emissions
    • Correlation of local temperature with global emissions
    • Setup of prediction model
  • Week 08 (W53 Jan18) Global Climate Dataset -- Main findings:

    • PCR
    • PLSR
    • Failure of our linear regression model
  • Week 09 (W54 Jan25) Global Climate Dataset -- Main findings:

    • Linear regression with a rolling window for predicting emissions from years
    • PLSR for predicting the average temperature rise from emissions
    • Time-series auto-regression for predicting temperature over the years
  • Week 10 (W05 Feb08) Global Climate Dataset -- Main findings:

    • Biggest contributors [China, USA, Russia, Brazil, Germany, India], sources and reasons of emissions
    • Emissions projected to rise more than 40-fold by 2067; lifetime effect of emissions in the atmosphere
    • Temperature projected to rise by ~1 degree Celsius by 2067
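Correlation comes up repeatedly in the weekly findings (GCD vs. WB features, Austria vs. neighboring countries, local temperature vs. global emissions). A minimal sketch of that step, assuming each source has already been reduced to one value per year: the two series are aligned on the years they share and the Pearson coefficient is computed. The helper names and toy numbers are hypothetical, and this is not necessarily how the project computed it.

```python
import math
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlate_by_year(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Correlate two yearly series using only the years present in both."""
    common = sorted(set(a) & set(b))
    return pearson([a[y] for y in common], [b[y] for y in common])


if __name__ == "__main__":
    # Hypothetical toy data: annual mean temperature (degC) and an emissions index.
    temperature = {2010: 9.1, 2011: 9.4, 2012: 9.3, 2013: 9.8}
    emissions = {2009: 95.0, 2010: 100.0, 2011: 104.0, 2012: 103.0, 2013: 110.0}
    print(round(correlate_by_year(temperature, emissions), 3))
```

Because temperature and emissions both trend upward over time, a high coefficient on raw yearly series can partly reflect the shared trend; correlating year-to-year differences is one way to check this.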
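The week 09 item on linear regression with a rolling window can be illustrated with a minimal sketch: an ordinary least-squares line of emissions against year is fitted on each window of `window` consecutive years, and the most recent fit is extrapolated. The function names, the window size, and the toy series are hypothetical illustrations, not the project's actual code or data.

```python
from typing import List, Tuple


def ols_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rolling_fits(years, values, window):
    """Fit one line per window of `window` consecutive points, keyed by the window's last year."""
    if not 2 <= window <= len(years):
        raise ValueError("window must be between 2 and the series length")
    fits = []
    for start in range(len(years) - window + 1):
        xs = years[start:start + window]
        ys = values[start:start + window]
        fits.append((xs[-1], ols_fit(xs, ys)))
    return fits


def forecast(years, values, window, target_year):
    """Extrapolate the most recent window's line to `target_year`."""
    _, (intercept, slope) = rolling_fits(years, values, window)[-1]
    return intercept + slope * target_year


if __name__ == "__main__":
    # Hypothetical toy series: flat until 2000, then +2 units per year.
    years = list(range(1990, 2011))
    emissions = [10.0 if y < 2000 else 10.0 + 2.0 * (y - 2000) for y in years]
    print(forecast(years, emissions, window=5, target_year=2020))  # prints 50.0
```

The rolling window lets the slope follow recent changes in the trend: the flat 1990s do not drag down the latest fit, at the cost of a noisier estimate when the window is short.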
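The auto-regression item in week 09 can likewise be sketched. The snippet below fits an AR(1) model, x_t = c + phi * x_{t-1} + noise, by least squares on lagged pairs and iterates the recursion forward to forecast. The order 1, the function names, and the example series are assumptions for illustration; the project may have used a higher order or a library implementation.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares estimate of (c, phi) in x_t = c + phi * x_{t-1} + noise."""
    lagged = series[:-1]   # x_{t-1}
    current = series[1:]   # x_t
    n = len(lagged)
    mean_l = sum(lagged) / n
    mean_c = sum(current) / n
    sxx = sum((x - mean_l) ** 2 for x in lagged)
    sxy = sum((x - mean_l) * (y - mean_c) for x, y in zip(lagged, current))
    phi = sxy / sxx
    c = mean_c - phi * mean_l
    return c, phi


def forecast_ar1(series: List[float], steps: int) -> List[float]:
    """Iterate the fitted AR(1) recursion `steps` points past the end of the series."""
    c, phi = fit_ar1(series)
    preds = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds


if __name__ == "__main__":
    # Hypothetical yearly temperature anomalies (degC); replace with real yearly means.
    anomalies = [0.10, 0.18, 0.12, 0.25, 0.31, 0.27, 0.40, 0.38, 0.47, 0.52]
    print(fit_ar1(anomalies))
    print(forecast_ar1(anomalies, steps=3))
```

A strongly trending series is often differenced first, so that the AR model describes year-to-year changes rather than the trend itself.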

Long Description

Summary

  • Size: up to 2.8 GB (much larger if the hourly dataset is chosen)
  • Format: .csv or .txt
  • Source: NOAA (National Oceanic and Atmospheric Administration)

====================================================================

There are many sources where one can find evidence and information about the causes and effects of climate change. NASA presents the problem in a compact, easy-to-understand way (http://climate.nasa.gov/). It is interesting to see how these causes and effects are expressed in real-life data.

The dataset "Global Historical Climatology Network - Daily" (GHCN-D), provided by NOAA, consists of surface observations and climate records from around the world from 1763 to 2016. It includes observations from the World Meteorological Organization, Cooperative, and CoCoRaHS networks. The archive includes over 40 meteorological elements, including:

  • daily maximum/minimum temperature
  • temperature at observation time
  • precipitation
  • snowfall
  • snow depth
  • evaporation
  • wind movement
  • wind maximums
  • soil temperature
  • cloudiness, and more

A sample of the data is available.

Containing observations of one or more of the above elements from more than 100,000 stations distributed across all continents, the dataset is the world's largest collection of daily climatological data. In addition to the meteorological values, it provides measurement, quality, and source flags that characterize each observation.
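To make the flag structure concrete, here is a minimal parser sketch for one record of the fixed-width `.dly` text files in which GHCN-D distributes per-station data. It assumes the column layout described in NOAA's GHCN-D readme: station ID (columns 1-11), year (12-15), month (16-17), element code (18-21), then 31 daily groups of 8 characters, each a 5-character value followed by the measurement, quality, and source flags. Missing days are coded as -9999; TMAX/TMIN are stored in tenths of degrees Celsius and PRCP in tenths of mm. The function name and the sample line (including its station ID) are hypothetical; check the layout against the official documentation before relying on it.

```python
MISSING = -9999      # GHCN-D sentinel for "no observation on this day"
RECORD_WIDTH = 269   # 21 header characters + 31 days * 8 characters


def parse_dly_line(line: str) -> dict:
    """Split one fixed-width GHCN-D .dly record (one station, month, and element)."""
    line = line.rstrip("\r\n").ljust(RECORD_WIDTH)  # restore blank trailing flags
    record = {
        "id": line[0:11],
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": line[17:21],   # e.g. TMAX, TMIN, PRCP, SNOW, SNWD
        "days": [],
    }
    for day in range(31):
        start = 21 + day * 8  # VALUE (5 chars) + MFLAG + QFLAG + SFLAG
        raw = int(line[start:start + 5])
        record["days"].append({
            "day": day + 1,
            "value": None if raw == MISSING else raw,  # still in native units
            "mflag": line[start + 5],
            "qflag": line[start + 6],
            "sflag": line[start + 7],
        })
    return record


if __name__ == "__main__":
    # Synthetic record with a hypothetical station ID: 2.5 degC on day 1,
    # source flag "S", all other days missing.
    sample = "XX000000001201601TMAX" + "   25  S" + "-9999   " * 30
    rec = parse_dly_line(sample)
    first = rec["days"][0]
    print(rec["element"], first["value"] / 10, first["sflag"])  # prints: TMAX 2.5 S
```

Keeping the raw integer and converting units only at analysis time (dividing TMAX by 10 above) avoids mixing element-specific scalings inside the parser.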

Full documentation of the dataset is available from NOAA.

====================================================================

Challenges

  • Not all data are available for all years
  • Not all data are available for all locations

Links / Data / Other

Reference

Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol., 22, 1441-1453.