Data Science Success Stories from Science and Engineering
This page contains a running list of 'home runs' (i.e., success stories) in which data science has been applied in the sciences and engineering and has produced knowledge or progress that would not have been possible without it. If you would like to contribute examples to this page, send me a request to be a collaborator! These home runs are particularly effective in helping change the culture and increase the adoption of these important methods. This page is meant to be a resource that everyone can use to better communicate and create change.
The characteristics of a home run are:
- They are extensible and have been extended
- They create open data/software/hardware that are usable
- They create a new community that is sustained beyond project lifetime
- They advance the state-of-the-art
- They push the limit of the intersection of physics/prior understanding and machine learning
What this page is NOT: a list of references to published papers. All home runs need to speak to how they address the criteria above.
In the near-Earth space environment
- Prediction of ionospheric scintillation
- McGranaghan et al. [2018]: wrangled a large volume of ionospheric scintillation observations and aligned them with solar wind and geomagnetic activity data to develop a predictive algorithm that advanced the state of the art (measured by an evaluative metric that can serve as a foundation on which future efforts can build)
- Two consecutive NASA Frontier Development Laboratory teams have advanced this research
- Defense Meteorological Satellite Program Magnetometer (SSM) instrument, data products, discovery, and open source
- Relevant reference: Kilcommons et al. [2018]
- Method to reprocess DMSP SSM data with greater accuracy, curated into a usable database with 'added-value' products, from which frontier scientific discovery is demonstrated
- The community can use and extend the work because all data products were contributed to the NASA CDAWeb Virtual Observatory and the source code was made completely open
- The data products now available are essential for organizing space weather activity and are needed in many studies, but were seldom available prior to this work without considerable effort
- The Magnetospheric State Query System
- Relevant reference: Fung and Shao [2008]
- 30 years of solar wind and geomagnetic activity data were used to show that the magnetospheric state can be specified by a state vector
- The community can use and extend the work through a digital resource (i.e., the Magnetospheric State Query System) that is based on the results and provides access to the wealth of data used for the project
- Magnetospheric state is critical to Heliophysics understanding and is a prerequisite to detailed specification of the near-Earth space environment
- SunPy
- Magnetospheric Multiscale (MMS) mission instrument completion
- Aurorasaurus
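To make the data-alignment step in the ionospheric scintillation example above more concrete, here is a minimal sketch of pairing each observation with the most recent driver sample (for instance a solar wind measurement) taken at or before it. This is not the method used by McGranaghan et al. [2018]; the function name `align_latest_prior`, the sample values, and the 10-minute tolerance are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def align_latest_prior(obs_times, driver_times, driver_values, max_lag):
    """Pair each observation time with the latest driver value at or before it.

    driver_times must be sorted in ascending order. An entry is None when
    no driver sample exists within max_lag before the observation.
    """
    aligned = []
    for t in obs_times:
        # Index of the last driver sample with a timestamp <= t
        i = bisect_right(driver_times, t) - 1
        if i >= 0 and t - driver_times[i] <= max_lag:
            aligned.append(driver_values[i])
        else:
            aligned.append(None)
    return aligned


# Hypothetical sample data, for illustration only
t0 = datetime(2015, 1, 1, 0, 0)
driver_times = [t0 + timedelta(minutes=5 * k) for k in range(4)]  # 00:00 to 00:15
driver_values = [400.0, 410.0, 420.0, 430.0]  # e.g., solar wind speed in km/s
obs_times = [t0 + timedelta(minutes=m) for m in (-1, 3, 10, 40)]

aligned = align_latest_prior(
    obs_times, driver_times, driver_values, max_lag=timedelta(minutes=10)
)
print(aligned)  # [None, 400.0, 420.0, None]
```

Using only samples at or before each observation avoids leaking future driver information into a predictive model, and the tolerance keeps stale driver values from being paired with observations across data gaps.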
Heliophysics and Planetary Science
- Mars dust devil identification (see slide 32 here)
- Helioviewer.org
Earth Science
- Advanced Information Systems Technology (AIST) Program
- Contains strong data science and machine learning use cases, individuals and communities leading the way, and new ideas (2016 funded AIST programs and details of the projects and individuals from the Earth Science Technology Office Portfolio)
- OceanWorks
- Pangeo
- As described on their homepage, "Pangeo is first and foremost a community of people working collaboratively to develop software and infrastructure to enable Big Data geoscience research...includ(-ing) interconnected software package and deployments of this software in cloud and high-performance-computing environments. Such a deployment is sometimes referred to as a Pangeo Environment."
- This group provides a useful set of ML workflow examples
- See a recent commentary on Pangeo and Jupyter by Lindsey Heagy and Fernando Pérez at Project Jupyter Blog
- Awesome Earth AI: A curated list of tutorials, notebooks, software, datasets, courses, books, video lectures and papers specifically for Artificial Intelligence (AI) use cases in Earth Science.