Data cleaning

dwclean

dwclean is a command-line tool that cleans, validates and enhances Darwin Core CSV/TSV files.

The original version was tightly tailored to the needs of the Norwegian GBIF node at the time. As source data quality improved, we were able to gradually remove the code that dealt with extreme edge cases. In 2018 the large dwclean script was cleaned up and split into a command-line tool, a library and plugins that do the actual work of cleaning and validating data. Some of the edge-case handling (such as guessing MGRS grid zone designators from square identifiers) still lives on in the MUSIT plugin.
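To make the tool/library/plugin split more concrete, here is a minimal sketch in Python of how a core routine might run small plugins over Darwin Core records. It is an illustration only and not dwclean's actual code or interface: the function names (`trim_whitespace`, `check_coordinates`, `clean`), the plugin signature and the report format are all assumptions made for this example. Only the Darwin Core term names (`occurrenceID`, `decimalLatitude`, `decimalLongitude`) come from the standard.

```python
import csv
import io

# Hypothetical plugin interface (an assumption for this sketch): a plugin
# receives one record as a dict keyed by Darwin Core term, may edit it in
# place, and returns a list of warning strings.

def trim_whitespace(row):
    """Cleaning step: strip stray whitespace from every text value."""
    for key, value in list(row.items()):
        if isinstance(value, str):
            row[key] = value.strip()
    return []

def check_coordinates(row):
    """Validation step: flag coordinates that are non-numeric or out of range."""
    warnings = []
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        value = row.get(term)
        if not value:
            continue  # an empty or missing coordinate is not checked here
        try:
            number = float(value)
        except ValueError:
            warnings.append(f"{term} is not a number: {value!r}")
            continue
        if abs(number) > limit:
            warnings.append(f"{term} out of range: {value}")
    return warnings

PLUGINS = [trim_whitespace, check_coordinates]

def clean(source, delimiter="\t", plugins=PLUGINS):
    """Library entry point: run every plugin over every record.

    Returns the cleaned rows and a report of (line number, warning) pairs.
    """
    reader = csv.DictReader(source, delimiter=delimiter)
    rows, report = [], []
    # The header is line 1, so the first record is on line 2.
    for line_number, row in enumerate(reader, start=2):
        for plugin in plugins:
            for warning in plugin(row):
                report.append((line_number, warning))
        rows.append(row)
    return rows, report

# Usage: a tiny in-memory TSV with one padded value and one bad latitude.
sample = (
    "occurrenceID\tdecimalLatitude\tdecimalLongitude\n"
    "A1\t 59.9 \t10.7\n"
    "A2\t95.0\t10.7\n"
)
rows, report = clean(io.StringIO(sample))
for line_number, warning in report:
    print(f"line {line_number}: {warning}")
```

In a layout like this, a command-line wrapper would only parse arguments and open files, while institution-specific edge cases stay in their own plugins and can be dropped once the source data no longer needs them.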