Data cleaning - gbif-norway/documentation GitHub Wiki
dwclean
dwclean is a command-line tool that cleans, validates, and enhances Darwin Core CSV/TSV files.
The original version was tightly tailored to the needs of the Norwegian GBIF node at the time, but as source data quality improved we were able to gradually remove the code that dealt with extreme edge cases. In 2018 the large dwclean script was cleaned up and split into a command-line tool, a library, and plugins that handle the actual work of cleaning and validating data. Some of the edge-case handling (such as guessing MGRS grid zone designators based on square identifiers) still lives on in the MUSIT plugin.
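
To illustrate the general idea of splitting a cleaning tool into a library layer and plugins, here is a minimal sketch in Python. It is not dwclean's actual code or plugin interface, which this page does not describe. The function names (trim_whitespace, check_latitude, clean) and the plugin signature are assumptions made up for this example; only the column names are real Darwin Core terms.

```python
import csv
import io

# Hypothetical plugin interface (an assumption, not dwclean's API):
# each plugin receives one record, a dict keyed by Darwin Core term,
# may modify it in place, and returns a list of issue messages.

def trim_whitespace(record):
    """Cleaning plugin: strip surrounding whitespace from every text value."""
    for term, value in record.items():
        if isinstance(value, str):
            record[term] = value.strip()
    return []

def check_latitude(record):
    """Validation plugin: decimalLatitude must be a number in [-90, 90]."""
    raw = record.get("decimalLatitude") or ""
    if raw == "":
        return []
    try:
        latitude = float(raw)
    except ValueError:
        return [f"decimalLatitude is not a number: {raw!r}"]
    if not -90 <= latitude <= 90:
        return [f"decimalLatitude out of range: {latitude}"]
    return []

def clean(stream, plugins, delimiter="\t"):
    """Library layer: run every plugin over every record, in order."""
    cleaned, issues = [], []
    reader = csv.DictReader(stream, delimiter=delimiter)
    # The header is line 1, so the first record is on line 2.
    for line_number, record in enumerate(reader, start=2):
        for plugin in plugins:
            for issue in plugin(record):
                issues.append((line_number, issue))
        cleaned.append(record)
    return cleaned, issues

# Made-up sample data for illustration only.
sample = (
    "occurrenceID\tscientificName\tdecimalLatitude\n"
    "1\t Parus major \t59.91\n"
    "2\tPica pica\t123.4\n"
)
records, issues = clean(io.StringIO(sample), [trim_whitespace, check_latitude])
```

In a layout like this, the command-line tool would only parse arguments and call the library function, while dataset-specific edge cases could live in their own plugin without touching the shared code.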