Guide to make the validations - SIB-Colombia/sib-google-spreadsheet-validator GitHub Wiki
#Guide for making the validations# In this page the validations are explained step by step.
1. Validate the data with a copy of the template in blank, ready to include data
With this option you can validate data in a spreadsheet in blank that works like a template where the data is loaded for the validation.
Others sections that can be of interest.
1.1 Copy the template with its scripts
Download the template with the validations , here
In the menu of the template spreadsheet, please select the option File>Make a Copy, to obtain a copy of the spreadsheet that you can fill with data you would like to validate.
1.2 Select language
In the menu, select the language you would like to use for the validation process. By default, script will make validations in English. Once you have selected the language, it is necessary to reload the window/tab of the spreadsheet, to see different options in the language of your choice. Languages>Validations in English/Validations in Spanish
1.3 Reload the spreadsheet
After open the copy in a new tab, reload the page (press F5 in the keyboard or the reload button of your browser). Now you can enter the data records in each row.
1.4 Enter the data records
In the copy, enter the data records that you would like to validate, maximum 1000 records. Note that each row represents a record, and each column represents a field in Darwin Core standard. In the first cell of each column (1A, 1B, etc), enter the exact name of the element to be validated.
Each row with their data represents a record (in the image, the set of records is marked in blue, and an individual record is marked in green).Therefore, each value needs to be in its respective column: for example, if the record has a value data of decimal latitude, it needs to be put in first cell of the column containing the value decimalLatitude.
The name of sheet that have the records to validate, must be ElementsDwC, other name will generate error in the script. You can change the name of the sheet, clicking on it.
Take the total records to validate from row #2 to the last row containing data in the columns of the elements. If in this interval of rows, there is one that does not contain any values, this will be read as an empty record.
Is possible to have any order for the columns of the elements, but only will be validated the first column that has in its first cell, the name of the element to validate, to go through the sheet from left to right.
1.5 Convert the values in the cells to plain text
Format the data records into plain text by selecting all rows with data, including the first row with the element names, and choose the option Format>Number>Plain text in the spreadsheet menu.
1.6 Create the controlled vocabularies sheet
Before executing the validations, you need to create a sheet with controlled vocabularies. In each column of this sheet the controlled vocabulary is associated with the respective element, if this element has been validated with a controlled vocabulary before. In the menu, choose option Validation functions > Generate sheet of controlled vocabularies.
The sheet will be called Controlled vocabularies. This name should not be changed in order to run the validations correctly. Each cell in the columns has a valid value for the element identified in the first cell. If you would like to add a new value for a specific element, enter it below the last cell containing a value in the column with the element name. If you run validations without having generated the sheet for controlled vocabularies, the elements evaluated without controlled vocabulary, will be marked with error (the cell will be colored with gray).
1.7 Running validations
After creating the sheet for the controlled vocabularies, you can run the validations, for maximum 1000 records. Select the first sheet (the sheet with records), then in the menu of the spreadsheet, choose the options Validations Darwin Core> Basic record elements/Biological record elements/Event record elements/Location record elements/Taxon record elements
In some cases, it is necessary to authorize the execution of the scripts.
If in the sheet no exist a column with the name of a element to validate, this will be shown in a message. If there is more than one column with a specific element name in the first cell (repeated element), a message will be displayed, the repeated elements will be marked yellow and the data in these cells will not be validated.
At the end of the validations, a message will be displayed and the cells with wrong values according to the type of validation will be marked red.
Automatically, a third sheet is created, named “Errors in validations”. In this sheet, the number of cells with wrong values for each element will be shown, according to the realized validation. If an element is not validated for a specific type of validation, the corresponding cell will be empty.
1.8 Error marks
According to the type of validation for the value of a cell, if the result is a wrong value, the cell will be marked with a respective color. If validation is for the content (for example empty cell, wrong format), the cell will be colored red. If the validation is for a value dependent of another cell value (for example identifier or code of Institution, collection, catalog number) the cell will be colored orange.
1.9 Correct errors
In order to locate cells with error marks, choose the option Correction functions >Search cell marked with error. Once identified the cells with error marks and corrected the values, you need to remove the error marks before running the validations again: Select Correction functions> Set the sheet without error marks in the menu. This function will mark all cells with a dataset record with a white background, regardless of the cell color before the validations. Repeat the steps from 1.6 until the sheet does not show any error marks.
2. Validate data records in an existing Google spreadsheet
Choose this option in order to validate data previously loaded in a Google spreadsheet.
2.1 Create a Google spreadsheet
Enter the main page of Google drive and create a new spreadsheet. You can choose any name for this spreadsheet, but it is recommendable to choose a name easy to use.
2.2 Change the name of the first sheet of the Google spreadsheet
If you would like to run validations in English, change the default name of the first sheet of the spreadsheet, naming it ElementsDwC. Therefore, click on the name of the sheet, located in left bottom of the sheet, and select Rename.
2.3 Add the script to the spreadsheet
Once you have entered each data value in their respective element in the spreadsheet, and you have changed the name of the first sheet, you add the validation scripts to the spreadsheet. Therefore, you open the Script editor and select in the menu the option Tools > Script editor.
2.3.1 Create a project in Script editor
The script editor will be open in a new tab. A message will show options to create a new project. Select the option Blank project.
The default name of the project is “Untitled project”. You can change the name of the project, clicking on it.
2.3.2 Create the script files
The files in a goggle apps script have the extension .gs and are written in javascript. By default, a new project will be named Code.gs. You should create four more files. In the menu of the script editor, choose File>New>Script File
The names of the files are:
You can delete the file Code.gs or rename it with the name of one of these necessary files.
2.3.3 Select the scripts
You can select the files on the right side of the page. To view and edit the content, you click on the tab with the file name, at the top of the edit field.
2.3.4 Get the scripts
Get the scripts for the validations in this links. If you click on a script name, the code is shown.
2.3.5 Copy and paste the script code
In the edit field, you erase the default code in each file of the project, copy the code of each link file .gs, and paste them in their respective project file. Every time you edit a file, the corresponding tab will be marked with an asterisk. To save the changes, you click on the save button, or you choose,* File > save* in the menu.
2.4 Reopen the spreadsheet
The validation script will create a new button in the spreadsheet menu, but only if the spreadsheet is newly opened. To activate this button in the menu, close the spreadsheet and reopen it in Google drive, or reload the page in your browser where the spreadsheet is shown. Wait for a few seconds and you the following buttons will appear: Validations DarwinCore, Validation functions, Correction functions, and Languages. If you click on these buttons, their corresponding options are shown.
2.5 Enter the data and run validations
Follow from the step 1.4 Enter the records data, to insert data and validate.
3. Validate data records in an existing Microsoft Excel file
With this option, you can validate records data in MS Excel spreadsheet, extension .xls and .xlsx.
3.1 Upload the file
To run validations, it is necessary to upload the MS Excel file to Google drive. In order to upload files, convert the documents to Google docs. At the right top side of the Google drive main page, choose the option* Upload settings >Convert uploaded files*.
To upload a file, click on the upload button on the right side of the Google drive page, and select the file in your computer.
3.2 Open the file
After uploading the file, open it. Click on the title shown in the list “My Drive”. This will open a new tab for the file.
3.3 Enter data and validate
If you want add more data and validate, follow from 1.3 Enter the records data
4. Validate geographic data
Elements like country, state, province and county, are related with elements decimal Latitude, and decimal Longitude. It is expected that there is correspondence between data: the coordinates represent a geographical location near to the geographical place established in a country, state, province or county.
4.1 Validate administrative data
With this validation you can verify that data in the elements country, state, province and county correspond to the administrative data obtained from decimal Latitude, and decimal Longitude, entered in the same record (same row), using the web service of MapQuest. If a value does not correspond to the location retrieved, this will be marked in pale violet. To run this validation, previously validate the records (Validation functions>Validate Darwin Core records), and the values for decimal Latitude, and decimal Longitude. These cells should not be blank, nor with errors marks. Afterwards, go to Validation Functions > Validate administrative data.
4.2 Validate decimal latitude and decimal longitude
In this validation reciprocity between the values in the cells for coordinates (decimalLatitude and decimalLongitude), and the values in the cells for administrative data (country, state, province, county) is verified for a record (same row in the sheet), using the web service of MapQuest. If a coordinate value does not correspond to the coordinates retrieved, the cell will be colored in pale violet. A coordinate is valid, if it has the same sign with the retrieved coordinates, if the digits before the point and the first digit after the point are equal. Thereby, coordinates can be found within a range of 10km and be compared with other coordinates within this range. To run this validation, you need to previously validate the records (Validation functions>Validate Darwin Core records); and the values for decimalLatitude, and decimalLatitude should not be blank, or with errors marks. Afterwards, go to Validation Functions > Validate administrative data.
5. Correction functions
With these functions you can correct errors even more easily.
5.1 Remove anomalous characters in cells
This function removes anomalous characters and extra spaces in cells with text values. To run this function, select a group of cells with the right click of the mouse, and then, choose Correction functions > Replace anomalous characters in cells in the menu.
5.2 Find cell with error mark
This function selects a cell with error mark, to help making the corrections. To run this function, in menu Correction functions > Find cell marked with error. If this is the first time that you run this function, the selected cell will be the leftmost cell at the very top. Next time that you run this function, the selected cell will be the cell below with error mark in the same column, and so on to the last row with data. After this, the selected cell will be in the next column.
5.3 Get coordinates of decimal latitude and decimal longitude
This function sets the values in the cells for the elements decimalLatitude and decimalLongitude, using the web service of MapQuest with the administrative data. These values will be entered in empty cells or with error mark. To obtain more accurate results, it is important to have data in the elements country, state, province and county. If you run this function after running the validation of Darwin Core, only the data in cells without error marks will be used. To run this function, in the menu choose Correction functions > Get decimalLatitude and decimalLongitude coordinates. It is possible that the validation does not result in coordinates, if the record has incongruent or error values (e.g. state and province do not correspond to the country) in administrative data.
5.4 Set the sheet without error marks
After having made corrections in the fields with error marks, having entered the correct values or corrected the errors with the correction functions, you can quit the error marks with this function. In the menu choose Correction functions > Set the sheet without error marks.
Glossary
Darwin Core: Is a set of data standards, with a vocabulary of elements to facilitate the discovery, retrieval, and integration of information about organisms by providing reference definitions, examples, and commentaries.
Google Spreadsheet: Is a web-based application to view, edit and process numeric or alphanumeric data in tables organized in rows and columns, and each table element is called cell.
Script: Program that automates execution of diverse tasks.
About
How to cite this guide
Duque, O., Amariles, D. (2013). Guide to make validations of data in Darwin Core using Google Apps. Bogotá D.C. SIB Colombia. Available on Guide to make the validations
Copyright
The scripts of Validations of data in Darwin Core using Google Apps, and this guide are copyright 2013 of SIB Colombia.
Copyright
The scripts of Validations of data in Darwin Core using Google Apps are open source under license Apache version 2.0. You can obtain a copy Apache license 2.0 Unless that you required by law or written agreement, this software is distributed as such be, without guaranty or condition of any kind, plicit or implicit. Please consult the license for the specific language that governing rights and limits.