Example datasets 1

See also: Creating TAF datasets, Bib entries, Example data records (bib entries).

This page was based on using the icesTAF package version 4.2.0 dated 2023-03-21.
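
If you do not already have the icesTAF package installed, it can be installed from CRAN; all the examples below assume it is loaded:

install.packages("icesTAF")
library(icesTAF)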

The resulting TAF analysis created from this example is on GitHub: github.com/ices-taf-dev/wiki-example-1. If you don't want to build up the code yourself by following the example, you can skip to the end, download the complete code, and run it on your computer like this:

# get code
run_dir <- download.analysis("ices-taf-dev/wiki-example-1")
# view downloaded code
browseURL(run_dir)

# run analysis
run.analysis(run_dir)
# view result
browseURL(run_dir)

Creating an empty TAF project

First we create an empty TAF project, which in this case we will call example-1, and then move our working directory to this folder. We do this by running

taf.skeleton("example-1")
setwd("example-1")

resulting in the following:

                 
 example-1       
  ¦--boot        
  ¦   °--initial 
  ¦       °--data
  ¦--data.R      
  ¦--model.R     
  ¦--output.R    
  °--report.R    

Adding a local dataset

First, let's save or copy a file from somewhere else on our computer and put it into the boot/initial/data folder. Remember, this is the place to put files that you are unable to get from online sources, such as web services and URLs.

In this example we will use a dataset that comes shipped with R, called trees, and save it as a CSV file.

data(trees)
write.taf(trees, dir = "boot/initial/data")

and now your project should look like this:

                          
 example-1                
  ¦--boot                 
  ¦   °--initial          
  ¦       °--data         
  ¦           °--trees.csv
  ¦--data.R               
  ¦--model.R              
  ¦--output.R             
  °--report.R             

The way TAF works is that only data in boot/data may be used by the TAF scripts. The boot/data folder is populated by creating entries in a file called DATA.bib and then running taf.boot().

We will create the DATA.bib file using the draft.data() function, and as we do so, add some information documenting the dataset we are importing:

draft.data(
  data.files = "trees.csv",
  data.scripts = NULL,
  originator = "Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.",
  title = "Diameter, Height and Volume for Black Cherry Trees",
  file = TRUE,
  append = FALSE # create a new DATA.bib
)
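
This writes a record to DATA.bib that looks roughly like the following (a sketch; the exact fields and defaults, such as year, depend on your icesTAF version):

@Misc{trees.csv,
  originator = {Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.},
  year       = {2023},
  title      = {Diameter, Height and Volume for Black Cherry Trees},
  source     = {file},
}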

after running

taf.boot()
## [12:28:57] Boot procedure running...

## Processing DATA.bib

## [12:28:57] * trees.csv

## [12:28:57] Boot procedure done

your project should now look like this:

                          
 example-1                
  ¦--boot                 
  ¦   ¦--data             
  ¦   ¦   °--trees.csv    
  ¦   ¦--DATA.bib         
  ¦   °--initial          
  ¦       °--data         
  ¦           °--trees.csv
  ¦--data.R               
  ¦--model.R              
  ¦--output.R             
  °--report.R             

and you will have successfully saved, documented and imported (via running taf.boot()) a local dataset into your project.
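
From this point on, the TAF scripts can use the booted file. For example, a data.R script might start like this (a minimal sketch, not part of the example repository):

library(icesTAF)

# read the booted dataset back into a data frame
trees <- read.taf("boot/data/trees.csv")
head(trees)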

Adding a local collection of data files

In this example we will add another dataset, but this time it will be a folder containing several files. You can create this yourself however you like, but the following code will create an example for you:

data(trees)
data(cars)
# make the directory we want to write to
mkdir("boot/initial/data/my-collection")
# save files there
write.taf(trees, dir = "boot/initial/data/my-collection")
write.taf(cars, dir = "boot/initial/data/my-collection")

and now your project should look like this:

                              
 example-1                    
  ¦--boot                     
  ¦   ¦--data                 
  ¦   ¦   °--trees.csv        
  ¦   ¦--DATA.bib             
  ¦   °--initial              
  ¦       °--data             
  ¦           ¦--my-collection
  ¦           ¦   ¦--cars.csv 
  ¦           ¦   °--trees.csv
  ¦           °--trees.csv    
  ¦--data.R                   
  ¦--model.R                  
  ¦--output.R                 
  °--report.R                 

Again we document this using draft.data() to add it to the DATA.bib file, but this time there are two differences:

  1. We are adding a new record to an existing list of records, so we set append = TRUE.
  2. We are adding a folder, so we set source = "folder".

draft.data(
  data.files = "my-collection",
  data.scripts = NULL,
  originator = "R datasets package",
  title = "Collection of R data",
  source = "folder",
  file = TRUE,
  append = TRUE # add to the existing DATA.bib
)

after running

taf.boot()
## [12:28:57] Boot procedure running...

## Processing DATA.bib

## [12:28:57] * trees.csv

## [12:28:57] * my-collection

## [12:28:57] Boot procedure done

your project should now look like this:

                              
 example-1                    
  ¦--boot                     
  ¦   ¦--data                 
  ¦   ¦   ¦--my-collection    
  ¦   ¦   ¦   ¦--cars.csv     
  ¦   ¦   ¦   °--trees.csv    
  ¦   ¦   °--trees.csv        
  ¦   ¦--DATA.bib             
  ¦   °--initial              
  ¦       °--data             
  ¦           ¦--my-collection
  ¦           ¦   ¦--cars.csv 
  ¦           ¦   °--trees.csv
  ¦           °--trees.csv    
  ¦--data.R                   
  ¦--model.R                  
  ¦--output.R                 
  °--report.R                 

and you will have successfully saved, documented and imported (via running taf.boot()) a local dataset and a local collection of data files into your project.

Adding a file from a URL

So far we have been importing local datasets and files, so the boot/initial/data folder is identical to the boot/data folder, and you may be wondering, ‘what is the point?’. An important purpose of this step is to document the dataset and record its provenance, so that every dataset in boot/data has a corresponding record in DATA.bib.

But this step does not just copy files from one place to the other. It can also fetch data and files from other locations. In this example, we get a file from a URL: a NetCDF file of sea surface temperatures from the UK Met Office (www.metoffice.gov.uk/hadobs/hadsst4/). Again we create the entry in the DATA.bib file using draft.data().

draft.data(
  data.files = "HadSST.4.0.1.0_median.nc",
  data.scripts = NULL,
  originator = "UK MET office",
  title = "Met Office Hadley Centre observations datasets",
  year = 2022,
  source = "https://www.metoffice.gov.uk/hadobs/hadsst4/data/netcdf/HadSST.4.0.1.0_median.nc",
  file = TRUE,
  append = TRUE
)

after running

taf.boot()
## [12:28:57] Boot procedure running...

## Processing DATA.bib

## [12:28:57] * trees.csv

## [12:28:57] * my-collection

## [12:28:57] * HadSST.4.0.1.0_median.nc

## [12:28:58] Boot procedure done

your project should now look like this:

                                     
 example-1                           
  ¦--boot                            
  ¦   ¦--data                        
  ¦   ¦   ¦--HadSST.4.0.1.0_median.nc
  ¦   ¦   ¦--my-collection           
  ¦   ¦   ¦   ¦--cars.csv            
  ¦   ¦   ¦   °--trees.csv           
  ¦   ¦   °--trees.csv               
  ¦   ¦--DATA.bib                    
  ¦   °--initial                     
  ¦       °--data                    
  ¦           ¦--my-collection       
  ¦           ¦   ¦--cars.csv        
  ¦           ¦   °--trees.csv       
  ¦           °--trees.csv           
  ¦--data.R                          
  ¦--model.R                         
  ¦--output.R                        
  °--report.R                        

and you will have successfully downloaded, documented and imported (via running taf.boot()) a dataset from a URL. Notice that the boot/data folder now contains more than what is in boot/initial/data.
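
If you want to check the download, you could take a quick look at the contents of the file, for example with the ncdf4 package (an optional check, not part of the example repository):

library(ncdf4)

# open the downloaded file and list the variables it contains
nc <- nc_open("boot/data/HadSST.4.0.1.0_median.nc")
names(nc$var)
nc_close(nc)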

Adding data via a boot script

Sometimes it is not possible to download a dataset from a single URL, and multiple steps are required to fetch and process data from an online source. This is common with data accessed via web services. In this example we create a script containing a short recipe for getting our data, and then register this script in the DATA.bib file. First you need to write the script, and for this example the following code will do that for you:

cat('library(icesTAF)
library(sf)

download(
  "https://gis.ices.dk/shapefiles/OSPAR_Subregions.zip"
)

unzip("OSPAR_Subregions.zip")
unlink("OSPAR_Subregions.zip")

areas <- st_read("OSPAR_subregions_20160418_3857.shp")

# write as csv
st_write(
  areas, "ospar-areas.csv",
  layer_options = "GEOMETRY=AS_WKT"
)

unlink(dir(pattern = "OSPAR_subregions_20160418_3857"))
',
  file = "boot/ospar-areas.R"
)

Boot scripts such as this one go in the boot folder,

                                     
 example-1                           
  ¦--boot                            
  ¦   ¦--data                        
  ¦   ¦   ¦--HadSST.4.0.1.0_median.nc
  ¦   ¦   ¦--my-collection           
  ¦   ¦   ¦   ¦--cars.csv            
  ¦   ¦   ¦   °--trees.csv           
  ¦   ¦   °--trees.csv               
  ¦   ¦--DATA.bib                    
  ¦   ¦--initial                     
  ¦   ¦   °--data                    
  ¦   ¦       ¦--my-collection       
  ¦   ¦       ¦   ¦--cars.csv        
  ¦   ¦       ¦   °--trees.csv       
  ¦   ¦       °--trees.csv           
  ¦   °--ospar-areas.R               
  ¦--data.R                          
  ¦--model.R                         
  ¦--output.R                        
  °--report.R                        

and are described in more detail in Bib entries. We then document it and add an entry to DATA.bib; note that this time data.files is NULL and the script name is passed via data.scripts:

draft.data(
  data.files = NULL,
  data.scripts = "ospar-areas.R",
  originator = "OSPAR",
  title = "OSPAR areas",
  file = TRUE,
  append = TRUE
)

and after running

taf.boot()
## [12:28:58] Boot procedure running...

## Processing DATA.bib

## [12:28:58] * trees.csv

## [12:28:58] * my-collection

## [12:28:58] * HadSST.4.0.1.0_median.nc

##   Skipping download of 'HadSST.4.0.1.0_median.nc' (already in place).

## [12:28:58] * ospar-areas

## Reading layer `OSPAR_subregions_20160418_3857' from data source 
##   `D:\TAF\templates-examples\doc.wiki\example-1\boot\data\ospar-areas\OSPAR_subregions_20160418_3857.shp' using driver `ESRI Shapefile'
## Simple feature collection with 50 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -4898295 ymin: 4300621 xmax: 5677353 ymax: 30240970
## Projected CRS: WGS 84 / Pseudo-Mercator
## Writing layer `ospar-areas' to data source `ospar-areas.csv' using driver `CSV'
## options:        GEOMETRY=AS_WKT 
## Writing 50 features with 4 fields and geometry type Multi Polygon.

## [12:29:01] Boot procedure done

your project should now look like this:

                                     
 example-1                           
  ¦--boot                            
  ¦   ¦--data                        
  ¦   ¦   ¦--HadSST.4.0.1.0_median.nc
  ¦   ¦   ¦--my-collection           
  ¦   ¦   ¦   ¦--cars.csv            
  ¦   ¦   ¦   °--trees.csv           
  ¦   ¦   ¦--ospar-areas             
  ¦   ¦   ¦   ¦--DISCLAIMER_GIS.txt  
  ¦   ¦   ¦   °--ospar-areas.csv     
  ¦   ¦   °--trees.csv               
  ¦   ¦--DATA.bib                    
  ¦   ¦--initial                     
  ¦   ¦   °--data                    
  ¦   ¦       ¦--my-collection       
  ¦   ¦       ¦   ¦--cars.csv        
  ¦   ¦       ¦   °--trees.csv       
  ¦   ¦       °--trees.csv           
  ¦   °--ospar-areas.R               
  ¦--data.R                          
  ¦--model.R                         
  ¦--output.R                        
  °--report.R                        

Now we have a folder with the same name as the script we created, and inside it are the files created by the script.
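
As with the other booted data, later TAF scripts can read these files in the usual way. For example (a minimal sketch; converting the WKT geometry column back into spatial objects would additionally require the sf package):

library(icesTAF)

# read the csv produced by the boot script
areas <- read.taf("boot/data/ospar-areas/ospar-areas.csv")
head(areas)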