Example datasets 1 - ices-taf/doc GitHub Wiki
See also: Creating TAF datasets, Bib entries, Example data records (bib entries).
This page was based on using the icesTAF package version 4.2.0, dated 2023-03-21.
The resulting TAF analysis created from this example is on GitHub: github.com/ices-taf-dev/wiki-example-1. If you don't want to build up the code yourself by following the example, you can skip to the end and run it on your computer by downloading the complete code and running it like this:
# get code
run_dir <- download.analysis("ices-taf-dev/wiki-example-1")
# view downloaded code
browseURL(run_dir)
# run analysis
run.analysis(run_dir)
# view result
browseURL(run_dir)
In this guide
- Creating an empty TAF project
- Adding a local dataset
- Adding a local collection of data files
- Adding a file from a URL
- Adding data via a boot script
Creating an empty TAF project
First we create an empty TAF project, which in this case we will call example-1, and then move our working directory to this folder. We do this by running
taf.skeleton("example-1")
setwd("example-1")
resulting in the following:
example-1
¦--boot
¦ °--initial
¦ °--data
¦--data.R
¦--model.R
¦--output.R
°--report.R
Adding a local dataset
First, let's save or copy a file from somewhere else on our computer and put it into the boot/initial/data folder. Remember, this is the place where you can bring in files that you are unable to get from other online sources, such as web services and URLs.
In this example we will use a dataset that comes shipped with R called trees, and save it as a CSV file.
data(trees)
write.taf(trees, dir = "boot/initial/data")
and now your project should look like this:
example-1
¦--boot
¦ °--initial
¦ °--data
¦ °--trees.csv
¦--data.R
¦--model.R
¦--output.R
°--report.R
The way TAF works is that only data in boot/data are allowed to be used by the TAF scripts, and the boot/data folder is populated by creating entries in a file called DATA.bib and then running taf.boot().
We will create the DATA.bib file using the draft.data() function, and as we do so, add some information to document the dataset we are importing:
draft.data(
data.files = "trees.csv",
data.scripts = NULL,
originator = "Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.",
title = "Diameter, Height and Volume for Black Cherry Trees",
file = TRUE,
append = FALSE # create a new DATA.bib
)
after running
taf.boot()
## [12:28:57] Boot procedure running...
## Processing DATA.bib
## [12:28:57] * trees.csv
## [12:28:57] Boot procedure done
your project should now look like this:
example-1
¦--boot
¦ ¦--data
¦ ¦ °--trees.csv
¦ ¦--DATA.bib
¦ °--initial
¦ °--data
¦ °--trees.csv
¦--data.R
¦--model.R
¦--output.R
°--report.R
and you will have successfully saved, documented and imported (via running taf.boot()) a local dataset into your project.
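For reference, the record that draft.data() writes to DATA.bib looks something like this (illustrative only; the year field is filled in automatically, and the exact set of fields can vary between icesTAF versions):

```
@Misc{trees.csv,
  originator = {Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.},
  year       = {2023},
  title      = {Diameter, Height and Volume for Black Cherry Trees},
  period     = {},
  access     = {Public},
  source     = {file},
}
```

Here source = {file} tells taf.boot() to look for the file in boot/initial/data and copy it into boot/data.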
Adding a local collection of data files
In this example we will add another dataset, but this time it will be a folder containing several files. You can create this yourself however you like, but the following code will create an example for you:
data(trees)
data(cars)
# make the directory we want to write to
mkdir("boot/initial/data/my-collection")
# save files there
write.taf(trees, dir = "boot/initial/data/my-collection")
write.taf(cars, dir = "boot/initial/data/my-collection")
and now your project should look like this:
example-1
¦--boot
¦ ¦--data
¦ ¦ °--trees.csv
¦ ¦--DATA.bib
¦ °--initial
¦ °--data
¦ ¦--my-collection
¦ ¦ ¦--cars.csv
¦ ¦ °--trees.csv
¦ °--trees.csv
¦--data.R
¦--model.R
¦--output.R
°--report.R
Again we document this using draft.data() to add it to the DATA.bib file, but this time there are two differences:
- We are adding a new record, so we set append = TRUE, as we want to add a record to an existing list of records.
- We are adding a folder, so we set source = "folder".
draft.data(
data.files = "my-collection",
data.scripts = NULL,
originator = "R datasets package",
title = "Collection of R data",
source = "folder",
file = TRUE,
append = TRUE # append to the existing DATA.bib
)
after running
taf.boot()
## [12:28:57] Boot procedure running...
## Processing DATA.bib
## [12:28:57] * trees.csv
## [12:28:57] * my-collection
## [12:28:57] Boot procedure done
your project should now look like this:
example-1
¦--boot
¦ ¦--data
¦ ¦ ¦--my-collection
¦ ¦ ¦ ¦--cars.csv
¦ ¦ ¦ °--trees.csv
¦ ¦ °--trees.csv
¦ ¦--DATA.bib
¦ °--initial
¦ °--data
¦ ¦--my-collection
¦ ¦ ¦--cars.csv
¦ ¦ °--trees.csv
¦ °--trees.csv
¦--data.R
¦--model.R
¦--output.R
°--report.R
and you will have successfully saved, documented and imported (via running taf.boot()) a local dataset and a local collection of data files into your project.
Adding a file from a URL
So far we have been importing local datasets and files, and so the boot/initial/data folder is identical to the boot/data folder, and you may be wondering, ‘what is the point?’. An important purpose of this step is to document a dataset and record its provenance, so that every dataset in boot/data has a corresponding record in DATA.bib.
But this step does not just copy files from one place to the other. It can also fetch data and files from other locations. In this example, we get a file from a URL, in this case a raster file of sea surface temperatures from the UK Met Office (www.metoffice.gov.uk/hadobs/hadsst4/), and we create the entry in the DATA.bib file, again using draft.data().
draft.data(
data.files = "HadSST.4.0.1.0_median.nc",
data.scripts = NULL,
originator = "UK MET office",
title = "Met Office Hadley Centre observations datasets",
year = 2022,
source = "https://www.metoffice.gov.uk/hadobs/hadsst4/data/netcdf/HadSST.4.0.1.0_median.nc",
file = TRUE,
append = TRUE
)
after running
taf.boot()
## [12:28:57] Boot procedure running...
## Processing DATA.bib
## [12:28:57] * trees.csv
## [12:28:57] * my-collection
## [12:28:57] * HadSST.4.0.1.0_median.nc
## [12:28:58] Boot procedure done
your project should now look like this:
example-1
¦--boot
¦ ¦--data
¦ ¦ ¦--HadSST.4.0.1.0_median.nc
¦ ¦ ¦--my-collection
¦ ¦ ¦ ¦--cars.csv
¦ ¦ ¦ °--trees.csv
¦ ¦ °--trees.csv
¦ ¦--DATA.bib
¦ °--initial
¦ °--data
¦ ¦--my-collection
¦ ¦ ¦--cars.csv
¦ ¦ °--trees.csv
¦ °--trees.csv
¦--data.R
¦--model.R
¦--output.R
°--report.R
and you will have successfully downloaded, documented and imported (via running taf.boot()) a dataset from a URL. You will notice that the boot/data folder now contains more than what is in boot/initial/data.
Adding data via a boot script
Sometimes it is not possible to download a dataset from a single URL; it can take multiple steps to fetch and process data from an online source. This is common with data accessed via web services. In this example we create a script containing a short recipe for getting our data, and then register it in the DATA.bib file. First you need to write a script, and for this example the following code will do that for you:
cat('library(icesTAF)
library(sf)
download(
"https://gis.ices.dk/shapefiles/OSPAR_Subregions.zip"
)
unzip("OSPAR_Subregions.zip")
unlink("OSPAR_Subregions.zip")
areas <- st_read("OSPAR_subregions_20160418_3857.shp")
# write as csv
st_write(
areas, "ospar-areas.csv",
layer_options = "GEOMETRY=AS_WKT"
)
unlink(dir(pattern = "OSPAR_subregions_20160418_3857"))
',
file = "boot/ospar-areas.R"
)
Boot scripts such as this one go in the boot folder:
example-1
¦--boot
¦ ¦--data
¦ ¦ ¦--HadSST.4.0.1.0_median.nc
¦ ¦ ¦--my-collection
¦ ¦ ¦ ¦--cars.csv
¦ ¦ ¦ °--trees.csv
¦ ¦ °--trees.csv
¦ ¦--DATA.bib
¦ ¦--initial
¦ ¦ °--data
¦ ¦ ¦--my-collection
¦ ¦ ¦ ¦--cars.csv
¦ ¦ ¦ °--trees.csv
¦ ¦ °--trees.csv
¦ °--ospar-areas.R
¦--data.R
¦--model.R
¦--output.R
°--report.R
and are described in more detail in Bib entries. We then document it and add an entry to DATA.bib. However, there are two differences to note: we set data.files = NULL, and we supply the script name via data.scripts:
draft.data(
data.files = NULL,
data.scripts = "ospar-areas.R",
originator = "OSPAR",
title = "OSPAR areas",
file = TRUE,
append = TRUE
)
and after running
taf.boot()
## [12:28:58] Boot procedure running...
## Processing DATA.bib
## [12:28:58] * trees.csv
## [12:28:58] * my-collection
## [12:28:58] * HadSST.4.0.1.0_median.nc
## Skipping download of 'HadSST.4.0.1.0_median.nc' (already in place).
## [12:28:58] * ospar-areas
## Reading layer `OSPAR_subregions_20160418_3857' from data source
## `D:\TAF\templates-examples\doc.wiki\example-1\boot\data\ospar-areas\OSPAR_subregions_20160418_3857.shp' using driver `ESRI Shapefile'
## Simple feature collection with 50 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -4898295 ymin: 4300621 xmax: 5677353 ymax: 30240970
## Projected CRS: WGS 84 / Pseudo-Mercator
## Writing layer `ospar-areas' to data source `ospar-areas.csv' using driver `CSV'
## options: GEOMETRY=AS_WKT
## Writing 50 features with 4 fields and geometry type Multi Polygon.
## [12:29:01] Boot procedure done
your project should now look like this:
example-1
¦--boot
¦ ¦--data
¦ ¦ ¦--HadSST.4.0.1.0_median.nc
¦ ¦ ¦--my-collection
¦ ¦ ¦ ¦--cars.csv
¦ ¦ ¦ °--trees.csv
¦ ¦ ¦--ospar-areas
¦ ¦ ¦ ¦--DISCLAIMER_GIS.txt
¦ ¦ ¦ °--ospar-areas.csv
¦ ¦ °--trees.csv
¦ ¦--DATA.bib
¦ ¦--initial
¦ ¦ °--data
¦ ¦ ¦--my-collection
¦ ¦ ¦ ¦--cars.csv
¦ ¦ ¦ °--trees.csv
¦ ¦ °--trees.csv
¦ °--ospar-areas.R
¦--data.R
¦--model.R
¦--output.R
°--report.R
Now we have a folder with the same name as the script we created, and inside it are any files created by the script.
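With the boot procedure complete, the TAF scripts can now use the files in boot/data. As a minimal sketch (assuming the icesTAF package is installed; the preprocessing step is just an illustration), a data.R script might start like this:

```r
# data.R - minimal sketch; file names match the datasets booted above
library(icesTAF)

# create the data folder that this script writes to
mkdir("data")

# read a booted dataset from boot/data
trees <- read.taf("boot/data/trees.csv")

# preprocess as required (illustrative), then write to the data folder
trees$HeightGirthRatio <- trees$Height / trees$Girth
write.taf(trees, dir = "data")
```

Only files under boot/data should be read here; that way every input used by the analysis is documented in DATA.bib.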