Practicals7
Outputting data to ICES is an important deliverable for member states that take part in the ICES datacall. Every year, around January, ICES sends out a request to supply VMS and logbook data. These data go into a database, FishFrame, and must be delivered in a specific format. The workflow below links many steps from P1 to P4 to get to an appropriate product for the ICES datacall.
The steps we will go through are (please note that this is a suggested workflow; as the ICES call may vary from year to year, it is by no means a perfect template):
- Cleaning the tacsat and eflalo data
- Merging eflalo to tacsat to assign information on the gears used to the tacsat records
- Defining activity
- Assigning euros and kilos from eflalo to tacsat
- Assigning length categories
- Assigning temporal and spatial references
- Aggregating the data at the correct level of detail in FishFrame format
Now, on to processing the data.
# 1. Cleaning the tacsat data
# We've done this before, so we move quickly through the next steps (see Practical 2 for more info on cleaning)
data(tacsat)
#Remove points that are not possible, not unique, on land, in harbour or have an interval rate < 5 minutes
idx <- which(abs(tacsat$SI_LATI) > 90 | abs(tacsat$SI_LONG) > 180)
idx <- unique(c(idx,which(tacsat$SI_HE < 0 | tacsat$SI_HE > 360)))
tacsat$SI_DATIM <- as.POSIXct(paste(tacsat$SI_DATE,tacsat$SI_TIME,sep=" "),
tz="GMT", format="%d/%m/%Y %H:%M") #create one date-time stamp
#Check whether there are any values in idx
length(idx)
#In this instance there aren't any, but if there were, you would need to remove those rows:
if(length(idx) > 0) tacsat <- tacsat[-idx,]
uniqueTacsat <- paste(tacsat$VE_REF,tacsat$SI_LATI,tacsat$SI_LONG,tacsat$SI_DATIM)
tacsat <- tacsat[!duplicated(uniqueTacsat),] #get rid of the duplicates
data(europa) #a dataset built into the package with the European coastlines embedded
pols <- europa
idx <- pointOnLand(tacsat,pols)
pol <- tacsat[which(idx==1),] #points on land
tacsat <- tacsat[which(idx==0),] #tacsat not on land
data(harbours)
idx <- pointInHarbour(tacsat$SI_LONG,tacsat$SI_LATI,harbours,saveHarbourList=F)
pih <- tacsat[which(idx==1),]
tacsat <- tacsat[which(idx==0),]
tacsat <- sortTacsat(tacsat) #sort the data by vessel and time
tacsatp <- intervalTacsat(tacsat,level="vessel",fill.na=T)
intThres <- 5 #threshold in minutes: pings with an interval of 5 minutes or less are removed
tacsat <- tacsatp[which(tacsatp$INTV > intThres),]
# 1. Cleaning the eflalo data
data(eflalo)
eflalop <- eflalo
eflalop$FT_DDATIM <- as.POSIXct(paste(eflalo$FT_DDAT, eflalo$FT_DTIME, sep=" "), tz="GMT", format="%d/%m/%Y %H:%M") #turn date and time into one date-time stamp
eflalop$FT_LDATIM <- as.POSIXct(paste(eflalo$FT_LDAT, eflalo$FT_LTIME, sep=" "), tz="GMT", format="%d/%m/%Y %H:%M") #turn date and time into one date-time stamp
idx <- which(eflalop$FT_LDATIM >= eflalop$FT_DDATIM)
eflalo <- eflalo[idx,]
The cleaning should be a breeze by now. Note that there may be more points to check for, but that is really for you to decide, as you should be familiar with your own data.
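As one example of an extra check (a sketch only; the 20-knot threshold is an assumption you should tune to your own fleet), you could drop pings with an implausibly high instantaneous speed:
#Remove pings with an unrealistic speed (threshold in knots; this value is an assumption)
spThres <- 20
tacsat <- tacsat[which(tacsat$SI_SP <= spThres),]
Now let's move on to linking VMS and logbooks and defining activity.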
# 2. Merging eflalo to tacsat (see practical 3 for more info)
tacsatp <- mergeEflalo2Tacsat(eflalo,tacsat)
tacsatp$LE_GEAR <- eflalo$LE_GEAR[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$LE_MSZ <- eflalo$LE_MSZ[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$VE_LEN <- eflalo$VE_LEN[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$VE_KW <- eflalo$VE_KW[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$LE_RECT <- eflalo$LE_RECT[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$LE_MET <- eflalo$LE_MET[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$LE_WIDTH <- eflalo$LE_WIDTH[match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$VE_FLT <- eflalo$VE_FLT[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp$LE_CDAT <- eflalo$LE_CDAT[ match(tacsatp$FT_REF,eflalo$FT_REF)]
tacsatp <- subset(tacsatp,FT_REF != 0)
# 3. Define activity (here for simplicity reasons, we only select beamtrawling, see practical 3 for more info)
tacsatp <- subset(tacsatp,LE_GEAR == "TBB")
eflalo <- subset(eflalo,LE_GEAR == "TBB")
storeScheme <- expand.grid(years = 1800:1801, months = 0,weeks = 0,
analyse.by = unique(tacsatp[,"LE_GEAR"]))
storeScheme$peaks <- NA
storeScheme$means <- NA
storeScheme$fixPeaks <- FALSE
storeScheme$sigma0 <- 0.911
storeScheme$means[which(storeScheme$analyse.by == "TBB")] <- c("-11.5 -6 0 6 11.5")
acTa <- activityTacsat(tacsatp,units="year",analyse.by="LE_GEAR",
storeScheme=storeScheme,plot=FALSE,level="all")
tacsatp$SI_STATE <- acTa
tacsatp$SI_STATE[which(tacsatp$SI_STATE != "f")] <- 0
tacsatp$SI_STATE[which(tacsatp$SI_STATE == "f")] <- 1
As said before, we only analysed a subset of gears, and we created our own run schedule for that. This is a powerful way of setting up the scenarios you wish to evaluate; a sketch of a larger schedule is given below. After that, we will dispatch the kilos and euros from our logbook data to the VMS pings.
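For example, a run schedule covering more than one gear could look like this (a sketch; it assumes your tacsatp also contains, e.g., OTB records):
storeSchemeMulti <- expand.grid(years = 1800:1801, months = 0, weeks = 0,
                                analyse.by = c("TBB","OTB")) #one row per year-gear combination
storeSchemeMulti$peaks    <- NA
storeSchemeMulti$means    <- NA
storeSchemeMulti$fixPeaks <- FALSE
storeSchemeMulti$sigma0   <- 0.911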
# 4. Assign euros and kilos from eflalo to tacsat (see practical 4 for more info)
idx <- kgeur(colnames(eflalo)) #We get the columns that contain either LE_KG or LE_EURO values
eflalo$LE_KGS <- rowSums(eflalo[,grep("LE_KG_",colnames(eflalo))],na.rm=T)
eflalo$LE_EUROS <- rowSums(eflalo[,grep("LE_EURO_",colnames(eflalo))],na.rm=T)
#To reduce the dataset a bit, we drop all individual LE_KG_ and LE_EURO_ columns and only work with the totals
eflalo <- eflalo[,-idx]
# 5. Assign length classes to fishing vessels.
#We use the 'cut' function here, which assigns each absolute vessel length to one of the length categories
eflalo$LENGTHCAT <- cut(eflalo$VE_LEN,breaks=c(0,12,15,200))
eflalo$LENGTHCAT <- ac(eflalo$LENGTHCAT) #ac() is the vmstools shorthand for as.character()
#We overwrite the length category with a value that is easier to read
eflalo$LENGTHCAT[which(eflalo$LENGTHCAT == "(0,12]")] <- "<12"
eflalo$LENGTHCAT[which(eflalo$LENGTHCAT == "(12,15]")] <- "12-15"
eflalo$LENGTHCAT[which(eflalo$LENGTHCAT == "(15,200]")] <- ">15"
#We split eflalo into two sets: one with only those eflalo records that have VMS associated per trip and one with all remaining trips (without VMS, e.g. small vessels)
eflalo$FT_DDATIM <- as.POSIXct(paste(eflalo$FT_DDAT, eflalo$FT_DTIME, sep=" "), tz="GMT", format="%d/%m/%Y %H:%M") #turn date and time into one date-time stamp
eflalo$FT_LDATIM <- as.POSIXct(paste(eflalo$FT_LDAT, eflalo$FT_LTIME, sep=" "), tz="GMT", format="%d/%m/%Y %H:%M") #turn date and time into one date-time stamp
eflaloM <- subset(eflalo, FT_REF %in% unique(tacsatp$FT_REF))
eflaloNM <- subset(eflalo,!FT_REF %in% unique(tacsatp$FT_REF))
# Split the euros and kgs over the pings
tacsatEflalo <- splitAmongPings(tacsat=tacsatp,eflalo=eflaloM,variable="all",level="day",conserve=T)
Finally, ICES uses the C-square notation, which we add here together with some other spatio-temporal attributes of the data. We use these settings later on to aggregate at the right scale.
# 6. Assign some temporal and spatial references
data(ICESareas)
tacsatEflalo <- subset(tacsatEflalo,ICESareas@data$Area_27[anf(ICESarea(tacsatEflalo,as(ICESareas,"SpatialPolygons")))+1] %in% c("4.a","4.b","4.c"))
tacsatEflalo$Csquare <- CSquare(tacsatEflalo$SI_LONG,tacsatEflalo$SI_LATI,degrees=0.05)
tacsatEflalo$Year <- year(tacsatEflalo$SI_DATIM)
tacsatEflalo$Quarter <- quarter(tacsatEflalo$SI_DATIM)
tacsatEflalo$Month <- month(tacsatEflalo$SI_DATIM)
tacsatEflalo$Area <- ac(ICESareas@data$Area_27[anf(ICESarea(tacsatEflalo,ICESareas))+1])
tacsatEflalo$kwHour <- tacsatEflalo$VE_KW * tacsatEflalo$INTV/60 #INTV is in minutes, so divide by 60 to get hours
eflaloNM$Year <- year(eflaloNM$FT_LDATIM)
eflaloNM$Quarter <- quarter(eflaloNM$FT_LDATIM)
eflaloNM$Month <- month(eflaloNM$FT_LDATIM)
eflaloNM$Area <- "ICESarea" #shortcut to avoid the time-consuming conversion from ICES rectangles to ICES areas
eflaloNM$LE_MET <- eflaloNM$LE_MET_level6
eflaloNM$INTV <- difftime(eflaloNM$FT_LDATIM,eflaloNM$FT_DDATIM,units="hours") #total trip duration in hours
eflaloNM$dummy <- 1
res <- aggregate(eflaloNM$dummy,by=list(eflaloNM$FT_REF),FUN=sum,na.rm=T) #count the number of records per trip
colnames(res) <- c("FT_REF","nrRecords")
eflaloNM <- merge(eflaloNM,res,by="FT_REF")
eflaloNM$INTV <- eflaloNM$INTV / eflaloNM$nrRecords #divide the trip duration equally over its records
eflaloNM$kwHour <- eflaloNM$VE_KW * eflaloNM$INTV
# 7. Aggregate data at the correct detail level
RecordType <- "CE" #We have to identify the data type
#We first process the first year, and then append the other years
tacsatEflaloTotal <- tacsatEflalo
eflaloNMTotal <- eflaloNM
for(year in c(1800,1801)){
  tacsatEflalo <- subset(tacsatEflaloTotal,year(tacsatEflaloTotal$SI_DATIM)==year)
  eflaloNM     <- subset(eflaloNMTotal,year(eflaloNMTotal$FT_LDATIM)==year)
  if(year == 1800){
    table1 <- cbind(RT=RecordType,tacsatEflalo[,c("VE_COU","Year","Quarter","Month","Area","Csquare","LE_MET","INTV","kwHour","LE_KGS","LE_EUROS")])
  } else {
    table1 <- rbind(table1,
                    cbind(RT=RecordType,tacsatEflalo[,c("VE_COU","Year","Quarter","Month","Area","Csquare","LE_MET","INTV","kwHour","LE_KGS","LE_EUROS")]))
  }
  table1Save <- aggregate(table1[,c("INTV","kwHour","LE_KGS","LE_EUROS")],
                          by=as.list(table1[,c("RT","VE_COU","Year","Quarter","Month","Area","Csquare","LE_MET")]),
                          FUN=sum,na.rm=T)
  colnames(table1Save) <- c("RecordType","VesselFlagCountry","Year","Quarter","Month","Area","C-square","Europeanlvl6","FishingHour","kWHour","TotWeight","TotValue")

  if(year == 1800){
    table2 <- cbind(RT=RecordType,eflaloNM[,c("VE_COU","Year","Quarter","Month","Area","LE_RECT","LE_MET","LENGTHCAT","INTV","kwHour","LE_KGS","LE_EUROS")])
  } else {
    table2 <- rbind(table2,
                    cbind(RT=RecordType,eflaloNM[,c("VE_COU","Year","Quarter","Month","Area","LE_RECT","LE_MET","LENGTHCAT","INTV","kwHour","LE_KGS","LE_EUROS")]))
  }
  table2Save <- aggregate(table2[,c("INTV","kwHour","LE_KGS","LE_EUROS")],
                          by=as.list(table2[,c("RT","VE_COU","Year","Quarter","Month","Area","LE_RECT","LE_MET","LENGTHCAT")]),
                          FUN=sum,na.rm=T)
  colnames(table2Save) <- c("RecordType","VesselFlagCountry","Year","Quarter","Month","Area","ICESrect","Europeanlvl6","LengthCat","FishingHour","kWHour","TotWeight","TotValue")
}
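The aggregated tables can now be written to file (a minimal sketch; the file names are hypothetical and the exact delivery format is specified in the datacall itself):
write.csv(table1Save, file="table1Save.csv", row.names=FALSE) #VMS-based table
write.csv(table2Save, file="table2Save.csv", row.names=FALSE) #logbook-only table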
On some occasions, it can be useful to output VMS or logbook maps not in jpeg or png image formats, but rather in the more GIS-friendly shapefile format. Writing a shapefile is not difficult, especially if you use the routines described under Practical 9 for creating pictures.
A couple of R libraries are crucial for these exercises: the 'sp', 'rgdal' and 'rgeos' packages. One of the key persons behind these packages is Roger Bivand, and he most certainly has made our lives easier!
Now we'll load some VMS data, calculate how much time each ping represents (its interval rate) and make a picture. Then we output the same data used in the picture as a shapefile.
data(tacsat)
library(rgdal) # You may need to install this.
tacsat <- intervalTacsat(tacsat,level="vessel",fill.na=TRUE)
tacsat$INTV[tacsat$INTV>230] <- 115 #if values are missing, the computed interval can become very large, so cap those at the average interval value
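For a very quick look at the positions (a minimal sketch in base R; the full mapping routines are described in Practical 9):
plot(tacsat$SI_LONG, tacsat$SI_LATI, pch=".", xlab="Longitude", ylab="Latitude") #each dot is one VMS ping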
Here's a bit more information on creating a grid with the createGrid function, which allows you to select different types of grid, e.g. GridTopology, SpatialGrid, SpatialPixels, SpatialGridDataFrame etc. We most often use SpatialGrid (and its associated SpatialGridDataFrame). Read the first two chapters of the 'spatial' book, which will help you understand everything about these spatial classes! There is also the 'exactBorder' option, which determines whether the lower-left corner of your grid is fixed to the minimum x and y values you set, or whether those minimum values are used as the midpoint of the first grid cell being defined.
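To see what 'exactBorder' does, compare the bounding boxes of two small grids (a quick sketch):
grdA <- createGrid(c(0,1),c(0,1),0.5,0.5,type="SpatialGrid",exactBorder=TRUE)
grdB <- createGrid(c(0,1),c(0,1),0.5,0.5,type="SpatialGrid",exactBorder=FALSE)
bbox(grdA); bbox(grdB) #with exactBorder=FALSE the grid extends half a cell beyond the minima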
# Create a grid at which we want to display the results
xrange <- c(-4,8) #define your study area in terms of xrange and yrange
yrange <- c(52,58)
resx <- 0.1 #set the step size of grid cells in the longitudinal direction
resy <- 0.05 #set the step size of grid cells in the latitudinal direction
grd <- createGrid(xrange,yrange,resx,resy,type="SpatialGridDataFrame",exactBorder=TRUE)
#Overlay the tacsat positions with the grid, to see to what gridcell each ping belongs
coords <- SpatialPoints(data.frame(SI_LONG=tacsat$SI_LONG,SI_LATI=tacsat$SI_LATI))
idx <- over(coords,as(grd,"SpatialGrid")) #idx simply indicates to which gridcell each of the VMS pings belongs. As you can see, we need to convert the grd to SpatialGrid first.
tacsat$grID <- idx #grID is the gridcell ID
#Now store the effort (INTV) into the grids dataframe
grd@data[names(table(idx)),1] <- aggregate(tacsat$INTV,by=list(tacsat$grID),FUN=sum,na.rm=T)$x
There are now two options to convert the grid into a format that can be written to a shapefile: by hand, or automatically. We show both here, as the 'by hand' option may be useful on other occasions.
#1) by hand: first take the midpoints of the grid cells and create the polygons yourself using the lonLat2SpatialPolygons function
coordGrd <- coordinates(grd)[an(names(table(idx))),]
grdPols <- lonLat2SpatialPolygons(lst=lapply(as.list(1:nrow(coordGrd)),
function(x){data.frame(SI_LONG=c(coordGrd[x,"s1"]-resx/2,rep(coordGrd[x,"s1"]+resx/2,2),coordGrd[x,"s1"]-resx/2),
SI_LATI=c(rep(coordGrd[x,"s2"]-resy/2,2),rep(coordGrd[x,"s2"]+resy/2,2)))}))
grdPolsDFHand <- as(grdPols,"SpatialPolygonsDataFrame")
#2) automatic:
grdPolsDFAuto <- as(grd,"SpatialPolygonsDataFrame") #hmm, that's easy and simple...
Now the only thing left is to convert this grid to a shapefile. Thanks to Roger Bivand, we're only a couple of lines of code away! As you can see, it's no longer only a VMStools course; we are happy to show the best R tools around that work very well together with VMStools.
setwd("D:/") #Set a working directory where you can locate your files
fileName1 <- "VMStoolsExampleShapefile1"
fileName2 <- "VMStoolsExampleShapefile2"
#Remove old shapefiles with the same names, if present
for(fileName in c(fileName1,fileName2)){
  file.remove(paste(fileName,".shp",sep=""))
  file.remove(paste(fileName,".shx",sep=""))
  file.remove(paste(fileName,".dbf",sep=""))
  file.remove(paste(fileName,".prj",sep=""))
}
#If you want to store some data for each of the grid cells, you can store it in the data.frame. Remember that
#each row corresponds to a specific grid cell. What we often do is store the colour of the grid cells in the shapefile, so
#whenever you've created a picture and someone else wants to use the exact same settings in GIS software, they
#can pick the data column as colour settings.
library(RColorBrewer) #nice colour ramps from ColorBrewer; install the package first if you don't have it yet
colyellowred <- brewer.pal(6,"YlOrRd")
cutbreakval <- c(-1,0,500,1250,2500,5000,10000,3000000)
#Make a choice if you want to take the one done by hand or the automatic conversion of your data.
#By hand
cols <- c("white",colyellowred)[cut(grd@data[an(names(table(idx))),1],breaks=cutbreakval)]
grdPolsDFHand@data <- data.frame(value=grd@data[an(names(table(idx))),1],color=cols)
proj4string(grdPolsDFHand) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
#set the correct projection settings.
## You can then reproject your data if needed (North Sea = zone 31):
grdPolsDFHand <- spTransform(grdPolsDFHand, CRS("+proj=utm +zone=31 +ellps=WGS84"))
writeOGR(grdPolsDFHand, dsn = '.', layer = fileName1, driver = "ESRI Shapefile")
#Automatic
cols <- c("white",colyellowred)[cut(grd@data[,1],breaks=cutbreakval)]
grdPolsDFAuto@data <- data.frame(value=grd@data[,1],color=cols)
proj4string(grdPolsDFAuto) <- CRS("+proj=longlat +datum=WGS84 +no_defs") #set the correct projection settings. Often WGS84 is used; if you reproject to UTM, select the zone your country has most interest in (North Sea = zone 31)
writeOGR(grdPolsDFAuto, dsn = '.', layer = fileName2, driver = "ESRI Shapefile")
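You can check the result by reading the shapefile back in (a quick sketch):
shp <- readOGR(dsn=".", layer=fileName2) #reads the shapefile we just wrote
summary(shp)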
OK, that was relatively straightforward. We can also display our data in Google Earth. For that, we need to write the SpatialPolygons data to a KML file.
grdPolsDFKML <- as(grdPols,"SpatialPolygonsDataFrame")
cols <- c("#FFFFFF",colyellowred)[cut(grd@data[an(names(table(idx))),1],breaks=cutbreakval)]
grdPolsDFKML@data <- data.frame(value=grd@data[an(names(table(idx))),1],color=cols)
proj4string(grdPolsDFKML) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
writeOGR(grdPolsDFKML,'VMStools.kml','example', driver = "KML",overwrite_layer=TRUE)
Note: the rgdal package contains a function (make_EPSG) that builds a data frame of the geodetic parameter datasets of the now-defunct European Petroleum Survey Group (EPSG), as distributed with the PROJ.4 software. Because finding the correct projection specification is not easy, these lists (still known as EPSG lists) are maintained and can be queried directly from R, e.g.
require(rgdal)
EPSG <- make_EPSG()
IRLproj <- EPSG[EPSG$code == 2157 & ! is.na(EPSG$code), "prj4"]
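For example (a sketch), you could use the proj4 string we just looked up to reproject the polygons to Irish Transverse Mercator:
grdPolsITM <- spTransform(grdPolsDFAuto, CRS(IRLproj)) #EPSG:2157, Irish Transverse Mercator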
- Go and install QGIS (http://www.qgis.org/), which is a free and useful GIS system, or use e.g. the ArcGIS online viewer
- Load one of the shapefiles (either the automatic one or the by-hand one) into QGIS (or similar software)
- In QGIS, apply a colour scale to the monochrome grid that you should be able to see
- Can you tell the difference between the automatic one and the by-hand one? They are different! What is causing this?
Note that the idea here is just to show that vmstools need not be used in isolation from more formal GIS tools, and that many spatial file formats can be handled in R. Furthermore, you don't have to pay for a good GIS system either! uDig is another good option, by the way: http://udig.refractions.net/.