Practicals4_2021

Uncertainty in spatial fisheries data

Introduction

VMS and logbook data are linked a posteriori to increase the spatial resolution of fishing activities. In this part we investigate how the assumptions made while linking the data influence the results of spatial analysis. We will look at three main aspects of uncertainty in this tutorial:

  1. the definition of vessel activity (i.e. when a vessel is fishing) based on speed thresholds
  2. the distribution of the daily catch per ICES rectangle (the resolution of logbooks) over fishing VMS points
  3. the temporal resolution of VMS data (the interval between two pings) and how it relates to the spatial resolution of maps, as more frequent pings allow higher resolution maps.

Definition of vessel activity based on fishing speed thresholds

VMS data contain a number of important pieces of information, but the status of the fishing vessel (fishing or not) is missing. We therefore have to estimate the activity from the information available, namely speed (as seen in practical 3). Once a VMS point is linked to a fishing trip in the logbook (based on vessel ID and date), we can also include information about the gear operated at the time. With gear and speed it is possible to estimate whether a vessel is fishing, but how do the different estimation methods differ?

library(vmstools)
data(tacsat)
data(eflalo)

tacsatp <- mergeEflalo2Tacsat(eflalo,tacsat) # merge tacsat and eflalo on vessel and date to identify the fishing trip
tacsatp$LE_GEAR <- eflalo$LE_GEAR[match(tacsatp$FT_REF,eflalo$FT_REF)] # add gear per trip to tacsat

To make the analysis easier, let's select a single gear. We can look at the gears present in our data set (for a complete list of the gear codes and their meaning please look here).

# checking the frequency of the different gears
table(tacsatp$LE_GEAR)
# DRB   GNS   MIS   OTB   OTM   PTB   TBB 
#  680     9   123 13527 12249    91 54140 

TBB (beam trawl) has the most pings, so our statistical analysis will be easier if we select that gear.

# Looking at TBB separately
TBB <- subset(tacsatp,LE_GEAR == "TBB" & format(SI_DATIM,format='%Y') == 1801 & !is.na(SI_SP))

#Let's see what speed profiles are associated with this gear
library(ggplot2)
X11()
ggplot(TBB,aes(x=SI_SP))+xlim(c(0,20)) + geom_histogram(binwidth=0.3) + xlab("Knots")

Based on the speed distribution of beam trawlers (see practical 3 of the introductory course for more details on the methods to define fishing activity) we can estimate that the vessels are fishing between 3 and 8 knots. Let's store the activity in the SI_STATE variable and check the speed distribution for different states (floating, fishing and steaming).

# create the SI_STATE variable
TBB$SI_STATE <- NA

# We assume that for TBB
# speed < 3 kn --> floating
# 3 kn <= speed <= 8 kn --> fishing
# speed > 8 kn --> steaming
TBB[TBB$SI_SP<3,]$SI_STATE <- "h"
TBB[TBB$SI_SP>=3&TBB$SI_SP<=8,]$SI_STATE <- "f"
TBB[TBB$SI_SP>8,]$SI_STATE <- "s"
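
As a side note, the same classification can be written more compactly with cut(). A minimal sketch for comparison; SI_STATEcut is just a throwaway column name, and the boundary handling differs slightly from the explicit subsetting above:

# compact alternative with cut(); with the default right-closed intervals,
# exactly 3 kn falls in "h" here, while the subsetting above puts it in "f"
TBB$SI_STATEcut <- as.character(cut(TBB$SI_SP,breaks=c(-Inf,3,8,Inf),labels=c("h","f","s")))
table(TBB$SI_STATE,TBB$SI_STATEcut) # check agreement between the two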

#Lets look at the results
X11()
ggplot(TBB,aes(x=SI_SP, fill=SI_STATE))+xlim(c(0,20)) + geom_histogram(binwidth=0.3) + xlab("Knots")

This feels about right: on average, the states are well defined. Now let's look at our data a bit more closely to check whether gear was the right level to look at. Beam trawls (TBB) are used in different fisheries, targeting different fish species. Those fisheries use different mesh sizes, which we can also use to investigate the speed distribution by fishery.

# let's add mesh sizes
TBB$LE_MSZ <- eflalo$LE_MSZ[match(TBB$FT_REF,eflalo$FT_REF)]

# to make categories, we can look at the distribution of the mesh sizes
ggplot(TBB,aes(x=LE_MSZ))+ geom_histogram()

# the histogram shows 3 groups (roughly <25, 80-90 and >100mm), so we cut at 50 and 100mm
TBB$MSZ_cat <- cut(TBB$LE_MSZ, breaks=c(0,50,100,130))

# now let's check again whether the speed thresholds we've used correspond to the different fisheries
ggplot(TBB,aes(x=SI_SP, fill=SI_STATE))+xlim(c(0,20)) + geom_histogram(binwidth=0.3) + xlab("Knots")+facet_wrap(~MSZ_cat,scales="free_y",ncol=1)

Note how the middle peak seems to move to higher speeds with larger mesh sizes (big fish swim faster?). It seems that we got it right for the middle category, but the one with the smallest mesh size seems particularly off.
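
To quantify that shift rather than eyeball it, you could compute the median speed of the pings currently labelled as fishing within each mesh size category (a quick check using only columns created above):

# median speed of "fishing" pings per mesh size category
aggregate(SI_SP ~ MSZ_cat, data=subset(TBB,SI_STATE=="f"), FUN=median)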

Exercise 1 - What impacts fishing speed?

In this first exercise you'll be investigating some systematic sources of uncertainty that you can correct by looking more closely at your data.

  1. Does vessel size have an impact on the speed threshold? How does it work? (Tip: use the vessel categories 0-18m, 18-24m, >24m)
Show answer

Let's plot the speed by vessel length category

TBB_ex <- TBB

  # let's add vessel length
TBB_ex$VE_LEN <- eflalo$VE_LEN[match(TBB_ex$FT_REF,eflalo$FT_REF)]

# to make categories, we can look at the distribution of the vessel lengths
ggplot(TBB_ex,aes(x=VE_LEN))+ geom_histogram()

# here we have 3 groups: 0-18m, 18-24m and >24m
TBB_ex$LEN_cat <- cut(TBB_ex$VE_LEN, breaks=c(0,18,24,50))

# now let's check again whether the speed thresholds we've used correspond to the different vessel length categories
ggplot(TBB_ex,aes(x=SI_SP, fill=SI_STATE))+xlim(c(0,20)) + geom_histogram(binwidth=0.3) + xlab("Knots")+facet_wrap(~LEN_cat,scales="free_y",ncol=1)

On the figure we can see that the middle length category seems to be fishing slower than the large or small vessels. This may be due to the mesh size they use. We can quickly check it.

table(TBB_ex$LEN_cat,TBB_ex$MSZ_cat)

And indeed, most of the small mesh sizes occur in the middle vessel length group, while the other groups use almost exclusively the 50-100mm mesh size.
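
Row proportions make this dependence easier to read than raw counts; a small sketch:

# share of each mesh size category within each vessel length category
round(prop.table(table(TBB_ex$LEN_cat,TBB_ex$MSZ_cat),margin=1),2)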

  1. Is there a seasonal pattern visible in the speed threshold? What would you expect? (optional)
Show answer

# let's add month
TBB_ex$SI_MONTH <- format(TBB_ex$SI_DATIM,format='%m')


# now let's check whether the speed distributions differ between months
ggplot(TBB_ex,aes(x=SI_SP, fill=SI_STATE))+xlim(c(0,20)) + geom_histogram() + xlab("Knots")+facet_wrap(~SI_MONTH,scales="free_y")

Some months (January, May, July) seem to fish at slower speeds than the average. Again, this may be due to the mesh sizes used in those months.

table(TBB_ex$SI_MONTH,TBB_ex$MSZ_cat)

Not really. But we see here that those months have few observations, so it may come from the selection of records in the example data.

table(TBB_ex$SI_MONTH)

It is also possible to look at it by quarter

# let's add the quarter
require(lubridate)
TBB_ex$quarter <- quarter(TBB_ex$SI_DATIM)


# now let's check whether the speed distributions differ between quarters
ggplot(TBB_ex,aes(x=SI_SP, fill=SI_STATE))+xlim(c(0,20)) + geom_histogram() + xlab("Knots")+facet_wrap(~quarter,scales="free_y")

At that level, we cannot see any meaningful pattern.


Other factors known to impact the speed thresholds are fuel price, direction (empty going to the fishing grounds or full coming back), weather conditions, etc. If those factors are important, one could develop a function taking them into account, as sketched below.
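
A minimal sketch of that idea, assuming fixed per-group thresholds (the cut-off values for the small mesh category are purely illustrative, not fitted):

# hypothetical helper: classify activity given fishing speed bounds
assignState <- function(speed, fishMin=3, fishMax=8){
  state                                      <- rep("s",length(speed)) # default: steaming
  state[speed <  fishMin]                    <- "h"                    # floating
  state[speed >= fishMin & speed <= fishMax] <- "f"                    # fishing
  state
}

# e.g. a lower fishing speed band for the smallest mesh sizes (illustrative values)
TBB$SI_STATEgrp <- ifelse(TBB$MSZ_cat == "(0,50]",
                          assignState(TBB$SI_SP, fishMin=2,   fishMax=4.5),
                          assignState(TBB$SI_SP, fishMin=4.5, fishMax=8))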

So for the next step we'll look at how much of a difference it makes if we improve the speed thresholds.

For this next step we'll define the activity using the activityTacsat() function (already seen in practical 3).

The first step is to define how many peaks or activities can be identified on a mirrored distribution (again refer to practical 3 for details on this). This can be done using the activityTacsatAnalyse() function or simply by hand as a data.frame (for the use of activityTacsatAnalyse, check ?activityTacsatAnalyse or practical 3).

 
 # let's include our mesh size categories in LE_GEAR (activityTacsat only runs on LE_GEAR or VE_REF)
TBB$LE_GEARorig <- TBB$LE_GEAR
TBB$LE_GEAR     <- paste(TBB$LE_GEAR,TBB$MSZ_cat,sep="-")

# the information on the number of peaks is stored in a data.frame called storeScheme which can also be made by hand

storeScheme               <- expand.grid(years = unique(format(TBB$SI_DATIM,"%Y")), months = 0,weeks = 0,
                                           analyse.by = unique(TBB[,"LE_GEAR"]))
                                           
storeScheme$peaks         <- 5      # we have three activities corresponding to 5 peaks when mirrored at 0, for all mesh sizes
#storeScheme$means         <- c(-10,-5,0,5,10)    # to help the activityTacsat function, you can give it a prior 
                                      # of the mean of each peak as c(-steamSpeed, -fishSpeed,0, fishSpeed,steamSpeed) 
                                      # with steamSpeed being the average of the steaming speed peak, and fishSpeed, the average of the fishing speed peak
                                      # Note only add the "means" column if you put information in it
storeScheme$fixPeaks      <- FALSE  # we are not providing peaks here, so not fixing them either
storeScheme$sigma0        <- 0.911  # 0.911 corresponds to a width of 1.5knots on each side of the mean for the zero speed (floating activity) under a 90 percent CI


require(mixtools) #- install.packages("mixtools",repos=getOption("repos"))

#After identifying the number of peaks, it's time to let the magic fit normal density distributions to the data
res           <- activityTacsat(TBB,units="year",analyse.by="LE_GEAR",storeScheme)

TBB$SI_STATE2 <- res                  # store the activity defined with activityTacsat in a new SI_STATE2 variable

# plot the activity profiles per mesh size category
ggplot(TBB,aes(x=SI_SP, fill=SI_STATE2))+xlim(c(0,20)) + geom_histogram(binwidth=0.3) + xlab("Knots")+facet_wrap(~MSZ_cat,scales="free_y",ncol=1)

The fishing peaks are much better aligned for the three mesh size categories.

Now that we have two different activity estimates, let's look at whether there are differences.
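
A quick cross-tabulation shows how many pings change state between the two definitions before we translate them into effort:

# rows: fixed speed states, columns: activityTacsat states
table(fixed=TBB$SI_STATE, activityTacsat=TBB$SI_STATE2)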

# Let's plot the two sets of data
require(RColorBrewer)

#Sort the data to calculate the interval between subsequent pings and calculate the fishing effort
TBB <- sortTacsat(TBB) #Make sure it is ordered correctly
TBB <- effort(TBB, by = "byRow", unit = "hours",fill.na=T)

# calculate fishing effort based on the 2 activity definitions
TBB <- within(TBB,{Fishing1 <- as.numeric(SI_STATE=="f")  # put 1s if the state of fixed speed is fishing
                   Fishing2 <- as.numeric(SI_STATE2=="f") # put 1s if the state of the mesh size group is fishing
                   fishEff1 <- Fishing1 * EFFORT          # store effort when fishing (0 when state 1 is not fishing) 
                   fishEff2 <- Fishing2 * EFFORT          # store effort when fishing (0 when state 2 is not fishing) 
                   })

colSums(TBB[c("fishEff1","fishEff2")],na.rm=T)

With the fixed speed thresholds we estimate 42,260 hours of fishing, while with activityTacsat including mesh sizes we get "only" 36,232 hours. That's roughly a 15% difference!
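
You can compute that percentage directly from the two totals:

effTot <- colSums(TBB[c("fishEff1","fishEff2")],na.rm=T)
round(100 * (effTot["fishEff1"] - effTot["fishEff2"]) / effTot["fishEff1"],1) # ~14-15%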

Let's map both effort estimates

plotTools(TBB,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),color=NULL,control.tacsat=list(clm="fishEff1"))
title("fishing effort - fixed speed")
X11()
plotTools(TBB,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),color=NULL,control.tacsat=list(clm="fishEff2"))
title("fishing effort - tacsatActivity gear + meshsize")

We see some differences on the maps, but limited!

However, remember that the colour scheme in plotTools is automatically calculated for each map. If we had used the same colour scale, we would probably have seen more differences, with lighter colours in the map of effort calculated including mesh size.
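
One way around this is to fix a common scale by passing the same zlim to both calls; a sketch, assuming the zlim argument of plotTools accepts a c(min,max) range (it is left at NULL in the calls above):

# arbitrary common range (here 0-50 hours per grid cell), the same for both maps
zshared <- c(0,50)
X11()
plotTools(TBB,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=zshared,log=F,
          gridcell=c(0.1,0.05),color=NULL,control.tacsat=list(clm="fishEff1"))
title("fishing effort - fixed speed (shared scale)")
X11()
plotTools(TBB,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=zshared,log=F,
          gridcell=c(0.1,0.05),color=NULL,control.tacsat=list(clm="fishEff2"))
title("fishing effort - activityTacsat (shared scale)")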

We can try to identify hotspots of differences by looking at the differences between the two effort distributions.

# calculate difference in effort per ping
TBB <- within(TBB,{Fishdiff <- fishEff1 - fishEff2
                   })
summary(TBB$Fishdiff)

# visualise it at the gridcell level
plotTools(TBB,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="Fishdiff"))
title("fishing effort fixed minus tacsatActivity")

On this map we can see that the darker points are located along the western coast, indicating that we have the highest differences between our two effort estimates there.

Ideally you would plot this map using a diverging colour scale, from negative differences (e.g. shades of blue) through no difference (white) to positive differences (e.g. shades of red), as sketched below.
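
plotTools does not offer such a diverging palette directly, but here is a sketch of the idea with ggplot2, binning the pings onto the same 0.1 x 0.05 degree grid used in the maps above:

library(ggplot2)
# snap pings to grid cell centres and sum the per-ping differences per cell
TBBd     <- within(TBB,{gLon <- round(SI_LONG/0.1)*0.1
                        gLat <- round(SI_LATI/0.05)*0.05})
cellDiff <- aggregate(Fishdiff ~ gLon + gLat, data=TBBd, FUN=sum)
ggplot(cellDiff,aes(x=gLon,y=gLat,fill=Fishdiff)) +
  geom_tile() +
  scale_fill_gradient2(low="blue",mid="white",high="red",midpoint=0) +
  labs(x="Longitude",y="Latitude",fill="hours")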

Exercise 2 - Visualize effect of different speeds

  1. Check the thresholds defined automatically for each mesh size category. How do they differ from the fixed ones?
Show answer

You can look at the thresholds visually on the figure

ggplot(TBB,aes(x=SI_SP, fill=SI_STATE2))+xlim(c(0,20)) + geom_histogram(binwidth=0.3) + xlab("Knots")+facet_wrap(~MSZ_cat,scales="free_y",ncol=1)

or try to compute them

#min
aggregate(TBB["SI_SP"],by=list(TBB$SI_STATE2,TBB$LE_GEAR),FUN=min)
  #max
aggregate(TBB["SI_SP"],by=list(TBB$SI_STATE2,TBB$LE_GEAR),FUN=max)

So the minimum fishing speed is lower for the small mesh size (2 knots) and higher for the others (4.8 and 6.6 knots). The maximum fishing speed is also lower for the small mesh size (4.6 knots), while for the others it was about right (7.8 and 7.6 knots).

  1. In Exercise 1 you investigated how other factors may impact fishing speed. Check how an activity definition based on vessel length categories changes the effort maps and total effort of our TBB fleet. In other words, run activityTacsat with vessel length and look at the change in effort maps and total fishing effort.
Show answer

TBB_ex2 <- TBB
  # let's add vessel length
TBB_ex2$VE_LEN <- eflalo$VE_LEN[match(TBB_ex2$FT_REF,eflalo$FT_REF)]

TBB_ex2$LEN_cat <- cut(TBB_ex2$VE_LEN, breaks=c(0,18,24,50))
 # make a new gear type
TBB_ex2$LE_GEAR <- paste(TBB_ex2$LE_GEARorig,TBB_ex2$LEN_cat,sep="-")
 
 # make a new storeScheme
storeScheme               <- expand.grid(years = unique(format(TBB_ex2$SI_DATIM,"%Y")), months = 0,weeks = 0,
                                           analyse.by = unique(TBB_ex2[,"LE_GEAR"]))
storeScheme$peaks         <- 5      
storeScheme$fixPeaks      <- FALSE  
storeScheme$sigma0        <- 0.911  

res           <- activityTacsat(TBB_ex2,units="year",analyse.by="LE_GEAR",storeScheme)


TBB_ex2$SI_STATE3 <- res                  # store the activity defined with activityTacsat and vessel length in a new SI_STATE3 variable


# calculate fishing effort for the new activity definition
TBB_ex2 <- within(TBB_ex2,{Fishing3 <- as.numeric(SI_STATE3=="f")
                           fishEff3 <- Fishing3 * EFFORT 
                   })

# calculate difference in effort per ping
TBB_ex2 <- within(TBB_ex2,{Fishdiff13 <- fishEff1 - fishEff3
                   })
summary(TBB_ex2$Fishdiff13)

# visualise it at the gridcell level
plotTools(TBB_ex2,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="Fishdiff13"))
title("fishing effort fixed minus tacsatActivity vessel")

colSums(TBB_ex2[c("fishEff1","fishEff2","fishEff3")],na.rm=T)

So here we get even less total effort, 34,013 fishing hours, and the differences seem to be concentrated in a few hotspots above the Frisian islands.


Effort is an important indicator, especially in input based management. Other indicators that can be important for management are landings and value of landings.

Let's allocate the landings and value of landings on fishing pings. First for the fixed speed threshold.

# calculate  total landings and value and fishing effort

  TBB <- within(TBB, {LE_GEAR <- LE_GEARorig                # set the gear back to original gears to match with eflalo 
                    SI_STATE <- ifelse(SI_STATE=="f",1,0)   # SI_STATE needs to be 0s and 1s here, based on the fixed speed thresholds
                    })
  # first allocate the landings based on the activity based on fixed speed thresholds
  TBBfxsp_catch      <- splitAmongPings(tacsat=TBB,
                            eflalo=subset(eflalo,LE_GEAR == "TBB" & substring(LE_CDAT,7) == "1801"),
                            variable="all",level="day",conserve=T)

Do not worry if you get the following error message: Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables. As long as the function runs for the other levels of aggregation, it works.

Then do the same with the activity per mesh size category calculated with activityTacsat

  TBB2 <- within(TBB,{SI_STATE <- ifelse(SI_STATE2=="f",1,0)}) # SI_STATE needs to be 0s and 1s here, based on the activityTacsat speed thresholds

  TBBmsz_catch      <- splitAmongPings(tacsat=TBB2,
                                  eflalo=subset(eflalo,LE_GEAR == "TBB" & substring(LE_CDAT,7) == "1801"),
                                  variable="all",level="day",conserve=T)

We then put the two data sets together, identifying the different records with the variable "method".


  TBBcatch <- rbind(cbind(TBBfxsp_catch,method="fixedSpeed_gear"),
                    cbind(TBBmsz_catch, method="activityTacsat_msz"))

We aggregate the landings and value of all the species to look at totals only


  # Calculate total landings
  TBBcatch$LE_KG_TOT  <- rowSums(TBBcatch[c(grep("KG",colnames(TBBcatch)))],na.rm=T)
  TBBcatch$LE_EURO_TOT  <- rowSums(TBBcatch[c(grep("EURO",colnames(TBBcatch)))],na.rm=T)

We also recalculate fishing effort

  # Calculate fishing effort
  TBBcatch <- within(TBBcatch,FishEff <- EFFORT * SI_STATE)

and look at our 3 indicators at the same time

aggregate(TBBcatch[c("LE_KG_TOT","LE_EURO_TOT","FishEff")],by=list(TBBcatch$method),FUN=sum,na.rm=T)
#             Group.1 LE_KG_TOT LE_EURO_TOT  FishEff
#1 activityTacsat_msz   4048973    16734123 36232.52
#2    fixedSpeed_gear   4048973    16734123 42260.10

We see again the difference in total fishing effort. The total landings in kg and euro are the same. This makes sense, as they are allocated proportionally to the fishing effort of each method. Given that less fishing effort is identified with method=="activityTacsat_msz", the landings and value allocated per ping are therefore expected to be higher.

aggregate(TBBcatch[c("LE_KG_TOT","LE_EURO_TOT")],by=list(SI_STATE=TBBcatch$SI_STATE,method=TBBcatch$method),FUN=mean)

Let's check whether there are differences when we select a specific area. We select an area along the coast and estimate the differences in indicators there.

# draw area
coastal <- lonLat2SpatialPolygons(lst=list(data.frame(SI_LONG=3.6+c(-0.5,0.2,0.4,-0.3,-0.5),
                                       SI_LATI=51.7+c(-0.25,0.25,0.25,-0.25,-0.25))))
plotTools(TBB,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="Fishdiff"))
title("fishing effort fixed minus tacsatActivity")
plot(coastal,border="black",add=T)

# calculate the indicators in that area
pings  <- SpatialPoints(TBBcatch[c('SI_LONG','SI_LATI')],proj4string=CRS("+proj=longlat +datum=WGS84"))
proj4string(coastal) <- CRS("+proj=longlat +datum=WGS84") 

# then see which pings are in our coastal area
coastPings <- over(pings,coastal)

TBBcatch$coastal <- as.factor(!is.na(coastPings))
aggregate(TBBcatch[c("LE_KG_TOT","LE_EURO_TOT","FishEff")],by=list(method=TBBcatch$method,coastal=TBBcatch$coastal),FUN=sum,na.rm=T)
 

So in this area we see about 30% less effort when taking the mesh size into account, 1% less value and 1% more landings. This can be explained by the fact that each fishing ping is "worth" more in terms of landings with activityTacsat_msz than with fixedSpeed_gear.

aggregate(TBBcatch[c("LE_KG_TOT","LE_EURO_TOT")],by=list(SI_STATE=TBBcatch$SI_STATE,method=TBBcatch$method,coastal=TBBcatch$coastal),FUN=mean)


An area where special restrictions apply is the 12 mile zone: larger vessels (>225kW) are not allowed to fish within 12 nm of the coast.

Save the twelve mile zone polygon on your computer, load it and plot it

load("./output/twelveMile.RData")
X11()
plotTools(TBB,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="Fishdiff"))
title("fishing effort fixed minus tacsatActivity")
plot(twelveMile, add=T)

Given the proximity to the coast and the density of potential observers, it is unlikely that fishers would illegally fish in those areas. However, they will probably reduce speed when arriving close to the harbour, leading to "steaming" pings being classified as "fishing" pings.

Exercise 3 - implications for management

  1. Map the fishing effort in the twelve nm zone for the vessels not supposed to fish there (>225kW, use the variable "VE_KW" from eflalo), with the 2 activity estimates.
Show answer

TBB_ex3 <- TBBcatch
  # let's add engine power
TBB_ex3$VE_KW <- eflalo$VE_KW[match(TBB_ex3$FT_REF,eflalo$FT_REF)]

# select only those large vessels
TBB_ex3 <- subset(TBB_ex3,VE_KW>225)

add the 12 mile zone information to the data

# here you can use pointOnLand with "land" = your twelve mile zone area
idx <- pointOnLand(TBB_ex3,twelveMile,proj4string=CRS("+proj=longlat +ellps=WGS84"))

# the points in the 12 nm zone get a 1 while points outside the area get a 0
TBB_ex3$twMile <- idx

and plot the effort for each method

for (mt in unique(TBB_ex3$method)){
  X11()
  plotTools(subset(TBB_ex3,method==mt),level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="FishEff"))
  title(paste("fishing effort 12 nm zone",mt))
  plot(twelveMile,border="black",add=T)
}

So both methods performed quite poorly here, and a lot of effort is allocated to this area. How much exactly?

tapply(TBB_ex3$FishEff,list(TBB_ex3$method,TBB_ex3$twMile),FUN=sum,na.rm=T)

Between 600 and 700 fishing hours in the twelve mile zone.

  1. Quantify the landings and the value of landings in the twelve nm zone for those vessels.
Show answer

Here it is easy: you just use the data made earlier

tapply(TBB_ex3$LE_KG_TOT,list(TBB_ex3$method,TBB_ex3$twMile),FUN=sum,na.rm=T)
tapply(TBB_ex3$LE_EURO_TOT,list(TBB_ex3$method,TBB_ex3$twMile),FUN=sum,na.rm=T)

So depending on the activity method, between 85 and 88 tonnes of fish worth 320 to 325 thousand euro.

  1. How can you manually correct this classification error?
Show answer

Before allocating the landings to pings, you could force the status of the pings within the 12 mile zone for vessels >225kW to be "non-fishing". First add a twMile variable to the TBB tacsat (before adding landings).


TBB$twMile <- pointOnLand(TBB,twelveMile,proj4string=CRS("+proj=longlat +ellps=WGS84"))

Then get the engine power

TBB$VE_KW <- eflalo$VE_KW[match(TBB$FT_REF,eflalo$FT_REF)]

Setting SI_STATE to 0 (non-fishing) for the pings within the 12 mile zone for vessels >225kW then allows you to correct the effort. You can then apply splitAmongPings to get the corrected landings and values (not shown here).

TBB <- within(TBB, {SI_STATE <- replace(SI_STATE, VE_KW>225 & twMile==1,0)
                    Fishing1 <- replace(Fishing1, VE_KW>225 & twMile==1,0)
                    fishEff1 <- Fishing1 * EFFORT          # recalculate fishing effort
})
  
plotTools(subset(TBB,VE_KW>225),level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="fishEff1"))
title(paste("fishing effort Vessels >225kW 12 nm zone fixed speed"))
plot(twelveMile,border="black",add=T)

Hmmm, it looks like we still have some effort in the 12 mile zone. Let's check whether we actually do.

# look at the points for those vessels outside the 12 mile zone (to see if our overlay failed)
TBBsub <- subset(TBB,VE_KW>225&twMile==0)
points(TBBsub$SI_LONG,TBBsub$SI_LATI,pch=".",cex=1)# add actual VMS points
tapply(TBBsub$fishEff1,TBBsub$twMile,FUN=sum,na.rm=T)

So this looks good: no points in the twelve mile zone. The colour just comes from the way the legend of plotTools works: the lowest values (0s) get the lightest colour. By selecting only the positive values you can prevent plotting those 0s. You can also make better plots that use white for 0s, as shown in practical 9.

plotTools(subset(TBB,VE_KW>225 &fishEff1>0),level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="fishEff1"))
title(paste("fishing effort Vessels >225kW 12 nm zone fixed speed"))
plot(twelveMile,border="black",add=T)

Much better!

The case of passive gear fisheries

Using speed to define activity is particularly valid for active gears (with the uncertainties pointed out above). In the case of passive gears, we are instead interested in the soaking time. This can partly be inferred from the logbooks, but more research and methods still need to be developed for those fisheries. So the one thing we'll try to do is paint a picture of what a typical trip of a gillnet fisher looks like, and see whether that behaviour tells us something about soaking time. For that, we look at departure and arrival times.

data(eflalo)
#- Select only gillnets
GNS             <- subset(eflalo,LE_GEAR == "GNS")
#- Get a posix style for departure date and time
GNS$FT_LDATIM   <- as.POSIXct(paste(GNS$FT_LDAT,GNS$FT_LTIME),format="%d/%m/%Y %H:%M:%S")
GNS$FT_DDATIM   <- as.POSIXct(paste(GNS$FT_DDAT,GNS$FT_DTIME),format="%d/%m/%Y %H:%M:%S")
#- Calculate time outside of the harbour, explicitly in hours
GNS$outHarb     <- as.numeric(difftime(GNS$FT_LDATIM,GNS$FT_DDATIM,units="hours"))

barplot(table(GNS$outHarb)) #peak between 2 - 11h

GNSsub          <- subset(GNS,outHarb >= 2 & outHarb <= 11)
library(lubridate) # needed below for hour()
barplot(table(hour(GNSsub$FT_DDATIM))) #nice peak between 5-9 AM

barplot(table(hour(GNSsub$FT_LDATIM))) #a bit more mixed

#- So what does that tell us about soaking time?
#- Should we also check what happened the day before?

Catch distribution

As seen earlier in this and other practicals, catch is distributed using the splitAmongPings function. This function comes with the assumption that the kilos and euros of fish are distributed evenly among the pings that correspond to the eflalo level chosen: for example, if 500 kg are landed on a day with 10 fishing pings, each ping is allocated 50 kg.

Here we will play with the splitAmongPings function and the resolution of the eflalo level you choose to assign the landings and revenue (trip, day, ICES rectangle). Up to now we've allocated landings and value based on the highest resolution available (day). Let's check how changing this parameter affects the result of our indicators.

First let's prepare the data for splitAmongPings(); we'll choose another gear for a change.


# subset otter trawls, OTB
OTB <- subset(tacsatp,LE_GEAR == "OTB" & format(SI_DATIM,format='%Y') == 1801 & !is.na(SI_SP))

To apply splitAmongPings we need to have SI_STATE defined, so that the function knows which pings are fishing pings.

# define activity based on activityTacsat
storeScheme               <- expand.grid(years = unique(format(OTB$SI_DATIM,"%Y")), months = 0,weeks = 0,
                                           analyse.by = unique(OTB[,"LE_GEAR"]))
storeScheme$peaks         <- 5      
storeScheme$fixPeaks      <- FALSE  
storeScheme$sigma0        <- 0.911  

res           <- activityTacsat(OTB,units="year",analyse.by="LE_GEAR",storeScheme)

OTB$SI_STATE <- ifelse(res=="f",1,0)

Let's look a bit deeper at the function

# always good to have a look at the documentation
?splitAmongPings

Then apply the splitAmongPings function using level="day"

# distribute catch per day
 OTB_day      <- splitAmongPings(tacsat=OTB,
                                  eflalo=subset(eflalo,LE_GEAR == "OTB" & substring(LE_CDAT,7) == "1801"),
                                  variable="all",level="day")
  # Only keep total landings
OTB_day$LE_KG_TOT  <- rowSums(OTB_day[c(grep("KG",colnames(OTB_day)))],na.rm=T)
OTB_day$LE_EURO_TOT  <- rowSums(OTB_day[c(grep("EURO",colnames(OTB_day)))],na.rm=T)
OTB_day <- OTB_day[c(names(OTB),"LE_KG_TOT","LE_EURO_TOT")]

Again, do not worry about the error message Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables.

And look at our landings distribution (in kg and euro)

X11()
plotTools(OTB_day,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_KG_TOT"))
title("OTB - TOT KG - level='day'")
points(OTB_day$SI_LONG,OTB_day$SI_LATI,pch=".",cex=1)# add actual VMS points

X11()
plotTools(OTB_day,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_EURO_TOT"))
title("OTB - TOT EURO - level='day'")
points(OTB_day$SI_LONG,OTB_day$SI_LATI,pch=".",cex=1) # add actual VMS points

Then apply the splitAmongPings function using level="ICESrectangle"


# distribute catch per ICESrectangle
 OTB_rect      <- splitAmongPings(tacsat=OTB,
                                  eflalo=subset(eflalo,LE_GEAR == "OTB" & substring(LE_CDAT,7) == "1801"),
                                  variable="all",level="ICESrectangle")
  # Only keep total landings
OTB_rect$LE_KG_TOT  <- rowSums(OTB_rect[c(grep("KG",colnames(OTB_rect)))],na.rm=T)
OTB_rect$LE_EURO_TOT  <- rowSums(OTB_rect[c(grep("EURO",colnames(OTB_rect)))],na.rm=T)
OTB_rect <- OTB_rect[c(names(OTB),"LE_KG_TOT","LE_EURO_TOT")]

And look at that landings distribution (in kg and euro)

X11()
plotTools(OTB_rect,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_KG_TOT"))
title("OTB - TOT KG - level='ICESrectangle'")
points(OTB_rect$SI_LONG,OTB_rect$SI_LATI,pch=".",cex=1)


X11()
plotTools(OTB_rect,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_EURO_TOT"))
points(OTB_rect$SI_LONG,OTB_rect$SI_LATI,pch=".",cex=1)
title("OTB - TOT EURO - level='ICESrectangle'")

We can see some differences on the maps for the full fishery. Now let's try to look at a single trip.

# let's select a trip with many points
table(OTB$FT_REF)

Let's pick FT_REF==296880 and plot that trip for level="day"

tripOTBday <- subset(OTB_day,FT_REF==296880&LE_KG_TOT>0)
X11()
plotTools(tripOTBday,level="gridcell",xlim=c(3,7),ylim=c(53,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_KG_TOT"))
title("FT_REF==296880 - TOT KG - level='day'")
points(tripOTBday$SI_LONG,tripOTBday$SI_LATI,pch=".",cex=2)

sum(subset(OTB_day,FT_REF==296880)$LE_KG_TOT)

Here we see the total landings of 4004 kg for this trip allocated on the map.

Let's also plot that trip for level="ICESrectangle"

tripOTBrect <- subset(OTB_rect,FT_REF==296880&LE_KG_TOT>0)
X11()
plotTools(tripOTBrect,level="gridcell",xlim=c(3,7),ylim=c(53,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_KG_TOT"))
title("FT_REF==296880 - TOT KG - level='ICESrectangle'")
points(tripOTBrect$SI_LONG,tripOTBrect$SI_LATI,pch=".",cex=2)
sum(subset(OTB_rect,FT_REF==296880)$LE_KG_TOT)

Here too the total is 4004 kg. There can be substantial differences in the distribution at the level of a trip; note how the total catch is the same in both cases.

You can also try to run it with the lowest resolution (trip level), which would be the level available if you only had access to pay slips from the auctions and not the full logbook; a sketch follows below. So if you trust your data you should use the highest resolution possible. Changing this parameter can also give you an estimate of the uncertainty around your estimate, in case you doubt that the data provided in the logbooks are correct.
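
A minimal sketch of that lowest-resolution run (identical to the calls above except for the level argument):

 OTB_trip      <- splitAmongPings(tacsat=OTB,
                                  eflalo=subset(eflalo,LE_GEAR == "OTB" & substring(LE_CDAT,7) == "1801"),
                                  variable="all",level="trip")
 # then aggregate the KG/EURO columns and map as for OTB_day and OTB_rect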

Interpolation of VMS pings

One thing very visible on the maps at the trip level is the patchiness of the data: there are gaps between pings (black dots), and catch is only allocated to pixels containing an observation (one ping every 2 hours). Of course, when looking at an area with many observations (many vessels, many years, etc.) those gaps don't matter, as the sample is large enough for the estimates to be robust.

When looking at areas with few observations, it can be interesting to also play with the resolution of tacsat pings and increase the number of pings using interpolation (see Practical 5 for more information).

Basically, using the data on speed and direction, you try to reconstruct (interpolate) the track between 2 consecutive pings.

#Sort the Tacsat data
OTB     <- sortTacsat(OTB)

#Filter the data to keep only the fishing pings
range(subset(OTB,SI_STATE==1)$SI_SP)
#[1] 2.0 4.6

So with OTB, fishing occurs between 2 and 4.6 knots.

OTB     <- filterTacsat(OTB ,c(2,5),hd=NULL,remDup=T) # by selecting 2 to 5 we are sure to keep all our fishing pings

We then interpolate our data. If we want pings every 30 minutes, we will need 5 points within our current interval of 120 minutes (including the first and last points). To visualise the spacing:

1st point   2nd point   3rd point   4th point   5th point
0 min       30 min      60 min      90 min      120 min
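
A quick sanity check of that spacing in R:

seq(0,120,length.out=5) # 0 30 60 90 120 minutes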

First a quick look at the documentation to check what everything means for the interpolateTacsat() function.

?interpolateTacsat 

As a rule of thumb you ask for 10 times more points than you eventually need in your interpolated tacsat file, so here res=50. This is to ensure that all parts of the tracks are properly captured, both straight lines and curves.

interpolation <- interpolateTacsat(OTB,interval=120,margin=10,res=50,method="cHs",params=list(fm=0.5,distscale=20,
                                   sigline=0.2,st=c(2,6)),headingAdjustment=0)

You then transform the interpolated data into a tacsat file, using here npoints = 5.

# create dummies for some variables (due to a bug in newest version)
OTB$INTV <- 0;OTB$LE_SURF <- 0;OTB$LE_SUBSURF <- 0
OTBInt <- interpolation2Tacsat(interpolation,OTB,npoints=5)

and allocate the landings based on day level

 # speed has been reestimated
 OTBInt$SI_STATE  <- ifelse(OTBInt$SI_SP>=2&OTBInt$SI_SP<=4.6,1,0)   
 OTBInt_day      <- splitAmongPings(tacsat=OTBInt,
                                  eflalo=subset(eflalo,LE_GEAR == "OTB" & substring(LE_CDAT,7) == "1801"),
                                  variable="all",level="day")
  # Only keep total landings
OTBInt_day$LE_KG_TOT  <- rowSums(OTBInt_day[c(grep("KG",colnames(OTBInt_day)))],na.rm=T)
OTBInt_day$LE_EURO_TOT  <- rowSums(OTBInt_day[c(grep("EURO",colnames(OTBInt_day)))],na.rm=T)
OTBInt_day <- OTBInt_day[c(names(OTB),"LE_KG_TOT","LE_EURO_TOT")]


X11()
plotTools(OTBInt_day,level="gridcell",xlim=c(0,8),ylim=c(50,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_KG_TOT"))
title("OTB - interpolated - TOT KG - level='day'")
points(OTBInt_day$SI_LONG,OTBInt_day$SI_LATI,pch=".",cex=1)# add actual VMS points

tripOTBintday <- subset(OTBInt_day,FT_REF==296880&LE_KG_TOT>0)
X11()
plotTools(tripOTBintday,level="gridcell",xlim=c(3,7),ylim=c(53,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_KG_TOT"))
title("FT_REF==296880 - interpolated - TOT KG - level='day'")
points(tripOTBintday$SI_LONG,tripOTBintday$SI_LATI,pch=".",cex=2)

sum(subset(OTBInt_day,FT_REF==296880)$LE_KG_TOT)

Compare the resulting maps with the previous one. What difference do you see?

The patchiness has decreased and true fishing grounds are more visible.

Exercise 4

  1. Increase the resolution of the pings to every 15 min. What happens to the size of your data?
Show answer

To get points every 15 min, we need 9 points per 120 minute interval instead of the 2 points of the original data.

1st point   2nd point   3rd point   4th point   5th point   6th point   7th point   8th point   9th point
0 min       15 min      30 min      45 min      60 min      75 min      90 min      105 min     120 min
# first interpolate with 90 (9 x 10) points
interpolation_ex <- interpolateTacsat(OTB,interval=120,margin=10,res=90,method="cHs",params=list(fm=0.5,distscale=20,
                                   sigline=0.2,st=c(2,6)),headingAdjustment=0)
OTB_ex <- interpolation2Tacsat(interpolation_ex,OTB,npoints=9)

Look at the number of rows of the original data and the interpolated tacsat

nrow(OTB)
nrow(OTB_ex)

nrow(OTB_ex)/nrow(OTB)

So our data set is a bit more than 6 times larger than the original data.

  1. Look at the effect of changing the resolution for a single trip
Show answer

Let's look at the landings for our trip 296880.

 # speed has been reestimated
 OTB_ex$SI_STATE  <- ifelse(OTB_ex$SI_SP>=2&OTB_ex$SI_SP<=4.6,1,0)   
 OTB_ex_day      <- splitAmongPings(tacsat=OTB_ex,
                                  eflalo=subset(eflalo,LE_GEAR == "OTB" & substring(LE_CDAT,7) == "1801"),
                                  variable="all",level="day")
  # Only keep total landings
OTB_ex_day$LE_KG_TOT  <- rowSums(OTB_ex_day[c(grep("KG",colnames(OTB_ex_day)))],na.rm=T)

tripOTBexday <- subset(OTB_ex_day,FT_REF==296880&LE_KG_TOT>0)
X11()
plotTools(tripOTBexday,level="gridcell",xlim=c(3,7),ylim=c(53,56),zlim=NULL,log=F,gridcell=c(0.1,0.05),control.tacsat=list(clm="LE_KG_TOT"))
title("FT_REF==296880 - interpolated - TOT KG - level='day'")
points(tripOTBexday$SI_LONG,tripOTBexday$SI_LATI,pch=".",cex=2)

At this spatial resolution, decreasing the interval from 30 to 15 min doesn't seem to add anything for this gear (fishing at speeds between 2 and 5 kn).

Interpolation uncertainty (from practical 5)

Obviously, interpolation isn't without uncertainty either. Hence, tools have been developed to show and calculate this uncertainty and to allow us to do other analyses on the basis of those results. It should be stressed, however, that we are talking about uncertainty in the track of the interpolation, not in the VMS positions themselves. Remember that the VMS 'pings' aren't without error either (although it is probably negligible in the grand scheme of things).

From the interpolation and VMS points themselves, we can already distill a bit of guidance on how to construct 'confidence intervals'.

  1. Confidence at the VMS pings themselves equals 1
  2. The further away from either of these two points, the lower the certainty.
  3. The further away from the interpolated track, the lower the certainty (as we assume that the interpolation is the best possible track between the two pings).

We like to do these calculations on a grid again, as that allows us to plot the results too.

require(lattice)
data(tacsat)

#Sort the Tacsat data
tacsat     <- sortTacsat(tacsat)
tacsat     <- tacsat[1:1000,]

#Filter the Tacsat data
tacsat          <- filterTacsat(tacsat,c(2,6),hd=NULL,remDup=T)

#Interpolate the VMS data
interpolation <- interpolateTacsat(tacsat,interval=120,margin=10,
                                   res=100,method="cHs",
                                   params=list(fm=0.5,distscale=20,sigline=0.2,st=c(2,6)),
                                   headingAdjustment=0)

#Create the final grid where all interpolations should fit on
xrange        <- c(2.1,2.6); yrange <- c(51.3,51.65)
grid          <- createGrid(xrange,yrange,resx=0.001,resy=0.0005)
gridSP          <- createGrid(xrange,yrange,resx=0.001,resy=0.0005,type="SpatialGrid")
#Do some fancy stuff and calculate the Confidence Interval
X11()
res           <- calculateCI(interpolation[[4]],
                             tacsat[interpolation[[4]][1,],],
                             params=list(fm=0.25,distscale=3.1,sigline=0.4,st=c(2,6)),
                             grid=grid,
                             plot=TRUE,spatialGrid=gridSP)

#Scale all values between 0 and 1
res <- (res[[1]] - min(res[[1]])) / (max(res[[1]]) - min(res[[1]]))

levelplot(matrix(res,ncol=458,nrow=379)[,379:1])

That was the code to calculate just one confidence interval. But what if we want the confidence interval of a whole trip? That requires a short loop! Let's see how far we get.

data(tacsat)

#Sort the Tacsat data
tacsat     <- sortTacsat(tacsat)
tacsat     <- tacsat[1:1000,]

#Filter the Tacsat data
tacsat          <- filterTacsat(tacsat,c(2,6),hd=NULL,remDup=T)
#Interpolate the VMS data
interpolation <- interpolateTacsat(tacsat,interval=120,margin=10,
                                   res=100,method="cHs",
                                   params=list(fm=0.5,distscale=20,sigline=0.2,st=c(2,6)),
                                   headingAdjustment=0)

OK, the data have been prepared and interpolated; now define the grid

ranges        <- do.call(rbind,lapply(interpolation,function(x){return(apply(x[-1,],2,range))}))
xrange        <- range(ranges[,1]); yrange <- range(ranges[,2])
grid          <- createGrid(xrange,yrange,resx=0.01,resy=0.005)
sPDF          <- createGrid(xrange,yrange,resx=0.01,resy=0.005,type="SpatialGridDataFrame")

#Now I don't add zeros, but ones instead, because it's about the chance of NOT being trawled, which declines with the values of the CI
sPDF@data     <- data.frame(rep(1,nrow(sPDF@data)))
colnames(sPDF@data) <- "data"

I've created a data frame where I can store the chance that an area is untrawled, which obviously ranges between 0 and 1. I call this the sPDF (spatial data frame). Hereafter I need to fill the spatial data frame based on the calculations of the CI. The function returns the index of the data frame where data on the CI should be stored (not in every grid cell, but just a selection) and the value of the CI cells.

library(maps); library(mapdata); library(RColorBrewer)
#Start the loop
gridSP          <- createGrid(xrange,yrange,resx=0.01,resy=0.005,type="SpatialGrid")
for(iNt in 1:length(interpolation)){
  print(iNt)
  resCI <- calculateCI(interpolation[[iNt]],tacsat[interpolation[[iNt]][1,],],
                       params=list(fm=0.5,distscale=3.1,sigline=0.42,st=c(2,6)),grid=grid,
                       plot=F,spatialGrid=gridSP)

  idx       <- getGridIndex(coordinates(resCI),grid,all.inside=F)
  # some coordinates might be located outside of the grid; check and correct for that
  idxdat    <- which(is.na(idx)==F)
  sPDF@data$data[idx[idxdat]] <- (1-resCI@data$data[idxdat]) * sPDF@data$data[idx[idxdat]]
}#end for loop

As I am only interested in those areas where there is trawling, I simply use only the part where the chance of no trawling is smaller than 1. Thereafter I associate the value of the data with a colour, which makes it easier to plot afterwards.

#Select those areas where trawling intensity is smaller than 1
idxwithdata <- which(sPDF@data<1)

#Get the coordinates of these areas
coords      <- data.frame(coordinates(sPDF)[idxwithdata,])
coords$data <- sPDF@data[idxwithdata,]

#some nasty code here just to determine in which category (9 in total) the data belong; the category number directly becomes the index of the colour to use
coords$color<- apply(abs(outer(seq(0,1,length.out=9),coords$data,"-")),2,which.min)

#Define the ranges in longitude and latitude
xrange <- range(coords[,1]); yrange <- range(coords[,2])
X11()
map("worldHires",fill=T,col="darkgreen",xlim=xrange,ylim=c(yrange[1],54.6))
map.axes()
color <- rev(brewer.pal(9,"YlOrRd"))

Time to plot. A 'for' loop with a 'polygon' call is not the best way to do it, but it works for now.

for(i in 1:nrow(coords)){
  polygon(x=c(coords[i,1],coords[i,1]+0.01,coords[i,1]+0.01,coords[i,1]),
          y=c(coords[i,2],coords[i,2],coords[i,2]+0.005,coords[i,2]+0.005),
          col=color[coords[i,"color"]],lwd=1,border=NA)
}
#Some plotting occurred on top of the map, so let's add the map again (but do not overwrite it, hence the 'add=T' argument)
map("worldHires",fill=T,col="darkgreen",xlim=xrange,ylim=yrange,add=T)

That gives us a nice picture; let's see where the original VMS data points were located!

#compare to VMS points
x11()
map("worldHires",fill=T,col="darkgreen",xlim=xrange,ylim=c(yrange[1],54.6))
map.axes()
points(tacsat$SI_LONG,tacsat$SI_LATI,col=1,pch=19,cex=0.5)

That is pretty much all VMStools can help you with related to interpolations. Go ahead and have fun!

Temporal resolution of pings

The temporal resolution of VMS data is regularly pointed out as being insufficient. In most countries the ping frequency is once every two hours. It is of course possible to increase the frequency, but it is costly, and a trade-off analysis between the advantages and the costs of increasing the resolution and frequency should be made.

In vmstools it is possible to explore the added value of a high frequency system, as we actually have access to such a high frequency dataset. If time allows, you can have a look at the first part of practical 5.

You can then subset the VMShf dataset down to a 2 hour resolution, interpolate the data back to the original resolution, and compare the resulting effort with the original VMShf.
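
A minimal sketch of that workflow, assuming VMShf is a tacsat-style data.frame shipped with vmstools (the by=20 thinning step assumes one ping every 6 minutes; adjust it to the actual ping rate):

data(VMShf)                 # high frequency example data (assumed tacsat format)
VMShf   <- sortTacsat(VMShf)
# thin to roughly one ping every 2 hours per vessel
idx     <- unlist(lapply(split(seq_len(nrow(VMShf)),VMShf$VE_REF),
                         function(i){i[seq(1,length(i),by=20)]}))
VMS2h   <- VMShf[sort(idx),]
# interpolate VMS2h back to the native resolution (see interpolateTacsat above)
# and compare the gridded effort of VMS2h, its interpolation, and the original VMShf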
