Strainmeter offsets: real or spurious - janeshdev/ggplot2 GitHub Wiki

Strainmeter offsets: real or spurious?

Andrew Barbour

Scripps Institution of Oceanography

Motivation

As a geophysics graduate student, I study data from instruments that measure strain (ratio of length change to original length) in the Earth’s crust (where most earthquakes happen); they are installed along the Pacific and North American tectonic plate boundary.

Since the installation of these instruments, some large earthquakes (near and far) have caused the raw data to contain static offsets at the time of the quake. From first-order modeling of the crust, we expect to see an offset, depending mainly on earthquake-magnitude and -distance, but often the magnitude of the expected signal is well below the instrumental precision (0.1 nano-strain). Our observations of offsets are often orders of magnitude larger than expected, and frequently have incorrect sign, as the case study below demonstrates.

Code

This code is intended to provide a means to thoroughly characterize all the observed offsets for a single earthquake so I can inter-compare earthquakes, stations, or channels; this facilitates identification or similarities, differences, and/or trends in my observations. The theoretical calculations are not done in R; rather, they are imported. This was my first attempt at using sqldf. I like the syntax very much and plan to use it frequently. It made joining datasets tractable and efficient.

##
## shell script to access the R code for multiple datasets; hence, take a 
## command line input of the event name (for labeling):
##

event<-commandArgs(6)
parts<-unlist(strsplit(event,' '))
date<-parts[1]
mag<-parts[2]
event<-parts[3]
eventdf<-data.frame(eq=event)
title<-paste('during',date,mag,event)

##
##Initialize figure and parameters:
##

pdf("Rplots.pdf", width=16, height=8)
par(mfcol=c(2,4))
fontpar<-par(ps=16)

##
##Start dealing with data sets:
##

www <- "http://kook.ucsd.edu/pub/offset.values"
offsetdf <- read.table(www,h=F)
data <- "http://kook.ucsd.edu/pub/elmayor.txt"
thydf=read.table(data,header=TRUE)
thy<-thydf$theory
stafact=factor(thydf$sta)
azim<-thydf$azim
ang=numeric(length(azim))

# strainmeters have an inherent directional ambiguity, so map the azimuth to a 0-180 scale
ang[azim>=0]=azim[azim>=0]
ang[azim<0]=abs(azim[azim<0]+180)
ang[azim>=180]=azim[azim>=180]-180

# find only positives
posind<-which(thy>0)
possta<-thydf$sta[posind]
poscha<-thydf$cha[posind]
posthy<-log10(thy[posind])+1
posang<-ang[posind]
posfil<-stafact[posind]
postit<-"Predicted strain orientation by station\nextensional offsets"
posxax<-"" #"extensional"
posdf <- data.frame(sta=possta,cha=poscha,thy=posthy,ang=posang,fill=posfil,title=postit,xlab=posxax)
histbr=seq(from=-2,to=4,by=0.1)
histxl=c(-2,4)
histyl=c(0,1.6)
# form histogram
poshist<-hist(posthy,main="Predicted strain at PBO BSMs\nextensional offsets",
                    freq=FALSE,br=histbr,xlab="log 0.1 nanostrain (0 = 0 dB, resolution limit)",
                    xlim=histxl,ylim=histyl)
lines(density(posthy),col="red")
text(4.0,1.60,title,pos=2)
text(4.0,1.39,"coseismic offset hist.",pos=2,font=3)
text(4.0,1.25,"gaussian-kernel density",pos=2,col="red",font=3)
fontpar

# find only negatives
negind=which(thy<0)
negsta<-thydf$sta[negind]
negcha<-thydf$cha[negind]
negthy=log10(abs(thy[negind]))+1
negang=ang[negind]
negfil<-stafact[negind]
negtit<-"\ncontractional offsets"
negxax<-"" #"contractional"
negdf <- data.frame(sta=negsta,cha=negcha,thy=negthy,ang=negang,fill=negfil,title=negtit,xlab=negxax)
# form histogram
neghist<-hist(negthy,main="contractional offsets",
                    freq=FALSE,br=histbr,xlab="log 0.1 nanostrain (0 = 0 dB, resolution limit)",
                    xlim=histxl,ylim=histyl)
lines(density(negthy),col="red")
fontpar

##
## Do some table joins to merge theoretical and observed offsets by station and event-name:
##

library(sqldf)
posres<-sqldf('
select  o.V1 as sta, 
        o.V2 as cha, 
        t.ang as ang, 
        t.fill as fill,
        t.thy as thy, 
        1e11*abs(o.V3) as obs,
        o.V3 as rawobs
from    eventdf e, offsetdf o, posdf t
where   o.V1=t.sta and o.V2=t.cha and o.V5=e.eq')
negres<-sqldf('
select  o.V1 as sta, 
        o.V2 as cha, 
        t.ang as ang, 
        t.fill as fill,
        t.thy as thy, 
        1e11*abs(o.V3) as obs,
        o.V3 as rawobs
from    eventdf e, offsetdf o, negdf t
where   o.V1=t.sta and o.V2=t.cha and o.V5=e.eq')

##
## Now start dealing with ggplot2:
##

library(ggplot2)

# opts to add to every panel
opt<-opts(
        panel.title=theme_text(face="bold")
        ,axis.title.x=theme_text()#face="bold")
        ,axis.title.y=theme_text(angle=90)
        ,panel.grid.major = theme_line(colour="gray")
        ,panel.background = theme_rect(colour=NA,fill=NA)
        ,background = theme_rect(colour=NA,fill=NA)
        ,plot.background = theme_blank()
        ) # end opts

# set radial breaks and labels
compass_breaks<-seq(from=0,to=179,by=30)
compass_labels<-c("NS","N30E","N60E","EW","N60W","N30W")

##
## Create function to plot theoretical values
##

plotthy <- function(ang,thy,fill,title,note,vp){
        # radials dont like negative vals, so add and adjust limits and breaks
        toadd<-2
        df=data.frame(ang=ang,thy=thy+toadd,fill=fill)
        jit<-position_jitter(width=0.2, height=0.)
        id<-position_identity()
        labdf=data.frame(ang=180,thy=6,lab=note)
        resdf=data.frame(ang=c(0,180),thy=c(2,2),fill="red")
        p<-ggplot(df,aes(ang,thy,fill=fill),legend=FALSE)
        print(  p+
                # resolution limits
                geom_area(data=resdf,aes(x=ang,y=thy,alpha=0.6))+
                # and theoretical
                geom_jitter(position=jit,colour=alpha("black",0.5),legend=FALSE,size=3
                )+
                scale_x_continuous(
                        limits=c(0,180), breaks=compass_breaks, labels=compass_labels
                )+
                scale_y_continuous(
                        # assume y comes in as log10(0.1 nanostrain)
                        limits=c(-2,4.5)+toadd,
                        breaks=0:4+toadd,
                        labels=c(0.1,expression("1 n"*epsilon),10,100,expression("1"~mu*epsilon))
                )+
                # annotation (wish this were easier)
                geom_text(aes(ang,thy,fill="black",label=lab,size=4),data=labdf)+
                #polar coordinate transformation
                coord_polar(theta="x",start=0)+
                # axis labels
                labs(x="",y="")+
                # plot options
                opt+opts(title=title,legend.position="none")
                ,vp=vp) # end print (to device)
} # end plotthy function

# run the plotthy function twice (pos and neg)
vp<-viewport(height=0.5+0.05,width=c(1/4)+0.05,x=unit(5.8,"in"),y=unit(5.85,"in"))
plotthy(posang,posthy,posfil,postit,posxax,vp)
vp<-viewport(height=0.5+0.05,width=c(1/4)+0.05,x=unit(5.8,"in"),y=unit(1.90,"in"))
plotthy(negang,negthy,negfil,negtit,negxax,vp)

##
## Create function to plot observed values
##

plotobs <- function(dataframe,title,note,vp){
        # radials dont like negative vals, so add and adjust limits and breaks
        toadd<-1
        df<-dataframe
        #summary(df)
        jit<-position_jitter(width=0.2, height=0.)
        id<-position_identity()
        labdf=data.frame(ang=180,thy=6,lab=note)
        resdf=data.frame(ang=c(0,180),obs=c(1,1))
        p<-ggplot(df,aes(ang,obs))
        print(  p+
                # resolution limitsw
                geom_area(data=resdf,aes(
                        x=ang,y=log10(obs)+1
                        ,fill="red",alpha=0.6
                        )
                )+
                # bars, by stations
                geom_jitter(data=df,position=jit,aes(
                        x=ang,y=log10(obs)
                        ,size=3,alpha=0.6
                        ,shape=factor(sign(rawobs))
                        )
                )+
                scale_x_continuous(
                        limits=c(0,180), breaks=compass_breaks, labels=compass_labels
                )+
                scale_y_continuous(
                        # assume y comes in as log10(0.1 nanostrain)
                        limits=c(-2,4.5)+toadd,
                        breaks=0:4+toadd,
        labels=c(0.1,expression("1 n"*epsilon),10,100,expression("1"~mu*epsilon))
                )+
                # annotation
                #geom_text(aes(ang,thy,fill="black",shape=0,label=lab,size=4),data=labdf)+
                # polar coordinate transformation
                coord_polar(theta="x",start=0)+
                # axis labels
                labs(x="",y="")+
                # plot options
                opt+opts(title=title,legend.position="none")+
                # fix the shape scales
                #       agreement:        -1,0,1
                scale_shape_manual(value=c(1,2,16))
                #scale_shape_manual(value=c(15,17,16),aes(size=2))
                ,vp=vp) # end print (to device)
} # end plotobs function

# run the plotthy function twice (pos and neg)
label=c("absolute strain")
vp<-viewport(height=0.5+0.05,width=c(1/4)+0.05,x=unit(9.6,"in"),y=unit(5.85,"in"))
plotobs(posres,c("Observed BSM offset\nat extensional stations"),label,vp)
vp<-viewport(height=0.5+0.05,width=c(1/4)+0.05,x=unit(9.6,"in"),y=unit(1.90,"in"))
plotobs(negres,c("\nat contractional stations"),label,vp)

##
## Create function to plot residuals (observations less theory)
## on log scale, this corresponds to the ratio between the two quantities
##

plotresid <- function(dataframe,title,vp){
        dfna<-dataframe
        # filter and process
        na.omit(dfna)
        df=subset(dfna,obs>0)
        df$logabs<-log10(abs(1e9*df$rawobs))
        df$facet<-factor(sign(df$rawobs))
        print(levels(df$facet))
        levels(df$facet)<-c("Opposite sign","Predicted sign")
        # remove trend from theory and obs
        detrend<-lm(logabs~0+thy,data=df)
        df$resid<-resid(detrend)
        jit<-position_jitter(width=0.2, height=0.)
        id<-position_identity()

        p<-ggplot(df,
                aes(
                        thy, resid,
                        #thy, logabs, #,(abs(obs-10**thy)),
                        shape=factor(sign(rawobs)),
                        group=factor(sign(rawobs))
                ) # end aes
        ) # end ggplot

        print(  p+
                scale_y_continuous(
                          limits=c(-2.25,2.25),
                          breaks=-2:2,
                          labels=c("1/100","1/10",1,10,100),
                          name="magnitude ratio: obs / thy"
                )+
                scale_x_continuous(
                          limits=c(-1.3,3.5),
                          breaks=-1:3,
                          labels=c(0.1,expression("1 n"*epsilon),10,100,expression("1"~mu*epsilon)),
                          name="predicted offset"
                )+
                geom_abline(intercept=0,slope=0)+
                geom_point(aes(alpha=0.45),size=4,stat="identity"
                )+
                stat_smooth(method="loess",span=1.8,fullrange=TRUE,se=FALSE)+
                scale_shape_manual(
                        value=c(1,16)
                )+
                coord_equal(ratio=.85)+
                facet_grid(facet ~ .)+
                opt+opts(
                          plot=theme_rect(),
                          panel.background=theme_rect(),
                          panel.grid.major=theme_blank(),
                          panel.grid.minor=theme_blank(),
                          axis.title.x=theme_text(size=11),
                          axis.title.y=theme_text(size=11,angle=90),
                          title=title,
                          legend.position="none"
                )
        ,vp=vp) # end print
}

## plot the residuals one
##   at this point only need once because symbol encodes sign difference

resdf<-rbind(posres,negres)
vp<-viewport(height=0.7,width=0.25,x=unit(13.4,"in"),y=unit(4,"in"))
plotresid(resdf,"Observed versus predicted strains\nall stations",vp)

Case Study: the M7.2 Sierra El Mayor earthquake of April 4th, 2010

This earthquake occurred on Easter Sunday, 2010, in a predominately agricultural region of northern Baja California; fortunately
the region is not as densely populated as, say, Tijuana and the resulting casualties were less than 5 people. You can read more about
the earthquake from the USGS official report.

The command line to produce this plot (if the R code is in ‘radial.R’) is

Rscript radial.R "2010:094 M7.2 MAYOR"

Here’s the result, after ghostscript conversion (to 32-bit alpha png):

and the vector output: elmayor.pdf

Analysis

Projects such as this (multi-faceted data) often suffer from data visualization. The tool I’ve developed here allows immediate visualization of a number of different aspects to this data. The histograms and kernel-density curves show that offsets we might be able to see are often below the resolution limitation of the instrument; the radial plots echo this with the red filled-region. In the “observed…” radial plot, we see two symbol types: a filled circle indicating the sign of the offset agrees with theory, and an open circle indicating disagreement. As a final measure, the observations are de-trended by removing the predictions. As the far right panel shows, this reveals a wide range of differences (observations that are too large and too small).

It’s clear that most stations often disagree in sign and magnitude from predictions based on simple models of strain redistribution in a region from a large earthquake; the magnitude discrepancy may be up to 100 times greater, or up to 20 times smaller. Our models of predicted strain are sufficient approximations at great distances, so we conclude these instrument either behave unpredictably (causing spurious offsets), or highly localized geologic features control the strain measurement.

Data for El Mayor

The script should standalone; I’ve made the two datasets (observed and theoretical) available in my public directory kook.

⚠️ **GitHub.com Fallback** ⚠️