# Workflow Example: examine a different geographic area

How to set up and run `dvmdostem` over a different geographic area.

For this example, we will select and set up the input driving data for running `dvmdostem` over a different geographic area.
In this case, we have the input data for a 10x10 km selection of pixels around Toolik Field Station, and we have been running one pixel, the default lower left corner, which is "tussock tundra" (community type 05; CMT05) in the IEM project. What we'd like to do now is find a pixel in a different part of the state that is also "tussock tundra" (CMT05) and prepare a dataset for that pixel so that we can run the same community type, but with a different climate.
Therefore, we have three tasks:
- Using the full spatial domain input data map, find a "tussock tundra" pixel to use for our experiment.
- Prepare a dataset with the selected pixel.
- Run the model over the pixel and see what differences there are between pixels of the same community type, but with different climates.
## Finding a new pixel
From field work, we have a spreadsheet with the lat/lon coordinates of several sites where the community type was classified on the ground. From field notes, we know that "Willow-birch" corresponds to IEM's "Tussock Tundra". Here are the sites, set up as a Python variable (a list of lists). Note that the longitudes are recorded as positive values even though the sites are in the western hemisphere; the plotting code below accounts for this:
```python
# Copied from the Excel sheet to a text editor and then used block editing
# to add commas, etc...
field_sites = [
["TT-2" , "Tussock tundra", 65.16225, 164.80856,],
["TT-3" , "Tussock tundra", 65.16626, 164.81894,],
["TT-4" , "Tussock tundra", 65.16720, 164.81679,],
["TT-5" , "Tussock tundra", 65.15893, 164.81047,],
["TT-WBT-4", "Tussock tundra, willow-birch tundra", 65.16048, 164.81447,],
["TT-WBT-5", "Tussock tundra, willow-birch tundra", 65.16603, 164.81821,],
["WBT-1" , "Willow-birch tundra", 65.16422, 164.83080,],
["WBT-3" , "Willow-birch tundra", 65.15993, 164.81975,],
["WBT-4" , "Willow-birch tundra", 65.15939, 164.82047,],
["WBT-5" , "Willow-birch tundra", 65.16524, 164.82913,],
["DSLT-1" , "Dwarf-shrub, lichen tundra", 65.16550, 164.82451,],
["DSLT-2" , "Dwarf-shrub, lichen tundra", 65.15924, 164.82753,],
["DSLT-3" , "Dwarf-shrub, lichen tundra", 65.16244, 164.83061,],
["DSLT-4" , "Dwarf-shrub, lichen tundra", 65.15875, 164.82368,],
["DSLT-5" , "Dwarf-shrub, lichen tundra", 65.16181, 164.82494,],
["AS-1" , "Alder shrubland", 65.16483, 164.82314,],
["AS-2" , "Alder shrubland", 65.15861, 164.82213,],
["AS-3" , "Alder shrubland", 65.16179, 164.83173,],
["AS-4" , "Alder shrubland", 65.16428, 164.82372,],
["AS-5" , "Alder shrubland", 65.15751, 164.82510,],
["NAMC-2" , "Non-acidic mountain complex", 65.16265, 164.82611,],
["NAMC-3" , "Non-acidic mountain complex", 65.16119, 164.82651,],
["NAMC-4" , "Non-acidic mountain complex", 65.15863, 164.82516,],
["NAMC-5" , "Non-acidic mountain complex", 65.15730, 164.82515,],
]
```
It would be really nice to get a visual on how these points relate to the IEM 1x1km pixel grid. Do all the points fall within one pixel? How close is the IEM vegetation map to agreeing with the classification of the field sites?
Note: There are many other ways to carry out the following process using completely different software. If you are familiar with GIS software such as QGIS or ArcGIS, that might be an easier path to follow.
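Before plotting anything, a quick back-of-the-envelope check gives a sense of whether the sites could plausibly fall within a single 1x1 km pixel. This is a minimal sketch using only the `field_sites` list above; the meters-per-degree factors are rough approximations, not a projection-accurate calculation.

```python
import numpy as np

# Uses the field_sites list defined above. Longitudes are recorded as
# positive degrees west, which does not matter for an extent calculation.
lats = np.array([s[2] for s in field_sites])
lons = np.array([s[3] for s in field_sites])

# Approximate meters per degree at this latitude (rough rule of thumb)
m_per_deg_lat = 111320.0
m_per_deg_lon = 111320.0 * np.cos(np.radians(lats.mean()))

print("N-S extent: %.0f m" % ((lats.max() - lats.min()) * m_per_deg_lat))
print("E-W extent: %.0f m" % ((lons.max() - lons.min()) * m_per_deg_lon))
```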
First I want to simply plot the field sites on a basemap and see where they are. This is accomplished (after some frustration) with the `mpl_toolkits.basemap` package:
```python
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# Uses the field_sites variable, defined earlier in the example.

# Resolution options: crude, low, intermediate, high, full (c, l, i, h, f)
# High resolution is slow...
m = Basemap(
    width=600000,
    height=437500,
    resolution='i',
    projection='laea',
    lon_0=-163.606, lat_0=65.56
)

m.drawcoastlines()
m.drawcountries()
m.fillcontinents(color='coral', lake_color='aqua', zorder=0)  # <-- zorder IS ESSENTIAL!!!!
m.drawmapboundary(fill_color='aqua')

# Convert from geographic coordinates to map coordinates (meters from origin?)
lats = [i[2] for i in field_sites]
lons = [-1.0*i[3] for i in field_sites]   # <-- "-165"
lons2 = 180+(180-(-1*np.array(lons)))     # <-- "195"
x, y = m(lons2, lats)

points = m.scatter(x, y, marker='o', color='k')

plt.show(block=False)
```
Note: Using `IPython.embed()` after the `plt.show()`, you should be able to run the remainder of the snippets in this example.
images/workflow-example-examine-different-area/field-points-zoomed-out.png
images/workflow-example-examine-different-area/field-points-zoomed-in.png
Next I want to look at the IEM vegetation map (from SNAP), just to get an idea of what the area looks like. The SNAP data is distributed as a `.tif` file, and due to the way color scaling works in OS X Preview, it is not very useful to look at the `.tif` directly:
images/workflow-example-examine-different-area/osx-preview-tiff.png
So my process is to convert the `.tif` to `.nc` and then use the `ncview` program to view the file:
```
$ gdalwarp -of netCDF -co WRITE_LONLAT=YES iem_ancillary_data/Landcover/LandCover_iem_TEM_2005.tif temporary-veg.nc
```
Using `ncview` is also nice because it has a bunch of features for looking at both the data from the `.tif` file and the geographic coordinates associated with the data.
```
$ ncview temporary-veg.nc
```
images/workflow-example-examine-different-area/ncview-iem-veg-map.png
Using `ncview`'s readout of both latitude and longitude as well as the pixel coordinates (i and j), we could carefully examine all the coordinates of the field sites and get an idea of where the sites are in relation to the IEM pixels. But that would be very tedious, so ideally we can find a way to overlay the IEM vegetation map pixels with the field sites. The first step is simply opening the vegetation map and displaying it:
```python
import netCDF4 as nc

vf = nc.Dataset("../snap-data/temporary-veg.nc", 'r')

plt.figure()
plt.imshow(vf.variables['Band1'][1000:1500, 50:550], origin='lower')
```
We have to specify `origin='lower'` because the default for `matplotlib` is to use the upper left corner as the (0,0) point, whereas `ncview` defaults to the lower left. Also, we've taken a crude slice of the data, just to zoom in on our area of interest a bit more. The values to use for the slice were chosen by a combination of trial and error and examining the (i,j) pixel coordinates with `ncview`.
To get the field site points overlaid on the image of the IEM vegetation map data, my first instinct was to use another call to `plt.imshow()`, but then I realized that I would need to build a regular rectangular grid of points and then color in only the pixels that contain the field-site (lat, lon) pairs. This is not totally straightforward, because the IEM vegetation map projection means that the longitude and latitude don't vary perfectly linearly across the map. You can see this by viewing the "lat" or "lon" variables in `temporary-veg.nc` using `ncview`.
Then I realized that the `mpl_toolkits.basemap` package has a facility for translating between latitude/longitude and "map coordinates" (I think in meters from the origin of the projected map?). So instead of plotting the field points on the vegetation image, I ended up plotting the IEM vegetation points on a Basemap along with the field points.
This method of plotting will draw one circle for every IEM vegetation pixel (1x1 km cell), and I've found that the `plt.scatter()` function can be slow with many points, so it makes sense to select a subset of the full IEM vegetation map. It was also necessary to re-work the longitude coordinates to fit the form expected by `mpl_toolkits.basemap` (from +/-180 degrees to 0-360 degrees). For a subsequent step, it was also helpful to "flatten" the 2-D arrays returned by slicing the data from the netCDF file.
```python
# Slice out a 75x75 pixel chunk and flatten it to 1-D; the longitudes are
# converted from +/-180 degrees to 0-360 degrees for Basemap.
LATS = vf.variables['lat'][1200:1275,200:275].reshape(5625,)
LONS = 180+(180-(-1*np.array(vf.variables['lon'][1200:1275,200:275].reshape(5625,))))
```
Then we can find our X, Y coordinate vectors using the `Basemap()` instance we created above:
```python
X, Y = m(LONS, LATS)
```
Finally, we can actually plot the points on the map, assigning a different color for each vegetation class.
```python
veg_values = vf.variables['Band1'][1200:1275, 200:275].reshape(5625,)

color_map = ['r', 'orange', 'y', 'g', 'c', 'b', 'm', 'k', 'w']
color_vec = [color_map[i] for i in veg_values]

veg_legend = [
    "BLANK",
    "black spruce forest",
    "white spruce forest",
    "deciduous forest",
    "shrub tundra",
    "tussock tundra",
    "wet sedge tundra",
    "heath tundra",
    "maritime forest"
]
color_label_vec = [veg_legend[i] for i in veg_values]

X, Y = m(LONS, LATS)
points = m.scatter(X, Y, marker='o', color=color_vec)

# Plot one empty dummy series per class so that plt.legend() gets a labeled
# color swatch for each vegetation type.
for i in range(len(veg_legend)):
    plt.scatter([], [], marker='o', color=color_map[i], label=veg_legend[i])

plt.legend()
plt.show()
```
images/workflow-example-examine-different-area/veg-map-field-points-zoomed-out.png
Zooming in, we can find the field points in the lower part of the grid of IEM vegetation map points. All the field points fall within a roughly 1x1km area and we can see that the IEM veg map points are on a regular grid. Also, while the field points cover 6 vegetation types, the pixels near the field points are only 'shrub tundra'. The nearest 'tussock tundra' pixel is about 4km to the east.
images/workflow-example-examine-different-area/veg-map-field-points-zoomed-in.png
Note: I am not sure whether the IEM vegetation points plotted using this method represent the centers of the 1x1 km pixels or the upper left corners! To figure this out would require digging a bit into the metadata for the original IEM `.tif` file and looking carefully at how the GDAL utilities handle the conversion from `.tif` to netCDF.
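One place to start untangling that question would be the GeoTransform of the original GeoTIFF. This is a minimal sketch, assuming the GDAL Python bindings (`osgeo.gdal`) are available and using the `.tif` path from the `gdalwarp` command above; by GDAL convention the GeoTransform origin refers to the outer corner of the upper-left pixel, so pixel centers are offset by half a pixel.

```python
from osgeo import gdal

ds = gdal.Open("iem_ancillary_data/Landcover/LandCover_iem_TEM_2005.tif")
gt = ds.GetGeoTransform()  # (x_origin, x_pixel_size, 0, y_origin, 0, y_pixel_size)

# Projected coordinates of the center of the upper-left pixel
col, row = 0, 0
x_center = gt[0] + (col + 0.5) * gt[1]
y_center = gt[3] + (row + 0.5) * gt[5]
print("Upper-left pixel center (projected coords): %f, %f" % (x_center, y_center))
```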
One way to find the (i,j) coordinates of a tussock tundra pixel is to run the plotting commands again, experimenting with the slice of data to grab from the IEM vegetation map file. For example, using the slice `1205:1215, 228:238`, we can easily count the offset to the closest 'tussock tundra' pixel in the IEM vegetation map. We can then use this slice (or these pixel coordinates) to create a sample 10x10 km input dataset for `dvmdostem`, using the `create_region_input.py` script.
images/workflow-example-examine-different-area/cropped-veg-map-field-points-overlay-zoomed-in.png
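The nearest tussock tundra pixel could also be located without counting by eye. The sketch below searches the same slice of the vegetation map for the closest pixel whose class code is 5; it assumes the raster codes match the legend indices above (tussock tundra = 5), and again uses degrees as a crude distance measure.

```python
import numpy as np
import netCDF4 as nc

vf = nc.Dataset("../snap-data/temporary-veg.nc", 'r')
veg = vf.variables['Band1'][1200:1275, 200:275]
lat = vf.variables['lat'][1200:1275, 200:275]
lon = vf.variables['lon'][1200:1275, 200:275]

site_lat, site_lon = 65.16225, -164.80856   # TT-2

tt = np.argwhere(veg == 5)   # (row, col) pairs of tussock tundra pixels
d2 = [(lat[r, c] - site_lat)**2 + (lon[r, c] - site_lon)**2 for r, c in tt]
r, c = tt[np.argmin(d2)]
print("Closest tussock tundra pixel in the full map: row=%d, col=%d" % (1200 + r, 200 + c))
```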
## Preparing the dataset
With the slice selected, we can now move on to preparing the dataset. This is fairly straightforward using the `create_region_input.py` script. There are a few small idiosyncrasies with the script, but with those addressed, it is not difficult. Two of the odd points are: on some computers it won't create datasets smaller than 10x10 pixels (some kind of GDAL error), and the default location where the prepared datasets end up is usually not quite right, so you have to manually move the files into their final home.
With some trial and error, I discovered that the `gdal_translate` call that is used to do the subsetting in the `create_region_input.py` script does its referencing from the upper left corner, while `ncview` and `matplotlib` (due to passing `origin='lower'` to `plt.imshow()`) use the lower left corner for the (0,0) point. So to find the offset we need, we have to convert. For this it is handy to know the pixel dimensions of the full map:
```
$ gdalinfo ../snap-data/temporary-veg.nc
Driver: netCDF/Network Common Data Format
Files: ../snap-data/temporary-veg.nc
Size is 2560, 1850
Coordinate System is:
PROJCS["Albers_Conic_Equal_Area",
...
```
Then we can convert our offset from the lower left corner to an offset from the upper left corner:
1850 - 1215 = 635
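In code form, that conversion is just (a trivial sketch, using the sizes reported by `gdalinfo` above):

```python
# Convert a row index counted from the lower-left corner (ncview, matplotlib
# with origin='lower') to the y-offset that gdal_translate and
# create_region_input.py expect, which is counted from the upper-left corner.
nrows = 1850              # "Size is 2560, 1850" from gdalinfo: (xsize, ysize)
row_from_bottom = 1215    # top edge of the slice chosen above
yoff_from_top = nrows - row_from_bottom
print(yoff_from_top)      # 635
```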
Now we can use the input-creation script. For prototyping we will ask for just 10 years because it is faster. Actually, for preparing the vegetation dataset the number of years is not important, as that file does not have a time dimension, but when we are working with the climate (or fire) inputs, preparing all the years can take a bit of time.
```
$ ./scripts/create_region_input.py --tifs ../snap-data/ --tag SewardPen --years 10 --yoff 635 --xoff 228 --xsize 10 --ysize 10 --which veg
Parsing command line arguments
args: Namespace(crtf_only=False, outdir='some-dvmdostem-inputs', start_year=0, tag='SewardPen', tifs='../snap-data/', which='veg', xoff='228', xsize=10, years=10, yoff='635', ysize=10)
Will be looking for files in: ../snap-data/
Will be (over)writing files to: some-dvmdostem-inputs/SewardPen_10x10
Creating a vegetation classification file, 10 by 10 pixels. Fill with random data?: None
Input file size is 2560, 1850
0...10...20...30...40...50...60...70...80...90...100 - done.
DONE
```
This creates a file in `some-dvmdostem-inputs/SewardPen_10x10` that we can then compare with our plotted dots to make sure that we have the offsets specified correctly:
images/workflow-example-examine-different-area/cropped-veg-map-field-points-overlay-zoomed-in.png
images/workflow-example-examine-different-area/ncview-cropped-veg-map.png
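A more direct comparison is also possible: open the newly cropped vegetation file and print it next to the corresponding slice of the full map. This is a sketch only; it assumes the new file is named `vegetation.nc` with a variable called `veg_class` (check with `ncdump -h` if your copy differs), and depending on how row order is handled during the crop you may need the flipped version for the two arrays to line up.

```python
import numpy as np
import netCDF4 as nc

new = nc.Dataset("some-dvmdostem-inputs/SewardPen_10x10/vegetation.nc", 'r')
full = nc.Dataset("../snap-data/temporary-veg.nc", 'r')

cropped = np.array(new.variables['veg_class'][:])
big_slice = np.array(full.variables['Band1'][1205:1215, 228:238])

print(cropped)
print(big_slice)
print(big_slice[::-1, :])   # in case the row order differs between the files
```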
Finally, with this done, we can create the entire input dataset.
```
$ ./scripts/create_region_input.py --tifs ../snap-data/ --tag SewardPen --years -1 --yoff 635 --xoff 228 --xsize 10 --ysize 10 --which all
```
This takes about 30 minutes on a laptop; the script has not been parallelized, and could probably be made much more efficient.
The first time through this, I discovered a few small glitches with the `create_region_input.py` script that I think I have fixed and pushed upstream (Oct 2016).
## Run model over new pixel
Now that we have a complete input dataset with a "tussock tundra" (CMT05) pixel for the Seward Peninsula, we can run the model and see how things look. However, there are two small things we need to do first:
- Set `run-mask.nc` so that the correct pixel is run.
- Set up the parameter file with the latest "PEST-discovered optimal values".
### Modifying the run-mask.nc file
Looking at the (i,j) pixel coordinates with `ncview`, we find that pixel i=6, j=4 is the closest CMT05 pixel to our field sites. Knowing that, we can open the `run-mask.nc` file and edit it with a little Python:
```python
import netCDF4 as nc

mask = nc.Dataset("DATA/SewardPen_10x10/run-mask.nc", 'a')

# turn off the default pixel
mask.variables['run'][0,0] = 0

# turn on the right pixel
mask.variables['run'][4,6] = 1

mask.close()
```
Now, just to check that we haven't transposed the coordinates, we verify with `ncview` that we enabled the correct pixel and turned off the origin pixel:
images/workflow-example-examine-different-area/ncview-new-mask.png
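As an alternative to eyeballing it in `ncview`, a quick programmatic check (a small sketch using the same file and variable as above) lists the indices of every enabled pixel; there should be exactly one, at (4, 6).

```python
import numpy as np
import netCDF4 as nc

mask = nc.Dataset("DATA/SewardPen_10x10/run-mask.nc", 'r')
print(np.argwhere(np.array(mask.variables['run'][:]) == 1))
```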
### Setting up the parameters
For this example, I want to compare the model's performance at two different geographic locations that have the same ecosystem (community) type, and I want to run the model using the most recent PEST-discovered optimal parameters. I could manually find the parameter values in the PEST output file (tussock_full.rec) and then manually enter those values into `dvmdostem`'s `cmt_calparbgc.txt` file, but that sounds tedious and error prone. So instead I cobbled together the following workflow to get the PEST parameters into place.
First, run this command, which selects the values from the PEST outputs and gets them close to the format we want:
```
$ grep -A 114 Optimal /big_scratch/tem/oct-10-pestpp-posterior-results/master-00000/tussock_full.rec | tail -n+5 | paste -d " " - cmt_calparbgc.par > cmt_calparbgc_optimal.par
$ head cmt_calparbgc_optimal.par
------------ ------------ double point
cmax0 1 cmax0 2.5 1.0 0.0
cmax1 59.7244 cmax1 60.40 1.0 0.0
cmax2 58.7601 cmax2 50.7 1.0 0.0
cmax3 151.063 cmax3 150.6 1.0 0.0
cmax4 47.5764 cmax4 50.40 1.0 0.0
cmax5 92.6898 cmax5 90.1 1.0 0.0
cmax6 70.265 cmax6 70.7 1.0 0.0
cmax7 87.9704 cmax7 90.3 1.0 0.0
nmax0 1.84077 nmax0 0.9 1.0 0.0
```
Then notice that this file has some extra dashes in the first line and two extra columns, so use a text editor with block select mode to remove the extra columns and fix the first line so that the file looks like this:
```
$ head cmt_calparbgc_optimal.par
double point
cmax0 1 1.0 0.0
cmax1 59.7244 1.0 0.0
cmax2 58.7601 1.0 0.0
cmax3 151.063 1.0 0.0
cmax4 47.5764 1.0 0.0
cmax5 92.6898 1.0 0.0
cmax6 70.265 1.0 0.0
cmax7 87.9704 1.0 0.0
nmax0 1.84077 1.0 0.0
```
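If that manual editing step gets tedious or error prone, the same `.par` file could be produced with a small script instead. This is only a sketch: it assumes the "Optimal" block of the `.rec` file is laid out as two whitespace-separated columns (name, then value) as in the `grep` output above, and that each parameter line of the `.par` file is `name value scale offset` as in the listing above; adjust the paths to wherever your copies of `tussock_full.rec` and `cmt_calparbgc.par` live.

```python
import re

rec_file = "tussock_full.rec"            # PEST record file (local copy)
template_par = "cmt_calparbgc.par"       # original .par file
out_par = "cmt_calparbgc_optimal.par"

# Pull the optimal values out of the PEST .rec file
optimal = {}
with open(rec_file) as f:
    lines = f.readlines()
start = next(i for i, l in enumerate(lines) if "Optimal" in l)
for line in lines[start:start + 115]:
    parts = line.split()
    if len(parts) == 2 and re.match(r'^[A-Za-z]', parts[0]):
        optimal[parts[0]] = parts[1]

# Rewrite the .par file, substituting the optimal value for each parameter
with open(template_par) as fin, open(out_par, 'w') as fout:
    for line in fin:
        parts = line.split()
        if len(parts) == 4 and parts[0] in optimal:
            fout.write("%s %s %s %s\n" % (parts[0], optimal[parts[0]], parts[2], parts[3]))
        else:
            fout.write(line)
```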
Finally, use the PEST utility `TEMPCHEK` to copy the values from this ".par" file into a new file that we can later re-use directly with `dvmdostem`.
Note: Might be good to start including this "pest optimal parameter file" in the bundles of output I've been putting on `atlas`.
```
$ tempchek cmt_calparbgc.tpl /big_scratch/tem/oct-10-pestpp-posterior-results/cmt_calparbgc_pestoptimal.txt cmt_calparbgc_optimal.par
```
### Running the code
Now, to run the code, change to the `dvmdostem` directory. I have chosen to run this experiment on my local laptop, so first I need to download the "pest optimal" parameter file that we just created (and stored on atlas) to my local computer:
```
$ scp atlas:/big_scratch/tem/oct-10-pestpp-posterior-results/cmt_calparbgc_pestoptimal.txt parameters/
$ mv parameters/cmt_calparbgc_pestoptimal.txt parameters/cmt_calparbgc.txt
```
I am not worried about overwriting the original `cmt_calparbgc.txt` file because I can always get it back by simply checking it out from git.
The next step is to modify the "IO" section of the `config.js` file so that the model will use the dataset we just created. Then, finally, run the model:
```
$ ./dvmdostem --inter-stage-pause -c -p 100 -e 1000 -s 250 -t 109 -n 100
```
And in a different terminal window at each inter-stage-pause:
```
$ tar -czf pr-data.tar.gz /tmp/dvmdostem
```
Finally, make a new directory on atlas where I can upload and share the results:
```
$ ssh atlas
$ mkdir /big_scratch/tem/oct28-pestpp-sewardpen-results
$ logout
```
And upload the files:
```
$ scp pr-data.tar.gz atlas:/big_scratch/tem/oct28-pestpp-sewardpen-results
$ scp eq-data.tar.gz atlas:/big_scratch/tem/oct28-pestpp-sewardpen-results
$ scp sp-data.tar.gz atlas:/big_scratch/tem/oct28-pestpp-sewardpen-results
$ scp tr-data.tar.gz atlas:/big_scratch/tem/oct28-pestpp-sewardpen-results
$ scp sc-data.tar.gz atlas:/big_scratch/tem/oct28-pestpp-sewardpen-results
```
Then, on my local machine, I can take a quick look by moving the data files into their own directory and running the `generate-html-viewer.py` script. I already have the plots generated for the October 10th run using the PEST optimal parameters and the default Toolik tussock tundra site.
```
$ mv pr-data.tar.gz ~/Downloads/oct28-seward-pen-cmt05-pest-optimal-params/
$ mv eq-data.tar.gz ~/Downloads/oct28-seward-pen-cmt05-pest-optimal-params/
$ mv sp-data.tar.gz ~/Downloads/oct28-seward-pen-cmt05-pest-optimal-params/
$ mv tr-data.tar.gz ~/Downloads/oct28-seward-pen-cmt05-pest-optimal-params/
$ mv sc-data.tar.gz ~/Downloads/oct28-seward-pen-cmt05-pest-optimal-params/
$ cd ~/Downloads/oct28-seward-pen-cmt05-pest-optimal-params/
$ ~/Documents/SEL/dvm-dos-tem/calibration/calibration-viewer.py --bulk \
    --save-name SewPen-cmt05-pop --from-archive pr-data.tar.gz \
    --archive-series eq-data.tar.gz sp-data.tar.gz tr-data.tar.gz sc-data.tar.gz
$ cd ~/Downloads
$ ~/Documents/SEL/dvm-dos-tem/scripts/generate-html-viewer.py --left oct-10-pestpp-posterior-results/ --center oct28-seward-pen-cmt05-pest-optimal-params/
$ open -a Safari.app three-view.html
```
Here are a few plots to glance at. To look at the rest, you can download the files from `atlas` and run the `calibration-viewer.py` and `generate-html-viewer.py` scripts.
images/workflow-example-examine-different-area/compare-env.png
images/workflow-example-examine-different-area/compare-ncycle-pft0.png
images/workflow-example-examine-different-area/compare-ncycle-pft2.png
images/workflow-example-examine-different-area/compare-soil-pft1.png
images/workflow-example-examine-different-area/compare-vegetation-pft1.png