PreppingSpeciesData - PNHP/Regional_SDM GitHub Wiki

Background

This serves as a guide to examining, compiling, and preprocessing species data for use in our aquatic modeling approach. As stated in the introduction of this Wiki, the modeling units for this project are the NHD stream reaches (and associated catchments).

Output format

Ultimately, we are looking to produce a CSV file that shows the intersection between aquatic Element Occurrence Source Features and stream reaches. Currently (2017-05-23), the input format is a CSV file that looks like this:

`OBJECTID,EO_ID,ELCODE,SNAME,SCOMNAME,COMID,RA,EO_ID_ST
1,11104,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9050941,High,PA_11104
2,9809,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9052783,Medium,PA_9809
3,4659,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9053475,Medium,PA_4659
4,4659,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9053485,Medium,PA_4659
5,4659,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9053387,Medium,PA_4659
... `

See the GIS methods below for additional guidance on creating this file.
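As a quick sanity check on a file in this format, the following minimal sketch reads the rows with Python's standard `csv` module, verifies the header, and groups reach COMIDs by occurrence. The inline sample text and the function name `reaches_by_eo` are illustrative assumptions, not part of the project tooling.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the example above; a real run would open the CSV file instead.
SAMPLE = """OBJECTID,EO_ID,ELCODE,SNAME,SCOMNAME,COMID,RA,EO_ID_ST
1,11104,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9050941,High,PA_11104
2,9809,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9052783,Medium,PA_9809
3,4659,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9053475,Medium,PA_4659
4,4659,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9053485,Medium,PA_4659
5,4659,IMBIV22020,Lasmigona compressa,Creek Heelsplitter,9053387,Medium,PA_4659
"""

EXPECTED_COLUMNS = ["OBJECTID", "EO_ID", "ELCODE", "SNAME",
                    "SCOMNAME", "COMID", "RA", "EO_ID_ST"]


def reaches_by_eo(handle):
    """Return {EO_ID_ST: [COMID, ...]} after checking the header row."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["EO_ID_ST"]].append(int(row["COMID"]))
    return dict(grouped)


result = reaches_by_eo(io.StringIO(SAMPLE))
print(result)
```

Grouping by `EO_ID_ST` makes it easy to see that a single occurrence (here `PA_4659`) can be tagged to several reaches.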

Example Cases

While most of these methods seem relatively straightforward, some special considerations are needed to create the best possible training datasets for the models. Here are a few example case studies:

Simple Point Overlap

Tagging point data to the reaches may be the simplest case, as a point typically falls either inside or outside a reach/catchment. Although this example has two points for the same taxon along the reach (within the catchment), both should be joined. In our testing so far, we haven't dealt with any locational accuracy issues in the source feature data. However, if this were desired, the points could be buffered by the LU distance and treated as polygon input, as described below.
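To illustrate the buffering idea, here is a minimal sketch that tests whether a point buffered by a given distance reaches a stream line. It assumes planar (projected) coordinates and represents the reach as a simple list of vertices; the function names and the coordinates are hypothetical, and in practice a GIS buffer tool would do this work.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest planar distance from point P to the segment from A to B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: A and B are the same vertex.
        return math.hypot(px - ax, py - ay)
    # Project P onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffered_point_touches_reach(point, reach_vertices, lu_distance):
    """True if a circle of radius lu_distance around point touches the polyline."""
    px, py = point
    return any(
        point_segment_distance(px, py, ax, ay, bx, by) <= lu_distance
        for (ax, ay), (bx, by) in zip(reach_vertices, reach_vertices[1:])
    )


# Hypothetical reach geometry and observation point, in map units.
reach = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
obs = (50.0, 30.0)
print(buffered_point_touches_reach(obs, reach, 25.0))
print(buffered_point_touches_reach(obs, reach, 30.0))
```

The point sits 30 units from the nearest segment, so a 25-unit buffer misses the reach and a 30-unit buffer touches it.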

Line or Polygon Overlap

Much like points, line and polygon source features are tagged based on their overlap with reaches. In this example, there are two line source features along one reach. Therefore, this occurrence should be tagged as being present in reach 9053487.

Polygons should be treated much the same as lines. However, note that in the northern part of this occurrence, the polygon slightly overlaps the next upstream stream reach (9052079). In the strictest sense, based on our current methods, the tagged reaches should include both 9052091 and 9052079, even though only a small portion of the polygon intersects the latter. Given the size and conditions present in the stream (based on the photo), it is very likely that the conditions are similar across both catchments. However, this can be left to the best judgement of an aquatic ecologist, and the boundaries can be adjusted as needed. One caveat to the above is when a reach physically intersects the source feature but should not be included in the training dataset, as in the final example.

Multiple intersecting tributaries

Some cases can be a little more complex, such as this occurrence of a freshwater mussel that is mapped in the mainstem of a creek but is also intersected by a small side tributary. In this case, we can assume that the mussel only occurs in the larger mainstem of French Creek and likely doesn't occur in the much smaller, and very likely less suitable, tributary. Therefore, for the purposes of identifying reaches to tag as overlapping with the occurrence, reaches 9050009 and 9050027 should be tagged as present, and reach 9050007 should be considered not present. However, if reach 9050007 were a larger stream that is known or considered to be possibly suitable habitat, it should be included in the training data.
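One way to record this kind of expert judgement is to keep the strict overlap result and apply a reviewed list of overrides on top of it. The sketch below is only an illustration of that idea; the function name `tag_reaches` and its `excluded`/`added` parameters are hypothetical and not part of the project's tools.

```python
def tag_reaches(intersecting, excluded=(), added=()):
    """Apply reviewer overrides to the reaches found by the strict overlap rule."""
    tagged = (set(intersecting) - set(excluded)) | set(added)
    return sorted(tagged)


# Reaches that physically intersect the mussel occurrence in the example above.
intersecting = [9050009, 9050027, 9050007]

# The small tributary (9050007) is judged unsuitable and dropped by the reviewer.
present = tag_reaches(intersecting, excluded=[9050007])
print(present)
```

Keeping the exclusions as explicit input, rather than editing the overlap result by hand, leaves a record of which reaches were removed on ecological grounds.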

Recommended GIS Methods

We're currently working on a series of Python-based tools to assist with the preparation of model training data. In the meantime, here are some recommended procedures for prepping the data.

Still Developing

  • Near Features
  • Spatial Join
  • Merge tables from the three types of source features after spatial joins
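As a rough sketch of the final merge step, the following assumes each spatial join has already been exported as a list of rows keyed by `EO_ID` and `COMID`. It combines the point, line, and polygon results, drops duplicate occurrence/reach pairs, and renumbers `OBJECTID`. The function name and the way the sample rows are split across feature types are assumptions made for illustration only.

```python
def merge_joined_tables(*tables):
    """Combine join results, drop duplicate (EO_ID, COMID) pairs, renumber OBJECTID."""
    seen = set()
    merged = []
    for table in tables:
        for row in table:
            key = (row["EO_ID"], row["COMID"])
            if key in seen:
                continue  # same occurrence already tagged to this reach
            seen.add(key)
            merged.append(dict(row, OBJECTID=len(merged) + 1))
    return merged


# Illustrative rows reusing IDs from the sample CSV; the split by feature type is assumed.
point_rows = [{"EO_ID": 11104, "COMID": 9050941}]
line_rows = [{"EO_ID": 4659, "COMID": 9053475}, {"EO_ID": 4659, "COMID": 9053485}]
polygon_rows = [{"EO_ID": 4659, "COMID": 9053485}, {"EO_ID": 9809, "COMID": 9052783}]

merged = merge_joined_tables(point_rows, line_rows, polygon_rows)
for row in merged:
    print(row)
```

Deduplicating on the occurrence/reach pair means a reach is listed once per occurrence even when more than one source feature type overlaps it.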