Vector Tutorial (ArcMap section) - NetLogo/GIS-Extension GitHub Wiki

Step 2: Importing the Data & a Tour of the Interface (ArcMap)

When you first open up ArcMap, you will be greeted with a welcome window asking you to choose a template. The default works just fine for us so you can go ahead and click OK and continue on.

Before beginning real work with ArcMap, it is important to get something out of the way: ArcMap is, well, a bit temperamental. In addition to the rather steep learning curve, it is a program that takes its sweet time, doesn't like being rushed, and will often just decide it needs to take a break and crash on you. To make your time with ArcMap less frustrating, I have two suggestions. First, if you click on a button and nothing seems to happen, wait a bit before hitting the button again -- just give it some time to think. Second, save often. Save your map often (Control + S works just fine) and save intermediate versions of data layers often (this will make sense in a bit) so that if ArcMap does decide to freeze or crash on you, you will not have lost all that much work. With those disclaimers out of the way, let's import some data.

To import files into ArcMap, we use the Catalog Panel, located on the right hand side of the window. If you click on the tab that says Catalog, the Catalog Panel will show up, like so.

CatalogPanel

From here, click on the Connect to Folder Button, highlighted above with a red rectangle, and a window will pop up that allows you to select a folder to "Connect" to ArcMap. Select the folder where you saved your datasets and hit OK.

Now, it probably occurred to you that this is a little bit less than straightforward. Why do you have to do this whole folder connection thing if you just want to add some data to your map? Well, the first answer is that you actually can do just that by dragging any ".shp" file from your file explorer onto the ArcMap window, and for small projects this makes a lot of sense. But the second answer is that ArcMap has its own file system of sorts to organize all the geospatial data it knows about. This data can come from folders on your hard drive, like the datasets we'll be working with, but it could also come from a local server, a remote server, a geodatabase, etc. To make working with all those different sources easier, there is a common intermediary layer on top of all of it: the Catalog, which we can interface with using, you guessed it, the Catalog Panel we just opened.

Now that you have a folder connection to your data all set up, you can click on the little "+" icon to the left of your new folder connection and see the contents within. Start by dragging in the ne_10m_airports.shp dataset onto your map. (Take note, the order actually is important here, as we'll see.) You can drag the ne_10m_populated_places_simple.shp dataset next (the Catalog may close each time, but you can keep it open by clicking the push pin icon), and finally the cb_2019_us_state_20m.shp dataset as well. After dragging this final layer, you will get a "Geographic Coordinate Systems Warning", which you can go ahead and close. If you're curious, read below for an explanation, otherwise, you can skip this next paragraph.

Geographic Coordinate Systems are, via a very rough simile, like units of measurement. If you want to work with measures of distance with someone else, you both need to agree on a unit. You can convert between, say, feet and meters, but it doesn't make much sense to have an architectural schematic that has some measurements in feet and some in meters. Because we dragged on the Natural Earth layers first, ArcMap set its own geographic coordinate system, its own internal units, to the coordinate system of those files, "WGS84" (a good default if you ever have to choose a geographic coordinate system to export in). However, our states data, being from the US Census Bureau and depicting only North America, uses a different coordinate system that is better suited to working with North American datasets. (This is where the units of measure analogy breaks down: different geographic coordinate systems are better or worse at measuring position on different parts of the globe, and it's not like centimeters are somehow better at measuring wood or feet better at measuring steel.) This warning is simply telling us that ArcMap is about to do some conversions, and that as a result, there may be some minor alignment and/or accuracy problems. At the very zoomed-out scale we're working with, however, we won't run into any issues.

InterfaceAfterImport

Your interface now should look something like this, with a few possible exceptions.

First, the colors for all these symbols are chosen randomly. You can change them by double clicking on the symbol (the little dot for point data and the little rectangle for polygon data) in the layers panel on the left and modifying them in the "Symbol Selector" Panel that pops up, either by changing their color on the right or picking from a preset on the left.

Second, the order of layers may not be the same as the one shown. To reorder the layers so that, say, the airports are not occluded by the states layer, just drag the layer names around in the layers panel on the left.

A Quick Tour of The Interface

First, let's take a look at the navigation tools, found on the second toolbar from the top, all the way on the left. The hand tool Hand pans around the map, and, with it selected, you can also zoom in or out by scrolling. Moving over to the left, you have the zoom in zoomIn and zoom out ZoomOut tools, which do exactly what they say on the tin. (One quick tip about the zoom in tool: you can draw a rectangle around the area you wish to fill up your screen and it will zoom and pan accordingly.) Next is one of the most useful tools for navigation, the "Full Extent" tool FullExtent, which will zoom such that your entire map fills the screen so that you can reset and reorientate while navigating. Finally, there are the undo and redo zoom tools, technically called the "Go back to Previous Extent" and "Go to Next Extent" tools. These allow you to return to prior zoom levels if you ever accidentally zoom somewhere you didn't intend to.

Many of the remaining tools will come up naturally throughout the course of this tutorial, but one last useful tool you should know about is the Identify tool Identify. If you select this tool and click on any shape feature, a panel will show up and display some information about that feature. For example, if I were to select the Identify tool and click on Texas, a window should pop up showing information like this:

IdentifyDialog

This panel represents everything ArcGIS knows about the feature representing Texas. Together, these fields and values are all the data for Texas stored in the attribute table for the cb_2019_us_state_20m dataset. This is the most direct way to investigate individual features in ArcMap and can be quite useful.

Step 3: GIS Data Cleaning (ArcMap)

Like all data, GIS data is rarely exactly ready to use for your application right out of the box. For non-spatial data, perhaps you would want to filter out all data points that are more than five years old and so you could use Excel and sort by collection date and delete all rows that are too old. Or perhaps you would write an SQL query to only grab recent data. Conveniently, GIS tools can do this kind of table-based cleanup, but the real power of a GIS system is in its spatial intelligence.

Selection Tool

Let's start simple. Since we don't plan on including any airports in Alaska, Hawaii, or Puerto Rico, let's go ahead and filter out those three places. We could do this non-spatially in the attribute table, but the quickest and easiest way is just to drag a selection box over the whole continental United States and save it as a new shapefile.

First, in the layers panel on the left of the window, uncheck the checkboxes of both of the point layers so that the only visible layer is the cb_2019_us_state_20m layer. Next, select the Select Features tool SelectRectangle. (Remember that you can hover over any tool's icon to get its name.) Once you have the selection tool active, drag a rectangle around the continental United States. All the selected features should gain a bright blue outline. Now right click on the cb_2019_us_state_20m layer and click on Data > Export Data. A window like this one should pop up:

ExportDataDialog

Make sure that the "Export" drop-down on top is set to "Selected Features" and then click on the "Browse" button, highlighted above in red. Navigate to where you want to save the data (you may need to create a new folder connection like we did when importing the data) and then give it a descriptive name like states_continental. In the "Save as type" drop-down menu, select "Shapefile". This is the only format ArcMap can export to that NetLogo will read, so it makes sense to do just about everything in shapefiles instead of dealing with the other ArcMap-specific formats. Once you've made this change, you can go ahead and click Save, and then afterwards, OK on the "Export Data" dialog box.

ExportDataNavigator

A pop-up will appear asking if you want to "add the exported data to the map as a layer?". Click Yes and it should appear. Right click on the old cb_2019_us_state_20m layer and click "Remove" to remove it from the map.

Working non-destructively like this is a useful habit to get into. Often GIS operations take a not insignificant amount of time and so it would be a shame for you to make some not-undo-able error and have to start back at the base data and work your way back to where you were. By making each step of the process a new shapefile, your directories may get a bit crowded (hence the need for descriptive names) but you have the peace of mind that you can always go back to prior versions of shapefiles.

Cookie Cutter

Now that we've narrowed down the states that we are interested in we have to do the same for the airports. First, enable the ne_10m_airports layer by clicking on its checkbox. What we want is to get rid of all those airports that are not within the continental states. We could do this manually with the selection tool again, but here we can leverage a GIS tool to make our lives easier.

Open up the Geoprocessing menu and click on the Clip tool.

ClipToolDropDown

This will open up a new window like the one below. As the tooltip on the right demonstrates (if your tooltips aren't showing up, hit the Show Help >> button -- they really are quite useful), the Clip tool acts like a cookie cutter. The input layer is the cookie dough and the "Clip Features" layer is the cookie cutter. Everything inside the "Clip Features" layer is kept and everything outside is discarded.

ClipDialogBox

Since we want to keep all of the airports within the continental US, we select the ne_10m_airports layer as our input layer and our states_continental layer as our "Clip Features" layer, or cookie cutter. Give the output a descriptive name like ne_10m_airports_continental_US and hit OK. After a bit, the newly clipped layer will appear in the map and you can remove the original ne_10m_airports layer.

Before moving on from the Clip tool, it would be remiss not to mention that it can clip more than just points: it can clip line and polygon features as well. Say, for example, you had a line dataset representing all the highways in the United States, but only wanted the parts of those highways that were inside the state of Texas. You could clip the highways layer by the Texas layer and it would "cut" the highways appropriately so that everything outside of the cookie cutter of Texas would be discarded.
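If it helps to see the cookie-cutter idea outside of ArcMap, here is a minimal Python sketch of clipping points by polygons using a standard ray-casting point-in-polygon test. This is not how ArcMap implements Clip internally; the function names (point_in_polygon, clip_points) and the toy coordinates are made up for illustration, and the sketch ignores real-world details like polygons with holes.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex is assumed
    to connect back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed
        # by a horizontal ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_points(points, clip_polygons):
    """Keep only the points that fall inside at least one clip polygon."""
    return [
        p for p in points
        if any(point_in_polygon(p["x"], p["y"], poly) for poly in clip_polygons)
    ]


# Hypothetical toy data: one rectangular "cookie cutter" and three points of "dough".
cutter = [[(0, 0), (10, 0), (10, 5), (0, 5)]]
dough = [
    {"name": "A", "x": 2, "y": 2},   # inside the rectangle
    {"name": "B", "x": 12, "y": 2},  # outside, to the right
    {"name": "C", "x": 9, "y": 4},   # inside the rectangle
]

clipped = clip_points(dough, cutter)
print([p["name"] for p in clipped])
```

Notice that each surviving point keeps all of its attributes (here just the name), which mirrors how the clipped airports layer keeps every field from the original airports layer.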

Using the Attribute Table

We have all of the airports in the US, but now we want to narrow down our set of airports to only relatively major ones. Thankfully the dataset we are using has a field called "scalerank". While this metric was created to allow map makers to specify at what zoom level a certain airport should be displayed, it can be used as an imperfect, but good enough, metric to select a few major airports for our toy model. Since we want to narrow down our data by using one of its fields, one convenient way to do so is to use the layer's attribute table.

We've discussed attribute tables a few times so far but have yet to open one up, so let's go ahead and do that now. Right click on the newly created ne_10m_airports_continental_US layer and click on Open Attribute Table.

AttributeTable

With this table you can navigate around by scrolling or using the scroll bars. If you had any features in the dataset selected, you could click on the Show Selected Records button (highlighted in red at the bottom) to only display those rows associated with the selected features. If you wanted to invert the selection, such that all currently selected features are deselected and vice versa, you could click the Switch Selection button, highlighted in green at the top.

Getting back to our task of grabbing only major airports, double-click on the column header for "scalerank" to sort all the rows by this value. Then click on the row header (the little box on the left of each row) for the first row, hold down "Shift" and click on the row header for the twelfth row (San Francisco Int'l airport, or SFO). The selected rows, which are all the rows with a scalerank of 2, should be highlighted blue like so:

AttributeTableAirportsSelected

Once you have these features selected, close the attribute table and you should see the twelve points on the map corresponding to the twelve rows we selected highlighted in blue as well. This is an important thing to note: selections are shared between the attribute table view of a dataset and the spatial/map view of a dataset. Now that we have selected these major airports, we can export them as a new layer by right clicking on the ne_10m_airports_continental_US layer, clicking on Data > Export Data, and exporting a shapefile like we did after selecting just the continental states. Give it a name like ne_10m_airports_major_continental_US.

To quickly illustrate some more features of ArcMap, we are going to look at an alternate method of isolating major airports like we just did, this time by deleting features we don't want instead of exporting those we do. Follow along if you like.

To modify features in ArcMap, whether that be modifying their shape, the values of their fields, or outright deleting them, you need to be inside of an editing session. While inside an editing session, all changes you make are temporary until you manually save them. To start an editing session, click on the Editor Toolbar button EditorToolbarButton which will open up the editor toolbar itself, which looks like this:

EditorToolbar

Click on Editor and then Start Editing. A pop-up will appear asking what layers you want to edit. Select the ne_10m_airports_continental_US layer and then click OK. If you get an error about the "Spatial reference", just click through it by clicking Continue.

Now that we're in an editing session, we can select the airports that we want to delete. To do so, open up the attribute table for the ne_10m_airports_continental_US layer. If you click within any of the cells of the attribute table, you'll notice that you can edit them like you would an Excel spreadsheet now that we're inside an editing session. We could once again sort by scalerank and manually select those airports with a scalerank of 2 and then use the Switch Selection button to select the airports we want to delete, but instead we're going to use the Select by Attributes feature. Open up the tool by clicking on the Select by Attributes button SelectByAttributesButton, which will open up the dialog box below.

SelectByAttributes

This dialog box allows us to construct SQL SELECT queries to create selections based on the field values of features. (For more about the specifics of the SQL dialect used by ArcMap, you can click on the Help button in this toolbar.) The top list displays all the field names you can use in your queries, which you can double-click to insert into the bottom text field. Start by double-clicking on the "scalerank" field name to insert it into the statement-builder, then click on the "<>" button (the not equals symbol in this dialect of SQL) to insert it as well. At this point you can click Get Unique Values and a list of all unique values of scalerank across the features in the table will be shown. Double-click the "2" to insert it as well, such that the final statement (including the implied prefix supplied by ArcMap) reads:

SELECT * FROM ne_10m_airports_continental_US WHERE "scalerank" <> 2

Click Verify to ensure that the statement is entered correctly, then hit Apply. Now, if you move the attribute table so you can see the map below, you will see that most of the airports are selected and highlighted in blue. To delete these small airports, hit the Delete Selected button DeleteSelected at the end of the toolbar.
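To see what that WHERE clause is doing without touching your real data, here is a small sketch using Python's built-in sqlite3 module. SQLite's dialect is not the one ArcMap uses for shapefiles, and the table name, airport names, and scalerank values below are invented for illustration; the point is only that "<>" selects every row whose scalerank is not 2, which are exactly the rows we then delete.

```python
import sqlite3

# An in-memory database standing in for the airports attribute table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE airports (name TEXT, "scalerank" INTEGER)')
conn.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [("Major A", 2), ("Major B", 2), ("Regional C", 4), ("Small D", 8)],
)

# Same shape of query as in the Select by Attributes dialog:
# select everything that is NOT a scalerank-2 airport.
to_delete = conn.execute(
    'SELECT name FROM airports WHERE "scalerank" <> 2 ORDER BY name'
).fetchall()
print("Selected for deletion:", to_delete)

# Deleting the selection leaves only the major airports behind.
conn.execute('DELETE FROM airports WHERE "scalerank" <> 2')
remaining = [row[0] for row in conn.execute("SELECT name FROM airports ORDER BY name")]
print("Remaining:", remaining)

conn.close()
```

Selecting first and deleting second, as ArcMap has you do, gives you a chance to look at what is about to disappear before you commit to it.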

Now remember, because we started an editing session to make these changes, we now have to save and exit the editing session to keep them. To do so, go back to the editor toolbar and, underneath where you earlier clicked to start the editing session, click Stop Editing and, when prompted, keep the changes. You will now see that only the same twelve airports from our earlier method remain.

Whichever method you used, we are going to go forward assuming that you have a layer with only those twelve airports, which we will refer to as ne_10m_airports_major_continental_US.

Spatial Joins

By now you should have a set of 49 polygons representing the 48 continental United States and DC, as well as a set of 12 points representing a few major US airports. If you look around in the attribute table for the airports layer or inspect one of them with the Identify tool, you will see that Natural Earth provided us with plenty of useful information about each airport, including its name, its IATA airport code, its Wikipedia unique ID, and more. However, it does not contain the key piece of information that we need for our virus transmission model: the size of the population it serves.

Now, there are a few ways we could go about this. First of all, since we are only dealing with 12 data points, manually entering the data is a perfectly reasonable, and probably the most accurate, choice. This allows for judgement calls like noticing that Newark Airport is on our list but not JFK or LaGuardia from New York City, so we should probably make Newark's population number encompass New York City's population as well. (This is another bit of evidence that scalerank is an imperfect metric, as JFK airport served around 16 million more passengers than Newark in 2019 according to Wikipedia.) But for the sake of this tutorial I'm going to show you a different way to estimate population served using one of the most powerful tools in GIS -- the spatial join.

A traditional non-spatial table join is any operation where data from two different tables is combined or joined together into a third new table with some amount of information from both. For example, an SQL inner join could be used when you have a table with literacy data about each state in one table and education funding data about each state in another. You can use an inner join to match the data up by state so that in the end you have a table with both literacy and education funding data by state.
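As a rough illustration of the non-spatial case, here is a minimal Python sketch of an inner join on a shared "state" key. The inner_join helper, the field names, and all of the numbers are placeholders invented for this example, not real literacy or funding figures.

```python
def inner_join(left_rows, right_rows, key):
    """Return one merged row for every key value present in both tables."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for left in left_rows:
        match = right_by_key.get(left[key])
        if match is not None:
            # Combine the fields from both tables into a single row.
            joined.append({**left, **match})
    return joined


# Placeholder tables: the values are invented purely to show the mechanics.
literacy = [
    {"state": "Texas", "literacy_score": 10},
    {"state": "Ohio", "literacy_score": 20},
    {"state": "Maine", "literacy_score": 30},
]
funding = [
    {"state": "Ohio", "funding": 200},
    {"state": "Texas", "funding": 100},
    {"state": "Utah", "funding": 400},
]

by_state = inner_join(literacy, funding, key="state")
for row in by_state:
    print(row)
```

Maine and Utah each appear in only one table, so an inner join drops them. That is the same kind of decision the "Keep All Target Features" checkbox controls in a spatial join.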

Spatial joins are similar except that instead of matching up data rows based on a shared key, table rows are matched up based on spatial information like polygon intersection, point distances, etc. Given our US states shape dataset and our populated places dataset, various spatial joins could help us answer questions like: "What is the largest City in each state?" or "How many cities with population over 50,000 are there in each state?". Or, if we change our perspective from getting information about the cities within each state to getting information about the state each city is in, we could use a spatial join to answer the question "What state is each city in?" since our dataset doesn't have that information already.

For our purposes we are going to be asking a fairly simple question (with a more complicated question as an optional addendum): what is the population of the city nearest to each of our major airports?

To answer this question, we will need to open up the Spatial Join tool. You can find it by clicking on Geoprocessing and then selecting Search for Tools. In the panel that pops up on the right, you can type "Spatial Join" and it should pop up under "Spatial Join (Analysis)". Double-click on it and the Spatial Join dialog box should pop up.

Now, there are a lot of settings to change here, so we'll go through them one-by-one, but before you hit OK your dialog box should look like this:

SpatialJoinExample

Target Features
- The "Target Features" field represents the layer you want the data joined to. In our case, we want data from cities to be joined to our existing data on airports, so we set this field to our twelve-airport layer (ne_10m_airports_major_continental_US, or ne_10m_airports_continental_US if you used the in-place deletion method).
Join Features
- The "Join Features" field represents the other half of the equation above, the layer we want to take the data from, in this case, our cities layer, ne_10m_populated_places_simple.
Output Feature Class
- This is where the output file should be stored. Give the file a name like final_airports_with_populations.
Keep All Target Features
- If unchecked, then any features that do not have a match in the join will be discarded. If checked, all features from the target layer will be preserved even if there isn't a match for them in the "Join Features" layer.
Join Operation
- Because we are only matching the one closest point, the value of this setting doesn't matter, but if you select it you can read about it in the help menu on the right if you're curious.
Field Map of Join Features
- This list allows you to be more specific about how each field in each table gets mapped together. Again, in our simple case, we don't need to mess with this one, but you can read more about it in the help pane.
Match Option
- This setting allows us to determine how, geographically, the features from the two layers are joined together. For example, if we were mapping US state polygon features onto US cities point features, then we could use INTERSECT or CONTAINS. For us, since we want to join the single closest city to each airport, we will use CLOSEST.

Once you hit OK, wait a bit and you will see the new layer show up in the layers panel. If you remove the old airports layer and use the Identify tool on any point in the new layer, you should see that in addition to all of the fields the airports had initially, they also have fields from their nearest city. For example, if I click on Seattle's airport in the top left of the map, I will see that it still has the airport fields like its name and its abbreviation, and once I scroll down I will see that it also has fields from the Seattle point in the populated places layer like the city name and, most importantly, the population we need for our model.
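For the curious, the CLOSEST match option can be sketched in a few lines of Python. This is only a conceptual illustration, not ArcMap's implementation: it measures distance with the haversine formula on a spherical Earth, and the join_closest helper, the airport codes, the city names, and the populations are all hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumes a perfectly spherical Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_closest(targets, join_features):
    """For each target, copy over fields from the single nearest join feature."""
    joined = []
    for target in targets:
        nearest = min(
            join_features,
            key=lambda c: haversine_miles(target["lat"], target["lon"], c["lat"], c["lon"]),
        )
        row = dict(target)  # keep every original target field
        row["city_name"] = nearest["name"]
        row["pop_max"] = nearest["pop_max"]
        joined.append(row)
    return joined


# Hypothetical airports (targets) and cities (join features).
airports = [
    {"code": "AAA", "lat": 40.0, "lon": -100.0},
    {"code": "BBB", "lat": 45.0, "lon": -90.0},
]
cities = [
    {"name": "Near A", "lat": 40.1, "lon": -100.0, "pop_max": 500},
    {"name": "Far A", "lat": 40.0, "lon": -101.0, "pop_max": 900},
    {"name": "Near B", "lat": 45.0, "lon": -90.2, "pop_max": 300},
]

airports_with_pop = join_closest(airports, cities)
for row in airports_with_pop:
    print(row["code"], "->", row["city_name"], row["pop_max"])
```

Just like the output layer in ArcMap, each joined row still has its original airport fields and simply gains the fields of its nearest city.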

Congratulations, you've completed the ArcMap portion of this tutorial. Before we move on let's review what we learned how to do:

Import data into ArcMap
Navigate around the map and inspect features
Manually select and selectively export data using the map window
Clip datasets by another polygon (cookie cutter)
Manually select and selectively export data using the attribute table
Perform a basic spatial join operation to combine two geospatial datasets.

If you want to learn more about how to use ArcMap, I recommend looking at the official documentation and tutorials from Esri. Among other things, these tutorials will teach you how to use symbology tools to create visually informative maps, perform more powerful geospatial analysis and processing operations, and work with different map projections and geospatial datums or reference frames.

More Comprehensive Population Data: (Advanced; Optional)

One flaw with our final dataset is that it assumes that each airport only services the single city closest to it. However, if you zoom in near O'Hare International Airport you can see that the city of Evanston is only a little bit farther away than Chicago and has a population that we should take into account, and we've already discussed the glaring issue with Newark Airport not taking New York City's population into account. To remedy these errors we need to employ a more sophisticated multi-step spatial analysis.

At a high level: we are going to create a 50-mile buffer around each of our airports, sum up the populations of every city that falls within that buffer, and then transfer that value back to the airport. This is not the only way to do this operation (in fact you can do it in a single spatial join), but it does illustrate that when working with a GIS program you often need to create auxiliary layers and combine a number of different geoprocessing tools to get the value that you want.

1. Clip The Populated Places Layer by the Continental States:

While it is possible that someone in Vancouver might cross the border to fly out of SEA-TAC in Seattle, it isn't all that likely, so let's go ahead and only consider US cities. Use this opportunity to check if you can use the clip tool to clip the ne_10m_populated_places_simple layer by the states_continental layer. Remember that one layer is the cookie cutter and the other is the cookie dough. Go ahead and save that clipped layer as us_cities.

2. Create a 50 mile Buffer Polygon Around Each Airport:

To create a buffer around a point, you use the Buffer tool. You could search for it like we did with the Spatial Join tool, but you can find it quickly by clicking on Geoprocessing > Buffer. It should look something like this:

BufferDialog

The Buffer tool, as the help pane illustrates, creates a series of circular polygons, one around each of our points. (The Buffer tool can also be used on lines and polygons, in which case it "grows" them by a given distance.)

Set the input layer to our twelve-airport layer (ne_10m_airports_continental_US here; use ne_10m_airports_major_continental_US if you exported the major airports to their own shapefile), the output file name to ne_10m_airports_continental_US_50mi_buffer.shp, the distance to 50, and the unit from "Decimal Degrees" to "Miles". Since none of our buffers will overlap, we don't need to think about what behavior we want for dissolving.

Once you hit OK, a new layer should appear like so:

CreatedBuffers

Now if you take a look at this map and think that the buffer around Seattle's airport looks a little squished, well, you're not wrong: it totally is on this map. However, if we take this same exact shapefile we just created and open it up in a virtual globe like ArcGlobe (another program made by Esri; you can think of it as a mashup of ArcMap and Google Earth), then Seattle's circle will look, well, actually circular. Here's Seattle's buffer on the left followed by Chicago's and NYC's.

BufferSizesInArcGlobe

This illustrates an important thing to keep in mind when working with GIS: even though you will spend all of your time looking at 2D maps, the globe just isn't 2D and will refuse to cleanly be smooshed down onto just two dimensions. The Buffer tool, by default when working with geographic coordinate systems like WGS84 (the coordinate system we are using), will calculate distance based on geodesic distance, or distance along the shortest arc you can draw between two points on a globe. This is why on a globe, even a 2D "image" of a globe like the one on a computer screen, the circles all look circular, but when you flatten them out onto a WGS84 latitude/longitude map, they get distorted.
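You can estimate how squished a buffer will look with a little spherical trigonometry. The sketch below assumes a perfectly spherical Earth and a map that draws one degree of longitude as wide as one degree of latitude (roughly how an unprojected latitude/longitude display behaves). The function names and the sample latitudes are chosen only for illustration. For a circle of angular radius d centered at latitude lat, the half-height is d in degrees of latitude, while the half-width in degrees of longitude is asin(sin(d) / cos(lat)).

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation


def buffer_extent_degrees(lat_deg, radius_miles):
    """Half-height and half-width, in degrees, of a geodesic circle on a sphere."""
    delta = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    half_height = math.degrees(delta)
    half_width = math.degrees(
        math.asin(math.sin(delta) / math.cos(math.radians(lat_deg)))
    )
    return half_height, half_width


def stretch_ratio(lat_deg, radius_miles=50):
    """How many times wider than tall the circle appears on a flat lat/lon map."""
    half_height, half_width = buffer_extent_degrees(lat_deg, radius_miles)
    return half_width / half_height


for label, lat in [
    ("the equator", 0.0),
    ("roughly New York's latitude", 40.7),
    ("roughly Seattle's latitude", 47.5),
]:
    print(f"At {label}: about {stretch_ratio(lat):.2f}x wider than tall")
```

The farther a buffer is from the equator, the larger the ratio, which is why Seattle's buffer looks the most distorted of the three in the flat view.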

If you were to change the display coordinate system however, the circles would be distorted differently. To change coordinate systems in ArcMap, right click the map display area and click Data Frame Properties at the bottom. Switch over to the "Coordinate System" tab and pick one of the many choices built into ArcMap. If we switch over to "USA_Contiguous_Equidistant_Conic" (which you can find by searching for "102005", an ID that ArcMap recognizes), then suddenly our circles look a lot less distorted because this coordinate system was designed to minimize distance distortion within the continental US. (If you want to go back to the prior coordinate system, you can search for "GCS_WGS_1984" in the search box.)

ReprojectedBuffers

3. Spatial Joins, Revisited:

If you turn the us_cities layer back on, you should see that we now have circles of 50 mile radius surrounding each airport and containing a number of cities beyond just the single nearest. Our challenge now is to take each city within each of those buffers and sum up their populations. To do so, we are going to return to our old friend, the Spatial Join.

This time around, we want the target layer to be ne_10m_airports_continental_US_50mi_buffer.shp and the join layer to be the recently created us_cities layer. This way, our 12 buffer polygons take on the values from all the cities within their boundaries. Give the output a name like airport_buffers_with_surrounding_populations.shp and make sure the "Join Operation" is "JOIN_ONE_TO_ONE". If we were to hit "OK" now, then not much useful would happen, because the default behavior when there are multiple possible values to grab during a spatial join is just to grab whichever one ArcMap happens to see first. However, when working with numeric data, we can change that "Merge Rule" to any number of things: the mean, the mode, the minimum, the maximum, the standard deviation, etc. For us, we just want to sum up all the populations, so the "Sum" merge rule is what we're looking for. Now, merge rules are defined per field, so scroll down in the center "Field Map of Join Features" table until you find pop_max (Long). Right click on it, go to "Merge Rule", and select "Sum". After that, you can hit OK and wait for the operation to complete.

Spatial Join Merge Rule

4. Joining Back to the Airports Point Layer:

Now that we have a layer of buffer polygons that know their own aggregate population, we can spatial join that information back onto our airports point layer. To do so, open up the Spatial Join tool one last time and set the target layer to be the ne_10m_airports_continental_US.shp layer and the join layer to be our just-created airport_buffers_with_surrounding_populations layer. Give the output a descriptive name like final_airports_with_surrounding_populations.shp and hit OK, as all the defaults should work just fine. If you inspect one of these newly created airport points, you should see that it has a new "pop_max_sum" field that represents the sum of the populations of all the cities within 50 miles, exactly what we wanted to calculate. (If you want to use these more accurate numbers in the model itself, just make sure to specify the right file name when importing.)
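The whole buffer-then-sum workflow boils down to "add up the population of every city within 50 miles of each airport". Here is a minimal Python sketch of that idea under some simplifying assumptions: it uses haversine distance on a spherical Earth rather than ArcMap's geodesic buffers, and the sum_population_within helper, the airport codes, and the city values are hypothetical stand-ins for the real layers.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumes a spherical Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sum_population_within(airports, cities, radius_miles=50):
    """Map each airport code to the summed pop_max of cities inside its buffer."""
    totals = {}
    for airport in airports:
        totals[airport["code"]] = sum(
            city["pop_max"]
            for city in cities
            if haversine_miles(airport["lat"], airport["lon"], city["lat"], city["lon"])
            <= radius_miles
        )
    return totals


# Hypothetical data: two cities fall inside AAA's buffer, one inside BBB's, none inside CCC's.
airports = [
    {"code": "AAA", "lat": 40.0, "lon": -100.0},
    {"code": "BBB", "lat": 45.0, "lon": -90.0},
    {"code": "CCC", "lat": 30.0, "lon": -80.0},
]
cities = [
    {"name": "Close", "lat": 40.1, "lon": -100.0, "pop_max": 500},
    {"name": "Suburb", "lat": 40.5, "lon": -100.0, "pop_max": 200},
    {"name": "Too Far", "lat": 41.0, "lon": -100.0, "pop_max": 900},
    {"name": "Near B", "lat": 45.0, "lon": -90.2, "pop_max": 300},
]

pop_max_sum = sum_population_within(airports, cities)
print(pop_max_sum)
```

The sum here plays the role of the "Sum" merge rule, and an airport with no cities in range ends up with a total of 0.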
