Vector Tutorial - NetLogo/GIS-Extension GitHub Wiki
The GIS extension for NetLogo allows you to import real-world spatial data into your models and export it back out. If you can picture your model taking place on top of a 2D map of real-world geography, then the GIS extension is your best bet for making that happen. In this tutorial, I will show how to use a GIS application (either the free and open source tool QGIS or the industry giant ArcMap) to clean up and prepare GIS data, and then how to use the NetLogo GIS extension to import that data and build a model with it.
Spatial data is, unsurprisingly, any data with a spatial component, from simple points of latitude and longitude — such as locations of post offices around a nation — to lines — such as road networks — to closed polygons — such as the boundaries of nations or municipalities. In addition to the spatial data inherent to such geographic features, a GIS dataset almost always also contains many more fields of data associated with each feature or individual element of that dataset. To reuse the above examples: each post office point feature could have a field representing its address, each road line feature could have a field representing its speed limit, and each nation's polygon feature could have a field representing its population or its GDP.
One way to think about it (which also happens to be the way it is represented "under the hood") is to imagine that every spatial feature is given a unique ID that corresponds to a row within a large table or spreadsheet (commonly called the attribute table, as you will see). In this way, spatial data is composed of spatial features that all have a one-to-one relationship to a row within a table.
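To make that one-to-one relationship concrete, here is a minimal Python sketch (not NetLogo, and not a claim about how any particular GIS app stores its data). The coordinates, names, and addresses are made up for illustration, and `PointFeature` and `attributes_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PointFeature:
    """A spatial feature: a unique ID plus its geometry (here, a single point)."""
    feature_id: int
    lon: float
    lat: float


# The spatial half of the dataset (made-up post office locations).
features = [
    PointFeature(feature_id=0, lon=-87.63, lat=41.88),
    PointFeature(feature_id=1, lon=-73.99, lat=40.73),
]

# The attribute table: exactly one row per feature, keyed by the same ID.
attribute_table = {
    0: {"name": "Post Office A", "address": "1 Example St"},
    1: {"name": "Post Office B", "address": "2 Sample Ave"},
}


def attributes_for(feature):
    """Look up the attribute-table row that belongs to a feature."""
    return attribute_table[feature.feature_id]
```

Calling `attributes_for(features[0])` returns the row for the first post office. Every feature has exactly one row, and every row belongs to exactly one feature.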
GIS stands for Geographic Information System. Almost everyone agrees on that much, but there is some geography-nerd controversy about the semantics beyond that. GIS commonly refers to one of two things: first, GIS apps, which can manipulate and analyze spatial data in much the same way apps like Excel can manipulate and analyze tabular data, or second, the broader idea of geospatial data and its manipulation (which is sometimes called Geographic Information Science, also, unhelpfully, acronym-ized to GIS). For our purposes, we are going to use one of two GIS apps, QGIS or ArcMap, to clean up data before importing it into NetLogo using the GIS extension.
By the end of this tutorial, you will have created a highly simplified model of disease transmission via air travel between a number of cities in the continental United States. We will model cities in their real-world locations with estimates of their real-world populations. Each of these cities will have a number of airplanes traveling to and from it, each with some chance of carrying an infected passenger based on how many infected individuals are in the departing city.
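As a rough illustration of the kind of rule this implies, the Python sketch below computes the chance that a flight carries at least one infected passenger, assuming each passenger is drawn independently from the departing city's population. That independence assumption, the function name, and its parameters are mine for illustration only. The model built later in this tutorial may well use a different rule.

```python
def chance_of_infected_flight(infected, population, passengers):
    """Probability that at least one of `passengers` travelers is infected.

    Assumes (for illustration) that each passenger is infected independently
    with probability infected / population.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    infected_share = infected / population
    # Chance that nobody on board is infected, subtracted from 1.
    return 1.0 - (1.0 - infected_share) ** passengers
```

With no infected residents the chance is zero. It climbs toward one as either the infected share or the number of passengers grows.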
Through this process, you will learn a number of valuable GIS skills, including:
- Finding and importing geospatial data
- Performing cleanup and manipulation of that data, including cross-dataset operations
- Exporting GIS data into universal formats
- Importing GIS data into NetLogo
- Using the GIS extension's primitives to work with the geospatial data within your model
- Exporting NetLogo state back into output GIS data
(Note that for long-term compatibility's sake, all the data you need to complete this tutorial can also be downloaded from a mirror link we've set up.)
While a background map is not strictly necessary, it would be a little hard to visualize the model without one to give some visual context. While there are a number of places you can download maps of the US states, we might as well go straight to the source and use the files provided by the Census Bureau. Scroll down until you see the subheading "States" and click on one of the options below. The multiple versions differ in how faithfully they follow the actual outlines of the states: a smaller number after the 1 (and thereby a larger ratio) corresponds to higher fidelity. The 1:500,000 version is more than detailed enough for our purposes, so download that one and move the zip somewhere you can keep track of.
If we're going to model air travel, we're going to need a map of airports, and here we are going to lean on the good community of Natural Earth, a repository of public domain geospatial datasets supported by the North American Cartographic Information Society. They have a dataset of airports that serves our needs nicely. Go ahead and download the latest version and move the zip somewhere you can keep track of.
Finally, we are also going to grab Natural Earth's "Populated Places" dataset while we're here. We will use this to estimate the populations served by each of the airports we are simulating.
(For both of these Natural Earth point datasets, you will see that the versions we are using are labeled as having a "1:10m scale". However, unlike the scales we talked about earlier with the polygon state dataset, this "scale" does not signify a level of fidelity but instead a measure of how many datapoints, cities or airports in this case, are included or omitted. It is a bit confusing, but the way to think about it is that Natural Earth makes datasets that are meant to be used in reference maps, including printed reference maps. In a general reference map the size of a piece of letter paper, you don't want every single city in your database cluttering up the page, because that makes it harder to see the large cities you are more likely to care about at that scale.)
If you unzip any of these files, you will see that the dataset is made up of a number of different files, each with the same name and a different file extension. The most important thing to know when working with these spatial files is to keep them all together: don't move, rename, or delete one without the others. The ".shp" file is the main file of the bunch, but you don't need to know much more than that. I've given a quick summary of all the different kinds of files below if you're so inclined. Otherwise skip down to step 1.5.
- .shp file: shapefile containing the actual spatial component of data
- .dbf file: the file that contains the attribute table, or the "spreadsheet" of all the values for every field for every row/spatial feature. This file is actually an antiquated file format from the 1980s, but thanks to Microsoft's obsession with backwards compatibility, Excel can actually open up these files and read (but not write) them.
- .shx file: the shape index file, which records where each feature's geometry sits inside the .shp file so that features can be looked up quickly and matched to their rows in the .dbf file
- .prj file: the projection file, which tells the GIS software how to interpret the spatial data within the .shp file and how to project it into a 2D map. This file is actually just plain-text if you're curious and want to open it up in a text editor.
- Possibly more, each of which you can learn about at this helpful Library of Congress file format database.
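Since a dataset only works when its files travel together, a quick check like the following Python sketch can tell you whether anything went missing after unzipping or moving things around. The function name `missing_components` is hypothetical, and treating .prj as expected alongside the other three files is an assumption of this sketch. Note also that the check is only as case-sensitive as your file system, so a file named with an uppercase extension may be reported as missing.

```python
from pathlib import Path

# The file types described in the list above. Counting .prj as expected
# is an assumption of this sketch.
EXPECTED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_components(shp_path, extensions=EXPECTED_EXTENSIONS):
    """Return the extensions with no file sitting next to the given .shp path."""
    shp = Path(shp_path)
    # with_suffix swaps ".shp" for each extension, keeping folder and base name.
    return [ext for ext in extensions if not shp.with_suffix(ext).exists()]
```

For example, `missing_components("airports.shp")` (a placeholder file name) returns an empty list when all four files are present in the same folder, and otherwise lists the extensions that could not be found.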
Before we begin the tutorial proper, you have to choose which GIS app you wish to use and set it up. And really there are two clear options.
First, there is the industry-standard ArcMap. ArcMap and its cousins are made by ESRI, the commercial giant in the industry. Indeed, all the file formats we are going to be working with today were developed by ESRI and have become de facto defaults. The benefit of using ArcMap is that, if you are part of a large commercial or educational institution, it's probably what everyone else in your organization is using if they use GIS, and that can be a great resource if you need to ask for help. In addition, if you want to build GIS skills that are highly transferable, then learning the de facto default is not at all a bad route. (One thing to note: ArcMap itself is Windows only, but ESRI offers less-mature web-based GIS services that are cross-platform.) However, if you are not part of an institution that already gives you access to ArcMap, then the steep cost is almost certain to send you looking for an alternative, which brings us to QGIS.
QGIS is a free and open source piece of GIS software that can more than hold its own against ESRI's offerings. It has versions for macOS, Linux, and Windows, and stands right next to other high-quality open-source tools like Blender and LibreOffice in terms of stability and community support. What it lacks in the larger ecosystem and more team-focused features of the ArcGIS family of products, it makes up for in performance, stability, and deep extensibility. For individuals or small organizations, it is a no-brainer.
For each step of the process here, I have created both a QGIS and an ArcMap set of instructions so you can follow along regardless of which tool you have chosen.