Spatial Data Beast - aseldawy/bdtutorials GitHub Wiki
Around 60% of all the publicly available data has a geospatial component. It is hard to think of any big data science project without dealing with geospatial data in one form or another. This tutorial explains how to load and process big spatial data in Spark using Beast. Beast is a Spark add-on for Big Exploratory Analytics on Spatio-temporal data.
Before starting this tutorial, make sure that you have the big-data development setup ready and that you can create a Spark project either in Java or Scala. Download the following sample file for testing.
To start using Beast, add the following dependency to your pom.xml file.
<!-- https://mvnrepository.com/artifact/edu.ucr.cs.bdlab/beast-spark -->
<dependency>
  <groupId>edu.ucr.cs.bdlab</groupId>
  <artifactId>beast-spark</artifactId>
  <version>0.8.2</version>
</dependency>
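If the project is built with sbt rather than Maven, the same coordinates can be declared as below. This is a sketch that only restates the group, artifact, and version from the Maven snippet; it assumes the artifact ID carries no Scala version suffix, which is why a single `%` is used.

```scala
// build.sbt: same coordinates as the Maven dependency above.
// Assumption: the artifact is published as plain "beast-spark" (no _2.12 suffix).
libraryDependencies += "edu.ucr.cs.bdlab" % "beast-spark" % "0.8.2"
```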
The first step is to load the spatial file to create a SpatialRDD. The easiest way is to use the spatialFile function, which automatically detects the input file format.
val features: RDD[IFeature] = sparkContext.spatialFile("tl_2017_us_state.shp")
The spatialFile function cannot always detect the file format, e.g., when the input file has an unexpected extension. In that case, use the shapefile or geojson function to load Shapefile or GeoJSON input, respectively. For more details on the supported input formats, check this page.
To keep only the records that intersect a query range, apply a spatial filter as follows.
val range: Geometry = new GeometryFactory().toGeometry(new Envelope(-128.1, -63.8, 27.3, 54.3))
val matchingFeatures = features.filter(_.intersects(range))
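Note that the JTS Envelope constructor takes its arguments in the order (x1, x2, y1, y2), so the range above spans longitudes -128.1 to -63.8 and latitudes 27.3 to 54.3. To illustrate what an intersection test on boxes involves, here is a minimal sketch in plain Scala. The `Box` class and the sample data are hypothetical stand-ins, not Beast or JTS types, and a geometry-level test may be stricter than this box-only check.

```scala
// Hypothetical stand-in for a bounding box; not a Beast or JTS class.
final case class Box(minX: Double, maxX: Double, minY: Double, maxY: Double) {
  // Two boxes intersect when their extents overlap on both axes.
  def intersects(other: Box): Boolean =
    minX <= other.maxX && other.minX <= maxX &&
      minY <= other.maxY && other.minY <= maxY
}

object RangeFilterSketch {
  // Same argument order as the Envelope in the text: (minX, maxX, minY, maxY).
  val range: Box = Box(-128.1, -63.8, 27.3, 54.3)

  // Illustrative boxes only; the coordinates are made-up values.
  val boxes: Seq[(String, Box)] = Seq(
    "inside" -> Box(-109.0, -102.0, 37.0, 41.0),
    "outside" -> Box(-160.0, -154.0, 18.0, 23.0)
  )

  // Keep the names of the boxes that overlap the query range.
  val matching: Seq[String] =
    boxes.collect { case (name, box) if box.intersects(range) => name }
}
```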
A simple way to summarize a dataset is by collecting some aggregate values such as the range and count.
val summary: Summary = features.summary
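As a rough illustration of what such a summary contains, the sketch below computes a count and a bounding range over a few points in plain Scala. The `Stats` class is hypothetical and is not Beast's Summary type. The add/merge pair is the shape of function one would typically pass to Spark's `RDD.aggregate`, although how Beast computes its summary internally is not shown here.

```scala
// Hypothetical summary record: a count plus the bounding range of the data.
final case class Stats(count: Long, minX: Double, maxX: Double, minY: Double, maxY: Double) {
  // Fold one point into the running summary.
  def add(x: Double, y: Double): Stats =
    Stats(count + 1, math.min(minX, x), math.max(maxX, x), math.min(minY, y), math.max(maxY, y))

  // Combine two partial summaries, e.g. from two partitions.
  def merge(o: Stats): Stats =
    Stats(count + o.count, math.min(minX, o.minX), math.max(maxX, o.maxX),
      math.min(minY, o.minY), math.max(maxY, o.maxY))
}

object SummarySketch {
  // Identity element: zero records and an inverted (empty) range.
  val empty: Stats = Stats(0L, Double.PositiveInfinity, Double.NegativeInfinity,
    Double.PositiveInfinity, Double.NegativeInfinity)

  // Made-up sample points.
  val points: Seq[(Double, Double)] = Seq((1.0, 5.0), (-2.0, 3.0), (4.0, -1.0))

  val summary: Stats = points.foldLeft(empty) { case (s, (x, y)) => s.add(x, y) }
}
```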
Another way to summarize a dataset is to build a two-dimensional count histogram over a uniform grid.
val histogram: AbstractHistogram = features.countHistogram(100, 100)
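To make the idea of a two-dimensional count histogram concrete, here is a small plain-Scala sketch that bins points into a uniform grid. The function name `gridCounts`, its parameters, and the sample data are assumptions made for illustration; this is not Beast's implementation.

```scala
object HistogramSketch {
  // Counts points into a rows x cols uniform grid covering the given range.
  // Points outside the range are ignored; points on the upper edge go to the last cell.
  def gridCounts(points: Seq[(Double, Double)],
                 minX: Double, maxX: Double, minY: Double, maxY: Double,
                 cols: Int, rows: Int): Array[Array[Long]] = {
    val grid = Array.ofDim[Long](rows, cols)
    for ((x, y) <- points if x >= minX && x <= maxX && y >= minY && y <= maxY) {
      val col = math.min(((x - minX) / (maxX - minX) * cols).toInt, cols - 1)
      val row = math.min(((y - minY) / (maxY - minY) * rows).toInt, rows - 1)
      grid(row)(col) += 1
    }
    grid
  }

  // Made-up sample points on a 10 x 10 extent, binned into a 2 x 2 grid.
  val points: Seq[(Double, Double)] = Seq((0.5, 0.5), (0.6, 0.4), (9.9, 9.9), (10.0, 10.0))
  val grid: Array[Array[Long]] = gridCounts(points, 0.0, 10.0, 0.0, 10.0, 2, 2)
}
```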
You can manipulate features by adding non-spatial attributes. The following example computes the area of each polygon and adds it as an additional attribute.
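val featuresWithArea = features.map({ f =>
  // Copy the feature so the original is left unchanged
  val newF = new Feature(f)
  // getArea is the JTS Geometry method that returns the planar area
  newF.appendAttribute("area", f.getGeometry.getArea)
  newF
})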
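For intuition about what a planar polygon area is, the sketch below applies the shoelace formula to a single ring in plain Scala. The helper `ringArea` is hypothetical and ignores holes and multi-part geometries. The result is in squared units of the coordinate system, so for longitude/latitude data it would be squared degrees rather than a ground area.

```scala
object AreaSketch {
  // Shoelace formula for a simple polygon given as a ring of (x, y) vertices.
  // The ring may be open or closed; the last vertex is joined back to the first.
  def ringArea(ring: Seq[(Double, Double)]): Double = {
    val n = ring.length
    val twiceSigned = (0 until n).map { i =>
      val (x1, y1) = ring(i)
      val (x2, y2) = ring((i + 1) % n)
      x1 * y2 - x2 * y1
    }.sum
    math.abs(twiceSigned) / 2.0
  }

  // A 4 x 3 rectangle used as a sanity check.
  val rectangle: Seq[(Double, Double)] = Seq((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0))
  val area: Double = ringArea(rectangle)
}
```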
You can also modify the geometry attribute in the feature.
val summarizedFeatures = features.map({ f =>
  val newF = new Feature(f)
  newF.setGeometry(f.getGeometry.convexHull)
  newF
})
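The convex hull replaces each geometry with the smallest convex polygon that contains it. As a sketch of the concept only, the plain-Scala code below computes the hull of a point set with Andrew's monotone chain algorithm. This is a swapped-in textbook technique chosen for illustration; it is not necessarily the algorithm the geometry library uses.

```scala
object ConvexHullSketch {
  type Point = (Double, Double)

  // Cross product of (a - o) and (b - o); positive for a counter-clockwise turn.
  private def cross(o: Point, a: Point, b: Point): Double =
    (a._1 - o._1) * (b._2 - o._2) - (a._2 - o._2) * (b._1 - o._1)

  // Builds one half of the hull as a stack (most recent point first),
  // popping points that do not make a strict counter-clockwise turn.
  private def halfHull(points: Seq[Point]): List[Point] =
    points.foldLeft(List.empty[Point]) { (stack, p) =>
      var s = stack
      while (s.lengthCompare(2) >= 0 && cross(s(1), s.head, p) <= 0) s = s.tail
      p :: s
    }

  // Returns the hull vertices in counter-clockwise order, starting at the leftmost point.
  def convexHull(points: Seq[Point]): Seq[Point] = {
    val sorted = points.distinct.sortWith { (a, b) =>
      a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)
    }
    if (sorted.length < 3) sorted
    else {
      val lower = halfHull(sorted)
      val upper = halfHull(sorted.reverse)
      // Each half ends where the other begins, so drop that shared endpoint.
      lower.tail.reverse ++ upper.tail.reverse
    }
  }
}
```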
A simple way to plot the features to an image is as follows.
features.plotImage(2000, 2000, "states.png")
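Plotting to a fixed-size image requires mapping world coordinates to pixel coordinates. The plain-Scala sketch below shows one simple way to do this for a 2000 x 2000 image; the `toPixel` helper is hypothetical, it stretches the data to fill the image without preserving aspect ratio, and Beast's actual rendering may differ.

```scala
object PixelMappingSketch {
  // Maps a world coordinate to a pixel in a width x height image.
  // The y axis is flipped because image rows grow downward.
  def toPixel(x: Double, y: Double,
              minX: Double, maxX: Double, minY: Double, maxY: Double,
              width: Int, height: Int): (Int, Int) = {
    val px = ((x - minX) / (maxX - minX) * (width - 1)).round.toInt
    val py = ((maxY - y) / (maxY - minY) * (height - 1)).round.toInt
    (px, py)
  }

  // Corners of the query range used earlier in this tutorial.
  val topLeft: (Int, Int) = toPixel(-128.1, 54.3, -128.1, -63.8, 27.3, 54.3, 2000, 2000)
  val bottomRight: (Int, Int) = toPixel(-63.8, 27.3, -128.1, -63.8, 27.3, 54.3, 2000, 2000)
}
```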
You can write features to a spatial file in several formats, for example as a Shapefile.
features.saveAsShapefile("states.shp")