Spatial Data Beast - aseldawy/bdtutorials GitHub Wiki

Around 60% of all the publicly available data has a geospatial component. It is hard to think of any big data science project without dealing with geospatial data in one form or another. This tutorial explains how to load and process big spatial data in Spark using Beast. Beast is a Spark add-on for Big Exploratory Analytics on Spatio-temporal data.

Prerequisites

Before starting this tutorial, make sure that you have the big-data development setup ready and that you can create a Spark project either in Java or Scala. Download the following sample file for testing.

Steps

To start using Beast, add the following dependency to your pom.xml file.

<!-- https://mvnrepository.com/artifact/edu.ucr.cs.bdlab/beast-spark -->
<dependency>
    <groupId>edu.ucr.cs.bdlab</groupId>
    <artifactId>beast-spark</artifactId>
    <version>0.8.2</version>
</dependency>

Load spatial files

The first step is to load the spatial file to create a SpatialRDD. The easiest way is to use the spatialFile function which automatically detects the input file format.

val features: RDD[IFeature] = sparkContext.spatialFile("tl_2017_us_state.shp")

The spatialFile function cannot always detect the file format, e.g., when the input file extension is not expected. You can use one of the functions shapefile and geojson to load the Shapefile and GeoJSON formats, respectively. For more details on the supported input formats, check this page.

Spatial filter

To filter all the records based on a spatial filter, use the following command.

val range: Geometry = new GeometryFactory().toGeometry(new Envelope(-128.1, -63.8, 27.3, 54.3))
val matchingFeatures = features.filter(_.intersects(range))

Summarization

A simple way to summarize a dataset is by collecting some aggregate values such as the range and count.

val summary: Summary = features.summary

Another way to summarize a dataset is to create a two-dimensional histogram that summarizes the data.

val histogram: AbstractHistogram = features.countHistogram(100, 100)

Mainpulating non-spatial attributes

You can manipulate features by adding non-spatial attributes. The following examples computes the area of each polygon and adds it as an additional attributes.

val featuresWithArea = features.map({ f =>
  val newF = new Feature(f)
  newF.appendAttribute("area", f.getGeometry.computeArea)
  newF
})

Manipulating the geometry

You can also modify the geometry attribute in the feature.

val summarizedFeatures = features.map({ f =>
  val newF = new Feature(f)
  newF.setGeometry(f.getGeometry.convexHull)
  newF
})

Plotting the features

A simple way to plot the features to an image is as follows.

features.plotImage(2000, 2000, "states.png")

Writing the features to a spatial file

You can write features to a spatial file in several formats.

features.saveAsShapefile("states.shp")
⚠️ **GitHub.com Fallback** ⚠️