What Type of RDD Does Your Transformation Return? - ignacio-alorre/Spark GitHub Wiki
RDDs are an abstraction in two ways:
- Their records can be of almost any type, for example String, Row, or Tuple.
- Each RDD is an instance of one of several implementations of the RDD interface, and these implementations have different properties.
The first distinction matters because some transformations can only be applied to RDDs with certain record types.
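As a minimal sketch of this point, the example below shows a transformation that only exists for key/value records. The object name `RecordTypeExample`, the helper `countWords`, the sample data, and the local-mode configuration are all assumptions made for illustration. `reduceByKey` is available on an `RDD[(K, V)]` but not on an `RDD[String]`, so the strings are first mapped to tuples.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RecordTypeExample {
  // Hypothetical helper: turns an RDD of words into per-word counts.
  def countWords(words: RDD[String]): RDD[(String, Int)] = {
    // words.reduceByKey(_ + _)  // would not compile: RDD[String] has no key/value records

    // Mapping to tuples yields RDD[(String, Int)], which makes the pair transformations available.
    val pairs: RDD[(String, Int)] = words.map(word => (word, 1))
    pairs.reduceByKey(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // Local mode and the app name are assumptions for this sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("record-type-example")
    val sc = new SparkContext(conf)
    try {
      val words: RDD[String] = sc.parallelize(Seq("spark", "rdd", "spark"))
      countWords(words).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Uncommenting the `words.reduceByKey` line should produce a compile error rather than a runtime failure, which is one practical benefit of tracking the record type.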
The second distinction matters because each transformation returns one of the several implementations of the RDD interface, and the same transformation called on two different RDD implementations may be evaluated differently.
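One way to observe this is to print the runtime class behind each RDD reference. The sketch below assumes a local Spark context; the object name `RddImplementationExample` and the helper `implementationName` are hypothetical. In the Spark versions I am aware of, it typically reports classes such as `ParallelCollectionRDD`, `MapPartitionsRDD`, and `ShuffledRDD`, but these are internal classes and the names may differ between versions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddImplementationExample {
  // Hypothetical helper: returns the simple name of the concrete RDD class.
  def implementationName(rdd: RDD[_]): String = rdd.getClass.getSimpleName

  def main(args: Array[String]): Unit = {
    // Local mode and the app name are assumptions for this sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("rdd-implementation-example")
    val sc = new SparkContext(conf)
    try {
      val numbers = sc.parallelize(1 to 10)      // created from a local collection
      val keyed = numbers.map(n => (n % 2, n))   // narrow transformation
      val summed = keyed.reduceByKey(_ + _)      // transformation that requires a shuffle

      println(s"parallelize -> ${implementationName(numbers)}")
      println(s"map         -> ${implementationName(keyed)}")
      println(s"reduceByKey -> ${implementationName(summed)}")
    } finally {
      sc.stop()
    }
  }
}
```

All three values share the static type `RDD[...]`, so the difference in implementation is only visible at runtime or through tools such as `toDebugString`.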
You can declare the record type of an RDD with a type parameter, for example `RDD[String]`. Declaring the type is important, since some transformations are only defined on RDDs of a particular type.
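A short sketch of explicit type declarations follows. The object name `TypedRddExample`, the helper `lineLengths`, and the sample data are assumptions for illustration. Annotating the helper's signature documents which record type goes in and which comes out, and the returned `RDD[Double]` supports numeric operations such as `mean()` that an `RDD[String]` does not offer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TypedRddExample {
  // Hypothetical helper: the signature states the input and output record types.
  def lineLengths(lines: RDD[String]): RDD[Double] =
    lines.map(line => line.length.toDouble)

  def main(args: Array[String]): Unit = {
    // Local mode and the app name are assumptions for this sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("typed-rdd-example")
    val sc = new SparkContext(conf)
    try {
      val lines: RDD[String] = sc.parallelize(Seq("a", "bb", "ccc"))
      val lengths: RDD[Double] = lineLengths(lines)

      // lines.mean()  // would not compile: mean() is not defined for RDD[String]

      // mean() is available here because the records are numeric.
      println(lengths.mean())
    } finally {
      sc.stop()
    }
  }
}
```

Scala can usually infer these types, so the annotations are optional; writing them out makes a type mismatch show up at the declaration rather than several transformations later.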