Actions - ayushmathur94/Spark GitHub Wiki

  • Actions are operations that return a final value to the driver program or write data to an external storage system.
  • Calling an action forces evaluation of the transformations required for the RDD it was called on.
  • An action is required to trigger computation: without one, Spark will not evaluate any transformation.
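
The lazy behaviour described in the points above can be illustrated with a small pure-Python sketch. This is only a toy model and not the Spark API: the class `ToyRDD`, the helper `is_error`, and the sample log lines are hypothetical names invented for this illustration. It assumes a dataset that simply records its filter steps and runs them when `count()` is called.

```python
# Toy stand-in for an RDD (NOT the Spark API): it records transformations
# and only runs them when an action is called.
class ToyRDD:
    def __init__(self, source, steps=()):
        self._source = source   # any re-iterable collection of records
        self._steps = steps     # recorded predicates, not yet applied

    def filter(self, predicate):
        # Transformation: returns a new ToyRDD and evaluates nothing.
        return ToyRDD(self._source, self._steps + (predicate,))

    def count(self):
        # Action: applies every recorded step and returns a value to the caller.
        return sum(
            1 for record in self._source
            if all(step(record) for step in self._steps)
        )


evaluated = []  # records every line the predicate actually inspected


def is_error(line):
    evaluated.append(line)
    return "error" in line


# Hypothetical sample data for the illustration.
lines = ["ok: started", "error: disk full", "warning: slow", "error: timeout"]
bad_lines = ToyRDD(lines).filter(is_error)

calls_before_action = len(evaluated)  # the filter has not run yet
error_count = bad_lines.count()       # the action triggers evaluation
calls_after_action = len(evaluated)

print(calls_before_action, error_count, calls_after_action)
```

In this sketch the predicate is never invoked until `count()` runs, which mirrors how Spark defers transformations until an action needs their result.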

Java: counting error records using actions

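// count() is an action: it returns the number of elements to the driver.
System.out.println("Input had " + badLinesRDD.count() + " concerning lines");
System.out.println("Here are 10 examples:");
// take(10) is an action: it brings up to 10 elements back to the driver.
for (String line : badLinesRDD.take(10)) {
    System.out.println(line);
}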

Scala error count code

println("Input had " + badLinesRDD.count() + " concerning lines")
println("Here are 10 examples:")
// println (not print) so that each example appears on its own line
badLinesRDD.take(10).foreach(println)

Python error count code

# count() returns an int, so convert it before concatenating with strings
print("Input had " + str(badLinesRDD.count()) + " concerning lines")
print("Here are 10 examples:")
for line in badLinesRDD.take(10):
    print(line)

In the code above, take() retrieves a small number of elements from the RDD to the driver program, which then iterates over them and prints them locally. RDDs also provide collect(), which retrieves the entire RDD to the driver; use it only when the whole dataset fits in memory on a single machine.

Each time a new action is called, the entire RDD must be computed from scratch. To avoid this inefficiency, users can persist intermediate results.
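
The cost of recomputation, and how persisting avoids it, can be sketched in plain Python. Again this is a toy model and not the Spark API: `ToyRDD`, `find_bad_lines`, and the `computations` counter are hypothetical names made up for this illustration, and the real persist() in Spark offers storage levels that this sketch ignores.

```python
class ToyRDD:
    """Toy stand-in for an RDD (NOT the Spark API)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that rebuilds the data
        self._persist = False
        self._cache = None

    def persist(self):
        # Ask for the result of the first action to be kept in memory.
        self._persist = True
        return self

    def _materialize(self):
        if self._cache is not None:
            return self._cache    # reuse the stored result
        data = self._compute()    # otherwise recompute from scratch
        if self._persist:
            self._cache = data
        return data

    def count(self):
        # Action: number of elements.
        return len(self._materialize())

    def take(self, n):
        # Action: first n elements.
        return self._materialize()[:n]


computations = {"n": 0}  # how many times the data was rebuilt


def find_bad_lines():
    computations["n"] += 1
    # Hypothetical sample data for the illustration.
    lines = ["ok: started", "error: disk full", "error: timeout"]
    return [line for line in lines if "error" in line]


# Two actions without persisting: the data is rebuilt for each action.
unpersisted = ToyRDD(find_bad_lines)
unpersisted.count()
unpersisted.take(10)
runs_without_persist = computations["n"]

# The same two actions after persist(): the data is built only once.
computations["n"] = 0
persisted = ToyRDD(find_bad_lines).persist()
persisted.count()
persisted.take(10)
runs_with_persist = computations["n"]

print(runs_without_persist, runs_with_persist)
```

In this toy model, the count() and take() pair from the examples above rebuilds the data twice unless persist() was called first, in which case the second action reuses the stored result.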
