Practice Learning Spark - ayushmathur94/Spark GitHub Wiki

It is generally not a good idea to use actions inside transformations. Actions trigger the execution of a Spark job and return a result to the driver, while transformations lazily define the operations that will be applied to the data in an RDD.

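The function passed to a transformation runs on the executors, not on the driver, so invoking an RDD action from inside it typically fails at runtime; Spark reports that RDD transformations and actions can only be invoked by the driver (see SPARK-5063). Actions placed in the middle of a pipeline are also costly: collect(), for example, pulls the entire RDD into driver memory, which does not scale to large datasets, and every action forces the lineage defined so far to be evaluated at that point, which can launch extra jobs and recompute data.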

To avoid these issues, express the per-record logic with transformations only. Then call an action from the driver to trigger the execution of the Spark job and return the results.


It looks like you are trying to call foreach() inside the call() function passed to flatMap(). This is not allowed in Spark, because foreach() is an action, not a transformation: it triggers the execution of a Spark job, and actions cannot be used inside transformations.

To fix this issue, replace the foreach() call with a transformation that generates the pairs and returns them as part of the new RDD. For example, you can use the map() transformation to generate the pairs and return them as a list, or you can use the mapToPair() transformation (or flatMapToPair() when one input element yields several pairs) to generate the pairs and return them as a JavaPairRDD.
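As a rough illustration of the fix, the pair-generating logic can live in a plain function that takes one record and returns an iterator of pairs, with no RDD call inside it. The sketch below uses only the Java standard library so it runs without Spark; the class name `PairGeneration`, the helper `pairsOf`, and the choice to pair every item with each later item in the same record are all hypothetical, since the original code is not shown.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class PairGeneration {

    // Hypothetical helper: builds every (earlier, later) pair of items in one record.
    // It touches only plain Java collections, so it contains no Spark action
    // and no reference to an RDD.
    static Iterator<Map.Entry<String, String>> pairsOf(List<String> items) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(new SimpleEntry<>(items.get(i), items.get(j)));
            }
        }
        return pairs.iterator();
    }

    public static void main(String[] args) {
        // Stand-in for a single input record of the RDD.
        Iterator<Map.Entry<String, String>> it = pairsOf(Arrays.asList("a", "b", "c"));
        while (it.hasNext()) {
            Map.Entry<String, String> p = it.next();
            System.out.println(p.getKey() + "," + p.getValue());
        }
    }
}
```

In the Spark 2.x and later Java API, the function passed to flatMapToPair() returns an Iterator of scala.Tuple2 values, so a helper of this shape, with Tuple2 swapped in for Map.Entry, could presumably be used as `records.flatMapToPair(r -> pairsOf(r))`, where `records` is an assumed JavaRDD<List<String>>. The result would be a JavaPairRDD<String, String>, and an action such as collect() or foreach() would then be called on it from the driver.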