Cache Vs Persist

With cache(), you can only use the default storage level (see the sketch after this list):

  • MEMORY_ONLY for RDD
  • MEMORY_AND_DISK for Dataset
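
A minimal Scala sketch of these defaults, assuming Spark 2.x or later and a local SparkSession; the names `spark`, `ds` and `rdd` are illustrative, not part of the original text:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session
val spark = SparkSession.builder().appName("cache-vs-persist").master("local[*]").getOrCreate()

val ds  = spark.range(0, 1000000)  // Dataset[java.lang.Long]
val rdd = ds.rdd                   // the underlying RDD

ds.cache()   // Dataset: cached with the default MEMORY_AND_DISK
rdd.cache()  // RDD: cached with the default MEMORY_ONLY

// Inspect the storage level each one was assigned
println(ds.storageLevel)      // e.g. StorageLevel(disk, memory, deserialized, 1 replicas)
println(rdd.getStorageLevel)  // e.g. StorageLevel(memory, deserialized, 1 replicas)
```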

With persist(), you can specify which storage level you want for both RDD and Dataset (see the sketch after this list):

  • MEMORY_ONLY
  • MEMORY_ONLY_SER
  • MEMORY_AND_DISK
  • MEMORY_AND_DISK_SER
  • DISK_ONLY
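
A minimal Scala sketch of persist() with explicit storage levels, under the same assumptions as above (`spark`, `ds` and `rdd` are illustrative names; calling persist() with no argument behaves like cache()):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Illustrative local session
val spark = SparkSession.builder().appName("persist-levels").master("local[*]").getOrCreate()

val ds  = spark.range(0, 1000000)                      // Dataset[java.lang.Long]
val rdd = spark.sparkContext.parallelize(1 to 1000000) // RDD[Int]

// With an explicit StorageLevel you choose how the data is kept
ds.persist(StorageLevel.MEMORY_ONLY)          // deserialized objects, memory only
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER) // serialized bytes, spilled to disk when memory is full

println(ds.storageLevel)
println(rdd.getStorageLevel)

// Release the cached data once it is no longer needed
ds.unpersist()
rdd.unpersist()
```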

To be completed with content from:

https://sparkbyexamples.com/spark/spark-difference-between-cache-and-persist/