Cache Vs Persist
With cache(), you can use only the default storage level:
- MEMORY_ONLY for RDD
- MEMORY_AND_DISK for Dataset
With persist(), you can specify the storage level you want, for both RDD and Dataset:
- MEMORY_ONLY
- MEMORY_ONLY_SER
- MEMORY_AND_DISK
- MEMORY_AND_DISK_SER
- DISK_ONLY
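The difference described above can be sketched in Scala as follows. This is a minimal illustration, not a recommended setup: the object name, the application name, the `local[*]` master and the sample data are all assumptions made for the example. It calls cache() and persist() on both an RDD and a Dataset and prints the storage level Spark reports for each.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical object name, used only for this sketch.
object CacheVsPersist {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("cache-vs-persist")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val sc = spark.sparkContext

    // cache() takes no arguments and applies the default storage level.
    val cachedRdd = sc.parallelize(1 to 1000).cache()
    val cachedDs = (1 to 1000).toDS().cache()
    println(s"RDD cache():       ${cachedRdd.getStorageLevel.description}")
    println(s"Dataset cache():   ${cachedDs.storageLevel.description}")

    // persist() accepts an explicit storage level.
    val serializedRdd = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_ONLY_SER)
    val diskDs = (1 to 1000).toDS().persist(StorageLevel.DISK_ONLY)
    println(s"RDD persist():     ${serializedRdd.getStorageLevel.description}")
    println(s"Dataset persist(): ${diskDs.storageLevel.description}")

    // Storage is only materialized when an action runs.
    println(serializedRdd.count())
    println(diskDs.count())

    // Release the cached data once it is no longer needed.
    Seq(cachedRdd, serializedRdd).foreach(_.unpersist())
    cachedDs.unpersist()
    diskDs.unpersist()
    spark.stop()
  }
}
```

Separate RDDs are used for cache() and persist() because Spark does not allow changing the storage level of an RDD once one has been assigned; call unpersist() first if a different level is needed.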
Complete with: