Cache Vs Persist - ignacio-alorre/Spark GitHub Wiki
With cache(), you always get the default storage level:
- MEMORY_ONLY for RDD
- MEMORY_AND_DISK for Dataset
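The defaults above can be checked directly. The sketch below calls cache() on an RDD and on a Dataset and reads back the assigned level through RDD.getStorageLevel and Dataset.storageLevel (the latter exists since Spark 2.1). The object name, helper name defaultLevels, app name and local master URL are illustrative assumptions, not part of the original page.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheDefaults {
  // Hypothetical helper: returns the levels cache() assigns to an RDD and a Dataset
  def defaultLevels(spark: SparkSession): (StorageLevel, StorageLevel) = {
    // RDD.cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)
    val rdd = spark.sparkContext.parallelize(1 to 100).cache()
    // Dataset.cache() is shorthand for persist(StorageLevel.MEMORY_AND_DISK)
    val ds = spark.range(100).cache()

    // cache() is lazy, but the level is registered immediately
    val levels = (rdd.getStorageLevel, ds.storageLevel)

    // Release the cached data once the levels have been read
    rdd.unpersist()
    ds.unpersist()
    levels
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .appName("cache-defaults")
      .master("local[*]")
      .getOrCreate()

    val (rddLevel, dsLevel) = defaultLevels(spark)
    println(rddLevel == StorageLevel.MEMORY_ONLY)     // true
    println(dsLevel == StorageLevel.MEMORY_AND_DISK)  // true

    spark.stop()
  }
}
```

Because cache() takes no arguments, the only way to get a different level is persist().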
With persist(), you can specify the storage level you want for both RDD and Dataset. Common levels include:
- MEMORY_ONLY
- MEMORY_ONLY_SER
- MEMORY_AND_DISK
- MEMORY_AND_DISK_SER
- DISK_ONLY
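A minimal sketch of persist() with explicit levels, using the Scala StorageLevel constants listed above. It also shows a detail worth knowing: an RDD that already has a level assigned throws UnsupportedOperationException if persist() is called with a different one, so the cache is released with unpersist() first. The object name, helper name switchRddLevel, app name and local master URL are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistLevels {
  // Hypothetical helper: persists an RDD serialized in memory, then moves it
  // to disk only, and returns the level the RDD ends up with
  def switchRddLevel(spark: SparkSession): StorageLevel = {
    val rdd = spark.sparkContext.parallelize(1 to 1000)
      .persist(StorageLevel.MEMORY_ONLY_SER) // compact, but deserialized on every read
    rdd.count() // an action materializes the cached partitions

    // Calling persist() with a different level now would throw
    // UnsupportedOperationException, so release the cache first
    rdd.unpersist(blocking = true)
    rdd.persist(StorageLevel.DISK_ONLY)
    rdd.count()

    val level = rdd.getStorageLevel
    rdd.unpersist()
    level
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .appName("persist-levels")
      .master("local[*]")
      .getOrCreate()

    println(switchRddLevel(spark) == StorageLevel.DISK_ONLY) // true

    // Datasets take an explicit level the same way
    val ds = spark.range(1000).persist(StorageLevel.DISK_ONLY)
    ds.count()
    println(ds.storageLevel.useMemory) // false: DISK_ONLY keeps nothing in memory
    ds.unpersist()

    spark.stop()
  }
}
```

The _SER variants trade CPU time (deserializing on read) for a smaller memory footprint, while DISK_ONLY avoids memory pressure entirely at the cost of disk I/O.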
Complete with: