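Optimization - satyamsingh1004/spark GitHub Wiki

- Avoid `repartition()` when you only need fewer partitions, because it always triggers a full shuffle. Use `coalesce()` instead, or tune the shuffle partition count (`spark.sql.shuffle.partitions`, default 200).
- Avoid an exact distinct count (`countDistinct`) on large data when an estimate is good enough. Use `approxCountDistinct()` instead (renamed `approx_count_distinct()` in Spark 2.1 and later).

Skew

- Salt the skewed column with a random number, so that the rows of a hot key are spread across several partitions instead of landing in one.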
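
The effect of salting can be sketched without a cluster. The snippet below is a plain-Python simulation, not Spark code: `partition_of` is a hypothetical stand-in for a hash partitioner, and the data, the partition count and the number of salt values are all made up for illustration. It compares partition sizes when rows are placed by key alone and when they are placed by (key, random salt).

```python
import hashlib
import random

NUM_PARTITIONS = 8  # hypothetical partition count
NUM_SALTS = 8       # hypothetical number of distinct salt values


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Stable stand-in for a hash partitioner: key -> partition index."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_sizes(keys, num_partitions=NUM_PARTITIONS):
    """Count how many rows land in each partition."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[partition_of(key, num_partitions)] += 1
    return sizes


def build_keys():
    """Made-up skewed data: one hot key holds 9,000 of 10,000 rows."""
    return ["hot"] * 9000 + [f"k{i % 100}" for i in range(1000)]


def salt_keys(keys, num_salts=NUM_SALTS, seed=42):
    """Pair every key with a random salt in [0, num_salts)."""
    rng = random.Random(seed)
    return [(key, rng.randrange(num_salts)) for key in keys]


keys = build_keys()
unsalted_sizes = partition_sizes(keys)           # placed by key only
salted_sizes = partition_sizes(salt_keys(keys))  # placed by (key, salt)

print("unsalted:", unsalted_sizes)
print("salted:  ", salted_sizes)
```

Without the salt, every `"hot"` row hashes to the same partition. With the salt, the hot key becomes up to `NUM_SALTS` distinct keys, so its rows can spread out. In a real job the salt usually has to be undone afterwards, for example by aggregating in two stages or by replicating the other side of a join once per salt value.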