Optimization - satyamsingh1004/spark GitHub Wiki

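Avoid repartition() when you only need fewer partitions, because it always triggers a full shuffle. Use coalesce() instead, or tune the shuffle partition count (for example, spark.sql.shuffle.partitions).
Avoid an exact distinct count (countDistinct) when an estimate is acceptable. Use approxCountDistinct() instead (named approx_count_distinct() since Spark 2.1).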
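
To make the coalesce versus repartition advice concrete, here is a toy model in plain Python. It is only a sketch: partitions are ordinary lists, and the functions `coalesce` and `repartition` below are hypothetical stand-ins written for illustration, not the Spark API. The assumption being modelled is that coalesce merges whole existing partitions, while repartition reassigns every record.

```python
# Toy model only: partitions are plain Python lists, not Spark partitions.

def coalesce(partitions, n):
    """Merge whole neighbouring partitions into n groups.

    No record is reassigned individually, which models the absence of a shuffle.
    """
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Model a full shuffle: every record is dealt out round-robin."""
    shuffled = [[] for _ in range(n)]
    records = [r for part in partitions for r in part]
    for i, record in enumerate(records):
        shuffled[i % n].append(record)
    return shuffled


# 8 input partitions of 10 records each.
parts = [list(range(i * 10, i * 10 + 10)) for i in range(8)]

small = coalesce(parts, 2)    # each output is 4 input partitions glued together
even = repartition(parts, 2)  # each output mixes records from all 8 inputs

print([len(p) for p in small])
print([len(p) for p in even])
```

In this model both results hold the same records in two partitions, but only `repartition` touches every record. The trade-off is that coalesce keeps whatever imbalance the input partitions already had, so a full shuffle can still be the right choice when the data needs to be evened out.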

Skew

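Salt the skewed column: append a random number to the key so that rows sharing a hot key are spread across several partitions instead of landing in one.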
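
The effect of salting can be sketched without a cluster. The example below is a plain-Python simulation under stated assumptions: `partition_of` is a hypothetical stand-in for a hash partitioner, and `NUM_PARTITIONS`, `NUM_SALTS`, and the key names are made up for illustration. It also shows the second step that an aggregation typically needs after salting, which is combining the per-salt partial results.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count
NUM_SALTS = 8       # assumed number of distinct salt values


def partition_of(key: str) -> int:
    """Stand-in for a hash partitioner: stable hash of the key modulo partitions."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def add_salt(key: str, rng: random.Random) -> str:
    """Append a random salt so one hot key becomes up to NUM_SALTS distinct keys."""
    return f"{key}_{rng.randrange(NUM_SALTS)}"


rng = random.Random(42)  # fixed seed so the run is repeatable

# Skewed input: one hot key dominates, plus many rare keys.
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]
salted_keys = [add_salt(k, rng) for k in keys]

plain_load = Counter(partition_of(k) for k in keys)
salted_load = Counter(partition_of(k) for k in salted_keys)

print("largest partition without salt:", max(plain_load.values()))
print("largest partition with salt:   ", max(salted_load.values()))

# Stage 1: aggregate per salted key (this work is spread across partitions).
partial = Counter(salted_keys)

# Stage 2: strip the salt and combine the partial counts per original key.
final = Counter()
for salted_key, count in partial.items():
    original_key = salted_key.rsplit("_", 1)[0]
    final[original_key] += count
```

Without the salt, every row for the hot key hashes to the same partition. With it, the hot key's rows are split over several salted keys, and the final counts per original key are unchanged. For a join rather than an aggregation, the other side of the join usually has to be replicated once per salt value so that every salted key still finds its match.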