Optimization - satyamsingh1004/spark GitHub Wiki

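  • Avoid repartition() when you only need fewer partitions, since it triggers a full shuffle. Use coalesce() instead, or tune the shuffle partition count (spark.sql.shuffle.partitions)
  • Avoid an exact distinct count (countDistinct) on large data when an estimate is acceptable; use approx_count_distinct() instead (formerly approxCountDistinct(), deprecated since Spark 2.1)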

Skew

  • Salt the skewed column with a random number, so that rows sharing a hot key are spread across several partitions
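
The idea behind salting can be simulated in plain Python, without Spark. The sketch below is not Spark code: the data, the key names, and `NUM_SALTS` are hypothetical. It shows a two-stage aggregation, first on (key, salt) and then on the key alone, and checks that the result matches an unsalted aggregation.

```python
import random
from collections import Counter

NUM_SALTS = 8  # hypothetical salt range; tune to the degree of skew

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical skewed data: the key "hot" holds 90% of the rows.
rows = [("hot", 1)] * 9000 + [(f"k{i}", 1) for i in range(1000)]

# Stage 0: append a random salt to every key, so "hot" becomes
# ("hot", 0) ... ("hot", 7) and can be spread over several tasks.
salted_rows = [((key, rng.randrange(NUM_SALTS)), value) for key, value in rows]

# Stage 1: partial aggregation on (key, salt).
partial = Counter()
for salted_key, value in salted_rows:
    partial[salted_key] += value

# Stage 2: drop the salt and combine the partial results per original key.
final = Counter()
for (key, _salt), subtotal in partial.items():
    final[key] += subtotal

# Reference result without salting, to check that salting changes nothing.
direct = Counter()
for key, value in rows:
    direct[key] += value

# Largest group a single task must process, before and after salting.
largest_unsalted_group = max(Counter(key for key, _ in rows).values())
largest_salted_group = max(Counter(sk for sk, _ in salted_rows).values())
```

In Spark, the same pattern would typically be a salt column built from a random number, a groupBy on (key, salt), and a second groupBy on the key. For a join, the other side would also need to be replicated once per salt value so that every salted key finds its match.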