Best practices for SQOOP split by - beercafeguy/sqoop-commands GitHub Wiki

In my last article in this repo, I discussed the use of --split-by. Here I list a few best practices for it.

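  • The column used in --split-by should not contain NULL values. Sqoop builds each mapper's query from range conditions on this column, and rows where it is NULL match none of those ranges, so we will lose that data.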
  • Preferably, this should be the primary key column of the table. If the table has no PK, choose a column whose values are spread evenly across its range; otherwise the splits will be skewed and a few mappers will end up with most of the rows.
  • This should be a numeric column, because a numeric range can be divided cleanly into equal-width splits.
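
To see why these points matter, here is a rough sketch in Python of how a numeric split column gets divided among mappers. It is a simplified model and not Sqoop's actual code: it assumes the column's MIN..MAX range is cut into equal-width pieces, one per mapper, and the function names `split_ranges` and `rows_per_mapper` are made up for this illustration.

```python
def split_ranges(lo, hi, num_mappers):
    """Cut the integer range [lo, hi] into num_mappers contiguous pieces of near-equal width."""
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers)] + [hi]
    return [(bounds[i], bounds[i + 1]) for i in range(num_mappers)]


def rows_per_mapper(values, num_mappers):
    """Return (row count per mapper, number of rows dropped because the value is NULL)."""
    non_null = [v for v in values if v is not None]
    ranges = split_ranges(min(non_null), max(non_null), num_mappers)
    counts = [0] * num_mappers
    dropped = 0
    for v in values:
        if v is None:
            dropped += 1  # NULL satisfies no range condition, so no mapper picks it up
            continue
        for i, (a, b) in enumerate(ranges):
            is_last = i == num_mappers - 1
            # Every range is half-open [a, b) except the last, which also includes hi
            if a <= v < b or (is_last and v == b):
                counts[i] += 1
                break
    return counts, dropped


# Evenly spread ids (like a sequential primary key): balanced mappers
print(rows_per_mapper(list(range(1, 101)), 4))                          # → ([24, 25, 25, 26], 0)

# A few large outliers stretch the range: one mapper gets almost everything
print(rows_per_mapper(list(range(1, 98)) + [1000, 5000, 10000], 4))     # → ([98, 0, 1, 1], 0)

# NULLs in the split column: two rows never reach any mapper
print(rows_per_mapper([1, 2, None, 3, 4, None], 2))                     # → ([1, 3], 2)
```

The second call shows the skew problem from a badly distributed column, and the third shows the data loss from NULL values.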