Best practices for SQOOP split by - beercafeguy/sqoop-commands GitHub Wiki
In my last article in the same repo, I discussed the use of --split-by. Here I list a few best practices for it.
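- The column used in --split-by should not contain NULL values. Sqoop builds each mapper's query from range conditions on that column, so rows where it is NULL match no range and are silently lost from the import.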
- Preferably, this should be the primary key column of the table. If the table has no PK, choose a column with high cardinality and evenly distributed values; otherwise some mappers receive far more rows than others and we get skewed data.
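- This should preferably be a numeric column. Sqoop derives the split ranges from the column's MIN and MAX values, and numeric ranges divide cleanly across mappers.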
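
To make the points about NULLs and skew concrete, here is a small Python sketch that mimics range-based splitting on an integer column. It is only a simplified model: Sqoop's real splitter is written in Java and differs in detail, and the function names `split_ranges` and `rows_per_mapper` are made up for this illustration.

```python
def split_ranges(lo, hi, num_mappers):
    """Cut [lo, hi] into num_mappers contiguous ranges of near-equal width."""
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def rows_per_mapper(values, num_mappers):
    """Count how many rows each mapper would read for the given column values."""
    # MIN and MAX in SQL ignore NULLs, so the boundaries come from non-null values.
    non_null = [v for v in values if v is not None]
    ranges = split_ranges(min(non_null), max(non_null), num_mappers)
    counts = []
    for i, (lo, hi) in enumerate(ranges):
        is_last = i == len(ranges) - 1
        # Each mapper reads lo <= col < hi; the last one also includes col == hi.
        # A NULL (None) value satisfies no range, so that row is never read.
        counts.append(sum(
            1 for v in values
            if v is not None and (lo <= v < hi or (is_last and v == hi))
        ))
    return counts


even = list(range(100))                 # evenly distributed, like a PK
skewed = [1] * 97 + [1000, 2000, 4000]  # most rows share one value
with_nulls = even + [None] * 5          # 5 rows with NULL in the split column

print(rows_per_mapper(even, 4))         # roughly equal work per mapper
print(rows_per_mapper(skewed, 4))       # [97, 1, 1, 1]: one mapper does almost everything
print(len(with_nulls) - sum(rows_per_mapper(with_nulls, 4)))  # 5 rows lost
```

With the evenly distributed column each mapper gets about a quarter of the rows, while the skewed column sends 97 of 100 rows to a single mapper, and the rows with NULL in the split column are not picked up by any mapper.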