Copy Into - puneet3663/databricks GitHub Wiki

COPY INTO provides SQL engineers an idempotent option to incrementally ingest data from external systems.

Note that this operation does have some expectations:

- Data schema should be consistent.
- Duplicate records should be excluded or handled downstream.

This operation is potentially much cheaper than full table scans for data that grows predictably.

We want to capture new data but not re-ingest files that have already been read. We can use COPY INTO to perform this action.

The first step is to create an empty table. We can then use COPY INTO to infer the schema of our existing data and copy data from new files that were added since the last time we ran COPY INTO.
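As a minimal sketch of this first step, the statement below creates an empty placeholder Delta table. The table name `sales_raw` is hypothetical and is not taken from the original page. The sketch also assumes a Databricks Runtime recent enough to allow a table to be created without a column list.

```sql
-- Hypothetical table name, used for illustration only.
-- No column list is given, so the table starts out with no schema;
-- the schema is meant to be filled in by a later COPY INTO run.
CREATE TABLE IF NOT EXISTS sales_raw;
```

Using `IF NOT EXISTS` keeps this step safe to re-run, which fits the idempotent workflow described here.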

The empty table from this first step is created with no columns; its schema is inferred when COPY INTO first loads data.

COPY INTO loads data from data files into a Delta table. This is a retriable and idempotent operation, meaning that files in the source location that have already been loaded are skipped.
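The load itself might look like the sketch below. The table name `sales_raw`, the source path, and the choice of CSV are assumptions made for illustration, not details from the original page. The `mergeSchema` options are what allow the schema of an empty target table to be inferred from the files.

```sql
-- Hypothetical target table and source path; adjust to your environment.
COPY INTO sales_raw
FROM '/mnt/raw/sales/'
FILEFORMAT = CSV
FORMAT_OPTIONS (
  'header' = 'true',        -- first line of each file holds column names
  'inferSchema' = 'true',   -- infer column types instead of reading all as strings
  'mergeSchema' = 'true'    -- reconcile the schema across the source files
)
COPY_OPTIONS (
  'mergeSchema' = 'true'    -- let the target table's schema evolve to match
);
```

Running the same statement again without adding files to the source location should load nothing new, because files that were already ingested are skipped. Only files added since the previous run are picked up.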