Using DataStax Bulk Loader (DSBulk) with Astra DB

Written by Cédrick Lunven and Artem Chebotko

Reference documentation for DSBulk

Reference documentation for DSBulk with Astra DB

A - Overview

📘 What is DSBulk?

DataStax Bulk Loader (DSBulk) is an efficient, flexible, easy-to-use command-line utility for loading, unloading, and counting data stored in Cassandra-compatible storage engines, such as open-source Apache Cassandra®, DataStax Astra DB, and DataStax Enterprise (DSE).

DSBulk is commonly used to:

  • Load data from JSON or CSV files to the database;
  • Unload data stored in the database to JSON or CSV files;
  • Count the number of rows in a given table.

# Load data
dsbulk load <options>

# Unload data
dsbulk unload <options>

# Count rows
dsbulk count <options>

Currently, CSV and JSON formats are supported for both loading and unloading data.

For more information about DSBulk capabilities, see the reference documentation for DSBulk.

📘 DSBulk and Astra DB

DSBulk can load data into and unload data from your DataStax Astra DB database efficiently and reliably.

For more information about using DSBulk with Astra DB, see the reference documentation for DSBulk and Astra DB.

B - Prerequisites

C - Installation

✅ Step 1: Download DSBulk

Download the latest DSBulk distribution from https://downloads.datastax.com/#bulk-loader.

Alternatively, use the curl tool to download a specific version of DSBulk:

curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.8.0.tar.gz

In this tutorial, we use DataStax Bulk Loader version 1.8.0.

✅ Step 2: Unpack the distribution

Extract the archive:

tar -xvzf dsbulk-1.8.0.tar.gz

Find dsbulk inside the bin directory:

cd dsbulk-1.8.0/bin
./dsbulk help
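
To avoid changing into the bin directory for every command, the directory can be added to PATH. The following is a minimal sketch, assuming the archive was extracted into the current working directory; DSBULK_HOME is a variable name chosen here for illustration and is not something DSBulk requires.

```shell
# Assumption: dsbulk-1.8.0 was extracted into the current directory.
DSBULK_HOME="$PWD/dsbulk-1.8.0"

# Prepend the bin directory so that `dsbulk` resolves without a path prefix.
export PATH="$DSBULK_HOME/bin:$PATH"

# Report whether the launcher is now reachable.
if command -v dsbulk >/dev/null 2>&1; then
    echo "dsbulk found at $(command -v dsbulk)"
else
    echo "dsbulk not found; check that $DSBULK_HOME/bin exists"
fi
```

The examples in this tutorial keep the ./dsbulk form, which works from inside the bin directory without any PATH change.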

D - Usage Examples

Note that using DSBulk with Astra DB requires specifying your own client id and client secret, as well as your database secure connect bundle. The client id, client secret, and secure connect bundle used in the examples below are no longer valid.
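
Credentials typed directly on the command line end up in the shell history, so it can be convenient to keep them in environment variables and check them before running a command. The sketch below is one possible approach, not part of DSBulk: the variable names ASTRA_CLIENT_ID, ASTRA_CLIENT_SECRET, and ASTRA_BUNDLE and the helper astra_check are hypothetical and chosen only for this example.

```shell
# Hypothetical variable names; DSBulk itself does not read these.
#   ASTRA_CLIENT_ID, ASTRA_CLIENT_SECRET ... client id and client secret
#   ASTRA_BUNDLE ........................... path to the secure connect bundle

astra_check() {
    # Return non-zero and name the first problem found.
    if [ -z "${ASTRA_CLIENT_ID:-}" ]; then
        echo "ASTRA_CLIENT_ID is not set" >&2
        return 1
    fi
    if [ -z "${ASTRA_CLIENT_SECRET:-}" ]; then
        echo "ASTRA_CLIENT_SECRET is not set" >&2
        return 1
    fi
    if [ ! -f "${ASTRA_BUNDLE:-}" ]; then
        echo "secure connect bundle not found: ${ASTRA_BUNDLE:-unset}" >&2
        return 1
    fi
    return 0
}

# Usage sketch (commented out so the snippet runs without a database):
# astra_check && ./dsbulk count -k my_keyspace -t my_table \
#     -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET" -b "$ASTRA_BUNDLE"
```

Quoting the variables when passing them to -u, -p, and -b keeps any special characters in the secret from being interpreted by the shell.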

✅ Loading data into Astra DB from a CSV file

Example dsbulk load command:

./dsbulk load \
    -url /tmp/input.csv \
    -header true \
    -k my_keyspace \
    -t my_table \
    -u BBygiXTpFXPLeOAdQwRLBZBB \
    -p xUr4qUCjsdexniP5.0PE,e09FeZ6W1,6-OuhXTwYeUcImKvBok_P3Kh8qS1djJlRE6t_tcgneMIKhgznI7Mf6iKEGq6gZOv+MPKURA7c30Ws4atjbCwdx+WcgduZ-o43 \
    -b /tmp/secure-connect-my-database.zip

# where
# -url ........ CSV file
# -header ..... file header presence
# -k .......... keyspace name
# -t .......... table name
# -u .......... client id
# -p .......... client secret
# -b .......... secure connect bundle

Example table schema:

CREATE TABLE my_keyspace.my_table (
    id UUID,
    name TEXT,
    PRIMARY KEY(id)
);

Example CSV file with a header:

id,name
d270543c-62f5-4108-9548-5bbc50cd94fe,Alice
74871405-e108-4bf7-b4bf-2c3477ef7d6d,Bob
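
To try the load command without preparing a file by hand, the sample CSV can be written with a here-document. This is a small sketch; the INPUT_CSV variable is an illustrative name, and its default matches the /tmp/input.csv path used in the load command above.

```shell
# Path used by the load command above; override INPUT_CSV to write elsewhere.
INPUT_CSV="${INPUT_CSV:-/tmp/input.csv}"

# Write the header line followed by the two sample rows.
cat > "$INPUT_CSV" <<'EOF'
id,name
d270543c-62f5-4108-9548-5bbc50cd94fe,Alice
74871405-e108-4bf7-b4bf-2c3477ef7d6d,Bob
EOF

# Count data rows, excluding the header line.
DATA_ROWS=$(tail -n +2 "$INPUT_CSV" | wc -l | tr -d ' ')
echo "$INPUT_CSV contains $DATA_ROWS data rows"
```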

Example output:

total | failed | rows/s | p50ms | p99ms | p999ms | batches
    2 |      0 |      8 | 77.59 | 84.93 |  84.93 |    1.00
Operation LOAD_20220319-044851-396835 completed successfully in less than one second.
Last processed positions can be found in positions.txt
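
Since JSON is supported as well, the same two rows could also be supplied as a JSON file. The sketch below rests on two assumptions that should be checked against the reference documentation: that the JSON connector is selected with -c json, and that it accepts a sequence of JSON objects whose field names match the table's column names. The path /tmp/input.json and the INPUT_JSON variable are made up for this example.

```shell
# Hypothetical path, chosen to mirror /tmp/input.csv above.
INPUT_JSON="${INPUT_JSON:-/tmp/input.json}"

# One JSON object per row; field names match the table's column names.
cat > "$INPUT_JSON" <<'EOF'
{"id": "d270543c-62f5-4108-9548-5bbc50cd94fe", "name": "Alice"}
{"id": "74871405-e108-4bf7-b4bf-2c3477ef7d6d", "name": "Bob"}
EOF

JSON_DOCS=$(wc -l < "$INPUT_JSON" | tr -d ' ')
echo "$INPUT_JSON contains $JSON_DOCS documents"

# Load command sketch (commented out so the snippet runs without a database):
# ./dsbulk load -c json -url "$INPUT_JSON" -k my_keyspace -t my_table \
#     -u "<client id>" -p "<client secret>" -b /tmp/secure-connect-my-database.zip
```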

✅ Counting the number of rows in an Astra DB table

Example dsbulk count command:

./dsbulk count \
      -k my_keyspace \
      -t my_table \
      -u BBygiXTpFXPLeOAdQwRLBZBB \
      -p xUr4qUCjsdexniP5.0PE,e09FeZ6W1,6-OuhXTwYeUcImKvBok_P3Kh8qS1djJlRE6t_tcgneMIKhgznI7Mf6iKEGq6gZOv+MPKURA7c30Ws4atjbCwdx+WcgduZ-o43 \
      -b /tmp/secure-connect-my-database.zip 

# where
# -k .......... keyspace name
# -t .......... table name
# -u .......... client id
# -p .......... client secret
# -b .......... secure connect bundle

Example output for the table with 2 rows:

total | failed | rows/s | p50ms | p99ms | p999ms
    2 |      0 |      2 | 66.13 | 73.92 |  73.92
Operation COUNT_20220319-041624-950820 completed successfully in less than one second.
2
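
For scripting, the row count can be pulled out of this output. The following is a minimal sketch, assuming the count appears on a line of its own as in the sample output above; extract_count is a hypothetical helper written for this example, not part of DSBulk.

```shell
# Print the last line of stdin that consists only of digits.
extract_count() {
    grep -E '^[0-9]+$' | tail -n 1
}

# Demonstration on the sample output shown above (no database needed).
SAMPLE_OUTPUT='total | failed | rows/s | p50ms | p99ms | p999ms
    2 |      0 |      2 | 66.13 | 73.92 |  73.92
Operation COUNT_20220319-041624-950820 completed successfully in less than one second.
2'
ROW_COUNT=$(printf '%s\n' "$SAMPLE_OUTPUT" | extract_count)
echo "row count: $ROW_COUNT"

# With a real database, the same helper could be applied to the command output.
# Merging stderr into stdout makes the filter independent of which stream
# carries the count:
# ROW_COUNT=$(./dsbulk count -k my_keyspace -t my_table \
#     -u "<client id>" -p "<client secret>" \
#     -b /tmp/secure-connect-my-database.zip 2>&1 | extract_count)
```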

✅ Unloading data from Astra DB into a CSV file

Example dsbulk unload command:

./dsbulk unload \
    -k my_keyspace \
    -t my_table \
    -u BBygiXTpFXPLeOAdQwRLBZBB \
    -p xUr4qUCjsdexniP5.0PE,e09FeZ6W1,6-OuhXTwYeUcImKvBok_P3Kh8qS1djJlRE6t_tcgneMIKhgznI7Mf6iKEGq6gZOv+MPKURA7c30Ws4atjbCwdx+WcgduZ-o43 \
    -b /tmp/secure-connect-my-database.zip \
    > /tmp/output.csv

# where
# -k .......... keyspace name
# -t .......... table name
# -u .......... client id
# -p .......... client secret
# -b .......... secure connect bundle

Example output for the table with 2 rows:

total | failed | rows/s | p50ms | p99ms | p999ms
    2 |      0 |      2 | 71.37 | 83.89 |  83.89
Operation UNLOAD_20220319-181929-004308 completed successfully in less than one second.
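
After unloading, a quick sanity check is to compare the number of data rows in the file with the total reported above. The sketch below assumes the output file begins with a header line and that no field contains an embedded newline; it uses a temporary stand-in file so that it runs without a database, and the row order in a real unload may differ.

```shell
# Stand-in for /tmp/output.csv so the snippet runs without a database.
OUTPUT_CSV=$(mktemp)
cat > "$OUTPUT_CSV" <<'EOF'
id,name
74871405-e108-4bf7-b4bf-2c3477ef7d6d,Bob
d270543c-62f5-4108-9548-5bbc50cd94fe,Alice
EOF

# Total reported by the unload operation in the sample output.
EXPECTED_ROWS=2

# Skip the header line, then count the remaining lines.
ACTUAL_ROWS=$(tail -n +2 "$OUTPUT_CSV" | wc -l | tr -d ' ')

if [ "$ACTUAL_ROWS" -eq "$EXPECTED_ROWS" ]; then
    UNLOAD_CHECK="ok"
else
    UNLOAD_CHECK="mismatch"
fi
echo "unload check: $UNLOAD_CHECK ($ACTUAL_ROWS of $EXPECTED_ROWS rows)"
rm -f "$OUTPUT_CSV"
```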

For more examples, see:

Reference documentation for DSBulk

Reference documentation for DSBulk with Astra DB