# Bulk insertion
ClickHouse specializes in storing huge volumes of log and metrics data. While it can accept single-row inserts (and this client supports them), the specialized MergeTree family of engines works best when data is inserted in bulk. ClickHouse.Client supports this scenario via the dedicated `ClickHouseBulkCopy` class.
Using `ClickHouseBulkCopy` requires:

- Target connection (a `ClickHouseConnection` instance)
- Target table name (the `DestinationTableName` property)
- Data source (an `IDataReader` or an `IEnumerable<object[]>`; a reader-based sketch appears after the main example below)

Additional notes:

- To make the best use of ClickHouse's capabilities, `ClickHouseBulkCopy` uses TPL to process batches of data, with up to 4 parallel insertion tasks (tweakable)
- Column names can optionally be provided if the source data has fewer columns than the target table
- The following parameters are tweakable: `ColumnNames`, `BatchSize`, `MaxDegreeOfParallelism`
- Before copying, a `SELECT * FROM <table> LIMIT 0` query is performed to obtain the target table structure. The types of the provided objects must (reasonably) match the target columns; for example, trying to write a string to an integer column is pointless, so such an attempt throws an exception
- Sessions are not compatible with parallel insertion. The connection passed to `ClickHouseBulkCopy` must have sessions disabled, or `MaxDegreeOfParallelism` must be set to 1 (see the sketch right after this list)
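Here is a minimal sketch of the sessions caveat. It assumes sessions are enabled via a `UseSession` connection-string flag (an assumption about your connection configuration; adjust to however sessions are enabled in your setup):

```csharp
// Hedged sketch: a connection with sessions enabled cannot run parallel
// insertion tasks, so MaxDegreeOfParallelism is reduced to 1.
// The UseSession flag is assumed here as the way sessions are switched on.
using var sessionConnection = new ClickHouseConnection("Host=<host>;UseSession=true;<..other parameters..>");
using var sessionBulkCopy = new ClickHouseBulkCopy(sessionConnection)
{
    DestinationTableName = "<database>.<table>",
    MaxDegreeOfParallelism = 1 // required while the session is active
};
```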
A complete example, inserting generated rows:

```csharp
using var connection = new ClickHouseConnection("Host=<host>;<..other parameters..>");

using var bulkCopy = new ClickHouseBulkCopy(connection)
{
    DestinationTableName = "<database>.<table>",
    ColumnNames = new[] { "col1", "col2" },
    BatchSize = 100000
};

await bulkCopy.InitAsync(); // Prepares ClickHouseBulkCopy instance by loading target column types

var values = Enumerable.Range(0, 1000000).Select(i => new object[] { (long)i, "value" + i.ToString() }); // Example generated data
await bulkCopy.WriteToServerAsync(values);

Console.WriteLine(bulkCopy.RowsWritten);
```
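To use an `IDataReader` as the data source, for example to copy rows from one table into another, something like the following sketch should work. It relies on a `WriteToServerAsync(IDataReader)` overload, as implied by the data-source list above; the table and column names are placeholders:

```csharp
// Hedged sketch: streaming rows from a source table into a target table.
using var sourceConnection = new ClickHouseConnection("Host=<host>;<..other parameters..>");
using var targetConnection = new ClickHouseConnection("Host=<host>;<..other parameters..>");

// Stream rows from the source table via standard ADO.NET
using var command = sourceConnection.CreateCommand();
command.CommandText = "SELECT col1, col2 FROM <database>.<source_table>";
using var reader = await command.ExecuteReaderAsync();

// Feed the reader directly into the bulk copy
using var bulkCopy = new ClickHouseBulkCopy(targetConnection)
{
    DestinationTableName = "<database>.<target_table>",
    ColumnNames = new[] { "col1", "col2" }
};
await bulkCopy.InitAsync(); // load target column types
await bulkCopy.WriteToServerAsync(reader);
Console.WriteLine(bulkCopy.RowsWritten);
```

Using separate source and target connections keeps the reader's result stream independent of the parallel insertion tasks issued by `ClickHouseBulkCopy`.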