1.1. Batch inserter

1.1.1. Purpose

The QuasarDB batch inserter provides you with an interface to send data to the QuasarDB cluster. The data is buffered client-side and sent in a single logical batch, ensuring efficiency and transactional consistency.

The batch inserter supports inserting into multiple tables in a single, atomic operation.

1.1.2. APIs

There are currently three different batch insertion APIs:

API	Status	Description
Regular	Deprecated	Regular batch inserter, which exposes a row-based insertion API. This API has been deprecated for performance reasons.
Pinned	Deprecated	Pinned writer, which exposes a column-oriented API. This API has been deprecated for performance reasons.
Exp(erimental)	Recommended	Next generation batch insertion API, available since 3.13.1. It exposes a column-oriented API, and all new features and functionality are available under this API.

We strongly recommend using the new, experimental batch writer API for any newly written code. Old code should be ported to this API.

1.1.3. Insertion modes

The batch writer has various modes of operation, each with different tradeoffs:

Insertion mode	Description	Use case(s)
Default	Transactional insertion mode that employs Copy-on-Write	General purpose
Fast	Transactional insert that does not employ Copy-on-Write. Newly written data may be visible to queries before the transaction is fully completed.	High-volume inserts constrained by disk I/O
Asynchronous	Data is buffered in-memory in the QuasarDB daemon nodes before writing to disk. Data from multiple sources is buffered together, and periodically flushed to disk.	Streaming data where multiple processes simultaneously write into the same table(s)
Truncate	Replaces any existing data with the provided data in a single transactional operation.	Replay of historical data

If you intend to do many small, incremental inserts into the same table, we recommend using the asynchronous insertion mode.

If you process and insert data in larger batches, we recommend using the fast insertion mode.

1.1.4. Options

In addition to the different insertion modes, the batch writer API provides various options and parameters that affect its operation. These are documented below.

Name	Description
Truncate ranges	When using the Truncate insertion mode, these specify which ranges to truncate. All newly inserted data must fall within these ranges.
Duplicates	Allows specification of deduplication options. See the documentation on deduplication for more information.
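
The requirement that all newly inserted data fall within the truncate ranges can be checked client-side before pushing. The following is a minimal sketch using only the Python standard library. The helper name `timestamps_outside_ranges` is hypothetical and not part of the QuasarDB API, and the sketch assumes half-open ranges (begin inclusive, end exclusive), which this page does not state.

```python
from datetime import datetime


def timestamps_outside_ranges(timestamps, ranges):
    """Return the timestamps not covered by any (begin, end) range.

    Assumption: ranges are half-open, i.e. begin <= ts < end.
    """
    return [
        ts for ts in timestamps
        if not any(begin <= ts < end for begin, end in ranges)
    ]


# Illustrative data: one truncate range covering a single day.
truncate_ranges = [(datetime(2024, 1, 1), datetime(2024, 1, 2))]
batch_timestamps = [
    datetime(2024, 1, 1, 9, 30),
    datetime(2024, 1, 1, 16, 0),
    datetime(2024, 1, 2, 0, 0),  # equals the exclusive end, so it is outside
]

outside = timestamps_outside_ranges(batch_timestamps, truncate_ranges)
if outside:
    print("rows outside the truncate ranges:", outside)
```

An empty result means every row of the batch lies inside the ranges that are about to be replaced.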

1.1.5. Batch size

The batch inserter parallelizes its operations, based on the number of threads configuration option. It first groups the entire operation into batches of tasks, and then executes these in parallel.

You can configure the maximum size of a batch by setting the client_max_batch_load option. For example, if your application runs with a connection parallelism of 4, your max batch load is 21, and you insert data into 210 different shards, these tasks are grouped into 10 batches (210 / 21), which are then executed in parallel using 4 threads.

In Python, you can configure it as such:

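import quasardb.pool as pool

# Initialize the connection pool once, pointing it at the cluster.
pool.initialize(uri='qdb://127.0.0.1:2836')

with pool.instance().connect() as conn:
    # Group batch-insert work into batches of at most 42 tasks.
    conn.options().set_client_max_batch_load(42)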

The batch inserter will now group the operation into batches of up to 42 tasks.
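
The grouping arithmetic described in this section can be illustrated with a small, self-contained sketch. It is not the client's actual implementation: the helpers `group_into_batches` and `run_batches` are hypothetical, and the way tasks are assigned to batches inside the client is an assumption made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def group_into_batches(tasks, max_batch_load):
    """Split tasks into consecutive batches of at most max_batch_load items."""
    if max_batch_load < 1:
        raise ValueError("max_batch_load must be at least 1")
    return [
        tasks[i:i + max_batch_load]
        for i in range(0, len(tasks), max_batch_load)
    ]


def run_batches(batches, parallelism, handle_batch):
    """Process batches on `parallelism` worker threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=parallelism) as executor:
        return list(executor.map(handle_batch, batches))


shard_tasks = list(range(210))  # one task per shard, as in the example above

batches_21 = group_into_batches(shard_tasks, 21)
batches_42 = group_into_batches(shard_tasks, 42)

# Stand-in for the real work: each "batch handler" just counts its tasks.
processed = run_batches(batches_21, 4, len)

print(len(batches_21), len(batches_42), sum(processed))
```

With a max batch load of 21 the 210 tasks form 10 batches; with 42 they form 5. In both cases the number of worker threads stays at 4, so a larger batch load means fewer, larger units of work per thread.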

1.1.6. Usage

The steps involved in using the batch writer API are as follows:

1. Initialize a local batch inserter instance, providing it with the tables and columns you want to insert data for. Specifying multiple tables is supported, which allows you to insert data into multiple tables in one atomic operation.
2. Prepare/buffer the batch you want to insert. Buffering locally before sending lets the data be transmitted at maximum throughput, which keeps the server side efficient.
3. Push the batch to the cluster.
4. If necessary, go back to step 2 to send additional batches.
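
The buffer-then-push workflow in the steps above can be sketched without a cluster. The class below is a hypothetical stand-in written for illustration only; it is not the QuasarDB writer class, and its method names (`append`, `push`), the `send` callback, and the table and column names are all assumptions.

```python
from collections import defaultdict


class LocalBatchBuffer:
    """Hypothetical client-side buffer; NOT the QuasarDB writer API."""

    def __init__(self, tables):
        # Step 1: declare the tables and columns to insert into.
        self._schema = {t: list(cols) for t, cols in tables.items()}
        self._rows = defaultdict(list)

    def append(self, table, timestamp, values):
        # Step 2: buffer locally; nothing is sent yet.
        columns = self._schema[table]
        if set(values) != set(columns):
            raise ValueError(f"expected columns {columns} for table {table!r}")
        self._rows[table].append((timestamp, values))

    def push(self, send):
        # Step 3: hand the whole multi-table batch to `send` in one call,
        # then reset the buffer so the instance can be reused (step 4).
        batch = {t: rows for t, rows in self._rows.items() if rows}
        self._rows = defaultdict(list)
        if batch:
            send(batch)
        return sum(len(rows) for rows in batch.values())


sent = []  # stands in for the cluster: each push appends one batch

writer = LocalBatchBuffer({"trades": ["price", "volume"],
                           "quotes": ["bid", "ask"]})
writer.append("trades", 1, {"price": 101.5, "volume": 200})
writer.append("quotes", 1, {"bid": 101.4, "ask": 101.6})
first = writer.push(sent.append)   # one batch covering both tables

writer.append("trades", 2, {"price": 101.7, "volume": 50})
second = writer.push(sent.append)  # back to step 2, then push again
```

Each `push` delivers everything buffered since the previous one as a single unit, which mirrors how one push to the cluster can cover several tables at once.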

For any insertion mode other than asynchronous, we recommend making batches as large as possible: batches of millions of rows are not uncommon and are encouraged.