2.2. quasardb import tool#
2.2.1. Introduction#
The quasardb import tool qdb_import enables you to insert CSV files into the database in the fastest way possible. To know how to read the values and where to store them, qdb_import needs a configuration file. Otherwise, it will try to guess the configuration and ask you questions (this is the interactive mode).
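A typical invocation points qdb_import at a cluster, the CSV file to insert and a configuration file. The following command lines are only a sketch: data.csv and import.cfg are hypothetical file names.

qdb_import --cluster=qdb://127.0.0.1:2836 --file=data.csv --config=import.cfg

Without --config, the interactive mode can also be asked to write the guessed configuration to a file for later reuse:

qdb_import --file=data.csv --i-make-config=import.cfg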
2.2.2. Quick Reference#
| Option | Usage | Default |
|---|---|---|
| -h, --help | display help | |
| --help-config | display information about the config file format | |
| --help-parsers | display information about the parsers | |
| --version | display the build version and date | |
| -c, --cluster | set the cluster URI | qdb://127.0.0.1:2836 |
| -f, --file | specify the file to insert | |
| --config | specify the configuration file | |
| --i-make-config | interactive mode: create a config file, then exit | |
| --i-auto | interactive mode: guess fields without asking | |
| --i-verbose | interactive mode: be more explicit | |
| --check-syntax | check the file's syntax and display errors | |
| --async | insert the data asynchronously | |
| --transactional | insert the data transactionally | |
| --jobs | maximum number of threads used | number of logical cores |
| --max-error-count | maximum number of errors ignored | 0 |
| --user-credentials-file | path to the user's private key file | |
| --cluster-public-key-file | path to the cluster's public key file | |
| --timeout | connection timeout duration with the server | |
2.2.3. Configuration file#
2.2.3.1. Basic configuration#
The configuration file is in this JSON format:
{
    "delimiter": <one of ',', ';' or '\t', the character separating the values>,
    "has_header": <'true' or 'false', whether the first line contains the column names>,
    "default_timeseries": <the name of the timeseries in which the values are inserted>,
    "columns": [
        // In case of a timestamp key value
        {
            "parser": <one of 'time, date, datetime, iso_8601, epoch, epoch_ms, epoch_ns'>,
            "into": "$timestamp"
        },
        // In case of the line's table name (optional)
        {
            "parser": <one of 'string, quoted_string'>,
            "into": "$table"
        },
        // Otherwise, to insert the value
        {
            "parser": <one of 'time, date, datetime, iso_8601, epoch, epoch_ms, epoch_ns, custom_date, custom_duration, double_, int64, blob, quoted_blob, string, quoted_string, symbol, quoted_symbol, skip_column, skip_quoted_column, taq_time, single_alphanum, base64, datetime_US, hexadecimal, boolean, date_US, military_time'>,
            "into": <the column of the default timeseries where the value is stored>,
            // In case of the symbol parser
            "symtable": <the symbol table name>
        }
    ]
}
Example of configuration:
- The first column creates the timestamp key.
- The second column contains text inserted into the column ‘names’ of the time series ‘VIP’.
- The third column contains integers inserted into the column ‘ages’ of the time series ‘VIP’.
{
    "delimiter": ",",
    "has_header": false,
    "default_timeseries": "VIP",
    "columns": [
        {
            "parser": "datetime",
            "into": "$timestamp"
        },
        {
            "parser": "blob",
            "into": "names"
        },
        {
            "parser": "int64",
            "into": "ages"
        }
    ]
}
This configuration file would allow inserting values like this:
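The following lines are an illustrative sample (the names and ages are made up):

2018-05-23 11:03:58.007,Steve,23
2018-05-23 11:04:02.120,Molly,
2018-05-23 11:04:05.343,Duke Nukem,54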
Notice that empty values are allowed if they are not timestamp values.
2.2.3.2. Advanced Configuration#
If several column values are marked as timestamps, the key used will be the sum of these timestamps. This can be useful, for example, if one value is the current date and the other is the current time.
The main JSON structure can also include a “shard_size” field. It is optional, but becomes mandatory if the configuration mentions timeseries that do not exist yet (otherwise, qdb_import will ask for it on the command line). Available units are ‘d’, ‘h’, ‘min’, ‘s’ and ‘ms’.
If a column’s parser is one of ‘skip_column, skip_quoted_column’, the value is ignored.
“into” can also be set to a structure for more flexibility:
"into": {
    "timeseries": <the name of the timeseries; overrides "default_timeseries">,
    "column": <the name of the column>
}
Finally, “into” can be an array of several ‘into’ values. In this case, the value is inserted into multiple columns, possibly in different timeseries.
Here is a more complex configuration example:
The first and the third columns create the timestamp key. The second column contains text inserted into:
- the column ‘names’ of the time series ‘VIP’;
- the column ‘persons’ of the time series ‘traces’.
The fourth column is ignored.
{
    "delimiter": ",",
    "has_header": false,
    "default_timeseries": "VIP",
    "shard_size": "20min",
    "columns": [
        {
            "parser": "date",
            "into": "$timestamp"
        },
        {
            "parser": "blob",
            "into": [
                "names",
                {
                    "timeseries": "traces",
                    "column": "persons"
                }
            ]
        },
        {
            "parser": "time",
            "into": "$timestamp"
        },
        {
            "parser": "skip_column"
        }
    ]
}
This second configuration file would allow inserting values like this:
2018.05.15,Steve,11:03:02,aaa
2018.05.15,Duke Nukem,11:03:04,1234
2018.05.15,Germaine,11:03:05,
2018.05.15,,11:03:07,xxx
2018.05.15,Molly,11:03:11,boop
...
Moreover, because “shard_size” is specified, the timeseries can be created automatically (without asking for the shard size).
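Finally, as shown in the basic configuration template, a column holding repeated text can be stored as symbols by combining the ‘symbol’ parser with a symbol table. A minimal sketch, where ‘tickers’ and ‘ticker_symbols’ are hypothetical names:

{
    "parser": "symbol",
    "into": "tickers",
    "symtable": "ticker_symbols"
}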
2.2.4. Parsers#
Timestamp parsers:
time: Parse a time of day, which may contain seconds and nanoseconds, e.g. 11:03 or 02:49:31.123456789
date: Parse a date in the format YYYY$MM$DD where $ is ‘/’, ‘.’ or ‘-’, e.g. 2018.05.15 or 2019/02/26
datetime: Parse a date and a time separated by spaces, e.g. 2018-05-23 11:03:58.007
datetime_US: Like datetime but reads the month before the day (MM/DD/YYYY), e.g. 05-23-2018 11:03:58.007
iso_8601: Parse a date and a time in ISO format, e.g. 2018-05-15T11:03:58.007
epoch: Parse a number as seconds, e.g. 2 or 2.45
epoch_ms: Parse a number as milliseconds, e.g. 2000
epoch_ns: Parse a number as nanoseconds, e.g. 2000000000
custom_date: Parse a date with a given format, e.g. %Y/%m/%d
custom_duration: Parse a duration with a given format, added to a basetime (1970-01-01 by default)
Other parsers:
double: Parse a floating-point value, e.g. -1234.567, 123, -.2 or 3
int64: Parse a signed integer, e.g. 123 or -54
boolean: Parse “true” or “false” as integers, respectively 1 and 0
hexadecimal: Parse positive hexadecimal numbers, which may have a ‘0x’ prefix or an ‘h’ suffix, e.g. 0x123ABC
blob: Parse text, e.g. 123, bip bop or @#$%^&*{[<
quoted_blob: Parse optionally quoted text, e.g. “commas,,” or bop bip. Double quotes escape quotes: “it””s raining”
base64: Parse text encoded with base64, e.g. bnVnbGlhcg==
skip_column: Skip a value that could be parsed by ‘blob’
skip_quoted_column: Skip a value that could be parsed by ‘quoted_blob’
For the parsers custom_date and custom_duration, the format used must be given in the parser’s column. The format string specification can be found at https://en.cppreference.com/w/cpp/chrono/duration/from_stream#Format_string.
For custom_duration, the optional parameter ‘basetime’ is the date added to the parsed duration.
Example:
{
    "parser": "custom_duration",
    "format": "% days + %H:%M - %S",
    "basetime": "2020-04-01 12:00:00",
    "into": "$timestamp"
}
Reading “2 days + 06:00:12.5” from the column above gives the timestamp “2020-04-03 18:00:12.5”.
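Similarly, a custom_date column carries its own format string. A minimal sketch, reusing the %Y/%m/%d format mentioned in the parser list above:

{
    "parser": "custom_date",
    "format": "%Y/%m/%d",
    "into": "$timestamp"
}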
2.2.5. Program Options#
- -h, --help#
Displays help about the program’s options.
- --help-config#
Displays information about config file format, for basic and advanced use cases.
- --help-parsers#
Displays information about the parsers.
- --version#
Displays the build version and date.
- -c, --cluster=<cluster_uri>#
Specifies the address and port of the quasardb cluster to which the insertion tool must connect.
- Argument
The URI (a list of comma-separated endpoints, i.e. addresses and ports, preceded by qdb://) of the cluster into which the data is inserted.
- Default value
qdb://127.0.0.1:2836, the IPv4 localhost address and the port 2836.
- -f, --file=<path_to_file>#
Specifies the file used for insertion.
- Argument
Path to the file.
- --config=<path_to_file>#
Specifies the configuration file used to read the CSV file and insert its values.
- Argument
Path to the file.
- --i-make-config=<path_to_new_file>#
The interactive mode outputs a configuration file at the given path then quits.
- Argument
Path to the new file.
- --i-auto#
The interactive mode does not ask questions and quits on a missing or ambiguous guess.
- --i-verbose#
The interactive mode is verbose, and asks more questions if '--i-auto' is not specified.
- --check-syntax#
The program will locally check the file's syntax, then display its errors and quit. It can run without any cluster.
- --async#
The program inserts the data asynchronously.
- --transactional#
The program inserts the data transactionally.
- --jobs=<max_threads>#
Maximum number of threads used. By default, it uses the number of logical cores available on your machine.
- Default value
The number of logical cores available on the machine.
- --max-error-count=<max_error_count>#
Maximum number of errors ignored (one per wrong line).
- Default value
0
- --user-credentials-file=<path_to_file>#
Specifies the user’s private key file with both username and secret_key.
- Argument
Path to the file.
- --cluster-public-key-file=<path_to_file>#
Specifies the path to the cluster’s public key file.
- Argument
Path to the file.
- --timeout=<timeout>#
The connection timeout between the client and the server. Available units: d, h, min, s, ms.
- Argument
A timeout value.
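To illustrate how these options combine, here is a sketch in which the file names, key paths and remote URI are all hypothetical. The first command only validates the CSV locally; the second inserts it into a secured cluster asynchronously, ignoring up to 10 invalid lines.

qdb_import --check-syntax --file=data.csv --config=import.cfg

qdb_import --cluster=qdb://203.0.113.10:2836 \
    --user-credentials-file=user_private.key \
    --cluster-public-key-file=cluster_public.key \
    --async --max-error-count=10 \
    --file=data.csv --config=import.cfg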