2.2. quasardb import tool#
2.2.1. Introduction#
The quasardb import tool qdb_import enables you to insert CSV files into the database in the fastest way possible. To know how to read the values and where to store them, qdb_import needs a configuration file. Otherwise, it will try to guess the configuration and ask you questions (this is the interactive mode).
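A typical invocation points qdb_import at a cluster, the CSV file to insert and a configuration file. The following command lines are only a sketch: data.csv and import.cfg are hypothetical file names.

qdb_import --cluster=qdb://127.0.0.1:2836 --file=data.csv --config=import.cfg

Without --config, the interactive mode can also be asked to write the guessed configuration to a file for later reuse:

qdb_import --file=data.csv --i-make-config=import.cfg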
2.2.2. Quick Reference#
| Option | Usage | Default |
|---|---|---|
| -h, --help | display help | |
| --help-config | display information about the config file format | |
| --help-parsers | display information about the parsers | |
| --version | display the build version and date | |
| -c, --cluster | set the cluster URI | qdb://127.0.0.1:2836 |
| -f, --file | specify the file to insert | |
| --config | specify the configuration file | |
| --i-make-config | interactive mode: create a config file, then exit | |
| --i-auto | interactive mode: guess fields without asking | |
| --i-verbose | interactive mode: be more explicit | |
| --check-syntax | check the file's syntax and display errors | |
| --async | insert the data asynchronously | |
| --transactional | insert the data transactionally | |
| --jobs | maximum number of threads used | number of logical cores |
| --max-error-count | maximum number of errors ignored | 0 |
| --user-credentials-file | path to the user's private key file | |
| --cluster-public-key-file | path to the cluster's public key file | |
| --timeout | connection timeout duration with the server | |
2.2.3. Configuration file#
2.2.3.1. Basic configuration#
The configuration file is in this JSON format:
{
    "delimiter": <one of ',', ';' or '\t', the character separating the values>,
    "has_header": <'true' or 'false', whether the first line contains the column names>,
    "default_timeseries": <the name of the timeseries in which the values are inserted>,
    "columns": [
        // In case of a timestamp key value
        {
            "parser": <one of 'time, date, datetime, iso_8601, epoch, epoch_ms, epoch_ns'>,
            "into": "$timestamp"
        },
        // In case of the line's table name (optional)
        {
            "parser": <one of 'string, quoted_string'>,
            "into": "$table"
        },
        // Otherwise, to insert the value
        {
            "parser": <one of 'time, date, datetime, iso_8601, epoch, epoch_ms, epoch_ns, custom_date, custom_duration, double_, int64, blob, quoted_blob, string, quoted_string, symbol, quoted_symbol, skip_column, skip_quoted_column, taq_time, single_alphanum, base64, datetime_US, hexadecimal, boolean, date_US, military_time'>,
            "into": <the column of the default timeseries where the value is stored>,
            // In case of the symbol parser
            "symtable": <the symbol table name>
        }
    ]
}
Example of configuration:
- The first column creates the timestamp key.
- The second column contains text inserted into the column ‘names’ of the time series ‘VIP’.
- The third column contains integers inserted into the column ‘ages’ of the time series ‘VIP’.
{
    "delimiter": ",",
    "has_header": false,
    "default_timeseries": "VIP",
    "columns": [
        {
            "parser": "datetime",
            "into": "$timestamp"
        },
        {
            "parser": "blob",
            "into": "names"
        },
        {
            "parser": "int64",
            "into": "ages"
        }
    ]
}
This configuration file would allow inserting values like this:
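The following lines are an illustrative sample (the names and ages are made up):

2018-05-23 11:03:58.007,Steve,23
2018-05-23 11:04:02.120,Molly,
2018-05-23 11:04:05.343,Duke Nukem,54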
Notice that empty values are allowed if they are not timestamp values.
2.2.3.2. Advanced Configuration#
If several column values are marked as timestamps, the key used will be the sum of these timestamps. This can be useful, for example, if one value is the current date and the other is the current time.
The main JSON structure can also include a “shard_size” field. It is optional, but becomes mandatory if the configuration mentions timeseries that do not exist yet (otherwise, qdb_import will ask for it on the command line). Available units are ‘d’, ‘h’, ‘min’, ‘s’ and ‘ms’.
If a column’s parser is one of ‘skip_column, skip_quoted_column’, the value is ignored.
“into” can also be set to a structure for more flexibility:
"into": {
    "timeseries": <the name of the timeseries; overrides "default_timeseries">,
    "column": <the name of the column>
}
Finally, “into” can be an array of several ‘into’ values. In this case, the value is inserted into multiple columns, possibly in different timeseries.
Here is a more complex configuration example:
The first and the third columns create the timestamp key. The second column contains text inserted into:
- the column ‘names’ of the time series ‘VIP’;
- the column ‘persons’ of the time series ‘traces’.
The fourth column is ignored.
{
    "delimiter": ",",
    "has_header": false,
    "default_timeseries": "VIP",
    "shard_size": "20min",
    "columns": [
        {
            "parser": "date",
            "into": "$timestamp"
        },
        {
            "parser": "blob",
            "into": [
                "names",
                {
                    "timeseries": "traces",
                    "column": "persons"
                }
            ]
        },
        {
            "parser": "time",
            "into": "$timestamp"
        },
        {
            "parser": "skip_column"
        }
    ]
}
This second configuration file would allow inserting values like this:
2018.05.15,Steve,11:03:02,aaa
2018.05.15,Duke Nukem,11:03:04,1234
2018.05.15,Germaine,11:03:05,
2018.05.15,,11:03:07,xxx
2018.05.15,Molly,11:03:11,boop
...
Moreover, because “shard_size” is specified, the timeseries can be created automatically (without asking for the shard size).
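Finally, as shown in the basic configuration template, a column holding repeated text can be stored as symbols by combining the ‘symbol’ parser with a symbol table. A minimal sketch, where ‘tickers’ and ‘ticker_symbols’ are hypothetical names:

{
    "parser": "symbol",
    "into": "tickers",
    "symtable": "ticker_symbols"
}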
2.2.4. Parsers#
Timestamp parsers:
time: Parse a time of day, which may contain seconds and nanoseconds, e.g. 11:03 or 02:49:31.123456789
date: Parse a date in the format YYYY$MM$DD where $ is ‘/’, ‘.’ or ‘-’, e.g. 2018.05.15 or 2019/02/26
datetime: Parse a date and a time separated by spaces, e.g. 2018-05-23 11:03:58.007
datetime_US: Like datetime but reads the month before the day (MM/DD/YYYY), e.g. 05-23-2018 11:03:58.007
iso_8601: Parse a date and a time in ISO format, e.g. 2018-05-15T11:03:58.007
epoch: Parse a number as seconds, e.g. 2 or 2.45
epoch_ms: Parse a number as milliseconds, e.g. 2000
epoch_ns: Parse a number as nanoseconds, e.g. 2000000000
custom_date: Parse a date with a given format, e.g. %Y/%m/%d
custom_duration: Parse a duration with a given format, added to a basetime (1970-01-01 by default)
Other parsers:
double: Parse a floating-point value, e.g. -1234.567, 123, -.2 or 3
int64: Parse a signed integer, e.g. 123 or -54
boolean: Parse “true” or “false” as integers, respectively 1 and 0
hexadecimal: Parse positive hexadecimal numbers, which may have a ‘0x’ prefix or an ‘h’ suffix, e.g. 0x123ABC
blob: Parse text, e.g. 123, bip bop or @#$%^&*{[<
quoted_blob: Parse optionally quoted text, e.g. “commas,,” or bop bip. Double quotes escape quotes: “it””s raining”
base64: Parse text encoded with base64, e.g. bnVnbGlhcg==
skip_column: Skip a value that could be parsed by ‘blob’
skip_quoted_column: Skip a value that could be parsed by ‘quoted_blob’
For the parsers custom_date and custom_duration, the format used must be given in the parser’s column. The format string specification can be found at https://en.cppreference.com/w/cpp/chrono/duration/from_stream#Format_string.
For custom_duration, the optional parameter ‘basetime’ is the date added to the parsed duration.
Example:
{
    "parser": "custom_duration",
    "format": "% days + %H:%M - %S",
    "basetime": "2020-04-01 12:00:00",
    "into": "$timestamp"
}
Reading “2 days + 06:00:12.5” from the column above gives the timestamp “2020-04-03 18:00:12.5”.
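Similarly, a custom_date column carries its own format string. A minimal sketch, reusing the %Y/%m/%d format mentioned in the parser list above:

{
    "parser": "custom_date",
    "format": "%Y/%m/%d",
    "into": "$timestamp"
}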
2.2.5. Program Options#
- -h, --help#
Displays help about the program’s options.
- --help-config#
Displays information about config file format, for basic and advanced use cases.
- --help-parsers#
Displays information about the parsers.
- --version#
Displays the build version and date.
- -c, --cluster=<cluster_uri>#
Specifies the address and port of the quasardb cluster to which the insertion tool must connect.
- Argument
The URI (a list of comma-separated endpoints, i.e. addresses and ports, preceded by qdb://) of the cluster into which the data is inserted.
- Default value
qdb://127.0.0.1:2836, the IPv4 localhost address and the port 2836.
- -f, --file=<path_to_file>#
Specifies the file used for insertion.
- Argument
Path to the file.
- --config=<path_to_file>#
Specifies the configuration file used to read the CSV file and insert its values.
- Argument
Path to the file.
- --i-make-config=<path_to_new_file>#
The interactive mode outputs a configuration file at the given path then quits.
- Argument
Path to the new file.
- --i-auto#
The interactive mode does not ask questions and quits on a missing or ambiguous guess.
- --i-verbose#
The interactive mode is verbose, and asks more questions if '--i-auto' is not specified.
- --check-syntax#
The program will locally check the file's syntax, then display its errors and quit. It can run without any cluster.
- --async#
The program inserts the data asynchronously.
- --transactional#
The program inserts the data transactionally.
- --jobs=<max_threads>#
Maximum number of threads used. By default, it uses the number of logical cores available on your machine.
- Default value
The number of logical cores available on the machine.
- --max-error-count=<max_error_count>#
Maximum number of errors ignored (one per wrong line).
- Default value
0
- --user-credentials-file=<path_to_file>#
Specifies the user’s private key file with both username and secret_key.
- Argument
Path to the file.
- --cluster-public-key-file=<path_to_file>#
Specifies the path to the cluster’s public key file.
- Argument
Path to the file.
- --timeout=<timeout>#
The connection timeout between the client and the server. Available units: d, h, min, s, ms.
- Argument
A timeout value.
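To illustrate how these options combine, here is a sketch in which the file names, key paths and remote URI are all hypothetical. The first command only validates the CSV locally; the second inserts it into a secured cluster asynchronously, ignoring up to 10 invalid lines.

qdb_import --check-syntax --file=data.csv --config=import.cfg

qdb_import --cluster=qdb://203.0.113.10:2836 \
    --user-credentials-file=user_private.key \
    --cluster-public-key-file=cluster_public.key \
    --async --max-error-count=10 \
    --file=data.csv --config=import.cfg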