Importing and Exporting Data using Batch Tool

This document describes the method of importing and exporting data using the Batch Tool.


Batch Tool designed for providing data import and export services for PolarDB-X database. It combines the characteristics of distributed databases to achieve one-stop and efficient functionality such as importing and exporting from files, offline data migration across databases (MySQL/PolarDB-X 1.0/PolarDB-X 2.0), and also supports batch updates and deletions based on text files (experimental feature).

Open Source Repository

Batch Tool is open-sourced on GitHub. Click on the links below to access the corresponding repositories:

Documentation

https://github.com/polardb/polardbx-tools/blob/main/batch-tool/docs/usage-details.md

Source Code Repository

https://github.com/polardb/polardbx-tools/tree/main/batch-tool

Download

https://github.com/polardb/polardbx-tools/releases/

Quick Start

For common scenarios and troubleshooting, please refer to the document: usage-details

Parameter Introduction

usage:

   java -jar batch-tool.jar --help
   usage: BatchTool [-batchsize <size>] [-col <col1;col2;col3>] [-comp <NONE | GZIP>] [-con <consumer count>]
       [-config <filepath>] [-cs <charset>] [-D <database>] [-DDL <NONE | ONLY | WITH>] [-dir <directory
       path>] [-encrypt <NONE | AES | SM4>] [-error <max error count>] [-f <filepath1;filepath2>] [-F <file
       count>] [-fcon <parallelism>] [-format <NONE | TXT | CSV | XLS | XLSX>] [-func <true | false>] [-h
       <host>] [-H <filepath>] [-header <true | false>] [-help] [-i <true | false>] [-in <true | false>]
       [-initSqls <sqls>] [-key <string-type key>] [-L <line count>] [-lastSep <true | false>] [-lb <true |
       false>] [-local <true | false>] [-mask <Json format config>] [-maxConn <max connection>] [-maxWait
       <wait time(ms)>] [-minConn <min connection>] [-noEsc <true | false>] [-o <operation>] [-O <asc | desc>]
       [-OC <col1;col2;col3>] [-p <password>] [-P <port>] [-para <true | false>] [-param
       <key1=val1&key2=val2>] [-perf <true | false>] [-pre <prefix>] [-pro <producer count>] [-quote <AUTO |
       FORCE | NONE>] [-readsize <size(MB)>] [-rfonly <true | false>] [-ringsize <size (power of 2)>] [-s
       <separator char or string>] [-sharding <true | false>] [-t <tableName>] [-tps <tps limit>] [-u
       <username>] [-v] [-w <where condition>]
     -batchsize,--batchSize <size>                          Batch size of insert.
     -col,--columns <col1;col2;col3>                        Target columns for export.
     -comp,--compress <NONE | GZIP>                         Export or import compressed file (default NONE).
     -con,--consumer <consumer count>                       Configure number of consumer threads.
     -config,--configFile <filepath>                        Use yaml config file.
     -cs,--charset <charset>                                The charset of files.
     -D,--database <database>                               Database name.
     -DDL,--DDL <NONE | ONLY | WITH>                        Export or import with DDL sql mode (default NONE).
     -dir,--directory <directory path>                      Directory path including files to import.
     -encrypt,--encrypt <NONE | AES | SM4>                  Export or import with encrypted file (default NONE).
     -error,--maxError <max error count>                    Max error count threshold, program exits when the
                                                            limit is exceeded.
     -f,--file <filepath1;filepath2>                        Source file(s).
     -F,--filenum <file count>                              Fixed number of exported files.
     -fcon,--forceConsumer <parallelism>                    Configure if allow force consumer parallelism.
     -format,--fileFormat <NONE | TXT | CSV | XLS | XLSX>   File format (default NONE).
     -func,--sqlFunc <true | false>                         Use sql function to update (default false).
     -h,--host <host>                                       Host of database.
     -H,--historyFile <filepath>                            History file name.
     -header,--header <true | false>                        Whether the header line is column names (default
                                                            false).
     -help,--help                                           Help message.
     -i,--ignore <true | false>                             Flag of insert ignore and resume breakpoint (default
                                                            false).
     -in,--whereIn <true | false>                           Using where cols in (values).
     -initSqls,--initSqls <sqls>                            Connection init sqls (druid).
     -key,--secretKey <string-type key>                     Secret key used during encryption.
     -L,--line <line count>                                 Max line limit of one single export file.
     -lastSep,--withLastSep <true | false>                  Whether line ends with separator (default false).
     -lb,--loadbalance <true | false>                       Use jdbc load balance, filling the arg in $host like
                                                            'host1:port1,host2:port2' (default false).
     -local,--localMerge <true | false>                     Use local merge sort (default false).
     -mask,--mask <Json format config>                      Masking sensitive columns while exporting data.
     -maxConn,--maxConnection <max connection>              Max connection count (druid).
     -maxWait,--connMaxWait <wait time(ms)>                 Max wait time when getting a connection.
     -minConn,--minConnection <min connection>              Min connection count (druid).
     -noEsc,--noEscape <true | false>                       Do not escape value for sql (default false).
     -o,--operation <operation>                             Batch operation type: export / import / delete /
                                                            update.
     -O,--orderby <asc | desc>                              Order by type: asc / desc.
     -OC,--orderCol <col1;col2;col3>                        Ordered column names.
     -p,--password <password>                               Password of user.
     -P,--port <port>                                       Port number of database.
     -para,--paraMerge <true | false>                       Use parallel merge when doing order by export
                                                            (default false).
     -param,--connParam <key1=val1&key2=val2>               Jdbc connection params.
     -perf,--perfMode <true | false>                        Use performance mode at the sacrifice of compatibility
                                                            (default false).
     -pre,--prefix <prefix>                                 Export file name prefix.
     -pro,--producer <producer count>                       Configure number of producer threads (export /
                                                            import).
     -quote,--quoteMode <AUTO | FORCE | NONE>               The mode of how field values are enclosed by
                                                            double-quotes when exporting table (default AUTO).
     -readsize,--readSize <size(MB)>                        Read block size.
     -rfonly,--readFileOnly <true | false>                  Only read and process file, no sql execution (default
                                                            false).
     -ringsize,--ringSize <size (power of 2)>               Ring buffer size.
     -s,--sep <separator char or string>                    Separator between fields (delimiter).
     -sharding,--sharding <true | false>                    Whether enable sharding mode (default value depends on
                                                            operation).
     -t,--table <tableName>                                 Target table.
     -tps,--tpsLimit <tps limit>                            Configure of tps limit (default -1: no limit).
     -u,--user <username>                                   User for login.
     -v,--version                                           Show batch-tool version.
     -w,--where <where condition>                           Where condition: col1>99 AND col2<100 ...

The command parameters are primarily divided into two categories:

  • Database connection configuration, which includes:
    • Basic connection information: host, port, user, password, etc.
    • Connection pool configuration, max_connections, min_connection_connections, etc.
    • JDBC connection string parameters.
  • Batch processing configuration, which includes:
    • Batch processing function parameters:
      • Target table and file name lists.
      • Delimiter, whether it ends with a delimiter, character set, quotation escape, and other text file read/write parameters
      • Export configuration: file count, line count, etc.
      • Import configuration: insert ignore, resumable import, etc.
      • SQL conditions: where, order by, etc.
      • Algorithm: Compression algorithm, encryption algorithm, data anonymization algorithm
      • File format: csv, excel, txt, etc.
    • Batch processing performance parameters
      • Producer and consumer parallelism settings
      • RingBuffer buffer, batch count, read file block size, etc.
      • Pre-partitioning, local merge, etc.
      • TPS (transactions per second) throttling related parameters

Scenario Examples

By default, the file delimiter for import and export is ,, and the character set is utf-8.

  1. Assuming you need to export the customer table from the tpc-h database (sharding mode):

    1.1. By default, the number of files for export is equal to the number of table shards:

     java -jar batch-tool.jar -P 3306 -h 127.0.0.1 -u user_test -p 123456 -D tpch -o export -t customer -s ,
    

    1.2. Export to three files:

     java -jar batch-tool.jar -P 3306 -h 127.0.0.1 -u user_test -p 123456 -D tpch -o export -t customer -s , -F 3
    

    1.3. Export to multiple files with a maximum of 100000 lines per file:

     java -jar batch-tool.jar -P 3306 -h 127.0.0.1 -u user_test -p 123456 -D tpch -o export -t customer -s , -L 100000
    

    1.4. Specify the WHERE condition:

     java -jar batch-tool.jar -P 3306 -h 127.0.0.1 -u user_test -p 123456 -D tpch -o export -t customer -s , -w "c_nationkey=10"
    

    1.5. Specify the quote mode If the text field contains the delimiter:

     java -jar batch-tool.jar -P 3306 -h 127.0.0.1 -u user_test -p 123456 -D tpch -o export -t customer -s , -quote force
    
  2. Assuming you need to import a csv file into a created table named lineitem in the tpch database (sharding mode): 2.1.Importing a single file:
     java -jar batch-tool.jar -P 3306 -h 127.0.0.1 -u user_test -p 123456 -D tpch -o import -t customer -s , -f "./data/lineitem.tbl"
    
    2.2. Importing all files in a directory:
     java -jar batch-tool.jar -P 3306 -h 127.0.0.1 -u user_test -p 123456 -D tpch -o import -t customer -s , -dir "./data/lineitem/"
    

results matching ""

    No results matching ""