Optimize data import and export

Data is frequently imported to or exported from databases. This topic describes how to import data to and export data from PolarDB-X databases.

Test environment

The following table describes the test environment that is used to perform tests in this topic.

Specification Requirement
PolarDB-X version polarx-kernel_5.4.11-16282307_xcluster-20210805
Node specifications 16 CPU cores, 64 GB memory
Number of nodes 4

Run the following command to create a test table:

CREATE TABLE `sbtest1` (
    `id` int(11) NOT NULL,
    `k` int(11) NOT NULL DEFAULT '0',
    `c` char(120) NOT NULL DEFAULT '',
    `pad` char(60) NOT NULL DEFAULT '',
    PRIMARY KEY (`id`),
    KEY `k_1` (`k`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 dbpartition by hash(`id`);

Data import and export methods

The following methods are used to export data from PolarDB-X databases:

  • Run the mysql -e command to export data.
  • Use the musqldump tool to export data.
  • Execute the SELECT INTO OUTFILE statement to export data. By default, this method is disabled.
  • Use Batch Tool to export data. Batch Tool is automatically enabled after you purchase a PolarDB-X instance.

The following methods are used to import data to PolarDB-X databases:

  • Execute the SOURCE statement to import data.
  • Run the mysql command to import data.
  • Use a program to export data.
  • Execute the LOAD DATA statement to import data.
  • Use Batch Tool to import data. Batch Tool is automatically enabled after you purchase a PolarDB-X instance.

Use MySQL native commands to export or import data

Run the mysql -e command to connect to a local or remote server and then execute SQL statements. For example, you can execute the SELECT statement to query data. The values in the raw data are separated with tab characters. You can use string manipulation functions to change the separator to commas (,) and save the file in the .csv format. The following code provides an example:

mysql -h ip  -P port -u usr -pPassword db_name -N -e "SELECT id,k,c,pad FROM sbtest1;" >/home/data_1000w.txt
## The values in the raw data are separated with tab characters. For example, the returned result of this command is 188092293    27267211    59775766593-64673028018-...-09474402685    01705051424-...-54211554755.

mysql -h ip  -P port -u usr -pPassword db_name -N -e "SELECT id,k,c,pad FROM sbtest1;" | sed 's/\t/,/g' >/home/data_1000w.csv
## After you execute this command, the values in the output are separated with commas (,). For example, the returned result is 188092293,27267211,59775766593-64673028018-...-09474402685,01705051424-...-54211554755.

To import data of the original format, execute the LOAD DATA statement. For more information, see LOAD DATA. The following code provides an example:

LOAD DATA LOCAL INFILE '/home/data_1000w.txt' INTO TABLE sbtest1;
## LOCAL specifies that data is imported from a local file. The value of the local_infile parameter must be set to on.

To import a file in the .csv format, use a program. For more information, see Use a program to import data.

Use the mysqldump tool to export or import data

You can use the mysqldump tool to connect to a local or remote server. For more information, see Use mysqldump to migrate data.

  • Run the following command to export data by using the mysqldump tool:

    mysqldump -h ip  -P port -u usr -pPassword --default-character-set=utf8mb4 --net_buffer_length=10240 --no-tablespaces --no-create-db --no-create-info --skip-add-locks --skip-lock-tables --skip-tz-utc --set-charset  --hex-blob db_name [table_name] > /home/dump_1000w.sql
    

The following list provides the errors that can occur when you use the mysqldump tool to export data and the operations that you can perform to resolve the issues. In most cases, the errors occur because the version of the MySQL client and the version of the MySQL server do not match.

  1. Error 1: mysqldump: Couldn't execute 'SHOW VARIABLES LIKE 'gtid\_mode''

    Solution: Add the "--set-gtid-purged=OFF" parameter setting to disable gtid_mode.

  1. Error 2: mysqldump: Couldn't execute 'SHOW VARIABLES LIKE 'ndbinfo\_version''

    Solution: Make sure that the version of mysqldump and the version of the MySQL server match with each other.

The exported data is displayed as an SQL statement. The result includes the Batch Insert statement and other SQL statements. For example, the following statement is included: INSERT INTO `sbtest1` VALUES (...),(...). The net_buffer_length parameter affects the batch size.

  • You can use one of the following methods to import data as an SQL statement:

    Method 1: Execute the SOURCE statement.
    source /home/dump_1000w.sql
    
    Method 2: Run the mysql command.
    mysql -h ip  -P port -u usr -pPassword --default-character-set=utf8mb4 db_name < /home/dump_1000w.sql
    

Use Batch Tool to export or import data

Batch Tool is a data import and export tool developed by Alibaba Cloud. This tool supports multithreading.

  • Export data

    ## Export files. The number of exported files is equal to the number of shards.
    java -jar batch-tool.jar -h ip  -P port -u usr -pPassword -D db_name -o export -t sbtest1 -s ,
    
    ## Export multiple files and merge the files into one file.
    java -jar batch-tool.jar -h ip  -P port -u usr -pPassword -D db_name -o export -t sbtest1 -s , -F 1
    
  • Import data

    ## Import 32 files.
    java -jar batch-tool.jar -hpxc-spryb387va****.polarx.singapore.rds.aliyuncs.com  -P3306 -uroot -pPassword -D sysbench_db -o import -t sbtest1 -s , -f "sbtest1_0;sbtest1_1;sbtest1_2;sbtest1_3;sbtest1_4;sbtest1_5;sbtest1_6;sbtest1_7;sbtest1_8;sbtest1_9;sbtest1_10;sbtest1_11;sbtest1_12;sbtest1_13;sbtest1_14;sbtest1_15;sbtest1_16;sbtest1_17;sbtest1_18;sbtest1_19;sbtest1_20;sbtest1_21;sbtest1_22;sbtest1_23;sbtest1_24;sbtest1_25;sbtest1_26;sbtest1_27;sbtest1_28;sbtest1_29;sbtest1_30;sbtest1_31" -np -pro 64 -con 32
    
    ## Import one file.
    java -jar batch-tool.jar -h ip  -P port -u usr -p password -D db_name -o import -t sbtest1 -s , -f "sbtest1_0" -np
    

Compare the methods that are used to export data

The following data is based on the test of exporting 10,000,000 rows from PolarDB-X. Approximately 2 GB of data is exported.

Method Data format File size Duration Number of rows that are exported per second Volume of data that is exported per second (MB/s)
Run the mysql -e command to export raw data Raw data format 1998MB 33.417s 299248 59.8
Run the mysql -e command to export data in the .csv format CSV format 1998MB 34.126s 293031 58.5
Use the mysqldump tool to export data (net-buffer-length=10KB) SQL statements 2064MB 30.223s 330873 68.3
Use the mysqldump tool to export data (net-buffer-length=200KB) SQL statements 2059MB 32.783s 305036 62.8
Use Batch Tool to export data (number of files = number of shards = 32) CSV format 1998MB 4.715s 2120890 423.7
Use Batch Tool to export 1 file CSV format 1998MB 5.568s 1795977 358.8

Summary:

  1. The mysql -e command and the mysqldump tool support single-threaded operations and provide similar performance.
  1. Batch Tool support multi-thread operations. You can configure the number of concurrent threads. This improves the export performance.

Compare the methods that are used to import data

The following data is based on the test of importing 10,000,000 rows into PolarDB-X. Data of approximately 2 GB is imported.

Method Data format Duration Imported rows per second Amount of data imported per second (MB/s)
SOURCE statement (net-buffer-length=10KB) SQL statement format 10m24s 16025 3.2
SOURCE statement (net-buffer-length=200KB) SQL statement format 5m37s 29673 5.9
mysql command (net-buffer-length=10KB) SQL statement format 10m27s 15948 3.2
mysql command (net-buffer-length=200KB) SQL statement format 5m38s 29585 5.9
LOAD DATA statement Raw data format 4m0s 41666 8.3
Use a program to import data (batch size = 1000, number of threads = 1) CSV format 5m40s 29411 5.9
Use a program to import data (batch size = 1000, number of threads = 32) CSV format 19s 526315 105.3
Use Batch Tool to import data (number of files = number of shards = 32) CSV format 19.836s 504133 100.8
Use Batch Tool to import 1 file CSV format 10.806s 925411 185.1

Summary:

  1. The SOURCE statement and mysql command support single-thread SQL statements. In these methods, the Batch Insert statement is used. The batch size affects the import performance. The batch size is related to the net-buffer-length that you configure when you use mysqldump to export data. We recommend that you perform the following operations to optimize the import process:{#40f8d781365dg}

    • Set the value of the net-buffer-length parameter to a value that is less than or equal to 256 KB. This increases the batch size and improves the import performance.
  • Use third-party tools to back up data and import data in multiple threads. For example, you can use mydumper that is provided by mysqldump to back up data and use myloader that is provided by mysqldump to import data.
  1. The LOAD DATA statement is executed on a single thread and provides better performance than the mysql command and the SOURCE statement.
  1. If you use a program to import data, you can configure the batch size and the number of concurrent threads to improve the performance. We recommend that you set the batch size to 1000 and the concurrency to a value from 16 to 32.
  1. Batch Tool supports multi-threaded import operations. This tool is suitable for importing data from multiple shards of a distributed database and provides high performance.

Summary

  1. PolarDB-X is compatible with common data import and export methods that are used to perform O\&M operations by using MySQL. Most of the methods are provided for standalone MySQL databases and support only single-threaded operations. These methods cannot fully use the distributed architecture of PolarDB-X databases.
  1. PolarDB-X provides Batch Tool. This tool is suitable for importing and exporting data from distributed databases. Batch Tool supports multi-threaded operations that provide high-performance data export and import capabilities.

results matching ""

    No results matching ""