Deploying with PXD
PXD is a rapid deployment tool for PolarDB-X. It can quickly install and start a PolarDB-X distributed database on a set of servers based on a YAML configuration file. Before deploying the PolarDB-X database as described in this article, we assume the following prerequisites have been met:
- A set of servers has been prepared, and server parameter configuration and software installation have been completed following System and Environment Configuration.
- The deployment machine (ops) has internet access, or you have already prepared the relevant software packages and images in advance as per Software Package Download.
Preparation
First, install Python3 and set up a virtual environment on the deployment machine (ops).
Installing Python3
If python3 is already installed on your machine, you can skip this step.
Check whether python3 is available:
which python3
If the command prints a path, python3 is already installed. Otherwise, install it with:
yum install -y python3
Installing Docker
PXD runs the PolarDB-X database inside containers, so Docker must be installed first. Using a private image registry is also recommended. For instructions, see the Installing Docker and Image Repository guide.
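Before continuing, it is worth confirming that Docker is running on each server and that the private image registry is reachable. The commands below are a quick, optional check; registry:5000 is the example registry address used later in this document, so substitute your own registry host and port.
docker info --format '{{.ServerVersion}}'   # prints the Docker daemon version if Docker is running
curl -s http://registry:5000/v2/_catalog    # queries the registry's v2 API; expects a JSON list of repositories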
Installing PXD
Online Installation
For deployment machines (ops) with internet connectivity, online installation is the recommended method.
It is advised to install and use the PXD tool inside a Python virtual environment.
Begin by creating and activating a Python3 virtual environment with these commands:
python3 -m venv venv
source venv/bin/activate
Next, upgrade pip:
pip install --upgrade pip
Then download and install the PXD tool from Aliyun's PyPI mirror:
pip install -i https://mirrors.aliyun.com/pypi/simple/ pxd
Upon successful installation, you can verify the PXD version with:
pxd version
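Note that PXD is installed inside the virtual environment, so it is only available while that environment is active. If you open a new shell later, reactivate the environment before running pxd, for example:
source venv/bin/activate   # re-enter the virtual environment created above
pxd version                # pxd is available again
deactivate                 # leave the virtual environment when you are done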
Offline Installation
If the deployment environment has no internet access, refer to the Software Package Download section, download the offline installation package for the corresponding architecture, and copy it to the deployment machine (ops) inside the deployment environment. The directory structure of the PXD offline installation package is as follows:
polardbx-install
|-- images                               # Docker image directory
|   |-- image.list                       # List of downloaded Docker images
|   |-- image.manifest                   # Offline environment image manifest, script parameters
|   |-- load_image.sh                    # Script to import images in an offline environment
|   |-- polardbx-cdc-latest-arm64.tar.gz
|   |-- polardbx-engine-latest-arm64.tar.gz
|   |-- polardbx-init-latest-arm64.tar.gz
|   |-- polardbx-sql-latest-arm64.tar.gz
|   `-- xstore-tools-latest-arm64.tar.gz
|-- pxd-0.4.3-py3-none-any.whl           # PXD installation package
|-- pxd-denpendency-arm64.tar.gz         # PXD's dependency package
`-- install.sh                           # Installation script
On the deployment machine (ops) in the deployment environment, run the following commands to enter the offline installation package directory and, in one step, install PXD and import the Docker images:
cd polardbx-install
sh install.sh
The install.sh script primarily does the following:
- Imports the Docker images in the images directory and pushes them to the specified private registry (this document uses registry:5000 as an example); a rough sketch of this step follows the list.
- Creates a venv directory under the polardbx-install directory and installs PXD into it.
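For reference, the image-import step behaves roughly as sketched below. This is an illustrative approximation, not the actual contents of install.sh or load_image.sh; it assumes the images directory layout shown above, a registry at registry:5000, and that image.list contains one image name per line.
# Illustrative sketch only: load each image tarball, retag it for the private registry, and push it.
REGISTRY=registry:5000
cd images                                         # run from the polardbx-install directory
for tarball in *.tar.gz; do
    docker load -i "$tarball"                     # import the image into the local Docker daemon
done
while read -r image; do
    docker tag "$image" "$REGISTRY/${image##*/}"  # e.g. polardbx/polardbx-sql:tag -> registry:5000/polardbx-sql:tag
    docker push "$REGISTRY/${image##*/}"          # push to the private registry
done < image.list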
After the installation script finishes, run the following commands to verify that PXD is installed:
source polardbx-install/venv/bin/activate
pxd version
Directory Mapping
Because PXD maps container data directories under the $HOME/.pxd/data path, that path should be symlinked to the data disk in advance.
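The Ansible commands below read the server list from an inventory file referenced by ${ini_file}. The group name and host list in the sketch below are assumptions based on the topology planned later in this document; adjust them to your environment.
# Hypothetical inventory file; one entry per server that will run PolarDB-X containers.
ini_file=servers.ini
cat > "${ini_file}" <<'EOF'
[polardbx]
192.168.1.102
192.168.1.103
192.168.1.104
192.168.1.105
192.168.1.106
192.168.1.107
EOF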
ansible -i ${ini_file} all -m shell -a " mkdir -p /polarx/data "
ansible -i ${ini_file} all -m shell -a " mkdir -p \$HOME/.pxd "
ansible -i ${ini_file} all -m shell -a " ln -s /polarx/data \$HOME/.pxd/data "
Check if the path is correctly mapped:
ansible -i ${ini_file} all -m shell -a " df -lh \$HOME/.pxd/data "
Deploying PolarDB-X
Planning the Cluster Topology
A PolarDB-X database cluster consists of Compute Nodes (CN), Data Nodes (DN), the Global Meta Service (GMS), and, optionally, Change Data Capture (CDC) nodes. The minimum deployment scale for a production environment is three servers, and a typical database cluster contains at least two compute nodes, more than two sets of data nodes, one set of GMS nodes, and optionally one set of CDC nodes. Data Nodes (DN) and the Global Meta Service (GMS) achieve high availability through the X-Paxos protocol, so each set of data nodes or metadata services consists of three independent replica container instances, which must be deployed on different servers.
The main principles for planning the PolarDB-X cluster topology based on different server models/quantities are:
- Due to different requirements for computing and I/O capabilities, it is recommended to deploy Compute Nodes (CN) and Data Nodes (DN) on two separate groups of servers.
- Select servers with better I/O capabilities and larger disk space to deploy Data Nodes (DN); servers with weaker I/O but superior CPU capabilities should be allocated for Compute Nodes (CN).
- If a single server has fewer than 16 CPUs, it is advisable to deploy one Compute Node (CN); otherwise, deploy multiple Compute Nodes.
- All Compute Nodes (CN) should be allocated the same server resources to prevent the "weakest link effect" in the cluster, where the node with the least resources becomes a performance bottleneck. All Data Nodes (DN), except for log replicas, should follow the same principle.
- The three replicas of the same set of Data Nodes should be deployed on three separate servers. The three replicas of different Data Nodes can be mixed on the same server to maximize server performance while ensuring high availability of the cluster.
- The Global Meta Service (GMS) stores the cluster's metadata, and its three replicas should be deployed on three separate servers to ensure high availability.
Here is a cluster topology planned according to the above principles:
Server | Server Specifications | Node | Container Limits | Number of Containers |
192.168.1.102 | 16c64G | CN1 | 16c32G | 1 |
192.168.1.103 | 16c64G | CN2 | 16c32G | 1 |
192.168.1.104 | 16c64G | CN3 | 16c32G | 1 |
192.168.1.105 | 32c128G | DN1-1(leader) | 16c50G | 4 |
 | | DN2-3(log) | - | |
 | | DN3-2(follower) | 16c50G | |
 | | GMS-1 | 8c16G | |
192.168.1.106 | 32c128G | DN1-2(follower) | 16c50G | 4 |
 | | DN2-1(leader) | 16c50G | |
 | | DN3-3(log) | - | |
 | | GMS-2 | 8c16G | |
192.168.1.107 | 32c128G | DN1-3(log) | - | 4 |
 | | DN2-2(follower) | 16c50G | |
 | | DN3-1(leader) | 16c50G | |
 | | GMS-3 | 8c16G | |
Note: The default memory limit for a DN's log replica is 4GB, because the log replica only records logs and requires minimal memory and CPU resources.
Preparing the Topology Configuration
First, you need to obtain the latest image tags for each PolarDB-X component with the following command:
curl -s "https://polardbx-opensource.oss-cn-hangzhou.aliyuncs.com/scripts/get-version.sh" | sh
The output will be as follows (taking PolarDB-X V2.4.0 as an example):
CN polardbx/polardbx-sql:v2.4.0_5.4.19
DN polardbx/polardbx-engine:v2.4.0_8.4.19
CDC polardbx/polardbx-cdc:v2.4.0_5.4.19
Based on the topology plan above, create the following YAML file and adjust the image tags according to the output:
vi polarx_pxd.yaml
Contents of the configuration file:
version: v1
type: polardbx
cluster:
  name: pxc-product
  gms:
    image: registry:5000/polardbx-engine:v2.4.0_8.4.19
    engine: galaxy
    engine_version: "8.0"
    host_group: [192.168.1.105, 192.168.1.106, 192.168.1.107]
    resources:
      mem_limit: 16G
      cpu_limit: 8
  cn:
    image: registry:5000/polardbx-sql:v2.4.0_5.4.19
    replica: 3
    nodes:
      - host: 192.168.1.102
      - host: 192.168.1.103
      - host: 192.168.1.104
    resources:
      mem_limit: 32G
      cpu_limit: 16
  dn:
    image: registry:5000/polardbx-engine:v2.4.0_8.4.19
    engine: galaxy
    engine_version: "8.0"
    replica: 3
    nodes:
      - host_group: [192.168.1.105, 192.168.1.106, 192.168.1.107]
      - host_group: [192.168.1.106, 192.168.1.107, 192.168.1.105]
      - host_group: [192.168.1.107, 192.168.1.105, 192.168.1.106]
    resources:
      mem_limit: 50G
      cpu_limit: 16
Deploying the Database
Run the following command to quickly deploy a PolarDB-X cluster using PXD:
pxd create -file polarx_pxd.yaml -repo="registry:5000/"
Upon successful creation, PXD prints in its log how to connect to the PolarDB-X cluster:
PolarDB-X cluster create successfully, you can try it out now.
Connect PolarDB-X using the following command:
mysql -h192.168.1.102 -P54674 -upolardbx_root -p******
mysql -h192.168.1.103 -P50236 -upolardbx_root -p******
mysql -h192.168.1.104 -P52400 -upolardbx_root -p******
After creation, it is recommended to run pxd check to automatically rebalance the leader distribution of the Data Nodes (DN):
pxd check pxc-product -t dn -r true
Accessing the Database
If the MySQL client is installed on the deployment machine (ops), you can connect to the PolarDB-X database and start using it with the following command:
mysql -h192.168.1.102 -P54674 -Ac -upolardbx_root -p******
After connecting successfully via the MySQL client, you can execute the following SQL to display information about the storage nodes:
MySQL [(none)]> show storage;
An example of the output:
+--------------------+---------------------+------------+-----------+----------+-------------+--------+-----------+-------+--------+
| STORAGE_INST_ID | LEADER_NODE | IS_HEALTHY | INST_KIND | DB_COUNT | GROUP_COUNT | STATUS | DELETABLE | DELAY | ACTIVE |
+--------------------+---------------------+------------+-----------+----------+-------------+--------+-----------+-------+--------+
| pxc-product-dn-0 | 192.168.1.105:15420 | true | MASTER | 5 | 21 | 0 | false | null | null |
| pxc-product-dn-1 | 192.168.1.106:17907 | true | MASTER | 5 | 19 | 0 | true | null | null |
| pxc-product-dn-2 | 192.168.1.107:16308 | true | MASTER | 5 | 19 | 0 | true | null | null |
| pxc-product-gms | 192.168.1.105:17296 | true | META_DB | 2 | 2 | 0 | false | null | null |
+--------------------+---------------------+------------+-----------+----------+-------------+--------+-----------+-------+--------+
4 rows in set (0.01 sec)
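As a quick functional check, you can create a temporary database and table from the deployment machine and read a row back. The database and table names below are hypothetical, and the host, port, and password must be replaced with the values printed by pxd create; MODE='auto' selects PolarDB-X's automatic partitioning mode.
# Hypothetical smoke test; replace host, port, and password with your own values.
mysql -h192.168.1.102 -P54674 -upolardbx_root -p****** <<'SQL'
CREATE DATABASE IF NOT EXISTS smoke_test MODE='auto';
USE smoke_test;
CREATE TABLE t1 (id INT PRIMARY KEY, v VARCHAR(32));
INSERT INTO t1 VALUES (1, 'hello');
SELECT * FROM t1;
DROP DATABASE smoke_test;
SQL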