SQL query optimization rules
This topic describes the basic optimization principles and execution plans that PolarDB-X uses to process SQL statements.
Distributed databases are different from standalone databases in the database architecture. The methods used to optimize SQL queries on standalone databases can be modified to optimize SQL queries on distributed databases. When you use PolarDB-X, you can identify the causes of slow SQL queries based on the statistics, execution plans, rules that define the degree of parallelism, and the reported time that is consumed to run SQL queries. This way, you can optimize SQL queries based on your business scenario.
Slow queries can occur if the execution of physical SQL queries is slow, an excessive number of queries are run in parallel, or an inappropriate query execution plan or index is used. Therefore, the costs of SQL query optimization in distributed databases are higher than the costs of SQL query optimization in standalone databases.
Basic principles
PolarDB-X is a distributed database service in which computing is decoupled from storage. The SELECT queries that you send to PolarDB-X are logical SQL queries. When a SELECT query is sent to a compute node in PolarDB-X, PolarDB-X classifies the subqueries in the SELECT query into two types based on whether the subqueries can be pushed down. The subqueries that can be pushed down are known as physical SQL queries. The SQL queries that cannot be pushed down are run on compute nodes. The SQL queries that can be pushed down are run on data nodes. PolarDB-X optimizes SQL queries based on the following rules:
PolarDB-X pushes down as many logical SQL queries as possible and runs these queries on data nodes. This reduces the network latency between data nodes and compute nodes. The SQL queries that are pushed down can be run on database shards in parallel. This way, the resources of each data node are used in an efficient manner and the SQL queries are accelerated.
The optimizer of PolarDB-X selects an optimal method to execute the physical operators that cannot be pushed down. For example, the optimizer selects the physical operators and rules that define the degree of parallelism based on your business scenario, and determines whether to run SQL queries in massively parallel processing (MPP) mode. Note Degree of parallelism: the maximum number of queries that can be processed by PolarDB-X in parallel. For compute nodes, the degree of parallelism indicates the parallel processing capability of CPUs that have multiple cores. This capability determines the maximum number of threads that can be run by the compute nodes in parallel. For data nodes, the degree of parallelism indicates the maximum number of physical SQL queries that can be pushed down and run in parallel.
- Local indexes and global indexes are used in PolarDB-X. A local index is created on a single data node. The local index is also knows as the MySQL index. A global index is a distributed index that is created on multiple data nodes. An index that is suitable for your business scenario can improve the query performance of your PolarDB-X instance.