Push down and rewrite queries
PolarDB-X can push down physical SQL statements to the storage layer to be performed. This is an optimization method of query rewrite. The automatic split feature that is provided by PolarDB-X is used to optimize execution plans to push down operations to the storage layer. If specific operations that are included in a query are pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer, the data transmission traffic is reduced, and the operations are performed in a parallel manner.
Background
PolarDB-X pushes down specific operations to the storage layer based on the basic principles of query optimization. The following operations can be pushed down to the storage layer:
Join operations.
Filter operations that are specified by using operators such as
WHERE
andHAVING
.Calculate operations that are specified by using operators such as
COUNT
andGROUP BY
.
- Sort operations that are specified by using operators such as
ORDER BY
.
Deduplicate operations that are specified by using operators such as
DISTINCT
.Operations that are performed by using functions such as
NOW()
.Subqueries.
You can use the EXPLAIN OPTIMIZER \
Push down project operations and filter operations
The following example shows the process in which an execution plan for an SQL statement is generated. The filter operation and the project operation are pushed down to the LogicalView operator. When the filter operation and the project operation are pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer and the data transmission traffic is reduced.
EXPLAIN OPTIMIZER select c_custkey,c_name from customer where c_custkey = 1;
c_custkey is a partition key of the table.
Push down limit operations and sort operations
The following example shows the process in which an execution plan for an SQL statement is generated. The sort operation and the limit operation are pushed down to the LogicalView operator. When the sort operation and the limit operation are pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer, the data transmission traffic is reduced, the operations are performed in a parallel manner, and the memory usage of the PolarDB-X database is reduced.
EXPLAIN OPTIMIZER select * from customer order by c_name limit 10
Push down aggregate operations
The following example shows the process in which an execution plan for an SQL statement is generated. The aggregate operation is pushed down to the LogicalView
operator.
When the aggregate operation is pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer, the data transmission traffic is reduced, the operations that are included in the query are performed in a parallel manner, and the memory usage of the PolarDB-X database is reduced.
EXPLAIN OPTIMIZER select count(*) from customer group by c_nationkey;
The following figure shows how an execution plan is generated for the SQL statement when c_nationkey is a partition key of the table.
The following figure shows how an execution plan is generated for the SQL statement when c_nationkey is not a partition key of the table.
JoinClustering
When an SQL statement specifies that the queried data must be obtained by joining multiple tables, PolarDB-X uses the JoinClustering operator to re-sort the join operations and places the JOIN clauses that can be pushed down to the storage layer in adjacent positions. This way, the join operations can be pushed down as expected. The following example shows how to join three tables.
In this example, the original join order is that the system joins the t2 table and the t1 table, and then joins the t1 table and the l2 table. After the JoinClustering operator re-sorts the join operations, the join operation that is performed on the t2 and l2 tables is pushed down to the LogicalView operator.
EXPLAIN select t2.id from t2 join t1 on t2.id = t1.id join l2 on t1.id = l2.id;
Project(id="id")
HashJoin(condition="id = id AND id = id0", type="inner")
Gather(concurrent=true)
LogicalView(tables="t2_[0-3],l2_[0-3]", shardCount=4, sql="SELECT `t2`.`id`, `l2`.`id` AS `id0` FROM `t2` AS `t2` INNER JOIN `l2` AS `l2` ON (`t2`.`id` = `l2`.`id`) WHERE (`t2`.`id` = `l2`.`id`)")
Gather(concurrent=true)
LogicalView(tables="t1", shardCount=2, sql="SELECT `id` FROM `t1` AS `t1`")
Push down join operations
If you want to push down join operations, the following conditions must be met:
- The sharding strategies of the two tables that are joined must be consistent. A sharding strategy specifies the database sharding keys, the table sharding keys, the sharding function, and the number of shards.
- In the join condition, the shard key values that are specified for the two tables must be consistent.
Join operations that are used to join a broadcast table and a table of another type are pushed down to the storage layer to be performed.
EXPLAIN OPTIMIZER select * from t1, t2 where t1.id = t2.id;
The following example shows the process in which an execution plan for an SQL statement is generated. The join operation is pushed down to the LogicalView operator. When the join operation is pushed down to the storage layer to be performed, the query can be performed in a parallel manner and the query efficiency is accelerated.
Push down subqueries
The following example shows the process in which an execution plan for an SQL statement is generated. The subquery statement that is included in the SQL statement is pushed down to the LogicalView
operator. When the subquery statement is pushed down to the storage layer to be executed, the query can be performed in a parallel manner and the query efficiency is accelerated.
The system converts the subquery to a
SemiJoin
orAntiJoin
operation.If the conditions that are described in the "Push down join operations" section are met, the
SemiJoin
orAntiJoin
operation is pushed down to theLogicalView
operator.In the storage layer, the
SemiJoin
orAntiJoin
is converted back to a subquery.
EXPLAIN OPTIMIZER select * from t1 where id in (select id from t2);