GROUPING SETS, ROLLUP, and CUBE extensions
In relational databases, you must use multiple SELECT and UNION
statements to group results based on multiple groups of dimensions. PolarDB-X provides GROUPING SETS, ROLLUP, and CUBE extensions that allow you to group results based on multiple groups of dimensions. In addition, PolarDB-X allows you to use the GROUPING and GROUPING_ID functions in a SELECT statement or a HAVING clause. This can be used to explain the results of the preceding extensions. This topic describes the syntax and provides examples for the GROUPING SETS, ROLLUP, and CUBE extensions and the GROUPING and GROUPING_ID functions.
Note
In the syntax of all GROUP BY extensions described in this topic, SQL queries cannot be pushed down to the
LogicalView
operators for execution. For more information about SQL query pushdown, see Push down and rewrite queries.The following test data is used in the examples provided in this topic. Execute the following statement to create a table named
requests
:CREATE TABLE requests ( `id` int(10) UNSIGNED NOT NULL, `os` varchar(20) DEFAULT NULL, `device` varchar(20) DEFAULT NULL, `city` varchar(20) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE = InnoDB DEFAULT CHARSET = utf8 dbpartition BY hash(`id`) tbpartition BY hash(`id`);
Execute the following statement to insert the required test data into the
requests
table:INSERT INTO requests (id, os, device, city) VALUES (1, 'windows', 'PC', 'Beijing'), (2, 'windows', 'PC', 'Shijiazhuang'), (3, 'linux', 'Phone', 'Beijing'), (4, 'windows', 'PC', 'Beijing'), (5, 'ios', 'Phone', 'Shijiazhuang'), (6, 'linux', 'PC', 'Beijing'), (7, 'windows', 'Phone', 'Shijiazhuang');
GROUPING SETS extension
Overview
GROUPING SETS is an extension of the GROUP BY clause and can be used to generate a result set. The result set is a concatenation of multiple result sets based on different groups. The result returned by the GROUPING SETS extension is similar to that of the UNION ALL operator. The UNION ALL operator and the GROUPING SETS expansion do not remove duplicate rows from the combined result sets.
Syntax
GROUPING SETS ( { expr_1 | ( expr_1a [, expr_1b ] ...) | ROLLUP ( expr_list ) | CUBE ( expr_list ) } [, ...] )
Note A GROUPING SETS extension can contain one or more comma-separated expressions, such as
expr_1
or(expr_1a [, expr_1b ] ...)
, and lists of expressions enclosed in parentheses (), such as( expr_list )
. In the syntax:Each expression can be used to determine how the result set is grouped.
You can nest a ROLLUP or CUBE extension in a GROUPING SETS extension.
Examples
You can group the data that you want to query by using a GROUPING SETS extension. The following code block provides the relevant syntax:
select os,device, city ,count(*) from requests group by grouping sets((os, device), (city), ());
The preceding statement is equivalent to the following statement:
select os, device, NULL, count(*) from requests group by os, device union all select NULL, NULL, NULL, count(*) from requests union all select null, null, city, count(*) from requests group by city;
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | windows | PC | NULL | 3 | | linux | PC | NULL | 1 | | linux | Phone | NULL | 1 | | windows | Phone | NULL | 1 | | ios | Phone | NULL | 1 | | NULL | NULL | Shijiazhuang | 3 | | NULL | NULL | Beijing | 4 | | NULL | NULL | NULL | 7 | +---------+--------+--------------+----------+
Note If an expression is not used in a grouping set, NULL is used as a placeholder for the expression. This way, operations can be performed on the result set that is not used in the grouping set. For example, the result set is a group of the rows for which the values of the
city
column are NULL in the returned result.
You can group data by nesting a ROLLUP extension in a GROUPING SETS extension. The following code block shows the relevant syntax:
select os,device, city ,count(*) from requests group by grouping sets((city), ROLLUP(os, device)); The preceding statement is equivalent to the following statement: select os,device, city ,count(*) from requests group by grouping sets((city), (os), (os, device), ());
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | NULL | NULL | Shijiazhuang | 3 | | NULL | NULL | Beijing | 4 | | windows | PC | NULL | 3 | | linux | PC | NULL | 1 | | ios | Phone | NULL | 1 | | linux | Phone | NULL | 1 | | windows | Phone | NULL | 1 | | windows | NULL | NULL | 4 | | linux | NULL | NULL | 2 | | ios | NULL | NULL | 1 | | NULL | NULL | NULL | 7 | +---------+--------+--------------+----------+
You can group data by specifying a CUBE extension nested in a GROUPING SETS extension. The following code block shows the relevant syntax:
select os,device, city ,count(*) from requests group by grouping sets((city), CUBE(os, device)); The preceding statement is equivalent to the following statement: select os,device, city ,count(*) from requests group by grouping sets((city), (os), (os, device), (), (device));
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | windows | PC | NULL | 3 | | ios | Phone | NULL | 1 | | linux | Phone | NULL | 1 | | windows | Phone | NULL | 1 | | linux | PC | NULL | 1 | | windows | NULL | NULL | 4 | | ios | NULL | NULL | 1 | | linux | NULL | NULL | 2 | | NULL | PC | NULL | 4 | | NULL | Phone | NULL | 3 | | NULL | NULL | NULL | 7 | +---------+--------+--------------+----------+
You can combine the GROUP BY clause and the CUBE, and GROUPING SETS extensions to generate grouping sets, as shown in the following example:
select os,device, city, count(*) from requests group by os, cube(os,device), grouping sets(city); The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets((os,device,city),(os,city),(os,device,city));
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | linux | Phone | Beijing | 1 | | windows | Phone | Shijiazhuang | 1 | | windows | PC | Shijiazhuang | 1 | | linux | PC | Beijing | 1 | | windows | PC | Beijing | 2 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | NULL | Beijing | 2 | | ios | NULL | Shijiazhuang | 1 | +---------+--------+--------------+----------+
ROLLUP extension
Overview
A ROLLUP extension generates a hierarchical set of groups. This set contains a grand total and subtotals for each hierarchical group. The hierarchical order is determined by the order of the expressions that are specified in the ROLLUP expression list. The top of the hierarchy is the first item from the left side of the list. Each successive item that proceeds to the right side moves down the hierarchy. The last item in the hierarchy is at the lowest level.
Syntax
ROLLUP ( { expr_1 | ( expr_1a [, expr_1b ] ...) } [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)
Note
Each expression is used to determine how the result set is grouped. If the expressions are enclosed in parentheses (), such as
( expr_1a, expr_1b, ...)
, the combination of values returned byexpr_1a
andexpr_1b
indicates a single grouping level of the hierarchy.For the first item in the list, such as
expr_1
or the combination of( expr_1a, expr_1b, ...)
, PolarDB-X returns a subtotal for each unique value of the first item. For the second item in the list, such asexpr_2
or the combination of( expr_2a, expr_2b, ...)
, PolarDB-X returns a subtotal for each unique value of each group in the second item. Similar rules are used in each grouping level of the subsequent items. As a result, PolarDB-X returns a grand total for the entire result set.For the subtotal rows, NULL is returned for the items for which the subtotal is taken.
Examples
ROLLUP is used to aggregate
(os, device, city)
in a hierarchical manner to generate grouping sets. The following code block shows the relevant syntax:select os,device, city, count(*) from requests group by rollup (os, device, city); The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by os, device, city with rollup; The first statement is also equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | windows | PC | Beijing | 2 | | ios | Phone | Shijiazhuang | 1 | | windows | PC | Shijiazhuang | 1 | | linux | PC | Beijing | 1 | | linux | Phone | Beijing | 1 | | windows | Phone | Shijiazhuang | 1 | | windows | PC | NULL | 3 | | ios | Phone | NULL | 1 | | linux | PC | NULL | 1 | | linux | Phone | NULL | 1 | | windows | Phone | NULL | 1 | | windows | NULL | NULL | 4 | | ios | NULL | NULL | 1 | | linux | NULL | NULL | 2 | | NULL | NULL | NULL | 7 | +---------+--------+--------------+----------+
ROLLUP is used to aggregate
os, (os,device), and city
in a hierarchical manner to generate grouping sets. The following code block shows the relevant syntax:select os,device, city, count(*) from requests group by rollup (os, (os,device), city); The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by os, (os,device), city with rollup; The first statement is also equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | linux | PC | Beijing | 1 | | linux | Phone | Beijing | 1 | | windows | Phone | Shijiazhuang | 1 | | ios | Phone | Shijiazhuang | 1 | | windows | PC | NULL | 3 | | linux | PC | NULL | 1 | | linux | Phone | NULL | 1 | | windows | Phone | NULL | 1 | | ios | Phone | NULL | 1 | | windows | NULL | NULL | 4 | | linux | NULL | NULL | 2 | | ios | NULL | NULL | 1 | | NULL | NULL | NULL | 7 | +---------+--------+--------------+----------+
CUBE extension
Overview
A CUBE extension is similar to a ROLLUP extension. A ROLLUP extension generates groups and results in a hierarchy based on a left-to-right list of items in the ROLLUP expression list. A CUBE extension generates groupings and subtotals based on each permutation of all items in the CUBE expression list. A CUBE extension returns more rows in the generated result set than a ROLLUP extension performed on the same expression list.
Syntax
CUBE ( { expr_1 | ( expr_1a [, expr_1b ] ...) } [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)
Note
Each expression is used to determine how the result set is grouped. If the expressions are enclosed in parentheses (), such as
( expr_1a, expr_1b, ...)
, the combination of values that are returned byexpr_1a
andexpr_1b
defines a single group.For the first item in the list, such as
expr_1
or the combination of( expr_1a, expr_1b, ...)
, PolarDB-X returns a subtotal for each unique value of the first item. For the second item in the list, such asexpr_2
or the combination of( expr_2a, expr_2b, ...)
, PolarDB-X returns a subtotal for each unique value of the second item. A subtotal is also returned for each unique combination of the first and second items. If a third item exists, PolarDB-X returns a subtotal for each unique value of the third item, each unique combination of the third and first items, each unique combination of the third and second items, and each unique combination of the third, second, and first items. As a result, a grand total is returned for the entire result set.For the subtotal rows, NULL is returned for the items for which the subtotal is taken
Examples
CUBE lists all possible combinations of
(os, device, city)
columns as grouping sets. The following code block shows the relevant syntax:select os,device, city, count(*) from requests group by cube (os, device, city); The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | linux | Phone | Beijing | 1 | | windows | Phone | Shijiazhuang | 1 | | windows | PC | Beijing | 2 | | ios | Phone | Shijiazhuang | 1 | | windows | PC | Shijiazhuang | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | windows | Phone | NULL | 1 | | windows | PC | NULL | 3 | | ios | Phone | NULL | 1 | | linux | PC | NULL | 1 | | linux | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | NULL | Beijing | 2 | | ios | NULL | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | windows | NULL | NULL | 4 | | ios | NULL | NULL | 1 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | NULL | 3 | | NULL | PC | NULL | 4 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | NULL | NULL | NULL | 7 | +---------+--------+--------------+----------+
CUBE lists all the possible combinations of
(os, device),(device, city)
columns as grouping sets.select os,device, city, count(*) from requests group by cube ((os, device), (device, city)); The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(device,city),());
The following result is returned:
+---------+--------+--------------+----------+ | os | device | city | count(*) | +---------+--------+--------------+----------+ | linux | Phone | Beijing | 1 | | windows | Phone | Shijiazhuang | 1 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | linux | PC | Beijing | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | Phone | NULL | 1 | | windows | Phone | NULL | 1 | | windows | PC | NULL | 3 | | linux | PC | NULL | 1 | | ios | Phone | NULL | 1 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | NULL | NULL | 7 | +---------+--------+--------------+----------+
GROUPING and GROUPING_ID functions
Overview
GROUPING function
When you use the GROUPING SETS, ROLLUP, or CUBE extensions in the GROUP BY clause, NULL is used as a placeholder in a return value of the GROUPING SETS extension. As a result, the placeholder NULL cannot be distinguished from the NULL value. In this case, you can use the GROUPING function provided by PolarDB-X to identify the placeholder and the NULL value.
The GROUPING function allows you to use a column name as a parameter. If the corresponding rows are aggregated based on the column, 0 is returned in the result. In this case, NULL is a value. If the corresponding rows are not aggregated based on the column, 1 is returned. In this case, NULL is a placeholder in a return value of the GROUPING SETS extension.
GROUPING_ID function
The GROUPING_ID function simplifies the GROUPING function. The GROUPING_ID function is used to determine the subtotal level of a row in the result set of a ROLLBACK, CUBE, or GROUPING SETS extension. The GROUPING function uses only one column expression and returns a value to indicate whether a row is a subtotal for all values of the specified column. Multiple GROUPING functions may be required to determine the levels of subtotals for queries that contain multiple grouping columns. The GROUPING_ID function supports one or more column expressions specified in the ROLLBACK, CUBE, or GROUPING SETS extensions and returns a single integer. This integer indicates the column on which a subtotal is aggregated.
Syntax
GROUPING function
SELECT [ expr ...,] GROUPING( col_expr ) [, expr ] ... FROM ... GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr [, ...] ) [, ...]
Note The GROUPING function uses a single parameter. This parameter must be an expression of a dimension column specified in the expression list of a ROLLUP, CUBE, or GROUPING SETS extension of the GROUP BY clause.
GROUPING_ID function
SELECT [ expr ...,] GROUPING_ID( col_expr_1 [, col_expr_2 ] ... ) [, expr ] ... FROM ... GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr_1 [, col_expr_2 ] [, ...] ) [, ...]
Examples The GROUPING_ID function uses multiple column names as parameters. The function also converts the grouping results of the parameter columns into integers by using the bitmap algorithm. The following code block shows the relevant syntax:
select a,b,c,count(*), grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid from (select 1 as a ,2 as b,3 as c) group by cube(a,b,c);
The following result is returned:
+------+------+------+----------+------+------+------+------------+ | a | b | c | count(*) | ga | gb | gc | groupingid | +------+------+------+----------+------+------+------+------------+ | 1 | 2 | 3 | 1 | 0 | 0 | 0 | 0 | | 1 | 2 | NULL | 1 | 0 | 0 | 1 | 1 | | 1 | NULL | 3 | 1 | 0 | 1 | 0 | 2 | | 1 | NULL | NULL | 1 | 0 | 1 | 1 | 3 | | NULL | 2 | 3 | 1 | 1 | 0 | 0 | 4 | | NULL | 2 | NULL | 1 | 1 | 0 | 1 | 5 | | NULL | NULL | 3 | 1 | 1 | 1 | 0 | 6 | | NULL | NULL | NULL | 1 | 1 | 1 | 1 | 7 | +------+------+------+----------+------+------+------+------------+