MySQL 深度分页是指在分页查询数据量比较大的表时,需要访问表中的某一段数据,而这段数据的位置非常靠后,需要通过较大的 offset 来获取目标数据。
-- 测试表
drop table if exists wide_table;
create table wide_table
(
id bigint auto_increment primary key,
a varchar(255),
b varchar(255),
c varchar(255),
d varchar(255),
e varchar(255),
f varchar(255),
g varchar(255),
h varchar(255),
i varchar(255),
create_time datetime default current_timestamp
);
-- 插入十万条记录
delimiter //
drop procedure if exists insert_data;
create procedure insert_data()
begin
declare i int default 1;
while i <= 100000
do
insert into wide_table (a, b, c, d, e, f, g, h, i)
values (i, i, i, i, i, i, i, i, i);
set i = i + 1;
if i % 10000 = 0 then
select i;
end if;
end while;
end //
delimiter ;
call insert_data();
默认分页即通过 limit #{offset}, #{pageSize}
或 limit #{pageSize} offset #{offset}
来进行分页。二者本质上都是全表扫描,MySQL 会依次取出 pageSize 条记录,然后判断其是否在 offset 后,如果不在则舍弃,继续过滤,所以效率低下。
test> select *
from wide_table
limit 80000, 100
[2024-01-18 14:55:42] 在 335 ms (execution: 298 ms, fetching: 37 ms) 内检索到从 1 开始的 100 行
test> select *
from wide_table
limit 100 offset 80000
[2024-01-18 14:55:43] 在 282 ms (execution: 233 ms, fetching: 49 ms) 内检索到从 1 开始的 100 行
explain
select *
from wide_table
limit 80000, 100;
# +--+-----------+----------+----------+----+-------------+----+-------+----+-----+--------+-----+
# |id|select_type|table |partitions|type|possible_keys|key |key_len|ref |rows |filtered|Extra|
# +--+-----------+----------+----------+----+-------------+----+-------+----+-----+--------+-----+
# |1 |SIMPLE |wide_table|null |ALL |null |null|null |null|99551|100 |null |
# +--+-----------+----------+----------+----+-------------+----+-------+----+-----+--------+-----+
explain
select *
from wide_table
limit 100 offset 80000;
# +--+-----------+----------+----------+----+-------------+----+-------+----+-----+--------+-----+
# |id|select_type|table |partitions|type|possible_keys|key |key_len|ref |rows |filtered|Extra|
# +--+-----------+----------+----------+----+-------------+----+-------+----+-----+--------+-----+
# |1 |SIMPLE |wide_table|null |ALL |null |null|null |null|99551|100 |null |
# +--+-----------+----------+----------+----+-------------+----+-------+----+-----+--------+-----+
索引分页即通过索引字段(一般选取主键)的范围查询以及比较操作进行分页,通过应用索引能够大幅缩短查询时间。
test> select *
from wide_table
where id between 80000 and 80100
[2024-01-18 15:02:27] 在 224 ms (execution: 184 ms, fetching: 40 ms) 内检索到从 1 开始的 101 行
test> select *
from wide_table
where id > 80000
limit 100
[2024-01-18 14:58:34] 在 218 ms (execution: 185 ms, fetching: 33 ms) 内检索到从 1 开始的 100 行
explain
select *
from wide_table
where id between 80000 and 80100;
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+----+--------+-----------+
# |id|select_type|table |partitions|type |possible_keys|key |key_len|ref |rows|filtered|Extra |
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+----+--------+-----------+
# |1 |SIMPLE |wide_table|null |range|PRIMARY |PRIMARY|8 |null|101 |100 |Using where|
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+----+--------+-----------+
explain
select *
from wide_table
where id > 80000
limit 100;
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+-----+--------+-----------+
# |id|select_type|table |partitions|type |possible_keys|key |key_len|ref |rows |filtered|Extra |
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+-----+--------+-----------+
# |1 |SIMPLE |wide_table|null |range|PRIMARY |PRIMARY|8 |null|39420|100 |Using where|
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+-----+--------+-----------+
索引分页的局限性在于待查询的表必须有一个自增长的主键,如果主键是 UUID 就无法进行索引分页了,此外在编码时还需要考虑索引断裂导致的重复查询。
我们也可以通过子查询来优化分页,即先通过默认分页操作找到限定的主键,这个操作能够利用主键索引而且只考虑主键这一个字段,因此效率比较高。然后在根据这个主键进行条件查询,从而得到分页数据。
test> select *
from wide_table
where id >= (select id from wide_table limit 80000, 1)
limit 100
[2024-01-18 15:03:34] 在 236 ms (execution: 202 ms, fetching: 34 ms) 内检索到从 1 开始的 100 行
explain
select *
from wide_table
where id >= (select id from wide_table limit 80000, 1)
limit 100;
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+-----+--------+-----------+
# |id|select_type|table |partitions|type |possible_keys|key |key_len|ref |rows |filtered|Extra |
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+-----+--------+-----------+
# |1 |PRIMARY |wide_table|null |range|PRIMARY |PRIMARY|8 |null|39420|100 |Using where|
# |2 |SUBQUERY |wide_table|null |index|null |PRIMARY|8 |null|99551|100 |Using index|
# +--+-----------+----------+----------+-----+-------------+-------+-------+----+-----+--------+-----------+
延迟关联主要通过减少回表次数来提高查询效率。
分析下面的执行计划:
整个查询过程中只有最终的 100 条记录发生了回表,其余记录都只查询了 id 并被过滤掉了,提高了查询的效率。
test> select *
from wide_table
inner join (select id from wide_table limit 80000, 100) as wt
on wide_table.id = wt.id
[2024-01-18 15:03:36] 在 241 ms (execution: 207 ms, fetching: 34 ms) 内检索到从 1 开始的 100 行
explain
select *
from wide_table
inner join (select id from wide_table limit 80000, 100) as wt
on wide_table.id = wt.id;
# +--+-----------+----------+----------+------+-------------+-------+-------+-----+-----+--------+-----------+
# |id|select_type|table |partitions|type |possible_keys|key |key_len|ref |rows |filtered|Extra |
# +--+-----------+----------+----------+------+-------------+-------+-------+-----+-----+--------+-----------+
# |1 |PRIMARY |<derived2>|null |ALL |null |null |null |null |80100|100 |null |
# |1 |PRIMARY |wide_table|null |eq_ref|PRIMARY |PRIMARY|8 |wt.id|1 |100 |null |
# |2 |DERIVED |wide_table|null |index |null |PRIMARY|8 |null |99551|100 |Using index|
# +--+-----------+----------+----------+------+-------------+-------+-------+-----+-----+--------+-----------+