本文同时发表在
关于MySQL的loose index scan有几点疑问,欢迎看到这篇文章的人一起探讨。
测试表结构:
CREATE TABLE `test` ( `id` int(11) NOT NULL default '0', `v1` int(10) unsigned NOT NULL default '0', `v2` int(10) unsigned NOT NULL default '0', `v3` int(10) unsigned NOT NULL default '0', PRIMARY KEY (`id`), KEY `v1_v2_v3` (`v1`,`v2`,`v3`)) ENGINE=InnoDB DEFAULT CHARSET=utf8select * from test;+----+----+-----+----+| id | v1 | v2 | v3 |+----+----+-----+----+| 1 | 1 | 0 | 1 || 2 | 3 | 1 | 2 || 10 | 4 | 10 | 10 || 0 | 4 | 100 | 0 || 3 | 4 | 100 | 3 || 5 | 5 | 9 | 5 || 8 | 7 | 3 | 8 || 7 | 7 | 4 | 7 || 30 | 8 | 15 | 30 |+----+----+-----+----+ select version();+-------------+| version() |+-------------+| 5.0.51b-log |+-------------+
由此我们可以大致画出索引的结构:
下面说下我纠结的实验过程:
mysql> explain select max(v3) from test where v1>3 group by v1,v2; +----+-------------+-------+-------+---------------+----------+---------+------+------+---------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+----------+---------+------+------+---------------------------------------+| 1 | SIMPLE | test | range | v1_v2_v3 | v1_v2_v3 | 8 | NULL | 1 | Using where; Using index for group-by |+----+-------------+-------+-------+---------------+----------+---------+------+------+---------------------------------------+1 row in set (0.00 sec)
一般来说,MySQL的索引扫描需要定义一个起点和终点,即使需要的数据只是这段索引中的几个,MySQL仍然需要扫描这段索引中的每一个条目,将它们返回给Sever层,Server层根据where条件将存储引擎返回的数据再进行一遍过滤。
但是根据官方文档的描述,这段sql会用到松散扫描索引,那么我有一点疑问:MySQL的loose index scan的工作原理究竟是怎样的呢?我有个猜想:
如上图,MySQL根据索引的前2列(v1,v2)来分组,此时先不读取v3列的值。MySQL扫描时发现(v1,v2)的分组值发生变化时,取上一个节点的v3值(因为是B-Tree,v3的值在相同的(v1,v2)中肯定是有序的,并且是从小到大)。这样的话MySQL就少扫描了一个v3的值。当(v1,v2)重复的越多,MySQL少扫描的v3列的次数越多。
如果MySQL全部读取的话,存储引擎需要将全部的数据返回给Server层,Server层还需要自己判断max(v3),莫不如在扫描索引的时候顺便读取max(v3)了。当我马上就要说服自己的时候,突然发现Explain结果的rows值为1。难道说MySQL估算只扫描一行就能算出结果?这时,我通过如下命令来看MySQL是如何扫描索引的:
mysql> flush status; Query OK, 0 rows affected (0.00 sec)mysql> select max(v3) from test where v1>3 group by v1,v2; +---------+| max(v3) |+---------+| 10 || 3 || 5 || 8 || 7 || 30 |+---------+6 rows in set (0.00 sec)mysql> show session status like 'Handler_%';+----------------------------+-------+| Variable_name | Value |+----------------------------+-------+| Handler_commit | 0 || Handler_delete | 0 || Handler_discover | 0 || Handler_prepare | 0 || Handler_read_first | 0 || Handler_read_key | 15 || Handler_read_next | 0 || Handler_read_prev | 0 || Handler_read_rnd | 0 || Handler_read_rnd_next | 0 || Handler_rollback | 0 || Handler_savepoint | 0 || Handler_savepoint_rollback | 0 || Handler_update | 0 || Handler_write | 14 |+----------------------------+-------+15 rows in set (0.00 sec)
结果令我很惊奇,我们重点观察这两行数据:
| Handler_read_key | 15 || Handler_read_next | 0 |
Handler_read_next为0(Handler_read_next的意思是按照索引叶子节点顺序读取下一个节点的次数。)。说明MySQL根本没有按照我上面的意思顺序扫描v1>3的叶子节点,到此只有三种解释了:
- 我猜想的MySQL松散扫描的方法不正确。
show session status like 'Handler_%'
存在bug,没有计算出正确的值。explain
方法的rows列的估算方法存在bug,没有正确的估算出扫描的行数。
在bugs.mysql.com上我找到了有人存在跟我一样的疑惑:
罢了,我换了个MySQL版本:
mysql> select version();+-----------+| version() |+-----------+| 5.7.14 |+-----------+1 row in set (0.00 sec)
执行同样的查询:
mysql> flush status;Query OK, 0 rows affected (0.00 sec)mysql> explain select max(v3) from test where v1>3 group by v1,v2\G;*************************** 1. row *************************** id: 1 select_type: SIMPLE table: test partitions: NULL type: rangepossible_keys: v1_v2_v3 key: v1_v2_v3 key_len: 4 ref: NULL rows: 7 filtered: 100.00 Extra: Using where; Using index1 row in set, 1 warning (0.00 sec)mysql> select max(v3) from test where v1>3 group by v1,v2;+---------+| max(v3) |+---------+| 10 || 3 || 5 || 8 || 7 || 30 |+---------+6 rows in set (0.00 sec)mysql> show session status like 'Handler_%';+----------------------------+-------+| Variable_name | Value |+----------------------------+-------+| Handler_commit | 1 || Handler_delete | 0 || Handler_discover | 0 || Handler_external_lock | 2 || Handler_mrr_init | 0 || Handler_prepare | 0 || Handler_read_first | 0 || Handler_read_key | 1 || Handler_read_last | 0 || Handler_read_next | 7 || Handler_read_prev | 0 || Handler_read_rnd | 0 || Handler_read_rnd_next | 0 || Handler_rollback | 0 || Handler_savepoint | 0 || Handler_savepoint_rollback | 0 || Handler_update | 0 || Handler_write | 0 |+----------------------------+-------+18 rows in set (0.00 sec)
我惊喜的发现,换了版本之后,explain输出的rows值正常了,为7。show session status like 'Handler_%';
输出的值看起来也正常了:
| Handler_read_key | 1 || Handler_read_last | 0 || Handler_read_next | 7 |
正当我以为是MySQL5.7修复了统计的bug时,我突然发现explain的Extra是这样的:
Extra: Using where; Using index
等等,这是什么鬼,这说明MySQL在运行这条Sql时根本没有使用松散扫描索引,怪不得统计输出结果是正常的。
我在stackoverflow上关于loose index scan的提问:
参考资料: