1、MySQL SQL执行计划分析与优化CONTENTS1. 读懂最简单的SQL2. “聪明的” SQL优化器3. Join 方式4. 子查询5. 视图6. 有”缺陷的” MySQL优化器如何纠正读懂最简单的SQL 深入学习的基础. 一个最简单的SQL的执行计划但含有丰富的信息.1.1 范围查找range与等值查找refmysql explain select * from score where sid=0 and sid explain select * from score where sid=1 G* 1. row *select_type: SIMPLEtable: scoreid: 1
2、partitions: NULLtype: rangepossible_keys: sidkey: sidselect_type: SIMPLEtable: scorepartitions: NULLtype: refpossible_keys: sidkey: sid对比观察:Key相同Type不同key_len: 5ref: NULLrows: 13key_len: 5filtered: 100.00Extra: Using index condition1 row in set, 1 warning (0.01 sec)ref: constrows: 13filtered: 100.00
3、Extra: NULL1 row in set, 1 warning (0.00 sec)51.2 范围查找range与全表扫描allmysql explain select * from score where sid=2 and sid explain select sid from score where sid in (7,6) G* 1. row *select_type: SIMPLEtable: scorepartitions: NULLtype: ALLid: 1select_type: SIMPLEtable: scorepartitions: NULLtype: range
4、possible_keys: sidkey: sidpossible_keys: sidkey: NULL为什么where 条件为sidin(7,6)是范围查找?而 的查询为 查找?key_len: NULLref: NULLrows: 91filtered: 28.57sid=1 refkey_len: 5ref: NULLrows: 26filtered: 100.00Extra: Using where1 row in set, 1 warning (0.00 sec)Extra: Using where; Using index1 row in set, 1 warning (0.00
5、 sec)61.3 全索引查找indexmysql explain select sid from score G* 1. row *id: 1select_type: SIMPLEtable: scorepartitions: NULLtype: indexpossible_keys: NULLkey: sidkey_len: 10ref: NULLrows: 91filtered: 100.00Extra: Using index1 row in set, 1 warning (0.00 sec)7“聪明的”SQL优化器 优化器到底有多聪明? 为什么可以这么聪明?2.1 为什么改变查找方式
6、?mysql explain select * from score where sid=0 and sid explain select * from score where sid=2 and sid explain select sid from score where sid in (7,6) G* 1. row *mysql explain select sid from score where sid in (7,8) G* 1. row *id: 1select_type: SIMPLEid: 1table: scorepartitions: NULLtype: rangesel
7、ect_type: SIMPLEtable: scorepartitions: NULLpossible_keys: sidkey: sidrows的值如何计算的?type: rangepossible_keys: sidkey: sid准吗?key_len: 5ref: NULLrows: 26filtered: 100.00Extra: Using where; Using index1 row in set, 1 warning (0.00 sec)key_len: 5ref: NULLrows: 14filtered: 100.00Extra: Using where; Using i
8、ndex1 row in set, 1 warning (0.01 sec)112.3 未执行就”知道”filtered的值?mysql explain select * from score where sid=2 and score 50 G* 1. row *mysql explain select * from score where sid=2 and score 80G* 1. row *id: 1select_type: SIMPLEtable: scorepartitions: NULLtype: refid: 1select_type: SIMPLEtable: scorep
9、artitions: NULLtype: refpossible_keys: sid,idx_scorekey: sidFiltered值准吗?key_len: 5possible_keys: sid,idx_scorekey: sidref: constrows: 13filtered: 100.00Extra: Using where1 row in set, 1 warning (0.00 sec)key_len: 5ref: constrows: 13filtered: 28.57Extra: Using where1 row in set, 1 warning (0.00 sec)1
10、2mysql explain select * from score where sid=6 and score 80 G* 1. row *mysql explain select * from score where sid=2 and score 80G* 1. row *id: 1select_type: SIMPLEtable: scorepartitions: NULLtype: refid: 1select_type: SIMPLEtable: scorepartitions: NULLtype: refpossible_keys: sid,idx_scorekey: sidFi
11、ltered值为什么是28.57?key_len: 5Possible_keys: sid,idx_scorekey: sidref: constrows: 13filtered: 28.57key_len: 5ref: constrows: 13filtered: 28.57Extra: Using where1 row in set, 1 warning (0.00 sec)Extra: Using where1 row in set, 1 warning (0.00 sec)返回结果行数=13*28.57% ?132.4 如何计算filtered值filtered (JSON name:
12、 filtered)The filtered column indicates an estimated percentage of table rows that will be filtered by thetable condition. That is, rows shows the estimated number of rows examined and rows filtered/ 100 shows the number of rows that will be joined with previous tables.官方手册上仅介绍filtered值来自估算。如何估算?142
13、.4.2 filtered值的计算逻辑范围优化器是否已对执行计划所选访问方法未使calculate_condition_filter函数用的谓词进行行估计,如果是,则范围优化器的估计值将用作这些字段的过滤效果的计算。这样做是因为范围优化器比索引统计更准确。是/*If the range optimizer has made rowconst float selectivity=estimates for predicates that are not usedby the chosen access method, the estimatefrom the range optimizer is
14、 used as filteringeffect for those fields. We do this becausethe range optimizer is more accurate thanindex statistics.static_cast(table-quick_rowskeyno) /static_cast(tab-records();filter*= std:min(selectivity, 1.0f);Quick_rowkeyno的值来自于优化器对where条件中列进行范围查找时预计将输出多少行。具体值来自函数check_quick_select*/内容来自mysq
15、l5.7.18源代码注释/*Get filtering effect for predicates that are not alreadyreflected in filter. The below call gets this filtering effectbased on index statistics and guesstimates.*/filter*=tab-join()-where_cond-get_filtering_effect(tab-table_ref-map(),used_tables,&table-tmp_set,static_cast(tab-records()
16、;因此,filtered值计算不一定来自索引统计信息,也可能来自范围优化器对满足该范围的行数的预估。范围优化器比索引统计更准确。2.4.3 filtered计算案例分析mysql explain select * from score where sid=2 and score80 G* 1. row *id: 1mysql select count(*) from score where score80;select_type: SIMPLEtable: scorepartitions: NULLtype: refpossible_keys: sid,idx_scorekey: sid+-+
17、| count(*) |+-+|26 |+-+1 row in set (25.27 sec)key_len: 5ref: constrows: 13mysql select count(*) from score ;+-+| count(*) |+-+filtered: 28.57Extra: Using where1 row in set, 1 warning (7 min 15.94 sec)|91 |+-+1 row in set (5.06 sec)26/91=0.2857142.2.4.4 filtered计算调试182.4.4 filtered计算调试192.5 为什么改变查找方
18、式(回顾)mysql explain select * from score where sid=0 and sid explain select * from score where sid=2 and sidrow_evaluate_cost(static_cast(records) + 1;Cost_estimate cost_est= head-file-table_scan_cost();cost_est.add_io(1.1);cost_est.add_cpu(scan_time);(gdb) p cost_est$31 = io_cost = 2.1000000000000001
19、, cpu_cost = 19.199999999999999, import_cost = 0, mem_cost = 0(gdb) p (0.20000000000000001*91)+1$12 = 19.19999999999999921select count(*) from score结果为912. 计算索引查找时的成本:*cost= read_cost(keyno, static_cast(n_ranges),static_cast(total_rows);cost-add_cpu(cost_model-row_evaluate_cost(static_cast(total_row
20、s) + 0.01);*cost= read_cost(keyno, static_cast(n_ranges),static_cast(total_rows);cost-add_cpu(cost_model-row_evaluate_cost(static_cast(total_rows) + 0.01);(gdb) p *cost$53 = io_cost = 27, cpu_cost = 5.21, import_cost = 0, mem_cost = 0(gdb) p 26*0.20000000000000001+0.01$55 = 5.21select count(*) from
21、score where sid=2 and sid explain select * from score where sid=0 and sid=0 and sid explain select s.name,sc.cid,sc.score- from student s , score sc where sc.sid=s.id G* 1. row *mysql alter table score drop key sid;mysql alter table student drop primary key ;mysql explain select s.name,sc.cid,sc.sco
22、re- from student s , score sc where sc.sid=s.id G* 1. row *id: 1id: 1select_type: SIMPLEtable: spartitions: NULLtype: ALLpossible_keys: PRIMARYselect_type: SIMPLEtable: spartitions: NULLtype: ALLpossible_keys: NULLkey: NULL第一个表访问方式不变(dont care),ref all,第二个表的访问方式从 变成了 然key: NULLkey_len: NULLref: NULL
23、rows: 21filtered: 100.00后join方式变成了block nested loopkey_len: NULLref: NULLrows: 21filtered: 100.00Extra: NULLExtra: NULL* 2. row * 2. row *id: 1id: 1select_type: SIMPLEselect_type: SIMPLE有没有遇到过关联中的内表走索引table: scpartitions: NULLtype: reftable: scpartitions: NULLtype: ALLpossible_keys: NULLkey: NULL查找有
24、时可能比全表扫描慢,而且可possible_keys: sidkey: sid能是慢非常多的情况?key_len: 5key_len: NULLref: NULLrows: 91ref: xcytest.s.idrows: 12filtered: 100.00Extra: NULL2 rows in set, 1 warning (0.00 sec)filtered: 10.00Extra: Using where; Using join buffer (Block Nested Loop)2 rows in set, 1 warning (0.01 sec)27子查询 可能有曾听说过在MyS
25、QL上不建议使用子查询 可曾试想过MySQL对子查询是如何处理的,通过嵌套方式层层处理? 但不是所有的子查询是真正的子查询 .4 子查询优化方式The MySQL query optimizer has different strategies available to evaluate subqueries:For IN (or =ANY) subqueries, the optimizer has these choices:Semi-joinMaterializationEXISTS strategyFor NOT IN (or ALL) subqueries, the optimize
26、r has these choices:MaterializationEXISTS strategyFor derived tables, the optimizer has these choices (which also apply to view references):Merge the derived table into the outer query block内容来自mysql 5.7refman29Materialize the derived table to an internal temporary table4.1 子查询变成内连接mysql explain sel
27、ect * from student where id in ( select sid from score where cid=2) G* 1. row *mysql show warnings;id: 1select_type: SIMPLEtable: scorepartitions: NULLtype: indexpossible_keys: sidkey: sidselect xcytest.student.id AS id,xcytest.student.name AS namefrom xcytest.score join xcytest.studentwhere (xcytes
28、t.student.id = xcytest.score.sid) and(xcytest.score.cid = 2)key_len: 10ref: NULLrows: 91filtered: 10.00Extra: Using where; Using index* 2. row *id: 1select_type: SIMPLEtable: studentpartitions: NULL为什么这个子查询可以转变成了内连接?转换成内连接的好处是什么?type: eq_refpossible_keys: PRIMARYkey: PRIMARYkey_len: 4ref: xcytest.sc
29、ore.sidrows: 1filtered: 100.00Extra: NULL302 rows in set, 1 warning (0.03 sec)4.1 子查询变成内连接mysql explain select sid,cid,score from score sc where sc.sid in ( select id fromstudent ) and sc.score 60 G* 1. row *id: 1select_type: SIMPLEtable: studentpartitions: NULLmysql show warnings;select xcytest.sc.
30、sid AS sid,xcytest.sc.cid AS cid,xcytest.sc.score AS score“from xcytest.student join xcytest.score sc“where (xcytest.sc.sid = xcytest.student.id)and (xcytest.sc.score 60)type: indexpossible_keys: PRIMARYkey: PRIMARYkey_len: 4ref: NULLrows: 21filtered: 100.00Extra: Using index* 2. row *id: 1select_ty
31、pe: SIMPLEtable: scpartitions: NULLtype: refpossible_keys: sid,idx_scorekey: sidkey_len: 5为什么这个子查询也转变成了内连接?ref: xcytest.student.idrows: 1331filtered: 100.00Extra: Using where子查询转换成半连接条件1011 DBUG_PRINT(info, (Checking if subq can be converted to semi-join);1012 /*1013101410151016101710181019102010211
32、022102310241025102610271028 */Check if were in subquery that is a candidate for flattening into asemi-join (which is done in flatten_subqueries(). The requirements are:1. Subquery predicate is an IN/=ANY subquery predicate2. Subquery is a single SELECT (not a UNION)3. Subquery does not have GROUP BY
33、4. Subquery does not use aggregate functions or HAVING5. Subquery predicate is (a) in an ON/WHERE clause, and (b) atthe AND-top-level of that clause.6. Parent query block accepts semijoins (i.e we are not in a subquery ofa single table UPDATE/DELETE (TODO: We should handle this at somepoint by switc
34、hing to multi-table UPDATE/DELETE)7. Were not in a confluent table-less subquery, like SELECT 1.8. No execution method was already chosen (by a prepared statement)9. Parent select is not a confluent table-less select10. Neither parent nor child select have STRAIGHT_JOIN option.32子查询转换成半连接条件代码1029 if
35、 (semijoin_enabled(thd) &103010311032103310341035103610371038103910401041104210431044 1045in_predicate &!is_part_of_union() &!group_list.elements &!m_having_cond & !with_sum_func &(outer-resolve_place = st_select_lex:RESOLVE_CONDITION | / 5aouter-resolve_place = st_select_lex:RESOLVE_JOIN_NEST) & /
36、5a!outer-semijoin_disallowed &outer-sj_candidates &leaf_table_count &in_predicate-exec_method =Item_exists_subselect:EXEC_UNSPECIFIED & / 8outer-leaf_table_count & / 9!(active_options() | outer-active_options() &SELECT_STRAIGHT_JOIN)/ 1/ 2/ 3/ 4/ 5b/ 6/ 7/10DBUG_PRINT(info, (Subquery is semi-join co
37、nversion candidate);半连接进一步转换为内连接/*Pull tables out of semi-join nests based on functional dependenciesparam join The join where to do the semi-join table pulloutreturn False if successful, true if error (Out of memory)detailsPull tables out of semi-join nests based on functional dependencies,ie. if a
38、 table is accessed via eq_ref(outer_tables).The function may be called several times, the caller is responsiblefor setting up proper key information that this function acts upon.NOTETable pullout may make uncorrelated subquery correlated. Consider thisexample:. WHERE oe IN (SELECT it1.primary_key WH
39、ERE p(it1, it2) . )here table it1 can be pulled out (we have it1.primary_key=oe which givesus functional dependency).Making the subquery (i.e. its semi-join nest) correlated prevents us fromusing Materialization or LooseScan to execute it. */static bool pull_out_semijoin_tables(JOIN *join)4.2 利用MATE
40、RIALIZED处理半连接mysql explain select * from student where id in ( select sid fromscore) G* 1. row *mysql show warnings;id: 1select_type: SIMPLEtable: studentpartitions: NULLtype: ALLPossible_keys: PRIMARYkey: NULLselect xcytest.student.id AS id,xcytest.student.name AS namefrom xcytest.student semi join
41、key_len: NULLref: NULLrows: 21filtered: 100.00Extra: Using where* 2. row * (“xcytest.score)id: 1select_type: SIMPLEtable: partitions: NULLtype: eq_refpossible_keys: key: key_len: 5where (.sid =xcytest.student.id)ref: xcytest.student.idrows: 1filtered: 100.00Extra: NULL* 3. row *id: 2select_type: MAT
42、ERIALIZEDtable: score通过MATERIALIZED的方式对子查询生成临时表,然后跟外表进行1vs1等值连接eq_ref。partitions: NULLtype: indexpossible_keys: sidkey: sid35key_len: 10ref: NULL4.3 利用loosescan处理半连接mysql explain select * from xcytest where a in ( select b from xcytestb) ;+-+-+-+-+-+-+-+-+-+-+-+-+| id | select_type | table | partiti
43、ons | type | possible_keys | key | key_len | ref | rows |filtered | Extra|+-+-+-+-+-+-+-+-+-+-+-+-+| 1 | SIMPLE | xcytestb | NULL| Using index; LooseScan| index | b| b | 5| NULL | 2 | 100.00| 1 | SIMPLE | xcytest | NULL| ALL | PRIMARY| NULL | NULL | NULL | 2| 50.00 | Using where; Using join buffer (
44、Block Nested Loop) |+-+-+-+-+-+-+-+-+-+-+-+-+364.4 其他处理半连接策略the firstmatch, loosescan, duplicateweedout, andmaterialization flags enable finer control over the permittedsemi-join strategies-请参考mysql 5.7 refman374.5 非常特殊的“子查询”mysql explain select * from testa a where a.id in (select b.id from testb b
45、 where b.id=100);+-+-+-+-+-+-+-+-+-+-+-+-+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra|+-+-+-+-+-+-+-+-+-+-+-+-+| 1 | SIMPLE | NULL | NULLmatching row in const table | NULL | NULL| NULL | NULL | NULL | NULL | NULL | no+-+-+-+-+-+-+-+-
46、+-+-+-+-+1 row in set, 1 warning (0.00 sec)在分析阶段就知道了这个查询将为空?384.6 常量表转换(gdb) bt#0 ha_innobase:index_read (this=0 x7f067c058d00, buf=0 x7f067c059110 377,key_ptr=0 x7f067c06e930 004, key_len=4, find_flag=HA_READ_KEY_EXACT)#1 0 x0000000000f9fba4 in handler:index_read_map (this=0 x7f067c058d00, buf=0 x7
47、f067c059110 377,key=0 x7f067c06e930 004, keypart_map=1, find_flag=HA_READ_KEY_EXACT)#4 0 x000000000154eb1b in read_const (table=0 x7f067c058350, ref=0 x7f067c073448)#5 0 x000000000154e5fe in join_read_const_table (tab=0 x7f067c0732e0, pos=0 x7f067c0735c0).#7 0 x0000000001579ad5 in JOIN:make_join_pla
48、n (this=0 x7f067c072d40)#8 0 x000000000156e576 in JOIN:optimize (this=0 x7f067c072d40)#9 0 x00000000015e5ee6 in st_select_lex:optimize (this=0 x7f067c053f70, thd=0 x7f067c01bf40) #100 x00000000015e4642 in handle_query (thd=0 x7f067c01bf40, lex=0 x7f067c01e238,result=0 x7f067c06dc38, added_options=0,
49、 removed_options=0) at /mysqldata/mysql-5.7.18/sql/sql_select.cc:164394.7 必须手工优化的子查询例如:select * from ( select id from testa union select id fromtestb ) out_a_b where out_a_b.id=8;select * from ( select id from testa union all select idfrom testb ) out_a_b where out_a_b.id=8;。无法转化成半连接的子查询404.7.1 手工优化
50、子查询方法1.将子查询上拉,将子查询跟外层表平级,将嵌套查询变成关联查询。 -消除子查询2. 将查询的条件下沉,使子查询能够快速执行且生成的结果集变小。-快速物理化子查询41视图/派生表 采用跟外表合并的方式优化 谨慎使用包含union/union all的视图/派生表5.1 视图/派生表的优化方式For derived tables, the optimizer has these choices (whichalso apply to view references):Merge the derived table into the outer query blockMaterialize