一條SQL引發(fā)的“血案”:與SQL優(yōu)化相關(guān)的4個案例
點擊關(guān)注上方“SQL數(shù)據(jù)庫開發(fā)”,
設(shè)為“置頂或星標”,第一時間送達干貨 SQL專欄 SQL基礎(chǔ)知識第二版
SQL高級知識第二版
導(dǎo)讀:筆者早年間從事了多年開發(fā)工作,后因個人興趣轉(zhuǎn)做數(shù)據(jù)庫。在長期的工作實踐中,看到了數(shù)據(jù)庫工作(特別是SQL優(yōu)化)面臨的種種問題。本文通過幾個案例探討一下SQL優(yōu)化的相關(guān)問題。
作者:馬立和 高振嬌 韓鋒
來源:大數(shù)據(jù)DT(ID:hzdashuju)

具體分析
SELECT /*+ INDEX (A1 xxxxx) */ SUM(A2.CRKSL), SUM(A2.CRKSL*A2.DJ) ...
FROM xxxx A2, xxxx A1
WHERE A2.CRKFLAG=xxx AND A2.CDATE>=xxx AND A2.CDATE<xxx;
分析結(jié)論
開發(fā)人員的一個疏忽造成了嚴重的后果,原來數(shù)據(jù)庫竟是如此的脆弱。需要對數(shù)據(jù)庫保持“敬畏”之心。 電腦不是人腦,它不知道你的需求是什么,只能根據(jù)寫好的邏輯進行處理。 不要去責(zé)怪開發(fā)人員,誰都會犯錯誤,關(guān)鍵是如何從制度上保證不再發(fā)生類似的問題。

create table t1 as select * from dba_objects where 1=0;
alter table t1 add id int primary key;
create table t2 as select * from dba_objects where 1=0;
alter table t2 add id varchar2(10) primary key;
insert into t1
select 'test','test','test',rownum,rownum,'test',sysdate,sysdate,'test','test','','','',rownum
from dual
connect by rownum<=3200000;
insert into t2
select 'test','test','test',rownum,rownum,'test',sysdate,sysdate,'test','test','','','',rownum
from dual
connect by rownum<=3200000;
commit;
execdbms_stats.gather_table_stats(ownname => 'hf',tabname => 't1',cascade =>true,estimate_percent => 100);
execdbms_stats.gather_table_stats(ownname => 'hf',tabname => 't2',cascade =>true,estimate_percent => 100);
select * from t1 where id>= 3199990;
11 rows selected.
--------------------------------------------------------------------------------
| Id | Operation | Name |Rows |Bytes|Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 693 | 4 (0) | 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 11 | 693 | 4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN |SYS_C0025294| 11 | | 3 (0) | 00:00:01 |
---------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6 consistent gets
0 physical readsselect * from t2 where id>= '3199990';
755565 rows selected.
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2417K| 149M| 8927 (2)| 00:01:48 |
|* 1 | TABLE ACCESS FULL| T2 | 2417K| 149M| 8927 (2)| 00:01:48 |
--------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
82568 consistent gets
0 physical reads字符類型在索引中是“亂序”的,這是因為字符類型的排序方式與我們的預(yù)期不同。從“select * from t2 where id>= '3199990'”執(zhí)行返回755 565條記錄可見,不是直觀上的10條記錄。這也是當初在做表設(shè)計時,開發(fā)人員沒有注意的問題。 字符類型還導(dǎo)致了聚簇因子很大,原因是插入順序與排序順序不同。詳細點說,就是按照數(shù)字類型插入(1..3200000),按字符類型('1'...'32000000')t排序。
select table_name,index_name,leaf_blocks,num_rows,clustering_factor
from user_indexes
where table_name in ('T1','T2');
TABLE_NAME INDEX_NAME LEAF_BLOCKS NUM_ROWS CLUSTERING_FACTOR
-------------- -------------- ---------------- ---------- ---------------------
T1 SYS_C0025294 6275 3200000 31520
T2 SYS_C0025295 13271 3200000 632615
在對字符類型使用大于運算符時,會導(dǎo)致優(yōu)化器認為需要掃描索引大部分數(shù)據(jù)且聚簇因子很大,最終導(dǎo)致棄用索引掃描而改用全表掃描方式。
select * from t2 where id between '3199990' and '3200000';
--------------------------------------------------------------------------------
| Id | Operation | Name |Rows|Bytes |Cost(%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6| 390 | 5 (0)|00:00:01|
| 1 | TABLE ACCESS BY INDEX ROWID| T2 | 6| 390 | 5 (0)|00:00:01|
|* 2 | INDEX RANGE SCAN | SYS_C0025295 | 6| | 3 (0)|00:00:01|
--------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
13 consistent gets
0 physical reads糟糕的數(shù)據(jù)結(jié)構(gòu)設(shè)計往往是致命的,后期的優(yōu)化只是補救措施。只有從源頭上加以杜絕,才是優(yōu)化的根本。 在設(shè)計初期能引入數(shù)據(jù)庫審核,可以起到很好的作用。

select ... from ...
where
(
(
order_creation_date>= to_date(20120208,'yyyy-mm-dd') and
order_creation_date<to_date(20120209,'yyyy-mm-dd')
)
or
(
send_date>= to_date(20120208,'yyyy-mm-dd') and send_date<to_date(20120209, 'yyyy-mm-dd')
)
)
andnvl(a.bd_id,0) = 1
--------------------------------------------------------------------------------
| Id | Operation | Name |Cost (%CPU)| Time |Pstart | Pstop |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2470K(100)| | | |
| 1 | SORT GROUP BY | | | | | |
| 2 | TABLE ACCESS BY GLOBAL INDEX ROWID
| XXXX | 5 (0) | 00:00:01 | ROW L | ROW L |
| 3 | NESTED LOOPS | | 2470K (1) | 08:14:11 | | |
| 4 | VIEW |VW_NSO_1| 2470K (1) | 08:14:10 | | |
| 5 | FILTER | | | | | |
| 6 | HASH GROUP BY | | 2470K (1)| 08:14:10 | | |
| 7 | TABLE ACCESS BY GLOBAL INDEX ROWID
| XXXX | 5 (0)| 00:00:01 | ROW L | ROW L |
| 8 | NESTED LOOPS | | 2470K (1)| 08:14:10 | | |
| 9 | SORT UNIQUE | | 2340K (2)| 07:48:11 | | |
| 10 | PARTITION RANGE ALL
| | 2340K (2)| 07:48:11 | 1 | 92 |
| 11 | TABLE ACCESS FULL
| XXXX | 2340K (2)| 07:48:11 | 1 | 92 |
| 12 | INDEX RANGE SCAN
| XXXX | 3 (0)| 00:00:01 | | |
| 13 | INDEX RANGE SCAN | XXXX | 3 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------
select ...
from ...
where
order_creation_date >= to_date(20120208,'yyyy-mm-dd') and
order_creation_date<to_date(20120209,'yyyy-mm-dd')
union all
select ...
from ...
where
send_date>= to_date(20120208,'yyyy-mm-dd') and
send_date<to_date(20120209,'yyyy-mm-dd') and
nvl(a.bd_id,0) = 5
select ...
from ...
where
(
(
order_creation_date>= to_date(20120208,'yyyymmdd') and
order_creation_date<to_date(20120209,'yyyymmdd')
)
or
(
send_date>= to_date(20120208,'yyyymmdd') and
send_date<to_date(20120209,'yyyymmdd')
)
);
--------------------------------------------------------------------------------
| Id | Operation | Name | Cost(%CPU)|Time | Pstart | Pstop |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 42358 (1)| 00:08:29 | | |
| 1 | SORT AGGREGATE | | | | | |
| 2 | CONCATENATION | | | | | |
| 3 | PARTITION RANGE SINGLE
| | 17393 (1)| 00:03:29 | 57 | 57 |
|* 4 | TABLE ACCESS FULL
| XXXX | 17393 (1)| 00:03:29 | 57 | 57 |
|* 5 | TABLE ACCESS BY GLOBAL INDEX ROWID
| XXXX | 24966 (1)| 00:05:00 | ROWID | ROWID |
|* 6 | INDEX RANGE SCAN
| XXXX | 658 (1)| 00:00:08 | | |
---------------------------------------------------------------------------------
規(guī)范的SQL寫法,不但利于提高代碼可讀性,還有利于優(yōu)化器生成更優(yōu)的執(zhí)行計劃。 分區(qū)功能是Oracle應(yīng)對大數(shù)據(jù)的利器,但在使用中要注意是否真正會用到分區(qū)特性;否則,可能適得其反,使用分區(qū)會導(dǎo)致效率更差。

select...
from xxx a join xxx b on a.order_id = b.lyywzdid
left join xxx c on b.gysid = c.gysid
whereb.cdate>= to_date('2012-03-31', 'yyyy-mm-dd') – 3 and ...
a.send_date>= to_date('2012-03-31', 'yyyy-mm-dd') - 1 and
a.send_date<to_date('2012-03-31', 'yyyy-mm-dd');
--------------------------------------------------------------------------------
|Id | Operation |Name | Rows | Bytes | Cost (%CPU) |Pstart|Pstop|
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 104 | 9743(1)| | |
| 1 | HASH JOIN OUTER | | 1 | 104 | 9743(1)| | |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID
| XXXX | 1 | 22 | 0(0)| 1189 | 1189|
| 3 | NESTED LOOPS | | 1 | 94 | 9739(1)| | |
| 4 | PARTITION RANGE ITERATOR
| | 1032 | 74304 | 9739(1)| 123 | 518 |
| 5 | TABLE ACCESS FULL
| XXXX | 1032 | 74304 | 9739(1)| 123 | 518 |
| 6 | PARTITION RANGE SINGLE
| | 1 | | 0(0)| 1189 | 1189 |
| 7 | INDEX RANGE SCAN
| XXXX | 1 | | 0(0)| 1189 | 1189 |
| 8 | TABLE ACCESS FULL
| XXXX | 183 | 1830 | 3(0)| | |
--------------------------------------------------------------------------------
exec dbms_stats.gather_index_stats(
ownname=>'xxx',
indname=>'xxx',
partname=>'PART_xxx',
estimate_percent => 10);
統(tǒng)計信息是優(yōu)化器優(yōu)化的重要參考依據(jù),一個完整、準確的統(tǒng)計信息是必要條件。往往在優(yōu)化過程中,第一步就是查看相關(guān)對象的統(tǒng)計信息。 分區(qū)機制是Oracle針對大數(shù)據(jù)的重要解決手段,但也很容易造成所謂“放大效應(yīng)”。即對于普通表而言,統(tǒng)計信息更新不及時可能不會導(dǎo)致執(zhí)行計劃偏差過大;但對于分區(qū)表、索引來說,很容易出現(xiàn)因更新不及時出現(xiàn)0的情況,進而導(dǎo)致執(zhí)行計劃產(chǎn)生嚴重偏差。
最后給大家分享我寫的SQL兩件套:《SQL基礎(chǔ)知識第二版》和《SQL高級知識第二版》的PDF電子版。里面有各個語法的解釋、大量的實例講解和批注等等,非常通俗易懂,方便大家跟著一起來實操。 有需要的讀者可以下載學(xué)習(xí),在下面的公眾號「數(shù)據(jù)前線」(非本號)后臺回復(fù)關(guān)鍵字:SQL,就行 數(shù)據(jù)前線 后臺回復(fù)關(guān)鍵字:1024,獲取一份精心整理的技術(shù)干貨 后臺回復(fù)關(guān)鍵字:進群,帶你進入高手如云的交流群。 推薦閱讀
評論
圖片
表情

