MySQL: The Precision Error of Seconds_behind_master

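Preface

Seconds_behind_master is an important metric for observing replication lag. But every metric has a limited precision. Using a metric with one-second resolution to measure millisecond-level behavior produces very large errors, and analyzing a problem on the basis of such errors sends the investigation down the wrong path. Using Seconds_behind_master to assess replication lag of less than 1s is a typical example.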

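The problem scene

While troubleshooting some issues, we noticed something odd: two replicas with identical configuration showed a difference of nearly 500ms in replication lag. The only difference between the two replicas was the data center they were in (neither is in the same data center as the master). The monitoring chart of the two replicas showed this gap.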

Network problem

Could it be the network? We pinged from both sides: the difference was at most 1ms. So where did the other 499ms go? We had to keep digging.

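The sampled data behind Seconds_behind_master

Intuitively, the network could not account for a difference as large as 500ms, and the machine configuration and MySQL version were identical. That made me doubt the accuracy of the monitoring data itself, so I first looked at how the 500ms figure was computed.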

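Judging from the sampled monitoring points, replica C does appear to have real replication lag; otherwise, why would the other replica have so many samples at 0?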

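When is Seconds_behind_master computed as 1

At this point something occurred to me: if one replica's lag is 501ms and the other's is 499ms, could Seconds_behind_master be rounded to the nearest second, so that 501ms (>=500ms) becomes 1 and 499ms (<500ms) becomes 0? To find out, I went through the source code.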

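How Seconds_behind_master is computed in the MySQL source

The code that computes this metric has many subtle branches that handle various corner cases. Only the part relevant to this problem is listed here.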

long time_diff= ((long)(time(0) - mi->rli->last_master_timestamp)
                       - mi->clock_diff_with_master);

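The first part, time(0) - mi->rli->last_master_timestamp, is clearly the time difference. However, there is a piece of common knowledge that is easy to overlook: the clocks of different machines do not read the same value!

So if the actual replication lag is 0 but the difference between the two machine clocks is not removed, the computed lag is simply the clock difference, 6s in this example. mi->clock_diff_with_master in the source exists to correct for exactly this gap, and computing clock_diff_with_master itself introduces a sizable error.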
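To make the role of the correction term concrete, here is a minimal sketch of the subtraction from the excerpt, written as a free function. The function name and the parameter list are assumptions made for illustration; in MySQL the values come from mi->rli->last_master_timestamp and mi->clock_diff_with_master.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper mirroring the excerpt:
//   (time(0) - last_master_timestamp) - clock_diff_with_master
// replica_now:            the replica's clock, in whole seconds
// last_master_timestamp:  master-clock second stamped on the last applied event
// clock_diff_with_master: replica clock minus master clock, in whole seconds
long raw_time_diff(std::time_t replica_now,
                   std::time_t last_master_timestamp,
                   long clock_diff_with_master) {
  return static_cast<long>(replica_now - last_master_timestamp) -
         clock_diff_with_master;
}
```

With a replica clock that runs 6s ahead of the master and no real lag, raw_time_diff(1006, 1000, 6) gives 0, while the same call with a correction of 0 gives 6.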

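When clock_diff_with_master is computed

Reading the source, I noticed that clock_diff_with_master is not recomputed on every call. It is computed once, at the moment the replica connects or reconnects to the master.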

handle_slave_io
    /* establish the master-replica connection */
    |->safe_connect(thd, mysql, mi)
    /* connected: once the connection succeeds, compute clock_diff_with_master */
    |->get_master_version_and_clock

A natural consequence follows: once clock_diff_with_master has been computed with an error, that error persists until the next reconnect!
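The effect of computing the value once can be sketched as follows. This is a simplified model, not MySQL's code: both clocks are given as hypothetical "true" readings in milliseconds, each is truncated to whole seconds the way time(0) would report it, and the network delay between reading the two clocks is ignored. The names to_seconds and clock_diff_at_connect are made up for this example.

```cpp
#include <cassert>

// Whole-second reading of a clock whose true value is ms milliseconds.
// Assumes ms >= 0, so integer division truncates like a seconds counter.
long to_seconds(long ms) {
  assert(ms >= 0);
  return ms / 1000;
}

// Value that would be cached at (re)connect time in this model:
// replica clock minus master clock, both at one-second granularity.
long clock_diff_at_connect(long replica_clock_ms, long master_clock_ms) {
  return to_seconds(replica_clock_ms) - to_seconds(master_clock_ms);
}
```

In this model the real offset is -500ms in both of the following calls, yet clock_diff_at_connect(500, 1000) yields -1 and clock_diff_at_connect(400, 900) yields 0. Whichever value is obtained stays in use until the next reconnect.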

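The cross-second error of clock_diff_with_master

I then noticed that clock_diff_with_master only has one-second precision, which naturally leads to the situations below. For simplicity, assume that absolute time starts at 0 and that the real replication lag is 0, so that only the effect of the precision error is visible.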

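With a real lag of 0, clock_diff_with_master is computed as -1 and Seconds_behind_master as 1

Even with NTP, the clocks of two machines cannot be kept exactly identical (unless both machines have cesium atomic clocks, in which case there would be essentially no millisecond-level error). A difference of a few hundred milliseconds or even several seconds between two machines is quite normal. Suppose that at connection time the replica's clock reads 0.5s and the master's clock reads 1s. Because the computation only has one-second precision, the replica's reading is truncated to 0 while the master's is 1, so a real difference of only 0.5s is magnified to 1s.

We can now compute the average value of Seconds_behind_master in this situation, under the assumption that the monitoring samples are taken at random moments.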

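As the figure showed, consider the one-second window [0.5, 1.5) of the replica's clock. In the first half, [0.5, 1), the computed Seconds_behind_master is 0, while in [1, 1.5) it is 1. The average is therefore (0.5*0 + 0.5*1)/(1.5 - 0.5) = 0.5s = 500ms!
In other words, with no real replication lag at all, the cross-second effect alone can produce an error of several hundred milliseconds.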
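The 500ms average can be checked with a small simulation. It is a sketch under the same assumptions as the text: the real lag is 0, the master's clock is a fixed 500ms ahead of the replica's, clock_diff_with_master was cached as -1, and one sample is taken at every millisecond of the window. All function names are hypothetical.

```cpp
#include <cassert>

// Whole-second reading of a clock whose true value is ms milliseconds (ms >= 0).
long to_seconds(long ms) {
  assert(ms >= 0);
  return ms / 1000;
}

// time_diff as in the excerpt, for an event the master stamps "right now"
// (zero real lag). offset_ms is how far the master clock is ahead.
long time_diff_zero_lag(long replica_ms, long offset_ms, long clock_diff) {
  long last_master_timestamp = to_seconds(replica_ms + offset_ms);
  return (to_seconds(replica_ms) - last_master_timestamp) - clock_diff;
}

// Average of time_diff over the window [start_ms, start_ms + 1000), in ms.
// Summing 1000 samples measured in seconds gives the average in milliseconds.
long average_time_diff_ms(long start_ms, long offset_ms, long clock_diff) {
  long sum = 0;
  for (long t = start_ms; t < start_ms + 1000; ++t) {
    sum += time_diff_zero_lag(t, offset_ms, clock_diff);
  }
  return sum;
}
```

Calling average_time_diff_ms(500, 500, -1) covers the window [0.5, 1.5) from the text: the first 500 samples are 0 and the last 500 are 1, which averages to 500ms.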

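With a real lag of 0, clock_diff_with_master is computed as 0 and Seconds_behind_master as -1, then corrected to 0

Another interesting point: if the error can add 1, it can also subtract 1, so Seconds_behind_master can come out as -1. That would give observers the false impression that the replica is ahead of the master! The MySQL source takes this into account and forces the value to 0.
Here, moving the moment of connection 0.1s earlier is enough to construct this situation, as the figure showed: the replica's clock then reads 0.4s and the master's 0.9s, both truncate to 0, and clock_diff_with_master is 0.

The source comment and the forced correction in MySQL are as follows: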

      long time_diff= ((long)(time(0) - mi->rli->last_master_timestamp)
                       - mi->clock_diff_with_master);
      /*
        Apparently on some systems time_diff can be <0. Here are possible
        reasons related to MySQL:
        - the master is itself a slave of another master whose time is ahead.
        - somebody used an explicit SET TIMESTAMP on the master.
        Possible reason related to granularity-to-second of time functions
        (nothing to do with MySQL), which can explain a value of -1:
        assume the master's and slave's time are perfectly synchronized, and
        that at slave's connection time, when the master's timestamp is read,
        it is at the very end of second 1, and (a very short time later) when
        the slave's timestamp is read it is at the very beginning of second
        2. Then the recorded value for master is 1 and the recorded value for
        slave is 2. At SHOW SLAVE STATUS time, assume that the difference
        between timestamp of slave and rli->last_master_timestamp is 0
        (i.e. they are in the same second), then we get 0-(2-1)=-1 as a result.
        This confuses users, so we don't go below 0: hence the max().

        last_master_timestamp == 0 (an "impossible" timestamp 1970) is a
        special marker to say "consider we have caught up".
      */
      protocol->store((longlong)(mi->rli->last_master_timestamp ?
                                   max(0L, time_diff) : 0));
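The -1 case and its correction can be reproduced with the same kind of model: real lag 0, master clock 500ms ahead, and clock_diff_with_master cached as 0 because the connection happened at replica time 0.4s. This is an illustrative sketch; the function names are assumptions, and only the final expression follows the excerpt above.

```cpp
#include <algorithm>
#include <cassert>

// Whole-second reading of a clock whose true value is ms milliseconds (ms >= 0).
long to_seconds(long ms) {
  assert(ms >= 0);
  return ms / 1000;
}

// Uncorrected time_diff for an event stamped by the master "right now"
// (zero real lag); offset_ms is how far the master clock is ahead.
long raw_time_diff_zero_lag(long replica_ms, long offset_ms, long clock_diff) {
  long last_master_timestamp = to_seconds(replica_ms + offset_ms);
  return (to_seconds(replica_ms) - last_master_timestamp) - clock_diff;
}

// Same shape as the excerpt: a zero timestamp means "caught up",
// and negative differences are clamped to 0.
long reported_seconds_behind(long last_master_timestamp, long time_diff) {
  return last_master_timestamp ? std::max(0L, time_diff) : 0;
}
```

At replica time 0.7s the master's clock reads 1.2s, so raw_time_diff_zero_lag(700, 500, 0) is -1, and reported_seconds_behind(1, -1) returns 0.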

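How to obtain accurate millisecond-level replication lag

Because of its precision, Seconds_behind_master cannot measure millisecond-level replication lag at all, which is why tools such as pt-heartbeat exist to compute millisecond-level lag precisely. After we later monitored both replicas with pt-heartbeat, the two replicas whose average lag appeared to differ by 500ms turned out to differ by less than 10ms in actual lag.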
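The general idea behind a heartbeat-based measurement can be sketched in a few lines. This is not pt-heartbeat's code: it only assumes that a timestamp with millisecond resolution is written on the master, arrives on the replica through replication, and is compared there with the replica's current time. The function name is hypothetical, and the result still includes any offset between the two machine clocks.

```cpp
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Lag seen on the replica: the replica's current time minus the time
// the master wrote into the replicated heartbeat row.
Millis heartbeat_lag(Millis replica_now, Millis heartbeat_written_at) {
  return replica_now - heartbeat_written_at;
}
```

For example, heartbeat_lag(Millis(1200), Millis(1190)).count() is 10, a 10ms lag that a one-second counter would report as 0.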

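Summary

Every metric has a precision, and outside the range that precision can express it produces errors large enough to mislead our judgment. When a metric looks strongly counterintuitive, consider whether the metric itself is able to describe the phenomenon we are trying to observe. As this article shows, Seconds_behind_master is of little use for characterizing replication lag below 1s.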
