<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

          用Keepalived實現(xiàn)PostgreSQL高可用

          共 8349字,需瀏覽 17分鐘

           ·

          2021-11-15 13:41

          點擊上方"藍字"

          關(guān)注我們,享更多干貨!


          一、Keepalived工作原理



          Keepalived可提供VRRP以及health-check功能,可以只用它提供雙機浮動的vip(VRRP虛擬路由功能),這樣可以簡單實現(xiàn)一個雙機熱備高可用功能。

          Keepalived是以VRRP虛擬路由冗余協(xié)議為基礎(chǔ)實現(xiàn)高可用的,可以認為是實現(xiàn)路由器高可用的協(xié)議,即將N臺提供相同功能的路由器組成一個路由器組,這個組里面有一個master和多個backup,master上面有一個對外提供服務(wù)的vip(該路由器所在局域網(wǎng)內(nèi)其他機器的默認路由為該vip),master會發(fā)組播,當(dāng)backup收不到VRRP包時就認為master宕掉了,這時就需要根據(jù)VRRP的優(yōu)先級來選舉一個backup當(dāng)master。這樣的話就可以保證路由器的高可用了。

          下載地址:(復(fù)制鏈接至瀏覽器中瀏覽)

          https://www.keepalived.org/download.html
          https://github.com/acassen/keepalived

          Keepalived工作在TCP/IP參考模型的三層、四層、五層(物理層,鏈路層):

          • 網(wǎng)絡(luò)層(3):Keepalived通過ICMP協(xié)議向服務(wù)器集群中的每一個節(jié)點發(fā)送一個ICMP數(shù)據(jù)包(有點類似與Ping的功能),如果某個節(jié)點沒有返回響應(yīng)數(shù)據(jù)包,那么認為該節(jié)點發(fā)生了故障,Keepalived將報告這個節(jié)點失效,并從服務(wù)器集群中剔除故障節(jié)點。

          • 傳輸層(4):Keepalived在傳輸層里利用了TCP協(xié)議的端口連接和掃描技術(shù)來判斷集群節(jié)點的端口是否正常,比如對于常見的WEB服務(wù)器80端口。或者SSH服務(wù)22端口,Keepalived一旦在傳輸層探測到這些端口號沒有數(shù)據(jù)響應(yīng)和數(shù)據(jù)返回,就認為這些端口發(fā)生異常,然后強制將這些端口所對應(yīng)的節(jié)點從服務(wù)器集群中剔除掉。

          • 應(yīng)用層(5):Keepalived的運行方式也更加全面化和復(fù)雜化,用戶可以通過自定義Keepalived工作方式,例如:可以通過編寫程序或者腳本來運行Keepalived,而Keepalived將根據(jù)用戶的設(shè)定參數(shù)檢測各種程序或者服務(wù)是否允許正常,如果Keepalived的檢測結(jié)果和用戶設(shè)定的不一致時,Keepalived將把對應(yīng)的服務(wù)器從服務(wù)器集群中剔除。

          二、架構(gòu)



          2.1 主機規(guī)劃

          2.2 架構(gòu)圖

          三、環(huán)境準(zhǔn)備



          3.1 關(guān)閉防火墻和selinux

          [root@localhost ~]# systemctl stop firewalld
          [root@localhost keepalived]# vi /etc/selinux/config
          SELINUX=disabled
          [root@localhost keepalived]# getenforce
          Disabled

          3.2 安裝Keepalived

          [root@localhost ~]# yum install gcc openssl openssl‐devel libnl libnl‐devel ipvsadm -y
          [root@localhost ~]# cd /usr/local/src/
          [root@localhost src]# tar -zxf keepalived-2.2.4.tar.gz
          [root@localhost src]# cd keepalived-2.2.4
          [root@localhost keepalived-2.2.4] ./configure --prefix=/usr/local/keepalived
          [root@localhost keepalived-2.2.4] make && make install
          [root@localhost keepalived]# cd /usr/local/keepalived
          [root@localhost keepalived]#
          ln -s /usr/local/src/keepalived-2.2.4/keepalived/etc/init.d/keepalived /etc/init.d/
          [root@localhost keepalived]# ln -s /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
          [root@localhost keepalived]# ln -s /usr/local/keepalived/sbin/keepalived /usr/local/sbin/
          [root@localhost keepalived]# mkdir /etc/keepalived/
          [root@localhost keepalived]# ln -s /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
          [root@localhost ~]# cat /etc/keepalived/keepalived.conf

          [root@localhost keepalived-2.2.4]# systemctl start keepalived
          [root@localhost keepalived-2.2.4]# systemctl status keepalived


          3.3?PostgreSQL數(shù)據(jù)庫異步主從流復(fù)制搭建

          192.168.254.128服務(wù)器上配置:

          1、創(chuàng)建流復(fù)制用戶

          postgres=# create user replia replication superuser password ‘123qwert’;

          給superuser權(quán)限是為了后面做pg_rewind操作。

          2、數(shù)據(jù)目錄下配置pg_hba.conf文件添加(192.168.254.129也需要添加)

          ? ? host replication all 192.168.254.1/24 md5

          3、配置環(huán)境變量

          [postgres@localhost pg_data]$ cat ~/.bash_profile 
          # .bash_profile
          # Get the aliases and functions
          if [ -f ~/.bashrc ]; then . ~/.bashrc; fi
          PATH=$PATH:$HOME/.local/bin:$HOME/bin
          export PATH
          export PGHOME=/usr/local/pgsql
          export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
          export DATE=`date +"%Y%m%d%H%M"`
          export PATH=$PGHOME/bin:$PATH:.
          export MANPATH=$PGHOME/share/man:$MANPATH
          export PGUSER=postgres
          export PGDATA=/data/pg_data
          export PGDATABASE=postgres
          alias rm='rm -i'
          alias ll='ls -lh'
          export LANG="zh_CN.UTF8"

          192.168.254.129上進行:

          4、~/.bash_profile 和上面輸入一樣的內(nèi)容

          5、制作基礎(chǔ)備份

          [postgres@localhost ~]$ pg_basebackup -h192.168.254.128 -Ureplia -R -Fp -P --verbose ?-c fast -D /data/pg_data

          6、啟動從庫

          [postgres@localhost ~]$ pg_ctl -D /data/pg_data/ start

          192.168.254.128上查看主庫信息:

          7、查看主庫信息

          postgres=# select * from pg_stat_replication ;

          3.4 PostgreSQL數(shù)據(jù)庫配置

          創(chuàng)建數(shù)據(jù)庫Keepalived,并且創(chuàng)建表探測表sr_delay,后續(xù)Keepalived探測,刷新sr_delay表的last_alive字段為當(dāng)前探測時間。這張表用來判斷主備延遲情況,數(shù)據(jù)庫故障切換時會用到這張表。

          數(shù)據(jù)庫配置

          postgres=# create user keepalived password '123qwert' CONNECTION LIMIT 4 ;
          CREATE ROLE
          postgres=# create database keepalived owner keepalived;
          CREATE DATABASE
          postgres=# \c keepalived keepalived
          You are now connected to database "keepalived" as user "keepalived".
          keepalived=> create table sr_delay(id int4, last_alive timestamp(0) without time zone);
          CREATE TABLE

          表sr_delay只允許寫入一條記錄,并且不允許刪除此表數(shù)據(jù),通過觸發(fā)器實現(xiàn)。創(chuàng)建觸發(fā)器函數(shù),如下所示:

          CREATE FUNCTION cannt_delete () 
          RETURNS trigger
          LANGUAGE plpgsql AS $$
          BEGIN
          RAISE EXCEPTION 'Table sr_delay can not delete !';
          END;
          $$;

          創(chuàng)建觸發(fā)器:

          CREATE TRIGGER trigger_sr_delay_del 
          BEFORE DELETE ON sr_delay
          FOR EACH ROW EXECUTE PROCEDURE cannt_delete() ;
          CREATE TRIGGER trigger_sr_delay_tru BEFORE TRUNCATE ON sr_delay FOR STATEMENT EXECUTE PROCEDURE cannt_delete() ;

          插入數(shù)據(jù):

          INSERT INTO sr_delay VALUES(1,now()) ;

          參數(shù)部分:

          hot_standby = on
          wal_log_hints = on
          archive_mode=on
          archive_command='test ! -f /data/pg_archive/%f && cp %p /data/pg_archive/%f'

          后續(xù)Keepalived會每隔指定時間探測PostgreSQL數(shù)據(jù)庫存活, 并且以Keepalived用戶登錄Keepalived數(shù)據(jù)庫刷新這張表, 配置主備庫pg_hba.conf,增加如下內(nèi)容

          host    keepalived    keepalived        192.168.254.128/32      md5
          host keepalived keepalived 192.168.254.129/32 md5

          隨后pg_ctl reload 操作使配置生效。

          3.5 Keepalived 配置

          [root@localhost ~]# cat /etc/keepalived/keepalived.conf 
          ! Configuration File for keepalived
          global_defs {
          # 郵件通知信息
          notification_email {
          # 定義收件人
          [email protected]
          }
          # 定義發(fā)件人
          notification_email_from [email protected]
          # SMTP服務(wù)器地址
          smtp_server 192.168.254.128
          smtp_connect_timeout 30
          # 路由器標(biāo)識,一般不用改,也可以寫成每個主機自己的主機名
          router_id LVS_DEVEL
          vrrp_skip_check_adv_addr
          vrrp_strict
          vrrp_garp_interval 0
          vrrp_gna_interval 0
          }

          # 定義用于實例執(zhí)行的腳本內(nèi)容,比如可以在線降低優(yōu)先級,用于強制切換
          vrrp_script check_pg_alived {
          script "/data/scripts/pg_monitor.sh"
          interval 10
          fall 3
          }

          # 一個vrrp_instance就是定義一個虛擬路由器的,實例名稱
          vrrp_instance VI_1 {
          # 定義初始狀態(tài),可以是MASTER或者BACKUP
          state MASTER
          #非搶占模式
          nopreempt
          # 工作接口,通告選舉使用哪個接口進行
          interface ens33
          # 虛擬路由ID,如果是一組虛擬路由就定義一個ID,如果是多組就要定義多個,而且這個虛擬
          # ID還是虛擬MAC最后一段地址的信息,取值范圍0-255
          virtual_router_id 51
          #權(quán)重 如果你上面定義了MASTER,這里的優(yōu)先級就需要定義的比其他的高
          priority 100
          #通告頻率 單位s
          advert_int 1
          # 通信認證機制,這里是明文認證還有一種是加密認證
          authentication {
          auth_type PASS
          auth_pass 1111
          }
          # 設(shè)置虛擬VIP地址
          virtual_ipaddress {
          192.168.254.11
          }
          # 追蹤腳本,通常用于去執(zhí)行上面的vrrp_script定義的腳本內(nèi)容
          track_script {
          check_pg_alived
          }
          # 如果主機狀態(tài)變成Master|Backup|Fault之后會去執(zhí)行的通知腳本,腳本要自己寫
          notify_master "/data/scripts/failover.sh"
          notify_fault "/data/scripts/fault.sh"
          }

          以上是Keepalived主節(jié)點的配置,Keepalived備節(jié)點的priority參數(shù)改成90 ,state改為BACKUP,以及smtp_server 改為192.168.254.129,其余參數(shù)配置一樣。

          3.6 數(shù)據(jù)庫監(jiān)控腳本

          [postgres@localhost scripts]$ cat /data/scripts/pg_monitor.sh 
          #!/bin/bash
          # 配置環(huán)境變量
          pgport=5432
          pguser=keepalived
          pgdb=keepalived
          pgpwd='123qwert'
          LANG=en_US.utf8
          PGHOME=/usr/local/pgsql/
          LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
          PATH=$PGHOME/bin:$PATH:.
          MONITOR_LOG="/data/scripts/pg_monitor.log"
          SQL1="UPDATE sr_delay SET last_alive= now();"
          SQL2='SELECT 1;'
          keeplognums=30000

          #pg_port_status=`lsof ‐i :$pgport | grep LISTEN | wc ‐l`
          #pg_port_status=`ps ‐ef | grep LISTEN | wc ‐l`
          #if [ $pg_port_status ‐lt 1 ];then
          # echo ‐e `date +"%F %T"` "Error: The postgreSQL is not running,please chec k the postgreSQL server status!" >> $LOGFILE
          # exit 1
          #fi

          #此腳本不檢查備庫存活狀態(tài),如果是各庫則退出

          standby_flg=`psql -p$pgport -Upostgres -At -c "SELECT pg_is_in_recovery();"`
          if [ ${standby_flg} == 't' ];
          then
          echo ‐e "`date +%F\ %T`: This is a standby database, exit!\n" > $MONITOR_LOG
          exit 0
          fi
          export PGPASSWORD=$pgpwd
          #主庫更新sr_delay 表
          echo $SQL1 | psql -At -p $pgport -U $pguser -d $pgdb >> $MONITOR_LOG
          #判斷主庫是否可用
          echo $SQL2 | psql -At -p $pgport -U $pguser -d $pgdb
          if [ $? -eq 0 ] ;
          then
          echo -e "`date +%F\ %T`:Primary db is health." >> $MONITOR_LOG
          exit 0
          else
          echo -e "`date +%F\ %T`:Attention: Primary db is not health!" >> $MONITOR_LOG
          exit 1
          fi

          #日志保留 keeplognums 行
          if [ ! -f ${MONITOR_LOG} ] ;then touch ${MONITOR_LOG};fi
          lognums=`cat ${MONITOR_LOG} |wc -l`
          catnum=$((${lognums} -${keeplognums}))
          if [[ $lognums -gt ${keeplognums} ]] ; then sed -i "1,${catnum}d" ${MONITOR_LOG}; fi


          此腳本每隔10秒執(zhí)行一次,執(zhí)行頻率由keepalived.conf配置文件中interval參數(shù)設(shè)置, 腳本主要作用為:

          • 檢測主庫是否存活。

          • 更新sr_delay表last_alive字段為當(dāng)前探測時間。

          [postgres@localhost scripts]$ cat /data/scripts/failover.sh 
          #!/bin/bash
          export PGPORT=5432
          export PGUSER=keepalived
          export PG_OS_USER=postgres
          export PGDBNAME=keepalived
          export LANG=en_US.UTF-8
          export PGPATH=/usr/local/pgsql/bin/
          export PATH=$PATH:$PGPATH
          export PGMIP=127.0.0.1
          LOGFILE='/data/scripts/failover.log'
          # 主備數(shù)據(jù)庫同步時延,單位為秒
          sr_allowed_delay_time=100

          SQL1='select pg_is_in_recovery from pg_is_in_recovery();'
          SQL2="select count(1) as delay_time from sr_delay where now()<(last_alive + interval '${sr_allowed_delay_time} seconds');"
          #SQL2="select count(1) as delay_time from sr_delay where now()<(last_alive + interval '100 seconds');"
          sleep $sr_allowed_delay_time
          db_role=`echo $SQL1 | psql -At -p $PGPORT -U $PGUSER -d $PGDBNAME -w`
          db_sr_delaytime=`echo $SQL2 | psql -p $PGPORT -d $PGDBNAME -U $PGUSER -At -w`
          SWITCH_COMMAND='pg_ctl promote -D /data/pg_data/'
          # 如果為備庫,且延遲大于指定時間則切換為主庫

          if [ ${db_role} == 'f' ];
          then
          echo -e `date +"%F %T"` "Attention:The current postgreSQL DB is master database,cannot switched!" >> $LOGFILE
          exit 0
          fi

          if [ $db_sr_delaytime -gt 0 ];
          then
          echo -e `date +"%F %T"` "Attention:The current master database is health,the standby DB cannot switched!" >> $LOGFILE
          exit 0
          fi

          if [ !$db_sr_delaytime ];
          then
          echo -e `date +"%F %T"` "Attention:The current database is statndby,ready to switch master database!" >> $LOGFILE
          su - $PG_OS_USER -c "$SWITCH_COMMAND"
          elif [ $? -eq 0 ];
          then
          echo -e `date +"%F %T"` "success:The current standby database successed to switched the primary PG database !" >> $LOGFILE
          exit 0
          else
          echo -e `date +"%F %T"` "Error: the standby database failed to switch the primary PG database !,pelease checked it!" >> $LOGFILE
          exit 1
          fi

          [postgres@localhost scripts]$ cat /data/scripts/fault.sh
          #!/bin/bash
          GFILE=/data/scripts/pg_db_fault.log
          PGPORT=5432
          PGMIP=192.168.254.128
          echo -e `date +"%F %T"` "Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database!" >> $LOGFILE

          PGPID="`head -n1 /data/pg_data/postmaster.pid`"

          systemctl stop keepalived

          kill -9 $PGPID
          if [ $? -eq 0 ] ;
          then
          echo -e `date +"%F %T"` "Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database!" >> $LOGFILE
          systemctl stop keepalived
          exit 1
          fi

          [postgres@localhost scripts]$ chmod 755 /data/scripts/pg_monitor.sh /data/scripts/failover.sh /data/scripts/fault.sh
          四、切換演練



          4.1 在192.168.254.128上停掉數(shù)據(jù)庫

          [root@localhost ~]# su - postgres
          [postgres@localhost ~]$ pg_ctl -D /data/pg_data/ stop

          查看192.168.254.129上的數(shù)據(jù)庫狀態(tài)

          [postgres@localhost ~]$ psql
          psql (13.3)
          Type "help" for help.
          postgres=# insert into t5 (name) values ('b002');
          INSERT 0 1

          修復(fù)192.168.254.128。

          使用pg_rewind 同步新主庫的數(shù)據(jù)到原主庫:

          pg_rewind -R --target-pgdata '/data/pg_data' --source-server 'host=192.168.254.129 port=5432 user=replia password=123qwert dbname=postgres'

          如果這個地方咱們的wal日志已經(jīng)被覆蓋了了很多,那么就需要將我們的備份日志/data/pg_archive/給scp過來。

          [postgres@localhost ~]$ scp [email protected]:/data/pg_archive/00000002*  /data/pg_archive/

          (此時需要注意新主和備上面TimeLineID的差異。并且這種情況需要在配置文件中加上:

          restore_command='cp /data/pg_archive/%f %p'

          啟動數(shù)據(jù)庫,然后檢查主備wal sender和receiver是否正常:

          [postgres@localhost ~]$ pg_ctl -D /data/pg_data/ start
          [postgres@localhost ~]$ ps axu |grep walreceiver
          postgres 53474 0.0 0.2 342008 2136 ? Ss 14:45 0:01 postgres: walreceiver streaming 1/4A036EF0

          192.168.254.128上啟動Keepalived,查看Keepalived狀態(tài):

          [root@localhost ~]# systemctl  start keepalived
          [root@localhost ~]# systemctl status keepalived

          4.2 再次切換:

          在做切換前,先要檢查192.168.254.128和192.168.254.129上面的Keepalived正常以及vip和主從正常,在192.168.254.129上停掉數(shù)據(jù)庫:

          [postgres@localhost ~]$ pg_ctl -D /data/pg_data/ stop -mf

          查看192.168.254.128上的數(shù)據(jù)庫狀態(tài):

          [postgres@localhost pg_data]$ psql
          psql (13.3)
          Type "help" for help.
          postgres=# insert into t5 (name) values ('b005');
          INSERT 0 1

          修復(fù)192.168.254.129。

          使用pg_rewind 同步新主庫的數(shù)據(jù)到原主庫:

          pg_rewind -R --target-pgdata '/data/pg_data' --source-server 'host=192.168.254.128 port=5432 user=replia password=123qwert dbname=postgres'

          如果這個地方咱們的wal日志已經(jīng)被覆蓋了了很多,那么就需要將我們的備份日志/data/pg_archive/給scp過來。

          [postgres@localhost ~]$ scp [email protected]:/data/pg_archive/00000003* ?/data/pg_archive/ ?

          (此時需要注意新主和備上面TimeLineID的差異。并且這種情況需要在配置文件中加上:

          restore_command='cp /data/pg_archive/%f %p'

          啟動數(shù)據(jù)庫,然后檢查主備wal sender和receiver是否正常:

          [postgres@localhost ~]$ pg_ctl -D /data/pg_data/ start
          [postgres@localhost ~]$ ps axu |grep walreceiver

          192.168.254.129上啟動Keepalived,查看Keepalived狀態(tài):

          [root@localhost ~]# systemctl  start keepalived
          [root@localhost ~]# systemctl status keepalived

          五、總結(jié)



          1、可以完善failover時自己的延遲切換邏輯。

          2、可以完善pg_rewind的實現(xiàn),用腳本代替手動方式。

          3、Keepalived較為靈活,能夠在腳本上添加更多校驗和自己的規(guī)則。

          墨天輪原文鏈接:https://www.modb.pro/db/113653?sjhy(復(fù)制到瀏覽器或者點擊“閱讀原文”立即查看)

          關(guān)于作者
          向前龍??,云和恩墨PG技術(shù)顧問,擁有OGCA、PGCE證書。

          END


          推薦閱讀:

          Redis 哨兵模式

          Dubbo源碼分析:小白入門篇

          一周學(xué)完MyBatis源碼,萬字總結(jié)


          關(guān)互聯(lián)網(wǎng)全棧架構(gòu)

          瀏覽 75
          點贊
          評論
          收藏
          分享

          手機掃一掃分享

          分享
          舉報
          評論
          圖片
          表情
          推薦
          點贊
          評論
          收藏
          分享

          手機掃一掃分享

          分享
          舉報
          <kbd id="afajh"><form id="afajh"></form></kbd>
          <strong id="afajh"><dl id="afajh"></dl></strong>
            <del id="afajh"><form id="afajh"></form></del>
                1. <th id="afajh"><progress id="afajh"></progress></th>
                  <b id="afajh"><abbr id="afajh"></abbr></b>
                  <th id="afajh"><progress id="afajh"></progress></th>
                  殴美成人精品 | 在线亚洲欧洲 | 成人高清无码免费看 | 2025天天操 | 草草影院在线观看 |