
          Troubleshooting a Service That Shows as DOWN in the Service Registry


          2021-03-24 16:14

          Author: wls1036

          Source: SegmentFault (思否) community



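          Background

          A project needed to upgrade its Kubernetes cluster. Because the existing k8s version was old and the deployment layout was not well designed, we decided to build a new k8s cluster from scratch and migrate the applications to it. The migration itself was far from smooth, and I will cover it in a separate article. After the migration, some applications showed a status of DOWN in the service registry.



          When a service's status is DOWN, calls to that service fail with a 404 error. Because the application has health checks configured, I suspected that a health check was failing, so I logged in to the backend and called the health endpoint to see the result: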
          $ curl http://localhost:8080/management/health
          {"description":"Remote status from Eureka server","status":"DOWN"}
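          The service was generated with the JHipster framework, and its configuration file contains the following:
          eureka:
            client:
              enabled: true
              healthcheck:
                enabled: true
              fetch-registry: true
              register-with-eureka: true
              instance-info-replication-interval-seconds: 10
              registry-fetch-interval-seconds: 10
          After setting eureka.client.healthcheck.enabled to false, the status in the registry returned to normal, which confirmed that the health check was the problem. However, the application logged no errors at all. Even with the log level set to the most verbose setting, no error message appeared, which made troubleshooting much harder.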

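          Troubleshooting

          Because some services were healthy and others were not, my first idea was to compare a healthy service with a failing one. I tried the following:
          • Comparing the configuration files of the two services
          • Comparing the network packets of the two services
          • Running the service locally (it had too many dependencies, and in the end it would not start)
          None of these revealed the cause. Looking back now that the problem is solved, the difference between the two services was there to be found; I simply was not careful enough at the time. With comparison getting nowhere, the only option left was to dig into the source code and find the root cause. The first question is where to start reading. The health check response contains the message Remote status from Eureka server, which can be used as a keyword for a global search in IDEA (the sources must be downloaded together with the Maven dependencies) or for a search on GitHub. Either way leads to where the string comes from. It turned out to be in org.springframework.cloud.netflix.eureka.EurekaHealthIndicator.getStatus, shown below: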
          private Status getStatus(Builder builder) {
              Status status = new Status(this.eurekaClient.getInstanceRemoteStatus().toString(),
                      "Remote status from Eureka server");
              DiscoveryClient discoveryClient = getDiscoveryClient();
              if (discoveryClient != null && clientConfig.shouldFetchRegistry()) {
                  long lastFetch = discoveryClient.getLastSuccessfulRegistryFetchTimePeriod();
                  if (lastFetch < 0) {
                      status = new Status("UP",
                              "Eureka discovery client has not yet successfully connected to a Eureka server");
                  }
                  else if (lastFetch > clientConfig.getRegistryFetchIntervalSeconds() * 2000) {
                      status = new Status("UP",
                              "Eureka discovery client is reporting failures to connect to a Eureka server");
                      builder.withDetail("renewalPeriod", instanceConfig.getLeaseRenewalIntervalInSeconds());
                      builder.withDetail("failCount", lastFetch / clientConfig.getRegistryFetchIntervalSeconds());
                  }
              }
              return status;
          }
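          Because I did not have the service's source code, and the service has too many dependencies to run locally in practice, it helps to build a small local demo in order to understand the whole health check flow. Knowing to do this comes down to experience. After reading the source, the health check flow is roughly as follows:
          • The status is obtained from org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler.getStatus
          • EurekaHealthCheckHandler holds an org.springframework.boot.actuate.health.CompositeHealthIndicator, and the CompositeHealthIndicator carries out the actual health check logic
          • CompositeHealthIndicator holds a set of health check components and runs each of them in turn (by calling its health method)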
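The flow in the list above can be sketched in plain Java without Spring. This is a simplified illustration, not the Spring Boot implementation: the class `CompositeHealthSketch`, its `Status` enum, and the methods `runAll` and `aggregate` are hypothetical names introduced here, and the aggregation rule (any DOWN component makes the overall result DOWN) is an assumption that matches the behavior observed in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, Spring-free sketch of a composite health check.
final class CompositeHealthSketch {

    enum Status { UP, DOWN }

    /** Runs every registered check in insertion order and records its status by name. */
    static Map<String, Status> runAll(Map<String, Supplier<Status>> indicators) {
        Map<String, Status> results = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Status>> entry : indicators.entrySet()) {
            Status status;
            try {
                status = entry.getValue().get();
            } catch (RuntimeException ex) {
                // The exception is turned into a DOWN result and is not logged,
                // which is why nothing shows up in the application log.
                status = Status.DOWN;
            }
            results.put(entry.getKey(), status);
        }
        return results;
    }

    /** Assumed aggregation rule: a single DOWN component makes the whole result DOWN. */
    static Status aggregate(Map<String, Status> results) {
        return results.containsValue(Status.DOWN) ? Status.DOWN : Status.UP;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Status>> indicators = new LinkedHashMap<>();
        indicators.put("diskSpace", () -> Status.UP);
        indicators.put("redis", () -> {
            throw new IllegalStateException("Could not get a resource from the pool");
        });

        Map<String, Status> results = runAll(indicators);
        System.out.println(results);
        System.out.println("overall: " + aggregate(results));
    }
}
```

The point of the sketch is that one failing component is enough to flip the overall status, while the exception that caused it stays inside the health result instead of reaching the log.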
          With the flow roughly clear, it was time to bring out Arthas. Because Arthas has to run inside the container, you may want to read my earlier article on using Arthas in Docker first.
          • Watch the getStatus method: it does return the DOWN status
          $ watch org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler getStatus "{returnObj}" -x 2

          Affect(class count: 1 , method count: 1) cost in 107 ms, listenerId: 4
          method=org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler.getStatus location=AtExit
          ts=2021-03-24 09:38:03; [cost=13.776747ms] result=@ArrayList[
          @InstanceStatus[
          UP=@InstanceStatus[UP],
          DOWN=@InstanceStatus[DOWN],
          STARTING=@InstanceStatus[STARTING],
          OUT_OF_SERVICE=@InstanceStatus[OUT_OF_SERVICE],
          UNKNOWN=@InstanceStatus[UNKNOWN],
          $VALUES=@InstanceStatus[][isEmpty=false;size=5],
          name=@String[DOWN],
          ordinal=@Integer[1],
          ],
          ]
          • Watch the health method of CompositeHealthIndicator
          $ watch org.springframework.boot.actuate.health.CompositeHealthIndicator health "{returnObj,target.indicators}" -x 2
          Press Q or Ctrl+C to abort.
          Affect(class count: 2 , method count: 1) cost in 194 ms, listenerId: 6
          method=org.springframework.boot.actuate.health.CompositeHealthIndicator.health location=AtExit
          ts=2021-03-24 09:46:04; [cost=11.390849ms] result=@ArrayList[
          @Health[
          status=@Status[DOWN],
          details=@UnmodifiableMap[isEmpty=false;size=7],
          ],
          @LinkedHashMap[
          @String[discoveryClient]:@Holder[org.springframework.cloud.client.discovery.health.DiscoveryCompositeHealthIndicator$Holder@47625d8a],
          @String[diskSpaceHealthIndicator]:@DiskSpaceHealthIndicator[org.springframework.boot.actuate.health.DiskSpaceHealthIndicator@3f01e628],
          @String[redisHealthIndicator]:@RedisHealthIndicator[org.springframework.boot.actuate.health.RedisHealthIndicator@17b54981],
          @String[dbHealthIndicator]:@DataSourceHealthIndicator[org.springframework.boot.actuate.health.DataSourceHealthIndicator@10534a8a],
          @String[refreshScopeHealthIndicator]:@RefreshScopeHealthIndicator[org.springframework.cloud.health.RefreshScopeHealthIndicator@2284c82d],
          @String[configServerHealthIndicator]:@ConfigServerHealthIndicator[org.springframework.cloud.config.client.ConfigServerHealthIndicator@4ec50d1a],
          @String[hystrixHealthIndicator]:@HystrixHealthIndicator[org.springframework.cloud.netflix.hystrix.HystrixHealthIndicator@5c5c6962],
          ],
          ]
          This yields an important piece of information: the service has the following health check components in total
          org.springframework.cloud.client.discovery.health.DiscoveryCompositeHealthIndicator$Holder
          org.springframework.boot.actuate.health.DiskSpaceHealthIndicator
          org.springframework.boot.actuate.health.RedisHealthIndicator
          org.springframework.boot.actuate.health.DataSourceHealthIndicator
          org.springframework.cloud.health.RefreshScopeHealthIndicator
          org.springframework.cloud.config.client.ConfigServerHealthIndicator
          org.springframework.cloud.netflix.hystrix.HystrixHealthIndicator
          In theory, checking them one by one would reveal which one is failing. There is a quicker way, though: all of these components extend AbstractHealthIndicator, so watching that class is enough.
          • Watch the health method of AbstractHealthIndicator
          $ watch org.springframework.boot.actuate.health.AbstractHealthIndicator health "{returnObj,target}" -x 2
          ...
          method=org.springframework.boot.actuate.health.AbstractHealthIndicator.health location=AtExit
          ts=2021-03-24 09:50:55; [cost=7.652594ms] result=@ArrayList[
          @Health[
          status=@Status[DOWN],
          details=@UnmodifiableMap[isEmpty=false;size=1],
          ],
          @RedisHealthIndicator[
          VERSION=@String[version],
          REDIS_VERSION=@String[redis_version],
          redisConnectionFactory=@JedisConnectionFactory[org.springframework.data.redis.connection.jedis.JedisConnectionFactory@4c91526e],
          ],
          ]...
          This shows that the RedisHealthIndicator check failed, and that failure turned the overall result into DOWN. So how do we find out what the error was? Is the error message in the logs? It is not. Take a look at the source of health:
          public abstract class AbstractHealthIndicator implements HealthIndicator {
              @Override
              public final Health health() {
                  Health.Builder builder = new Health.Builder();
                  try {
                      doHealthCheck(builder);
                  }
                  catch (Exception ex) {
                      builder.down(ex);
                  }
                  return builder.build();
              }
          }
          If an exception is thrown, execution enters builder.down(ex); so watching that method is all it takes to see the error:
          $ watch org.springframework.boot.actuate.health.Health$Builder down "{params}" -x 2
          ....
          @RedisConnectionFailureException[org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool],
          ],
          ]
          We finally had the error message:
          redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
          The Redis configuration is as follows:
          # Redis host
          host=redis
          # Redis port
          port=6379
          # Authentication password
          password=*****
          # Timeout, in ms
          timeout=100000
          Run curl inside the container (there is no ping command available):
          # curl redis
          curl: (6) Could not resolve host: redis
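          DNS cannot resolve redis, even though a redis service does exist in k8s. It turned out that the application and redis are in two different namespaces. In Kubernetes, a service in a different namespace must be addressed with a domain name in the following format: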
          $svc_name.$namespace.svc.cluster.local
          Run curl again:
          # curl redis.k2-infrastructure.svc.cluster.local
          Failed to connect to redis.k2-infrastructure.svc.cluster.local port 80: No route to host
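          It still fails, but the error shows that DNS resolution succeeded (curl tries port 80 by default, not the Redis port 6379 configured above). Does this mean the connection settings have to change? No: adding a search domain lets the short name resolve without touching them.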
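Since curl can only hint at whether DNS works, a small check that separates name resolution from TCP reachability can make this step less ambiguous. The following is a minimal sketch using only the JDK. The class `ServiceDnsCheck` and its methods are hypothetical helpers introduced for illustration, and the cluster domain `cluster.local` is taken from the name format shown above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical helper that separates "does the name resolve" from "is the port reachable".
final class ServiceDnsCheck {

    /** Builds a name in the $svc_name.$namespace.svc.cluster.local format. */
    static String fqdn(String service, String namespace) {
        return service + "." + namespace + ".svc.cluster.local";
    }

    /** Returns true if the resolver can map the host name to an address. */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unresolved hosts, refused connections, and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the names used in this article; pass a host name to override.
        String host = args.length > 0 ? args[0] : fqdn("redis", "k2-infrastructure");
        System.out.println(host + " resolves: " + resolves(host));
        System.out.println(host + ":6379 reachable: " + canConnect(host, 6379, 2000));
    }
}
```

Run inside the container, once with the short name `redis` and once with the full name, this would show whether the failure is in name resolution or in the connection to the Redis port.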

          Solution

          In the Rancher deployment, add a search domain named $namespace.svc.cluster.local


          Alternatively, add dnsConfig to the YAML:

          apiVersion: apps/v1
          kind: Deployment
          ...
          spec:
            ....
              spec:
                ....
                dnsPolicy: ClusterFirst
                dnsConfig:
                  searches:
                  - xx-infrastructure.svc.cluster.local
          status: {}
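To see why an extra search domain lets the short name `redis` resolve, it helps to list the names a resolver would try. The sketch below models the usual resolv.conf search behavior in a simplified form: a name with fewer dots than `ndots` is tried with each search suffix first and as-is last, and a name ending in a dot is treated as absolute. The class `SearchDomainSketch`, the namespace `app-ns`, and the exact order of the search list are assumptions for illustration. Kubernetes is assumed here to use `ndots:5` for `ClusterFirst` pods and to append `dnsConfig.searches` to the generated list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of how a resolver expands a name using search domains.
final class SearchDomainSketch {

    /** Returns the names to try, in order, for the given name, search list, and ndots value. */
    static List<String> candidates(String name, List<String> searches, int ndots) {
        List<String> out = new ArrayList<>();
        if (name.endsWith(".")) {
            // A trailing dot marks the name as absolute: no search suffix is applied.
            out.add(name);
            return out;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        if (dots >= ndots) {
            out.add(name); // enough dots: try the name as given first
        }
        for (String suffix : searches) {
            out.add(name + "." + suffix);
        }
        if (dots < ndots) {
            out.add(name); // too few dots: the bare name is tried last
        }
        return out;
    }

    public static void main(String[] args) {
        // "app-ns" stands in for the application's own namespace (hypothetical).
        List<String> searches = Arrays.asList(
                "app-ns.svc.cluster.local",
                "svc.cluster.local",
                "cluster.local",
                "k2-infrastructure.svc.cluster.local"); // added through dnsConfig.searches
        for (String candidate : candidates("redis", searches, 5)) {
            System.out.println(candidate);
        }
    }
}
```

With the added suffix in the list, one of the candidates is `redis.k2-infrastructure.svc.cluster.local`, so the application can keep `host=redis` in its configuration.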

