Troubleshooting a DOWN Status in the Service Registry
Author: wls1036
Source: SegmentFault (思否) community
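A service in the registry was showing the status DOWN, and every call to it returned a 404 error. Because the application has health checks configured, I suspected the health check was failing, so I logged in to the server and called the health endpoint to look at the result:
$ curl http://localhost:8080/management/health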
{"description":"Remote status from Eureka server","status":"DOWN"}
The application's Eureka client configuration:
eureka:
  client:
    enabled: true
    healthcheck:
      enabled: true
    fetch-registry: true
    register-with-eureka: true
    instance-info-replication-interval-seconds: 10
    registry-fetch-interval-seconds: 10
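After eureka.client.healthcheck.enabled was set to false, the status in the registry returned to normal, so the health check is definitely the culprit. However, the application logged no errors at all, even with the log level turned down to the most verbose setting, which made the investigation much harder.
Troubleshooting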
Approaches tried first:
Comparing the configuration files of the two services
Comparing the network packets of the two services
Running the service locally (it has too many dependencies and never came up)
The health response contains the string Remote status from Eureka server, which makes a good search keyword. Search for it globally in IDEA (the sources must be downloaded along with the Maven dependencies), or search on GitHub; either way the origin turns up. It is in org.springframework.cloud.netflix.eureka.EurekaHealthIndicator.getStatus:
private Status getStatus(Builder builder) {
    Status status = new Status(this.eurekaClient.getInstanceRemoteStatus().toString(),
            "Remote status from Eureka server");
    DiscoveryClient discoveryClient = getDiscoveryClient();
    if (discoveryClient != null && clientConfig.shouldFetchRegistry()) {
        long lastFetch = discoveryClient.getLastSuccessfulRegistryFetchTimePeriod();
        if (lastFetch < 0) {
            status = new Status("UP",
                    "Eureka discovery client has not yet successfully connected to a Eureka server");
        }
        else if (lastFetch > clientConfig.getRegistryFetchIntervalSeconds() * 2000) {
            status = new Status("UP",
                    "Eureka discovery client is reporting failures to connect to a Eureka server");
            builder.withDetail("renewalPeriod", instanceConfig.getLeaseRenewalIntervalInSeconds());
            builder.withDetail("failCount", lastFetch / clientConfig.getRegistryFetchIntervalSeconds());
        }
    }
    return status;
}
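To make the threshold in the else-if branch concrete, here is a minimal standalone sketch of the same comparison. The class and method names (FetchStalenessDemo, isStale) are hypothetical and not part of Spring Cloud; only the interval * 2000 comparison is taken from the method above. getLastSuccessfulRegistryFetchTimePeriod returns milliseconds, so with registry-fetch-interval-seconds: 10 the branch fires once the last successful fetch is more than 20,000 ms old. Both branches set the status to UP, so neither explains the DOWN seen here; that value comes from getInstanceRemoteStatus().

```java
// Hypothetical standalone sketch; not Spring Cloud code.
class FetchStalenessDemo {

    // Mirrors the comparison in getStatus: lastFetchMillis is the age of the
    // last successful registry fetch in milliseconds (negative if none yet).
    static boolean isStale(long lastFetchMillis, int fetchIntervalSeconds) {
        // interval (s) * 2000 == two fetch intervals expressed in milliseconds
        return lastFetchMillis > fetchIntervalSeconds * 2000L;
    }

    public static void main(String[] args) {
        int interval = 10; // registry-fetch-interval-seconds from the config above
        System.out.println(isStale(15_000, interval)); // prints false
        System.out.println(isStale(25_000, interval)); // prints true
    }
}
```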
The status is obtained from org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler.getStatus.
EurekaHealthCheckHandler holds an org.springframework.boot.actuate.health.CompositeHealthIndicator, which carries out the actual health check logic.
CompositeHealthIndicator contains a series of health check components and runs each of them in turn by calling its health method.
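The composite pattern described here can be sketched without any Spring dependency. Everything below (SimpleIndicator, SimpleComposite, CompositeDemo) is a hypothetical stand-in, and it assumes only two statuses, UP and DOWN. The real CompositeHealthIndicator delegates aggregation to a HealthAggregator and handles more statuses, so treat this as an illustration of why one failing component drags the whole result down, not as the actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the Spring Boot types named above.
interface SimpleIndicator {
    String health(); // returns "UP" or "DOWN"
}

class SimpleComposite implements SimpleIndicator {
    private final Map<String, SimpleIndicator> indicators = new LinkedHashMap<>();

    void add(String name, SimpleIndicator indicator) {
        indicators.put(name, indicator);
    }

    // Runs every registered indicator in insertion order;
    // a single DOWN makes the overall result DOWN.
    @Override
    public String health() {
        String overall = "UP";
        for (SimpleIndicator indicator : indicators.values()) {
            if ("DOWN".equals(indicator.health())) {
                overall = "DOWN";
            }
        }
        return overall;
    }
}

class CompositeDemo {
    public static void main(String[] args) {
        SimpleComposite composite = new SimpleComposite();
        composite.add("diskSpace", () -> "UP");
        composite.add("redis", () -> "DOWN");
        System.out.println(composite.health()); // prints DOWN
    }
}
```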
Watching the getStatus method with the Arthas watch command confirms that it returns DOWN:
$ watch org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler getStatus "{returnObj}" -x 2
Affect(class count: 1 , method count: 1) cost in 107 ms, listenerId: 4
method=org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler.getStatus location=AtExit
ts=2021-03-24 09:38:03; [cost=13.776747ms] result=@ArrayList[
@InstanceStatus[
UP=@InstanceStatus[UP],
DOWN=@InstanceStatus[DOWN],
STARTING=@InstanceStatus[STARTING],
OUT_OF_SERVICE=@InstanceStatus[OUT_OF_SERVICE],
UNKNOWN=@InstanceStatus[UNKNOWN],
$VALUES=@InstanceStatus[][isEmpty=false;size=5],
name=@String[DOWN],
ordinal=@Integer[1],
],
]
Next, watch the health method of CompositeHealthIndicator:
$ watch org.springframework.boot.actuate.health.CompositeHealthIndicator health "{returnObj,target.indicators}" -x 2
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 1) cost in 194 ms, listenerId: 6
method=org.springframework.boot.actuate.health.CompositeHealthIndicator.health location=AtExit
ts=2021-03-24 09:46:04; [cost=11.390849ms] result=@ArrayList[
@Health[
status=@Status[DOWN],
details=@UnmodifiableMap[isEmpty=false;size=7],
],
@LinkedHashMap[
@String[discoveryClient]:@Holder[org.springframework.cloud.client.discovery.health.DiscoveryCompositeHealthIndicator$Holder@47625d8a],
@String[diskSpaceHealthIndicator]:@DiskSpaceHealthIndicator[org.springframework.boot.actuate.health.DiskSpaceHealthIndicator@3f01e628],
@String[redisHealthIndicator]:@RedisHealthIndicator[org.springframework.boot.actuate.health.RedisHealthIndicator@17b54981],
@String[dbHealthIndicator]:@DataSourceHealthIndicator[org.springframework.boot.actuate.health.DataSourceHealthIndicator@10534a8a],
@String[refreshScopeHealthIndicator]:@RefreshScopeHealthIndicator[org.springframework.cloud.health.RefreshScopeHealthIndicator@2284c82d],
@String[configServerHealthIndicator]:@ConfigServerHealthIndicator[org.springframework.cloud.config.client.ConfigServerHealthIndicator@4ec50d1a],
@String[hystrixHealthIndicator]:@HystrixHealthIndicator[org.springframework.cloud.netflix.hystrix.HystrixHealthIndicator@5c5c6962],
],
]
The composite therefore holds the following health check components:
org.springframework.cloud.client.discovery.health.DiscoveryCompositeHealthIndicator$Holder
org.springframework.boot.actuate.health.DiskSpaceHealthIndicator
org.springframework.boot.actuate.health.RedisHealthIndicator
org.springframework.boot.actuate.health.DataSourceHealthIndicator
org.springframework.cloud.health.RefreshScopeHealthIndicator
org.springframework.cloud.config.client.ConfigServerHealthIndicator
org.springframework.cloud.netflix.hystrix.HystrixHealthIndicator
These components are built on AbstractHealthIndicator, so it is enough to watch that one class.
Watch the health method of AbstractHealthIndicator:
$ watch org.springframework.boot.actuate.health.AbstractHealthIndicator health "{returnObj,target}" -x 2
...
method=org.springframework.boot.actuate.health.AbstractHealthIndicator.health location=AtExit
ts=2021-03-24 09:50:55; [cost=7.652594ms] result=@ArrayList[
@Health[
status=@Status[DOWN],
details=@UnmodifiableMap[isEmpty=false;size=1],
],
@RedisHealthIndicator[
VERSION=@String[version],
REDIS_VERSION=@String[redis_version],
redisConnectionFactory=@JedisConnectionFactory[org.springframework.data.redis.connection.jedis.JedisConnectionFactory@4c91526e],
],
]...
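The component reporting DOWN is RedisHealthIndicator. The relevant part of AbstractHealthIndicator:
public abstract class AbstractHealthIndicator implements HealthIndicator {
    @Override
    public final Health health() {
        Health.Builder builder = new Health.Builder();
        try {
            // run the concrete check (in this case, the Redis check)
            doHealthCheck(builder);
        }
        catch (Exception ex) {
            // the exception is not logged here; it is only recorded in the DOWN result
            builder.down(ex);
        }
        return builder.build();
    }
}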
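The try/catch above likely explains why no error ever appeared in the logs: the exception is folded into the health result and nothing else happens to it. Below is a minimal, dependency-free imitation of that pattern. SketchIndicator, Result, and SwallowDemo are hypothetical names, and the thrown IllegalStateException merely stands in for the real Redis exception.

```java
// Hypothetical, dependency-free imitation of the pattern shown above.
abstract class SketchIndicator {

    // Holds the outcome: a status plus the swallowed exception, if any.
    static final class Result {
        final String status;
        final Exception error;

        Result(String status, Exception error) {
            this.status = status;
            this.error = error;
        }
    }

    final Result health() {
        try {
            doHealthCheck();
            return new Result("UP", null);
        } catch (Exception ex) {
            // Nothing is logged; the exception only travels inside the result.
            return new Result("DOWN", ex);
        }
    }

    protected abstract void doHealthCheck() throws Exception;
}

class SwallowDemo {
    public static void main(String[] args) {
        SketchIndicator redisLike = new SketchIndicator() {
            @Override
            protected void doHealthCheck() throws Exception {
                throw new IllegalStateException("Could not get a resource from the pool");
            }
        };
        SketchIndicator.Result result = redisLike.health();
        // prints "DOWN: Could not get a resource from the pool"
        System.out.println(result.status + ": " + result.error.getMessage());
    }
}
```

The only way to see the cause is to inspect the result (or, as in this article, to watch the call that receives the exception).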
Any exception ends up in builder.down(ex), so watching that method is enough to see what is being thrown:
$ watch org.springframework.boot.actuate.health.Health$Builder down "{params}" -x 2
....
@RedisConnectionFailureException[org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool],
],
]
So the underlying error is:
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
The application's Redis configuration:
# redis host
host=redis
# redis port
port=6379
# auth password
password=*****
# timeout, in ms
timeout=100000
# curl redis
curl: (6) Could not resolve host: redis
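The short name redis does not resolve. In Kubernetes, the fully qualified DNS name of a service has the form $svc_name.$namespace.svc.cluster.local, so try that instead:
# curl redis.k2-infrastructure.svc.cluster.local
Failed to connect to redis.k2-infrastructure.svc.cluster.local port 80: No route to host
This time the name resolves; the connection fails only at the connect step, presumably because curl targets port 80 rather than the Redis port. The problem is therefore that the short name cannot be resolved from the application's pod.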
Solution
Add $namespace.svc.cluster.local to the pod's DNS search domains through dnsConfig, so that the short name redis resolves:
apiVersion: apps/v1
kind: Deployment
...
spec:
  ....
    spec:
      ....
      dnsPolicy: ClusterFirst
      dnsConfig:
        searches:
        - xx-infrastructure.svc.cluster.local
status: {}
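Why an extra search domain helps can be shown with a small sketch of how a stub resolver expands a short name against its search list. This is a hypothetical illustration, not a DNS client: SearchListDemo and candidates are made-up names, app-namespace is an assumed placeholder for the namespace of the failing pod, and the ndots value of 5 is the usual Kubernetes default. With dnsPolicy: ClusterFirst, the entries under dnsConfig.searches are merged into the search domains generated by the policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of resolver search-list expansion; not a real DNS client.
class SearchListDemo {

    // Returns the names a stub resolver would try, in order, for `name`.
    static List<String> candidates(String name, List<String> searches, int ndots) {
        List<String> result = new ArrayList<>();
        if (name.endsWith(".")) {          // trailing dot: the name is already absolute
            result.add(name);
            return result;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        if (dots >= ndots) {
            result.add(name);              // enough dots: try the name as-is first
        }
        for (String domain : searches) {
            result.add(name + "." + domain);
        }
        if (dots < ndots) {
            result.add(name);              // few dots: the bare name is tried last
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> searches = List.of(
                "app-namespace.svc.cluster.local",      // assumed namespace of the failing pod
                "svc.cluster.local",
                "cluster.local",
                "xx-infrastructure.svc.cluster.local"); // entry added via dnsConfig
        // One of the printed candidates is redis.xx-infrastructure.svc.cluster.local,
        // which is the name that actually exists.
        candidates("redis", searches, 5).forEach(System.out::println);
    }
}
```

Without the added entry, none of the candidates for redis points at the namespace where the Redis service lives, which matches the "Could not resolve host" error seen earlier.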

