What the heck — the containers restarted right after kubelet was restarted?

1. Problem Description
While fixing a cgroup leak, we first stop kubelet and start it again once the fix is done. After one such restart we got feedback from the business side: their containers had restarted as well.
2. Troubleshooting
Pinning down the root cause took a fair amount of time. Below is the rough line of investigation, analyzed together with the source code (our customized 1.12.4 branch).
Three containers were involved in the investigation:
| # | Name | Cluster | Host | Result | Restart Count |
|---|---|---|---|---|---|
| 1 | auto-srv-cwhttp-sf-30b71-0 | py | 10.86.98.42 | Restarted | 1 |
| 2 | conf-master-sf-19cf6-0 | us01 | 10.15.29.31 | Restarted | 1 |
| 3 | opensource-sf-dc750-2 | us01 | 10.15.29.31 | Not restarted | 1 |
The first component that comes to mind for anything that starts or stops containers is kubelet, so let's check the kubelet logs. Taking the container on the py cluster as an example: it restarted at 2020-03-12 10:42:27, so we only need to look at the logs shortly before that point.
Here are the final filtered logs, with some of the intermediate steps omitted:
[email protected]:~$ grep 0312 /var/log/kubernetes/kubelet.INFO | grep -E "10:42:26|10:42:27" | grep -E "8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56|auto-srv-cwhttp-sf-30b71-0|bfde4b15-cb98-11e8-a3c8-6c92bf85beda" | grep -v -i volume
I0312 10:42:26.101676 2353235 kubelet.go:1887] SyncLoop (ADD, "api"): "lifejuhe-pre-sf-d46e3-0_default(9c048a53-53b2-11ea-b6ae-6c92bf85be08), prime-manager-sf-71a2e-2_default(35bc54e5-4ef1-11ea-b6ae-6c92bf85be08), ai-business-sf-0cac5-19_default(b60c0a18-4d70-11ea-b6ae-6c92bf85be08), disconf-sf-92894-2_default(4ae4dfb8-fee4-11e9-b433-6c92bf85be08), am-base-api-new-sf-de855-4_default(e59f4da9-cd68-11e9-92ca-6c92bf85be08), auto-srv-menu-sf-96f75-1_default(eb6ef2a6-4c7c-11ea-b6ae-6c92bf85be08), pre-000_default(d8f7688a-cc39-11e8-a3c8-6c92bf85beda), auto-wechat-srv-sf-e680a-0_default(9a6e0fc8-051c-11ea-b433-6c92bf85be08), oracle-pre-sf-253a7-0_default(c0cef405-5fa8-11ea-b6ae-6c92bf85be08), auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda), auto-srv-chewu-sf-dc750-0_default(1d9eac73-620a-11ea-b6ae-6c92bf85be08), member-sf-747bb-64_default(4e761a14-634e-11ea-b6ae-6c92bf85be08), ep-arch-tmp-bh-sf-1eaad-4_default(7a2d3f79-001a-11e9-a95b-6c92bf85beda), nsky-htw-h5-extranet-sf-900e8-1_default(10ac4571-08c5-11e9-a95b-6c92bf85beda), fate-saver-pre-sf-5cee9-0_default(39915cd3-f70f-11e9-b433-6c92bf85be08), shortserver-sf-ab776-4_default(d1909587-6362-11ea-b6ae-6c92bf85be08), rainbow-h5-sf-c72e4-3_default(cba38b41-1cbe-11ea-b433-6c92bf85be08), agency-call-service-sf-674c2-29_default(e74635f4-5b46-11ea-b6ae-6c92bf85be08), hnc-ddrccp-sf-db379-1_default(f16fbe5f-9ba4-11e9-a777-6c92bf85be08), log-hnc-sf-6a5be-152_default(a87148b5-c002-11e9-92ca-6c92bf85be08), actinia-service-py-sf-61fb7-0_default(75ef3250-3998-11e9-a95b-6c92bf85beda), its-timing-test-sf-8dcfc-0_default(3ad3b185-d175-11e8-a3c8-6c92bf85beda), de-st-forecastor-sf-5a382-3_default(5c6c02b8-ea6c-11e9-8a9f-6c92bf85be08), transit-compass-sf-69886-3_default(5f28569f-7152-11e9-9a4a-6c92bf85be08), gundam-centos6-002_default(3d075c28-042a-11e9-b650-6c92bf85be08), loan-credit-server-sf-c411b-4_default(6455d8e9-61fe-11ea-b6ae-6c92bf85be08), marketingmodel-service-sf-b1260-16_default(9d570b88-5f31-11e9-9b14-6c92bf85beda), csi-hnc-sf-a8523-16_default(b6078000-5f60-11ea-b6ae-6c92bf85be08), ms-shutdown-datadriver-sf-30b86-4_default(805e9837-f0d7-11e9-a648-6c92bf85be08), abnormal-shutdown-service-sf-342c4-18_default(ccb413e6-d832-11e9-92ca-6c92bf85be08), fate-lancer-sf-49896-2_default(241ad7ce-5c53-11ea-b6ae-6c92bf85be08), jetfire-sf-7dbf2-2_default(13f50c1d-5de8-11ea-b6ae-6c92bf85be08), hnc-pre-v-sf-745f9-0_default(70a98be0-3fbe-11e9-a95b-6c92bf85beda), orderpre-sf-b433d-0_default(03f84b30-1760-11ea-b433-6c92bf85be08), gatewayserver-sf-fc2e7-9_default(310030c4-62b3-11ea-b6ae-6c92bf85be08), base-message-service-sf-1b6f0-2_default(f63261ce-d0aa-11e9-92ca-6c92bf85be08), athena-api-pre-sf-d61bf-0_default(9f7e4183-0089-11ea-b433-6c92bf85be08), soda-f-api-sf-c88f2-2_default(fc9520db-8211-11e9-913a-6c92bf85beda), soda-d-api-py-sf-c6659-5_default(af4a67f9-3216-11ea-b433-6c92bf85be08), rollsroyce-pre-sf-6ed43-0_default(78a053cf-6375-11ea-b6ae-6c92bf85be08), member-sf-747bb-74_default(7323a872-634e-11ea-b6ae-6c92bf85be08), settle-consumer-abtest-hnc-pre-v-sf-b18cf-0_default(d199f994-6363-11ea-b6ae-6c92bf85be08), dpub-vote-pre-sf-bc747-0_default(7388d4f7-1a82-11ea-b433-6c92bf85be08), delta-hub-web-sf-bc67e-2_default(35699c85-620c-11ea-b6ae-6c92bf85be08), drunkeness-model-service-sf-a38a4-43_default(c3cd5daa-5b47-11ea-b6ae-6c92bf85be08), lifeapi-hnc-sf-4a88c-2_default(46c7368a-026c-11e9-a95b-6c92bf85beda), prod-data-service-sf-a1520-0_default(21e423b0-d04f-11e9-92ca-6c92bf85be08), seed-pre-sf-53a95-0_default(c51e0a0f-592a-11ea-b6ae-6c92bf85be08)"
I0312 10:42:26.101950 2353235 predicate.go:80] pod auto-srv-cwhttp-sf-30b71-0 is running on local node, just admit it.
I0312 10:42:26.101969 2353235 prober_manager.go:160] DEFAULT_PROBER: pod auto-srv-cwhttp-sf-30b71-0 is notready state(Running) and has no readiness prober, add a default one.
I0312 10:42:26.102104 2353235 kubelet_pods.go:1424] Generating status for "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)"
I0312 10:42:26.101980 2353235 worker.go:95] INITIAL_STATUS {Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-10-09 15:55:54 +0800 CST } {Ready False 0001-01-01 00:00:00 +0000 UTC 2018-12-26 12:14:26 +0800 CST } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2019-04-03 20:02:17 +0800 CST } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-10-09 15:55:54 +0800 CST }] 10.86.98.42 10.161.74.102 2018-10-09 15:55:54 +0800 CST [] [{auto-srv-cwhttp-py {nil &ContainerStateRunning{StartedAt:2018-12-26 12:14:25 +0800 CST,} nil} {nil nil &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2018-10-09 15:55:55 +0800 CST,FinishedAt:2018-12-26 11:49:33 +0800 CST,ContainerID:docker://60fcdbc337b3b57fc16c9817de1f5314fefec1f532211972646a0ae7cb64a17d,}} true 1 registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67:20180912 docker-pullable://registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67@sha256:3fb5c481c184f0752db8832f15940c7d306d87a494039a6e667a7561a28fd98b docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56}] Burstable}
I0312 10:42:26.102147 2353235 worker.go:104] LEOPOLD_INIT_PROBER: auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)'s container auto-srv-cwhttp-py has been already running on the host({auto-srv-cwhttp-py {nil &ContainerStateRunning{StartedAt:2018-12-26 12:14:25 +0800 CST,} nil} {nil nil &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2018-10-09 15:55:55 +0800 CST,FinishedAt:2018-12-26 11:49:33 +0800 CST,ContainerID:docker://60fcdbc337b3b57fc16c9817de1f5314fefec1f532211972646a0ae7cb64a17d,}} true 1 registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67:20180912 docker-pullable://registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67@sha256:3fb5c481c184f0752db8832f15940c7d306d87a494039a6e667a7561a28fd98b docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56})
I0312 10:42:26.102199 2353235 prober_manager.go:193] add a readiness prober for pod: auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda) - auto-srv-cwhttp-py: &Probe{Handler:Handler{Exec:&ExecAction{Command:[bash -ec [[ -f /etc/container/.ddcloud_initing ]] && { [[ -f /etc/container/.ddcloud_inited ]] || exit 2; }; exit 0;],},HTTPGet:nil,TCPSocket:nil,},InitialDelaySeconds:0,TimeoutSeconds:1,PeriodSeconds:2,SuccessThreshold:1,FailureThreshold:10,}
W0312 10:42:26.102284 2353235 status_manager.go:205] Container readiness changed before pod has synced: "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)" - "docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56"
I0312 10:42:26.119238 2353235 status_manager.go:499] Patch status for pod "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)" with "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"message\":\"containers with unready status: [auto-srv-cwhttp-py]\",\"reason\":\"ContainersNotReady\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-03-12T02:42:26Z\",\"message\":\"containers with unready status: [auto-srv-cwhttp-py]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56\",\"image\":\"registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67:20180912\",\"imageID\":\"docker-pullable://registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67@sha256:3fb5c481c184f0752db8832f15940c7d306d87a494039a6e667a7561a28fd98b\",\"lastState\":{\"terminated\":{\"containerID\":\"docker://60fcdbc337b3b57fc16c9817de1f5314fefec1f532211972646a0ae7cb64a17d\",\"exitCode\":137,\"finishedAt\":\"2018-12-26T03:49:33Z\",\"reason\":\"Error\",\"startedAt\":\"2018-10-09T07:55:55Z\"}},\"name\":\"auto-srv-cwhttp-py\",\"ready\":false,\"restartCount\":1,\"state\":{\"running\":{\"startedAt\":\"2018-12-26T04:14:25Z\"}}}]}}"
I0312 10:42:26.119337 2353235 status_manager.go:506] Status for pod "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)" updated successfully: (1, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2018-10-09 15:55:54 +0800 CST Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2018-12-26 12:14:26 +0800 CST Reason:ContainersNotReady Message:containers with unready status: [auto-srv-cwhttp-py]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-03-12 10:42:26 +0800 CST Reason:ContainersNotReady Message:containers with unready status: [auto-srv-cwhttp-py]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2018-10-09 15:55:54 +0800 CST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.86.98.42 PodIP:10.161.74.102 StartTime:2018-10-09 15:55:54 +0800 CST InitContainerStatuses:[] ContainerStatuses:[{Name:auto-srv-cwhttp-py State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2018-12-26 12:14:25 +0800 CST,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2018-10-09 15:55:55 +0800 CST,FinishedAt:2018-12-26 11:49:33 +0800 CST,ContainerID:docker://60fcdbc337b3b57fc16c9817de1f5314fefec1f532211972646a0ae7cb64a17d,}} Ready:false RestartCount:1 Image:registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67:20180912 ImageID:docker-pullable://registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67@sha256:3fb5c481c184f0752db8832f15940c7d306d87a494039a6e667a7561a28fd98b ContainerID:docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56}] QOSClass:Burstable})
I0312 10:42:26.155955 2353235 kubelet.go:1932] SyncLoop (PLEG): "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)", event: &pleg.PodLifecycleEvent{ID:"bfde4b15-cb98-11e8-a3c8-6c92bf85beda", Type:"ContainerStarted", Data:"8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56"}
I0312 10:42:26.155986 2353235 kubelet.go:1932] SyncLoop (PLEG): "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)", event: &pleg.PodLifecycleEvent{ID:"bfde4b15-cb98-11e8-a3c8-6c92bf85beda", Type:"ContainerDied", Data:"60fcdbc337b3b57fc16c9817de1f5314fefec1f532211972646a0ae7cb64a17d"}
I0312 10:42:26.156016 2353235 kubelet_pods.go:1424] Generating status for "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)"
I0312 10:42:26.156102 2353235 kubelet.go:1932] SyncLoop (PLEG): "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)", event: &pleg.PodLifecycleEvent{ID:"bfde4b15-cb98-11e8-a3c8-6c92bf85beda", Type:"ContainerStarted", Data:"e4c7483fc2e3611890959e017d5feb042b57f44fd53f4c470a41da4c45e91731"}
I0312 10:42:26.156132 2353235 kubelet.go:1932] SyncLoop (PLEG): "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)", event: &pleg.PodLifecycleEvent{ID:"bfde4b15-cb98-11e8-a3c8-6c92bf85beda", Type:"ContainerDied", Data:"1d945817797c26545c4b31a441fa5b09d157bff5dd38487d8ff9822b5c022e2f"}
I0312 10:42:26.156156 2353235 kubelet_pods.go:1424] Generating status for "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)"
I0312 10:42:26.402848 2353235 kuberuntime_manager.go:580] computePodActions got {KillPod:false CreateSandbox:false SandboxID:e4c7483fc2e3611890959e017d5feb042b57f44fd53f4c470a41da4c45e91731 Attempt:1 NextInitContainerToStart:nil ContainersToStart:[] ContainersToKill:map[]} for pod "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)"
I0312 10:42:26.403129 2353235 server.go:460] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"auto-srv-cwhttp-sf-30b71-0", UID:"bfde4b15-cb98-11e8-a3c8-6c92bf85beda", APIVersion:"v1", ResourceVersion:"4098877133", FieldPath:""}): type: 'Warning' reason: 'MissingClusterDNS' pod: "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0312 10:42:26.539178 2353235 prober.go:118] Readiness probe for "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda):auto-srv-cwhttp-py" succeeded
I0312 10:42:27.024882 2353235 kubelet_pods.go:1424] Generating status for "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)"
I0312 10:42:27.032833 2353235 kuberuntime_container.go:561] Killing container "docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56" with 5 second grace period
I0312 10:42:27.032846 2353235 kuberuntime_container.go:466] Running preStop hook for container "docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56"
I0312 10:42:31.908195 2353235 status_manager.go:506] Status for pod "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)" updated successfully: (2, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2018-10-09 15:55:54 +0800 CST Reason: Message:} {Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-03-12 10:42:27 +0800 CST Reason: Message:} {Type:ContainersReady Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-03-12 10:42:27 +0800 CST Reason: Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2018-10-09 15:55:54 +0800 CST Reason: Message:}] Message:container restart time reaches the limit: 1 Reason:RestartLimit NominatedNodeName: HostIP:10.86.98.42 PodIP:10.161.74.102 StartTime:2018-10-09 15:55:54 +0800 CST InitContainerStatuses:[] ContainerStatuses:[{Name:auto-srv-cwhttp-py State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2018-12-26 12:14:25 +0800 CST,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2018-10-09 15:55:55 +0800 CST,FinishedAt:2018-12-26 11:49:33 +0800 CST,ContainerID:docker://60fcdbc337b3b57fc16c9817de1f5314fefec1f532211972646a0ae7cb64a17d,}} Ready:true RestartCount:1 Image:registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67:20180912 ImageID:docker-pullable://registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67@sha256:3fb5c481c184f0752db8832f15940c7d306d87a494039a6e667a7561a28fd98b ContainerID:docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56}] QOSClass:Burstable})
Notice something interesting: at 10:42:26 the log still reports the readiness probe as succeeding, yet at 10:42:27 kubelet turns around and kills the container. In Fei Ge's words, "one second you call her sweetie, the next second you call her Madame Niu" — doted on one moment, dumped the next.
I0312 10:42:26.539178 2353235 prober.go:118] Readiness probe for "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda):auto-srv-cwhttp-py" succeeded
I0312 10:42:27.032833 2353235 kuberuntime_container.go:561] Killing container "docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56" with 5 second grace period
This message is printed right before the container is actually killed; the corresponding code is:
// killContainer kills a container through the following steps:
// * Run the pre-stop lifecycle hooks (if applicable).
// * Stop the container.
func (m *kubeGenericRuntimeManager) killContainer(pod *v1.Pod, containerID kubecontainer.ContainerID, containerName string, reason string, gracePeriodOverride *int64) error {
...
glog.V(2).Infof("Killing container %q with %d second grace period", containerID.String(), gracePeriod)
...
}
We don't need to care about how it executes; we just need to walk up the call stack. killContainer is invoked from three places, corresponding to the following cases:
- when a PostStartHook exists and fails (ContainersToStart holds all containers that need to be started)
- when killPod is called
- when the pod has containers that need to be killed (ContainersToKill holds all containers that need to be killed)
From the computePodActions result in the log above there are no containers to start or to kill, so the caller must be killPod. killPod itself is also invoked from three places (a rough sketch of these decision points follows the list):
- when updateType is SyncPodKill (the container is being evicted)
- when admit fails, or the pod has been deleted, or the pod is in Failed phase
- when the container is not being created for the first time and the pod's cgroup no longer exists
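Roughly, those three decision points look like this (a simplified sketch with made-up local types, not the real kubelet code paths):

```go
package main

import "fmt"

type syncPodType int

const (
	SyncPodCreate syncPodType = iota
	SyncPodKill
)

// podState bundles the facts that matter for the three killPod call sites above.
type podState struct {
	updateType    syncPodType
	admitOK       bool
	deleted       bool
	failed        bool
	firstCreation bool
	cgroupExists  bool
}

// shouldKillPod mirrors the decision points listed above, in the same order.
func shouldKillPod(p podState) (bool, string) {
	if p.updateType == SyncPodKill { // call site 1: eviction / explicit kill
		return true, "SyncPodKill"
	}
	if !p.admitOK || p.deleted || p.failed { // call site 2: pod no longer runnable
		return true, "admit failed / pod deleted / pod failed"
	}
	if !p.firstCreation && !p.cgroupExists { // call site 3: pod cgroup has disappeared
		return true, "pod cgroup missing"
	}
	return false, ""
}

func main() {
	// Our case: eviction is disabled, the pod is neither deleted nor failed,
	// and its cgroup exists, so only an admit failure can explain the kill.
	kill, why := shouldKillPod(podState{admitOK: false, cgroupExists: true})
	fmt.Println(kill, why)
}
```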
Since we haven't enabled eviction, and at that point the container was running and the pod's cgroup did exist, only the middle rule could have fired. And because the pod was neither deleted nor in Failed phase, the restart must have been caused by an admit failure. As it happens, we run a custom, global restart-count-limit admit handler: once a container's restart count is greater than or equal to 1, admit is considered failed. At this point the cause seemed found: a container whose RestartCount is greater than 0 gets restarted once kubelet has been stopped for a while and then started again.
Excited, I ran off to the py04 test cluster to try it, and sure enough the issue reproduced. I happily announced in the group chat that the problem was found — but Fei Ge immediately pushed back: the host in US East also had multiple containers meeting that condition, yet only one of them restarted. Scalp-tingling moment. So this wasn't the root cause after all, and we had to keep digging: with RestartCount greater than 0 in both cases, why did admit fail for some containers and succeed for others? Time to look at the actual implementation:
func (l *restartLimitAdmitHandler) Admit(attrs *PodAdmitAttributes) PodAdmitResult {
status, ok := l.statusManager.GetPodStatus(attrs.Pod.UID)
if !ok {
status = attrs.Pod.Status
}
isPodReady := func(ps v1.PodStatus) bool {
for _, cond := range ps.Conditions {
if cond.Type == v1.PodReady && cond.Status == v1.ConditionTrue {
return true
}
}
return false
}
if status.Phase == v1.PodRunning && isPodReady(status) {
return PodAdmitResult{Admit: true}
}
maxRestartCount := int32(l.config.RestartCountLimit)
for _, cs := range status.ContainerStatuses {
if cs.State.Running != nil && cs.Ready {
return PodAdmitResult{Admit: true}
}
if maxRestartCount > 0 && cs.RestartCount >= maxRestartCount {
return PodAdmitResult{
Admit: false,
Reason: "RestartLimit",
Message: fmt.Sprintf("container restart time reaches the limit: %v", maxRestartCount),
}
}
}
return PodAdmitResult{Admit: true}
}
Two concepts need to be distinguished here: Pod Ready and Container Ready. From the background we know the cgroup fix was in progress and kubelet had been stopped for a while; the apiserver logs also show the node controller detecting a NodeNotReady event and setting every pod on the node to Ready:false. So the handler behaves identically for every pod up to the isPodReady check, which returns false for all of them. Looking further down, the only possible difference is that the restarted containers fail to satisfy this condition:
cs.State.Running != nil && cs.Ready
But which part of the condition actually differs? Back to the logs:
# Log from the container that restarted
I0312 10:42:26.119238 2353235 status_manager.go:499] Patch status for pod "auto-srv-cwhttp-sf-30b71-0_default(bfde4b15-cb98-11e8-a3c8-6c92bf85beda)" with "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"message\":\"containers with unready status: [auto-srv-cwhttp-py]\",\"reason\":\"ContainersNotReady\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-03-12T02:42:26Z\",\"message\":\"containers with unready status: [auto-srv-cwhttp-py]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"docker://8a61fda8d43e4c28d4092a1bc8e5f372846d955ffffe0353a754c2e42f271b56\",\"image\":\"registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67:20180912\",\"imageID\":\"docker-pullable://registry.xiaojukeji.com/didibuild/auto-srv-cwhttp.hnc-v.chewu-rest.bcar.carplatform.didi.com.centos67@sha256:3fb5c481c184f0752db8832f15940c7d306d87a494039a6e667a7561a28fd98b\",\"lastState\":{\"terminated\":{\"containerID\":\"docker://60fcdbc337b3b57fc16c9817de1f5314fefec1f532211972646a0ae7cb64a17d\",\"exitCode\":137,\"finishedAt\":\"2018-12-26T03:49:33Z\",\"reason\":\"Error\",\"startedAt\":\"2018-10-09T07:55:55Z\"}},\"name\":\"auto-srv-cwhttp-py\",\"ready\":false,\"restartCount\":1,\"state\":{\"running\":{\"startedAt\":\"2018-12-26T04:14:25Z\"}}}]}}"
# Log from the container that did not restart
I0312 11:12:37.927828 2906779 status_manager.go:499] Patch status for pod "opensource-sf-dc750-2_default(67c1ba01-00f9-11e9-a60d-246e9693c184)" with "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-03-12T03:12:37Z\",\"status\":\"True\",\"type\":\"Ready\"}]}}"
Comparing the two: the problematic container reports ready:false while the healthy one reports ready:true, which is exactly the cs.Ready in the condition above. So although RestartCount is 1 in both cases, the admit results differ.
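Replaying the Admit decision with the values from these two log lines makes the difference obvious. The sketch below uses simplified local types (not the real k8s API) and assumes RestartCountLimit is configured to 1, as described earlier:

```go
package main

import "fmt"

type containerStatus struct {
	Running      bool
	Ready        bool
	RestartCount int32
}

// admit condenses the restartLimitAdmitHandler logic above; podReady is the Pod-level
// Ready condition (forced to false by the node controller while kubelet was down).
func admit(podReady bool, containers []containerStatus, restartCountLimit int32) bool {
	if podReady {
		return true
	}
	for _, cs := range containers {
		if cs.Running && cs.Ready {
			return true
		}
		if restartCountLimit > 0 && cs.RestartCount >= restartCountLimit {
			return false // rejected with reason "RestartLimit"
		}
	}
	return true
}

func main() {
	restarted := []containerStatus{{Running: true, Ready: false, RestartCount: 1}}
	survived := []containerStatus{{Running: true, Ready: true, RestartCount: 1}}
	fmt.Println("ready:false container admitted?", admit(false, restarted, 1)) // false -> pod gets killed
	fmt.Println("ready:true  container admitted?", admit(false, survived, 1))  // true  -> pod keeps running
}
```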
So why does ready differ in the first place, and where is it assigned? Keep going:
func (m *manager) UpdatePodStatus(podUID types.UID, podStatus *v1.PodStatus) {
for i, c := range podStatus.ContainerStatuses {
var ready bool
if c.State.Running == nil {
ready = false
} else if result, ok := m.readinessManager.Get(kubecontainer.ParseContainerID(c.ContainerID)); ok {
ready = result == results.Success
} else {
// The check whether there is a probe which hasn't run yet.
_, exists := m.getWorker(podUID, c.Name, readiness)
ready = !exists
}
podStatus.ContainerStatuses[i].Ready = ready
}
// init containers are ready if they have exited with success or if a readiness probe has
// succeeded.
for i, c := range podStatus.InitContainerStatuses {
var ready bool
if c.State.Terminated != nil && c.State.Terminated.ExitCode == 0 {
ready = true
}
podStatus.InitContainerStatuses[i].Ready = ready
}
}
Ready is assigned in three branches:
- when c.State.Running is nil (a freshly started container), ready is false — clearly not our case, since the container had been running all along
- when a readiness probe result exists, ready is true if the result is Success and false otherwise
- otherwise, based on whether a readiness prober worker exists: if there is no such prober, ready is true, otherwise false

All three cases are easy to understand. For the problem at hand, the first branch never fires; the second and third fire depending on circumstances (every pod gets a default readiness probe that checks a specific file inside the container; dockerinit creates that file once all init scripts have succeeded). In other words, the code first looks for a probe result: if one exists, ready follows that result, otherwise it falls back to whether a prober is configured. Because we always configure the default prober, falling into the final else sets ready to false, which in turn triggers the container restart. That can only happen when no probe result has been recorded yet, so let's look at the code that records the result. We only care about where the result is first added:
// Creates and starts a new probe worker.
func newWorker(
m *manager,
probeType probeType,
pod *v1.Pod,
container v1.Container) *worker {
w := &worker{
stopCh: make(chan struct{}, 1), // Buffer so stop() can be non-blocking.
pod: pod,
container: container,
probeType: probeType,
probeManager: m,
}
glog.V(2).Infof("INITIAL_STATUS %v", pod.Status)
switch probeType {
case readiness:
w.spec = container.ReadinessProbe
w.resultsManager = m.readinessManager
w.initialValue = results.Failure
for _, s := range pod.Status.ContainerStatuses {
if s.Name == w.container.Name && s.Ready {
glog.V(3).Infof("LEOPOLD_INIT_PROBER: %s's container %s has been already running on the host(%v)",
format.Pod(w.pod), w.container.Name, s)
w.containerID = kubecontainer.ParseContainerID(s.ContainerID)
w.spec.InitialDelaySeconds = 0
w.initialValue = results.Success
w.resultsManager.Set(w.containerID, results.Success, w.pod)
break
}
}
case liveness:
w.spec = container.LivenessProbe
w.resultsManager = m.livenessManager
w.initialValue = results.Success
}
w.proberResultsMetricLabels = prometheus.Labels{
"probe_type": w.probeType.String(),
"container_name": w.container.Name,
"pod_name": w.pod.Name,
"namespace": w.pod.Namespace,
"pod_uid": string(w.pod.UID),
}
return w
}
A quick explanation of the code: when kubelet starts, it creates one worker per pod per configured probe type to run the health checks. To make sure containers keep running while kubelet is down and do not restart once kubelet comes back, we seed a default probe result of Success for every readiness prober during worker initialization; that way, when ready is computed above, Success is read from the result cache and ready becomes true. At this point you might have a question:
- seeding the default probe result also relies on s.Ready, which must be true for the default result to be added — so why is ready then set again based on the probe result?

The two Readys are not the same thing. The Ready consulted when seeding the default result comes straight from the apiserver, i.e. the data stored in etcd, and at that moment it genuinely is Ready:true, because the node controller only sets the Pod-level Ready:false and never touches the container-level fields. The Ready we assign later is the container's real status, computed by combining the etcd data with the actual container state reported by docker; it is used to update the pod and container statuses stored in etcd, and the real running state is synced back to etcd periodically.
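The piece that connects the two code paths is the readiness result cache: newWorker writes a default Success into it, and UpdatePodStatus later asks it via Get. A stripped-down stand-in (an illustrative sketch, not the real prober results manager) looks like this:

```go
package main

import (
	"fmt"
	"sync"
)

type Result bool

const (
	Success Result = true
	Failure Result = false
)

// resultsManager is a simplified stand-in for kubelet's readiness result cache,
// keyed by container ID and guarded by a lock.
type resultsManager struct {
	mu    sync.RWMutex
	cache map[string]Result
}

func newResultsManager() *resultsManager {
	return &resultsManager{cache: map[string]Result{}}
}

// Set is what newWorker does for our default readiness prober: it pre-seeds Success
// so an already-running container is treated as ready after a kubelet restart.
func (m *resultsManager) Set(containerID string, r Result) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.cache[containerID] = r
}

// Get is what UpdatePodStatus calls; ok == false means "no probe result yet",
// which pushes the caller into branch 3 (ready then follows whether a prober exists).
func (m *resultsManager) Get(containerID string) (Result, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	r, ok := m.cache[containerID]
	return r, ok
}

func main() {
	rm := newResultsManager()
	id := "docker://example-container-id" // illustrative ID
	if _, ok := rm.Get(id); !ok {
		fmt.Println("no result yet -> branch 3") // what happened to the restarted container
	}
	rm.Set(id, Success)
	if r, ok := rm.Get(id); ok && r == Success {
		fmt.Println("Success found -> branch 2, ready = true") // what happened to the healthy one
	}
}
```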
Both the problematic and the healthy containers go through this default-result seeding, and both print the following two log statements:
glog.V(2).Infof("INITIAL_STATUS %v", pod.Status)
glog.V(3).Infof("LEOPOLD_INIT_PROBER: %s's container %s has been already running on the host(%v)", format.Pod(w.pod), w.container.Name, s)
The content printed by these two lines confirms the logic described above.
So the final question: if the default probe result is seeded for both, why do they end up differently — one reaching branch 2 (ready = true) and the other branch 3 (ready = false)?
Because the code that writes the probe result and the code that reads it run in two different goroutines! Don't believe it? Look:
func (kl *Kubelet) HandlePodAdditions(pods []*v1.Pod) {
...
// triggers UpdatePod
kl.dispatchWork(pod, kubetypes.SyncPodCreate, mirrorPod, start)
kl.probeManager.AddPod(pod)
}
func (p *podWorkers) UpdatePod(options *UpdatePodOptions) {
...
if podUpdates, exists = p.podUpdates[uid]; !exists {
...
go func() {
defer runtime.HandleCrash()
// triggers syncPod, where ready gets assigned
p.managePodLoop(podUpdates)
}()
}
...
}
func (m *manager) AddPod(pod *v1.Pod) {
m.workerLock.Lock()
defer m.workerLock.Unlock()
...
if c.ReadinessProbe != nil {
key.probeType = readiness
if _, ok := m.workers[key]; ok {
glog.Errorf("Readiness probe already exists! %v - %v",
format.Pod(pod), c.Name)
return
}
// seeds the default Success probe result, consumed by branch 2
w := newWorker(m, readiness, pod, c)
w.isDefaultReadinessProber = isDefaultReadinessProber
// registers the prober worker, checked by branch 3
m.workers[key] = w
glog.V(3).Infof("add a readiness prober for pod: %v - %v: %v", format.Pod(pod), c.Name, w.spec)
go w.run()
}
...
}
Now compare the timestamps at which these lines were printed:
[email protected]:~$ grep "LEOPOLD_INIT_PROBER: opensource-sf-dc750-2" /var/log/kubernetes/kubelet.INFO | head -n 1
I0312 11:12:37.907399 2906779 worker.go:104] LEOPOLD_INIT_PROBER: opensource-sf-dc750-2_default(67c1ba01-00f9-11e9-a60d-246e9693c184)'s container opensource-us01 has been already running on the host({opensource-us01 {nil &ContainerStateRunning{StartedAt:2018-12-26 10:32:29 +0800 CST,} nil} {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:,StartedAt:2018-12-16 14:11:20 +0800 CST,FinishedAt:2018-12-26 10:31:31 +0800 CST,ContainerID:docker://0ea96f4f19542f416bf1afba438f7e26d9ef31508829ef5c94d31c184bfb0a6a,}} true 1 registry.xiaojukeji.com/didibuild/opensource.us01-v.opensource.codeinsight.ep.didi.com.centos67:c2db70ab docker-pullable://registry.xiaojukeji.com/didibuild/opensource.us01-v.opensource.codeinsight.ep.didi.com.centos67@sha256:ce790b51ef3b617cfd78bf9c1108e66e1c95d87e4590952c5746a5bd4294282d docker://a29de66df4998058df97857223fe2c6c228d4d4ec2f7da7c9da05bcdd7ed214e})
[email protected]:~$ grep "Generating status for" /var/log/kubernetes/kubelet.INFO | grep -i opensource | head -n 1
I0312 11:12:37.907439 2906779 kubelet_pods.go:1424] Generating status for "opensource-sf-dc750-2_default(67c1ba01-00f9-11e9-a60d-246e9693c184)"
[email protected]:~$ grep "Generating status for" /var/log/kubernetes/kubelet.INFO | grep -i "conf-master" | head -n 1
I0312 11:12:37.907605 2906779 kubelet_pods.go:1424] Generating status for "conf-master-sf-19cf6-0_default(8371b943-0840-11e9-a60d-246e9693c184)"
[email protected]:~$ grep "LEOPOLD_INIT_PROBER: conf-master" /var/log/kubernetes/kubelet.INFO | head -n 1
I0312 11:12:37.907629 2906779 worker.go:104] LEOPOLD_INIT_PROBER: conf-master-sf-19cf6-0_default(8371b943-0840-11e9-a60d-246e9693c184)'s container conf-master-us01 has been already running on the host({conf-master-us01 {nil &ContainerStateRunning{StartedAt:2018-12-26 10:32:28 +0800 CST,} nil} {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:,StartedAt:2018-12-25 20:28:01 +0800 CST,FinishedAt:2018-12-26 10:31:31 +0800 CST,ContainerID:docker://6e0da293cf06d76938df6b1298595663c964c31c1abddc619a6fc5b5b34dfd04,}} true 1 registry.xiaojukeji.com/didibuild/conf-master.us01-v.cm-server.conf-master.public.engine.didi.com.centos72:268c0172 docker-pullable://registry.xiaojukeji.com/didibuild/conf-master.us01-v.cm-server.conf-master.public.engine.didi.com.centos72@sha256:f3b3fbf4a84b85c30f4a8564ce4293bffa440c345c143d9ad3a10904df2b0f7a docker://085089b20cb51d3e3ea6c6bbd04a0e0b467eb9db4e7b6a506bea1151e1d50643})
For the container that restarted, "Generating status for" (printed just before ready is assigned) appears first and LEOPOLD_INIT_PROBER (seeding the default probe result) appears afterwards. In other words, when the probe result was looked up, the default result had not been seeded yet, so execution fell into branch 3 and ready was set to false. The healthy container is exactly the opposite: the result was seeded first and read afterwards, so it took branch 2.
3. Conclusion
The write and the read happen in two different goroutines, and the outcome hinges on that very first write: only if the read is guaranteed to happen after the initial seeding can we be sure the container will not restart. Branch 3's check for the existence of a default prober does take a lock on both the set and the get side, but a lock cannot enforce execution order. So even on the branch-3 path, ready may occasionally be set to true when it should have been false; since that happens to be exactly the effect we want, we would never notice it. In other words, the upstream open-source code likely carries a similar risk of misjudgment, at least in version 1.12.4.
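To make the ordering point concrete, here is a self-contained toy program (an illustration, not kubelet code): a writer and a reader each take the lock, yet nothing orders the two goroutines, so the reader may still run before the writer has seeded the value.

```go
package main

import (
	"fmt"
	"sync"
)

type cache struct {
	mu   sync.Mutex
	data map[string]bool
}

func (c *cache) Set(k string, v bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[k] = v
}

func (c *cache) Get(k string) (bool, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[k]
	return v, ok
}

func main() {
	missed := 0
	for i := 0; i < 1000; i++ {
		c := &cache{data: map[string]bool{}}
		var wg sync.WaitGroup
		wg.Add(2)
		// writer: plays the role of probeManager.AddPod -> newWorker seeding Success
		go func() {
			defer wg.Done()
			c.Set("container", true)
		}()
		// reader: plays the role of the dispatched syncPod goroutine reading the result
		go func() {
			defer wg.Done()
			if _, ok := c.Get("container"); !ok {
				missed++
			}
		}()
		wg.Wait()
	}
	fmt.Printf("reader observed an empty cache in %d of 1000 runs\n", missed)
}
```

Run it a few times: depending on scheduling, some share of iterations will report the reader arriving first — exactly the window in which the restarted container looked up an empty result cache.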
Original post: https://www.likakuli.com/posts/kubernetes-kubelet-restart/

