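Correctly Detecting Pod Resources in k8s
Contents
1. Overview of container resource limits
2. Background
3. Introducing lxcfs
3.1 Deploying lxcfs in k8s
3.2 Enabling injection for a namespace
3.3 Rolling back
4. Testing
5. Summary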

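1. Overview of container resource limits
When Docker is the container engine, flags such as --memory and --cpus limit the CPU and memory available to a container; see Docker resource constraints[1] for the full list. Under the hood Docker enforces these limits with Linux kernel cgroups, which can limit, account for, and isolate the physical resources (CPU, memory, I/O, and so on) used by a group of processes. cgroups provide the basic guarantee for container virtualization and are the foundation of Docker and similar virtualization management tools.
For how cgroups implement resource limits, see The kernel knowledge behind Docker: cgroups resource limits[2].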
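As a rough illustration of how a flag such as --cpus maps onto cgroups: Docker's documentation describes --cpus=1.5 as equivalent to --cpu-period=100000 --cpu-quota=150000. The sketch below reproduces only that arithmetic. The function name and the default 100 ms period are assumptions of this example, not Docker's code.

```python
from typing import Tuple


def cpus_to_cfs(cpus: float, period_us: int = 100_000) -> Tuple[int, int]:
    """Translate a --cpus style value into (cfs_quota_us, cfs_period_us)."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    # The quota is the CPU time (in microseconds) allowed per period.
    quota_us = int(round(cpus * period_us))
    return quota_us, period_us


if __name__ == "__main__":
    print(cpus_to_cfs(1.5))  # (150000, 100000)
    print(cpus_to_cfs(1))    # (100000, 100000)
```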
2. Background
Some services running in containers automatically detect how many resources are available in the current environment and then size themselves accordingly.
Take an nginx container. nginx sets its worker count through the worker_processes[3] directive in the configuration file. The default value is 1, meaning nginx starts only one worker process.
To tune nginx for high concurrency, you can either set this value by hand to the number of CPU cores in the environment, or set it to auto and let nginx decide. The first approach requires mounting the configuration file and changing it manually. The second is more flexible but has a problem in containers: whether a container is run directly by docker or inside a Pod (the smallest deployable unit in k8s), the CPU and memory it sees are those of the node it runs on. As a result, auto cannot detect the CPU count correctly. For example, here is one of my nodes and the resources seen by a pod on it:
# kubectl describe nodes k8s-node-07|grep -A 5 "Capacity"
Capacity:
cpu: 16
ephemeral-storage: 74408452Ki
hugepages-2Mi: 0
memory: 16430184Ki
pods: 110
# docker info|grep -A 6 "Kernel"
Kernel Version: 4.4.247-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 15.67GiB
Name: k8s-node-07
# kubectl exec -it test-pod-5dff4b89fd-bsh6b -- bash
root@test-pod-5dff4b89fd-bsh6b:/# free -m
              total        used        free      shared  buff/cache   available
Mem:          16045        7915        2354        1002        5775        6222
Swap:             0           0           0
root@test-pod-5dff4b89fd-bsh6b:/# head -2 /proc/meminfo
MemTotal: 16430184 kB
MemFree: 2374064 kB
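The numbers above can be cross-checked programmatically. The following sketch (helper names are illustrative, and the capacity parser handles only the Ki suffix) compares MemTotal as seen inside the pod with the node capacity reported by kubectl, and shows where the 16045 in the free -m output comes from.

```python
def meminfo_kb(meminfo_text: str, field: str = "MemTotal") -> int:
    """Return the value (in kB) of one field from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def capacity_kib(quantity: str) -> int:
    """Parse a node capacity such as '16430184Ki' (only Ki is handled here)."""
    if not quantity.endswith("Ki"):
        raise ValueError("only Ki quantities are handled in this sketch")
    return int(quantity[:-2])


# Values copied from the listing above.
pod_view = "MemTotal: 16430184 kB\nMemFree: 2374064 kB\n"
node_capacity = "16430184Ki"

total_kb = meminfo_kb(pod_view)
print(total_kb == capacity_kib(node_capacity))  # True: the pod sees the whole node
print(total_kb // 1024)                         # 16045, the "total" column of free -m
```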
If the Pod's CPU and memory are limited through resources in k8s, for example:
resources:
  limits:
    cpu: "1"
    memory: 2Gi
  requests:
    cpu: 200m
    memory: 512Mi
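To see how such values reach the kernel, the sketch below converts the quantities above into cgroup v1 settings using the formulas commonly documented for the kubelet: the CPU limit becomes cpu.cfs_quota_us, the CPU request becomes cpu.shares, and the memory limit becomes a byte count. This is a simplified illustration. The function names are made up for this example, and the parser handles only the suffixes used here.

```python
CFS_PERIOD_US = 100_000  # default CFS period (100 ms)


def milli_cpu(quantity: str) -> int:
    """'1' -> 1000, '200m' -> 200. Only plain numbers and the m suffix."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def memory_bytes(quantity: str) -> int:
    """'2Gi' -> 2147483648. Only the Ki, Mi and Gi suffixes are handled."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def cpu_quota_us(cpu_limit: str) -> int:
    """CPU limit -> cpu.cfs_quota_us for the default period."""
    return milli_cpu(cpu_limit) * CFS_PERIOD_US // 1000


def cpu_shares(cpu_request: str) -> int:
    """CPU request -> cpu.shares (1024 shares per core, minimum 2)."""
    return max(2, milli_cpu(cpu_request) * 1024 // 1000)


print(cpu_quota_us("1"))      # 100000
print(cpu_shares("200m"))     # 204
print(memory_bytes("2Gi"))    # 2147483648
print(memory_bytes("512Mi"))  # 536870912
```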
then on the node where the pod was created you can inspect the actual resource settings with docker commands:
# docker inspect b1f4bfb53a2c|grep -i cgroup
"Cgroup": "",
"CgroupParent": "/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f",
"DeviceCgroupRules": null,
# cat /sys/fs/cgroup/cpu/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f/cpu.cfs_quota_us
100000
# cat /sys/fs/cgroup/cpu/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f/cpu.cfs_period_us
100000
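Dividing the quota by the period gives the CPU time the pod may use per period, here 100000 / 100000 = 1 CPU. A minimal sketch of that calculation follows. Rounding up and capping at the host CPU count are choices made for this example.

```python
import math


def effective_cpus(quota_us: int, period_us: int, host_cpus: int) -> int:
    """CPUs usable by a cgroup; in cgroup v1 a quota of -1 means no limit."""
    if quota_us <= 0 or period_us <= 0:
        return host_cpus
    # Round up so that e.g. 1.5 CPUs of quota counts as 2 schedulable CPUs.
    return min(host_cpus, max(1, math.ceil(quota_us / period_us)))


print(effective_cpus(100000, 100000, 16))  # 1
print(effective_cpus(-1, 100000, 16))      # 16
```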
From the relevant material, nginx obtains the number of CPU cores by calling sysconf(_SC_NPROCESSORS_ONLN). This is a libc function rather than a system call, and it in turn reads the file /sys/devices/system/cpu/online. By default this file has the same content inside a pod as on the host, so with worker_processes set to auto nginx ends up starting 16 worker processes. When the Pod's own CPU limit is small, each worker gets only a small slice of CPU time, and responses become noticeably slow.
# kubectl exec -it test-pod-5dff4b89fd-bsh6b -- cat /sys/devices/system/cpu/online
0-15
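The file uses the kernel's CPU list format (comma-separated ranges). The sketch below counts the CPUs in such a list. It illustrates the idea only and is not the code glibc or nginx actually runs.

```python
def count_cpus(cpu_list: str) -> int:
    """Count CPUs in a kernel CPU list such as '0-15' or '0-3,8-11'."""
    total = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        start, sep, end = part.partition("-")
        # A part is either a single CPU id ("4") or an inclusive range ("0-15").
        total += (int(end) - int(start) + 1) if sep else 1
    return total


print(count_cpus("0-15"))  # 16, what the pod sees by default
print(count_cpus("0"))     # 1
```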
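3. Introducing lxcfs
lxcfs[4] is a small FUSE filesystem designed to make a Linux container feel more like a virtual machine. It helps a container identify its own resources correctly by serving container-aware versions of the following files: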
/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
/sys/devices/system/cpu/online
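When a container starts, these files inside the container are mounted from the lxcfs directory on the host. When an application in the container reads, for example, /proc/meminfo, the request is routed to lxcfs, which uses the container's cgroup information to return the correct values, so the application sees the right resources.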
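Conceptually, the values returned for /proc/meminfo are derived from the cgroup's memory limit and usage instead of the host totals. The sketch below shows that idea only. It is not lxcfs's actual implementation, the function name is made up, and the 300 MiB usage figure is an invented input.

```python
def container_meminfo(host_total_kb: int, limit_bytes: int, usage_bytes: int) -> dict:
    """Derive container-level MemTotal/MemFree (in kB) from cgroup values."""
    limit_kb = limit_bytes // 1024
    # A cgroup without a limit reports a huge number, so fall back to the host total.
    total_kb = min(host_total_kb, limit_kb)
    used_kb = min(total_kb, usage_bytes // 1024)
    return {"MemTotal": total_kb, "MemFree": total_kb - used_kb}


# Host total from the node above, a 2Gi limit, and a hypothetical 300 MiB of usage.
view = container_meminfo(16430184, 2 * 1024 ** 3, 300 * 1024 ** 2)
print(view)  # {'MemTotal': 2097152, 'MemFree': 1789952}
```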
3.1 Deploying lxcfs in k8s
The project for deploying lxcfs on k8s is at https://github.com/denverdino/lxcfs-admission-webhook
It relies on k8s dynamic admission control, AdmissionWebhook[5].
My k8s cluster version is:
# kubectl version -o yaml
clientVersion:
buildDate: "2020-12-08T17:59:43Z"
compiler: gc
gitCommit: af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38
gitTreeState: clean
gitVersion: v1.20.0
goVersion: go1.15.5
major: "1"
minor: "20"
platform: darwin/amd64
serverVersion:
buildDate: "2019-06-19T16:32:14Z"
compiler: gc
gitCommit: e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529
gitTreeState: clean
gitVersion: v1.15.0
goVersion: go1.12.5
major: "1"
minor: "15"
platform: linux/amd64
First fetch the manifests, then deploy with the provided script:
# git clone https://github.com/denverdino/lxcfs-admission-webhook.git
# cd lxcfs-admission-webhook
# ls deployment
deployment.yaml lxcfs-daemonset.yaml mutatingwebhook.yaml uninstall.sh web.yaml webhook-patch-ca-bundle.sh
install.sh mutatingwebhook-ca-bundle.yaml service.yaml validatingwebhook.yaml webhook-create-signed-cert.sh
# kubectl apply -f deployment/lxcfs-daemonset.yaml
daemonset.apps/lxcfs created
# ./deployment/install.sh
creating certs in tmpdir /var/folders/8n/11ndbfq95jv79gds8wqj2scc0000gn/T/tmp.c6OKXi4L
Generating RSA private key, 2048 bit long modulus
.......................................+++
...............+++
e is 65537 (0x10001)
certificatesigningrequest.certificates.k8s.io/lxcfs-admission-webhook-svc.default created
NAME AGE REQUESTOR CONDITION
lxcfs-admission-webhook-svc.default 0s admin Pending
certificatesigningrequest.certificates.k8s.io/lxcfs-admission-webhook-svc.default approved
W0327 16:35:14.764281 8953 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
secret/lxcfs-admission-webhook-certs created
NAME TYPE DATA AGE
lxcfs-admission-webhook-certs Opaque 2 0s
deployment.apps/lxcfs-admission-webhook-deployment created
service/lxcfs-admission-webhook-svc created
mutatingwebhookconfiguration.admissionregistration.k8s.io/mutating-lxcfs-admission-webhook-cfg created
Check the result: there is one pod named lxcfs-admission-webhook-deployment, plus an lxcfs pod on every node, run as a DaemonSet:
# kubectl get pods -o wide|grep lxcfs
lxcfs-admission-webhook-deployment-6896958c4c-56k54 1/1 Running 0 80s 172.20.7.51 172.16.1.111 <none> <none>
lxcfs-67cgk 1/1 Running 0 94s 172.20.0.25 172.16.1.100 <none> <none>
lxcfs-c4lkx 1/1 Running 0 93s 172.20.1.25 172.16.1.101 <none> <none>
...
3.2 Enabling injection for a namespace
# kubectl label namespace default lxcfs-admission-webhook=enabled
This enables lxcfs injection for the given namespace. Once it is enabled, every Pod newly created in that namespace gets lxcfs injected.
3.3 Rolling back
To restore the environment, run the uninstall script in the directory:
# ./deployment/uninstall.sh
mutatingwebhookconfiguration.admissionregistration.k8s.io "mutating-lxcfs-admission-webhook-cfg" deleted
service "lxcfs-admission-webhook-svc" deleted
deployment.apps "lxcfs-admission-webhook-deployment" deleted
secret "lxcfs-admission-webhook-certs" deleted
# kubectl delete -f deployment/lxcfs-daemonset.yaml
daemonset.apps "lxcfs" deleted
4. Testing
The cloned repository includes a YAML for a test httpd pod that can be deployed directly:
# kubectl apply -f deployment/web.yaml
deployment.apps/web created
# kubectl get pods -l app=web
NAME READY STATUS RESTARTS AGE
web-5ff5cd75f8-74pr6 1/1 Running 0 27s
web-5ff5cd75f8-bcm2x 1/1 Running 0 27s
Enter a container and check its resources:
# kubectl exec -it web-5ff5cd75f8-74pr6 -- bash
root@web-5ff5cd75f8-74pr6:/usr/local/apache2# free -m
             total       used       free     shared    buffers     cached
Mem:           256         15        240          0          0          0
-/+ buffers/cache:         14        241
Swap:            0          0          0
root@web-5ff5cd75f8-74pr6:/usr/local/apache2# cat /proc/cpuinfo| grep "processor"| wc -l
1
What happens is that lxcfs plus dynamic admission control automatically mounts the relevant host files when a new pod is created, which you can see as follows:
# kubectl describe pods web-5ff5cd75f8-74pr6
...
Mounts:
/proc/cpuinfo from lxcfs-proc-cpuinfo (rw)
/proc/diskstats from lxcfs-proc-diskstats (rw)
/proc/loadavg from lxcfs-proc-loadavg (rw)
/proc/meminfo from lxcfs-proc-meminfo (rw)
/proc/stat from lxcfs-proc-stat (rw)
/proc/swaps from lxcfs-proc-swaps (rw)
/proc/uptime from lxcfs-proc-uptime (rw)
/sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-jtj98 (ro)
...
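The mounts listed above are the kind of change a mutating admission webhook returns as a JSONPatch against the pod spec. The following sketch builds such a patch. It is illustrative only: the real webhook is a separate project with its own code, the host mount point /var/lib/lxcfs is an assumption, and the patch assumes the pod already has volumes and volumeMounts arrays (as the pod above does, because of its service account token).

```python
import json

LXCFS_ROOT = "/var/lib/lxcfs"  # assumption: where lxcfs is mounted on each node

# Files taken from the Mounts listing above.
TARGETS = [
    "/proc/cpuinfo", "/proc/diskstats", "/proc/loadavg", "/proc/meminfo",
    "/proc/stat", "/proc/swaps", "/proc/uptime",
    "/sys/devices/system/cpu/online",
]


def volume_name(path: str) -> str:
    """'/proc/cpuinfo' -> 'lxcfs-proc-cpuinfo', matching the names in the listing."""
    return "lxcfs-" + path.strip("/").replace("/", "-")


def build_patch(container_index: int = 0) -> list:
    """Build JSONPatch operations adding one hostPath volume and one mount per file."""
    ops = []
    for path in TARGETS:
        name = volume_name(path)
        ops.append({
            "op": "add",
            "path": "/spec/volumes/-",
            "value": {"name": name, "hostPath": {"path": LXCFS_ROOT + path}},
        })
        ops.append({
            "op": "add",
            "path": f"/spec/containers/{container_index}/volumeMounts/-",
            "value": {"name": name, "mountPath": path},
        })
    return ops


if __name__ == "__main__":
    # Show the first volume/mount pair of the generated patch.
    print(json.dumps(build_patch()[:2], indent=2))
```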
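5. Summary
Containers in the pod can now read the CPU and memory limits correctly. If your own application reads the resource configuration of its environment and something goes wrong, make sure you understand, from the bottom up, how it obtains that information.
The test above also shows that lxcfs automatically mounts /sys/devices/system/cpu/online, the file nginx needs, into the pod. Testing confirms that this also solves the problem of nginx choosing its worker process count automatically.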
References
[1]Docker resource constraints: https://docs.docker.com/config/containers/resource_constraints/
[2]The kernel knowledge behind Docker: cgroups resource limits: https://www.infoq.cn/article/docker-kernel-knowledge-cgroups-resource-isolation/
[3]nginx worker_processes: http://nginx.org/en/docs/ngx_core_module.html#worker_processes
[4]lxcfs: https://github.com/lxc/lxcfs
lxcfs-admission-webhook: https://github.com/denverdino/lxcfs-admission-webhook
[5]Dynamic admission control, AdmissionWebhook: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks
