K8S Service in Practice and a First Look at How It Works


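[Author] Chen Cheng is a container cloud R&D engineer at China Unicom Software Research Institute. He also leads the cloud computing R&D group in its Public Platform and Architecture R&D Division. He has long worked on building large-scale infrastructure platforms and has researched Mesos, KVM and K8S, among others. He focuses on container cloud computing frameworks, cluster scheduling and virtualization.
Our story begins with a production incident. On May 29, an internal system suffered a large-scale failure when accessing Services: containers inside Pods could not reach ServiceIP:Port. The outage lasted more than 12 hours, and the operations and support staff involved found neither the root cause nor a fix.
In the post-incident review we found that the team did not understand clearly enough how K8S Service works. As a result, the problem could not be located quickly and accurately. Had the troubleshooting followed the line of reasoning below, it would not have taken nearly as long.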

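Below, we walk through the role of Service in Kubernetes, how to use it, and how it works.
A Service is an abstraction for exposing the network of a group of Pods: a K8S Service provides a single load-balanced entry point for that group. This spares Pods from having to know each other's network details in order to talk to each other. Take frontend -> backend as an example. Frontend Pod IPs can change at any time, and so can backend Pod IPs. Keeping communication between the two simple and reliable is exactly what the Service abstraction is for.
Using Services
1. Typical Service configuration
Once a selector is configured, the endpoints controller automatically finds the Pods that match it. It then creates an Endpoints object with the same name as the Service, which records the concrete backends behind the Service.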
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
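The selector-to-Endpoints behaviour described above can be pictured as a filter over Pods. This is a simplified sketch, not the controller's actual code. The `Pod` class, `compute_endpoints` and the Pod IPs are hypothetical, and only equality-based selectors are handled. Not-ready Pods are simply dropped here, whereas real Endpoints objects track them separately.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    # Hypothetical, stripped-down Pod: only the fields this sketch needs.
    name: str
    ip: str
    labels: Dict[str, str] = field(default_factory=dict)
    ready: bool = True


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based selector: every key/value pair must be present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def compute_endpoints(selector: Optional[Dict[str, str]], target_port: int,
                      pods: List[Pod]) -> List[Tuple[str, int]]:
    """Return the (ip, port) pairs a Service with this selector would route to."""
    if not selector:
        # No selector: nothing is generated automatically; Endpoints must be created by hand.
        return []
    # Simplification: unready Pods are dropped instead of being tracked separately.
    return [(pod.ip, target_port) for pod in pods
            if pod.ready and matches(selector, pod.labels)]


pods = [
    Pod("web-1", "172.200.6.10", {"app": "MyApp", "tier": "frontend"}),
    Pod("web-2", "172.200.6.11", {"app": "MyApp"}, ready=False),
    Pod("db-1", "172.200.6.12", {"app": "db"}),
]
print(compute_endpoints({"app": "MyApp"}, 9376, pods))  # only web-1 qualifies
```

When a Pod is rescheduled and comes back with a new IP, recomputing this list is all that changes. Clients keep using the Service name and port.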
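2. Services without a selector
A Service without a selector gets no Endpoints automatically. You have to create the Endpoints object by hand, with the same name as the Service, to bind it. The endpoint can be a Pod inside the cluster or a service outside it.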
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
subsets:
  - addresses:
      - ip: 192.0.2.42
    ports:
      - port: 9376
Service types
1. ClusterIP
kubectl expose pod nginx --type=ClusterIP --port=80 --name=ng-svc

apiVersion: v1
kind: Service
metadata:
  name: ng-svc
  namespace: default
spec:
  selector:
    name: nginx
  clusterIP: 11.254.0.2
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 1234
  sessionAffinity: None
  type: ClusterIP
2. LoadBalancer
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  clusterIP: 10.0.171.239
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
      - ip: 192.0.2.127
3. NodePort
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: MyApp
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30007
4. Headless Service
apiVersion: v1
kind: Service
metadata:
  labels:
    run: curl
  name: my-headless-service
  namespace: default
spec:
  clusterIP: None
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    run: curl
  type: ClusterIP
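Inside the cluster, the headless Service name resolves directly to the Pod's own IP rather than to a virtual ClusterIP:
# ping my-headless-service
PING my-headless-service (172.200.6.207): 56 data bytes
64 bytes from 172.200.6.207: seq=0 ttl=64 time=0.040 ms
64 bytes from 172.200.6.207: seq=1 ttl=64 time=0.063 ms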
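For a headless Service that does not define a selector, the endpoints controller does not create Endpoints records. The DNS system still looks up and configures either:
CNAME records, for ExternalName-type Services;
records for any Endpoints that share the Service's name, for all other Service types.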
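The DNS behaviour described above can be summarized as a small decision function. This is a simplified sketch under stated assumptions:

- The `Service` dataclass and `dns_answer` are hypothetical.
- Only A and CNAME answers are modeled; SRV records, IPv6 and readiness are ignored.
- `db.example.com` and the Pod IP 172.200.6.30 are made-up values.
- The other addresses come from the examples above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Service:
    # Hypothetical, simplified Service model used only for this illustration.
    name: str
    svc_type: str = "ClusterIP"         # ClusterIP / NodePort / LoadBalancer / ExternalName
    cluster_ip: Optional[str] = None    # the string "None" marks a headless Service
    external_name: Optional[str] = None
    endpoint_ips: List[str] = field(default_factory=list)


def dns_answer(svc: Service) -> List[Tuple[str, str]]:
    """Roughly what cluster DNS answers for the Service name (A/CNAME only)."""
    if svc.svc_type == "ExternalName":
        return [("CNAME", svc.external_name)]
    if svc.cluster_ip == "None":
        # Headless: one A record per endpoint, so clients talk to Pods directly.
        return [("A", ip) for ip in svc.endpoint_ips]
    # Every other Service: a single A record for the virtual ClusterIP.
    return [("A", svc.cluster_ip)]


headless = Service("my-headless-service", cluster_ip="None", endpoint_ips=["172.200.6.207"])
normal = Service("ng-svc", cluster_ip="11.254.0.2", endpoint_ips=["172.200.6.30"])
external = Service("db", svc_type="ExternalName", external_name="db.example.com")
for svc in (headless, normal, external):
    print(svc.name, dns_answer(svc))
```

The contrast between the first two cases is why the `ping` above shows a Pod IP. A normal ClusterIP Service would resolve to its virtual IP instead.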
How Services are implemented
1. Userspace proxy mode

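For each Service, kube-proxy opens a randomly chosen port (the proxy port) on the local node, and any connection to that port is proxied to a Pod. iptables rules capture requests arriving at Service:Port and redirect them to the proxy port. The proxy then opens a new connection to a Pod.
The drawback of this mode is the two transitions: traffic crosses from kernel space into user space, and is then forwarded from user space back through the kernel. Performance is therefore poor, and this mode is generally no longer used.
2. iptables mode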
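A toy version of the userspace idea shows where the extra cost comes from: accept on an OS-chosen port, pick a backend, open a second connection and copy bytes both ways. This is an illustration, not kube-proxy's code. The assumptions are:

- The iptables redirect step is left out; the client connects to the proxy port directly.
- The backends are local stand-in servers.
- Round robin is used as the selection policy.
- Requires Python 3.8+.

```python
import asyncio
import itertools


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def start_userspace_proxy(backends):
    """Listen on an OS-chosen local port and forward each new connection to the
    next backend in round-robin order. Returns (server, port)."""
    rr = itertools.cycle(backends)

    async def handle(client_reader, client_writer):
        host, port = next(rr)
        # Second hop: the proxy process opens its own connection to the chosen backend,
        # so every byte goes kernel -> user space -> kernel.
        backend_reader, backend_writer = await asyncio.open_connection(host, port)
        await asyncio.gather(pipe(client_reader, backend_writer),
                             pipe(backend_reader, client_writer))

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    return server, server.sockets[0].getsockname()[1]


async def demo():
    async def make_backend(tag):
        # Stand-in for a Pod: replies with its tag plus whatever it received.
        async def echo(reader, writer):
            data = await reader.read(1024)
            writer.write(tag + data)
            await writer.drain()
            writer.close()
        srv = await asyncio.start_server(echo, "127.0.0.1", 0)
        return srv, srv.sockets[0].getsockname()[1]

    (srv_a, port_a), (srv_b, port_b) = await make_backend(b"pod-a:"), await make_backend(b"pod-b:")
    proxy, proxy_port = await start_userspace_proxy([("127.0.0.1", port_a), ("127.0.0.1", port_b)])

    replies = []
    for _ in range(3):
        reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
        writer.write(b"hi")
        await writer.drain()
        replies.append(await reader.read())  # read until the proxy closes the connection
        writer.close()
        await writer.wait_closed()

    for srv in (proxy, srv_a, srv_b):
        srv.close()
    return replies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each client connection becomes two TCP connections plus a user-space copy loop. The iptables and IPVS modes avoid this by rewriting packets inside the kernel.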

3. IPVS mode

How Service is implemented with iptables
iptables tables, chains and processing flow

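In the nat table, Service traffic is redirected from the PREROUTING and OUTPUT chains to the KUBE-SERVICES chain. POSTROUTING jumps to KUBE-POSTROUTING, where packets marked by KUBE-MARK-MASQ are masqueraded.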
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
KUBE-SERVICES->KUBE-SVC-XXXXXXXXXXXXXXXX->KUBE-SEP-XXXXXXXXXXXXXXXX represents a ClusterIP service
KUBE-NODEPORTS->KUBE-SVC-XXXXXXXXXXXXXXXX->KUBE-SEP-XXXXXXXXXXXXXXXX represents a NodePort service
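The chain hops above can be modeled as a walk through nat-table chains. This is a simplified sketch with explicit assumptions:

- The chain names KUBE-SVC-CLUSTERIP, KUBE-SVC-NODEPORT and KUBE-SEP-A/B are made up; real suffixes are hashes.
- Each Service has a single endpoint.
- The node IP 192.168.1.10 is hypothetical.
- The jump from KUBE-SERVICES to KUBE-NODEPORTS is modeled on the `-m addrtype --dst-type LOCAL` rule that kube-proxy typically places at the end of KUBE-SERVICES.
- Unlike real iptables, a packet that falls off a chain here simply ends the walk; it does not return to the calling chain.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Packet = Dict[str, object]
Action = Tuple[str, str, int]                      # ("DNAT", ip, port)
Rule = Tuple[Callable[[Packet], bool], Union[str, Action]]

# Hypothetical, heavily simplified nat table; chain names are illustrative.
NAT: Dict[str, List[Rule]] = {
    "PREROUTING": [(lambda p: True, "KUBE-SERVICES")],
    "OUTPUT": [(lambda p: True, "KUBE-SERVICES")],
    "KUBE-SERVICES": [
        (lambda p: (p["dst"], p["dport"]) == ("10.100.160.92", 30080), "KUBE-SVC-CLUSTERIP"),
        # Modeled on the trailing "-m addrtype --dst-type LOCAL -j KUBE-NODEPORTS" rule.
        (lambda p: p["dst_is_local"], "KUBE-NODEPORTS"),
    ],
    "KUBE-NODEPORTS": [(lambda p: p["dport"] == 30081, "KUBE-SVC-NODEPORT")],
    "KUBE-SVC-CLUSTERIP": [(lambda p: True, "KUBE-SEP-A")],
    "KUBE-SVC-NODEPORT": [(lambda p: True, "KUBE-SEP-B")],
    "KUBE-SEP-A": [(lambda p: True, ("DNAT", "172.200.6.224", 80))],
    "KUBE-SEP-B": [(lambda p: True, ("DNAT", "172.200.6.225", 80))],
}


def traverse(packet: Packet, entry: str) -> Tuple[List[str], Optional[Action]]:
    """Follow jumps from the entry chain; return the chains visited and the DNAT action."""
    path, chain = [entry], entry
    while True:
        for predicate, target in NAT[chain]:
            if predicate(packet):
                break
        else:
            return path, None          # no rule matched: destination left unchanged
        if isinstance(target, tuple):
            return path, target        # reached a KUBE-SEP DNAT rule
        path.append(target)
        chain = target


to_cluster_ip = {"dst": "10.100.160.92", "dport": 30080, "dst_is_local": False}
to_node_port = {"dst": "192.168.1.10", "dport": 30081, "dst_is_local": True}
print(traverse(to_cluster_ip, "OUTPUT"))     # traffic generated on the node itself
print(traverse(to_node_port, "PREROUTING"))  # traffic arriving at the node
```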
How the different Service types appear when kube-proxy runs in iptables mode
ClusterIP
-A KUBE-SERVICES ! -s 172.200.0.0/16 -d 10.100.160.92/32 -p tcp -m comment --comment "default/ccs-gateway-clusterip:http cluster IP" -m tcp --dport 30080 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.100.160.92/32 -p tcp -m comment --comment "default/ccs-gateway-clusterip:http cluster IP" -m tcp --dport 30080 -j KUBE-SVC-76GERFBRR2RGHNBJ
-A KUBE-SVC-76GERFBRR2RGHNBJ -m comment --comment "default/ccs-gateway-clusterip:http" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-GBVECAZBIC3ZKMXB
-A KUBE-SVC-76GERFBRR2RGHNBJ -m comment --comment "default/ccs-gateway-clusterip:http" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-PVCYYXEU44D3IMGK
-A KUBE-SVC-76GERFBRR2RGHNBJ -m comment --comment "default/ccs-gateway-clusterip:http" -j KUBE-SEP-JECGZLHE32MEARRX
-A KUBE-SVC-CEZPIJSAUFW5MYPQ -m comment --comment "kubernetes-dashboard/kubernetes-dashboard" -j KUBE-SEP-QO6MV4HR5U56RP7M
-A KUBE-SEP-GBVECAZBIC3ZKMXB -s 172.200.6.224/32 -m comment --comment "default/ccs-gateway-clusterip:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-GBVECAZBIC3ZKMXB -p tcp -m comment --comment "default/ccs-gateway-clusterip:http" -m tcp -j DNAT --to-destination 172.200.6.224:80
...
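The three KUBE-SVC-76GERFBRR2RGHNBJ rules above are evaluated in order, and each statistic match makes an independent random draw. The simulation below replays those rules many times, with Python's `random` standing in for the kernel's random source. Each endpoint ends up with roughly one third of the connections, even though the written probabilities are 0.33, 0.5 and 1.

```python
import random
from collections import Counter

# Rules of KUBE-SVC-76GERFBRR2RGHNBJ above, in order: (probability, target).
# None means the rule has no statistic match, so it always matches.
SVC_RULES = [
    (0.33333333349, "KUBE-SEP-GBVECAZBIC3ZKMXB"),
    (0.50000000000, "KUBE-SEP-PVCYYXEU44D3IMGK"),
    (None, "KUBE-SEP-JECGZLHE32MEARRX"),
]
TRIALS = 300_000


def pick_endpoint(rules, rng):
    """Evaluate the rules top to bottom like iptables: each statistic match draws
    independently and the first matching rule wins."""
    for probability, target in rules:
        if probability is None or rng.random() < probability:
            return target
    return None  # only reachable if every rule carries a statistic match


rng = random.Random(42)  # fixed seed so the run is reproducible
counts = Counter(pick_endpoint(SVC_RULES, rng) for _ in range(TRIALS))
for target, n in sorted(counts.items()):
    print(f"{target}: {n / TRIALS:.3f}")
```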
NodePort
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ccs-gateway
spec:
  clusterIP: 10.101.156.39
  externalTrafficPolicy: Cluster
  ports:
    - name: http
      nodePort: 30081
      port: 30080
      protocol: TCP
      targetPort: 80
  selector:
    app: ccs-gateway
  sessionAffinity: None
  type: NodePort
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/ccs-gateway-service:http" -m tcp --dport 30081 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/ccs-gateway-service:http" -m tcp --dport 30081 -j KUBE-SVC-QYHRFFHL5VINYT2K
############################
-A KUBE-SVC-QYHRFFHL5VINYT2K -m comment --comment "default/ccs-gateway-service:http" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-2NPKETIWKKVUXGCL
-A KUBE-SVC-QYHRFFHL5VINYT2K -m comment --comment "default/ccs-gateway-service:http" -j KUBE-SEP-6O5FHQRN5IVNPW4Q
##########################
-A KUBE-SEP-2NPKETIWKKVUXGCL -s 172.200.6.224/32 -m comment --comment "default/ccs-gateway-service:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-2NPKETIWKKVUXGCL -p tcp -m comment --comment "default/ccs-gateway-service:http" -m tcp -j DNAT --to-destination 172.200.6.224:80
#########################
-A KUBE-SEP-6O5FHQRN5IVNPW4Q -s 172.200.6.225/32 -m comment --comment "default/ccs-gateway-service:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-6O5FHQRN5IVNPW4Q -p tcp -m comment --comment "default/ccs-gateway-service:http" -m tcp -j DNAT --to-destination 172.200.6.225:80
We can also see that kube-proxy listens on the Service's node port, 30081:
# netstat -ntlp | grep 30081
tcp        0      0 0.0.0.0:30081      0.0.0.0:*      LISTEN      3665705/kube-proxy
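That listening socket matters simply because a bound port cannot be bound again by another process. Below is a minimal illustration with plain sockets. The function names are made up, this is not how kube-proxy is written, and port 0 (an OS-chosen port) stands in for a real NodePort such as 30081.

```python
import errno
import socket


def hold_port(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind and listen on a TCP port and return the socket; while it stays open,
    other processes cannot bind the same address and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Try to grab the port briefly; EADDRINUSE means someone else holds it."""
    try:
        probe = hold_port(port, host)
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    probe.close()
    return True


if __name__ == "__main__":
    holder = hold_port(0, "127.0.0.1")   # 0: let the OS pick a free port for the demo
    port = holder.getsockname()[1]
    print(port, "free while held?", port_is_free(port, "127.0.0.1"))
    holder.close()
    print(port, "free after release?", port_is_free(port, "127.0.0.1"))
```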
LoadBalancer
Service without endpoints
kubectl create svc clusterip fake-endpoint --tcp=80

-A KUBE-SERVICES -d 10.101.117.0/32 -p tcp -m comment --comment "default/fake-endpoint:80 has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
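The difference between a Service with and without endpoints can be captured in a small rule generator. For TCP clients, the REJECT branch typically surfaces as an immediate "connection refused" rather than a timeout. This is a hypothetical sketch of the rule shapes shown in this section, not kube-proxy's implementation:

- `service_rules` and the `tag` chain suffixes are made up; real suffixes are hashes.
- KUBE-MARK-MASQ rules are omitted.
- The probability formatting differs slightly from what iptables-save prints.
- An external endpoint, as in the next example, is just another (ip, port) pair.

```python
from typing import List, Tuple


def service_rules(svc: str, cluster_ip: str, port: int,
                  endpoints: List[Tuple[str, int]], tag: str) -> List[str]:
    """Emit simplified iptables-save style nat rules for one TCP ClusterIP Service.

    svc is "namespace/name:port" as in the rule comments; tag is a made-up,
    readable chain suffix.
    """
    if not endpoints:
        # No endpoints: reject at once instead of letting the connection hang.
        return [f'-A KUBE-SERVICES -d {cluster_ip}/32 -p tcp -m comment '
                f'--comment "{svc} has no endpoints" -m tcp --dport {port} '
                f'-j REJECT --reject-with icmp-port-unreachable']

    svc_chain = f"KUBE-SVC-{tag}"
    rules = [f'-A KUBE-SERVICES -d {cluster_ip}/32 -p tcp -m comment '
             f'--comment "{svc} cluster IP" -m tcp --dport {port} -j {svc_chain}']
    sep_rules = []
    n = len(endpoints)
    for i, (ip, ep_port) in enumerate(endpoints):
        sep_chain = f"KUBE-SEP-{tag}-{i}"
        remaining = n - i
        # Rule i (0-based) matches with probability 1/(n-i); the last one always matches.
        stat = (f" -m statistic --mode random --probability {1 / remaining:.10f}"
                if remaining > 1 else "")
        rules.append(f'-A {svc_chain} -m comment --comment "{svc}"{stat} -j {sep_chain}')
        sep_rules.append(f'-A {sep_chain} -p tcp -m comment --comment "{svc}" '
                         f'-m tcp -j DNAT --to-destination {ip}:{ep_port}')
    return rules + sep_rules


for line in service_rules("default/fake-endpoint:80", "10.101.117.0", 80, [], "FAKE"):
    print(line)
```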
Service with an external endpoint
iptables rules forward the traffic directly to the external endpoint address.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: external
  name: external
  namespace: default
spec:
  ports:
    - name: http
      protocol: TCP
      port: 80
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: external
  name: external
  namespace: default
subsets:
  - addresses:
      - ip: 10.124.142.43
    ports:
      - name: http
        port: 80
        protocol: TCP
-A KUBE-SERVICES ! -s 172.200.0.0/16 -d 10.111.246.87/32 -p tcp -m comment --comment "default/external:http cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.111.246.87/32 -p tcp -m comment --comment "default/external:http cluster IP" -m tcp --dport 80 -j KUBE-SVC-LI2K5327B6J24KJ3
-A KUBE-SEP-QTGIPNOYXN2CZGD5 -s 10.124.142.43/32 -m comment --comment "default/external:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-QTGIPNOYXN2CZGD5 -p tcp -m comment --comment "default/external:http" -m tcp -j DNAT --to-destination 10.124.142.43:80
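Summary
ClusterIP: kube-proxy watches Services and Endpoints and creates rules that use DNAT to rewrite the destination to a Pod's IP and port. With multiple endpoints, traffic is spread according to a policy. In iptables mode the selection is random (the statistic match uses --mode random), not strict round robin. With 4 endpoints, for example, the four rules carry probabilities 0.25, 0.33, 0.5 and 1. Matched in order, they split the traffic evenly across all four.
NodePort: on top of the ClusterIP setup above, kube-proxy also listens on the node port, so that no other service can occupy it. It also adds KUBE-NODEPORTS iptables rules.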
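The probabilities in the ClusterIP summary follow a simple pattern that can be checked directly. With n endpoints, number the rules i = 1, ..., n, and let rule i match with probability p_i = 1/(n - i + 1), so the last rule has p_n = 1. A packet reaches rule i only if every earlier rule failed to match. The product telescopes, so every endpoint receives the same share:

```latex
P(\text{endpoint } i)
  = p_i \prod_{j=1}^{i-1} \left(1 - p_j\right)
  = \frac{1}{n-i+1} \prod_{j=1}^{i-1} \frac{n-j}{n-j+1}
  = \frac{1}{n-i+1} \cdot \frac{n-i+1}{n}
  = \frac{1}{n}
```

For n = 4 this gives p = 1/4, 1/3, 1/2 and 1, the 0.25, 0.33, 0.5 and 1 quoted in the summary.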
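Reposted from: twt Enterprise IT Community
(Copyright belongs to the original author; the repost will be removed on request in case of infringement.)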

