Multi-Cluster Monitoring with Prometheus and Thanos
Original article: https://particule.io/en/blog/thanos-monitoring/
Introduction
https://github.com/particuleio/teks/tree/main/terragrunt/live/thanos
https://github.com/particuleio/terraform-kubernetes-addons/tree/main/modules/aws
The Kubernetes Prometheus stack
Prometheus: collects metrics
Alertmanager: sends alerts to various providers based on metric queries
Grafana: visualizes the metrics in rich dashboards

Enter Thanos

Thanos Store
Thanos Sidecar
Thanos Query
Multi-cluster architecture

An observer cluster [3]
An observee cluster [4]
.
├── env_tags.yaml
├── eu-west-1
│   ├── clusters
│   │   └── observer
│   │       ├── eks
│   │       │   ├── kubeconfig
│   │       │   └── terragrunt.hcl
│   │       ├── eks-addons
│   │       │   └── terragrunt.hcl
│   │       └── vpc
│   │           └── terragrunt.hcl
│   └── region_values.yaml
└── eu-west-3
    ├── clusters
    │   └── observee
    │       ├── cluster_values.yaml
    │       ├── eks
    │       │   ├── kubeconfig
    │       │   └── terragrunt.hcl
    │       ├── eks-addons
    │       │   └── terragrunt.hcl
    │       └── vpc
    │           └── terragrunt.hcl
    └── region_values.yaml
Grafana is enabled
The Thanos sidecar uploads to a dedicated bucket
kube-prometheus-stack = {
  enabled                     = true
  allowed_cidrs               = dependency.vpc.outputs.private_subnets_cidr_blocks
  thanos_sidecar_enabled      = true
  thanos_bucket_force_destroy = true
  extra_values                = <<-EXTRA_VALUES
    grafana:
      deploymentStrategy:
        type: Recreate
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: nginx
          cert-manager.io/cluster-issuer: "letsencrypt"
        hosts:
          - grafana.${local.default_domain_suffix}
        tls:
          - secretName: grafana.${local.default_domain_suffix}
            hosts:
              - grafana.${local.default_domain_suffix}
      persistence:
        enabled: true
        storageClassName: ebs-sc
        accessModes:
          - ReadWriteOnce
        size: 1Gi
    prometheus:
      prometheusSpec:
        replicas: 1
        retention: 2d
        retentionSize: "10GB"
        ruleSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        podMonitorSelectorNilUsesHelmValues: false
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: ebs-sc
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi
    EXTRA_VALUES
}
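This CA will be trusted by the ingress in front of the sidecar on the observee clusters
TLS certificates are generated for the Thanos querier components that will query the observee clusters
All Thanos components are deployed:
Query Frontend, which serves as the datasource endpoint for Grafana
Store Gateway, deployed to query the observer bucket
Query, which runs queries against the store gateways and the other queriers
Thanos queriers configured with TLS query each observee cluster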
thanos-tls-querier = {
  "observee" = {
    enabled                 = true
    default_global_requests = true
    default_global_limits   = false
    stores = [
      "thanos-sidecar.${local.default_domain_suffix}:443"
    ]
  }
}

thanos-storegateway = {
  "observee" = {
    enabled                 = true
    default_global_requests = true
    default_global_limits   = false
    bucket                  = "thanos-store-pio-thanos-observee"
    region                  = "eu-west-3"
  }
}
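The Thanos sidecar uploads to an observee-specific bucket
The Thanos sidecar is published through an Ingress object with TLS client authentication that trusts the observer cluster CA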
kube-prometheus-stack = {
  enabled                     = true
  allowed_cidrs               = dependency.vpc.outputs.private_subnets_cidr_blocks
  thanos_sidecar_enabled      = true
  thanos_bucket_force_destroy = true
  extra_values                = <<-EXTRA_VALUES
    grafana:
      enabled: false
    prometheus:
      thanosIngress:
        enabled: true
        ingressClassName: nginx
        annotations:
          cert-manager.io/cluster-issuer: "letsencrypt"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
          nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
          nginx.ingress.kubernetes.io/auth-tls-secret: "monitoring/thanos-ca"
        hosts:
        - thanos-sidecar.${local.default_domain_suffix}
        paths:
        - /
        tls:
        - secretName: thanos-sidecar.${local.default_domain_suffix}
          hosts:
          - thanos-sidecar.${local.default_domain_suffix}
      prometheusSpec:
        replicas: 1
        retention: 2d
        retentionSize: "6GB"
        ruleSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        podMonitorSelectorNilUsesHelmValues: false
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: ebs-sc
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi
    EXTRA_VALUES
}
The Thanos compactor manages downsampling for this specific cluster
thanos = {
  enabled              = true
  bucket_force_destroy = true
  trusted_ca_content   = dependency.thanos-ca.outputs.thanos_ca
  extra_values         = <<-EXTRA_VALUES
    compactor:
      retentionResolution5m: 90d
    query:
      enabled: false
    queryFrontend:
      enabled: false
    storegateway:
      enabled: false
    EXTRA_VALUES
}
Digging a little deeper
kubectl -n monitoring get pods
NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          120m
kube-prometheus-stack-grafana-c8768466b-rd8wm               2/2     Running   0          120m
kube-prometheus-stack-kube-state-metrics-5cf575d8f8-x59rd   1/1     Running   0          120m
kube-prometheus-stack-operator-6856b9bb58-hdrb2             1/1     Running   0          119m
kube-prometheus-stack-prometheus-node-exporter-8hvmv        1/1     Running   0          117m
kube-prometheus-stack-prometheus-node-exporter-cwlfd        1/1     Running   0          120m
kube-prometheus-stack-prometheus-node-exporter-rsss5        1/1     Running   0          120m
kube-prometheus-stack-prometheus-node-exporter-rzgr9        1/1     Running   0          120m
prometheus-kube-prometheus-stack-prometheus-0               3/3     Running   1          120m
thanos-compactor-74784bd59d-vmvps                           1/1     Running   0          119m
thanos-query-7c74db546c-d7bp8                               1/1     Running   0          12m
thanos-query-7c74db546c-ndnx2                               1/1     Running   0          12m
thanos-query-frontend-5cbcb65b57-5sx8z                      1/1     Running   0          119m
thanos-query-frontend-5cbcb65b57-qjhxg                      1/1     Running   0          119m
thanos-storegateway-0                                       1/1     Running   0          119m
thanos-storegateway-1                                       1/1     Running   0          118m
thanos-storegateway-observee-storegateway-0                 1/1     Running   0          12m
thanos-storegateway-observee-storegateway-1                 1/1     Running   0          11m
thanos-tls-querier-observee-query-dfb9f79f9-4str8           1/1     Running   0          29m
thanos-tls-querier-observee-query-dfb9f79f9-xsq24           1/1     Running   0          29m
kubectl -n monitoring get ingress
NAME                            CLASS    HOSTS                                            ADDRESS                                                                         PORTS     AGE
kube-prometheus-stack-grafana   <none>   grafana.thanos.teks-tg.clusterfrak-dynamics.io   k8s-ingressn-ingressn-afa0a48374-f507283b6cd101c5.elb.eu-west-1.amazonaws.com   80, 443   123m
kubectl -n monitoring get pods
NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          39m
kube-prometheus-stack-kube-state-metrics-5cf575d8f8-ct292   1/1     Running   0          39m
kube-prometheus-stack-operator-6856b9bb58-4cngc             1/1     Running   0          39m
kube-prometheus-stack-prometheus-node-exporter-bs4wp        1/1     Running   0          39m
kube-prometheus-stack-prometheus-node-exporter-c57ss        1/1     Running   0          39m
kube-prometheus-stack-prometheus-node-exporter-cp5ch        1/1     Running   0          39m
kube-prometheus-stack-prometheus-node-exporter-tnqvq        1/1     Running   0          39m
kube-prometheus-stack-prometheus-node-exporter-z2p49        1/1     Running   0          39m
kube-prometheus-stack-prometheus-node-exporter-zzqp7        1/1     Running   0          39m
prometheus-kube-prometheus-stack-prometheus-0               3/3     Running   1          39m
thanos-compactor-7576dcbcfc-6pd4v                           1/1     Running   0          38m
kubectl -n monitoring get ingress
NAME                                   CLASS   HOSTS                                                   ADDRESS                                                                         PORTS     AGE
kube-prometheus-stack-thanos-gateway   nginx   thanos-sidecar.thanos.teks-tg.clusterfrak-dynamics.io   k8s-ingressn-ingressn-95903f6102-d2ce9013ac068b9e.elb.eu-west-3.amazonaws.com   80, 443   40m
kubectl -n monitoring logs -f thanos-tls-querier-observee-query-687dd88ff5-nzpdh
level=info ts=2021-02-23T15:37:35.692346206Z caller=storeset.go:387 component=storeset msg="adding new storeAPI to query storeset" address=thanos-sidecar.thanos.teks-tg.clusterfrak-dynamics.io:443 extLset="{cluster=\"pio-thanos-observee\", prometheus=\"monitoring/kube-prometheus-stack-prometheus\", prometheus_replica=\"prometheus-kube-prometheus-stack-prometheus-0\"}"
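The querier log uses the logfmt `key=value` layout, so the line above can be checked programmatically, for example to confirm which store address and which external labels were registered. The following is a minimal Python sketch and not part of the original setup: the helper names `parse_logfmt` and `external_labels` are hypothetical, and the regular expressions are only meant to cover the fields that appear in this log line.

```python
import re

# One key=value pair; the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
_LOGFMT_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# Log line copied from the querier output above.
SAMPLE_LINE = (
    r'level=info ts=2021-02-23T15:37:35.692346206Z caller=storeset.go:387 '
    r'component=storeset msg="adding new storeAPI to query storeset" '
    r'address=thanos-sidecar.thanos.teks-tg.clusterfrak-dynamics.io:443 '
    r'extLset="{cluster=\"pio-thanos-observee\", '
    r'prometheus=\"monitoring/kube-prometheus-stack-prometheus\", '
    r'prometheus_replica=\"prometheus-kube-prometheus-stack-prometheus-0\"}"'
)


def parse_logfmt(line):
    """Parse one logfmt line into a dict, unescaping quoted values."""
    fields = {}
    for key, raw in _LOGFMT_PAIR.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes, then resolve backslash escapes.
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        fields[key] = raw
    return fields


def external_labels(ext_lset):
    """Turn '{a="x", b="y"}' into {'a': 'x', 'b': 'y'}."""
    return dict(re.findall(r'(\w+)="([^"]*)"', ext_lset))


fields = parse_logfmt(SAMPLE_LINE)
print(fields["address"])
print(external_labels(fields["extLset"]))
```

Matching on the `cluster` external label rather than on the pod name keeps such a check stable across pod restarts.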
kubectl -n monitoring port-forward thanos-tls-querier-observee-query-687dd88ff5-nzpdh 10902

kubectl -n monitoring port-forward thanos-query-7c74db546c-d7bp8 10902

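The observer cluster's local Thanos, aggregated
Our store gateways (one for the remote observee cluster and one for the local observer cluster)
The local TLS querier, which can query the observee sidecar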
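The store list can also be cross-checked without the web UI. Thanos Query serves the Prometheus-compatible HTTP API, so with the port-forward to the `thanos-query` pod still running, an instant query grouped by the `cluster` external label should return one entry per cluster. The following is a minimal Python sketch under those assumptions: the function names and the `http://localhost:10902` base URL are illustrative, and it assumes every Prometheus sets a `cluster` external label, as the observee does in the querier log.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the kubectl port-forward to thanos-query is listening here.
THANOS_QUERY_URL = "http://localhost:10902"


def instant_query_url(base_url, promql):
    """Build the URL of a Prometheus-style instant query."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def values_per_cluster(payload):
    """Map each `cluster` label value to the sample value of an instant vector."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for sample in payload["data"]["result"]:
        cluster = sample["metric"].get("cluster", "<no cluster label>")
        # An instant-vector sample is [timestamp, "value as string"].
        values[cluster] = float(sample["value"][1])
    return values


def fetch_targets_per_cluster(base_url=THANOS_QUERY_URL):
    """Count scraped targets per cluster through the central querier."""
    url = instant_query_url(base_url, "count by (cluster) (up)")
    with urllib.request.urlopen(url, timeout=10) as response:
        return values_per_cluster(json.load(response))
```

Calling `fetch_targets_per_cluster()` while the port-forward is active should list the observee cluster (`pio-thanos-observee`) next to the observer; if a cluster is missing, the corresponding store is probably not reachable from the central querier.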
Visualization in Grafana

Conclusion