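Aggregated alerting in Prometheus with PrometheusAlert

Prometheus already ships the Alertmanager component, which provides several notification channels such as email and WeChat. So why use PrometheusAlert as well?
PrometheusAlert is an open-source message forwarding system for operations alert centers. It accepts alert messages from the mainstream monitoring systems Prometheus and Zabbix, the log system Graylog, and the data visualization system Grafana, and forwards them to DingTalk, WeChat, Huawei Cloud SMS, Tencent Cloud SMS, Tencent Cloud voice calls, Alibaba Cloud SMS, Alibaba Cloud voice calls, and other channels. Compared with Alertmanager's built-in notification channels, PrometheusAlert covers more channels and is easier to configure and debug. PrometheusAlert does not replace Alertmanager; it is used together with Alertmanager as a webhook receiver.
Installation
# To run in Kubernetes, just run the following command (note: the default deployment template does not mount the template database file db/PrometheusAlertDB.db; to avoid losing template data, add a mount for it yourself)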
wget https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/example/kubernetes/PrometheusAlert-Deployment.yaml
Note: the ConfigMap name referenced by the volume mounts in this file needs to be changed; see https://github.com/feiyu563/PrometheusAlert/issues (2021-01-26).
The modified file is as follows:
# apiVersion: v1
# kind: Namespace
# metadata:
#   name: monitoring
---
apiVersion: v1
data:
  app.conf: |
    #---------------------↓Global settings-----------------------
    appname = PrometheusAlert
    # Listening port
    httpport = 8080
    runmode = dev
    # Proxy setting, e.g. proxy = http://123.123.123.123:8080
    proxy =
    # Enable JSON request bodies
    copyrequestbody = true
    # Alert message title
    title=PrometheusAlert
    # Link to the alerting platform
    GraylogAlerturl=http://graylog.org
    # DingTalk alerts: logo URL for firing alerts
    logourl=https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png
    # DingTalk alerts: logo URL for resolved alerts
    rlogourl=https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png
    # SMS alert level (a level equal to 3 triggers an SMS). Levels: 0 info, 1 warning, 2 moderately serious, 3 serious, 4 disaster
    messagelevel=3
    # Phone alert level (a level equal to 4 triggers a voice call). Levels: 0 info, 1 warning, 2 moderately serious, 3 serious, 4 disaster
    phonecalllevel=4
    # Default number to call (required for testing SMS and phone calls from the web page)
    defaultphone=xxxxxxxx
    # Phone notification when an alert is resolved: 0 off, 1 on
    phonecallresolved=0
    # Automatic alert suppression: for each alert source, only the first alert of the highest level is sent and the rest are muted by default. This reduces the number of messages from the same source and prevents alert storms. 0 off, 1 on
    silent=0
    # Log output: file or console
    logtype=file
    # Log file path
    logpath=logs/prometheusalertcenter.log
    # Convert the time zone of Prometheus and Graylog alert messages to CST (do not enable if they are already in CST)
    prometheus_cst_time=0
    # Database driver: sqlite3 or mysql. For mysql, uncomment db_host, db_user, db_password and db_name
    db_driver=sqlite3
    #db_host=127.0.0.1:3306
    #db_user=root
    #db_password=root
    #db_name=prometheusalert
    #---------------------↓webhook-----------------------
    # Enable the DingTalk alert channel (several channels can be enabled at once): 0 off, 1 on
    open-dingding=1
    # Default DingTalk robot URL
    ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxx
    # Whether to @ everyone (0 off, 1 on)
    dd_isatall=1
    # Enable the WeChat alert channel (several channels can be enabled at once): 0 off, 1 on
    open-weixin=0
    # Default WeCom (enterprise WeChat) robot URL
    wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx
    # Enable the Feishu alert channel (several channels can be enabled at once): 0 off, 1 on
    open-feishu=0
    # Default Feishu robot URL
    fsurl=https://open.feishu.cn/open-apis/bot/hook/xxxxxxxxx
    #---------------------↓Tencent Cloud API-----------------------
    # Enable the Tencent Cloud SMS alert channel (several channels can be enabled at once): 0 off, 1 on
    open-txdx=0
    # Tencent Cloud SMS API key
    TXY_DX_appkey=xxxxx
    # Tencent Cloud SMS template ID. A template such as "prometheus alert: {1}" can be used as a reference
    TXY_DX_tpl_id=xxxxx
    # Tencent Cloud SMS SDK app ID
    TXY_DX_sdkappid=xxxxx
    # Tencent Cloud SMS signature; fill in the signature that was approved for your account
    TXY_DX_sign=腾讯云
    # Enable the Tencent Cloud phone alert channel (several channels can be enabled at once): 0 off, 1 on
    open-txdh=0
    # Tencent Cloud phone API key
    TXY_DH_phonecallappkey=xxxxx
    # Tencent Cloud phone template ID
    TXY_DH_phonecalltpl_id=xxxxx
    # Tencent Cloud phone SDK app ID
    TXY_DH_phonecallsdkappid=xxxxx
    #---------------------↓Huawei Cloud API-----------------------
    # Enable the Huawei Cloud SMS alert channel (several channels can be enabled at once): 0 off, 1 on
    open-hwdx=0
    # Huawei Cloud SMS API key
    HWY_DX_APP_Key=xxxxxxxxxxxxxxxxxxxxxx
    # Huawei Cloud SMS API secret
    HWY_DX_APP_Secret=xxxxxxxxxxxxxxxxxxxxxx
    # Huawei Cloud app access address (endpoint address including the port)
    HWY_DX_APP_Url=https://rtcsms.cn-north-1.myhuaweicloud.com:10743
    # Huawei Cloud SMS template ID
    HWY_DX_Templateid=xxxxxxxxxxxxxxxxxxxxxx
    # Huawei Cloud signature name; must be an approved signature that matches the template type. Fill in your own
    HWY_DX_Signature=华为云
    # Huawei Cloud signature channel number
    HWY_DX_Sender=xxxxxxxxxx
    #---------------------↓Alibaba Cloud API-----------------------
    # Enable the Alibaba Cloud SMS alert channel (several channels can be enabled at once): 0 off, 1 on
    open-alydx=0
    # AccessKey ID of the Alibaba Cloud SMS main account
    ALY_DX_AccessKeyId=xxxxxxxxxxxxxxxxxxxxxx
    # Alibaba Cloud SMS API secret
    ALY_DX_AccessSecret=xxxxxxxxxxxxxxxxxxxxxx
    # Alibaba Cloud SMS signature name
    ALY_DX_SignName=阿里云
    # Alibaba Cloud SMS template ID
    ALY_DX_Template=xxxxxxxxxxxxxxxxxxxxxx
    # Enable the Alibaba Cloud phone alert channel (several channels can be enabled at once): 0 off, 1 on
    open-alydh=0
    # AccessKey ID of the Alibaba Cloud phone main account
    ALY_DH_AccessKeyId=xxxxxxxxxxxxxxxxxxxxxx
    # Alibaba Cloud phone API secret
    ALY_DH_AccessSecret=xxxxxxxxxxxxxxxxxxxxxx
    # Caller ID shown to the callee for Alibaba Cloud calls; must be a purchased number
    ALY_DX_CalledShowNumber=xxxxxxxxx
    # Alibaba Cloud text-to-speech (TTS) template ID
    ALY_DH_TtsCode=xxxxxxxx
    #---------------------↓Ronglian Cloud API-----------------------
    # Enable the Ronglian Cloud phone alert channel (several channels can be enabled at once): 0 off, 1 on
    RLY_DH_open-rlydh=0
    # Ronglian Cloud base API URL
    RLY_URL=https://app.cloopen.com:8883/2013-12-26/Accounts/
    # Ronglian Cloud console SID
    RLY_ACCOUNT_SID=xxxxxxxxxxx
    # Ronglian Cloud API token
    RLY_ACCOUNT_TOKEN=xxxxxxxxxx
    # Ronglian Cloud app_id
    RLY_APP_ID=xxxxxxxxxxxxx
    #---------------------↓Mail settings-----------------------
    # Enable mail
    open-email=0
    # Outgoing mail server address
    Email_host=smtp.qq.com
    # Outgoing mail server port
    Email_port=465
    # Mail account
    Email_user=xxxxxxx@qq.com
    # Mail password
    Email_password=xxxxxx
    # Mail subject
    Email_title=Ops Alert
    # Default recipient addresses
    Default_emails=xxxxx@qq.com,xxxxx@qq.com
    #---------------------↓7moor Cloud API-----------------------
    # Enable the 7moor SMS alert channel (several channels can be enabled at once): 0 off, 1 on
    open-7moordx=0
    # 7moor account ID
    7MOOR_ACCOUNT_ID=Nxxx
    # 7moor account API secret
    7MOOR_ACCOUNT_APISECRET=xxx
    # 7moor SMS template number
    7MOOR_DX_TEMPLATENUM=n
    # Note: 7moor SMS uses only one variable, var1, which is hard-coded.
    #-----------
    # Enable the 7moor webcall voice notification channel (several channels can be enabled at once): 0 off, 1 on
    open-7moordh=0
    # Add a virtual service number and a text node on the 7moor platform first
    # Virtual service number for 7moor webcall
    7MOOR_WEBCALL_SERVICENO=xxx
    # Variable replaced in the text node; "text" is used here. Change this setting if your variable has a different name
    7MOOR_WEBCALL_VOICE_VAR=text
    #---------------------↓Telegram API-----------------------
    # Enable the Telegram alert channel (several channels can be enabled at once): 0 off, 1 on
    open-tg=0
    # Telegram bot token
    TG_TOKEN=xxxxx
    # Telegram message mode, personal or channel: 0 off (push to a user), 1 on (push to a channel)
    TG_MODE_CHAN=0
    # Telegram user ID
    TG_USERID=xxxxx
    # Telegram channel name
    TG_CHANNAME=xxxxx
    # Telegram API address; can be set to a proxy address
    #TG_API_PROXY="https://api.telegram.org/bot%s/%s"
    #---------------------↓workwechat API-----------------------
    # Enable the workwechat alert channel (several channels can be enabled at once): 0 off, 1 on
    open-workwechat=0
    # Corp ID
    WorkWechat_CropID=xxxxx
    # Agent (application) ID
    WorkWechat_AgentID=xxxx
    # Agent (application) secret
    WorkWechat_AgentSecret=xxxx
    # Recipient users
    WorkWechat_ToUser="zhangsan|lisi"
    # Recipient departments
    WorkWechat_ToParty="ops|dev"
    # Recipient tags
    WorkWechat_ToTag=""
    # Message type; only markdown is supported for now
    # WorkWechat_Msgtype = "markdown"
  user.csv: |
    2019年4月10日,15888888881,Xiao Zhang,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
    2019年4月11日,15888888882,Xiao Li,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
    2019年4月12日,15888888883,Xiao Wang,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
    2019年4月13日,15888888884,Xiao Song,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
kind: ConfigMap
metadata:
  name: prometheus-alert-center-conf
  namespace: monitoring  # must be the same namespace the Deployment and Service end up in
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-alert-center
    alertname: prometheus-alert-center
  name: prometheus-alert-center
  # namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-alert-center
      alertname: prometheus-alert-center
  template:
    metadata:
      labels:
        app: prometheus-alert-center
        alertname: prometheus-alert-center
    spec:
      containers:
      - image: feiyu563/prometheus-alert
        name: prometheus-alert-center
        env:
        - name: TZ
          value: "Asia/Shanghai"
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: prometheus-alert-center-conf-map
          mountPath: /app/conf/app.conf
          subPath: app.conf
        - name: prometheus-alert-center-conf-map
          mountPath: /app/user.csv
          subPath: user.csv
      volumes:
      - name: prometheus-alert-center-conf-map
        configMap:
          name: prometheus-alert-center-conf
          items:
          - key: app.conf
            path: app.conf
          - key: user.csv
            path: user.csv
---
apiVersion: v1
kind: Service
metadata:
  labels:
    alertname: prometheus-alert-center
  name: prometheus-alert-center
  # namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8080'
spec:
  ports:
  - name: http
    port: 8080
    targetPort: http
  selector:
    app: prometheus-alert-center
---
# apiVersion: networking.k8s.io/v1beta1
# kind: Ingress
# metadata:
#   annotations:
#     kubernetes.io/ingress.class: nginx
#   name: prometheus-alert-center
#   namespace: monitoring
# spec:
#   rules:
#   - host: alert-center.local
#     http:
#       paths:
#       - backend:
#           serviceName: prometheus-alert-center
#           servicePort: 8080
#         path: /
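The comments in the file are detailed. One more thing is worth pointing out: PrometheusAlert can send alerts to different phone numbers depending on the date, and if an alert fails or the person called does not answer, it falls back to the default contact, defaultphone. Just create a user.csv file and place it in the program's working directory; it is loaded automatically.
The phone callback interface also requires this file: if the callback returns a non-zero status, the next number in the file is dialed, so be sure to create the file if you enable callbacks.
PS: phone and SMS alerts from Grafana and Graylog currently depend on this file. Phone and SMS alerts from Prometheus read the number from the rule's mobile field first; if no number is configured there, user.csv is used; and if user.csv has no number either, the alert goes straight to defaultphone.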
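The lookup order for Prometheus phone and SMS alerts described above can be pictured with a small Python sketch. This is not PrometheusAlert's own code; the function name `resolve_phone` and its parameters are hypothetical and only mirror the three sources named in the text (the rule's mobile field, user.csv, and defaultphone).

```python
from typing import Optional


def resolve_phone(rule_mobile: Optional[str],
                  csv_phone: Optional[str],
                  default_phone: str) -> str:
    """Pick the alert number in the order the text describes.

    rule_mobile:   the `mobile` value from the Prometheus rule, if any
    csv_phone:     the number found in user.csv, if any
    default_phone: the `defaultphone` value from app.conf
    """
    # Try the rule first, then user.csv; skip empty or blank values.
    for candidate in (rule_mobile, csv_phone):
        if candidate and candidate.strip():
            return candidate.strip()
    # Nothing configured anywhere else: fall back to defaultphone.
    return default_phone
```

For example, `resolve_phone(None, "", "13000000000")` falls through both empty sources and returns the default number.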
On-duty rotation in user.csv switches at 10:00 a.m. each day by default:
2019年4月10日,15888888881,Xiao Zhang,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
2019年4月11日,15888888882,Xiao Li,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
2019年4月12日,15888888883,Xiao Wang,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
2019年4月13日,15888888884,Xiao Song,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
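To make the rotation concrete, here is a rough Python sketch of how a row might be chosen for a given moment. It is an illustration, not PrometheusAlert's implementation. It assumes that before the 10:00 switch the previous day's row still applies, and that the columns after the date alternate between number and name. The helper names `duty_row` and `call_order` are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta
from typing import List, Optional

USER_CSV = """\
2019年4月10日,15888888881,Xiao Zhang,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
2019年4月11日,15888888882,Xiao Li,15999999999,backup contact Xiao Chen,15999999998,backup contact Xiao Zhao
"""


def duty_row(csv_text: str, now: datetime, switch_hour: int = 10) -> Optional[List[str]]:
    """Return the user.csv row in effect at `now`, or None if no row matches."""
    # Assumption: before the switch hour, the previous day's row is still on duty.
    effective = now if now.hour >= switch_hour else now - timedelta(days=1)
    # Dates in user.csv are written like 2019年4月10日 (no zero padding).
    key = f"{effective.year}年{effective.month}月{effective.day}日"
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip() == key:
            return row
    return None


def call_order(row: List[str]) -> List[str]:
    """Numbers to try in order: primary contact first, then the backups."""
    # After the date column, fields alternate: number, name, number, name, ...
    return row[1::2]
```

With this data, an alert at 09:59 on 11 April would still go to 15888888881, while one at 10:00 would go to 15888888882 and then to the two backup numbers.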
First change the notification channel addresses in the ConfigMap, then run kubectl apply -f PrometheusAlert-Deployment.yaml
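Prometheus configuration
Prometheus can be set up in two ways; use either one or combine both.
Using Prometheus rules
With this approach, the notification channel details defined in the alerting rules are used.
First configure a webhook in Alertmanager, using the following template as a reference: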
global:
  resolve_timeout: 5m
route:
  group_by: ['instance']
  group_wait: 10m
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'web.hook.prometheusalert'
receivers:
- name: 'web.hook.prometheusalert'
  webhook_configs:
  - url: 'http://prometheus-alert-center:8080/prometheus/alert'
    send_resolved: true
The alerting rules on the Prometheus server can follow this template:
groups:
- name: node_alert
  rules:
  - alert: Host CPU alert
    expr: node_load1 > 1
    labels:
      name: prometheusalertcenter
      level: 3  # alert level: 0 info, 1 warning, 2 moderately serious, 3 serious, 4 disaster
    annotations:
      description: "{{ $labels.instance }} CPU load is too high"  # alert message
      mobile: 15888888881,15888888882,15888888883  # target phone numbers (the SMS and phone alert levels must be configured)
      ddurl: "https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # several DingTalk robots can be listed, separated by commas; if empty or omitted, the DingTalk robot URL from the config file is used
      fsurl: "https://open.feishu.cn/open-apis/bot/hook/xxxxxxxxx,https://open.feishu.cn/open-apis/bot/hook/xxxxxxxxx"  # several Feishu robots can be listed, separated by commas; if empty or omitted, the Feishu robot URL from the config file is used
      wxurl: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxxx-xxxxxx-xxxxxx-xxxxxx,https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx-xxxx-xxxxxxx-xxxxx"  # several WeCom robots can be listed, separated by commas; if empty or omitted, the WeCom robot URL from the config file is used
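To exercise the webhook without waiting for a real alert, one option is to post a hand-built message in Alertmanager's webhook JSON format to the /prometheus/alert endpoint. The Python sketch below builds such a payload with labels and annotations that mirror the rule template. The instance value, the externalURL and generatorURL hosts, and the helper names `build_payload` and `send` are placeholders invented for illustration; the Service address is the one used in the Alertmanager template.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Service address taken from the Alertmanager webhook template.
ALERT_URL = "http://prometheus-alert-center:8080/prometheus/alert"


def build_payload(instance: str, level: str = "3") -> dict:
    """Build a firing alert in Alertmanager's webhook format (version 4)."""
    # Alertmanager always sends label and annotation values as strings.
    labels = {
        "alertname": "Host CPU alert",
        "instance": instance,
        "name": "prometheusalertcenter",
        "level": level,
    }
    annotations = {
        "description": f"{instance} CPU load is too high",
        "mobile": "15888888881",
    }
    return {
        "version": "4",
        "groupKey": '{}:{instance="%s"}' % instance,
        "status": "firing",
        "receiver": "web.hook.prometheusalert",
        "groupLabels": {"instance": instance},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example:9093",  # placeholder host
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": datetime.now(timezone.utc).isoformat(),
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://prometheus.example:9090/graph",  # placeholder host
        }],
    }


def send(payload: dict, url: str = ALERT_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send(build_payload("10.0.0.1:9100"))` from a pod that can resolve the Service name should then produce a test notification on whichever channels are enabled.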
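Using the Prometheus Alertmanager router
The /prometheus/router endpoint lets each Alertmanager route choose its own recipient: the target of an alert is given directly in the URL. Three parameters are listed as supported, wxurl, ddurl and phone (phone is used for SMS and voice-call alerts); the template below also passes fsurl for Feishu.
Configure the webhooks in Prometheus Alertmanager, using the following template as a reference: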
global:
  resolve_timeout: 5m
route:
  group_by: ['instance']
  group_wait: 10m
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'web.hook.prometheusalert'
  routes:
  - receiver: 'prometheusalert-weixin'
    group_wait: 10s
    match:
      level: '1'
  - receiver: 'prometheusalert-dingding'
    group_wait: 10s
    match:
      level: '2'
  - receiver: 'prometheusalert-feishu'
    group_wait: 10s
    match:
      level: '3'
  - receiver: 'prometheusalert-all'
    group_wait: 10s
    match:
      level: '4'
receivers:
- name: 'web.hook.prometheusalert'
  webhook_configs:
  - url: 'http://[prometheusalert_url]:8080/prometheus/alert'
- name: 'prometheusalert-weixin'
  webhook_configs:
  - url: 'http://[prometheusalert_url]:8080/prometheus/router?wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx'
- name: 'prometheusalert-dingding'
  webhook_configs:
  - url: 'http://[prometheusalert_url]:8080/prometheus/router?ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxx'
- name: 'prometheusalert-feishu'
  webhook_configs:
  - url: 'http://[prometheusalert_url]:8080/prometheus/router?fsurl=https://open.feishu.cn/open-apis/bot/hook/xxxxxxxxx'
- name: 'prometheusalert-all'
  webhook_configs:
  - url: 'http://[prometheusalert_url]:8080/prometheus/router?wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxx&phone=15395105573'
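The router URLs in this template nest one URL inside another, so the robot URL's own ? and = characters share the query string with the outer parameters. The template passes them unescaped. If a robot URL ever contains an & of its own, percent-encoding each value is one way to keep the parameters apart. The Python sketch below assumes PrometheusAlert decodes query values in the standard way, which is not confirmed here; `router_url` and `decode_targets` are hypothetical helper names, and the base address is the Service name from the first template.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def router_url(base: str, **targets: str) -> str:
    """Build a /prometheus/router URL with percent-encoded target parameters."""
    # urlencode escapes ':', '/', '?', '=' and '&' inside each value.
    query = urlencode({name: value for name, value in targets.items() if value})
    return f"{base}/prometheus/router?{query}"


def decode_targets(url: str) -> dict:
    """Parse the query string back, as a standards-compliant server would."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}


url = router_url(
    "http://prometheus-alert-center:8080",
    wxurl="https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx",
    ddurl="https://oapi.dingtalk.com/robot/send?access_token=xxxxx",
    phone="15395105573",
)
print(url)
```

The printed URL can be pasted into a webhook_configs entry; `decode_targets(url)` returns the three original values unchanged, which is a quick way to check the encoding before deploying.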
The resulting alert looks like this:

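Debugging with the dashboard
The Ingress resource in PrometheusAlert-Deployment.yaml is commented out by default; uncomment it if you need it for debugging. Note that the commented block uses networking.k8s.io/v1beta1, which Kubernetes 1.22 and later no longer serve.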

On the dashboard page, click test to switch to the test page, then click the alert test button to check whether the configured notification channels can deliver alert messages.


We then receive a test alert message like the following:

Original link: https://www.lishuai.fun/2021/01/27/prometheusalert/