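ceph-exporter does not expose an "up"-style metric describing its own state, so when ceph-exporter itself goes down there is no way to notice right away. If the Ceph cluster becomes unhealthy during that window, the alert will not arrive in time. To cover this gap, a script periodically calls the Ceph API to fetch the current cluster status.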
- A Python script can call the Ceph dashboard API at regular intervals to fetch the health status, and send notifications through a DingTalk robot.
Script
# %%
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import requests
import json
auth_path = "/api/auth"
logout_path = "/api/auth/logout"
health_path = "/api/health/full"
ding_webhook = "https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxx"
# %%
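def dingtalk(message):
    """Send a markdown message to the DingTalk robot webhook."""
    header = {"Content-Type": "application/json"}
    data = {
        "msgtype": "markdown",
        "markdown": {
            "title": "Ceph cluster unhealthy",
            "text": message
        },
        "at": {
            "isAtAll": False
        }
    }
    # A timeout keeps a stuck webhook call from hanging the whole check
    res = requests.post(ding_webhook, headers=header, data=json.dumps(data), timeout=10)
    return res.json()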
# %%
class Ceph(object):
    def __init__(self, host, username, password):
        self.host = host
        self.username = username
        self.password = password
        header = {
            "Content-Type": "application/json",
            "Accept": "application/vnd.ceph.api.v1.0+json"
        }
        data = {
            "username": self.username,
            "password": self.password
        }
        # Log in to the dashboard API and obtain a bearer token
        r = requests.post(self.host + auth_path, headers=header, data=json.dumps(data), timeout=10)
        r.raise_for_status()  # fail loudly on a rejected login instead of using a None token
        token = r.json().get("token")
        authorization = f"Bearer {token}"
        self.op_header = {
            "Authorization": authorization
        }

    def healthCheck(self):
        # Return the overall cluster status string, e.g. "HEALTH_OK"
        r = requests.get(self.host + health_path, headers=self.op_header, timeout=10)
        r.raise_for_status()
        return r.json()["health"]["status"]

    def close(self):
        # Log out so the token does not stay valid after the check
        r = requests.post(self.host + logout_path, headers=self.op_header, timeout=10)
        return r.json()
# %%
host = "http://dashboard.ohops.com"
username = "ohmyuser"
password = "xxxxxx"
ceph = Ceph(host, username, password)
status = ceph.healthCheck()
ceph.close()
print(status)
# Anything other than HEALTH_OK triggers a DingTalk notification
if status != "HEALTH_OK":
    message = f"### Ceph cluster unhealthy \n\n > Current status: **<font color='#FF0000'>{status}</font>**\n\n[View details](http://grafana.ohops.com/d/xxxxxx)"
    send_msg = dingtalk(message)
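The script above performs a single check, while the text describes running it on a schedule. The scheduling is not shown in the original, so what follows is only a minimal sketch of one way it might be done. The names `should_alert`, `run_periodically`, `check_once` and `notify` are hypothetical: `check_once` is assumed to wrap the login, `healthCheck()` and `close()` steps and return the status string, and `notify` is assumed to wrap `dingtalk()`. The sketch also treats a failed request as an alert condition, on the assumption that an unreachable dashboard is worth a notification too.

```python
import time


def should_alert(status):
    """Return True for any status other than HEALTH_OK."""
    return status != "HEALTH_OK"


def run_periodically(check_once, notify, interval_seconds=60,
                     max_runs=None, sleep=time.sleep):
    """Call check_once() repeatedly and notify on unhealthy results.

    check_once and notify are hypothetical callables supplied by the caller.
    max_runs=None loops forever; a number stops after that many checks.
    Returns the number of notifications sent.
    """
    runs = 0
    alerts = 0
    while max_runs is None or runs < max_runs:
        try:
            status = check_once()
        except Exception as exc:
            # Assumption: a failed API call is reported like an unhealthy cluster
            status = f"UNREACHABLE ({exc.__class__.__name__})"
        if should_alert(status):
            notify(status)
            alerts += 1
        runs += 1
        # Skip the sleep after the final run
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return alerts
```

A system scheduler such as cron would work just as well as an in-process loop. The loop is shown here only because it keeps the example in one language and makes the unreachable-API case explicit.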
Result