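1. Download blackbox-exporter. Downloading from the official site is slow, so the archive can be fetched through a mirror as shown below.
My files are stored under: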
/opt/module/blackbox-exporter/
[root@ambari-hadoop2 blackbox-exporter]# wget https://git.xfj0.cn/https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz --no-check-certificate
--2023-12-25 22:43:23-- https://git.xfj0.cn/https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz
Resolving git.xfj0.cn (git.xfj0.cn)... 172.67.180.13, 104.21.96.120, 2606:4700:3037::ac43:b40d, ...
Connecting to git.xfj0.cn (git.xfj0.cn)|172.67.180.13|:443... connected.
WARNING: cannot verify git.xfj0.cn's certificate, issued by ‘/C=US/O=Let's Encrypt/CN=E1’:
  Issued certificate has expired.
HTTP request sent, awaiting response... 200 OK
Length: 10956196 (10M) [application/octet-stream]
Saving to: ‘blackbox_exporter-0.24.0.linux-amd64.tar.gz’
100%[==============================================================================================================>] 10,956,196 2.03MB/s in 5.1s
2023-12-25 22:43:29 (2.03 MB/s) - ‘blackbox_exporter-0.24.0.linux-amd64.tar.gz’ saved [10956196/10956196]
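Because the archive was fetched through a third-party mirror with --no-check-certificate, and the mirror's certificate had expired, it is worth comparing the file's SHA-256 digest with the value published alongside the official GitHub release (Prometheus projects normally attach a sha256sums.txt file). A minimal sketch: verify_sha256 is a hypothetical helper name, and the expected digest is deliberately left as a placeholder that must be copied from the official release page.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HEX, 1 otherwise.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
    # An empty digest means the file could not be read.
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder, copy it from the official release):
# verify_sha256 blackbox_exporter-0.24.0.linux-amd64.tar.gz "<digest from sha256sums.txt>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

If the check fails, the archive should be deleted and downloaded again rather than unpacked.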
Check the result:
[root@ambari-hadoop2 blackbox-exporter]# ll
total 10700
-rw-r--r-- 1 root root 10956196 May 16 2023 blackbox_exporter-0.24.0.linux-amd64.tar.gz
Extract the archive:
[root@ambari-hadoop2 blackbox-exporter]# tar -zxvf blackbox_exporter-0.24.0.linux-amd64.tar.gz
blackbox_exporter-0.24.0.linux-amd64/
blackbox_exporter-0.24.0.linux-amd64/blackbox.yml
blackbox_exporter-0.24.0.linux-amd64/NOTICE
blackbox_exporter-0.24.0.linux-amd64/blackbox_exporter
blackbox_exporter-0.24.0.linux-amd64/LICENSE
Check the result:
[root@ambari-hadoop2 blackbox-exporter]# ll
total 10700
drwxr-xr-x 2 1001 1002 80 May 16 2023 blackbox_exporter-0.24.0.linux-amd64
-rw-r--r-- 1 root root 10956196 May 16 2023 blackbox_exporter-0.24.0.linux-amd64.tar.gz
Delete the archive and rename the extracted directory:
[root@ambari-hadoop2 blackbox-exporter]# rm -rf blackbox_exporter-0.24.0.linux-amd64.tar.gz
[root@ambari-hadoop2 blackbox-exporter]# mv blackbox_exporter-0.24.0.linux-amd64/ blackbox_exporter-0.24
Switch to the systemd unit directory:
[root@ambari-hadoop2 blackbox_exporter-0.24]# cd /usr/lib/systemd/system/
Create the systemd service unit:
[root@ambari-hadoop2 system]# vim blackbox-exporter.service
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/opt/module/blackbox-exporter/blackbox_exporter-0.24/blackbox_exporter --config.file=/opt/module/blackbox-exporter/blackbox_exporter-0.24/blackbox.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
Here /opt/module/blackbox-exporter/blackbox_exporter-0.24 is the blackbox_exporter installation path on this machine; adjust it to match your own path.
[root@ambari-hadoop2 system]# systemctl enable blackbox-exporter
Created symlink from /etc/systemd/system/multi-user.target.wants/blackbox-exporter.service to /usr/lib/systemd/system/blackbox-exporter.service.
[root@ambari-hadoop2 system]# systemctl daemon-reload
[root@ambari-hadoop2 system]# systemctl start blackbox-exporter
Check the status:
[root@ambari-hadoop2 system]# systemctl status blackbox-exporter
● blackbox-exporter.service - Blackbox Exporter
   Loaded: loaded (/usr/lib/systemd/system/blackbox-exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-12-25 23:24:28 CST; 15s ago
 Main PID: 6830 (blackbox_export)
    Tasks: 7
   Memory: 1.9M
   CGroup: /system.slice/blackbox-exporter.service
           └─6830 /opt/module/blackbox-exporter/blackbox_exporter-0.24/blackbox_exporter --config.file=/opt/module/blackbox-exporter/blackbox_exporter-0.24/blackbox.yml
Dec 25 23:24:28 ambari-hadoop2 systemd[1]: Started Blackbox Exporter.
Dec 25 23:24:28 ambari-hadoop2 blackbox_exporter[6830]: ts=2023-12-25T15:24:28.744Z caller=main.go:78 level=info msg="Starting blackbox_exporter" version="(version=0.24.0,...56011dff)"
Dec 25 23:24:28 ambari-hadoop2 blackbox_exporter[6830]: ts=2023-12-25T15:24:28.744Z caller=main.go:79 level=info build_context="(go=go1.20.4, platform=linux/amd64, user=ro...gs=netgo)"
Dec 25 23:24:28 ambari-hadoop2 blackbox_exporter[6830]: ts=2023-12-25T15:24:28.745Z caller=main.go:91 level=info msg="Loaded config file"
Dec 25 23:24:28 ambari-hadoop2 blackbox_exporter[6830]: ts=2023-12-25T15:24:28.745Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9115
Dec 25 23:24:28 ambari-hadoop2 blackbox_exporter[6830]: ts=2023-12-25T15:24:28.745Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9115
Hint: Some lines were ellipsized, use -l to show in full.
Visit http://<IP>:9115 in a browser to check that the exporter is up.
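Besides opening the page in a browser, the exporter can be queried directly on its /probe endpoint; the probe_success metric in the response is 1 when the probe succeeded and 0 otherwise. A sketch, assuming the exporter listens on localhost:9115 and that an http_2xx module is defined in blackbox.yml. probe_success_value is a hypothetical helper, and the sample input below is hand-written for illustration, not real exporter output.

```shell
#!/usr/bin/env bash
# Read Prometheus text-format metrics on stdin and print the value of
# the probe_success sample (1 = probe succeeded, 0 = probe failed).
probe_success_value() {
    awk '$1 == "probe_success" { print $2 }'
}

# Offline demonstration with a hand-written sample (not real exporter output).
printf '%s\n' \
    '# TYPE probe_success gauge' \
    'probe_success 1' | probe_success_value

# Against a live exporter (address and target are assumptions):
# curl -s 'http://localhost:9115/probe?module=http_2xx&target=www.baidu.com' | probe_success_value
```

Appending &debug=true to the probe URL makes the exporter return its probe log as well, which helps when probe_success is 0.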
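On the node where blackbox-exporter is installed, edit the configuration file blackbox.yml (afterwards, run systemctl restart blackbox-exporter so that the changes take effect):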
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions:
        - "HTTP/1.1"
        - "HTTP/2.0"
      valid_status_codes: []  # Defaults to 2xx
      enable_http2: false
      method: GET
      no_follow_redirects: false
      # When fail_if_ssl is true, the probe fails if the site uses SSL and succeeds otherwise;
      # fail_if_not_ssl is the exact opposite.
      fail_if_ssl: false
      fail_if_not_ssl: false
      # fail_if_body_matches_regexp, fail_if_body_not_matches_regexp, fail_if_header_matches, fail_if_header_not_matches
      # define sets of regular expressions used to check whether the HTTP response matches or does not match them.
      fail_if_body_matches_regexp:
        - "Could not connect to database"
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4"  # defaults to "ip6"
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
        - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
    timeout: 5s
    icmp:
      ttl: 5
Edit the Prometheus configuration file:
vim /opt/module/prometheus/prometheus-2.34.0.linux-amd64/prometheus.yml
# Add the following job under scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    static_configs:
      - targets:
          - 10.0.0.6
          - www.google.com
          - www.baidu.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "192.168.0.22:9115"  # Set this to the IP and port of your own Blackbox exporter.
      - target_label: region
        replacement: "remote"
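The relabeling rules in this job move each listed target into the target URL parameter, copy it into the instance label, and then point the actual scrape at the Blackbox exporter. In effect, Prometheus requests a URL of the following shape for every target. A sketch that builds the same URL by hand: probe_url is a hypothetical helper, the addresses are the ones from this configuration, and no URL encoding is done because the targets here are plain host names.

```shell
#!/usr/bin/env bash
# probe_url EXPORTER MODULE TARGET
# Print the URL that results from the relabeling: the scrape goes to the
# exporter's /probe path, and the original target travels as a URL parameter.
probe_url() {
    local exporter="$1" module="$2" target="$3"
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# One URL per entry in static_configs.targets.
for t in 10.0.0.6 www.google.com www.baidu.com; do
    probe_url "192.168.0.22:9115" "http_2xx" "$t"
done
```

Fetching one of the printed URLs with curl is a quick way to confirm that the exporter can reach a target before Prometheus starts scraping it.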
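Restart the Prometheus service:
systemctl restart prometheus
Visit the IP and port of the node where blackbox-exporter is installed.
Visit the IP address and port of the node where Prometheus is installed.
Create alerting rules
On the Prometheus node, edit the Prometheus configuration file; my path is: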
vim /opt/module/prometheus/prometheus-2.34.0.linux-amd64/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.0.21:9093
          # - alertmanager:9093
          # - rules/alert-rules-*.yml
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - rules/alert-rules-*.yml
  # Path to the alerting rules. Change it to suit your setup, or keep this default path, in which case the rule files must later be created under this path inside the Prometheus directory.
  # - "first_rules.yml"
  # - "second_rules.yml"
Note: this is the top part of prometheus.yml. Some of it needs no changes; only add the entries that are missing.
[root@ambari-hadoop1 prometheus-2.34.0.linux-amd64]# pwd
/opt/module/prometheus/prometheus-2.34.0.linux-amd64
In this directory, run mkdir rules
cd rules
Create the file: touch alert-rules-blackbox-exporter.yml
Because the pattern configured in prometheus.yml is alert-rules-*.yml, the file created here will be picked up.
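The match can be checked with an ordinary shell pattern, which for this simple * wildcard behaves the same way as the glob under rule_files. A sketch: matches_rule_glob is a hypothetical helper, and blackbox-rules.yml is an invented counter-example name.

```shell
#!/usr/bin/env bash
# Return 0 when a file name matches the alert-rules-*.yml pattern
# used under rule_files in prometheus.yml, 1 otherwise.
matches_rule_glob() {
    case "$1" in
        alert-rules-*.yml) return 0 ;;
        *) return 1 ;;
    esac
}

for f in alert-rules-blackbox-exporter.yml blackbox-rules.yml; do
    if matches_rule_glob "$f"; then
        echo "$f: loaded"
    else
        echo "$f: ignored"
    fi
done
```

A file whose name does not follow the pattern is silently skipped, so a rule that never shows up in the Prometheus UI is often just a naming mismatch.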
vim alert-rules-blackbox-exporter.yml
groups:
  - name: blackbox
    rules:
      # Blackbox probe failed
      - alert: BlackboxProbeFailed
        expr: probe_success == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Blackbox probe failed (instance {{ $labels.instance }})
          description: "Probe failed\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      # Blackbox slow probe
      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[1m]) > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Blackbox slow probe (instance {{ $labels.instance }})
          description: "Blackbox probe took more than 1s to complete\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      # Blackbox probe HTTP failure
      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Blackbox probe HTTP failure (instance {{ $labels.instance }})
          description: "HTTP status code is not 200-399\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      # Blackbox probe slow HTTP
      - alert: BlackboxProbeSlowHttp
        expr: avg_over_time(probe_http_duration_seconds[1m]) > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Blackbox probe slow HTTP (instance {{ $labels.instance }})
          description: "HTTP request took more than 1s\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      # Blackbox probe slow ping
      - alert: BlackboxProbeSlowPing
        expr: avg_over_time(probe_icmp_duration_seconds[1m]) > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Blackbox probe slow ping (instance {{ $labels.instance }})
          description: "Blackbox ping took more than 1s\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
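The BlackboxProbeHttpFailure condition fires for any status code outside the 200-399 range. The same comparison, written out in shell as a sketch; http_code_alerts is a hypothetical helper that only mirrors the arithmetic of the PromQL expression and does not query Prometheus.

```shell
#!/usr/bin/env bash
# Return 0 (the alert would fire) when the code is <= 199 or >= 400,
# mirroring the PromQL condition; return 1 for codes 200-399.
http_code_alerts() {
    local code="$1"
    [ "$code" -le 199 ] || [ "$code" -ge 400 ]
}

for code in 200 301 404 503; do
    if http_code_alerts "$code"; then
        echo "$code: alert"
    else
        echo "$code: ok"
    fi
done
```

Before restarting Prometheus, the rule file itself can be validated with promtool check rules alert-rules-blackbox-exporter.yml, assuming the promtool binary from the Prometheus tarball is available.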
After saving, restart the Prometheus service:
[root@ambari-hadoop1 rules]# systemctl restart prometheus
Check the result in the Prometheus web UI.