Consul Cluster Deployment

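  Scenario
  1. Download the latest release package
  2. Unpack and install
  3. Create the configuration and data directories
  4. Create the configuration files
  5. systemd service
  6. Common commands
  7. Web UI access
  Known issue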

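Scenario

I use Consul for two things:

  1. Service registry: Prometheus discovers its targets automatically through Consul; combined with a CMDB system, new hosts can be enrolled into monitoring automatically.
  2. Configuration center: Prometheus rules are stored in Consul as key/value pairs. confd watches them in real time and renders them through templates into rule files, so Prometheus alerting rules are managed from a single interface.

This post covers deployment only. When I have time I will write up how the automated monitoring system was built (I did that at my previous company, quite a while ago).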

Hostname                      Role            Internal IP     Config dir      Data dir
shb-manager-mw-consul-node01  server, client  192.168.63.217  /etc/consul.d/  /data/consul
shb-manager-mw-consul-node02  server, client  192.168.63.218  /etc/consul.d/  /data/consul
shb-manager-mw-consul-node03  server, client  192.168.63.219  /etc/consul.d/  /data/consul

Alibaba Cloud SLB:
Internal load balancer
192.168.63.205:8500
Backend server group:
192.168.63.217:8500
192.168.63.218:8500
192.168.63.219:8500

1. Download the latest release package

On all hosts:

cd /opt
wget -c https://releases.hashicorp.com/consul/1.10.3/consul_1.10.3_linux_amd64.zip
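
Before unpacking, it is worth checking the archive against the checksum list that HashiCorp publishes next to each release. The following is a minimal sketch, assuming GNU coreutils (sha256sum) and absolute paths for both arguments; the function name verify_release is my own and not part of any tool.

```shell
# Verify a downloaded archive against a SHA256SUMS file.
# Usage: verify_release <absolute-archive-path> <absolute-sums-path>
verify_release() {
  archive=$1
  sums=$2
  (
    # sha256sum -c resolves file names relative to the current directory,
    # so run the check from the directory that holds the archive.
    cd "$(dirname "$archive")" || exit 1
    # Keep only the checksum line for this archive and let sha256sum check it.
    grep " $(basename "$archive")\$" "$sums" | sha256sum -c - >/dev/null 2>&1
  )
}

# Example (not executed here):
#   cd /opt
#   wget -c https://releases.hashicorp.com/consul/1.10.3/consul_1.10.3_SHA256SUMS
#   verify_release /opt/consul_1.10.3_linux_amd64.zip /opt/consul_1.10.3_SHA256SUMS \
#     && echo "checksum OK"
```

The function returns non-zero both when the checksum does not match and when the archive is not listed in the sums file at all.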

2. Unpack and install

On all hosts:

unzip -q consul_1.10.3_linux_amd64.zip
mv consul /usr/local/bin/
consul -autocomplete-install
complete -C /usr/local/bin/consul consul

3. Create the configuration and data directories

On all hosts:

mkdir -p /etc/consul.d/ /data/consul
useradd --system --home /etc/consul.d --shell /bin/false consul
chown --recursive consul:consul /data/consul /etc/consul.d

4. Create the configuration files

Run on node01 to generate the gossip encryption key (the encrypt value):

consul keygen
mVINgJxtdZGy8SMlKYNFFbaBEjVSHChlVPlLjvfjnII=
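
Since the same key has to be pasted into the configuration on every host, a quick sanity check can catch a truncated copy. A minimal sketch, assuming GNU coreutils base64 and a 32-byte key as produced by consul keygen; the helper name is_valid_gossip_key is hypothetical.

```shell
# Return 0 if the argument base64-decodes to exactly 32 bytes.
is_valid_gossip_key() {
  key_len=$(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c)
  [ "$key_len" -eq 32 ]
}

if is_valid_gossip_key "mVINgJxtdZGy8SMlKYNFFbaBEjVSHChlVPlLjvfjnII="; then
  echo "key length OK"
else
  echo "key is not 32 bytes after decoding" >&2
fi
```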

Run on all hosts, changing node_name to each host's own value:

cat << EOF > /etc/consul.d/consul.hcl
datacenter = "aliyun"
node_name = "node01"
data_dir = "/data/consul"
enable_syslog = true
log_level = "INFO"
retry_join = ["192.168.63.217", "192.168.63.218", "192.168.63.219"]
start_join = ["192.168.63.217", "192.168.63.218", "192.168.63.219"]
retry_interval = "30s"
rejoin_after_leave = true
client_addr = "0.0.0.0"
bind_addr = "0.0.0.0"
encrypt = "mVINgJxtdZGy8SMlKYNFFbaBEjVSHChlVPlLjvfjnII="
performance {
  raft_multiplier = 1
}
limits {
  http_max_conns_per_client = 2000
}
EOF
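
Editing node_name by hand on every host is easy to get wrong. As a rough sketch, the name can be derived from the hostname, assuming the naming scheme from the table above (the node name is the part after the last "-"). The helper node_name_from_hostname is my own invention for illustration.

```shell
# Derive the Consul node name from a hostname such as
# shb-manager-mw-consul-node01 by keeping the part after the last "-".
node_name_from_hostname() {
  printf '%s\n' "${1##*-}"
}

node_name_from_hostname "shb-manager-mw-consul-node02"

# Example (not executed here): patch the file written above, then let
# Consul check the whole configuration directory.
#   name=$(node_name_from_hostname "$(hostname)")
#   sed -i "s/^node_name = .*/node_name = \"$name\"/" /etc/consul.d/consul.hcl
#   consul validate /etc/consul.d/
```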

Run on all hosts:

cat << EOF > /etc/consul.d/server.hcl
server = true
bootstrap_expect = 3
ui = true
EOF
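
The value bootstrap_expect = 3 matches the three servers in the table. Consul servers use Raft, which needs a majority of servers to be available, so the arithmetic below shows why three servers tolerate one failure. This is only an illustration of the majority rule, not something Consul asks you to run.

```shell
# Majority needed for a Raft cluster of N servers.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that may fail while a majority remains.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For this cluster that means quorum 2 and one tolerated failure; a fourth server would raise the quorum to 3 without tolerating any additional failure.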

5. systemd service

cat << EOF > /etc/systemd/system/consul.service
[Unit]
Description="HashiCorp Consul - A service mesh solution"
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl

[Service]
User=consul
Group=consul
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/usr/local/bin/consul reload
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable consul
systemctl start consul
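
After starting the service on all three hosts, the cluster is usable once a leader has been elected. The sketch below polls the /v1/status/leader endpoint, which returns the leader address as a JSON string and an empty string while there is no leader. The function names, the retry count and the 2-second pause are my own assumptions.

```shell
# Return 0 if a /v1/status/leader response names a leader,
# i.e. looks like "ip:port" including the surrounding quotes.
leader_elected() {
  case "$1" in
    '"'*:*'"') return 0 ;;
    *) return 1 ;;
  esac
}

# Poll an agent until a leader shows up or the attempts run out.
# Usage: wait_for_leader <host:port> [attempts]
wait_for_leader() {
  agent=$1
  attempts=${2:-30}
  n=0
  while [ "$n" -lt "$attempts" ]; do
    resp=$(curl -s --max-time 2 "http://$agent/v1/status/leader" || true)
    if leader_elected "$resp"; then
      echo "leader: $resp"
      return 0
    fi
    n=$((n + 1))
    sleep 2
  done
  echo "no leader after $attempts attempts" >&2
  return 1
}

# Example (not executed here):
#   wait_for_leader 192.168.63.217:8500
```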

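6. Common commands

# Cluster membership
consul members

# List the Raft peers (port 8500 is the HTTP API; 8300 is the server RPC port and does not answer HTTP)
curl "192.168.63.217:8500/v1/status/peers"

# Get the current leader
curl "192.168.63.217:8500/v1/status/leader"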

# Register a service
cat << EOF > node_exporter.json
{
  "ID": "node-exporter-192.168.1.36",
  "Name": "node-exporter-master-36",
  "Tags": [
    "aliyun",
    "bendihua",
    "linux"
  ],
  "Address": "192.168.1.36",
  "Port": 9100,
  "Meta": {
    "env": "test",
    "hostname": "master-36",
    "instance": "192.168.1.36",
    "os": "centos7"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://192.168.1.36:9100/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}
EOF

curl --request PUT --data @node_exporter.json http://192.168.63.205:8500/v1/agent/service/register
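
When many hosts need to be registered, writing one JSON file per host by hand gets tedious. A minimal sketch that generates a reduced version of the payload above from a few parameters; the function name make_node_exporter_json and its argument order are hypothetical, and the tags are trimmed to a single example.

```shell
# Print a node_exporter registration payload.
# Usage: make_node_exporter_json <ip> <host-label> [env]
make_node_exporter_json() {
  ip=$1
  host_label=$2
  env_name=${3:-test}
  cat <<EOF
{
  "ID": "node-exporter-$ip",
  "Name": "node-exporter-$host_label",
  "Tags": ["linux"],
  "Address": "$ip",
  "Port": 9100,
  "Meta": {"env": "$env_name", "hostname": "$host_label", "instance": "$ip"},
  "Check": {"HTTP": "http://$ip:9100/metrics", "Interval": "10s"}
}
EOF
}

make_node_exporter_json 192.168.1.36 master-36

# Example (not executed here): pipe the payload straight into the register call.
#   make_node_exporter_json 192.168.1.36 master-36 \
#     | curl --request PUT --data @- http://192.168.63.205:8500/v1/agent/service/register
```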


# Deregister a service
curl -X PUT http://192.168.63.205:8500/v1/agent/service/deregister/node-exporter-192.168.1.36

# Updating a service works the same way as registering it: edit the same JSON file and PUT it again.

# List all service IDs
curl http://192.168.63.205:8500/v1/health/state/any | python -m json.tool | grep ServiceID | awk '{print $2}' |sed 's/"//g' | sed 's/,//g'
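
The pipeline above depends on python pretty-printing the JSON one field per line, and it also emits empty lines for node-level checks whose ServiceID is empty. As an alternative sketch using only grep, sed and sort, assuming each check object carries a "ServiceID" field as in the output above (extract_service_ids is a name I made up):

```shell
# Read /v1/health/state/<state> JSON on stdin and print the unique,
# non-empty ServiceID values, one per line.
extract_service_ids() {
  grep -o '"ServiceID": *"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | grep -v '^$' \
    | sort -u
}

# Example (not executed here):
#   curl -s http://192.168.63.205:8500/v1/health/state/any | extract_service_ids
```

It works on both compact and pretty-printed JSON, but it is still text matching rather than real JSON parsing, so a proper JSON tool is the safer choice where one is installed.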

# Batch-deregister unhealthy (critical) services whose ID contains "jvm"
servicelist=$(curl http://192.168.63.205:8500/v1/health/state/critical | python -m json.tool | grep ServiceID | awk '{print $2}' |sed 's/"//g' | sed 's/,//g'|grep jvm)
for i in $servicelist; do
  echo "$i"
  curl -X PUT "http://192.168.63.205:8500/v1/agent/service/deregister/${i}"
done

# Batch-deregister services in any state whose ID contains "jvm"
servicelist=$(curl http://192.168.63.205:8500/v1/health/state/any | python -m json.tool | grep ServiceID | awk '{print $2}' |sed 's/"//g' | sed 's/,//g'|grep jvm)
for i in $servicelist; do
  echo "$i"
  curl -X PUT "http://192.168.63.205:8500/v1/agent/service/deregister/${i}"
done
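
The agent deregister endpoint only removes a service from the agent that receives the request, so a loop that goes through the SLB can land on an agent that does not hold the registration. One possible workaround, sketched here under assumptions, is to call the owning node directly. The node-to-IP mapping is copied from the host table in this post and assumes node_name values node01 to node03; the function names are hypothetical, and the default is a dry run that only prints the command.

```shell
# Map a Consul node name to its agent address (assumed from the host table).
node_addr() {
  case "$1" in
    node01) echo 192.168.63.217 ;;
    node02) echo 192.168.63.218 ;;
    node03) echo 192.168.63.219 ;;
    *) return 1 ;;
  esac
}

# Deregister a service on the agent that owns it.
# Usage: deregister_on_owner <node-name> <service-id>
# Set DRY_RUN=0 to actually send the request.
deregister_on_owner() {
  owner=$1
  sid=$2
  addr=$(node_addr "$owner") || { echo "unknown node: $owner" >&2; return 1; }
  url="http://$addr:8500/v1/agent/service/deregister/$sid"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "curl -s -X PUT $url"
  else
    curl -s -X PUT "$url"
  fi
}

deregister_on_owner node02 jvm-example
```

The owning node for each service is reported in the "Node" field of the /v1/health/state output.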


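# List service names from the CLI (filtered to "jvm")
consul catalog services | grep jvm > /tmp/jvm-list

# Batch-deregister from that list via the local agent.
# Note: consul catalog services prints service names, while the deregister
# endpoint expects service IDs, so this only works when the two are identical.
for i in $(cat /tmp/jvm-list); do
  echo "$i"
  curl -X PUT "http://127.0.0.1:8500/v1/agent/service/deregister/${i}"
done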

7. Web UI access

http://192.168.63.205:8500/

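Known issue

Registering and deregistering services through the agent API has one problem: because an SLB sits in front of the three agent nodes, the registered services end up scattered across those nodes. I later found that services can also be registered through the catalog API; see: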
https://blog.51cto.com/l0vesql/2489813
https://edgar615.github.io/consul-service-register.html
https://www.cnblogs.com/Qing-840/p/10144184.html
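
For reference, a catalog registration is a PUT to /v1/catalog/register with the node and the service in one payload. The sketch below only builds that payload; the function name, the argument order and the node name "external-monitoring" are my own placeholders. I have not verified this setup in production. As far as I understand, health checks registered through the catalog are not executed by any agent, and a service attached to a real agent's node name may be removed again by that agent's sync, so a dedicated node name seems safer.

```shell
# Print a /v1/catalog/register payload.
# Usage: make_catalog_payload <node> <node-ip> <service-id> <service-name> <service-ip> <port>
make_catalog_payload() {
  node=$1
  node_ip=$2
  sid=$3
  sname=$4
  sip=$5
  sport=$6
  cat <<EOF
{
  "Node": "$node",
  "Address": "$node_ip",
  "Service": {
    "ID": "$sid",
    "Service": "$sname",
    "Address": "$sip",
    "Port": $sport
  }
}
EOF
}

make_catalog_payload external-monitoring 192.168.1.36 \
  node-exporter-192.168.1.36 node-exporter-master-36 192.168.1.36 9100

# Example (not executed here):
#   make_catalog_payload ... | curl --request PUT --data @- \
#     http://192.168.63.205:8500/v1/catalog/register
```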


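Please credit the source when reposting. You are welcome to check the sources cited in this post and to point out any errors or unclear wording by email: chinaops666@gmail.com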