Installation
Official: https://developer.hashicorp.com/consul/install
Reference: https://gitcode.csdn.net/65ed701d1a836825ed798f7a.html
Quick start (dev mode)
wget https://releases.hashicorp.com/consul/1.18.0/consul_1.18.0_linux_amd64.zip
unzip consul_1.18.0_linux_amd64.zip
sudo mv consul /usr/bin/
# Start
consul agent -dev -bind=10.0.0.1 -http-port=8500
Cluster start
# Start 3 servers
consul agent -server -bootstrap-expect=3 -node=sv-v -data-dir=/tmp/consul -bind=10.0.0.1 -client=0.0.0.0 -datacenter=dc -ui
consul agent -server -bootstrap-expect=3 -node=sv-v2 -data-dir=/tmp/consul -bind=10.0.0.32 -client=0.0.0.0 -datacenter=dc -ui -retry-join=10.0.0.1
consul agent -server -bootstrap-expect=3 -node=ps1 -data-dir=/tmp/consul -bind=10.0.0.2 -client=0.0.0.0 -datacenter=dc -ui -retry-join=10.0.0.1
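In cluster mode each agent runs as a server, and -client=0.0.0.0 makes its client interfaces (HTTP API, DNS, UI) listen on all addresses. Use consul members to list the cluster members from the command line. The UI is at the bind address on port 8500 (for example http://10.0.0.1:8500). Because the cluster has three servers, if any one of them goes offline Consul is still reachable through the others.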
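A client can use the same failover. The minimal sketch below assumes the three server addresses from the commands above and the default HTTP port 8500. It asks each server in turn for the current Raft leader through the HTTP endpoint /v1/status/leader and returns the first answer, so one server being offline does not break the caller. The helper names are hypothetical.

```python
import json
import urllib.request

# Server addresses from the cluster commands above; 8500 is the default HTTP API port
SERVERS = ["http://10.0.0.1:8500", "http://10.0.0.32:8500", "http://10.0.0.2:8500"]


def fetch_leader(http_addr: str, timeout: float = 2.0) -> str:
    """Ask one server for the current Raft leader (GET /v1/status/leader)."""
    with urllib.request.urlopen(f"{http_addr}/v1/status/leader", timeout=timeout) as resp:
        return json.loads(resp.read())


def first_reachable(addrs, fetch=fetch_leader):
    """Return (addr, leader) from the first server that answers."""
    errors = {}
    for addr in addrs:
        try:
            return addr, fetch(addr)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            errors[addr] = exc
    raise RuntimeError(f"no Consul server reachable: {errors}")
```

Calling first_reachable(SERVERS) returns the first live server's address together with the leader it reports (a server RPC address such as 10.0.0.1:8300). Which server is the leader depends on the cluster.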
Production deployment
https://developer.hashicorp.com/consul/tutorials/production-deploy/deployment-guide
Starting from a configuration file
Command-line flag reference: https://developer.hashicorp.com/consul/docs/agent/config/cli-flags#_config_file
consul agent -config-file=/home/mrh/Sync/sv-v-config/consul/sv-v-server.json
About the JSON configuration file:
https://developer.hashicorp.com/consul/docs/agent/config/config-files
Example configuration file:
{
  "datacenter": "dc",
  "data_dir": "/home/mrh/Sync/sv-v-config/consul/data",
  "log_level": "INFO",
  "node_name": "sv-v",
  "server": true,
  "bootstrap_expect": 3,
  "bind_addr": "10.0.0.1",
  "ui_config": {
    "enabled": true
  },
  "client_addr": "10.0.0.1",
  "connect": {
    "enabled": true
  },
  "acl": {
    "enabled": true,
    "default_policy": "allow",
    "enable_token_persistence": true
  },
  "retry_join": ["sv-v2", "ps1", "sv-v"],
  "watches": [
    {
      "type": "checks",
      "handler": ""
    }
  ]
}
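The three servers differ in only a few fields, so one way to keep their files in sync is to generate them from shared settings. This is a minimal sketch. It assumes the node names and addresses from the cluster commands above, and it assumes each node's client_addr equals its bind_addr, as in this example. The acl and watches blocks are left out for brevity, and the helper names are hypothetical.

```python
import copy
import json

# Settings shared by all three servers (taken from the example above)
BASE = {
    "datacenter": "dc",
    "log_level": "INFO",
    "server": True,
    "bootstrap_expect": 3,
    "ui_config": {"enabled": True},
    "connect": {"enabled": True},
    "retry_join": ["sv-v2", "ps1", "sv-v"],
}

# node_name -> bind address, from the cluster commands earlier in these notes
NODES = {"sv-v": "10.0.0.1", "sv-v2": "10.0.0.32", "ps1": "10.0.0.2"}


def node_config(node_name: str, bind_addr: str, data_dir: str) -> dict:
    """Return the full configuration for one server without mutating BASE."""
    cfg = copy.deepcopy(BASE)
    cfg.update(
        node_name=node_name,
        bind_addr=bind_addr,
        client_addr=bind_addr,  # assumption: HTTP/DNS listen on the node's own address
        data_dir=data_dir,
    )
    return cfg


def render_all(data_dir: str) -> dict:
    """Map each node name to the JSON text of its configuration file."""
    return {
        name: json.dumps(node_config(name, addr, data_dir), indent=2)
        for name, addr in NODES.items()
    }
```

Each string can then be written to a file such as sv-v2-server.json and passed to consul agent -config-file=... on the matching host.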
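Once client_addr is set, the UI and the HTTP API are only reachable at that address. For example:
# Correct
consul members -http-addr=http://10.0.0.1:8500
# Wrong: without -http-addr the CLI uses the default http://127.0.0.1:8500, but sv-v-server.json sets "client_addr": "10.0.0.1", so nothing listens on the loopback address
consul members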
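The Consul CLI also reads the address from the CONSUL_HTTP_ADDR environment variable and the token from CONSUL_HTTP_TOKEN. Exporting CONSUL_HTTP_ADDR=http://10.0.0.1:8500 therefore saves typing -http-addr on every command. A Python client can follow the same convention. The sketch below only builds a urllib request, and the helper name is hypothetical.

```python
import os
import urllib.request
from typing import Optional


def consul_request(path: str, http_addr: Optional[str] = None,
                   token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request to the Consul HTTP API, defaulting the way the CLI does."""
    addr = http_addr or os.environ.get("CONSUL_HTTP_ADDR", "http://127.0.0.1:8500")
    if "://" not in addr:
        # CONSUL_HTTP_ADDR may be given without a scheme, e.g. "10.0.0.1:8500"
        addr = "http://" + addr
    token = token or os.environ.get("CONSUL_HTTP_TOKEN")
    headers = {"X-Consul-Token": token} if token else {}
    return urllib.request.Request(addr.rstrip("/") + path, headers=headers)
```

With CONSUL_HTTP_ADDR=http://10.0.0.1:8500 exported, urllib.request.urlopen(consul_request("/v1/agent/members")) reaches the address the agent actually listens on.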
ACL token access control
The start command consul agent -config-file=/home/mrh/Sync/sv-v-config/consul/sv-v-server.json loads sv-v-server.json, which enables ACLs with:
"acl": {
  "enabled": true,
  "default_policy": "allow",
  "enable_token_persistence": true
},
Note, however, that "client_addr": "10.0.0.1" is an address in the 10.x private network. In my tests only hosts on that network could reach the agent; requests via 192.168.2.31 failed.
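"default_policy": "allow" is set at first so that the management token can be created without anything else being blocked. Once the management token exists, you can change it to "default_policy": "deny" and use the management token for all further operations.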
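Once the management token exists, switching to "deny" is a one-field change in every server's config file. This is a minimal sketch assuming the JSON layout of sv-v-server.json above, and the helper names are hypothetical. Restart each agent afterwards (for example with sudo systemctl restart consul) so the new policy is picked up.

```python
import json
from pathlib import Path


def with_default_policy(cfg: dict, policy: str) -> dict:
    """Return a copy of an agent config with acl.default_policy set to "allow" or "deny"."""
    if policy not in ("allow", "deny"):
        raise ValueError(f"unsupported default_policy: {policy!r}")
    acl = dict(cfg.get("acl", {}), default_policy=policy)  # keep the other acl fields
    return {**cfg, "acl": acl}


def update_config_file(path: str, policy: str) -> None:
    """Rewrite a JSON config file in place; the agent must be restarted afterwards."""
    p = Path(path)
    p.write_text(json.dumps(with_default_policy(json.loads(p.read_text()), policy), indent=2))
```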
Generating ACL tokens
# Bootstrap the ACL system; the token it creates is an unrestricted management token
consul acl bootstrap -http-addr=http://10.0.0.1:8500
AccessorID: 229b0b4e-748e-0859-48a0-b3b89bce4a5f
SecretID: b69317df-62be-59a3-9d98-21743514ed9d
Description: Bootstrap Token (Global Management)
Local: false
Create Time: 2024-03-20 19:53:46.124615905 +0800 CST
Policies:
00000000-0000-0000-0000-000000000001 - global-management
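View the policies attached to the token at http://10.0.0.1:8500/ui/dc/acls/policies.
The bootstrap management token has very broad permissions, so applications should not depend on it. Instead, open http://10.0.0.1:8500, go to Policies, click Create (top right), and add a policy used only for service registration and discovery: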
acl = "write"
service_prefix "" {
  policy = "write"
  intentions = "write"
}
key_prefix "" {
  policy = "write"
}
Then create a new token and attach this policy to it:
http://10.0.0.1:8500/ui/dc/acls/tokens
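Both steps can also be done through the HTTP API, with the management SecretID in the X-Consul-Token header. PUT /v1/acl/policy creates the policy, and PUT /v1/acl/token creates a token linked to it by name. The minimal sketch below only builds the requests. The policy name service-registration and the helper names are assumptions, and RULES repeats the policy above.

```python
import json
import urllib.request

HTTP_ADDR = "http://10.0.0.1:8500"
POLICY_NAME = "service-registration"  # hypothetical policy name

# The rules from the policy above
RULES = """
acl = "write"
service_prefix "" {
  policy = "write"
  intentions = "write"
}
key_prefix "" {
  policy = "write"
}
"""


def acl_put(path: str, body: dict, mgmt_token: str) -> urllib.request.Request:
    """Build an authenticated PUT request with a JSON body."""
    return urllib.request.Request(
        HTTP_ADDR + path,
        data=json.dumps(body).encode(),
        headers={"X-Consul-Token": mgmt_token, "Content-Type": "application/json"},
        method="PUT",
    )


def policy_request(mgmt_token: str) -> urllib.request.Request:
    return acl_put("/v1/acl/policy", {"Name": POLICY_NAME, "Rules": RULES}, mgmt_token)


def token_request(mgmt_token: str) -> urllib.request.Request:
    body = {"Description": "service registration and discovery",
            "Policies": [{"Name": POLICY_NAME}]}
    return acl_put("/v1/acl/token", body, mgmt_token)


def send(req: urllib.request.Request) -> dict:
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())

# send(policy_request(secret)); new_secret = send(token_request(secret))["SecretID"]
```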
Registering services
Define the services in a service.json file:
https://developer.hashicorp.com/consul/docs/services/usage/define-services#define-multiple-services-in-a-single-file
https://developer.hashicorp.com/consul/commands/services/register
# Register the services
consul services register -token=a657c158-d4ac-459d-cd7c-8379a2a7e0ae -http-addr=http://10.0.0.1:8500 /home/mrh/Sync/sv-v-config/consul/config/service.json
# Deregister the services
consul services deregister -token=a657c158-d4ac-459d-cd7c-8379a2a7e0ae -http-addr=http://10.0.0.1:8500 /home/mrh/Sync/sv-v-config/consul/config/service.json
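The CLI commands wrap the agent HTTP API, so an application can also register itself. PUT /v1/agent/service/register takes one service definition, and PUT /v1/agent/service/deregister/<id> removes it. The minimal sketch below builds those requests from an entry written in the lowercase service.json style, such as the entries in the health-check example. Its key mapping covers only the fields used in these notes and is an assumption, and the helper names are hypothetical.

```python
import json
import urllib.request

HTTP_ADDR = "http://10.0.0.1:8500"

# service.json key -> field name in the agent API payload (only keys used in these notes)
KEYS = {
    "id": "ID", "name": "Name", "address": "Address", "port": "Port", "check": "Check",
    "http": "HTTP", "grpc": "GRPC", "method": "Method", "interval": "Interval", "timeout": "Timeout",
}


def to_agent_payload(defn):
    """Recursively rename keys of a service definition; other values are kept as-is."""
    if isinstance(defn, dict):
        return {KEYS.get(k, k): to_agent_payload(v) for k, v in defn.items()}
    return defn


def register_request(defn: dict, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        HTTP_ADDR + "/v1/agent/service/register",
        data=json.dumps(to_agent_payload(defn)).encode(),
        headers={"X-Consul-Token": token, "Content-Type": "application/json"},
        method="PUT",
    )


def deregister_request(service_id: str, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{HTTP_ADDR}/v1/agent/service/deregister/{service_id}",
        headers={"X-Consul-Token": token},
        method="PUT",
    )
```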
Health checks
HTTP
https://developer.hashicorp.com/consul/docs/services/usage/checks#http-checks
{
  "services": [
    {
      "id": "video-get-pc",
      "name": "video-get",
      "address": "10.0.0.12",
      "port": 9082,
      "check": {
        "http": "http://10.0.0.12:9082/",
        "method": "GET",
        "interval": "25s",
        "timeout": "1s"
      }
    },
    {
      "id": "video-get-pc2",
      "name": "video-get",
      "address": "10.0.0.24",
      "port": 9082,
      "check": {
        "http": "http://10.0.0.24:9082/",
        "method": "GET",
        "interval": "25s",
        "timeout": "1s"
      }
    }
  ]
}
# Register the services
consul services register -token=a657c158-d4ac-459d-cd7c-8379a2a7e0ae -http-addr=http://10.0.0.1:8500 /home/mrh/Sync/sv-v-config/consul/config/service.json
# Deregister a single service by ID
consul services deregister -token=a657c158-d4ac-459d-cd7c-8379a2a7e0ae -http-addr=http://10.0.0.1:8500 -id=vector-sv-v2
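Note that with the commands above I registered two services through the agent on 10.0.0.1, but they actually run on 10.0.0.12 and 10.0.0.24, not on 10.0.0.1 itself (a shortcut taken here for convenience). In proper Consul practice a service is registered with the agent on the host where it runs. Otherwise the catalog attributes it to the wrong node, which is misleading.
In other words, each Consul agent should register only its own host's services. If another cluster member runs at 10.0.0.2, that agent should register its own services, such as 10.0.0.2:9000 and 10.0.0.2:9002.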
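This rule is easy to check before registering. The small sketch below assumes the services are loaded from a file in the service.json format above. It lists the IDs whose address is not one of the local agent's own addresses. Entries without an address use the node's address, so they are skipped. The function name is hypothetical.

```python
import ipaddress


def foreign_services(services: list, local_addrs: list) -> list:
    """Return the IDs of services whose "address" is not an address of this host."""
    local = {ipaddress.ip_address(a) for a in local_addrs}
    return [
        s["id"]
        for s in services
        if "address" in s and ipaddress.ip_address(s["address"]) not in local
    ]

# Usage: foreign_services(json.load(open("service.json"))["services"], ["10.0.0.1"])
```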
gRPC
{
  "services": [
    {
      "id": "vector-sv-v2",
      "name": "vector",
      "check": {
        "grpc": "10.0.0.32:18600",
        "interval": "25s",
        "timeout": "1s"
      }
    }
  ]
}
proto: https://github.com/grpc/grpc/blob/master/doc/health-checking.md
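Python server (grpclib)
import asyncio
import logging

from grpclib.const import Status
from grpclib.exceptions import GRPCError
from grpclib.health.v1 import health_pb2
from grpclib.health.v1.health_grpc import HealthBase
from grpclib.server import Server
from grpclib.utils import graceful_exit
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError

# VectorService and DATABASE_URL come from the application itself (not shown here)
logger = logging.getLogger(__name__)


class HealthService(HealthBase):
    async def Check(self, stream):
        # Consul sends a standard grpc.health.v1.HealthCheckRequest
        request: health_pb2.HealthCheckRequest = await stream.recv_message()
        engine = create_engine(DATABASE_URL)
        try:
            # If a database connection can be opened, the service is considered healthy
            with engine.connect():
                pass
            response = health_pb2.HealthCheckResponse(status=health_pb2.HealthCheckResponse.SERVING)
        except SQLAlchemyError as e:
            # SQLAlchemy wraps driver errors (e.g. psycopg2.Error) in its own exception types
            response = health_pb2.HealthCheckResponse(status=health_pb2.HealthCheckResponse.NOT_SERVING)
            logger.error(f"Database is not available: {e}")
        finally:
            engine.dispose()
        await stream.send_message(response)

    async def Watch(self, stream):
        # HealthBase declares Watch as abstract; Consul's gRPC check only calls Check
        raise GRPCError(Status.UNIMPLEMENTED)


async def main():
    server = Server([VectorService(), HealthService()])
    host = "0.0.0.0"
    port = 18600
    with graceful_exit([server]):
        await server.start(host, port)
        logger.info(f"Serving on {host}:{port}")
        await server.wait_closed()


if __name__ == "__main__":
    asyncio.run(main())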
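This is how Consul treats the server's answer, as we understand its gRPC check. The check passes only when the response status is SERVING. NOT_SERVING (returned above when the database is down), any other status, an error or a timeout marks it critical. Consul's DNS interface then stops returning that instance by default. The enum values below come from the health-checking proto linked above. The mapping function is an illustrative sketch, not Consul code.

```python
from enum import IntEnum


class ServingStatus(IntEnum):
    """grpc.health.v1.HealthCheckResponse.ServingStatus from health.proto."""
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2
    SERVICE_UNKNOWN = 3  # used only by the Watch method


def consul_check_status(status) -> str:
    """Illustrative mapping of a Check response status to a Consul check status."""
    return "passing" if status == ServingStatus.SERVING else "critical"
```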
Running as a systemd service
https://developer.hashicorp.com/consul/tutorials/production-deploy/deployment-guide#configure-the-consul-process
sudo ln /home/mrh/Sync/sv-v-config/consul/consul.service /etc/systemd/system/consul.service
sudo systemctl enable consul
sudo systemctl start consul
sudo systemctl status consul
sudo journalctl -u consul.service --since "1 hour ago"
Service discovery
https://developer.hashicorp.com/consul/docs/concepts/service-discovery
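Service consumers usually find services through Consul's DNS interface. With plain DNS (SRV) discovery, the client resolves a name to an IP and port and then connects to that address itself. Consul can also act as an intermediary through its service mesh, where proxies route requests directly to the target service, so the client never has to look up the real IP and port first.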
https://developer.hashicorp.com/consul/docs/services/discovery/dns-overview
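By default, Consul service discovery uses DNS; each agent answers DNS queries on port 8600.
Creating a DNS policy
Reference: https://developer.hashicorp.com/consul/tutorials/security/access-control-setup-production#token-for-dns
The policy and token can be created either on the command line or in the UI. The resulting DNS token here is 2782c566-2efd-9b1b-e9fc-e3727767e85a.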
export CONSUL_HTTP_TOKEN=b69317df-62be-59a3-9d98-21743514ed9d
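# Set the agent's default token to the DNS token; CONSUL_HTTP_TOKEN must hold a token with write permissions (here the management token)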
consul acl set-agent-token default 2782c566-2efd-9b1b-e9fc-e3727767e85a
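consul acl set-agent-token wraps the agent endpoint PUT /v1/agent/token/<type>, which takes the token in a JSON body of the form {"Token": "..."}. The command changes only the agent it talks to, so it has to be repeated for each server. Because the config sets enable_token_persistence, a token set this way survives agent restarts. The minimal sketch below builds the equivalent request. It assumes the management SecretID is still in CONSUL_HTTP_TOKEN, and the helper name is hypothetical.

```python
import json
import os
import urllib.request


def set_agent_token_request(token_type: str, value: str,
                            http_addr: str = "http://127.0.0.1:8500") -> urllib.request.Request:
    """PUT /v1/agent/token/<token_type> on one agent, authorised by CONSUL_HTTP_TOKEN."""
    return urllib.request.Request(
        f"{http_addr}/v1/agent/token/{token_type}",
        data=json.dumps({"Token": value}).encode(),
        headers={"X-Consul-Token": os.environ.get("CONSUL_HTTP_TOKEN", "")},
        method="PUT",
    )

# for addr in ("http://10.0.0.1:8500", "http://10.0.0.32:8500", "http://10.0.0.2:8500"):
#     urllib.request.urlopen(set_agent_token_request("default", "<dns token>", addr))
```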
# Query the SRV records
dig @127.0.0.1 -p 8600 video-get.service.consul SRV
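# Output
# Note: if the SRV records do not contain port 9082 and the addresses 10.0.0.12 / 10.0.0.24, the service definition used for registration is missing the address and port fields
# e.g. when registering with consul services register ./config/video-get.json, the JSON file must define "port": 9082 and "address": "10.0.0.12"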
;; ANSWER SECTION:
video-get.service.consul. 0 IN SRV 1 1 9082 0a000018.addr.dc.consul.
video-get.service.consul. 0 IN SRV 1 1 9082 0a00000c.addr.dc.consul.
;; ADDITIONAL SECTION:
0a000018.addr.dc.consul. 0 IN A 10.0.0.24
sv-v.node.dc.consul. 0 IN TXT "consul-version=1.18.0"
0a00000c.addr.dc.consul. 0 IN A 10.0.0.12
sv-v.node.dc.consul. 0 IN TXT "consul-version=1.18.0"
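Each SRV target in this output is an encoded name: 0a000018 is 10.0.0.24 written as hex. The matching A record is already in the ADDITIONAL SECTION. The sketch below pairs them up from dig's text output. It assumes the dig output layout shown above, and the function name is hypothetical.

```python
def srv_endpoints(dig_output: str) -> list:
    """Return sorted (ip, port) pairs by joining SRV answers with the A records."""
    srv = []        # (target, port) from "name ttl IN SRV priority weight port target"
    a_records = {}  # target -> IPv4 address from "name ttl IN A address"
    for line in dig_output.splitlines():
        if not line.strip() or line.startswith(";"):
            continue  # skip blank lines and ";;" section headers
        parts = line.split()
        if len(parts) == 8 and parts[3] == "SRV":
            srv.append((parts[7], int(parts[6])))
        elif len(parts) == 5 and parts[3] == "A":
            a_records[parts[0]] = parts[4]
    return sorted((a_records[target], port) for target, port in srv if target in a_records)
```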
Python service discovery example (dnspython)
import random
import socket

from dns import resolver

# Query the Consul servers' DNS interfaces directly
custom_resolver = resolver.Resolver()
custom_resolver.nameservers = ["10.0.0.1", "10.0.0.32", "10.0.0.2"]
# Consul serves DNS on port 8600; Resolver.port applies to every nameserver in the list
custom_resolver.port = 8600


def extract_ip_prefix(encoded_address: str) -> str:
    # "0a000018.addr.dc.consul." -> "0a000018"
    return encoded_address.split(".")[0]


def decode_address(encoded_address: str) -> str:
    # Consul encodes an IPv4 address as 8 hex digits in the <hex>.addr.<dc>.consul target
    hex_address = extract_ip_prefix(encoded_address)
    return socket.inet_ntoa(bytes.fromhex(hex_address))


def srv_to_address_random_choice(srv_record_name: str):
    try:
        answer = custom_resolver.resolve(srv_record_name, "SRV")
        selected_rdata = random.choice(answer)
        ip_address = decode_address(selected_rdata.target.to_text())
        print(f"Service: {srv_record_name} - {ip_address}:{selected_rdata.port}")
        return ip_address, selected_rdata.port
    except resolver.NoAnswer:
        print(f"No answer found for the query: {srv_record_name}")
    except resolver.NXDOMAIN:
        print(f"Domain not found: {srv_record_name}")
    except Exception as e:
        print(f"Error occurred while resolving DNS: {e}")
    return None


def main():
    # Query the SRV records
    srv_record_name = "video-get.service.consul"
    # srv_record_name = "consul.service.consul"
    srv_to_address_random_choice(srv_record_name)


if __name__ == "__main__":
    main()
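random.choice treats every record equally. That matches Consul here, because Consul returns priority 1 and weight 1 for every instance (see the dig output above). For SRV records in general, RFC 2782 says to use the lowest priority first and then pick in proportion to weight. The sketch below is a simplified version of that rule, and SrvRecord and pick_srv are hypothetical names. Unlike the RFC, it chooses zero-weight records only when every candidate has weight 0.

```python
import random
from typing import NamedTuple, Sequence


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def pick_srv(records: Sequence[SrvRecord], rng=random) -> SrvRecord:
    """Pick one record: lowest priority first, then a weighted random choice by weight."""
    if not records:
        raise ValueError("no SRV records")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]

# With dnspython:
#   records = [SrvRecord(rd.priority, rd.weight, rd.port, rd.target.to_text()) for rd in answer]
```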
Load balancing
The official docs only cover load balancing with Nginx: https://developer.hashicorp.com/consul/tutorials/load-balancing/load-balancing-nginx
SRV-based discovery and load balancing with Caddy: https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#dynamic-upstreams
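For callers that don't go through a proxy, a simple client-side alternative is to resolve the service once and rotate through the instances. The sketch below uses a hypothetical RoundRobin class. Its resolve argument is any callable returning (ip, port) pairs, for example a wrapper around the dnspython lookup above. Call refresh() periodically so that instances Consul marks critical drop out.

```python
import itertools
from typing import Callable, Iterable, Tuple

Endpoint = Tuple[str, int]


class RoundRobin:
    """Rotate through service instances; call refresh() to pick up catalog changes."""

    def __init__(self, resolve: Callable[[], Iterable[Endpoint]]):
        self._resolve = resolve
        self.refresh()

    def refresh(self) -> None:
        instances = list(self._resolve())
        if not instances:
            raise RuntimeError("no healthy instances returned")
        self._cycle = itertools.cycle(instances)

    def next_instance(self) -> Endpoint:
        return next(self._cycle)
```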