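For the past few days I was stuck on a problem where a server created with docker compose was unusable. I was fairly sure it was a network problem: requests from the nginx frontend hung for a long time without any response, and the backend log showed that the requests never arrived at all.
Conclusion first: if you have confirmed that it is a network problem, delete the default bridge network created by docker-compose and restart the services. The exact steps are at the end of the post.
Reproducing the scenario:
I have a Docker service made up of several containers managed by docker compose. After anonymizing and simplifying it, the docker-compose.yml file looks roughly like this: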
version: "3.9"  # optional since v1.27.0
services:
  ngi:
    image: nginx
    ports:
      - "9009:8000"
  backend:
    image: conda
    ports:
      - "5011:5000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
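As a reading aid for the `ports` entries above: each short-form mapping is `HOST:CONTAINER`, so `9009:8000` publishes container port 8000 on host port 9009. Containers on the same Compose network talk to each other through the service name and the container port, not the host port. The sketch below just splits such a mapping; `split_port_mapping` is a hypothetical helper name, not part of Docker.

```shell
#!/bin/sh
# Hypothetical helper: split a short-form "HOST:CONTAINER" port mapping.
split_port_mapping() {
  mapping=$1
  host_port=${mapping%%:*}       # text before the first colon
  container_port=${mapping##*:}  # text after the last colon
  echo "host=$host_port container=$container_port"
}
```

For example, `split_port_mapping "5011:5000"` prints `host=5011 container=5000`. From inside the ngi container the backend would therefore be addressed as `backend:5000`, assuming the application really listens on port 5000.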
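By docker-compose's design, when it creates the services of a project it also automatically creates a bridge network named <project>_default, and the containers on that network can reach each other, as shown below:
# Start the services
$ docker-compose start
# List the networks
$ docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
d17e18e668f7   bridge            bridge    local
1713b4f93cd1   e2efold_default   bridge    local
# Inspect the network details
$ docker network inspect e2efold_default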
[
    {
        "Name": "e2efold_default",
        "Id": "1713b4f93cd1eb17fa5b8dd6c6f0e188cc69acbb9cd252aa2d856a1698715497",
        "Created": "2022-07-22T14:01:15.625522568+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.22.0.0/16",
                    "Gateway": "172.22.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "857dd3f688e0c12788046c8933cab0a65a656e816e700a76c6260962494ae194": {
                "Name": "e2efold_backend_1",
                "EndpointID": "b619315f66db55d2eec9f7fe4d2627132ddb4c6464e96033be8629e81f1c89c7",
                "MacAddress": "02:42:ac:16:00:03",
                "IPv4Address": "172.22.0.3/16",
                "IPv6Address": ""
            },
            "8e21f8919db9edec3f0adcd66aa7aa3edaae70b228ea7601bbe6b4785ba94642": {
                "Name": "e2efold_ngi_1",
                "EndpointID": "18f1e1c666885b4ae4365f58a3d416001bd7163ab10cd4a959a91bfa478e066b",
                "MacAddress": "02:42:ac:16:00:04",
                "IPv4Address": "172.22.0.4/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "e2efold",
            "com.docker.compose.version": "1.29.2"
        }
    }
]
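The network name in this output follows Compose's `<project>_default` pattern, where the project name is the value of the `com.docker.compose.project` label. When only the attached containers and their addresses matter, a compact listing is easier to read than the full JSON. The following is a minimal sketch; both function names are hypothetical, and the `-f` Go template assumes the `Containers` layout shown in the output.

```shell
#!/bin/sh
# Hypothetical helper: default network name for a Compose project.
compose_default_network() {
  printf '%s_default\n' "$1"
}

# Hypothetical helper: print "name address" for each container attached
# to a network, using docker's Go-template output option (-f / --format).
list_network_containers() {
  docker network inspect \
    -f '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}' \
    "$1"
}

# Example (requires a running Docker daemon):
# list_network_containers "$(compose_default_network e2efold)"
```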
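What made this bug so weird is that every parameter of the bridge network looked normal, yet the containers on it simply could not ping each other. For example, entering the backend container and pinging the ngi container:
$ docker exec -it e2efold_backend_1 /bin/bash
root@857dd3f688e0:/app# ping 172.22.0.4
results in a timeout.
At first I suspected a proxy misconfiguration, because this server belongs to my university and reaches the public internet through the university's proxy, so network errors caused by bad proxy settings are common.
Checking the proxy:
$ echo $http_proxy
The result: no proxy was set.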
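Note that `echo $http_proxy` covers only one variable. Proxy settings are often also supplied through `https_proxy`, `no_proxy`, or their upper-case variants, so a slightly broader check may be worth running. A minimal sketch, where `list_proxy_vars` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print every proxy-related environment variable,
# matching lower- and upper-case names (http_proxy, HTTPS_PROXY, no_proxy, ...).
list_proxy_vars() {
  env | grep -i -E '^(http|https|ftp|all|no)_proxy=' || echo "no proxy variables set"
}
```

This only looks at the current shell's environment. To see what a container itself has, the same filter could be applied to the output of `docker exec e2efold_backend_1 env`.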
That was odd.
So, on the principle that a restart cures everything, I deleted this Docker bridge network and then restarted the docker compose services:
$ docker-compose stop
$ docker network prune
$ docker-compose down
$ docker-compose up
And then the miracle happened: everything was back to normal :)
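The recovery steps can be collected in a small script. This is a sketch under assumptions rather than the exact sequence used in this post: it relies on `docker-compose down`, which removes the project's containers and its default network, instead of `docker network prune`, which removes every unused network on the host. `run`, `recreate_compose_network`, and the `DRY_RUN` switch are hypothetical names introduced here.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: tear down and recreate a Compose project,
# including its default bridge network.
recreate_compose_network() {
  run docker-compose down     # removes the containers and the project's default network
  run docker-compose up -d    # recreates the network and the containers, detached
}
```

Run it from the directory that contains docker-compose.yml. With `DRY_RUN=1` exported, `recreate_compose_network` only prints the two commands, which is a cheap way to check what would happen. The `-d` flag starts the services in the background, unlike a plain `docker-compose up`.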