Introduction to Kubernetes
Kubernetes is a cluster management system led by Google. It can currently use Docker underneath, orchestrating applications as Docker containers. Plenty of introductions to Kubernetes already exist, so this article simply walks through installing and using it. For more material, see the official Kubernetes website.
Installing Kubernetes
Kubernetes can be installed on virtual machines or on servers running Linux. This article uses Ubuntu Server as the example; see the official Ubuntu installation guide for the details.
First download the Kubernetes source; the latest release at the time of writing is v1.4.1:
root@node3:/usr/src# git clone https://github.com/kubernetes/kubernetes.git
root@node3:/usr/src# cd /usr/src/kubernetes
root@node3:/usr/src/kubernetes# git checkout v1.4.1
This setup uses two nodes, node3 (192.168.200.13) and node4 (192.168.200.14); node3 serves as both master and worker, node4 as a worker only. Edit kubernetes/cluster/ubuntu/config-default.sh accordingly:
export nodes=${nodes:-"root@192.168.200.13 root@192.168.200.14"}
roles=${roles:-"ai i"}
export NUM_NODES=${NUM_NODES:-2}
export SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-100.0.0.0/16} # formerly PORTAL_NET
export FLANNEL_NET=${FLANNEL_NET:-172.16.0.0/16}
DNS_SERVER_IP=${DNS_SERVER_IP:-"100.0.0.2"}
These are all the changes needed to the config file; put each line in its corresponding place. Then run the installation:
$ cd kubernetes/cluster
$ KUBERNETES_PROVIDER=ubuntu ./kube-up.sh
...........
Validate output:
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
Cluster validation succeeded
Done, listing cluster services:
Kubernetes master is running at http://192.168.200.13:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
If nothing goes wrong, the script reports that installation is complete. Now put the Kubernetes binaries on the PATH:
root@node3:/usr/src# export PATH=/usr/src/kubernetes/cluster/ubuntu/binaries/:$PATH
root@node3:/usr/src# which kubectl
/usr/src/kubernetes/cluster/ubuntu/binaries//kubectl
Then install the dashboard and DNS add-ons:
$ cd cluster/ubuntu
$ KUBERNETES_PROVIDER=ubuntu ./deployAddons.sh
Possible problems:
- If you need to reinstall, run KUBERNETES_PROVIDER=ubuntu ./kube-down.sh to stop the related services, and remember to restore the original /etc/default/docker configuration.
- Kubernetes pulls some images from Google's registry (gcr.io), which is blocked in mainland China. A workaround is to pick an HTTP proxy server and route Docker through it on every host that needs to run those images, by adding the following at the top of /etc/default/docker:
export HTTP_PROXY=...
Then restart docker:
service docker restart
Once everything is downloaded, remove the proxy setting and restart Docker again; see this article for details.
Make sure every node that needs to run these images has them locally!! You can download all the images through the proxy on a single node, push them to a private registry, and then pull them from there on the other nodes.
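The workflow just sketched — pull through the proxy once, then distribute via a private registry — looks roughly like this (appstore:5000 stands in for the private registry, matching the one used later in this article; kube-dns's image serves as the example):

```shell
# On the node with proxy access: pull from gcr.io, then
# re-tag for the private registry and push.
docker pull gcr.io/google_containers/kubedns-amd64:1.8
docker tag gcr.io/google_containers/kubedns-amd64:1.8 appstore:5000/google_containers/kubedns-amd64:1.8
docker push appstore:5000/google_containers/kubedns-amd64:1.8

# On every other node: pull from the private registry, then
# restore the gcr.io tag so the manifests still match locally.
docker pull appstore:5000/google_containers/kubedns-amd64:1.8
docker tag appstore:5000/google_containers/kubedns-amd64:1.8 gcr.io/google_containers/kubedns-amd64:1.8
```

Repeat for each gcr.io image the add-ons reference (kube-dnsmasq, exechealthz, and so on).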
- When running pods that use the Google images, even if the image already exists locally, if the manifest contains
imagePullPolicy: Always
the image is still fetched from the Google registry. One fix is to change it to:
imagePullPolicy: IfNotPresent
Another is to put the image in a private registry.
* If imagePullPolicy is not specified at all, older versions would look for the image locally first and start it directly if found, but 1.4.1 appears to pull first. In my tests, imagePullPolicy: IfNotPresent had to be added to the manifests explicitly before running the installation or any other commands.
- Kubernetes modifies /etc/default/docker; take care that your existing options are not overwritten, otherwise docker pull from a private registry may break. My configuration is:
DOCKER_OPTS=" --registry-mirror=http://2687282c.m.daocloud.io -H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=172.16.69.1/24 --mtu=1450"
Some Kubernetes concepts
Pod
Each Pod is an instance, or a group of instances, running on some node; it corresponds to one or more Docker containers.
The running pods can be listed with:
root@node3:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
django-default-yxm7u 1/1 Running 0 15m
django-x-q9twt 1/1 Running 0 15m
django-y-wgy0c 1/1 Running 0 15m
nginx-ingress-e049x 1/1 Running 0 14m
You can also view the system pods:
root@node3:/usr/src# kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-v20-h35xt 3/3 Running 3 1h
kubernetes-dashboard-v1.4.0-5g12f 1/1 Running 6 13h
You can also see which node each pod runs on, which is quite useful for troubleshooting:
root@node3:/usr/src# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
django-default-yxm7u 1/1 Running 0 20m 172.16.77.4 192.168.200.13
django-x-q9twt 1/1 Running 0 20m 172.16.9.2 192.168.200.14
django-y-wgy0c 1/1 Running 0 20m 172.16.9.3 192.168.200.14
nginx-ingress-e049x 1/1 Running 0 19m 172.16.77.5 192.168.200.13
Service
A Service exposes pods as an externally visible service. New services can be created with the following configuration file, service.yaml:
# 3 Services for the 3 endpoints of the Ingress
apiVersion: v1
kind: Service
metadata:
name: django-x
labels:
app: django-x
spec:
type: NodePort
ports:
- port: 18111
#nodePort: 30301
targetPort: 8111
protocol: TCP
name: http
selector:
app: django-x
---
apiVersion: v1
kind: Service
metadata:
name: django-default
labels:
app: django-default
spec:
type: NodePort
ports:
- port: 18111
#nodePort: 30302
targetPort: 8111
protocol: TCP
name: http
selector:
app: django-default
---
apiVersion: v1
kind: Service
metadata:
name: django-y
labels:
app: django-y
spec:
type: NodePort
ports:
- port: 18111
#nodePort: 30284
targetPort: 8111
protocol: TCP
name: http
selector:
app: django-y
Create the services and list them with kubectl (svc is short for services):
root@node3:/usr/src# kubectl create -f service.yaml
root@node3:/usr/src# kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
basic 100.0.53.240 <nodes> 18112/TCP 8s
django-default 100.0.53.222 <nodes> 18111/TCP 21m
django-x 100.0.34.47 <nodes> 18111/TCP 21m
django-y 100.0.95.86 <nodes> 18111/TCP 21m
kubernetes 100.0.0.1 <none> 443/TCP 13h
Normally, inside the cluster, a service is reachable at CLUSTER-IP:PORT; for example, running curl http://100.0.34.47:18111 on node3 or node4 reaches the django-x service.
CLUSTER here is the cluster that Kubernetes maintains internally, so CLUSTER-IP and PORT are the internal IP and port the service presents to the cluster; they cannot be reached from the Internet. The kubernetes and basic services above are examples of this. To reach such services from the Internet, as discussed further below, you can use an Ingress: a proxy that forwards requests to CLUSTER-IP:PORT.
If every physical node is directly reachable from the Internet, you can instead use a service of type NodePort, as the three django services above do (hence their external IP shows as nodes). kube-proxy on each node then opens a port P, and external clients can reach the service through port P on any node.
How do you find P? Describe the service:
root@node3:~/k8s-test# kubectl describe svc django-x
Name: django-x
Namespace: default
Labels: app=django-x
Selector: app=django-x
IP: 100.0.34.47
Port: http 18111/TCP
NodePort: http 32400/TCP
Endpoints: 172.16.76.3:8111
Session Affinity: None
No events.
You can then reach the service with curl http://192.168.200.13:32400.
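Instead of reading the describe output by eye, the node port can also be extracted in a script (a sketch; it assumes kubectl's jsonpath output format is available in this version):

```shell
# Pull out the NodePort assigned to django-x ...
NODE_PORT=$(kubectl get svc django-x -o jsonpath='{.spec.ports[0].nodePort}')
# ... and reach the service through any node, e.g. node3.
curl "http://192.168.200.13:${NODE_PORT}"
```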
Replication Controller
A Replication Controller (RC) manages the number and placement of pods. An important way Kubernetes differs from plain Docker is that it monitors resources and deploys elastically: if a pod dies, the RC starts another one; if the cluster needs to scale out, the RC can quickly add pods.
The following configuration file, rc.yaml, creates three RCs:
# A single RC matching all Services
apiVersion: v1
kind: ReplicationController
metadata:
name: django-x
spec:
replicas: 1
template:
metadata:
labels:
app: django-x
spec:
containers:
- name: django-x
image: appstore:5000/liuwenmao/django-hello
ports:
- containerPort: 8111
---
apiVersion: v1
kind: ReplicationController
metadata:
name: django-default
spec:
replicas: 1
template:
metadata:
labels:
app: django-default
spec:
containers:
- name: django-default
image: appstore:5000/liuwenmao/django-hello
ports:
- containerPort: 8111
---
apiVersion: v1
kind: ReplicationController
metadata:
name: django-y
spec:
replicas: 1
template:
metadata:
labels:
app: django-y
spec:
containers:
- name: django-y
image: appstore:5000/liuwenmao/django-hello
ports:
- containerPort: 8111
Create the RCs and list them:
root@node3:/usr/src# kubectl create -f rc.yaml
root@node3:/usr/src# kubectl get rc
NAME DESIRED CURRENT AGE
django-default 1 1 2h
django-x 1 1 2h
django-y 1 1 2h
nginx-ingress 1 1 2h
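The elasticity described above can be exercised directly: kubectl scale changes an RC's replica count, and deleting a pod shows the RC replacing it (pod names are generated, so yours will differ from the one used here):

```shell
# Ask the RC for three django-x pods instead of one.
kubectl scale rc django-x --replicas=3
kubectl get pods | grep django-x

# Kill one pod; the RC notices and starts a replacement.
kubectl delete pod django-x-q9twt
kubectl get pods | grep django-x

# Scale back down.
kubectl scale rc django-x --replicas=1
```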
Possible problems
* To inspect a pod, service, or RC in detail, use describe. For example, when a pod fails, you can dig into the specifics:
root@node3:/usr/src/kubernetes/cluster/ubuntu# kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-v20-0gnu3 0/3 ErrImagePull 0 9m
kubernetes-dashboard-v1.4.0-5g12f 1/1 Running 4 11h
The kube-dns pod looks unhealthy; investigate further:
root@node3:/usr/src/kubernetes/cluster/ubuntu# kubectl describe pods kube-dns-v20-0gnu3 --namespace=kube-system
Name: kube-dns-v20-0gnu3
Namespace: kube-system
Node: 192.168.200.14/192.168.200.14
Start Time: Thu, 13 Oct 2016 09:56:24 +0800
Labels: k8s-app=kube-dns
version=v20
Status: Pending
IP: 172.16.9.2
Controllers: ReplicationController/kube-dns-v20
Containers:
kubedns:
Container ID:
Image: gcr.io/google_containers/kubedns-amd64:1.8
Image ID:
Ports: 10053/UDP, 10053/TCP
Args:
--domain=cluster.local.
--dns-port=10053
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment Variables: <none>
dnsmasq:
Container ID:
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4
Image ID:
.......(omitted)
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
6m 6m 1 {default-scheduler } Normal Scheduled Successfully assigned kube-dns-v20-0gnu3 to 192.168.200.14
5m 5m 1 {kubelet 192.168.200.14} spec.containers{kubedns} Warning Failed Failed to pull image "gcr.io/google_containers/kubedns-amd64:1.8": image pull failed for gcr.io/google_containers/kubedns-amd64:1.8, this may be because there are no credentials on this request. details: (Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 64.233.188.82:443: i/o timeout"})
4m 4m 1 {kubelet 192.168.200.14} spec.containers{dnsmasq} Warning Failed Failed to pull image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4": image pull failed for gcr.io/google_containers/kube-dnsmasq-amd64:1.4, this may be because there are no credentials on this request. details: (Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 64.233.188.82:443: i/o timeout"})
3m 3m 1 {kubelet 192.168.200.14} Warning FailedSync Error syncing pod, skipping: [failed to "StartContainer" for "kubedns" with ErrImagePull: "image pull failed for gcr.io/google_containers/kubedns-amd64:1.8, this may be because there are no credentials on this request. details: (Error response from daemon: {\"message\":\"Get https://gcr.io/v1/_ping: dial tcp 64.233.188.82:443: i/o timeout\"})"
, failed to "StartContainer" for "dnsmasq" with ErrImagePull: "image pull failed for gcr.io/google_containers/kube-dnsmasq-amd64:1.4, this may be because there are no credentials on this request. details: (Error response from daemon: {\"message\":\"Get https://gcr.io/v1/_ping: dial tcp 64.233.188.82:443: i/o timeout\"})"
, failed to "StartContainer" for "healthz" with ErrImagePull: "image pull failed for gcr.io/google_containers/exechealthz-amd64:1.2, this may be because there are no credentials on this request. details: (Error response from daemon: {\"message\":\"Get https://gcr.io/v1/_ping: dial tcp 64.233.188.82:443: i/o timeout\"})"
]
3m 3m 1 {kubelet 192.168.200.14} spec.containers{healthz} Warning Failed Failed to pull image "gcr.io/google_containers/exechealthz-amd64:1.2": image pull failed for gcr.io/google_containers/exechealthz-amd64:1.2, this may be because there are no credentials on this request. details: (Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 64.233.188.82:443: i/o timeout"})
1m 1m 1 {kubelet 192.168.200.14} spec.containers{dnsmasq} Warning Failed Failed to pull image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4": image pull failed for gcr.io/google_containers/kube-dnsmasq-amd64:1.4, this may be because there are no credentials on this request. details: (Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 64.233.189.82:443: i/o timeout"})
4m 1m 2 {kubelet 192.168.200.14} spec.containers{healthz} Normal Pulling pulling image "gcr.io/google_containers/exechealthz-amd64:1.2"
59s 59s 1 {kubelet 192.168.19.14} spec.containers{healthz} Warning Failed Failed to pull image "gcr.io/google_containers/exechealthz-amd64:1.2": image pull failed for gcr.io/google_containers/exechealthz-amd64:1.2, this may be because there are no credentials on this request. details: (Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 64.233.189.82:443: i/o timeout"})
59s 59s 1 {kubelet 192.168.200.14} Warning FailedSync Error syncing pod, skipping: [failed to "StartContainer" for "kubedns" with ErrImagePull: "image pull failed for gcr.io/google_containers/kubedns-amd64:1.8, this may be because there are no credentials on this request. details: (Error response from daemon: {\"message\":\"Get https://gcr.io/v1/_ping: dial tcp 64.233.189.82:443: i/o timeout\"})"
, failed to "StartContainer" for "dnsmasq" with ErrImagePull: "image pull failed for gcr.io/google_containers/kube-dnsmasq-amd64:1.4, this may be because there are no credentials on this request. details: (Error response from daemon: {\"message\":\"Get https://gcr.io/v1/_ping: dial tcp 64.233.189.82:443: i/o timeout\"})"
, failed to "StartContainer" for "healthz" with ErrImagePull: "image pull failed for gcr.io/google_containers/exechealthz-amd64:1.2, this may be because there are no credentials on this request. details: (Error response from daemon: {\"message\":\"Get https://gcr.io/v1/_ping: dial tcp 64.233.189.82:443: i/o timeout\"})"
The events show that the failures are timeouts pulling from Google's registry; the proxy method described above fixes this.
Adding an Internet entry point for a Service
Load balancer
You can expose a deployment with --type="LoadBalancer" to publish a service externally, but this currently requires a public cloud such as Google Container Engine and apparently cannot be used in a private data center. See the "Allow external traffic" section of the official Hello World tutorial.
Ingress
To expose services deployed on an internal network, you can use an Ingress. The following configuration maps the services above onto port 80, implementing virtual hosts:
# An Ingress with 2 hosts and 3 endpoints
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: echomap
spec:
rules:
- host: foo.bar.com
http:
paths:
- path: /foo
backend:
serviceName: django-x
servicePort: 18111
- host: bar.baz.com
http:
paths:
- path: /bar
backend:
serviceName: django-y
servicePort: 18111
- path: /foo
backend:
serviceName: django-x
servicePort: 18111
Then create and inspect the Ingress:
root@node3:/usr/src/gohome/src/k8s.io/contrib/ingress/controllers/nginx-alpha# kubectl create -f rc.yaml
replicationcontroller "nginx-ingress" created
root@node3:~$ kubectl get ing
NAME HOSTS ADDRESS PORTS AGE
echomap foo.bar.com,bar.baz.com 80 3d
root@node3:~$ kubectl describe ing echomap
Name: echomap
Namespace: default
Address:
Default backend: default-http-backend:80 (<none>)
Rules:
Host Path Backends
---- ---- --------
foo.bar.com
/foo django-x:18111 (<none>)
bar.baz.com
/bar django-y:18111 (<none>)
/foo django-x:18111 (<none>)
Annotations:
No events.
Ingress is Kubernetes' mechanism for describing how services are accessed, and the mappings can be read back through the API. Actually implementing the behavior — such as the virtual hosting in the example above — is the job of an ingress controller from the Kubernetes contrib repository; this article uses the nginx one.
contrib is a large collection of code that has not made it into the Kubernetes core (https://github.com/kubernetes/contrib). Install it following the project's README (it must sit inside $GOPATH; you cannot clone it just anywhere). Assuming $GOPATH=/usr/src/gohome, Kubernetes Contrib lives in /usr/src/gohome/src/k8s.io/contrib/, and the nginx Ingress controller we use is in /usr/src/gohome/src/k8s.io/contrib/ingress/controllers/nginx-alpha.
Looking at rc.yaml in the nginx-alpha directory, the Ingress controller runs the gcr.io/google_containers/nginx-ingress image. That image misbehaved in my tests, so for the experiment I rebuilt it from the Dockerfile in the same directory. The Dockerfile shows the image is based on nginx and runs a controller program that translates the Ingress mappings obtained from the Kubernetes API into an nginx configuration, thereby reverse-proxying to the Services that run the different sites.
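Rebuilding the image from that Dockerfile looks roughly like this — a sketch only: check the directory's README/Makefile for the exact build steps, and note that the tag appstore:5000/nginx-ingress:0.1 is my own choice, so the image field in rc.yaml must be changed to match whatever you push:

```shell
cd /usr/src/gohome/src/k8s.io/contrib/ingress/controllers/nginx-alpha
# Build the controller as a static Linux binary; the Dockerfile
# only copies a prebuilt binary into the nginx-based image.
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o controller controller.go
# Build the image and push it to the private registry.
docker build -t appstore:5000/nginx-ingress:0.1 .
docker push appstore:5000/nginx-ingress:0.1
```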
Once rc.yaml has been created, we can find the corresponding docker container and look at the mapping rules inside it:
root@node3:/usr/src# kubectl get pods -o wide|grep nginx
nginx-ingress-g518r 1/1 Running 2 3d 172.16.66.3 192.168.200.13
root@node3:/usr/src# docker ps |grep nginx
4374a4965333 gcr.io/google_containers/nginx-ingress:0.1 "/controller" 46 hours ago Up 46 hours k8s_nginx.a9cb3eb9_nginx-ingress-g518r_default_71b457b9-914e-11e6-821d-c81f66f3c543_f9c7501f
0051bb8806d1 gcr.io/google_containers/pause-amd64:3.0 "/pause" 46 hours ago Up 46 hours 0.0.0.0:80->80/tcp k8s_POD.6cfd0339_nginx-ingress-g518r_default_71b457b9-914e-11e6-821d-c81f66f3c543_e01f441e
root@node3:/usr/src# docker exec -it 4374a4965333 /bin/bash
[ root@nginx-ingress-g518r:/etc/nginx ]$ ls
certs/ fastcgi.conf koi-utf mime.types proxy_params sites-available/ snippets/ win-utf
conf.d/ fastcgi_params koi-win nginx.conf scgi_params sites-enabled/ uwsgi_params
[ root@nginx-ingress-g518r:/etc/nginx ]$ cat nginx.conf
events {
worker_connections 1024;
}
http {
# http://nginx.org/en/docs/http/ngx_http_core_module.html
types_hash_max_size 2048;
server_names_hash_max_size 512;
server_names_hash_bucket_size 64;
server {
listen 80;
server_name foo.bar.com;
location /foo {
proxy_set_header Host $host;
proxy_pass http://django-x.default.svc.cluster.local:18111;
}
}
server {
listen 80;
server_name bar.baz.com;
location /bar {
proxy_set_header Host $host;
proxy_pass http://django-y.default.svc.cluster.local:18111;
}
location /foo {
proxy_set_header Host $host;
proxy_pass http://django-x.default.svc.cluster.local:18111;
}
}
[ root@nginx-ingress-g518r:/etc/nginx ]$ ping django-x.default.svc.cluster.local
PING django-x.default.svc.cluster.local (100.0.87.12) 56(84) bytes of data.
64 bytes from django-x.default.svc.cluster.local (100.0.87.12): icmp_seq=1 ttl=47 time=265 ms
64 bytes from django-x.default.svc.cluster.local (100.0.87.12): icmp_seq=2 ttl=47 time=253 ms
At this point the mechanism behind it is clear, so let's look at the result. All three services run django, so you can check django's access logs to see the real web traffic (if the app runs under supervisor, enter the container and look at the log files in /var/log/supervisor).
Now a request to http://bar.baz.com/bar is forwarded to the django-y service (http://django-y.default.svc.cluster.local:18111); http://bar.baz.com/foo or http://foo.bar.com/foo go to django-x; everything else gets nginx's 404 page.
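Since foo.bar.com and bar.baz.com probably do not resolve in your DNS, the routing can be verified from any machine that reaches the node by setting the Host header explicitly (192.168.200.13 is the node running the nginx-ingress pod in this setup):

```shell
# Routed to django-y via the Ingress rule for bar.baz.com/bar.
curl -H "Host: bar.baz.com" http://192.168.200.13/bar
# Routed to django-x.
curl -H "Host: foo.bar.com" http://192.168.200.13/foo
# No matching rule: nginx answers with its 404 page.
curl -H "Host: foo.bar.com" http://192.168.200.13/other
```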
This approach is not limited to web services; it can also be used to expose things like external ssh access.