Deploying Hadoop on Kubernetes

There is little material online about deploying Hadoop with Kubernetes, so I tried it myself and recorded the process and the problems I ran into.

I. Choosing an image

First, pick a popular image from the official Docker Hub. I chose the bde2020 image series because the documentation on its GitHub page is fairly complete: https://github.com/big-data-europe/docker-hadoop

II. Testing with docker-compose

The repository describes how to run these Hadoop images with docker-compose; just follow its instructions.

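docker-compose is Docker's own container orchestration tool and is simple to use: download docker-compose.yml and hadoop.env to a local directory, then run docker-compose up to start the services. To stop them, run docker-compose down.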
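
As a rough sketch of that workflow, assuming docker-compose.yml and hadoop.env are already in the current directory (the function name compose_smoke_test is only illustrative, not something the repository provides):

```shell
# Bring the sample cluster up, show container status, then tear it down.
# Run from the directory that holds docker-compose.yml and hadoop.env.
compose_smoke_test() {
  # -d starts all services in the background
  docker-compose up -d || return 1
  # list the containers and their current state
  docker-compose ps
  # stop and remove the containers again
  docker-compose down
}
```

Calling compose_smoke_test is enough to confirm that the images start on a single machine before moving on to Kubernetes.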

III. Writing the Kubernetes YAML files for each component

The docker-compose example above is simple, but it is limited and everything runs on a single machine. What we need to do is rewrite the docker-compose YAML in Kubernetes YAML syntax.

1. Create the ConfigMap

Configuration can be supplied through a ConfigMap. Based on hadoop.env, write configmap.yaml as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-config
data:
  CORE_CONF_fs_defaultFS: "hdfs://namenode:8020"
  CORE_CONF_hadoop_http_staticuser_user: "root"
  CORE_CONF_hadoop_proxyuser_hue_hosts: "*"
  CORE_CONF_hadoop_proxyuser_hue_groups: "*"

  HDFS_CONF_dfs_webhdfs_enabled: "true"
  HDFS_CONF_dfs_permissions_enabled: "false"
 
  YARN_CONF_yarn_log___aggregation___enable: "true"
  YARN_CONF_yarn_resourcemanager_recovery_enabled: "true"
  YARN_CONF_yarn_resourcemanager_store_class: "org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore"
  YARN_CONF_yarn_resourcemanager_fs_state___store_uri: "/rmstate"
  YARN_CONF_yarn_nodemanager_remote___app___log___dir: "/app-logs"
  YARN_CONF_yarn_log_server_url: "http://historyserver:8188/applicationhistory/logs/"
  YARN_CONF_yarn_timeline___service_enabled: "true"
  YARN_CONF_yarn_timeline___service_generic___application___history_enabled: "true"
  YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled: "true"
  YARN_CONF_yarn_resourcemanager_hostname: "resourcemanager"
  YARN_CONF_yarn_timeline___service_hostname: "historyserver"
  YARN_CONF_yarn_resourcemanager_address: "resourcemanager:8032"
  YARN_CONF_yarn_resourcemanager_scheduler_address: "resourcemanager:8030"
  YARN_CONF_yarn_resourcemanager_resource___tracker_address: "resourcemanager:8031"

2. Create the namenode

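Hadoop nodes talk to each other by hostname, but a pod gets a generated hostname when it is created (by default its pod name), which is written to its own /etc/hosts. Other pods cannot resolve that name, so communication between nodes fails with errors such as UnresolvedAddressException. This tripped me up for a long time, and I had to dig through a lot of material before finding the cause.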

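The fix is to set clusterIP to None in the Service and to set hostname in the Deployment to the same value as the Service name. To avoid confusion, the service name, container name, hostname and so on are all set to the same value in what follows.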

Note that clusterIP in the Service must be set to None; otherwise MapReduce jobs run on YARN will fail!
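
One way to confirm that a headless Service makes the hostname resolvable is to look the name up from inside another pod. The sketch below assumes the pods carry the name=<component> labels used in this post's manifests and that getent is present in the image (an assumption, although it is normally available on Debian-based images). resolve_from is a hypothetical helper name.

```shell
# resolve_from <component> <hostname>
# Looks up <hostname> from inside the first pod labelled name=<component>.
resolve_from() {
  pod=$(kubectl get pods -l "name=$1" -o jsonpath='{.items[0].metadata.name}') || return 1
  # With clusterIP: None the Service name resolves directly to the pod IP.
  kubectl exec "$pod" -- getent hosts "$2"
}
```

For example, resolve_from datanode1 namenode should print the namenode pod's IP once both components are running; an empty result points to the hostname problem described above.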

The namenode needs a volume mounted, so first write pvc.yaml (a StorageClass must be created beforehand; see my earlier blog post https://www.cnblogs.com/00986014w/p/9406962.html):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hadoop-namenode-pvc
spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

  

Write the namenode Service and Deployment in namenode.yaml as follows (I exposed every port that might be used; not all of them are actually needed):

apiVersion: v1
kind: Service
metadata:
  name: namenode
  labels:
    name: namenode
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: namenode
  clusterIP: None
---
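apiVersion: apps/v1  # apps/v1beta1 was removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: namenode
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      name: namenode
  template: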
    metadata:
      labels:
        name: namenode
    spec:
      hostname: namenode
      containers:
        - name: namenode
          image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          env:
            - name: CLUSTER_NAME
              value: test
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-namenode
              mountPath: /hadoop/dfs/name
      volumes:
        - name: hadoop-namenode
          persistentVolumeClaim:
            claimName: hadoop-namenode-pvc

2. datanode

Create three datanodes. Taking datanode1 as an example, write datanode.yaml as follows (its PVC follows the same pattern as the namenode's):

apiVersion: v1
kind: Service
metadata:
  name: datanode1
  labels:
    name: datanode1
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: datanode1
  clusterIP: None
---
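apiVersion: apps/v1  # apps/v1beta1 was removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: datanode1
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      name: datanode1
  template: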
    metadata:
      labels:
        name: datanode1
    spec:
      hostname: datanode1
      containers:
        - name: datanode1
          image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-datanode1
              mountPath: /hadoop/dfs/data
      volumes:
        - name: hadoop-datanode1
          persistentVolumeClaim:
            claimName: hadoop-datanode1-pvc     
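
The PVC referenced by claimName above is not spelled out in this post. A minimal sketch, assuming it mirrors the namenode claim: the name hadoop-datanode1-pvc comes from the Deployment, while the nfs StorageClass, the ReadWriteMany access mode and the 1Gi size are simply copied from hadoop-namenode-pvc and may need adjusting. The file name datanode1-pvc.yaml is arbitrary.

```shell
# Write a PVC manifest for datanode1, modelled on hadoop-namenode-pvc.
cat > datanode1-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hadoop-datanode1-pvc   # must match claimName in the datanode1 Deployment
spec:
  storageClassName: nfs        # assumption: same StorageClass as the namenode
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi             # assumption: size copied from the namenode PVC
EOF
# Then create it in the cluster:
#   kubectl create -f datanode1-pvc.yaml
```

For datanode2 and datanode3, the same manifest would be repeated with the number in the claim name changed.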

 

After creating them, be sure to check the logs with kubectl logs and confirm there are no errors before moving on to the next step.
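
A small helper along these lines can make that check less tedious. It is only a sketch: check_component_logs is a hypothetical name, the 200-line window is arbitrary, and matching on "error" or "exception" is a crude heuristic rather than a complete health check.

```shell
# Show recent log lines that look like errors for each component.
# Component names match the name=<component> labels in the manifests.
check_component_logs() {
  for c in "$@"; do
    echo "== $c =="
    kubectl logs --tail=200 -l "name=$c" | grep -iE 'error|exception' \
      || echo "no error lines in the last 200 log lines"
  done
}
```

For example, check_component_logs namenode datanode1 prints any suspicious lines per component, which is where an UnresolvedAddressException would show up.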

3. resourcemanager

Write resourcemanager.yaml as follows:

apiVersion: v1
kind: Service
metadata:
  name: resourcemanager
  labels:
    name: resourcemanager
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: resourcemanager
  clusterIP: None
---
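apiVersion: apps/v1  # apps/v1beta1 was removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: resourcemanager
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      name: resourcemanager
  template: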
    metadata:
      labels:
        name: resourcemanager
    spec:
      hostname: resourcemanager
      containers:
        - name: resourcemanager
          image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config 

 

4. nodemanager

Write nodemanager.yaml as follows:

apiVersion: v1
kind: Service
metadata:
  name: nodemanager1
  labels:
    name: nodemanager1
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector: 
    name: nodemanager1
  clusterIP: None
---
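apiVersion: apps/v1  # apps/v1beta1 was removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: nodemanager1
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      name: nodemanager1
  template: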
    metadata:
      labels:
        name: nodemanager1
    spec:
      hostname: nodemanager1
      containers:
        - name: nodemanager1
          image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config

 

5. historyserver

The PVC is similar to the earlier ones. Write historyserver.yaml as follows:

apiVersion: v1
kind: Service
metadata:
  name: historyserver
  labels:
    name: historyserver
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: historyserver
  clusterIP: None
---
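apiVersion: apps/v1  # apps/v1beta1 was removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: historyserver
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      name: historyserver
  template: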
    metadata:
      labels:
        name: historyserver
    spec:
      hostname: historyserver
      containers:
        - name: historyserver
          image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config 
          volumeMounts:
            - name: hadoop-historyserver
              mountPath: /hadoop/yarn/timeline
      volumes:
        - name: hadoop-historyserver
          persistentVolumeClaim:
            claimName: hadoop-historyserver-pvc

 

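Once all of the above have been created with kubectl create, refer to the GitHub README and open the endpoint of each of the five components, with its corresponding port, in a browser (this has to be done from a machine inside the cluster). If the Hadoop pages display correctly, the deployment succeeded!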
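
The same check can be sketched from the command line. The helper below (probe_web_uis is a hypothetical name) reads the pod IP behind each headless Service and requests the web UI on the Hadoop 2.x default web port for that component. It assumes curl is installed on the machine and that pod IPs are reachable from it.

```shell
# Look up the pod IP behind each headless Service, then probe its web UI.
# Ports are the Hadoop 2.x default web ports for each component.
probe_web_uis() {
  for target in namenode:50070 datanode1:50075 resourcemanager:8088 \
                nodemanager1:8042 historyserver:8188; do
    svc=${target%%:*}
    port=${target##*:}
    ip=$(kubectl get endpoints "$svc" -o jsonpath='{.subsets[0].addresses[0].ip}')
    # -s silent, -o /dev/null discards the body, -w prints only the HTTP status
    code=$(curl -s -o /dev/null -w '%{http_code}' "http://$ip:$port/")
    echo "$svc http://$ip:$port/ -> HTTP $code"
  done
}
```

A 2xx or 3xx status suggests the UI is up; a status of 000 means curl could not connect at all.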

6. Test HDFS

A quick test of whether the nodes can communicate with each other properly.

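Look up the namenode pod with kubectl get pods (a pod created by a Deployment gets a generated name beginning with namenode-), open a shell in it with kubectl exec -it <namenode-pod-name> -- /bin/bash, then run hdfs dfs -put /etc/issue / and check that the upload succeeds.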
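
The same test can be run non-interactively. This is a sketch that assumes the namenode pod carries the label name=namenode, as in the manifest above; hdfs_put_test is a hypothetical helper name.

```shell
# Upload /etc/issue to the HDFS root from inside the namenode pod,
# then list the root directory to confirm the file arrived.
hdfs_put_test() {
  pod=$(kubectl get pods -l name=namenode -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl exec "$pod" -- hdfs dfs -put /etc/issue / || return 1
  kubectl exec "$pod" -- hdfs dfs -ls /
}
```

If the datanodes are reachable, the listing printed by hdfs_put_test should include /issue.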

7. Test YARN

Enter the namenode container and follow the steps in https://www.cnblogs.com/ccskun/p/7820977.html to run a test. Check that the job runs successfully and that the finished job shows up on the resourcemanager web page.
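
As one possible smoke test (not necessarily the one used in the linked post), the "pi" job from the MapReduce examples jar bundled with Hadoop can be submitted. The install prefix /opt/hadoop-2.7.1 is an assumption about this image and should be verified inside the container first; yarn_pi_test is a hypothetical helper name.

```shell
# Submit the bundled "pi" example job from inside the namenode pod.
# ASSUMPTION: Hadoop is installed under /opt/hadoop-2.7.1 in this image.
yarn_pi_test() {
  jar=/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
  pod=$(kubectl get pods -l name=namenode -o jsonpath='{.items[0].metadata.name}') || return 1
  # pi <number of map tasks> <samples per map>
  kubectl exec "$pod" -- hadoop jar "$jar" pi 2 10
}
```

If the job is submitted to YARN, it should afterwards be listed as finished on the resourcemanager web page.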

Reposted from: https://www.cnblogs.com/00986014w/p/9732796.html
