Deploying Alluxio 2.8.1 on K8S 1.24 with Helm 3

Preface

Alluxio official site: https://www.alluxio.io/

I won't go into what Alluxio is; browse the official site and it becomes clear soon enough. What I care more about is what it can do.

We already got Alluxio for free last year, using it as a hot-data cache and a unified file layer: https://mp.weixin.qq.com/s/kBetfi_LxAQGwgMBpI70ow

You can search for the (Chinese) title:

【Alluxio&大型银行】科技赋能金融,兴业银行按下“大数据处理加速键”

First, take a look at the Primo Ramdisk software: https://www.romexsoftware.com/zh-cn/primo-ramdisk/overview.html

Used as a memory disk, Alluxio works much like Primo Ramdisk: data persisted to disk and the file system is pre-loaded into memory, and applications then read it straight from the Alluxio cluster's memory, which is naturally blazing fast. I once set up a 4 GB RAM disk dedicated to copying files off USB drives. Ever wonder why Windows 10 uses so much memory right after booting? It is pre-loading hot data from the disk [cough, looking at you, the SN550 cold-data fiasco], which speeds up reads and writes and improves IO while also reducing writes to the SSD and extending its life.

Using Alluxio to cache hot data in memory for faster reads and writes follows the same principle. Because the data loaded into memory sits closer to the compute nodes, it also noticeably cuts network bandwidth usage and switch load [and for metered, pay-as-you-go cloud ECS instances, that is real money saved].

Thanks to this capability, Alluxio can also serve as a remote shuffle service (RSS) for compute engines such as Spark; in fact, that is how Alluxio got its start [back when it was still called Tachyon, it was used as an off-heap cache for Spark]: https://spark.apache.org/third-party-projects.html

Alluxio can still be found among Spark's third-party projects today.

The other key capability is the unified file layer. Because Alluxio speaks multiple file system protocols, anything from Amazon S3 object storage to HDFS or NFS can be mounted into Alluxio and accessed through a single interface. Hiding the differences between file systems makes working with heterogeneous data sources considerably easier, and developers no longer need to master a pile of file system APIs; Alluxio's API alone covers them all.
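
To make that concrete, here is a minimal sketch of mounting two different under stores into one Alluxio namespace with the alluxio fs mount CLI; the namenode address, bucket name, and credentials are placeholders, not values from this deployment:

# Mount an HDFS directory and an S3 bucket under one Alluxio namespace
# (namenode address, bucket name and keys below are placeholders)
./bin/alluxio fs mount /mnt/hdfs hdfs://namenode:8020/data
./bin/alluxio fs mount \
  --option s3a.accessKeyId=<ACCESS_KEY> \
  --option s3a.secretKey=<SECRET_KEY> \
  /mnt/s3 s3://my-bucket/data
# Both back-ends are now browsable through the same namespace and API:
./bin/alluxio fs ls /mnt/hdfs
./bin/alluxio fs ls /mnt/s3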

I already have plenty of virtual machines, and since I want something easy to suspend and resume at any time, I am sticking with a single node. This time it gets deployed on K8S.

Official documentation: https://docs.alluxio.io/os/user/stable/cn/deploy/Running-Alluxio-On-Kubernetes.html

What follows mainly tracks that official document to install Alluxio 2.8.1 on K8S 1.24.

Current environment

Virtual machine and K8S environment: https://lizhiyong.blog.csdn.net/article/details/126236516

root@zhiyong-ksp1:/home/zhiyong# kubectl get pods -owide --all-namespaces
NAMESPACE                      NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE           NOMINATED NODE   READINESS GATES
argocd                         devops-argocd-application-controller-0                            1/1     Running     0               3h27m   10.233.107.78   zhiyong-ksp1   <none>           <none>
argocd                         devops-argocd-applicationset-controller-5864597bfc-pf8ht          1/1     Running     0               3h27m   10.233.107.79   zhiyong-ksp1   <none>           <none>
argocd                         devops-argocd-dex-server-f885fb4b4-fkpls                          1/1     Running     0               3h27m   10.233.107.77   zhiyong-ksp1   <none>           <none>
argocd                         devops-argocd-notifications-controller-54b744556f-f4g24           1/1     Running     0               3h27m   10.233.107.74   zhiyong-ksp1   <none>           <none>
argocd                         devops-argocd-redis-556fdd5876-xftmq                              1/1     Running     0               3h27m   10.233.107.73   zhiyong-ksp1   <none>           <none>
argocd                         devops-argocd-repo-server-5dbf9b87db-9tw2c                        1/1     Running     0               3h27m   10.233.107.76   zhiyong-ksp1   <none>           <none>
argocd                         devops-argocd-server-6f9898cc75-s7jkm                             1/1     Running     0               3h27m   10.233.107.75   zhiyong-ksp1   <none>           <none>
istio-system                   istiod-1-11-2-54dd699c87-99krn                                    1/1     Running     0               23h     10.233.107.41   zhiyong-ksp1   <none>           <none>
istio-system                   jaeger-collector-67cfc55477-7757f                                 1/1     Running     5 (22h ago)     22h     10.233.107.61   zhiyong-ksp1   <none>           <none>
istio-system                   jaeger-operator-fccc48b86-vtcr8                                   1/1     Running     0               23h     10.233.107.47   zhiyong-ksp1   <none>           <none>
istio-system                   jaeger-query-8497bdbfd7-csbts                                     2/2     Running     0               22h     10.233.107.67   zhiyong-ksp1   <none>           <none>
istio-system                   kiali-75c777bdf6-xhbq7                                            1/1     Running     0               22h     10.233.107.58   zhiyong-ksp1   <none>           <none>
istio-system                   kiali-operator-c459985f7-sttfs                                    1/1     Running     0               23h     10.233.107.38   zhiyong-ksp1   <none>           <none>
kube-system                    calico-kube-controllers-f9f9bbcc9-2v7lm                           1/1     Running     2 (22h ago)     9d      10.233.107.45   zhiyong-ksp1   <none>           <none>
kube-system                    calico-node-4mgc7                                                 1/1     Running     2 (22h ago)     9d      192.168.88.20   zhiyong-ksp1   <none>           <none>
kube-system                    coredns-f657fccfd-2gw7h                                           1/1     Running     2 (22h ago)     9d      10.233.107.39   zhiyong-ksp1   <none>           <none>
kube-system                    coredns-f657fccfd-pflwf                                           1/1     Running     2 (22h ago)     9d      10.233.107.43   zhiyong-ksp1   <none>           <none>
kube-system                    kube-apiserver-zhiyong-ksp1                                       1/1     Running     2 (22h ago)     9d      192.168.88.20   zhiyong-ksp1   <none>           <none>
kube-system                    kube-controller-manager-zhiyong-ksp1                              1/1     Running     2 (22h ago)     9d      192.168.88.20   zhiyong-ksp1   <none>           <none>
kube-system                    kube-proxy-cn68l                                                  1/1     Running     2 (22h ago)     9d      192.168.88.20   zhiyong-ksp1   <none>           <none>
kube-system                    kube-scheduler-zhiyong-ksp1                                       1/1     Running     2 (22h ago)     9d      192.168.88.20   zhiyong-ksp1   <none>           <none>
kube-system                    nodelocaldns-96gtw                                                1/1     Running     2 (22h ago)     9d      192.168.88.20   zhiyong-ksp1   <none>           <none>
kube-system                    openebs-localpv-provisioner-68db4d895d-p9527                      1/1     Running     1 (22h ago)     9d      10.233.107.40   zhiyong-ksp1   <none>           <none>
kube-system                    snapshot-controller-0                                             1/1     Running     2 (22h ago)     9d      10.233.107.42   zhiyong-ksp1   <none>           <none>
kubesphere-controls-system     default-http-backend-587748d6b4-ccg59                             1/1     Running     2 (22h ago)     9d      10.233.107.50   zhiyong-ksp1   <none>           <none>
kubesphere-controls-system     kubectl-admin-5d588c455b-82cnk                                    1/1     Running     2 (22h ago)     9d      10.233.107.48   zhiyong-ksp1   <none>           <none>
kubesphere-devops-system       devops-27679170-8nrzx                                             0/1     Completed   0               65m     10.233.107.90   zhiyong-ksp1   <none>           <none>
kubesphere-devops-system       devops-27679200-kdgvk                                             0/1     Completed   0               35m     10.233.107.91   zhiyong-ksp1   <none>           <none>
kubesphere-devops-system       devops-27679230-v9h2l                                             0/1     Completed   0               5m34s   10.233.107.92   zhiyong-ksp1   <none>           <none>
kubesphere-devops-system       devops-apiserver-6b468c95cb-9s7lz                                 1/1     Running     0               3h27m   10.233.107.82   zhiyong-ksp1   <none>           <none>
kubesphere-devops-system       devops-controller-667f8449d7-gjgj8                                1/1     Running     0               3h27m   10.233.107.80   zhiyong-ksp1   <none>           <none>
kubesphere-devops-system       devops-jenkins-bf85c664c-c6qnq                                    1/1     Running     0               3h27m   10.233.107.84   zhiyong-ksp1   <none>           <none>
kubesphere-devops-system       s2ioperator-0                                                     1/1     Running     0               3h27m   10.233.107.83   zhiyong-ksp1   <none>           <none>
kubesphere-logging-system      elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk   0/1     Completed   0               23h     10.233.107.51   zhiyong-ksp1   <none>           <none>
kubesphere-logging-system      elasticsearch-logging-data-0                                      1/1     Running     0               23h     10.233.107.65   zhiyong-ksp1   <none>           <none>
kubesphere-logging-system      elasticsearch-logging-discovery-0                                 1/1     Running     0               23h     10.233.107.64   zhiyong-ksp1   <none>           <none>
kubesphere-monitoring-system   alertmanager-main-0                                               2/2     Running     4 (22h ago)     9d      10.233.107.56   zhiyong-ksp1   <none>           <none>
kubesphere-monitoring-system   kube-state-metrics-6d6786b44-bbb4f                                3/3     Running     6 (22h ago)     9d      10.233.107.44   zhiyong-ksp1   <none>           <none>
kubesphere-monitoring-system   node-exporter-8sz74                                               2/2     Running     4 (22h ago)     9d      192.168.88.20   zhiyong-ksp1   <none>           <none>
kubesphere-monitoring-system   notification-manager-deployment-6f8c66ff88-pt4l8                  2/2     Running     4 (22h ago)     9d      10.233.107.53   zhiyong-ksp1   <none>           <none>
kubesphere-monitoring-system   notification-manager-operator-6455b45546-nkmx8                    2/2     Running     4 (22h ago)     9d      10.233.107.52   zhiyong-ksp1   <none>           <none>
kubesphere-monitoring-system   prometheus-k8s-0                                                  2/2     Running     0               3h25m   10.233.107.85   zhiyong-ksp1   <none>           <none>
kubesphere-monitoring-system   prometheus-operator-66d997dccf-c968c                              2/2     Running     4 (22h ago)     9d      10.233.107.37   zhiyong-ksp1   <none>           <none>
kubesphere-system              ks-apiserver-6b9bcb86f4-hsdzs                                     1/1     Running     2 (22h ago)     9d      10.233.107.55   zhiyong-ksp1   <none>           <none>
kubesphere-system              ks-console-599c49d8f6-ngb6b                                       1/1     Running     2 (22h ago)     9d      10.233.107.49   zhiyong-ksp1   <none>           <none>
kubesphere-system              ks-controller-manager-66747fcddc-r7cpt                            1/1     Running     2 (22h ago)     9d      10.233.107.54   zhiyong-ksp1   <none>           <none>
kubesphere-system              ks-installer-5fd8bd46b8-dzhbb                                     1/1     Running     2 (22h ago)     9d      10.233.107.46   zhiyong-ksp1   <none>           <none>
kubesphere-system              minio-746f646bfb-hcf5c                                            1/1     Running     0               3h32m   10.233.107.71   zhiyong-ksp1   <none>           <none>
kubesphere-system              openldap-0                                                        1/1     Running     1 (3h30m ago)   3h32m   10.233.107.69   zhiyong-ksp1   <none>           <none>
root@zhiyong-ksp1:/home/zhiyong#

As you can see, all the Pods are currently in a perfectly normal state. Now let's look at helm:

root@zhiyong-ksp1:/home/zhiyong# helm
The Kubernetes package manager

Common actions for Helm:

- helm search:    search for charts
- helm pull:      download a chart to your local directory to view
- helm install:   upload the chart to Kubernetes
- helm list:      list releases of charts

Environment variables:

| Name                               | Description                                                                       |
|------------------------------------|-----------------------------------------------------------------------------------|
| $HELM_CACHE_HOME                   | set an alternative location for storing cached files.                             |
| $HELM_CONFIG_HOME                  | set an alternative location for storing Helm configuration.                       |
| $HELM_DATA_HOME                    | set an alternative location for storing Helm data.                                |
| $HELM_DEBUG                        | indicate whether or not Helm is running in Debug mode                             |
| $HELM_DRIVER                       | set the backend storage driver. Values are: configmap, secret, memory, postgres   |
| $HELM_DRIVER_SQL_CONNECTION_STRING | set the connection string the SQL storage driver should use.                      |
| $HELM_MAX_HISTORY                  | set the maximum number of helm release history.                                   |
| $HELM_NAMESPACE                    | set the namespace used for the helm operations.                                   |
| $HELM_NO_PLUGINS                   | disable plugins. Set HELM_NO_PLUGINS=1 to disable plugins.                        |
| $HELM_PLUGINS                      | set the path to the plugins directory                                             |
| $HELM_REGISTRY_CONFIG              | set the path to the registry config file.                                         |
| $HELM_REPOSITORY_CACHE             | set the path to the repository cache directory                                    |
| $HELM_REPOSITORY_CONFIG            | set the path to the repositories file.                                            |
| $KUBECONFIG                        | set an alternative Kubernetes configuration file (default "~/.kube/config")       |
| $HELM_KUBEAPISERVER                | set the Kubernetes API Server Endpoint for authentication                         |
| $HELM_KUBECAFILE                   | set the Kubernetes certificate authority file.                                    |
| $HELM_KUBEASGROUPS                 | set the Groups to use for impersonation using a comma-separated list.             |
| $HELM_KUBEASUSER                   | set the Username to impersonate for the operation.                                |
| $HELM_KUBECONTEXT                  | set the name of the kubeconfig context.                                           |
| $HELM_KUBETOKEN                    | set the Bearer KubeToken used for authentication.                                 |

Helm stores cache, configuration, and data based on the following configuration order:

- If a HELM_*_HOME environment variable is set, it will be used
- Otherwise, on systems supporting the XDG base directory specification, the XDG variables will be used
- When no other location is set a default location will be used based on the operating system

By default, the default directories depend on the Operating System. The defaults are listed below:

| Operating System | Cache Path                | Configuration Path             | Data Path               |
|------------------|---------------------------|--------------------------------|-------------------------|
| Linux            | $HOME/.cache/helm         | $HOME/.config/helm             | $HOME/.local/share/helm |
| macOS            | $HOME/Library/Caches/helm | $HOME/Library/Preferences/helm | $HOME/Library/helm      |
| Windows          | %TEMP%\helm               | %APPDATA%\helm                 | %APPDATA%\helm          |

Usage:
  helm [command]

Available Commands:
  completion  generate autocompletion scripts for the specified shell
  create      create a new chart with the given name
  dependency  manage a chart's dependencies
  env         helm client environment information
  get         download extended information of a named release
  help        Help about any command
  history     fetch release history
  install     install a chart
  lint        examine a chart for possible issues
  list        list releases
  package     package a chart directory into a chart archive
  plugin      install, list, or uninstall Helm plugins
  pull        download a chart from a repository and (optionally) unpack it in local directory
  repo        add, list, remove, update, and index chart repositories
  rollback    roll back a release to a previous revision
  search      search for a keyword in charts
  show        show information of a chart
  status      display the status of the named release
  template    locally render templates
  test        run tests for a release
  uninstall   uninstall a release
  upgrade     upgrade a release
  verify      verify that a chart at the given path has been signed and is valid
  version     print the client version information

Flags:
      --debug                       enable verbose output
  -h, --help                        help for helm
      --kube-apiserver string       the address and the port for the Kubernetes API server
      --kube-as-group stringArray   group to impersonate for the operation, this flag can be repeated to specify multiple groups.
      --kube-as-user string         username to impersonate for the operation
      --kube-ca-file string         the certificate authority file for the Kubernetes API server connection
      --kube-context string         name of the kubeconfig context to use
      --kube-token string           bearer token used for authentication
      --kubeconfig string           path to the kubeconfig file
  -n, --namespace string            namespace scope for this request
      --registry-config string      path to the registry config file (default "/root/.config/helm/registry.json")
      --repository-cache string     path to the file containing cached repository indexes (default "/root/.cache/helm/repository")
      --repository-config string    path to the file containing repository names and URLs (default "/root/.config/helm/repositories.yaml")

Use "helm [command] --help" for more information about a command.
root@zhiyong-ksp1:/home/zhiyong#

As you can see, KubeSphere has thoughtfully installed helm already, which saves developers who are not full-time ops people quite a bit of trouble.

Next we can install with helm3 [helm2 is no longer supported after Alluxio 2.3]. You could also install Alluxio with kubectl instead; see the official documentation for that.
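
A quick sanity check that the bundled client really is Helm 3 (a sketch, not part of the original run):

# The chart requires Helm 3; helm2 is unsupported since Alluxio 2.3
helm version --short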

Deploying Alluxio 2.8.1 with Helm

Add the helm repo for the Alluxio helm chart

root@zhiyong-ksp1:/home/zhiyong# helm repo add alluxio-charts https://alluxio-charts.storage.googleapis.com/openSource/2.8.1
"alluxio-charts" has been added to your repositories
root@zhiyong-ksp1:/home/zhiyong# helm list
NAME    NAMESPACE       REVISION        UPDATED STATUS  CHART   APP VERSION
root@zhiyong-ksp1:/home/zhiyong#

Components open-sourced by Chinese developers really do take China's particular network environment into account. Big thumbs up!!! It worked on the first try.
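
As an optional check (a sketch, not part of the original run), the new repo can be queried to confirm the chart and its version are visible:

# Confirm the alluxio chart is available from the newly added repo
helm search repo alluxio-charts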

Inspect the configuration

root@zhiyong-ksp1:/home/zhiyong# helm inspect values alluxio-charts/alluxio
#
# The Alluxio Open Foundation licenses this work under the Apache License, version 2.0
# (the "License"). You may not use this work except in compliance with the License, which is
# available at www.apache.org/licenses/LICENSE-2.0
#
# This software is distributed on an "AS IS" basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied, as more fully set forth in the License.
#
# See the NOTICE file distributed with this work for information regarding copyright ownership.
#

# This should not be modified in the usual case.
fullnameOverride: alluxio


## Common ##

# Docker Image
image: alluxio/alluxio
imageTag: 2.8.1
imagePullPolicy: IfNotPresent

# Security Context
user: 1000
group: 1000
fsGroup: 1000

# Service Account
#   If not specified, Kubernetes will assign the 'default'
#   ServiceAccount used for the namespace
serviceAccount:

# Image Pull Secret
#   The secrets will need to be created externally from
#   this Helm chart, but you can configure the Alluxio
#   Pods to use the following list of secrets
# eg:
# imagePullSecrets:
#   - ecr
#   - dev
imagePullSecrets:

# Site properties for all the components
properties:
  # alluxio.user.metrics.collection.enabled: 'true'
  alluxio.security.stale.channel.purge.interval: 365d

# Recommended JVM Heap options for running in Docker
# Ref: https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
# These JVM options are common to all Alluxio services
# jvmOptions:
#   - "-XX:+UnlockExperimentalVMOptions"
#   - "-XX:+UseCGroupMemoryLimitForHeap"
#   - "-XX:MaxRAMFraction=2"

# Mount Persistent Volumes to all components
# mounts:
# - name: <persistentVolume claimName>
#   path: <mountPath>

# Use labels to run Alluxio on a subset of the K8s nodes
# nodeSelector: {}

# A list of K8s Node taints to allow scheduling on.
# See the Kubernetes docs for more info:
# - https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
# eg: tolerations: [ {"key": "env", "operator": "Equal", "value": "prod", "effect": "NoSchedule"} ]
# tolerations: []

## Master ##

master:
  enabled: true
  count: 1 # Controls the number of StatefulSets. For multiMaster mode increase this to >1.
  replicas: 1 # Controls #replicas in a StatefulSet and should not be modified in the usual case.
  env:
    # Extra environment variables for the master pod
    # Example:
    # JAVA_HOME: /opt/java
  args: # Arguments to Docker entrypoint
    - master-only
    - --no-format
  # Properties for the master component
  properties:
    # Example: use ROCKS DB instead of Heap
    # alluxio.master.metastore: ROCKS
    # alluxio.master.metastore.dir: /metastore
  resources:
    # The default xmx is 8G
    limits:
      cpu: "4"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    embedded: 19200
    rpc: 19998
    web: 19999
  hostPID: false
  hostNetwork: false
  shareProcessNamespace: false
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []
  extraServicePorts: []
  # dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
  # and ClusterFirst if hostNetwork: false
  # You can specify dnsPolicy here to override this inference
  # dnsPolicy: ClusterFirst
  # JVM options specific to the master container
  jvmOptions:
  nodeSelector: {}
  # When using HA Alluxio masters, the expected startup time
  # can take over 2-3 minutes (depending on leader elections,
  # journal catch-up, etc). In that case it is recommended
  # to allow for up to at least 3 minutes with the readinessProbe,
  # though higher values may be desired for some leniancy.
  # - Note that the livenessProbe does not wait for the
  #   readinessProbe to succeed first
  #
  # eg: 3 minute startupProbe and readinessProbe
  # readinessProbe:
  #   initialDelaySeconds: 30
  #   periodSeconds: 10
  #   timeoutSeconds: 1
  #   failureThreshold: 15
  #   successThreshold: 3
  # startupProbe:
  #   initialDelaySeconds: 60
  #   periodSeconds: 30
  #   timeoutSeconds: 5
  #   failureThreshold: 4
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
  # If you are using Kubernetes 1.18+ or have the feature gate
  # for it enabled, use startupProbe to prevent the livenessProbe
  # from running until the startupProbe has succeeded
  # startupProbe:
  #   initialDelaySeconds: 15
  #   periodSeconds: 30
  #   timeoutSeconds: 5
  #   failureThreshold: 2
  tolerations: []
  podAnnotations: {}
  # The ServiceAccount provided here will have precedence over
  # the global `serviceAccount`
  serviceAccount:

jobMaster:
  args:
    - job-master
  # Properties for the jobMaster component
  properties:
  resources:
    limits:
      cpu: "4"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    embedded: 20003
    rpc: 20001
    web: 20002
  # JVM options specific to the jobMaster container
  jvmOptions:
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
  # If you are using Kubernetes 1.18+ or have the feature gate
  # for it enabled, use startupProbe to prevent the livenessProbe
  # from running until the startupProbe has succeeded
  # startupProbe:
  #   initialDelaySeconds: 15
  #   periodSeconds: 30
  #   timeoutSeconds: 5
  #   failureThreshold: 2

# Alluxio supports journal type of UFS and EMBEDDED
# UFS journal with HDFS example
# journal:
#   type: "UFS"
#   ufsType: "HDFS"
#   folder: "hdfs://{$hostname}:{$hostport}/journal"
# EMBEDDED journal to /journal example
# journal:
#   type: "EMBEDDED"
#   folder: "/journal"
journal:
  # [ Required values ]
  type: "UFS" # One of "UFS" or "EMBEDDED"
  folder: "/journal" # Master journal directory or equivalent storage path
  #
  # [ Conditionally required values ]
  #
  ## [ UFS-backed journal options ]
  ## - required when using a UFS-type journal (journal.type="UFS")
  ##
  ## ufsType is one of "local" or "HDFS"
  ## - "local" results in a PV being allocated to each Master Pod as the journal
  ## - "HDFS" results in no PV allocation, it is up to you to ensure you have
  ##   properly configured the required Alluxio properties for Alluxio to access
  ##   the HDFS URI designated as the journal folder
  ufsType: "local"
  #
  ## [ K8s volume options ]
  ## - required when using an EMBEDDED journal (journal.type="EMBEDDED")
  ## - required when using a local UFS journal (journal.type="UFS" and journal.ufsType="local")
  ##
  ## volumeType controls the type of journal volume.
  volumeType: persistentVolumeClaim # One of "persistentVolumeClaim" or "emptyDir"
  ## size sets the requested storage capacity for a persistentVolumeClaim,
  ## or the sizeLimit on an emptyDir PV.
  size: 1Gi
  ### Unique attributes to use when the journal is persistentVolumeClaim
  storageClass: "standard"
  accessModes:
    - ReadWriteOnce
  ### Unique attributes to use when the journal is emptyDir
  medium: ""
  #
  # [ Optional values ]
  format: # Configuration for journal formatting job
    runFormat: false # Change to true to format journal


# You can enable metastore to use ROCKS DB instead of Heap
# metastore:
#   volumeType: persistentVolumeClaim # Options: "persistentVolumeClaim" or "emptyDir"
#   size: 1Gi
#   mountPath: /metastore
# # Attributes to use when the metastore is persistentVolumeClaim
#   storageClass: "standard"
#   accessModes:
#    - ReadWriteOnce
# # Attributes to use when the metastore is emptyDir
#   medium: ""


## Worker ##

worker:
  enabled: true
  env:
    # Extra environment variables for the worker pod
    # Example:
    # JAVA_HOME: /opt/java
  args:
    - worker-only
    - --no-format
  # Properties for the worker component
  properties:
  resources:
    limits:
      cpu: "4"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"
  ports:
    rpc: 29999
    web: 30000
  # hostPID requires escalated privileges
  hostPID: false
  hostNetwork: false
  shareProcessNamespace: false
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []
  # dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
  # and ClusterFirst if hostNetwork: false
  # You can specify dnsPolicy here to override this inference
  # dnsPolicy: ClusterFirst
  # JVM options specific to the worker container
  jvmOptions:
  nodeSelector: {}
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
  # If you are using Kubernetes 1.18+ or have the feature gate
  # for it enabled, use startupProbe to prevent the livenessProbe
  # from running until the startupProbe has succeeded
  # startupProbe:
  #   initialDelaySeconds: 15
  #   periodSeconds: 30
  #   timeoutSeconds: 5
  #   failureThreshold: 2
  tolerations: []
  podAnnotations: {}
  # The ServiceAccount provided here will have precedence over
  # the global `serviceAccount`
  serviceAccount:
  # Setting fuseEnabled to true will embed Fuse in worker process. The worker pods will
  # launch the Alluxio workers using privileged containers with `SYS_ADMIN` capability.
  # Be sure to give root access to the pod by setting the global user/group/fsGroup
  # values to `0` to turn on Fuse in worker.
  fuseEnabled: false

jobWorker:
  args:
    - job-worker
  # Properties for the jobWorker component
  properties:
  resources:
    limits:
      cpu: "4"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    rpc: 30001
    data: 30002
    web: 30003
  # JVM options specific to the jobWorker container
  jvmOptions:
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
  # If you are using Kubernetes 1.18+ or have the feature gate
  # for it enabled, use startupProbe to prevent the livenessProbe
  # from running until the startupProbe has succeeded
  # startupProbe:
  #   initialDelaySeconds: 15
  #   periodSeconds: 30
  #   timeoutSeconds: 5
  #   failureThreshold: 2

# Tiered Storage
# emptyDir example
#  - level: 0
#    alias: MEM
#    mediumtype: MEM
#    path: /dev/shm
#    type: emptyDir
#    quota: 1Gi
#
# hostPath example
#  - level: 0
#    alias: MEM
#    mediumtype: MEM
#    path: /dev/shm
#    type: hostPath
#    quota: 1Gi
#
# persistentVolumeClaim example
#  - level: 1
#    alias: SSD
#    mediumtype: SSD
#    type: persistentVolumeClaim
#    name: alluxio-ssd
#    path: /dev/ssd
#    quota: 10Gi
#
# multi-part mediumtype example
#  - level: 1
#    alias: SSD,HDD
#    mediumtype: SSD,HDD
#    type: persistentVolumeClaim
#    name: alluxio-ssd,alluxio-hdd
#    path: /dev/ssd,/dev/hdd
#    quota: 10Gi,10Gi
tieredstore:
  levels:
  - level: 0
    alias: MEM
    mediumtype: MEM
    path: /dev/shm
    type: emptyDir
    quota: 1Gi
    high: 0.95
    low: 0.7

## Proxy ##
proxy:
  enabled: false # Enable this to enable the proxy for REST API
  env:
  # Extra environment variables for the Proxy pod
  # Example:
  # JAVA_HOME: /opt/java
  args:
    - proxy
  # Properties for the proxy component
  properties:
  resources:
    requests:
      cpu: "0.5"
      memory: "1Gi"
    limits:
      cpu: "4"
      memory: "4Gi"
  ports:
    web: 39999
  hostNetwork: false
  # dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
  # and ClusterFirst if hostNetwork: false
  # You can specify dnsPolicy here to override this inference
  # dnsPolicy: ClusterFirst
  # JVM options specific to proxy containers
  jvmOptions:
  nodeSelector: {}
  tolerations: []
  podAnnotations: {}
  # The ServiceAccount provided here will have precedence over
  # the global `serviceAccount`
  serviceAccount:

# Short circuit related properties
shortCircuit:
  enabled: true
  # The policy for short circuit can be "local" or "uuid",
  # local means the cache directory is in the same mount namespace,
  # uuid means interact with domain socket
  policy: uuid
  # volumeType controls the type of shortCircuit volume.
  # It can be "persistentVolumeClaim" or "hostPath"
  volumeType: persistentVolumeClaim
  size: 1Mi
  # Attributes to use if the domain socket volume is PVC
  pvcName: alluxio-worker-domain-socket
  accessModes:
    - ReadWriteOnce
  storageClass: standard
  # Attributes to use if the domain socket volume is hostPath
  hostPath: "/tmp/alluxio-domain" # The hostPath directory to use


## FUSE ##

fuse:
  env:
    # Extra environment variables for the fuse pod
    # Example:
    # JAVA_HOME: /opt/java
  # Change both to true to deploy FUSE
  enabled: false
  clientEnabled: false
  # Properties for the fuse component
  properties:
  # Customize the MaxDirectMemorySize
  # These options are specific to the FUSE daemon
  jvmOptions:
    - "-XX:MaxDirectMemorySize=2g"
  hostNetwork: true
  # hostPID requires escalated privileges
  hostPID: true
  dnsPolicy: ClusterFirstWithHostNet
  livenessProbeEnabled: true
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    failureThreshold: 2
  user: 0
  group: 0
  fsGroup: 0
  # Default fuse mount options separated by commas, shared by all fuse containers
  mountOptions: allow_other
  # Default fuse mount point inside fuse container, shared by all fuse containers.
  # Non-empty value is required.
  mountPoint: /mnt/alluxio-fuse
  # Default alluxio path to be mounted, shared by all fuse containers.
  alluxioPath: /
  resources:
    requests:
      cpu: "0.5"
      memory: "1Gi"
    limits:
      cpu: "4"
      memory: "4Gi"
  nodeSelector: {}
  tolerations: []
  podAnnotations: {}
  # The ServiceAccount provided here will have precedence over
  # the global `serviceAccount`
  serviceAccount:


##  Secrets ##

# Format: (<name>:<mount path under /secrets/>):
# secrets:
#   master: # Shared by master and jobMaster containers
#     alluxio-hdfs-config: hdfsConfig
#   worker: # Shared by worker and jobWorker containers
#     alluxio-hdfs-config: hdfsConfig
#   logserver: # Used by the logserver container
#     alluxio-hdfs-config: hdfsConfig


##  ConfigMaps ##

# Format: (<name>:<mount path under /configmaps/>):
# configmaps:
#   master: # Shared by master and jobMaster containers
#     alluxio-hdfs-config: hdfsConfig
#   worker: # Shared by worker and jobWorker containers
#     alluxio-hdfs-config: hdfsConfig
#   logserver: # Used by the logserver container
#     alluxio-hdfs-config: hdfsConfig


##  Metrics System ##

# Settings for Alluxio metrics. Disabled by default.
metrics:
  enabled: false
  # Enable ConsoleSink by class name
  ConsoleSink:
    enabled: false
    # Polling period for ConsoleSink
    period: 10
    # Unit of poll period
    unit: seconds
  # Enable CsvSink by class name
  CsvSink:
    enabled: false
    # Polling period for CsvSink
    period: 1
    # Unit of poll period
    unit: seconds
    # Polling directory for CsvSink, ensure this directory exists!
    directory: /tmp/alluxio-metrics
  # Enable JmxSink by class name
  JmxSink:
    enabled: false
    # Jmx domain
    domain: org.alluxio
  # Enable GraphiteSink by class name
  GraphiteSink:
    enabled: false
    # Hostname of Graphite server
    host: NONE
    # Port of Graphite server
    port: NONE
    # Poll period
    period: 10
    # Unit of poll period
    unit: seconds
    # Prefix to prepend to metric name
    prefix: ""
  # Enable Slf4jSink by class name
  Slf4jSink:
    enabled: false
    # Poll period
    period: 10
    # Units of poll period
    unit: seconds
    # Contains all metrics
    filterClass: null
    # Contains all metrics
    filterRegex: null
  # Enable PrometheusMetricsServlet by class name
  PrometheusMetricsServlet:
    enabled: false
  # Pod annotations for Prometheus
  # podAnnotations:
  #   prometheus.io/scrape: "true"
  #   prometheus.io/port: "19999"
  #   prometheus.io/path: "/metrics/prometheus/"
  podAnnotations: {}

# Remote logging server
logserver:
  enabled: false
  replicas: 1
  env:
  # Extra environment variables for the logserver pod
  # Example:
  # JAVA_HOME: /opt/java
  args: # Arguments to Docker entrypoint
    - logserver
  # Properties for the logserver component
  properties:
  resources:
    # The default xmx is 8G
    limits:
      cpu: "4"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    logging: 45600
  hostPID: false
  hostNetwork: false
  # dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
  # and ClusterFirst if hostNetwork: false
  # You can specify dnsPolicy here to override this inference
  # dnsPolicy: ClusterFirst
  # JVM options specific to the logserver container
  jvmOptions:
  nodeSelector: {}
  tolerations: []
  # The strategy field corresponds to the .spec.strategy field for the deployment
  # This specifies the strategy used to replace old Pods by new ones
  # https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
  # The default is Recreate which kills the existing Pod before creating a new one
  # Note: When using RWO PVCs, the strategy MUST be Recreate, because the PVC cannot
  # be passed from the old Pod to the new one
  # When using RWX PVCs, you can use RollingUpdate strategy to ensure zero down time
  # Example:
  # strategy:
  #   type: RollingUpdate
  #   rollingUpdate:
  #     maxUnavailable: 25%
  #     maxSurge: 1
  strategy:
    type: Recreate
  # volumeType controls the type of log volume.
  # It can be "persistentVolumeClaim" or "hostPath" or "emptyDir"
  volumeType: persistentVolumeClaim
  # Attributes to use if the log volume is PVC
  pvcName: alluxio-logserver-logs
  # Note: If using RWO, the strategy MUST be Recreate
  # If using RWX, the strategy can be RollingUpdate
  accessModes:
    - ReadWriteOnce
  storageClass: standard
  # If you are dynamically provisioning PVs, the selector on the PVC should be empty.
  # Ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
  selector: {}
  # If you are manually allocating PV for the logserver,
  # it is recommended to use selectors to make sure the PV and PVC match as expected.
  # You can specify selectors like below:
  # Example:
  # selector:
  #   matchLabels:
  #     role: alluxio-logserver
  #     app: alluxio
  #     chart: alluxio-<chart version>
  #     release: alluxio
  #     heritage: Helm
  #     dc: data-center-1
  #     region: us-east

  # Attributes to use if the log volume is hostPath
  hostPath: "/tmp/alluxio-logs" # The hostPath directory to use
  # Attributes to use when the log volume is emptyDir
  medium: ""
  size: 4Gi

# The pod's HostAliases. HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts file if specified.
# It is mainly to provide the external host addresses for services not in the K8s cluster, like HDFS.
# Example:
# hostAliases:
# - ip: "192.168.0.1"
#   hostnames:
#     - "example1.com"
#     - "example2.com"

# kubernetes CSI plugin
csi:
  enabled: false
  imagePullPolicy: IfNotPresent
  controllerPlugin:
    hostNetwork: true
    dnsPolicy: ClusterFirstWithHostNet
    provisioner:
      # for kubernetes 1.17 or above
      image: k8s.gcr.io/sig-storage/csi-provisioner:v2.0.5
      resources:
        limits:
          cpu: 100m
          memory: 300Mi
        requests:
          cpu: 10m
          memory: 20Mi
    controller:
      resources:
        limits:
          cpu: 200m
          memory: 200Mi
        requests:
          cpu: 10m
          memory: 20Mi
  # Run alluxio fuse process inside csi nodeserver container if mountInPod = false
  # Run alluxio fuse process inside a separate pod if mountInPod = true
  mountInPod: false
  nodePlugin:
    hostNetwork: true
    dnsPolicy: ClusterFirstWithHostNet
    nodeserver:
      resources:
        # fuse in nodeserver container needs more resources
        limits:
          cpu: "4"
          memory: "8Gi"
        requests:
          cpu: "1"
          memory: "1Gi"
    driverRegistrar:
      image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.0
      resources:
        limits:
          cpu: 100m
          memory: 100Mi
        requests:
          cpu: 10m
          memory: 20Mi
  # for csi client
  clientEnabled: false
  accessModes:
    - ReadWriteOnce
  quota: 100Gi
  mountPath: /data
  alluxioPath: /
  mountOptions:
    - direct_io
    - allow_other
    - entry_timeout=36000
    - attr_timeout=36000
    - max_readahead=0
  javaOptions: "-Dalluxio.user.metadata.cache.enabled=true "

root@zhiyong-ksp1:/home/zhiyong#

As you can see, Alluxio exposes a lot of configuration options, and reasonable defaults are already in place: the 2.8.1 image is pulled from the official Docker Hub repository, the various ports are set, and resource requests (minimums) and limits (maximums) are defined. The default persistence strategy mounts a persistent volume at /journal inside the master Pod. Everything looks fairly normal, with no baffling tricks, so I am happy to accept the defaults. If the defaults don't suit you, they need to be changed. The official docs also list the following examples for changing the persistence strategy:

They are:

Example: Amazon S3 as the under store

Example: Single Master and Journal in a Persistent Volume

Example: How to mount a Persistent Volume locally on the master Pod

Example: HDFS as Journal

Example: Multi-master with Embedded Journal in Persistent Volumes

Example: Multi-master with Embedded Journal in emptyDir Volumes

Example: HDFS as the under store

Example: Off-heap Metastore Management in Persistent Volumes

Example: Off-heap Metastore Management in emptyDir Volumes

Example: Multiple Secrets

Examples: Alluxio Storage Management

Modify these as needed. For example, you can switch to S3 object storage or HDFS as the underlying persistent volume, or use an ephemeral emptyDir-style volume [whose lifetime matches the Pod's].

I won't go into further detail here.
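
As one illustrative sketch of such a change, and assuming an HDFS cluster reachable at a placeholder address plus a pre-created Secret named alluxio-hdfs-config holding its client configs, an HDFS under store could be wired up through the chart's properties and secrets blocks roughly like this:

# values fragment (sketch): HDFS as the under store
# <namenode>:8020 is a placeholder; alluxio-hdfs-config is a Secret created beforehand
properties:
  alluxio.master.mount.table.root.ufs: "hdfs://<namenode>:8020/alluxio"
secrets:
  master:
    alluxio-hdfs-config: hdfsConfig
  worker:
    alluxio-hdfs-config: hdfsConfig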

Modify the configuration

root@zhiyong-ksp1:~# mkdir -p /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911
root@zhiyong-ksp1:~# cd /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxioconfig.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# ll
total 16
drwxr-xr-x 2 root root 4096 Sep 11 19:11 ./
drwxr-xr-x 3 root root 4096 Sep 11 18:56 ../
-rw-r--r-- 1 root root 7172 Sep 11 19:11 alluxioconfig.yaml
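
Rather than typing the file out from scratch, it can be bootstrapped from the chart defaults shown earlier and then trimmed; a sketch:

# Dump the chart's default values into the config file, then edit it
helm inspect values alluxio-charts/alluxio > alluxioconfig.yaml
vim alluxioconfig.yaml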

The contents are the same as the values file shown above:

fullnameOverride: alluxio
image: alluxio/alluxio
imageTag: 2.8.1
imagePullPolicy: IfNotPresent
user: 1000
group: 1000
fsGroup: 1000
serviceAccount:
imagePullSecrets:
properties:
  alluxio.security.stale.channel.purge.interval: 365d
master:
  enabled: true
  count: 1 # Controls the number of StatefulSets. For multiMaster mode increase this to >1.
  replicas: 1 # Controls #replicas in a StatefulSet and should not be modified in the usual case.
  env:
  args: # Arguments to Docker entrypoint
    - master-only
    - --no-format
  properties:
  resources:
    # The default xmx is 8G
    limits:
      cpu: "4"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    embedded: 19200
    rpc: 19998
    web: 19999
  hostPID: false
  hostNetwork: false
  shareProcessNamespace: false
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []
  extraServicePorts: []
  jvmOptions:
  nodeSelector: {}
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
  tolerations: []
  podAnnotations: {}
  serviceAccount:

jobMaster:
  args:
    - job-master
  properties:
  resources:
    limits:
      cpu: "4"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    embedded: 20003
    rpc: 20001
    web: 20002
  jvmOptions:
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
journal:
  type: "UFS" # One of "UFS" or "EMBEDDED"
  folder: "/journal" # Master journal directory or equivalent storage path
  ufsType: "local"
  volumeType: persistentVolumeClaim # One of "persistentVolumeClaim" or "emptyDir"
  size: 1Gi
  storageClass: "standard"
  accessModes:
    - ReadWriteOnce
  medium: ""
  format: # Configuration for journal formatting job
    runFormat: false # Change to true to format journal
worker:
  enabled: true
  env:
  args:
    - worker-only
    - --no-format
  properties:
  resources:
    limits:
      cpu: "4"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"
  ports:
    rpc: 29999
    web: 30000
  hostPID: false
  hostNetwork: false
  shareProcessNamespace: false
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []
  jvmOptions:
  nodeSelector: {}
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
  tolerations: []
  podAnnotations: {}
  serviceAccount:
  fuseEnabled: false

jobWorker:
  args:
    - job-worker
  properties:
  resources:
    limits:
      cpu: "4"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    rpc: 30001
    data: 30002
    web: 30003
  jvmOptions:
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 2
tieredstore:
  levels:
  - level: 0
    alias: MEM
    mediumtype: MEM
    path: /dev/shm
    type: emptyDir
    quota: 1Gi
    high: 0.95
    low: 0.7

proxy:
  enabled: false # Enable this to enable the proxy for REST API
  env:
  args:
    - proxy
  properties:
  resources:
    requests:
      cpu: "0.5"
      memory: "1Gi"
    limits:
      cpu: "4"
      memory: "4Gi"
  ports:
    web: 39999
  hostNetwork: false
  jvmOptions:
  nodeSelector: {}
  tolerations: []
  podAnnotations: {}
  serviceAccount:

shortCircuit:
  enabled: true
  policy: uuid
  volumeType: persistentVolumeClaim
  size: 1Mi
  pvcName: alluxio-worker-domain-socket
  accessModes:
    - ReadWriteOnce
  storageClass: standard
  hostPath: "/tmp/alluxio-domain" # The hostPath directory to use


## FUSE ##

fuse:
  env:
  enabled: false
  clientEnabled: false
  properties:
  jvmOptions:
    - "-XX:MaxDirectMemorySize=2g"
  hostNetwork: true
  hostPID: true
  dnsPolicy: ClusterFirstWithHostNet
  livenessProbeEnabled: true
  livenessProbe:
    initialDelaySeconds: 15
    periodSeconds: 30
    failureThreshold: 2
  user: 0
  group: 0
  fsGroup: 0
  mountOptions: allow_other
  mountPoint: /mnt/alluxio-fuse
  alluxioPath: /
  resources:
    requests:
      cpu: "0.5"
      memory: "1Gi"
    limits:
      cpu: "4"
      memory: "4Gi"
  nodeSelector: {}
  tolerations: []
  podAnnotations: {}
  serviceAccount:

metrics:
  enabled: false
  ConsoleSink:
    enabled: false
    period: 10
    unit: seconds
  CsvSink:
    enabled: false
    period: 1
    unit: seconds
    directory: /tmp/alluxio-metrics
  JmxSink:
    enabled: false
    domain: org.alluxio
  GraphiteSink:
    enabled: false
    host: NONE
    port: NONE
    period: 10
    unit: seconds
    prefix: ""
  Slf4jSink:
    enabled: false
    period: 10
    unit: seconds
    filterClass: null
    filterRegex: null
  PrometheusMetricsServlet:
    enabled: false
  podAnnotations: {}
logserver:
  enabled: false
  replicas: 1
  env:
  args: # Arguments to Docker entrypoint
    - logserver
  properties:
  resources:
    limits:
      cpu: "4"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "1Gi"
  ports:
    logging: 45600
  hostPID: false
  hostNetwork: false
  jvmOptions:
  nodeSelector: {}
  tolerations: []
  strategy:
    type: Recreate
  volumeType: persistentVolumeClaim
  pvcName: alluxio-logserver-logs
  accessModes:
    - ReadWriteOnce
  storageClass: standard
  selector: {}

  hostPath: "/tmp/alluxio-logs" # The hostPath directory to use
  medium: ""
  size: 4Gi


csi:
  enabled: false
  imagePullPolicy: IfNotPresent
  controllerPlugin:
    hostNetwork: true
    dnsPolicy: ClusterFirstWithHostNet
    provisioner:
      image: k8s.gcr.io/sig-storage/csi-provisioner:v2.0.5
      resources:
        limits:
          cpu: 100m
          memory: 300Mi
        requests:
          cpu: 10m
          memory: 20Mi
    controller:
      resources:
        limits:
          cpu: 200m
          memory: 200Mi
        requests:
          cpu: 10m
          memory: 20Mi
  mountInPod: false
  nodePlugin:
    hostNetwork: true
    dnsPolicy: ClusterFirstWithHostNet
    nodeserver:
      resources:
        limits:
          cpu: "4"
          memory: "8Gi"
        requests:
          cpu: "1"
          memory: "1Gi"
    driverRegistrar:
      image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.0
      resources:
        limits:
          cpu: 100m
          memory: 100Mi
        requests:
          cpu: 10m
          memory: 20Mi
  clientEnabled: false
  accessModes:
    - ReadWriteOnce
  quota: 100Gi
  mountPath: /data
  alluxioPath: /
  mountOptions:
    - direct_io
    - allow_other
    - entry_timeout=36000
    - attr_timeout=36000
    - max_readahead=0
  javaOptions: "-Dalluxio.user.metadata.cache.enabled=true "

That was long, so I stripped the comments out... Just adjust the configuration to your own needs [for example, lower the quotas if your machine is short on resources, or mount HDFS as the underlying storage].
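
If editing the file feels heavy, individual values can also be overridden at install time; a sketch with purely illustrative numbers, not a recommendation:

# Example overrides for a resource-constrained machine (numbers are illustrative)
helm install alluxio -f alluxioconfig.yaml \
  --set worker.resources.requests.memory=1Gi \
  --set tieredstore.levels[0].quota=512Mi \
  alluxio-charts/alluxio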

Install

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm install alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml alluxio-charts/alluxio
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods -owide --all-namespaces | grep alluxio
default                        alluxio-master-0                                                  0/2     Pending             0             3m34s   <none>          <none>         <none>           <none>
default                        alluxio-worker-rczrd                                              0/2     Pending             0             3m36s   <none>          <none>         <none>           <none>

As you can see, the alluxio Pods are, as usual, stuck in Pending.

Locating the cause of the failing Pods

First, describe the failing master Pod to see what it reports:

root@zhiyong-ksp1:~# kubectl describe pod alluxio-master-0
Name:           alluxio-master-0
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=alluxio
                chart=alluxio-0.6.48
                controller-revision-hash=alluxio-master-5bb869cb7d
                heritage=Helm
                name=alluxio-master
                release=alluxio
                role=alluxio-master
                statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/alluxio-master
Containers:
  alluxio-master:
    Image:       alluxio/alluxio:2.8.1
    Ports:       19998/TCP, 19999/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      master-only
      --no-format
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /journal from alluxio-journal (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hg5br (ro)
  alluxio-job-master:
    Image:       alluxio/alluxio:2.8.1
    Ports:       20001/TCP, 20002/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      job-master
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hg5br (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  alluxio-journal:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  alluxio-journal-alluxio-master-0
    ReadOnly:   false
  kube-api-access-hg5br:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  6m42s  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  97s    default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

As you can see, the PersistentVolumeClaim has no bound PersistentVolume (PV), which keeps the Pod from being scheduled.

The PersistentVolumeClaim (PVC) has clearly been created, but its status is still Pending...

And it reports that no StorageClass named standard can be found...
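
The same finding can be confirmed from the CLI; a sketch using the claim names this release created:

# The Events section should point at the missing "standard" StorageClass
kubectl describe pvc alluxio-journal-alluxio-master-0
kubectl describe pvc alluxio-worker-domain-socket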

root@zhiyong-ksp1:~# kubectl get storageclass
NAME              PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local (default)   openebs.io/local   Delete          WaitForFirstConsumer   false                  34d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe storageclass local
Name:            local
IsDefaultClass:  Yes
Annotations:     cas.openebs.io/config=- name: StorageType
  value: "hostpath"
- name: BasePath
  value: "/var/openebs/local/"
,kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"cas.openebs.io/config":"- name: StorageType\n  value: \"hostpath\"\n- name: BasePath\n  value: \"/var/openebs/local/\"\n","openebs.io/cas-type":"local","storageclass.beta.kubernetes.io/is-default-class":"true","storageclass.kubesphere.io/supported-access-modes":"[\"ReadWriteOnce\"]"},"name":"local"},"provisioner":"openebs.io/local","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"}
,openebs.io/cas-type=local,storageclass.beta.kubernetes.io/is-default-class=true,storageclass.kubesphere.io/supported-access-modes=["ReadWriteOnce"]
Provisioner:           openebs.io/local
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
root@zhiyong-ksp1:~# kubectl get pvc
NAME                               STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alluxio-journal-alluxio-master-0   Pending                                      standard       21m
alluxio-worker-domain-socket       Pending                                      standard       21m
root@zhiyong-ksp1:~# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                              STORAGECLASS   REASON   AGE
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4   20Gi       RWO            Delete           Bound    kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0    local                   33d
pvc-402e21a4-a811-46a7-b75b-e295512bab25   4Gi        RWO            Delete           Bound    kubesphere-logging-system/data-elasticsearch-logging-discovery-0   local                   25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1   8Gi        RWO            Delete           Bound    kubesphere-devops-system/devops-jenkins                            local                   24d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d   20Gi       RWO            Delete           Bound    kubesphere-logging-system/data-elasticsearch-logging-data-0        local                   25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19   20Gi       RWO            Delete           Bound    kubesphere-system/minio                                            local                   24d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8   2Gi        RWO            Delete           Bound    kubesphere-system/openldap-pvc-openldap-0                          local                   24d

Indeed, there isn't one.

Fixing the missing standard StorageClass

Take the PVC's YAML:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: alluxio-journal-alluxio-master-0
  namespace: default
  labels:
    app: alluxio
    name: alluxio-master
    role: alluxio-master
  finalizers:
    - kubernetes.io/pvc-protection
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
  volumeMode: Filesystem

Changing the storageClassName entry here from the missing standard to the existing local throws an error.

Clearly it cannot be modified in place: a PVC's storageClassName is immutable once the claim exists. So I changed tack and uninstalled the release first:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm list
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
alluxio default         1               2022-09-11 19:13:31.30716236 +0800 CST  deployed        alluxio-0.6.48
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm delete alluxio
release "alluxio" uninstalled
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm list
NAME    NAMESPACE       REVISION        UPDATED STATUS  CHART   APP VERSION

Then delete the PVCs left behind by the release. I did that through the KubeSphere console; the equivalent kubectl commands are sketched right below.
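
A minimal sketch of the same cleanup from the command line, using the PVC names shown by kubectl get pvc earlier:

# Remove the Pending PVCs left behind by the uninstalled release
kubectl delete pvc alluxio-journal-alluxio-master-0 alluxio-worker-domain-socket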

Check the current deployments:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get deployment --all-namespaces
NAMESPACE                      NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
argocd                         devops-argocd-applicationset-controller   1/1     1            1           24d
argocd                         devops-argocd-dex-server                  1/1     1            1           24d
argocd                         devops-argocd-notifications-controller    1/1     1            1           24d
argocd                         devops-argocd-redis                       1/1     1            1           24d
argocd                         devops-argocd-repo-server                 1/1     1            1           24d
argocd                         devops-argocd-server                      1/1     1            1           24d
istio-system                   istiod-1-11-2                             1/1     1            1           25d
istio-system                   jaeger-collector                          1/1     1            1           25d
istio-system                   jaeger-operator                           1/1     1            1           25d
istio-system                   jaeger-query                              1/1     1            1           25d
istio-system                   kiali                                     1/1     1            1           25d
istio-system                   kiali-operator                            1/1     1            1           25d
kube-system                    calico-kube-controllers                   1/1     1            1           34d
kube-system                    coredns                                   2/2     2            2           34d
kube-system                    openebs-localpv-provisioner               1/1     1            1           34d
kubesphere-controls-system     default-http-backend                      1/1     1            1           34d
kubesphere-controls-system     kubectl-admin                             1/1     1            1           34d
kubesphere-devops-system       devops-apiserver                          1/1     1            1           24d
kubesphere-devops-system       devops-controller                         1/1     1            1           24d
kubesphere-devops-system       devops-jenkins                            1/1     1            1           24d
kubesphere-monitoring-system   kube-state-metrics                        1/1     1            1           34d
kubesphere-monitoring-system   notification-manager-deployment           1/1     1            1           34d
kubesphere-monitoring-system   notification-manager-operator             1/1     1            1           34d
kubesphere-monitoring-system   prometheus-operator                       1/1     1            1           34d
kubesphere-system              ks-apiserver                              1/1     1            1           34d
kubesphere-system              ks-console                                1/1     1            1           34d
kubesphere-system              ks-controller-manager                     1/1     1            1           34d
kubesphere-system              ks-installer                              1/1     1            1           34d
kubesphere-system              minio                                     1/1     1            1           24d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

Check the PVCs that were created successfully earlier:

[screenshot]

Clearly the StorageClass named local provides Filesystem-mode volumes backed by openebs. If another kind of persistence such as NFS is used instead, the matching StorageClass (and, for static provisioning, the PVs) has to be created by hand.
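For example, bringing an NFS export in through static provisioning means defining a class name plus a PV that carries it; a minimal sketch (the class name, server address and export path below are made up):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs-static
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: nfs-example-pv
spec:
  storageClassName: nfs-static
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.88.100
    path: /exports/alluxio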

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# ll
总用量 16
drwxr-xr-x 2 root root 4096 9月11 19:12 ./
drwxr-xr-x 3 root root 4096 9月11 18:56 ../
-rw-r--r-- 1 root root 7172 9月11 19:11 alluxioconfig.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxioconfig.yaml

For now, a StorageClass named standard has to be created manually before the next step can proceed...
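In principle the chart could also be pointed at the existing local class at install time instead of creating a new one. Assuming the chart exposes the journal storage class as a value named journal.storageClass (which the generated PVC above suggests, but is not verified here), that would look roughly like:

helm install alluxio \
  -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml \
  --set journal.storageClass=local \
  alluxio-charts/alluxio

The walkthrough below keeps the default name and creates a standard StorageClass instead.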

Creating the StorageClass

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim standardstorageclass.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl  create -f standardstorageclass.yaml
storageclass.storage.k8s.io/standard created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get storageclass
NAME              PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local (default)   openebs.io/local               Delete          WaitForFirstConsumer   false                  34d
standard          kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  15s
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

The file's content:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

volumeBindingMode: WaitForFirstConsumer selects delayed binding, and kubernetes.io/no-provisioner means there is no dynamic provisioning, so matching PVs will have to be created by hand below. The new StorageClass now shows up in the listing above.

Installing again

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm install alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml alluxio-charts/alluxio
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods -owide --all-namespaces | grep alluxio
default                        alluxio-master-0                                                  0/2     Pending             0             92s    <none>          <none>         <none>           <none>
default                        alluxio-worker-z6hkq                                              0/2     Pending             0             93s    <none>          <none>         <none>           <none>
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe pod alluxio-master-0
Name:           alluxio-master-0
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=alluxio
                chart=alluxio-0.6.48
                controller-revision-hash=alluxio-master-5bb869cb7d
                heritage=Helm
                name=alluxio-master
                release=alluxio
                role=alluxio-master
                statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/alluxio-master
Containers:
  alluxio-master:
    Image:       alluxio/alluxio:2.8.1
    Ports:       19998/TCP, 19999/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      master-only
      --no-format
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /journal from alluxio-journal (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
  alluxio-job-master:
    Image:       alluxio/alluxio:2.8.1
    Ports:       20001/TCP, 20002/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      job-master
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  alluxio-journal:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  alluxio-journal-alluxio-master-0
    ReadOnly:   false
  kube-api-access-88v9g:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m10s  default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

There is no serious error here: the pod simply cannot be scheduled because no available persistent volume can be bound to its claim, so it never starts. The next step is to create some PVs by hand for the pods to use.

Creating PVs

Following the Kubernetes documentation: https://kubernetes.io/docs/concepts/storage/persistent-volumes/

and the Alluxio documentation: each journal volume should be at least 1Gi, because every Alluxio master pod has a PersistentVolumeClaim requesting 1Gi of storage...

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-master-journal-pv.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl create -f alluxio-master-journal-pv.yaml
persistentvolume/alluxio-journal-0 created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                              STORAGECLASS   REASON   AGE
alluxio-journal-0                          4Gi        RWO            Retain           Available                                                                      standard                9s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4   20Gi       RWO            Delete           Bound       kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0    local                   34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25   4Gi        RWO            Delete           Bound       kubesphere-logging-system/data-elasticsearch-logging-discovery-0   local                   25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1   8Gi        RWO            Delete           Bound       kubesphere-devops-system/devops-jenkins                            local                   25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d   20Gi       RWO            Delete           Bound       kubesphere-logging-system/data-elasticsearch-logging-data-0        local                   25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19   20Gi       RWO            Delete           Bound       kubesphere-system/minio                                            local                   25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8   2Gi        RWO            Delete           Bound       kubesphere-system/openldap-pvc-openldap-0                          local                   25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

The YAML's content:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: alluxio-journal-0
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/alluxio-journal-0
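On this single-node cluster a plain hostPath PV is enough. On a multi-node cluster it would be safer to use a local volume pinned to one node with nodeAffinity, so the journal always lands on the same machine; a sketch (node name taken from this cluster, path assumed to exist on that node):

kind: PersistentVolume
apiVersion: v1
metadata:
  name: alluxio-journal-0
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /tmp/alluxio-journal-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - zhiyong-ksp1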

The PV has been created successfully. Wait a moment:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                              STORAGECLASS   REASON   AGE
alluxio-journal-0                          4Gi        RWO            Retain           Bound    default/alluxio-journal-alluxio-master-0                           standard                4m31s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4   20Gi       RWO            Delete           Bound    kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0    local                   34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25   4Gi        RWO            Delete           Bound    kubesphere-logging-system/data-elasticsearch-logging-discovery-0   local                   25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1   8Gi        RWO            Delete           Bound    kubesphere-devops-system/devops-jenkins                            local                   25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d   20Gi       RWO            Delete           Bound    kubesphere-logging-system/data-elasticsearch-logging-data-0        local                   25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19   20Gi       RWO            Delete           Bound    kubesphere-system/minio                                            local                   25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8   2Gi        RWO            Delete           Bound    kubesphere-system/openldap-pvc-openldap-0                          local                   25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

The PV has been bound automatically:

[screenshot]

But 4Gi is a bit small, so bump it to 20Gi first:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl edit pv/alluxio-journal-0 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2022-09-11T13:54:21Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    type: local
  name: alluxio-journal-0
  resourceVersion: "270280"
  uid: 9823b56d-79d2-4a97-be65-1f8c7770386d
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: alluxio-journal-alluxio-master-0
    namespace: default
    resourceVersion: "262629"
    uid: b06b8135-b7ca-4841-894c-0f0532fdcca9
  hostPath:
    path: /tmp/alluxio-journal-0
    type: ""
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Bound
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                              STORAGECLASS   REASON   AGE
alluxio-journal-0                          20Gi       RWO            Retain           Bound    default/alluxio-journal-alluxio-master-0                           standard                9m
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4   20Gi       RWO            Delete           Bound    kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0    local                   34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25   4Gi        RWO            Delete           Bound    kubesphere-logging-system/data-elasticsearch-logging-discovery-0   local                   25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1   8Gi        RWO            Delete           Bound    kubesphere-devops-system/devops-jenkins                            local                   25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d   20Gi       RWO            Delete           Bound    kubesphere-logging-system/data-elasticsearch-logging-data-0        local                   25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19   20Gi       RWO            Delete           Bound    kubesphere-system/minio                                            local                   25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8   2Gi        RWO            Delete           Bound    kubesphere-system/openldap-pvc-openldap-0                          local                   25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
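kubectl edit works, but the same change can be made non-interactively; a sketch (note that the declared capacity of a hostPath PV is not actually enforced, it only has to satisfy the claim):

kubectl patch pv alluxio-journal-0 --type merge -p '{"spec":{"capacity":{"storage":"20Gi"}}}'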

Next, create a PV for the worker:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-worker-journal-pv.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl create -f alluxio-worker-journal-pv.yaml
persistentvolume/alluxio-journal-1 created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                              STORAGECLASS   REASON   AGE
alluxio-journal-0                          20Gi       RWO            Retain           Bound       default/alluxio-journal-alluxio-master-0                           standard                13m
alluxio-journal-1                          30Gi       RWO            Retain           Available                                                                      standard                6s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4   20Gi       RWO            Delete           Bound       kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0    local                   34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25   4Gi        RWO            Delete           Bound       kubesphere-logging-system/data-elasticsearch-logging-discovery-0   local                   25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1   8Gi        RWO            Delete           Bound       kubesphere-devops-system/devops-jenkins                            local                   25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d   20Gi       RWO            Delete           Bound       kubesphere-logging-system/data-elasticsearch-logging-data-0        local                   25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19   20Gi       RWO            Delete           Bound       kubesphere-system/minio                                            local                   25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8   2Gi        RWO            Delete           Bound       kubesphere-system/openldap-pvc-openldap-0                          local                   25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

Its content is much the same:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: alluxio-journal-1
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/alluxio-journal-1

For reference, the describe output of such a PV once it has been bound:

Name:            alluxio-journal-2
Labels:          type=local
Annotations:     pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    standard
Status:          Bound
Claim:           default/alluxio-worker-jjgcv
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        30Gi
Node Affinity:   <none>
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/alluxio-journal-2
    HostPathType:
Events:            <none>

Fixing the master containers stuck in ContainerCreating

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe pod alluxio-master-0
Name:           alluxio-master-0
Namespace:      default
Priority:       0
Node:           zhiyong-ksp1/192.168.88.20
Start Time:     Sun, 11 Sep 2022 21:54:32 +0800
Labels:         app=alluxio
                chart=alluxio-0.6.48
                controller-revision-hash=alluxio-master-5bb869cb7d
                heritage=Helm
                name=alluxio-master
                release=alluxio
                role=alluxio-master
                statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/alluxio-master
Containers:
  alluxio-master:
    Container ID:
    Image:         alluxio/alluxio:2.8.1
    Image ID:
    Ports:         19998/TCP, 19999/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      master-only
      --no-format
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /journal from alluxio-journal (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
  alluxio-job-master:
    Container ID:
    Image:         alluxio/alluxio:2.8.1
    Image ID:
    Ports:         20001/TCP, 20002/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      job-master
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  alluxio-journal:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  alluxio-journal-alluxio-master-0
    ReadOnly:   false
  kube-api-access-88v9g:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Warning  FailedScheduling        50m                   default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Warning  FailedScheduling        45m                   default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   Scheduled               18m                   default-scheduler  Successfully assigned default/alluxio-master-0 to zhiyong-ksp1
  Warning  FailedCreatePodSandBox  18m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "272bf5e805275bf3e235a55b275157b149cb27673c596e0b1488c469e74397b5": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  17m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4c1b7f2d9a9d55a096d1392b0e59b69e37eb03c00ba05dcc6066414d46889e24": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  17m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "afcb920ff350a9c6ac5cb3eb4e588224dca137ecb5047213178536fcbd448edc": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  17m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e0895849bef84eab271ec86160fedcf46cbabdb1b5c792c5374d51e522c8b835": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  17m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "42da9f5a7ca6025e22220169576232334560207843e89b4582ead360cf143336": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  16m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c9e94c30e641992fb63c82177cf2f6241f3b46549fcc65cd5bde953ecfddbae4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  16m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1bb467f76b3dfb072b94b19a9ca05c0cc405436d0b7efd7bf5cfa4dd2b65022c": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  16m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cfa3efb7a306fdd02f3a0c9f5e13ef4345f77155c3c327fd311bef181529fc3e": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  16m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "19ce9a3b5ebf896d3ce9042e71154cafb26219736ad1bd85ed02d17be5834cdf": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  2m54s (x60 over 16m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ae696e7f906fa6412aed6585a1a2eb7a5f3b846e1d555cce755a184a307b519d": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

The PV problem is solved, but now there is a Calico problem: the CNI plugin reports Unauthorized, most likely because the token in /etc/cni/net.d/calico-kubeconfig has expired.

This came up last time as well: https://lizhiyong.blog.csdn.net/article/details/126380224
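Before repeating that fix (moving the CNI config aside and rebooting, as shown below), a lighter first attempt, not verified on this cluster, would be to restart calico-node so it re-renders /etc/cni/net.d/calico-kubeconfig with a fresh service-account token:

kubectl -n kube-system rollout restart daemonset calico-node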

root@zhiyong-ksp1:~# mkdir -p /fileback/20220911
root@zhiyong-ksp1:~# cd /etc/cni/net.d
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 16
drwxr-xr-x 2 kube root 4096 8月17 01:47 ./
drwxr-xr-x 3 kube root 4096 8月8 10:02 ../
-rw-r--r-- 1 root root  663 8月17 01:47 10-calico.conflist
-rw------- 1 root root 2713 8月18 00:32 calico-kubeconfig
root@zhiyong-ksp1:/etc/cni/net.d# mv ./10-calico.conflist /fileback/20220911
root@zhiyong-ksp1:/etc/cni/net.d# mv ./calico-kubeconfig /fileback/20220911
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 8
drwxr-xr-x 2 kube root 4096 9月11 22:20 ./
drwxr-xr-x 3 kube root 4096 8月8 10:02 ../
root@zhiyong-ksp1:/etc/cni/net.d# reboot

Only after the reboot does it finally, unhurriedly, start pulling the image:

root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod alluxio-master-0
Name:           alluxio-master-0
Namespace:      default
Priority:       0
Node:           zhiyong-ksp1/192.168.88.20
Start Time:     Sun, 11 Sep 2022 21:54:32 +0800
Labels:         app=alluxio
                chart=alluxio-0.6.48
                controller-revision-hash=alluxio-master-5bb869cb7d
                heritage=Helm
                name=alluxio-master
                release=alluxio
                role=alluxio-master
                statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations:    cni.projectcalico.org/containerID: 3dbc1d4a3d1134300c1166722049eb9176c03b1f598f30f84fa1f890f8868043
                cni.projectcalico.org/podIP: 10.233.107.106/32
                cni.projectcalico.org/podIPs: 10.233.107.106/32
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/alluxio-master
Containers:
  alluxio-master:
    Container ID:
    Image:         alluxio/alluxio:2.8.1
    Image ID:
    Ports:         19998/TCP, 19999/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      master-only
      --no-format
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /journal from alluxio-journal (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
  alluxio-job-master:
    Container ID:
    Image:         alluxio/alluxio:2.8.1
    Image ID:
    Ports:         20001/TCP, 20002/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      job-master
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  alluxio-journal:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  alluxio-journal-alluxio-master-0
    ReadOnly:   false
  kube-api-access-88v9g:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                     From               Message
  ----     ------                  ----                    ----               -------
  Warning  FailedScheduling        64m                     default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Warning  FailedScheduling        59m                     default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   Scheduled               32m                     default-scheduler  Successfully assigned default/alluxio-master-0 to zhiyong-ksp1
  Warning  FailedCreatePodSandBox  32m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "272bf5e805275bf3e235a55b275157b149cb27673c596e0b1488c469e74397b5": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  31m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4c1b7f2d9a9d55a096d1392b0e59b69e37eb03c00ba05dcc6066414d46889e24": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  31m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "afcb920ff350a9c6ac5cb3eb4e588224dca137ecb5047213178536fcbd448edc": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  31m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e0895849bef84eab271ec86160fedcf46cbabdb1b5c792c5374d51e522c8b835": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  31m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "42da9f5a7ca6025e22220169576232334560207843e89b4582ead360cf143336": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  30m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c9e94c30e641992fb63c82177cf2f6241f3b46549fcc65cd5bde953ecfddbae4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  30m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1bb467f76b3dfb072b94b19a9ca05c0cc405436d0b7efd7bf5cfa4dd2b65022c": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  30m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cfa3efb7a306fdd02f3a0c9f5e13ef4345f77155c3c327fd311bef181529fc3e": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  30m                     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "19ce9a3b5ebf896d3ce9042e71154cafb26219736ad1bd85ed02d17be5834cdf": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  FailedCreatePodSandBox  6m57s (x104 over 30m)   kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f5be7b93412cadf9405f96816bb004fb14a8bac8a6c657354ef7d23a6137e96b": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Warning  NetworkNotReady         2m52s (x11 over 3m12s)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
  Warning  FailedMount             2m52s (x5 over 2m59s)   kubelet            MountVolume.SetUp failed for volume "kube-api-access-88v9g" : object "default"/"kube-root-ca.crt" not registered
  Normal   Pulling                 2m43s                   kubelet            Pulling image "alluxio/alluxio:2.8.1"
root@zhiyong-ksp1:/home/zhiyong#

This is a real pitfall!!!

After the reboot:

root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod alluxio-master-0
Name:         alluxio-master-0
Namespace:    default
Priority:     0
Node:         zhiyong-ksp1/192.168.88.20
Start Time:   Mon, 12 Sep 2022 00:33:22 +0800
Labels:       app=alluxio
              chart=alluxio-0.6.48
              controller-revision-hash=alluxio-master-5bb869cb7d
              heritage=Helm
              name=alluxio-master
              release=alluxio
              role=alluxio-master
              statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations:  cni.projectcalico.org/containerID: 281deb90cae21158d2de6b32776d3c268b1d7af40341e4886a918a5cc28e4ba9
              cni.projectcalico.org/podIP: 10.233.107.165/32
              cni.projectcalico.org/podIPs: 10.233.107.165/32
Status:       Running
IP:           10.233.107.165
IPs:
  IP:           10.233.107.165
Controlled By:  StatefulSet/alluxio-master
Containers:
  alluxio-master:
    Container ID:  containerd://a360ca1dbf913afd4b1270b2da4fe86d6e797debba73c8dec75bf46470095c94
    Image:         alluxio/alluxio:2.8.1
    Image ID:      docker.io/alluxio/alluxio@sha256:a365600d65fe4c518e3df4272a25b842ded773b193ea146a202b15e853a65d39
    Ports:         19998/TCP, 19999/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      master-only
      --no-format
    State:          Running
      Started:      Mon, 12 Sep 2022 05:46:17 +0800
    Last State:     Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Mon, 12 Sep 2022 05:42:59 +0800
      Finished:     Mon, 12 Sep 2022 05:45:35 +0800
    Ready:          False
    Restart Count:  52
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /journal from alluxio-journal (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vslvq (ro)
  alluxio-job-master:
    Container ID:  containerd://730e943e3a5e8787ec15cd73ce8ef617af15cc4ba73ed1ee58add911bf2cb496
    Image:         alluxio/alluxio:2.8.1
    Image ID:      docker.io/alluxio/alluxio@sha256:a365600d65fe4c518e3df4272a25b842ded773b193ea146a202b15e853a65d39
    Ports:         20001/TCP, 20002/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      job-master
    State:          Running
      Started:      Mon, 12 Sep 2022 05:46:21 +0800
    Last State:     Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Mon, 12 Sep 2022 02:58:10 +0800
      Finished:     Mon, 12 Sep 2022 05:45:35 +0800
    Ready:          False
    Restart Count:  1
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_MASTER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vslvq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  alluxio-journal:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  alluxio-journal-alluxio-master-0
    ReadOnly:   false
  kube-api-access-vslvq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                     From     Message
  ----     ------          ----                    ----     -------
  Normal   Pulled          53m (x36 over 168m)     kubelet  Container image "alluxio/alluxio:2.8.1" already present on machine
  Warning  BackOff         8m40s (x548 over 163m)  kubelet  Back-off restarting failed container
  Warning  Unhealthy       3m49s (x290 over 168m)  kubelet  Readiness probe failed: dial tcp 10.233.107.112:19998: connect: connection refused
  Normal   SandboxChanged  52s                     kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          45s                     kubelet  Container image "alluxio/alluxio:2.8.1" already present on machine
  Normal   Created         45s                     kubelet  Created container alluxio-master
  Normal   Started         44s                     kubelet  Started container alluxio-master
  Normal   Pulled          44s                     kubelet  Container image "alluxio/alluxio:2.8.1" already present on machine
  Normal   Created         44s                     kubelet  Created container alluxio-job-master
  Normal   Started         40s                     kubelet  Started container alluxio-job-master
  Warning  Unhealthy       22s (x3 over 34s)       kubelet  Readiness probe failed: dial tcp 10.233.107.165:19998: connect: connection refused
root@zhiyong-ksp1:/home/zhiyong# kubectl logs -f alluxio-master-0
Defaulted container "alluxio-master" out of: alluxio-master, alluxio-job-master
2022-09-11 21:47:34,457 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2022-09-11 21:47:34,458 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2022-09-11 21:47:34,457 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster
2022-09-11 21:47:34,457 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2022-09-11 21:47:34,458 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2022-09-11 21:47:34,503 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.8.1/lib
2022-09-11 21:47:34,624 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.8.1/extensions
2022-09-11 21:47:34,684 INFO  MetricsSystem - Starting sinks with config: {}.
2022-09-11 21:47:34,686 INFO  MetricsHeartbeatContext - Created metrics heartbeat with ID app-1037551149078095174. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2022-09-11 21:47:34,704 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=10.233.107.165, rack=null)
2022-09-11 21:47:34,705 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio-2.8.1/lib
2022-09-11 21:47:34,724 INFO  UnderDatabaseRegistry - Registered UDBs: hive,glue
2022-09-11 21:47:34,727 INFO  LayoutRegistry - Registered Table Layouts: hive
2022-09-11 21:47:34,855 INFO  RocksStore - Closing BlockStore rocks database
2022-09-11 21:47:34,898 INFO  RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/blocks
2022-09-11 21:47:35,121 INFO  RocksStore - Closing InodeStore rocks database
2022-09-11 21:47:35,162 INFO  RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,211 INFO  RocksStore - Closing InodeStore rocks database
2022-09-11 21:47:35,252 INFO  RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,252 INFO  RocksStore - Cleared store at /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,263 INFO  ProcessUtils - Starting Alluxio master @10.233.107.165:19998.
2022-09-11 21:47:35,263 INFO  ProcessUtils - Alluxio version: 2.8.1-5cda26a856fba1d1f42b39b7a8c761e50bbae8fe
2022-09-11 21:47:35,263 INFO  ProcessUtils - Java version: 1.8.0_275
2022-09-11 21:47:35,263 INFO  AlluxioMasterProcess - Starting...
2022-09-11 21:47:35,263 INFO  RocksStore - Closing BlockStore rocks database
2022-09-11 21:47:35,300 INFO  RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/blocks
2022-09-11 21:47:35,300 INFO  RocksStore - Cleared store at /opt/alluxio-2.8.1/metastore/blocks
2022-09-11 21:47:35,305 INFO  RocksStore - Closing InodeStore rocks database
2022-09-11 21:47:35,308 INFO  UfsJournalCheckpointThread - BlockMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,308 INFO  UfsJournalCheckpointThread - TableMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,338 INFO  RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,338 INFO  RocksStore - Cleared store at /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,340 INFO  UfsJournalCheckpointThread - FileSystemMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,341 INFO  UfsJournalCheckpointThread - MetricsMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,341 INFO  UfsJournalCheckpointThread - MetaMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,341 INFO  UfsJournalCheckpointThread - BlockMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,341 INFO  UfsJournalCheckpointThread - TableMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,342 INFO  UfsJournalCheckpointThread - FileSystemMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,342 INFO  UfsJournalCheckpointThread - MetaMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,342 INFO  UfsJournalCheckpointThread - MetricsMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:37,310 INFO  AbstractJournalProgressLogger - UfsJournal(/journal/TableMaster/v1)|current SN: 0|entries in last 2001ms=0
2022-09-11 21:47:37,342 INFO  AbstractJournalProgressLogger - UfsJournal(/journal/MetricsMaster/v1)|current SN: 0|entries in last 2000ms=0
2022-09-11 21:47:37,342 INFO  AbstractJournalProgressLogger - UfsJournal(/journal/MetaMaster/v1)|current SN: 0|entries in last 2000ms=0
2022-09-11 21:47:38,310 INFO  AbstractJournalProgressLogger - UfsJournal(/journal/BlockMaster/v1)|current SN: 0|entries in last 3000ms=0
2022-09-11 21:47:38,341 INFO  AbstractJournalProgressLogger - UfsJournal(/journal/FileSystemMaster/v1)|current SN: 0|entries in last 3000ms=0
2022-09-11 21:47:41,311 INFO  AbstractJournalProgressLogger - UfsJournal(/journal/TableMaster/v1)|current SN: 0|entries in last 4001ms=0
2022-09-11 21:47:41,312 INFO  UfsJournalCheckpointThread - BlockMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,312 INFO  UfsJournalCheckpointThread - TableMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,313 INFO  UfsJournalCheckpointThread - BlockMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,313 INFO  UfsJournalCheckpointThread - TableMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,329 INFO  UfsJournal - BlockMaster: journal switched to primary mode. location: /journal/BlockMaster/v1
2022-09-11 21:47:41,330 INFO  UfsJournal - TableMaster: journal switched to primary mode. location: /journal/TableMaster/v1
2022-09-11 21:47:41,342 INFO  UfsJournalCheckpointThread - FileSystemMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,343 INFO  UfsJournalCheckpointThread - FileSystemMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,343 INFO  UfsJournalCheckpointThread - MetricsMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,343 INFO  UfsJournalCheckpointThread - MetaMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,344 INFO  UfsJournalCheckpointThread - MetricsMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,344 INFO  UfsJournalCheckpointThread - MetaMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,346 INFO  UfsJournal - FileSystemMaster: journal switched to primary mode. location: /journal/FileSystemMaster/v1
2022-09-11 21:47:41,347 INFO  UfsJournal - MetaMaster: journal switched to primary mode. location: /journal/MetaMaster/v1
2022-09-11 21:47:41,348 INFO  UfsJournal - MetricsMaster: journal switched to primary mode. location: /journal/MetricsMaster/v1
2022-09-11 21:47:41,390 INFO  AlluxioMasterProcess - Starting all masters as: leader.
2022-09-11 21:47:41,391 INFO  AbstractMaster - MetricsMaster: Starting primary master.
2022-09-11 21:47:41,397 INFO  MetricsSystem - Reset all metrics in the metrics system in 4ms
2022-09-11 21:47:41,398 INFO  MetricsStore - Cleared the metrics store and metrics system in 6 ms
2022-09-11 21:47:41,401 INFO  AbstractMaster - BlockMaster: Starting primary master.
2022-09-11 21:47:41,403 INFO  AbstractMaster - FileSystemMaster: Starting primary master.
2022-09-11 21:47:41,403 INFO  DefaultFileSystemMaster - Starting fs master as primary
2022-09-11 21:47:41,513 WARN  AsyncJournalWriter - Failed to flush journal entry: Unable to create parent directories for path /journal/BlockMaster/v1/logs/0x0-0x7fffffffffffffff
java.io.IOException: Unable to create parent directories for path /journal/BlockMaster/v1/logs/0x0-0x7fffffffffffffff
        at alluxio.underfs.local.LocalUnderFileSystem.createDirect(LocalUnderFileSystem.java:114)
        at alluxio.underfs.local.LocalUnderFileSystem.create(LocalUnderFileSystem.java:103)
        at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:182)
        at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:179)
        at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:1237)
        at alluxio.underfs.UnderFileSystemWithLogging.create(UnderFileSystemWithLogging.java:179)
        at alluxio.master.journal.ufs.UfsJournalLogWriter.createNewLogFile(UfsJournalLogWriter.java:294)
        at alluxio.master.journal.ufs.UfsJournalLogWriter.maybeRotateLog(UfsJournalLogWriter.java:283)
        at alluxio.master.journal.ufs.UfsJournalLogWriter.write(UfsJournalLogWriter.java:118)
        at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:305)
        at java.lang.Thread.run(Thread.java:748)
2022-09-11 21:47:41,516 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Unable to create parent directories for path /journal/BlockMaster/v1/logs/0x0-0x7fffffffffffffff
        at alluxio.underfs.local.LocalUnderFileSystem.createDirect(LocalUnderFileSystem.java:114)
        at alluxio.underfs.local.LocalUnderFileSystem.create(LocalUnderFileSystem.java:103)
        at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:182)
        at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:179)
        at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:1237)
        at alluxio.underfs.UnderFileSystemWithLogging.create(UnderFileSystemWithLogging.java:179)
        at alluxio.master.journal.ufs.UfsJournalLogWriter.createNewLogFile(UfsJournalLogWriter.java:294)
        at alluxio.master.journal.ufs.UfsJournalLogWriter.maybeRotateLog(UfsJournalLogWriter.java:283)
        at alluxio.master.journal.ufs.UfsJournalLogWriter.write(UfsJournalLogWriter.java:118)
        at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:305)
        at java.lang.Thread.run(Thread.java:748)

Looks like a missing write permission on the journal directory... grant it:

root@zhiyong-ksp1:/tmp# chmod 777 -R /tmp/alluxio-journal-0
root@zhiyong-ksp1:/tmp# ll | grep alluxio-journal-0
drwxrwxrwx  2 root    root    4096 9月12 05:46 alluxio-journal-0/
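To avoid hitting this on the next install, the hostPath journal directories can be prepared with open permissions up front; a sketch (assuming, as the stack trace suggests, that the Alluxio processes in the container do not run as root and therefore cannot write into a root-owned 755 directory):

mkdir -p /tmp/alluxio-journal-0 /tmp/alluxio-journal-1
chmod 777 /tmp/alluxio-journal-0 /tmp/alluxio-journal-1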

Now:

root@zhiyong-ksp1:~# kubectl get pods -owide --all-namespaces | grep alluxio
default                        alluxio-master-0                                                  2/2     Running     4 (2m22s ago)   6m22s   10.233.107.194   zhiyong-ksp1   <none>           <none>
default                        alluxio-worker-n8qh7                                              0/2     Pending     0               6m22s   <none>           <none>         <none>           <none>

Both containers of the master pod are finally up (2/2 Running)!!!
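A quick sanity check at this point, sketched below, is to port-forward the master web UI (port 19999 in the pod spec above) and open it in a browser:

kubectl port-forward alluxio-master-0 19999:19999
# then browse to http://localhost:19999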

Fixing the pending worker pod

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods -owide --all-namespaces | grep alluxio
default                        alluxio-master-0                                                  2/2     Running     4 (17m ago)    21m     10.233.107.194   zhiyong-ksp1   <none>           <none>
default                        alluxio-worker-n8qh7                                              0/2     Pending     0              21m     <none>           <none>         <none>           <none>
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe pod alluxio-worker-n8qh7
Name:           alluxio-worker-n8qh7
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=alluxio
                chart=alluxio-0.6.48
                controller-revision-hash=c6dcc876c
                heritage=Helm
                pod-template-generation=1
                release=alluxio
                role=alluxio-worker
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  DaemonSet/alluxio-worker
Containers:
  alluxio-worker:
    Image:       alluxio/alluxio:2.8.1
    Ports:       29999/TCP, 30000/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      worker-only
      --no-format
    Limits:
      cpu:     4
      memory:  4Gi
    Requests:
      cpu:      1
      memory:   2Gi
    Liveness:   tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_WORKER_HOSTNAME:             (v1:status.hostIP)
      ALLUXIO_WORKER_CONTAINER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /dev/shm from mem (rw)
      /opt/domain from alluxio-domain (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9wxfl (ro)
  alluxio-job-worker:
    Image:       alluxio/alluxio:2.8.1
    Ports:       30001/TCP, 30002/TCP, 30003/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      tini
      --
      /entrypoint.sh
    Args:
      job-worker
    Limits:
      cpu:     4
      memory:  4Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
    Readiness:  tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      alluxio-config  ConfigMap  Optional: false
    Environment:
      ALLUXIO_WORKER_HOSTNAME:             (v1:status.hostIP)
      ALLUXIO_WORKER_CONTAINER_HOSTNAME:   (v1:status.podIP)
    Mounts:
      /dev/shm from mem (rw)
      /opt/domain from alluxio-domain (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9wxfl (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  alluxio-domain:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  alluxio-worker-domain-socket
    ReadOnly:   false
  mem:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-9wxfl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  22m   default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  17m   default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl logs -f alluxio-worker-n8qh7
Defaulted container "alluxio-worker" out of: alluxio-worker, alluxio-job-worker
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

It looks like the storage volume was never bound successfully:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                              STORAGECLASS   REASON   AGE
alluxio-journal-0                          20Gi       RWO            Retain           Bound       default/alluxio-journal-alluxio-master-0                           standard                5h46m
alluxio-journal-1                          30Gi       RWO            Retain           Available                                                                      standard                6m40s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4   20Gi       RWO            Delete           Bound       kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0    local                   34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25   4Gi        RWO            Delete           Bound       kubesphere-logging-system/data-elasticsearch-logging-discovery-0   local                   26d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1   8Gi        RWO            Delete           Bound       kubesphere-devops-system/devops-jenkins                            local                   25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d   20Gi       RWO            Delete           Bound       kubesphere-logging-system/data-elasticsearch-logging-data-0        local                   26d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19   20Gi       RWO            Delete           Bound       kubesphere-system/minio                                            local                   25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8   2Gi        RWO            Delete           Bound       kubesphere-system/openldap-pvc-openldap-0                          local                   25d
root@zhiyong-ksp1:/tmp# kubectl get pvc
NAME                               STATUS    VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alluxio-journal-alluxio-master-0   Bound     alluxio-journal-0   20Gi       RWO            standard       6h
alluxio-worker-domain-socket       Pending                                                 standard       39m
root@zhiyong-ksp1:/tmp# kubectl describe pvc alluxio-worker-domain-socket
Name:          alluxio-worker-domain-socket
Namespace:     default
StorageClass:  standard
Status:        Pending
Volume:
Labels:        app=alluxio
               app.kubernetes.io/managed-by=Helm
               chart=alluxio-0.6.48
               heritage=Helm
               release=alluxio
               role=alluxio-worker
Annotations:   meta.helm.sh/release-name: alluxio
               meta.helm.sh/release-namespace: default
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       alluxio-worker-n8qh7
Events:
  Type     Reason              Age                    From                         Message
  ----     ------              ----                   ----                         -------
  Warning  ProvisioningFailed  4m45s (x142 over 39m)  persistentvolume-controller  storageclass.storage.k8s.io "standard" not found
root@zhiyong-ksp1:/tmp#
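
The root cause is simply that no StorageClass named standard exists in this cluster. Since this single-node setup relies on statically created hostPath PVs anyway, one option (a minimal sketch, not the route taken below) would be to define a placeholder no-provisioner StorageClass with that name, so that claims referencing it are served purely by manually created PVs:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning; PVs are created by hand
volumeBindingMode: Immediate

The route actually taken below is different: pre-binding an existing PV to the pending claim via claimRef.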

Compare the successfully bound PV against the unbound one:

root@zhiyong-ksp1:/tmp# kubectl describe pv alluxio-journal-0
Name:            alluxio-journal-0
Labels:          type=local
Annotations:     pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    standard
Status:          Bound
Claim:           default/alluxio-journal-alluxio-master-0
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        20Gi
Node Affinity:   <none>
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/alluxio-journal-0
    HostPathType:
Events:            <none>
root@zhiyong-ksp1:/tmp# kubectl describe pv alluxio-journal-1
Name:            alluxio-journal-1
Labels:          type=local
Annotations:     <none>
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    standard
Status:          Available
Claim:
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        30Gi
Node Affinity:   <none>
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/alluxio-journal-1
    HostPathType:
Events:            <none>
root@zhiyong-ksp1:/tmp# kubectl edit pv/alluxio-journal-0 -o yaml -ndefault
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2022-09-11T16:31:05Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    type: local
  name: alluxio-journal-0
  resourceVersion: "228631"
  uid: 46366084-9f20-434c-9504-9585707c92ad
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: alluxio-journal-alluxio-master-0
    namespace: default
    resourceVersion: "228627"
    uid: 3d78a2dd-88e6-4245-b357-c445a580301a
  hostPath:
    path: /tmp/alluxio-journal-0
    type: ""
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Bound

root@zhiyong-ksp1:/tmp# kubectl edit pv/alluxio-journal-1 -o yaml -ndefault
apiVersion: v1
kind: PersistentVolume
metadata:
  creationTimestamp: "2022-09-11T22:11:14Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    type: local
  name: alluxio-journal-1
  resourceVersion: "293899"
  uid: f913bcfd-3916-4ef5-b99a-bb57db5c008f
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 30Gi
  hostPath:
    path: /tmp/alluxio-journal-1
    type: ""
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Available

Notice that the successfully bound PV carries this extra block:

  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: alluxio-journal-alluxio-master-0
    namespace: default
    resourceVersion: "228627"
    uid: 3d78a2dd-88e6-4245-b357-c445a580301a

So we can add the analogous block to the unbound PV (a complete PV sketch follows the snippet):

  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: alluxio-worker-domain-socket
    namespace: default
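
For reference, a complete stand-alone PV pre-bound to the worker's domain-socket claim might look roughly like this (a sketch only: the PV name, hostPath path and 1Gi size are assumptions; the domain-socket claim only stores a UNIX socket, so it needs very little space):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: alluxio-worker-domain-socket-pv    # hypothetical name
  labels:
    type: local
spec:
  storageClassName: standard
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi                           # assumption; a tiny volume is enough for a socket
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /tmp/alluxio-domain              # assumption; any writable host directory works
  claimRef:                                # pre-binds this PV to the pending PVC
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: alluxio-worker-domain-socket
    namespace: default

Equivalently, the same claimRef block can simply be added with kubectl edit to an already existing Available PV, which appears to be what is done here with alluxio-journal-1.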

At this point:

root@zhiyong-ksp1:/tmp# chmod 777 -R /tmp/alluxio-journal-1
root@zhiyong-ksp1:/tmp# ll
总用量 84
drwxrwxrwt 19 root    root    4096 9月12 07:08 ./
drwxr-xr-x 22 root    root    4096 8月17 01:42 ../
drwxrwxrwx  5 root    root    4096 9月12 05:58 alluxio-journal-0/
drwxrwxrwx  2 root    root    4096 9月12 07:08 alluxio-journal-1/
root@zhiyong-ksp1:/tmp# kubectl get pods -owide --all-namespaces | grep alluxio
default                        alluxio-master-0                                                  2/2     Running     4 (71m ago)    75m     10.233.107.194   zhiyong-ksp1   <none>           <none>
default                        alluxio-worker-n8qh7                                              2/2     Running     4 (87s ago)    75m     10.233.107.198   zhiyong-ksp1   <none>           <none>

You can see that the Alluxio worker Pod is now Running as well, with both of its containers (2/2) ready.
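
As a quick sanity check (not part of the original session), the claim itself should now report Bound:

kubectl get pvc alluxio-worker-domain-socket
kubectl get pv | grep alluxio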

Formatting the journal

Since formatting was not enabled at install time, i.e. the install was not run as:

helm install alluxio -f config.yaml --set journal.format.runFormat=true alluxio-charts/alluxio

and the release is already deployed, the fallback is to upgrade the existing Helm release to trigger journal formatting with journal.format.runFormat=true:

root@zhiyong-ksp1:/tmp# helm upgrade alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml --set journal.format.runFormat=true alluxio-charts/alluxio
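
The upgrade rolls the master Pod so the journal gets formatted. A hedged way to confirm it went through (the container name alluxio-master is an assumption, mirroring the worker's container naming shown earlier):

kubectl get pods | grep alluxio-master
kubectl logs alluxio-master-0 -c alluxio-master --tail=100 | grep -iE "format|journal"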

Configuring port forwarding

root@zhiyong-ksp1:/tmp# kubectl port-forward alluxio-master-0 19999:19999
Forwarding from 127.0.0.1:19999 -> 19999
Forwarding from [::1]:19999 -> 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999

Now, from Firefox inside the Ubuntu VM, open:

127.0.0.1:19999

and the UI is reachable:

[Screenshot: Alluxio web UI at 127.0.0.1:19999]

The familiar Alluxio web UI shows up.

But this is clearly not enough: we cannot rely on Firefox inside the VM alone; the port also needs to be exposed so the external host machine can reach it.
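
Before going the NodePort route in the next section, note that kubectl port-forward can also bind to all of the VM's interfaces, which is often enough for a quick look from the host (192.168.88.20 is the VM IP used throughout this post):

# listen on every interface of the VM instead of only 127.0.0.1
kubectl port-forward --address 0.0.0.0 alluxio-master-0 19999:19999
# the host machine can then open http://192.168.88.20:19999 while this command keeps running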

Exposing the ports

The quick-and-dirty way: a NodePort Service.

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# pwd
/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-web-ui-service.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl apply -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxio-web-ui-service.yaml
The Service "alluxio-web-ui-service" is invalid: spec.ports[0].nodePort: Invalid value: 19999: provided port is not in the valid range. The range of valid ports is 30000-32767
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get nodes --show-labels
NAME           STATUS   ROLES                  AGE   VERSION   LABELS
zhiyong-ksp1   Ready    control-plane,worker   35d   v1.24.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=zhiyong-ksp1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/worker=,node.kubernetes.io/exclude-from-external-load-balancers=
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods --show-labels
NAME                   READY   STATUS    RESTARTS        AGE     LABELS
alluxio-master-0       2/2     Running   0               4h10m   app=alluxio,chart=alluxio-0.6.48,controller-revision-hash=alluxio-master-7c4cd554c8,heritage=Helm,name=alluxio-master,release=alluxio,role=alluxio-master,statefulset.kubernetes.io/pod-name=alluxio-master-0
alluxio-worker-n8qh7   2/2     Running   4 (4h17m ago)   5h31m   app=alluxio,chart=alluxio-0.6.48,controller-revision-hash=c6dcc876c,heritage=Helm,pod-template-generation=1,release=alluxio,role=alluxio-worker
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

Contents of alluxio-web-ui-service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: alluxio-web-ui-service
spec:
  type: NodePort
  ports:
  - port: 19999
    targetPort: 19999
    nodePort: 19999
  selector:
    app: alluxio

The selector is essential here! Without it the Service's Endpoints stay <none>, the Service never attaches to any Pod, and requests get connection refused:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# curl http://192.168.88.20:32634
curl: (7) Failed to connect to 192.168.88.20 port 32634: 拒绝连接
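
A quick way to tell whether the selector matched anything is to look at the Service's Endpoints; <none> in the ENDPOINTS column means the selector is missing or wrong:

kubectl get endpoints alluxio-web-ui-service
kubectl describe service alluxio-web-ui-service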

Evidently K8S by default refuses NodePorts outside the 30000-32767 range...

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep service-cluster-ip-range
    - --service-cluster-ip-range=10.233.0.0/18
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim /etc/kubernetes/manifests/kube-apiserver.yaml

Add a new flag on the line right after that setting:

    - --service-cluster-ip-range=10.233.0.0/18
    - --service-node-port-range=1-65535

After adding the flag, reload and restart kubelet (kube-apiserver runs as a static Pod, so kubelet recreates it with the new flag), then apply the Service again:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# systemctl daemon-reload
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# systemctl restart kubelet
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl apply -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxio-web-ui-service.yaml
service/alluxio-web-ui-service created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get service
NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                                       AGE
alluxio-master-0         ClusterIP   None          <none>        19998/TCP,19999/TCP,20001/TCP,20002/TCP,19200/TCP,20003/TCP   151m
alluxio-web-ui-service   NodePort    10.233.36.0   <none>        19999:19999/TCP                                               30s
kubernetes               ClusterIP   10.233.0.1    <none>        443/TCP                                                       34d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
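
To double-check that the recreated kube-apiserver static Pod really picked up the wider range (the component=kube-apiserver label is the usual kubeadm convention and is assumed here):

kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep service-node-port-range
# or inspect the running process directly on the node
ps -ef | grep service-node-port-range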

Inspect Alluxio's default master Service:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl edit service/alluxio-master-0 -o yaml -ndefault

The original YAML:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: alluxio
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2022-09-11T21:53:58Z"
  labels:
    app: alluxio
    app.kubernetes.io/managed-by: Helm
    chart: alluxio-0.6.48
    heritage: Helm
    release: alluxio
    role: alluxio-master
  name: alluxio-master-0
  namespace: default
  resourceVersion: "290588"
  uid: aa0828ae-b8f4-4533-9b7b-cae8b3094ae7
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: rpc
    port: 19998
    protocol: TCP
    targetPort: 19998
  - name: web
    port: 19999
    protocol: TCP
    targetPort: 19999
  - name: job-rpc
    port: 20001
    protocol: TCP
    targetPort: 20001
  - name: job-web
    port: 20002
    protocol: TCP
    targetPort: 20002
  - name: embedded
    port: 19200
    protocol: TCP
    targetPort: 19200
  - name: job-embedded
    port: 20003
    protocol: TCP
    targetPort: 20003
  selector:
    app: alluxio
    release: alluxio
    role: alluxio-master
    statefulset.kubernetes.io/pod-name: alluxio-master-0
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Following that, all of the master ports can be exposed:

apiVersion: v1
kind: Service
metadata:
  name: alluxio-master-service
spec:
  type: NodePort
  ports:
  - name: rpc
    port: 19998
    protocol: TCP
    targetPort: 19998
    nodePort: 19998
  - name: web
    port: 19999
    protocol: TCP
    targetPort: 19999
    nodePort: 19999
  - name: job-rpc
    port: 20001
    protocol: TCP
    targetPort: 20001
    nodePort: 20001
  - name: job-web
    port: 20002
    protocol: TCP
    targetPort: 20002
    nodePort: 20002
  - name: embedded
    port: 19200
    protocol: TCP
    targetPort: 19200
    nodePort: 19200
  - name: job-embedded
    port: 20003
    protocol: TCP
    targetPort: 20003
    nodePort: 20003
  selector:
    app: alluxio

Apply it from the command line:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-master-service.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl apply -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxio-master-service.yaml
service/alluxio-master-service created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get service
NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                                           AGE
alluxio-master-0         ClusterIP   None           <none>        19998/TCP,19999/TCP,20001/TCP,20002/TCP,19200/TCP,20003/TCP                                       5h41m
alluxio-master-service   NodePort    10.233.59.39   <none>        19998:19998/TCP,19999:19999/TCP,20001:20001/TCP,20002:20002/TCP,19200:19200/TCP,20003:20003/TCP   3s
kubernetes               ClusterIP   10.233.0.1     <none>        443/TCP                                                                                           35d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe service alluxio-master-service
Name:                     alluxio-master-service
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=alluxio
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.233.59.39
IPs:                      10.233.59.39
Port:                     rpc  19998/TCP
TargetPort:               19998/TCP
NodePort:                 rpc  19998/TCP
Endpoints:                10.233.107.198:19998,10.233.107.199:19998
Port:                     web  19999/TCP
TargetPort:               19999/TCP
NodePort:                 web  19999/TCP
Endpoints:                10.233.107.198:19999,10.233.107.199:19999
Port:                     job-rpc  20001/TCP
TargetPort:               20001/TCP
NodePort:                 job-rpc  20001/TCP
Endpoints:                10.233.107.198:20001,10.233.107.199:20001
Port:                     job-web  20002/TCP
TargetPort:               20002/TCP
NodePort:                 job-web  20002/TCP
Endpoints:                10.233.107.198:20002,10.233.107.199:20002
Port:                     embedded  19200/TCP
TargetPort:               19200/TCP
NodePort:                 embedded  19200/TCP
Endpoints:                10.233.107.198:19200,10.233.107.199:19200
Port:                     job-embedded  20003/TCP
TargetPort:               20003/TCP
NodePort:                 job-embedded  20003/TCP
Endpoints:                10.233.107.198:20003,10.233.107.199:20003
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

You can see that every port now has Endpoints attached behind the ClusterIP, so from the host machine, open:

192.168.88.20:19999

and the Alluxio web UI is reachable:

[Screenshot: Alluxio web UI at 192.168.88.20:19999]

This indicates that the master's other ports have all been exposed successfully as well.
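
One refinement worth considering: two Pod IPs show up per port in the describe output above because the selector app: alluxio also matches the worker Pod, which does not listen on any of these master ports. Reusing the label set of the chart's own alluxio-master-0 Service keeps only the master behind the NodePorts:

  selector:
    app: alluxio
    release: alluxio
    role: alluxio-master
    statefulset.kubernetes.io/pod-name: alluxio-master-0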

Starting the FUSE daemon

Needed for production environments:

root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm upgrade alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml --set fuse.enabled=true --set fuse.clientEnabled=true alluxio-charts/alluxio
Release "alluxio" has been upgraded. Happy Helming!
NAME: alluxio
LAST DEPLOYED: Mon Sep 12 11:41:25 2022
NAMESPACE: default
STATUS: deployed
REVISION: 3
TEST SUITE: None
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#

For just playing around it does not really matter either way... the benefit of this step is that if the process inside a Pod dies, the daemon can automatically pull it back up.
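
To confirm the FUSE Pods actually came up after the upgrade (a loose check on purpose, since the exact DaemonSet and Pod names depend on the chart):

kubectl get daemonsets,pods -o wide | grep -i fuse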

Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/126815426


With that, Alluxio 2.8.1 on K8S 1.24 is deployed successfully!!! Ill luck and plenty of twists along the way...
