Deploying Alluxio 2.8.1 on Kubernetes 1.24 with Helm 3
Preface
Alluxio website: https://www.alluxio.io/
I won't repeat what Alluxio is; the official site explains that well enough. What I care about is what it can do.
We already adopted Alluxio (free of charge) last year, using it as a hot-data cache and a unified file layer: https://mp.weixin.qq.com/s/kBetfi_LxAQGwgMBpI70ow
You can also search for the article by its title:
【Alluxio&大型银行】科技赋能金融,兴业银行按下“大数据处理加速键”
First, take a look at the software Primo Ramdisk: https://www.romexsoftware.com/zh-cn/primo-ramdisk/overview.html
Used as a RAM disk, Alluxio works much like Primo Ramdisk: data that is persisted to disk and a file system gets pre-loaded into memory, and applications then read it straight from the Alluxio cluster's memory, which is naturally blazing fast. I once built a 4 GB RAM disk just for copying files off USB drives. Why does Windows 10 grab so much memory right after boot? It is pre-loading hot data from disk (ahem, looking at you, the SN550 cold-data controversy), which speeds up I/O while also reducing reads and writes to the SSD and extending its lifespan.
Caching hot data in memory with Alluxio to speed up reads and writes follows the same principle. Because data loaded into memory sits closer to the compute nodes, it also noticeably reduces network bandwidth usage and switch load (and for metered cloud ECS instances, saved bandwidth is real money).
This feature also lets Alluxio serve as an RSS for compute engines such as Spark, which is how Alluxio got started in the first place (back when it was still called Tachyon, it served as Spark's off-heap cache): https://spark.apache.org/third-party-projects.html
Alluxio can still be found today among Spark's third-party projects:
The other important feature is the unified file layer. Because Alluxio is compatible with many file system protocols, Amazon S3 object storage, HDFS, NFS and others can all be mounted into Alluxio and accessed through one uniform interface. Hiding the differences between file systems makes handling heterogeneous data sources a lot easier, and developers no longer need to learn a pile of file system APIs: the Alluxio API alone covers them all.
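To make this concrete, here is a minimal sketch of mounting two different under stores into one Alluxio namespace with the standard alluxio fs mount command (the hostname, bucket name, and credentials below are placeholders, not values from this deployment):

# Mount an HDFS directory at /mnt/hdfs in the Alluxio namespace
./bin/alluxio fs mount /mnt/hdfs hdfs://namenode:8020/data
# Mount an S3 bucket at /mnt/s3, passing credentials as mount options
./bin/alluxio fs mount \
  --option s3a.accessKeyId=<ACCESS_KEY_ID> \
  --option s3a.secretKey=<SECRET_KEY> \
  /mnt/s3 s3://my-bucket/data
# Both are now browsed through the same interface
./bin/alluxio fs ls /mnt/hdfs
./bin/alluxio fs ls /mnt/s3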
I already have quite a few VMs, and to keep things easy to suspend and resume at any time I am sticking to a single node. This time the deployment goes onto K8s.
Official docs: https://docs.alluxio.io/os/user/stable/cn/deploy/Running-Alluxio-On-Kubernetes.html
What follows mainly tracks that official document to install Alluxio 2.8.1 on K8s 1.24.
Current Environment
VM and K8s environment: https://lizhiyong.blog.csdn.net/article/details/126236516
root@zhiyong-ksp1:/home/zhiyong# kubectl get pods -owide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
argocd devops-argocd-application-controller-0 1/1 Running 0 3h27m 10.233.107.78 zhiyong-ksp1 <none> <none>
argocd devops-argocd-applicationset-controller-5864597bfc-pf8ht 1/1 Running 0 3h27m 10.233.107.79 zhiyong-ksp1 <none> <none>
argocd devops-argocd-dex-server-f885fb4b4-fkpls 1/1 Running 0 3h27m 10.233.107.77 zhiyong-ksp1 <none> <none>
argocd devops-argocd-notifications-controller-54b744556f-f4g24 1/1 Running 0 3h27m 10.233.107.74 zhiyong-ksp1 <none> <none>
argocd devops-argocd-redis-556fdd5876-xftmq 1/1 Running 0 3h27m 10.233.107.73 zhiyong-ksp1 <none> <none>
argocd devops-argocd-repo-server-5dbf9b87db-9tw2c 1/1 Running 0 3h27m 10.233.107.76 zhiyong-ksp1 <none> <none>
argocd devops-argocd-server-6f9898cc75-s7jkm 1/1 Running 0 3h27m 10.233.107.75 zhiyong-ksp1 <none> <none>
istio-system istiod-1-11-2-54dd699c87-99krn 1/1 Running 0 23h 10.233.107.41 zhiyong-ksp1 <none> <none>
istio-system jaeger-collector-67cfc55477-7757f 1/1 Running 5 (22h ago) 22h 10.233.107.61 zhiyong-ksp1 <none> <none>
istio-system jaeger-operator-fccc48b86-vtcr8 1/1 Running 0 23h 10.233.107.47 zhiyong-ksp1 <none> <none>
istio-system jaeger-query-8497bdbfd7-csbts 2/2 Running 0 22h 10.233.107.67 zhiyong-ksp1 <none> <none>
istio-system kiali-75c777bdf6-xhbq7 1/1 Running 0 22h 10.233.107.58 zhiyong-ksp1 <none> <none>
istio-system kiali-operator-c459985f7-sttfs 1/1 Running 0 23h 10.233.107.38 zhiyong-ksp1 <none> <none>
kube-system calico-kube-controllers-f9f9bbcc9-2v7lm 1/1 Running 2 (22h ago) 9d 10.233.107.45 zhiyong-ksp1 <none> <none>
kube-system calico-node-4mgc7 1/1 Running 2 (22h ago) 9d 192.168.88.20 zhiyong-ksp1 <none> <none>
kube-system coredns-f657fccfd-2gw7h 1/1 Running 2 (22h ago) 9d 10.233.107.39 zhiyong-ksp1 <none> <none>
kube-system coredns-f657fccfd-pflwf 1/1 Running 2 (22h ago) 9d 10.233.107.43 zhiyong-ksp1 <none> <none>
kube-system kube-apiserver-zhiyong-ksp1 1/1 Running 2 (22h ago) 9d 192.168.88.20 zhiyong-ksp1 <none> <none>
kube-system kube-controller-manager-zhiyong-ksp1 1/1 Running 2 (22h ago) 9d 192.168.88.20 zhiyong-ksp1 <none> <none>
kube-system kube-proxy-cn68l 1/1 Running 2 (22h ago) 9d 192.168.88.20 zhiyong-ksp1 <none> <none>
kube-system kube-scheduler-zhiyong-ksp1 1/1 Running 2 (22h ago) 9d 192.168.88.20 zhiyong-ksp1 <none> <none>
kube-system nodelocaldns-96gtw 1/1 Running 2 (22h ago) 9d 192.168.88.20 zhiyong-ksp1 <none> <none>
kube-system openebs-localpv-provisioner-68db4d895d-p9527 1/1 Running 1 (22h ago) 9d 10.233.107.40 zhiyong-ksp1 <none> <none>
kube-system snapshot-controller-0 1/1 Running 2 (22h ago) 9d 10.233.107.42 zhiyong-ksp1 <none> <none>
kubesphere-controls-system default-http-backend-587748d6b4-ccg59 1/1 Running 2 (22h ago) 9d 10.233.107.50 zhiyong-ksp1 <none> <none>
kubesphere-controls-system kubectl-admin-5d588c455b-82cnk 1/1 Running 2 (22h ago) 9d 10.233.107.48 zhiyong-ksp1 <none> <none>
kubesphere-devops-system devops-27679170-8nrzx 0/1 Completed 0 65m 10.233.107.90 zhiyong-ksp1 <none> <none>
kubesphere-devops-system devops-27679200-kdgvk 0/1 Completed 0 35m 10.233.107.91 zhiyong-ksp1 <none> <none>
kubesphere-devops-system devops-27679230-v9h2l 0/1 Completed 0 5m34s 10.233.107.92 zhiyong-ksp1 <none> <none>
kubesphere-devops-system devops-apiserver-6b468c95cb-9s7lz 1/1 Running 0 3h27m 10.233.107.82 zhiyong-ksp1 <none> <none>
kubesphere-devops-system devops-controller-667f8449d7-gjgj8 1/1 Running 0 3h27m 10.233.107.80 zhiyong-ksp1 <none> <none>
kubesphere-devops-system devops-jenkins-bf85c664c-c6qnq 1/1 Running 0 3h27m 10.233.107.84 zhiyong-ksp1 <none> <none>
kubesphere-devops-system s2ioperator-0 1/1 Running 0 3h27m 10.233.107.83 zhiyong-ksp1 <none> <none>
kubesphere-logging-system elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk 0/1 Completed 0 23h 10.233.107.51 zhiyong-ksp1 <none> <none>
kubesphere-logging-system elasticsearch-logging-data-0 1/1 Running 0 23h 10.233.107.65 zhiyong-ksp1 <none> <none>
kubesphere-logging-system elasticsearch-logging-discovery-0 1/1 Running 0 23h 10.233.107.64 zhiyong-ksp1 <none> <none>
kubesphere-monitoring-system alertmanager-main-0 2/2 Running 4 (22h ago) 9d 10.233.107.56 zhiyong-ksp1 <none> <none>
kubesphere-monitoring-system kube-state-metrics-6d6786b44-bbb4f 3/3 Running 6 (22h ago) 9d 10.233.107.44 zhiyong-ksp1 <none> <none>
kubesphere-monitoring-system node-exporter-8sz74 2/2 Running 4 (22h ago) 9d 192.168.88.20 zhiyong-ksp1 <none> <none>
kubesphere-monitoring-system notification-manager-deployment-6f8c66ff88-pt4l8 2/2 Running 4 (22h ago) 9d 10.233.107.53 zhiyong-ksp1 <none> <none>
kubesphere-monitoring-system notification-manager-operator-6455b45546-nkmx8 2/2 Running 4 (22h ago) 9d 10.233.107.52 zhiyong-ksp1 <none> <none>
kubesphere-monitoring-system prometheus-k8s-0 2/2 Running 0 3h25m 10.233.107.85 zhiyong-ksp1 <none> <none>
kubesphere-monitoring-system prometheus-operator-66d997dccf-c968c 2/2 Running 4 (22h ago) 9d 10.233.107.37 zhiyong-ksp1 <none> <none>
kubesphere-system ks-apiserver-6b9bcb86f4-hsdzs 1/1 Running 2 (22h ago) 9d 10.233.107.55 zhiyong-ksp1 <none> <none>
kubesphere-system ks-console-599c49d8f6-ngb6b 1/1 Running 2 (22h ago) 9d 10.233.107.49 zhiyong-ksp1 <none> <none>
kubesphere-system ks-controller-manager-66747fcddc-r7cpt 1/1 Running 2 (22h ago) 9d 10.233.107.54 zhiyong-ksp1 <none> <none>
kubesphere-system ks-installer-5fd8bd46b8-dzhbb 1/1 Running 2 (22h ago) 9d 10.233.107.46 zhiyong-ksp1 <none> <none>
kubesphere-system minio-746f646bfb-hcf5c 1/1 Running 0 3h32m 10.233.107.71 zhiyong-ksp1 <none> <none>
kubesphere-system openldap-0 1/1 Running 1 (3h30m ago) 3h32m 10.233.107.69 zhiyong-ksp1 <none> <none>
root@zhiyong-ksp1:/home/zhiyong#
As you can see, the Pods are all in a perfectly normal state. Now let's look at helm:
root@zhiyong-ksp1:/home/zhiyong# helm
The Kubernetes package manager
Common actions for Helm:
- helm search: search for charts
- helm pull: download a chart to your local directory to view
- helm install: upload the chart to Kubernetes
- helm list: list releases of charts
Environment variables:
| Name | Description |
|------------------------------------|-----------------------------------------------------------------------------------|
| $HELM_CACHE_HOME | set an alternative location for storing cached files. |
| $HELM_CONFIG_HOME | set an alternative location for storing Helm configuration. |
| $HELM_DATA_HOME | set an alternative location for storing Helm data. |
| $HELM_DEBUG | indicate whether or not Helm is running in Debug mode |
| $HELM_DRIVER | set the backend storage driver. Values are: configmap, secret, memory, postgres |
| $HELM_DRIVER_SQL_CONNECTION_STRING | set the connection string the SQL storage driver should use. |
| $HELM_MAX_HISTORY | set the maximum number of helm release history. |
| $HELM_NAMESPACE | set the namespace used for the helm operations. |
| $HELM_NO_PLUGINS | disable plugins. Set HELM_NO_PLUGINS=1 to disable plugins. |
| $HELM_PLUGINS | set the path to the plugins directory |
| $HELM_REGISTRY_CONFIG | set the path to the registry config file. |
| $HELM_REPOSITORY_CACHE | set the path to the repository cache directory |
| $HELM_REPOSITORY_CONFIG | set the path to the repositories file. |
| $KUBECONFIG | set an alternative Kubernetes configuration file (default "~/.kube/config") |
| $HELM_KUBEAPISERVER | set the Kubernetes API Server Endpoint for authentication |
| $HELM_KUBECAFILE | set the Kubernetes certificate authority file. |
| $HELM_KUBEASGROUPS | set the Groups to use for impersonation using a comma-separated list. |
| $HELM_KUBEASUSER | set the Username to impersonate for the operation. |
| $HELM_KUBECONTEXT | set the name of the kubeconfig context. |
| $HELM_KUBETOKEN | set the Bearer KubeToken used for authentication. |
Helm stores cache, configuration, and data based on the following configuration order:
- If a HELM_*_HOME environment variable is set, it will be used
- Otherwise, on systems supporting the XDG base directory specification, the XDG variables will be used
- When no other location is set a default location will be used based on the operating system
By default, the default directories depend on the Operating System. The defaults are listed below:
| Operating System | Cache Path | Configuration Path | Data Path |
|------------------|---------------------------|--------------------------------|-------------------------|
| Linux | $HOME/.cache/helm | $HOME/.config/helm | $HOME/.local/share/helm |
| macOS | $HOME/Library/Caches/helm | $HOME/Library/Preferences/helm | $HOME/Library/helm |
| Windows | %TEMP%\helm | %APPDATA%\helm | %APPDATA%\helm |
Usage:
helm [command]
Available Commands:
completion generate autocompletion scripts for the specified shell
create create a new chart with the given name
dependency manage a chart's dependencies
env helm client environment information
get download extended information of a named release
help Help about any command
history fetch release history
install install a chart
lint examine a chart for possible issues
list list releases
package package a chart directory into a chart archive
plugin install, list, or uninstall Helm plugins
pull download a chart from a repository and (optionally) unpack it in local directory
repo add, list, remove, update, and index chart repositories
rollback roll back a release to a previous revision
search search for a keyword in charts
show show information of a chart
status display the status of the named release
template locally render templates
test run tests for a release
uninstall uninstall a release
upgrade upgrade a release
verify verify that a chart at the given path has been signed and is valid
version print the client version information
Flags:
--debug enable verbose output
-h, --help help for helm
--kube-apiserver string the address and the port for the Kubernetes API server
--kube-as-group stringArray group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--kube-as-user string username to impersonate for the operation
--kube-ca-file string the certificate authority file for the Kubernetes API server connection
--kube-context string name of the kubeconfig context to use
--kube-token string bearer token used for authentication
--kubeconfig string path to the kubeconfig file
-n, --namespace string namespace scope for this request
--registry-config string path to the registry config file (default "/root/.config/helm/registry.json")
--repository-cache string path to the file containing cached repository indexes (default "/root/.cache/helm/repository")
--repository-config string path to the file containing repository names and URLs (default "/root/.config/helm/repositories.yaml")
Use "helm [command] --help" for more information about a command.
root@zhiyong-ksp1:/home/zhiyong#
As you can see, KubeSphere has thoughtfully installed helm already, which saves developers who are not professional ops engineers a fair amount of work.
Next we can install with Helm 3 (Helm 2 is no longer supported since Alluxio 2.3). Alluxio can also be installed with plain kubectl; see the official documentation for that.
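To double-check that the bundled client really is Helm 3 before going further, a quick version check can be run:

helm version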
Deploying Alluxio 2.8.1 with Helm
Add the helm repo for the Alluxio helm chart
root@zhiyong-ksp1:/home/zhiyong# helm repo add alluxio-charts https://alluxio-charts.storage.googleapis.com/openSource/2.8.1
"alluxio-charts" has been added to your repositories
root@zhiyong-ksp1:/home/zhiyong# helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
root@zhiyong-ksp1:/home/zhiyong#
Components open-sourced by Chinese developers clearly take the domestic network environment into account. Much appreciated! It worked on the first try.
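To confirm the repo was registered and see which chart version it serves, a quick search can be run:

helm search repo alluxio-charts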
Inspect the Default Values
root@zhiyong-ksp1:/home/zhiyong# helm inspect values alluxio-charts/alluxio
#
# The Alluxio Open Foundation licenses this work under the Apache License, version 2.0
# (the "License"). You may not use this work except in compliance with the License, which is
# available at www.apache.org/licenses/LICENSE-2.0
#
# This software is distributed on an "AS IS" basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied, as more fully set forth in the License.
#
# See the NOTICE file distributed with this work for information regarding copyright ownership.
#
# This should not be modified in the usual case.
fullnameOverride: alluxio
## Common ##
# Docker Image
image: alluxio/alluxio
imageTag: 2.8.1
imagePullPolicy: IfNotPresent
# Security Context
user: 1000
group: 1000
fsGroup: 1000
# Service Account
# If not specified, Kubernetes will assign the 'default'
# ServiceAccount used for the namespace
serviceAccount:
# Image Pull Secret
# The secrets will need to be created externally from
# this Helm chart, but you can configure the Alluxio
# Pods to use the following list of secrets
# eg:
# imagePullSecrets:
# - ecr
# - dev
imagePullSecrets:
# Site properties for all the components
properties:
# alluxio.user.metrics.collection.enabled: 'true'
alluxio.security.stale.channel.purge.interval: 365d
# Recommended JVM Heap options for running in Docker
# Ref: https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
# These JVM options are common to all Alluxio services
# jvmOptions:
# - "-XX:+UnlockExperimentalVMOptions"
# - "-XX:+UseCGroupMemoryLimitForHeap"
# - "-XX:MaxRAMFraction=2"
# Mount Persistent Volumes to all components
# mounts:
# - name: <persistentVolume claimName>
# path: <mountPath>
# Use labels to run Alluxio on a subset of the K8s nodes
# nodeSelector: {}
# A list of K8s Node taints to allow scheduling on.
# See the Kubernetes docs for more info:
# - https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
# eg: tolerations: [ {"key": "env", "operator": "Equal", "value": "prod", "effect": "NoSchedule"} ]
# tolerations: []
## Master ##
master:
enabled: true
count: 1 # Controls the number of StatefulSets. For multiMaster mode increase this to >1.
replicas: 1 # Controls #replicas in a StatefulSet and should not be modified in the usual case.
env:
# Extra environment variables for the master pod
# Example:
# JAVA_HOME: /opt/java
args: # Arguments to Docker entrypoint
- master-only
- --no-format
# Properties for the master component
properties:
# Example: use ROCKS DB instead of Heap
# alluxio.master.metastore: ROCKS
# alluxio.master.metastore.dir: /metastore
resources:
# The default xmx is 8G
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
embedded: 19200
rpc: 19998
web: 19999
hostPID: false
hostNetwork: false
shareProcessNamespace: false
extraContainers: []
extraVolumeMounts: []
extraVolumes: []
extraServicePorts: []
# dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
# and ClusterFirst if hostNetwork: false
# You can specify dnsPolicy here to override this inference
# dnsPolicy: ClusterFirst
# JVM options specific to the master container
jvmOptions:
nodeSelector: {}
# When using HA Alluxio masters, the expected startup time
# can take over 2-3 minutes (depending on leader elections,
# journal catch-up, etc). In that case it is recommended
# to allow for up to at least 3 minutes with the readinessProbe,
# though higher values may be desired for some leniancy.
# - Note that the livenessProbe does not wait for the
# readinessProbe to succeed first
#
# eg: 3 minute startupProbe and readinessProbe
# readinessProbe:
# initialDelaySeconds: 30
# periodSeconds: 10
# timeoutSeconds: 1
# failureThreshold: 15
# successThreshold: 3
# startupProbe:
# initialDelaySeconds: 60
# periodSeconds: 30
# timeoutSeconds: 5
# failureThreshold: 4
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
# If you are using Kubernetes 1.18+ or have the feature gate
# for it enabled, use startupProbe to prevent the livenessProbe
# from running until the startupProbe has succeeded
# startupProbe:
# initialDelaySeconds: 15
# periodSeconds: 30
# timeoutSeconds: 5
# failureThreshold: 2
tolerations: []
podAnnotations: {}
# The ServiceAccount provided here will have precedence over
# the global `serviceAccount`
serviceAccount:
jobMaster:
args:
- job-master
# Properties for the jobMaster component
properties:
resources:
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
embedded: 20003
rpc: 20001
web: 20002
# JVM options specific to the jobMaster container
jvmOptions:
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
# If you are using Kubernetes 1.18+ or have the feature gate
# for it enabled, use startupProbe to prevent the livenessProbe
# from running until the startupProbe has succeeded
# startupProbe:
# initialDelaySeconds: 15
# periodSeconds: 30
# timeoutSeconds: 5
# failureThreshold: 2
# Alluxio supports journal type of UFS and EMBEDDED
# UFS journal with HDFS example
# journal:
# type: "UFS"
# ufsType: "HDFS"
# folder: "hdfs://{$hostname}:{$hostport}/journal"
# EMBEDDED journal to /journal example
# journal:
# type: "EMBEDDED"
# folder: "/journal"
journal:
# [ Required values ]
type: "UFS" # One of "UFS" or "EMBEDDED"
folder: "/journal" # Master journal directory or equivalent storage path
#
# [ Conditionally required values ]
#
## [ UFS-backed journal options ]
## - required when using a UFS-type journal (journal.type="UFS")
##
## ufsType is one of "local" or "HDFS"
## - "local" results in a PV being allocated to each Master Pod as the journal
## - "HDFS" results in no PV allocation, it is up to you to ensure you have
## properly configured the required Alluxio properties for Alluxio to access
## the HDFS URI designated as the journal folder
ufsType: "local"
#
## [ K8s volume options ]
## - required when using an EMBEDDED journal (journal.type="EMBEDDED")
## - required when using a local UFS journal (journal.type="UFS" and journal.ufsType="local")
##
## volumeType controls the type of journal volume.
volumeType: persistentVolumeClaim # One of "persistentVolumeClaim" or "emptyDir"
## size sets the requested storage capacity for a persistentVolumeClaim,
## or the sizeLimit on an emptyDir PV.
size: 1Gi
### Unique attributes to use when the journal is persistentVolumeClaim
storageClass: "standard"
accessModes:
- ReadWriteOnce
### Unique attributes to use when the journal is emptyDir
medium: ""
#
# [ Optional values ]
format: # Configuration for journal formatting job
runFormat: false # Change to true to format journal
# You can enable metastore to use ROCKS DB instead of Heap
# metastore:
# volumeType: persistentVolumeClaim # Options: "persistentVolumeClaim" or "emptyDir"
# size: 1Gi
# mountPath: /metastore
# # Attributes to use when the metastore is persistentVolumeClaim
# storageClass: "standard"
# accessModes:
# - ReadWriteOnce
# # Attributes to use when the metastore is emptyDir
# medium: ""
## Worker ##
worker:
enabled: true
env:
# Extra environment variables for the worker pod
# Example:
# JAVA_HOME: /opt/java
args:
- worker-only
- --no-format
# Properties for the worker component
properties:
resources:
limits:
cpu: "4"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
ports:
rpc: 29999
web: 30000
# hostPID requires escalated privileges
hostPID: false
hostNetwork: false
shareProcessNamespace: false
extraContainers: []
extraVolumeMounts: []
extraVolumes: []
# dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
# and ClusterFirst if hostNetwork: false
# You can specify dnsPolicy here to override this inference
# dnsPolicy: ClusterFirst
# JVM options specific to the worker container
jvmOptions:
nodeSelector: {}
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
# If you are using Kubernetes 1.18+ or have the feature gate
# for it enabled, use startupProbe to prevent the livenessProbe
# from running until the startupProbe has succeeded
# startupProbe:
# initialDelaySeconds: 15
# periodSeconds: 30
# timeoutSeconds: 5
# failureThreshold: 2
tolerations: []
podAnnotations: {}
# The ServiceAccount provided here will have precedence over
# the global `serviceAccount`
serviceAccount:
# Setting fuseEnabled to true will embed Fuse in worker process. The worker pods will
# launch the Alluxio workers using privileged containers with `SYS_ADMIN` capability.
# Be sure to give root access to the pod by setting the global user/group/fsGroup
# values to `0` to turn on Fuse in worker.
fuseEnabled: false
jobWorker:
args:
- job-worker
# Properties for the jobWorker component
properties:
resources:
limits:
cpu: "4"
memory: "4Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
rpc: 30001
data: 30002
web: 30003
# JVM options specific to the jobWorker container
jvmOptions:
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
# If you are using Kubernetes 1.18+ or have the feature gate
# for it enabled, use startupProbe to prevent the livenessProbe
# from running until the startupProbe has succeeded
# startupProbe:
# initialDelaySeconds: 15
# periodSeconds: 30
# timeoutSeconds: 5
# failureThreshold: 2
# Tiered Storage
# emptyDir example
# - level: 0
# alias: MEM
# mediumtype: MEM
# path: /dev/shm
# type: emptyDir
# quota: 1Gi
#
# hostPath example
# - level: 0
# alias: MEM
# mediumtype: MEM
# path: /dev/shm
# type: hostPath
# quota: 1Gi
#
# persistentVolumeClaim example
# - level: 1
# alias: SSD
# mediumtype: SSD
# type: persistentVolumeClaim
# name: alluxio-ssd
# path: /dev/ssd
# quota: 10Gi
#
# multi-part mediumtype example
# - level: 1
# alias: SSD,HDD
# mediumtype: SSD,HDD
# type: persistentVolumeClaim
# name: alluxio-ssd,alluxio-hdd
# path: /dev/ssd,/dev/hdd
# quota: 10Gi,10Gi
tieredstore:
levels:
- level: 0
alias: MEM
mediumtype: MEM
path: /dev/shm
type: emptyDir
quota: 1Gi
high: 0.95
low: 0.7
## Proxy ##
proxy:
enabled: false # Enable this to enable the proxy for REST API
env:
# Extra environment variables for the Proxy pod
# Example:
# JAVA_HOME: /opt/java
args:
- proxy
# Properties for the proxy component
properties:
resources:
requests:
cpu: "0.5"
memory: "1Gi"
limits:
cpu: "4"
memory: "4Gi"
ports:
web: 39999
hostNetwork: false
# dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
# and ClusterFirst if hostNetwork: false
# You can specify dnsPolicy here to override this inference
# dnsPolicy: ClusterFirst
# JVM options specific to proxy containers
jvmOptions:
nodeSelector: {}
tolerations: []
podAnnotations: {}
# The ServiceAccount provided here will have precedence over
# the global `serviceAccount`
serviceAccount:
# Short circuit related properties
shortCircuit:
enabled: true
# The policy for short circuit can be "local" or "uuid",
# local means the cache directory is in the same mount namespace,
# uuid means interact with domain socket
policy: uuid
# volumeType controls the type of shortCircuit volume.
# It can be "persistentVolumeClaim" or "hostPath"
volumeType: persistentVolumeClaim
size: 1Mi
# Attributes to use if the domain socket volume is PVC
pvcName: alluxio-worker-domain-socket
accessModes:
- ReadWriteOnce
storageClass: standard
# Attributes to use if the domain socket volume is hostPath
hostPath: "/tmp/alluxio-domain" # The hostPath directory to use
## FUSE ##
fuse:
env:
# Extra environment variables for the fuse pod
# Example:
# JAVA_HOME: /opt/java
# Change both to true to deploy FUSE
enabled: false
clientEnabled: false
# Properties for the fuse component
properties:
# Customize the MaxDirectMemorySize
# These options are specific to the FUSE daemon
jvmOptions:
- "-XX:MaxDirectMemorySize=2g"
hostNetwork: true
# hostPID requires escalated privileges
hostPID: true
dnsPolicy: ClusterFirstWithHostNet
livenessProbeEnabled: true
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
failureThreshold: 2
user: 0
group: 0
fsGroup: 0
# Default fuse mount options separated by commas, shared by all fuse containers
mountOptions: allow_other
# Default fuse mount point inside fuse container, shared by all fuse containers.
# Non-empty value is required.
mountPoint: /mnt/alluxio-fuse
# Default alluxio path to be mounted, shared by all fuse containers.
alluxioPath: /
resources:
requests:
cpu: "0.5"
memory: "1Gi"
limits:
cpu: "4"
memory: "4Gi"
nodeSelector: {}
tolerations: []
podAnnotations: {}
# The ServiceAccount provided here will have precedence over
# the global `serviceAccount`
serviceAccount:
## Secrets ##
# Format: (<name>:<mount path under /secrets/>):
# secrets:
# master: # Shared by master and jobMaster containers
# alluxio-hdfs-config: hdfsConfig
# worker: # Shared by worker and jobWorker containers
# alluxio-hdfs-config: hdfsConfig
# logserver: # Used by the logserver container
# alluxio-hdfs-config: hdfsConfig
## ConfigMaps ##
# Format: (<name>:<mount path under /configmaps/>):
# configmaps:
# master: # Shared by master and jobMaster containers
# alluxio-hdfs-config: hdfsConfig
# worker: # Shared by worker and jobWorker containers
# alluxio-hdfs-config: hdfsConfig
# logserver: # Used by the logserver container
# alluxio-hdfs-config: hdfsConfig
## Metrics System ##
# Settings for Alluxio metrics. Disabled by default.
metrics:
enabled: false
# Enable ConsoleSink by class name
ConsoleSink:
enabled: false
# Polling period for ConsoleSink
period: 10
# Unit of poll period
unit: seconds
# Enable CsvSink by class name
CsvSink:
enabled: false
# Polling period for CsvSink
period: 1
# Unit of poll period
unit: seconds
# Polling directory for CsvSink, ensure this directory exists!
directory: /tmp/alluxio-metrics
# Enable JmxSink by class name
JmxSink:
enabled: false
# Jmx domain
domain: org.alluxio
# Enable GraphiteSink by class name
GraphiteSink:
enabled: false
# Hostname of Graphite server
host: NONE
# Port of Graphite server
port: NONE
# Poll period
period: 10
# Unit of poll period
unit: seconds
# Prefix to prepend to metric name
prefix: ""
# Enable Slf4jSink by class name
Slf4jSink:
enabled: false
# Poll period
period: 10
# Units of poll period
unit: seconds
# Contains all metrics
filterClass: null
# Contains all metrics
filterRegex: null
# Enable PrometheusMetricsServlet by class name
PrometheusMetricsServlet:
enabled: false
# Pod annotations for Prometheus
# podAnnotations:
# prometheus.io/scrape: "true"
# prometheus.io/port: "19999"
# prometheus.io/path: "/metrics/prometheus/"
podAnnotations: {}
# Remote logging server
logserver:
enabled: false
replicas: 1
env:
# Extra environment variables for the logserver pod
# Example:
# JAVA_HOME: /opt/java
args: # Arguments to Docker entrypoint
- logserver
# Properties for the logserver component
properties:
resources:
# The default xmx is 8G
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
logging: 45600
hostPID: false
hostNetwork: false
# dnsPolicy will be ClusterFirstWithHostNet if hostNetwork: true
# and ClusterFirst if hostNetwork: false
# You can specify dnsPolicy here to override this inference
# dnsPolicy: ClusterFirst
# JVM options specific to the logserver container
jvmOptions:
nodeSelector: {}
tolerations: []
# The strategy field corresponds to the .spec.strategy field for the deployment
# This specifies the strategy used to replace old Pods by new ones
# https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
# The default is Recreate which kills the existing Pod before creating a new one
# Note: When using RWO PVCs, the strategy MUST be Recreate, because the PVC cannot
# be passed from the old Pod to the new one
# When using RWX PVCs, you can use RollingUpdate strategy to ensure zero down time
# Example:
# strategy:
# type: RollingUpdate
# rollingUpdate:
# maxUnavailable: 25%
# maxSurge: 1
strategy:
type: Recreate
# volumeType controls the type of log volume.
# It can be "persistentVolumeClaim" or "hostPath" or "emptyDir"
volumeType: persistentVolumeClaim
# Attributes to use if the log volume is PVC
pvcName: alluxio-logserver-logs
# Note: If using RWO, the strategy MUST be Recreate
# If using RWX, the strategy can be RollingUpdate
accessModes:
- ReadWriteOnce
storageClass: standard
# If you are dynamically provisioning PVs, the selector on the PVC should be empty.
# Ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
selector: {}
# If you are manually allocating PV for the logserver,
# it is recommended to use selectors to make sure the PV and PVC match as expected.
# You can specify selectors like below:
# Example:
# selector:
# matchLabels:
# role: alluxio-logserver
# app: alluxio
# chart: alluxio-<chart version>
# release: alluxio
# heritage: Helm
# dc: data-center-1
# region: us-east
# Attributes to use if the log volume is hostPath
hostPath: "/tmp/alluxio-logs" # The hostPath directory to use
# Attributes to use when the log volume is emptyDir
medium: ""
size: 4Gi
# The pod's HostAliases. HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts file if specified.
# It is mainly to provide the external host addresses for services not in the K8s cluster, like HDFS.
# Example:
# hostAliases:
# - ip: "192.168.0.1"
# hostnames:
# - "example1.com"
# - "example2.com"
# kubernetes CSI plugin
csi:
enabled: false
imagePullPolicy: IfNotPresent
controllerPlugin:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
provisioner:
# for kubernetes 1.17 or above
image: k8s.gcr.io/sig-storage/csi-provisioner:v2.0.5
resources:
limits:
cpu: 100m
memory: 300Mi
requests:
cpu: 10m
memory: 20Mi
controller:
resources:
limits:
cpu: 200m
memory: 200Mi
requests:
cpu: 10m
memory: 20Mi
# Run alluxio fuse process inside csi nodeserver container if mountInPod = false
# Run alluxio fuse process inside a separate pod if mountInPod = true
mountInPod: false
nodePlugin:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
nodeserver:
resources:
# fuse in nodeserver container needs more resources
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
driverRegistrar:
image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.0
resources:
limits:
cpu: 100m
memory: 100Mi
requests:
cpu: 10m
memory: 20Mi
# for csi client
clientEnabled: false
accessModes:
- ReadWriteOnce
quota: 100Gi
mountPath: /data
alluxioPath: /
mountOptions:
- direct_io
- allow_other
- entry_timeout=36000
- attr_timeout=36000
- max_readahead=0
javaOptions: "-Dalluxio.user.metadata.cache.enabled=true "
root@zhiyong-ksp1:/home/zhiyong#
As you can see, Alluxio has a great many configuration options, and reasonable defaults are already in place: the 2.8.1 image is pulled from the official Docker Hub repository, all the ports are set, and each component has resource requests and limits. The default persistence strategy mounts a persistent volume locally at /journal in the master Pod. Everything looks normal, with no baffling tricks, so the defaults are acceptable to me. If the defaults are not acceptable, they have to be changed; the official docs list the following examples of adjusting the persistence strategy:
They are:
Example: Amazon S3 as the under store
Example: Single Master and Journal in a Persistent Volume (i.e., mounting a persistent volume locally on the master pod)
Example: HDFS as Journal
Example: Multi-master with Embedded Journal in Persistent Volumes
Example: Multi-master with Embedded Journal in emptyDir Volumes
Example: HDFS as the under store
Example: Off-heap Metastore Management in Persistent Volumes
Example: Off-heap Metastore Management in emptyDir Volumes
Example: Multiple Secrets
Examples: Alluxio Storage Management
Modify these as needed. For example, you can use S3 object storage or HDFS as the underlying persistent store, or use an emptyDir ephemeral volume (whose lifetime matches the pod's). I won't walk through each one; a rough sketch of an S3 override is given below.
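For instance, a minimal sketch of an S3-under-store override in the values file might look like the following, using the property names from the Alluxio S3 documentation (the bucket, path, and credentials are placeholders):

properties:
  alluxio.master.mount.table.root.ufs: s3://<MY_BUCKET>/<PATH>
  s3a.accessKeyId: <ACCESS_KEY_ID>
  s3a.secretKey: <SECRET_KEY>

Passing this fragment with -f on top of the defaults would point the Alluxio root mount at the bucket.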
Modify the Configuration
root@zhiyong-ksp1:~# mkdir -p /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911
root@zhiyong-ksp1:~# cd /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxioconfig.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# ll
总用量 16
drwxr-xr-x 2 root root 4096 9月 11 19:11 ./
drwxr-xr-x 3 root root 4096 9月 11 18:56 ../
-rw-r--r-- 1 root root 7172 9月 11 19:11 alluxioconfig.yaml
Its contents are the same values as shown above (with the comments stripped):
fullnameOverride: alluxio
image: alluxio/alluxio
imageTag: 2.8.1
imagePullPolicy: IfNotPresent
user: 1000
group: 1000
fsGroup: 1000
serviceAccount:
imagePullSecrets:
properties:
alluxio.security.stale.channel.purge.interval: 365d
master:
enabled: true
count: 1 # Controls the number of StatefulSets. For multiMaster mode increase this to >1.
replicas: 1 # Controls #replicas in a StatefulSet and should not be modified in the usual case.
env:
args: # Arguments to Docker entrypoint
- master-only
- --no-format
properties:
resources:
# The default xmx is 8G
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
embedded: 19200
rpc: 19998
web: 19999
hostPID: false
hostNetwork: false
shareProcessNamespace: false
extraContainers: []
extraVolumeMounts: []
extraVolumes: []
extraServicePorts: []
jvmOptions:
nodeSelector: {}
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
tolerations: []
podAnnotations: {}
serviceAccount:
jobMaster:
args:
- job-master
properties:
resources:
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
embedded: 20003
rpc: 20001
web: 20002
jvmOptions:
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
journal:
type: "UFS" # One of "UFS" or "EMBEDDED"
folder: "/journal" # Master journal directory or equivalent storage path
ufsType: "local"
volumeType: persistentVolumeClaim # One of "persistentVolumeClaim" or "emptyDir"
size: 1Gi
storageClass: "standard"
accessModes:
- ReadWriteOnce
medium: ""
format: # Configuration for journal formatting job
runFormat: false # Change to true to format journal
worker:
enabled: true
env:
args:
- worker-only
- --no-format
properties:
resources:
limits:
cpu: "4"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
ports:
rpc: 29999
web: 30000
hostPID: false
hostNetwork: false
shareProcessNamespace: false
extraContainers: []
extraVolumeMounts: []
extraVolumes: []
jvmOptions:
nodeSelector: {}
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
tolerations: []
podAnnotations: {}
serviceAccount:
fuseEnabled: false
jobWorker:
args:
- job-worker
properties:
resources:
limits:
cpu: "4"
memory: "4Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
rpc: 30001
data: 30002
web: 30003
jvmOptions:
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 2
tieredstore:
levels:
- level: 0
alias: MEM
mediumtype: MEM
path: /dev/shm
type: emptyDir
quota: 1Gi
high: 0.95
low: 0.7
proxy:
enabled: false # Enable this to enable the proxy for REST API
env:
args:
- proxy
properties:
resources:
requests:
cpu: "0.5"
memory: "1Gi"
limits:
cpu: "4"
memory: "4Gi"
ports:
web: 39999
hostNetwork: false
jvmOptions:
nodeSelector: {}
tolerations: []
podAnnotations: {}
serviceAccount:
shortCircuit:
enabled: true
policy: uuid
volumeType: persistentVolumeClaim
size: 1Mi
pvcName: alluxio-worker-domain-socket
accessModes:
- ReadWriteOnce
storageClass: standard
hostPath: "/tmp/alluxio-domain" # The hostPath directory to use
## FUSE ##
fuse:
env:
enabled: false
clientEnabled: false
properties:
jvmOptions:
- "-XX:MaxDirectMemorySize=2g"
hostNetwork: true
hostPID: true
dnsPolicy: ClusterFirstWithHostNet
livenessProbeEnabled: true
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 30
failureThreshold: 2
user: 0
group: 0
fsGroup: 0
mountOptions: allow_other
mountPoint: /mnt/alluxio-fuse
alluxioPath: /
resources:
requests:
cpu: "0.5"
memory: "1Gi"
limits:
cpu: "4"
memory: "4Gi"
nodeSelector: {}
tolerations: []
podAnnotations: {}
serviceAccount:
metrics:
enabled: false
ConsoleSink:
enabled: false
period: 10
unit: seconds
CsvSink:
enabled: false
period: 1
unit: seconds
directory: /tmp/alluxio-metrics
JmxSink:
enabled: false
domain: org.alluxio
GraphiteSink:
enabled: false
host: NONE
port: NONE
period: 10
unit: seconds
prefix: ""
Slf4jSink:
enabled: false
period: 10
unit: seconds
filterClass: null
filterRegex: null
PrometheusMetricsServlet:
enabled: false
podAnnotations: {}
logserver:
enabled: false
replicas: 1
env:
args: # Arguments to Docker entrypoint
- logserver
properties:
resources:
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
ports:
logging: 45600
hostPID: false
hostNetwork: false
jvmOptions:
nodeSelector: {}
tolerations: []
strategy:
type: Recreate
volumeType: persistentVolumeClaim
pvcName: alluxio-logserver-logs
accessModes:
- ReadWriteOnce
storageClass: standard
selector: {}
hostPath: "/tmp/alluxio-logs" # The hostPath directory to use
medium: ""
size: 4Gi
csi:
enabled: false
imagePullPolicy: IfNotPresent
controllerPlugin:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
provisioner:
image: k8s.gcr.io/sig-storage/csi-provisioner:v2.0.5
resources:
limits:
cpu: 100m
memory: 300Mi
requests:
cpu: 10m
memory: 20Mi
controller:
resources:
limits:
cpu: 200m
memory: 200Mi
requests:
cpu: 10m
memory: 20Mi
mountInPod: false
nodePlugin:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
nodeserver:
resources:
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "1"
memory: "1Gi"
driverRegistrar:
image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.0
resources:
limits:
cpu: 100m
memory: 100Mi
requests:
cpu: 10m
memory: 20Mi
clientEnabled: false
accessModes:
- ReadWriteOnce
quota: 100Gi
mountPath: /data
alluxioPath: /
mountOptions:
- direct_io
- allow_other
- entry_timeout=36000
- attr_timeout=36000
- max_readahead=0
javaOptions: "-Dalluxio.user.metadata.cache.enabled=true "
That was getting long, so I stripped out the comments... Adjust the configuration to your own needs (for example, shrink the quotas if your machine is short on resources, or mount HDFS as the under store); a sketch of a command-line override follows.
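For example, on a machine with limited memory the requests could be trimmed either by editing the values file or directly on the command line with --set; a sketch (the numbers are arbitrary examples, not recommendations):

helm install alluxio -f alluxioconfig.yaml \
  --set master.resources.requests.memory=1Gi \
  --set worker.resources.requests.memory=1Gi \
  alluxio-charts/alluxio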
Install
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm install alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml alluxio-charts/alluxio
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods -owide --all-namespaces | grep alluxio
default alluxio-master-0 0/2 Pending 0 3m34s <none> <none> <none> <none>
default alluxio-worker-rczrd 0/2 Pending 0 3m36s <none> <none> <none> <none>
As you can see, the Alluxio pods are once again stuck in Pending.
Diagnosing the Pod Failure
First, describe the failing master pod (0/2 containers ready) to see what it reports:
root@zhiyong-ksp1:~# kubectl describe pod alluxio-master-0
Name: alluxio-master-0
Namespace: default
Priority: 0
Node: <none>
Labels: app=alluxio
chart=alluxio-0.6.48
controller-revision-hash=alluxio-master-5bb869cb7d
heritage=Helm
name=alluxio-master
release=alluxio
role=alluxio-master
statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/alluxio-master
Containers:
alluxio-master:
Image: alluxio/alluxio:2.8.1
Ports: 19998/TCP, 19999/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
master-only
--no-format
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/journal from alluxio-journal (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hg5br (ro)
alluxio-job-master:
Image: alluxio/alluxio:2.8.1
Ports: 20001/TCP, 20002/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
job-master
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hg5br (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
alluxio-journal:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alluxio-journal-alluxio-master-0
ReadOnly: false
kube-api-access-hg5br:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m42s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedScheduling 97s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
As you can see, no persistent volume has been bound, which is why the Pod fails to start:
the PersistentVolumeClaim has been created, but it sits in Pending...
and the events show that no StorageClass named standard can be found...
root@zhiyong-ksp1:~# kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local (default) openebs.io/local Delete WaitForFirstConsumer false 34d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe storageclass local
Name: local
IsDefaultClass: Yes
Annotations: cas.openebs.io/config=- name: StorageType
value: "hostpath"
- name: BasePath
value: "/var/openebs/local/"
,kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"cas.openebs.io/config":"- name: StorageType\n value: \"hostpath\"\n- name: BasePath\n value: \"/var/openebs/local/\"\n","openebs.io/cas-type":"local","storageclass.beta.kubernetes.io/is-default-class":"true","storageclass.kubesphere.io/supported-access-modes":"[\"ReadWriteOnce\"]"},"name":"local"},"provisioner":"openebs.io/local","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"}
,openebs.io/cas-type=local,storageclass.beta.kubernetes.io/is-default-class=true,storageclass.kubesphere.io/supported-access-modes=["ReadWriteOnce"]
Provisioner: openebs.io/local
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: WaitForFirstConsumer
Events: <none>
root@zhiyong-ksp1:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
alluxio-journal-alluxio-master-0 Pending standard 21m
alluxio-worker-domain-socket Pending standard 21m
root@zhiyong-ksp1:~# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4 20Gi RWO Delete Bound kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0 local 33d
pvc-402e21a4-a811-46a7-b75b-e295512bab25 4Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-discovery-0 local 25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1 8Gi RWO Delete Bound kubesphere-devops-system/devops-jenkins local 24d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d 20Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-data-0 local 25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19 20Gi RWO Delete Bound kubesphere-system/minio local 24d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8 2Gi RWO Delete Bound kubesphere-system/openldap-pvc-openldap-0 local 24d
Indeed, there is no StorageClass named standard.
Fixing the Missing standard StorageClass
Trying to take this part of the generated PVC YAML:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: alluxio-journal-alluxio-master-0
namespace: default
labels:
app: alluxio
name: alluxio-master
role: alluxio-master
finalizers:
- kubernetes.io/pvc-protection
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: standard
volumeMode: Filesystem
and change the storageClassName value from the missing standard to the existing local fails with an error: a PVC's spec (apart from resources.requests) is immutable once the claim exists, so it cannot simply be edited in place. So I changed approach and uninstalled the release first:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
alluxio default 1 2022-09-11 19:13:31.30716236 +0800 CST deployed alluxio-0.6.48
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm delete alluxio
release "alluxio" uninstalled
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
and delete the previous PVCs:
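The two claims left behind by the previous release are the ones listed by kubectl get pvc above, so the cleanup amounts to:

kubectl delete pvc alluxio-journal-alluxio-master-0 alluxio-worker-domain-socket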
Check the current deployments:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get deployment --all-namespaces
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
argocd devops-argocd-applicationset-controller 1/1 1 1 24d
argocd devops-argocd-dex-server 1/1 1 1 24d
argocd devops-argocd-notifications-controller 1/1 1 1 24d
argocd devops-argocd-redis 1/1 1 1 24d
argocd devops-argocd-repo-server 1/1 1 1 24d
argocd devops-argocd-server 1/1 1 1 24d
istio-system istiod-1-11-2 1/1 1 1 25d
istio-system jaeger-collector 1/1 1 1 25d
istio-system jaeger-operator 1/1 1 1 25d
istio-system jaeger-query 1/1 1 1 25d
istio-system kiali 1/1 1 1 25d
istio-system kiali-operator 1/1 1 1 25d
kube-system calico-kube-controllers 1/1 1 1 34d
kube-system coredns 2/2 2 2 34d
kube-system openebs-localpv-provisioner 1/1 1 1 34d
kubesphere-controls-system default-http-backend 1/1 1 1 34d
kubesphere-controls-system kubectl-admin 1/1 1 1 34d
kubesphere-devops-system devops-apiserver 1/1 1 1 24d
kubesphere-devops-system devops-controller 1/1 1 1 24d
kubesphere-devops-system devops-jenkins 1/1 1 1 24d
kubesphere-monitoring-system kube-state-metrics 1/1 1 1 34d
kubesphere-monitoring-system notification-manager-deployment 1/1 1 1 34d
kubesphere-monitoring-system notification-manager-operator 1/1 1 1 34d
kubesphere-monitoring-system prometheus-operator 1/1 1 1 34d
kubesphere-system ks-apiserver 1/1 1 1 34d
kubesphere-system ks-console 1/1 1 1 34d
kubesphere-system ks-controller-manager 1/1 1 1 34d
kubesphere-system ks-installer 1/1 1 1 34d
kubesphere-system minio 1/1 1 1 24d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
Looking at the volumes that had been provisioned successfully before: clearly the StorageClass named local is a Filesystem-type store backed by openebs (hostpath). If you use NFS or some other kind of persistent storage, you have to create the corresponding StorageClass yourself.
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# ll
总用量 16
drwxr-xr-x 2 root root 4096 9月 11 19:12 ./
drwxr-xr-x 3 root root 4096 9月 11 18:56 ../
-rw-r--r-- 1 root root 7172 9月 11 19:11 alluxioconfig.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxioconfig.yaml
For now, a StorageClass named standard has to be created manually before we can go any further...
Create the StorageClass
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim standardstorageclass.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl create -f standardstorageclass.yaml
storageclass.storage.k8s.io/standard created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local (default) openebs.io/local Delete WaitForFirstConsumer false 34d
standard kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 15s
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
Its contents are as follows:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: standard
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
volumeBindingMode is set to WaitForFirstConsumer, i.e. delayed binding. As you can see, a new StorageClass has now been created.
Install Again
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm install alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml alluxio-charts/alluxio
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods -owide --all-namespaces | grep alluxio
default alluxio-master-0 0/2 Pending 0 92s <none> <none> <none> <none>
default alluxio-worker-z6hkq 0/2 Pending 0 93s <none> <none> <none> <none>
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe pod alluxio-master-0
Name: alluxio-master-0
Namespace: default
Priority: 0
Node: <none>
Labels: app=alluxio
chart=alluxio-0.6.48
controller-revision-hash=alluxio-master-5bb869cb7d
heritage=Helm
name=alluxio-master
release=alluxio
role=alluxio-master
statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/alluxio-master
Containers:
alluxio-master:
Image: alluxio/alluxio:2.8.1
Ports: 19998/TCP, 19999/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
master-only
--no-format
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/journal from alluxio-journal (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
alluxio-job-master:
Image: alluxio/alluxio:2.8.1
Ports: 20001/TCP, 20002/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
job-master
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
alluxio-journal:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alluxio-journal-alluxio-master-0
ReadOnly: false
kube-api-access-88v9g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m10s default-scheduler 0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
There is no serious error this time; the pod simply cannot find an available persistent volume to bind, so it still will not start. The next step is to create some PVs by hand for the pods to use.
Create PVs
Following the Kubernetes docs: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
and the Alluxio docs: each journal volume should be at least 1Gi, because each Alluxio master Pod will have a PersistentVolumeClaim requesting 1Gi of storage...
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-master-journal-pv.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl create -f alluxio-master-journal-pv.yaml
persistentvolume/alluxio-journal-0 created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
alluxio-journal-0 4Gi RWO Retain Available standard 9s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4 20Gi RWO Delete Bound kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0 local 34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25 4Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-discovery-0 local 25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1 8Gi RWO Delete Bound kubesphere-devops-system/devops-jenkins local 25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d 20Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-data-0 local 25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19 20Gi RWO Delete Bound kubesphere-system/minio local 25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8 2Gi RWO Delete Bound kubesphere-system/openldap-pvc-openldap-0 local 25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
The YAML contents:
kind: PersistentVolume
apiVersion: v1
metadata:
name: alluxio-journal-0
labels:
type: local
spec:
storageClassName: standard
capacity:
storage: 4Gi
accessModes:
- ReadWriteOnce
hostPath:
path: /tmp/alluxio-journal-0
You can see the PV was created successfully. Wait a moment:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
alluxio-journal-0 4Gi RWO Retain Bound default/alluxio-journal-alluxio-master-0 standard 4m31s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4 20Gi RWO Delete Bound kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0 local 34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25 4Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-discovery-0 local 25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1 8Gi RWO Delete Bound kubesphere-devops-system/devops-jenkins local 25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d 20Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-data-0 local 25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19 20Gi RWO Delete Bound kubesphere-system/minio local 25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8 2Gi RWO Delete Bound kubesphere-system/openldap-pvc-openldap-0 local 25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
The PV has now been bound automatically:
But 4Gi is a bit small, so first bump it to 20Gi:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl edit pv/alluxio-journal-0 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
annotations:
pv.kubernetes.io/bound-by-controller: "yes"
creationTimestamp: "2022-09-11T13:54:21Z"
finalizers:
- kubernetes.io/pv-protection
labels:
type: local
name: alluxio-journal-0
resourceVersion: "270280"
uid: 9823b56d-79d2-4a97-be65-1f8c7770386d
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 20Gi
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: alluxio-journal-alluxio-master-0
namespace: default
resourceVersion: "262629"
uid: b06b8135-b7ca-4841-894c-0f0532fdcca9
hostPath:
path: /tmp/alluxio-journal-0
type: ""
persistentVolumeReclaimPolicy: Retain
storageClassName: standard
volumeMode: Filesystem
status:
phase: Bound
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
alluxio-journal-0 20Gi RWO Retain Bound default/alluxio-journal-alluxio-master-0 standard 9m
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4 20Gi RWO Delete Bound kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0 local 34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25 4Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-discovery-0 local 25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1 8Gi RWO Delete Bound kubesphere-devops-system/devops-jenkins local 25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d 20Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-data-0 local 25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19 20Gi RWO Delete Bound kubesphere-system/minio local 25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8 2Gi RWO Delete Bound kubesphere-system/openldap-pvc-openldap-0 local 25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
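The same resize could also be done non-interactively with kubectl patch instead of kubectl edit, for example:

kubectl patch pv alluxio-journal-0 -p '{"spec":{"capacity":{"storage":"20Gi"}}}'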
Now create a PV for the worker:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-worker-journal-pv.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl create -f alluxio-worker-journal-pv.yaml
persistentvolume/alluxio-journal-1 created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
alluxio-journal-0 20Gi RWO Retain Bound default/alluxio-journal-alluxio-master-0 standard 13m
alluxio-journal-1 30Gi RWO Retain Available standard 6s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4 20Gi RWO Delete Bound kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0 local 34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25 4Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-discovery-0 local 25d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1 8Gi RWO Delete Bound kubesphere-devops-system/devops-jenkins local 25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d 20Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-data-0 local 25d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19 20Gi RWO Delete Bound kubesphere-system/minio local 25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8 2Gi RWO Delete Bound kubesphere-system/openldap-pvc-openldap-0 local 25d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
The file content is much the same:
kind: PersistentVolume
apiVersion: v1
metadata:
name: alluxio-journal-1
labels:
type: local
spec:
storageClassName: standard
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
hostPath:
path: /tmp/alluxio-journal-1
For reference, the describe output of such a PV looks like this:
Name: alluxio-journal-2
Labels: type=local
Annotations: pv.kubernetes.io/bound-by-controller: yes
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Bound
Claim: default/alluxio-worker-jjgcv
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 30Gi
Node Affinity: <none>
Message:
Source:
Type: HostPath (bare host directory volume)
Path: /tmp/alluxio-journal-2
HostPathType:
Events: <none>
Fixing the Master pod stuck in ContainerCreating
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe pod alluxio-master-0
Name: alluxio-master-0
Namespace: default
Priority: 0
Node: zhiyong-ksp1/192.168.88.20
Start Time: Sun, 11 Sep 2022 21:54:32 +0800
Labels: app=alluxio
chart=alluxio-0.6.48
controller-revision-hash=alluxio-master-5bb869cb7d
heritage=Helm
name=alluxio-master
release=alluxio
role=alluxio-master
statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/alluxio-master
Containers:
alluxio-master:
Container ID:
Image: alluxio/alluxio:2.8.1
Image ID:
Ports: 19998/TCP, 19999/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
master-only
--no-format
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/journal from alluxio-journal (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
alluxio-job-master:
Container ID:
Image: alluxio/alluxio:2.8.1
Image ID:
Ports: 20001/TCP, 20002/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
job-master
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
alluxio-journal:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alluxio-journal-alluxio-master-0
ReadOnly: false
kube-api-access-88v9g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 50m default-scheduler 0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedScheduling 45m default-scheduler 0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Normal Scheduled 18m default-scheduler Successfully assigned default/alluxio-master-0 to zhiyong-ksp1
Warning FailedCreatePodSandBox 18m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "272bf5e805275bf3e235a55b275157b149cb27673c596e0b1488c469e74397b5": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4c1b7f2d9a9d55a096d1392b0e59b69e37eb03c00ba05dcc6066414d46889e24": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "afcb920ff350a9c6ac5cb3eb4e588224dca137ecb5047213178536fcbd448edc": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e0895849bef84eab271ec86160fedcf46cbabdb1b5c792c5374d51e522c8b835": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "42da9f5a7ca6025e22220169576232334560207843e89b4582ead360cf143336": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 16m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c9e94c30e641992fb63c82177cf2f6241f3b46549fcc65cd5bde953ecfddbae4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 16m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1bb467f76b3dfb072b94b19a9ca05c0cc405436d0b7efd7bf5cfa4dd2b65022c": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 16m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cfa3efb7a306fdd02f3a0c9f5e13ef4345f77155c3c327fd311bef181529fc3e": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 16m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "19ce9a3b5ebf896d3ce9042e71154cafb26219736ad1bd85ed02d17be5834cdf": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 2m54s (x60 over 16m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ae696e7f906fa6412aed6585a1a2eb7a5f3b846e1d555cce755a184a307b519d": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
With the PV problem solved, we now run into a Calico problem.
I hit it once before as well: https://lizhiyong.blog.csdn.net/article/details/126380224
root@zhiyong-ksp1:~# mkdir -p /fileback/20220911
root@zhiyong-ksp1:~# cd /etc/cni/net.d
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 16
drwxr-xr-x 2 kube root 4096 8月 17 01:47 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
-rw-r--r-- 1 root root 663 8月 17 01:47 10-calico.conflist
-rw------- 1 root root 2713 8月 18 00:32 calico-kubeconfig
root@zhiyong-ksp1:/etc/cni/net.d# mv ./10-calico.conflist /fileback/20220911
root@zhiyong-ksp1:/etc/cni/net.d# mv ./calico-kubeconfig /fileback/20220911
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 8
drwxr-xr-x 2 kube root 4096 9月 11 22:20 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
root@zhiyong-ksp1:/etc/cni/net.d# reboot
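After the reboot, calico-node should regenerate the CNI configuration under /etc/cni/net.d with fresh credentials. A quick sanity check, assuming the stock Calico label k8s-app=calico-node:
ls -l /etc/cni/net.d
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide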
Only after the reboot does it finally, unhurriedly, start pulling the image:
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod alluxio-master-0
Name: alluxio-master-0
Namespace: default
Priority: 0
Node: zhiyong-ksp1/192.168.88.20
Start Time: Sun, 11 Sep 2022 21:54:32 +0800
Labels: app=alluxio
chart=alluxio-0.6.48
controller-revision-hash=alluxio-master-5bb869cb7d
heritage=Helm
name=alluxio-master
release=alluxio
role=alluxio-master
statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations: cni.projectcalico.org/containerID: 3dbc1d4a3d1134300c1166722049eb9176c03b1f598f30f84fa1f890f8868043
cni.projectcalico.org/podIP: 10.233.107.106/32
cni.projectcalico.org/podIPs: 10.233.107.106/32
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/alluxio-master
Containers:
alluxio-master:
Container ID:
Image: alluxio/alluxio:2.8.1
Image ID:
Ports: 19998/TCP, 19999/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
master-only
--no-format
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/journal from alluxio-journal (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
alluxio-job-master:
Container ID:
Image: alluxio/alluxio:2.8.1
Image ID:
Ports: 20001/TCP, 20002/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
job-master
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88v9g (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
alluxio-journal:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alluxio-journal-alluxio-master-0
ReadOnly: false
kube-api-access-88v9g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 64m default-scheduler 0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedScheduling 59m default-scheduler 0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Normal Scheduled 32m default-scheduler Successfully assigned default/alluxio-master-0 to zhiyong-ksp1
Warning FailedCreatePodSandBox 32m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "272bf5e805275bf3e235a55b275157b149cb27673c596e0b1488c469e74397b5": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 31m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4c1b7f2d9a9d55a096d1392b0e59b69e37eb03c00ba05dcc6066414d46889e24": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 31m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "afcb920ff350a9c6ac5cb3eb4e588224dca137ecb5047213178536fcbd448edc": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 31m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e0895849bef84eab271ec86160fedcf46cbabdb1b5c792c5374d51e522c8b835": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 31m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "42da9f5a7ca6025e22220169576232334560207843e89b4582ead360cf143336": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 30m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c9e94c30e641992fb63c82177cf2f6241f3b46549fcc65cd5bde953ecfddbae4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 30m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1bb467f76b3dfb072b94b19a9ca05c0cc405436d0b7efd7bf5cfa4dd2b65022c": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 30m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cfa3efb7a306fdd02f3a0c9f5e13ef4345f77155c3c327fd311bef181529fc3e": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 30m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "19ce9a3b5ebf896d3ce9042e71154cafb26219736ad1bd85ed02d17be5834cdf": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 6m57s (x104 over 30m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f5be7b93412cadf9405f96816bb004fb14a8bac8a6c657354ef7d23a6137e96b": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning NetworkNotReady 2m52s (x11 over 3m12s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Warning FailedMount 2m52s (x5 over 2m59s) kubelet MountVolume.SetUp failed for volume "kube-api-access-88v9g" : object "default"/"kube-root-ca.crt" not registered
Normal Pulling 2m43s kubelet Pulling image "alluxio/alluxio:2.8.1"
root@zhiyong-ksp1:/home/zhiyong#
What a pitfall!!!
After the reboot:
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod alluxio-master-0
Name: alluxio-master-0
Namespace: default
Priority: 0
Node: zhiyong-ksp1/192.168.88.20
Start Time: Mon, 12 Sep 2022 00:33:22 +0800
Labels: app=alluxio
chart=alluxio-0.6.48
controller-revision-hash=alluxio-master-5bb869cb7d
heritage=Helm
name=alluxio-master
release=alluxio
role=alluxio-master
statefulset.kubernetes.io/pod-name=alluxio-master-0
Annotations: cni.projectcalico.org/containerID: 281deb90cae21158d2de6b32776d3c268b1d7af40341e4886a918a5cc28e4ba9
cni.projectcalico.org/podIP: 10.233.107.165/32
cni.projectcalico.org/podIPs: 10.233.107.165/32
Status: Running
IP: 10.233.107.165
IPs:
IP: 10.233.107.165
Controlled By: StatefulSet/alluxio-master
Containers:
alluxio-master:
Container ID: containerd://a360ca1dbf913afd4b1270b2da4fe86d6e797debba73c8dec75bf46470095c94
Image: alluxio/alluxio:2.8.1
Image ID: docker.io/alluxio/alluxio@sha256:a365600d65fe4c518e3df4272a25b842ded773b193ea146a202b15e853a65d39
Ports: 19998/TCP, 19999/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
master-only
--no-format
State: Running
Started: Mon, 12 Sep 2022 05:46:17 +0800
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Mon, 12 Sep 2022 05:42:59 +0800
Finished: Mon, 12 Sep 2022 05:45:35 +0800
Ready: False
Restart Count: 52
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/journal from alluxio-journal (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vslvq (ro)
alluxio-job-master:
Container ID: containerd://730e943e3a5e8787ec15cd73ce8ef617af15cc4ba73ed1ee58add911bf2cb496
Image: alluxio/alluxio:2.8.1
Image ID: docker.io/alluxio/alluxio@sha256:a365600d65fe4c518e3df4272a25b842ded773b193ea146a202b15e853a65d39
Ports: 20001/TCP, 20002/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
job-master
State: Running
Started: Mon, 12 Sep 2022 05:46:21 +0800
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Mon, 12 Sep 2022 02:58:10 +0800
Finished: Mon, 12 Sep 2022 05:45:35 +0800
Ready: False
Restart Count: 1
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_MASTER_HOSTNAME: (v1:status.podIP)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vslvq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
alluxio-journal:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alluxio-journal-alluxio-master-0
ReadOnly: false
kube-api-access-vslvq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 53m (x36 over 168m) kubelet Container image "alluxio/alluxio:2.8.1" already present on machine
Warning BackOff 8m40s (x548 over 163m) kubelet Back-off restarting failed container
Warning Unhealthy 3m49s (x290 over 168m) kubelet Readiness probe failed: dial tcp 10.233.107.112:19998: connect: connection refused
Normal SandboxChanged 52s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 45s kubelet Container image "alluxio/alluxio:2.8.1" already present on machine
Normal Created 45s kubelet Created container alluxio-master
Normal Started 44s kubelet Started container alluxio-master
Normal Pulled 44s kubelet Container image "alluxio/alluxio:2.8.1" already present on machine
Normal Created 44s kubelet Created container alluxio-job-master
Normal Started 40s kubelet Started container alluxio-job-master
Warning Unhealthy 22s (x3 over 34s) kubelet Readiness probe failed: dial tcp 10.233.107.165:19998: connect: connection refused
root@zhiyong-ksp1:/home/zhiyong# kubectl logs -f alluxio-master-0
Defaulted container "alluxio-master" out of: alluxio-master, alluxio-job-master
2022-09-11 21:47:34,457 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2022-09-11 21:47:34,458 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2022-09-11 21:47:34,457 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2022-09-11 21:47:34,457 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2022-09-11 21:47:34,458 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2022-09-11 21:47:34,503 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.8.1/lib
2022-09-11 21:47:34,624 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.8.1/extensions
2022-09-11 21:47:34,684 INFO MetricsSystem - Starting sinks with config: {}.
2022-09-11 21:47:34,686 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-1037551149078095174. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2022-09-11 21:47:34,704 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=10.233.107.165, rack=null)
2022-09-11 21:47:34,705 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio-2.8.1/lib
2022-09-11 21:47:34,724 INFO UnderDatabaseRegistry - Registered UDBs: hive,glue
2022-09-11 21:47:34,727 INFO LayoutRegistry - Registered Table Layouts: hive
2022-09-11 21:47:34,855 INFO RocksStore - Closing BlockStore rocks database
2022-09-11 21:47:34,898 INFO RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/blocks
2022-09-11 21:47:35,121 INFO RocksStore - Closing InodeStore rocks database
2022-09-11 21:47:35,162 INFO RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,211 INFO RocksStore - Closing InodeStore rocks database
2022-09-11 21:47:35,252 INFO RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,252 INFO RocksStore - Cleared store at /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,263 INFO ProcessUtils - Starting Alluxio master @10.233.107.165:19998.
2022-09-11 21:47:35,263 INFO ProcessUtils - Alluxio version: 2.8.1-5cda26a856fba1d1f42b39b7a8c761e50bbae8fe
2022-09-11 21:47:35,263 INFO ProcessUtils - Java version: 1.8.0_275
2022-09-11 21:47:35,263 INFO AlluxioMasterProcess - Starting...
2022-09-11 21:47:35,263 INFO RocksStore - Closing BlockStore rocks database
2022-09-11 21:47:35,300 INFO RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/blocks
2022-09-11 21:47:35,300 INFO RocksStore - Cleared store at /opt/alluxio-2.8.1/metastore/blocks
2022-09-11 21:47:35,305 INFO RocksStore - Closing InodeStore rocks database
2022-09-11 21:47:35,308 INFO UfsJournalCheckpointThread - BlockMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,308 INFO UfsJournalCheckpointThread - TableMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,338 INFO RocksStore - Opened rocks database under path /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,338 INFO RocksStore - Cleared store at /opt/alluxio-2.8.1/metastore/inodes
2022-09-11 21:47:35,340 INFO UfsJournalCheckpointThread - FileSystemMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,341 INFO UfsJournalCheckpointThread - MetricsMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,341 INFO UfsJournalCheckpointThread - MetaMaster: Journal checkpoint thread started.
2022-09-11 21:47:35,341 INFO UfsJournalCheckpointThread - BlockMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,341 INFO UfsJournalCheckpointThread - TableMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,342 INFO UfsJournalCheckpointThread - FileSystemMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,342 INFO UfsJournalCheckpointThread - MetaMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:35,342 INFO UfsJournalCheckpointThread - MetricsMaster: Journal checkpointer shutdown has been initiated.
2022-09-11 21:47:37,310 INFO AbstractJournalProgressLogger - UfsJournal(/journal/TableMaster/v1)|current SN: 0|entries in last 2001ms=0
2022-09-11 21:47:37,342 INFO AbstractJournalProgressLogger - UfsJournal(/journal/MetricsMaster/v1)|current SN: 0|entries in last 2000ms=0
2022-09-11 21:47:37,342 INFO AbstractJournalProgressLogger - UfsJournal(/journal/MetaMaster/v1)|current SN: 0|entries in last 2000ms=0
2022-09-11 21:47:38,310 INFO AbstractJournalProgressLogger - UfsJournal(/journal/BlockMaster/v1)|current SN: 0|entries in last 3000ms=0
2022-09-11 21:47:38,341 INFO AbstractJournalProgressLogger - UfsJournal(/journal/FileSystemMaster/v1)|current SN: 0|entries in last 3000ms=0
2022-09-11 21:47:41,311 INFO AbstractJournalProgressLogger - UfsJournal(/journal/TableMaster/v1)|current SN: 0|entries in last 4001ms=0
2022-09-11 21:47:41,312 INFO UfsJournalCheckpointThread - BlockMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,312 INFO UfsJournalCheckpointThread - TableMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,313 INFO UfsJournalCheckpointThread - BlockMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,313 INFO UfsJournalCheckpointThread - TableMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,329 INFO UfsJournal - BlockMaster: journal switched to primary mode. location: /journal/BlockMaster/v1
2022-09-11 21:47:41,330 INFO UfsJournal - TableMaster: journal switched to primary mode. location: /journal/TableMaster/v1
2022-09-11 21:47:41,342 INFO UfsJournalCheckpointThread - FileSystemMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,343 INFO UfsJournalCheckpointThread - FileSystemMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,343 INFO UfsJournalCheckpointThread - MetricsMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,343 INFO UfsJournalCheckpointThread - MetaMaster: Journal checkpoint thread has been shutdown. No new logs have been found during the quiet period.
2022-09-11 21:47:41,344 INFO UfsJournalCheckpointThread - MetricsMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,344 INFO UfsJournalCheckpointThread - MetaMaster: Journal checkpointer shutdown complete
2022-09-11 21:47:41,346 INFO UfsJournal - FileSystemMaster: journal switched to primary mode. location: /journal/FileSystemMaster/v1
2022-09-11 21:47:41,347 INFO UfsJournal - MetaMaster: journal switched to primary mode. location: /journal/MetaMaster/v1
2022-09-11 21:47:41,348 INFO UfsJournal - MetricsMaster: journal switched to primary mode. location: /journal/MetricsMaster/v1
2022-09-11 21:47:41,390 INFO AlluxioMasterProcess - Starting all masters as: leader.
2022-09-11 21:47:41,391 INFO AbstractMaster - MetricsMaster: Starting primary master.
2022-09-11 21:47:41,397 INFO MetricsSystem - Reset all metrics in the metrics system in 4ms
2022-09-11 21:47:41,398 INFO MetricsStore - Cleared the metrics store and metrics system in 6 ms
2022-09-11 21:47:41,401 INFO AbstractMaster - BlockMaster: Starting primary master.
2022-09-11 21:47:41,403 INFO AbstractMaster - FileSystemMaster: Starting primary master.
2022-09-11 21:47:41,403 INFO DefaultFileSystemMaster - Starting fs master as primary
2022-09-11 21:47:41,513 WARN AsyncJournalWriter - Failed to flush journal entry: Unable to create parent directories for path /journal/BlockMaster/v1/logs/0x0-0x7fffffffffffffff
java.io.IOException: Unable to create parent directories for path /journal/BlockMaster/v1/logs/0x0-0x7fffffffffffffff
at alluxio.underfs.local.LocalUnderFileSystem.createDirect(LocalUnderFileSystem.java:114)
at alluxio.underfs.local.LocalUnderFileSystem.create(LocalUnderFileSystem.java:103)
at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:182)
at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:179)
at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:1237)
at alluxio.underfs.UnderFileSystemWithLogging.create(UnderFileSystemWithLogging.java:179)
at alluxio.master.journal.ufs.UfsJournalLogWriter.createNewLogFile(UfsJournalLogWriter.java:294)
at alluxio.master.journal.ufs.UfsJournalLogWriter.maybeRotateLog(UfsJournalLogWriter.java:283)
at alluxio.master.journal.ufs.UfsJournalLogWriter.write(UfsJournalLogWriter.java:118)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:305)
at java.lang.Thread.run(Thread.java:748)
2022-09-11 21:47:41,516 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Unable to create parent directories for path /journal/BlockMaster/v1/logs/0x0-0x7fffffffffffffff
at alluxio.underfs.local.LocalUnderFileSystem.createDirect(LocalUnderFileSystem.java:114)
at alluxio.underfs.local.LocalUnderFileSystem.create(LocalUnderFileSystem.java:103)
at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:182)
at alluxio.underfs.UnderFileSystemWithLogging$6.call(UnderFileSystemWithLogging.java:179)
at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:1237)
at alluxio.underfs.UnderFileSystemWithLogging.create(UnderFileSystemWithLogging.java:179)
at alluxio.master.journal.ufs.UfsJournalLogWriter.createNewLogFile(UfsJournalLogWriter.java:294)
at alluxio.master.journal.ufs.UfsJournalLogWriter.maybeRotateLog(UfsJournalLogWriter.java:283)
at alluxio.master.journal.ufs.UfsJournalLogWriter.write(UfsJournalLogWriter.java:118)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:305)
at java.lang.Thread.run(Thread.java:748)
It looks like a missing write permission... grant it:
root@zhiyong-ksp1:/tmp# chmod 777 -R /tmp/alluxio-journal-0
root@zhiyong-ksp1:/tmp# ll | grep alluxio-journal-0
drwxrwxrwx 2 root root 4096 9月 12 05:46 alluxio-journal-0/
At this point:
root@zhiyong-ksp1:~# kubectl get pods -owide --all-namespaces | grep alluxio
default alluxio-master-0 2/2 Running 4 (2m22s ago) 6m22s 10.233.107.194 zhiyong-ksp1 <none> <none>
default alluxio-worker-n8qh7 0/2 Pending 0 6m22s <none> <none> <none> <none>
You can see the Master pod's two containers are finally running (2/2)!!!
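If you ever redo the install, it may save this restart loop to create the hostPath journal directories with open permissions before running helm install. A minimal sketch for the two paths used in this walkthrough:
mkdir -p /tmp/alluxio-journal-0 /tmp/alluxio-journal-1
chmod 0777 /tmp/alluxio-journal-0 /tmp/alluxio-journal-1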
Fixing the Worker pod stuck in Pending
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods -owide --all-namespaces | grep alluxio
default alluxio-master-0 2/2 Running 4 (17m ago) 21m 10.233.107.194 zhiyong-ksp1 <none> <none>
default alluxio-worker-n8qh7 0/2 Pending 0 21m <none> <none> <none> <none>
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe pod alluxio-worker-n8qh7
Name: alluxio-worker-n8qh7
Namespace: default
Priority: 0
Node: <none>
Labels: app=alluxio
chart=alluxio-0.6.48
controller-revision-hash=c6dcc876c
heritage=Helm
pod-template-generation=1
release=alluxio
role=alluxio-worker
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: DaemonSet/alluxio-worker
Containers:
alluxio-worker:
Image: alluxio/alluxio:2.8.1
Ports: 29999/TCP, 30000/TCP
Host Ports: 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
worker-only
--no-format
Limits:
cpu: 4
memory: 4Gi
Requests:
cpu: 1
memory: 2Gi
Liveness: tcp-socket :rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_WORKER_HOSTNAME: (v1:status.hostIP)
ALLUXIO_WORKER_CONTAINER_HOSTNAME: (v1:status.podIP)
Mounts:
/dev/shm from mem (rw)
/opt/domain from alluxio-domain (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9wxfl (ro)
alluxio-job-worker:
Image: alluxio/alluxio:2.8.1
Ports: 30001/TCP, 30002/TCP, 30003/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
tini
--
/entrypoint.sh
Args:
job-worker
Limits:
cpu: 4
memory: 4Gi
Requests:
cpu: 1
memory: 1Gi
Liveness: tcp-socket :job-rpc delay=15s timeout=5s period=30s #success=1 #failure=2
Readiness: tcp-socket :job-rpc delay=10s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_WORKER_HOSTNAME: (v1:status.hostIP)
ALLUXIO_WORKER_CONTAINER_HOSTNAME: (v1:status.podIP)
Mounts:
/dev/shm from mem (rw)
/opt/domain from alluxio-domain (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9wxfl (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
alluxio-domain:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alluxio-worker-domain-socket
ReadOnly: false
mem:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 1Gi
kube-api-access-9wxfl:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedScheduling 17m default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl logs -f alluxio-worker-n8qh7
Defaulted container "alluxio-worker" out of: alluxio-worker, alluxio-job-worker
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
It looks like the storage volume was not bound successfully:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
alluxio-journal-0 20Gi RWO Retain Bound default/alluxio-journal-alluxio-master-0 standard 5h46m
alluxio-journal-1 30Gi RWO Retain Available standard 6m40s
pvc-23fc88d9-da65-47fc-80e6-b15976a8bcf4 20Gi RWO Delete Bound kubesphere-monitoring-system/prometheus-k8s-db-prometheus-k8s-0 local 34d
pvc-402e21a4-a811-46a7-b75b-e295512bab25 4Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-discovery-0 local 26d
pvc-5d4597cd-404d-4bd8-8b9a-71f32c44f1d1 8Gi RWO Delete Bound kubesphere-devops-system/devops-jenkins local 25d
pvc-861a6ff8-7a6b-407e-bb73-aef721ef586d 20Gi RWO Delete Bound kubesphere-logging-system/data-elasticsearch-logging-data-0 local 26d
pvc-a777e6f9-c564-419f-85fd-23bee491ef19 20Gi RWO Delete Bound kubesphere-system/minio local 25d
pvc-cad97ef9-8fed-4540-a9d5-b91331a193f8 2Gi RWO Delete Bound kubesphere-system/openldap-pvc-openldap-0 local 25d
root@zhiyong-ksp1:/tmp# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
alluxio-journal-alluxio-master-0 Bound alluxio-journal-0 20Gi RWO standard 6h
alluxio-worker-domain-socket Pending standard 39m
root@zhiyong-ksp1:/tmp# kubectl describe pvc alluxio-worker-domain-socket
Name: alluxio-worker-domain-socket
Namespace: default
StorageClass: standard
Status: Pending
Volume:
Labels: app=alluxio
app.kubernetes.io/managed-by=Helm
chart=alluxio-0.6.48
heritage=Helm
release=alluxio
role=alluxio-worker
Annotations: meta.helm.sh/release-name: alluxio
meta.helm.sh/release-namespace: default
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: alluxio-worker-n8qh7
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 4m45s (x142 over 39m) persistentvolume-controller storageclass.storage.k8s.io "standard" not found
root@zhiyong-ksp1:/tmp#
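The root cause is that this cluster has no StorageClass actually named standard (KubeSphere ships local), so the PVC can neither be dynamically provisioned nor matched. Besides pre-binding a PV via claimRef as done below, another option (a sketch, not what was done here) is to create a placeholder no-provisioner StorageClass with that name so the controller can match the already-Available PV by class, size and access mode:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate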
Compare the successfully bound PV with the unbound one:
root@zhiyong-ksp1:/tmp# kubectl describe pv alluxio-journal-0
Name: alluxio-journal-0
Labels: type=local
Annotations: pv.kubernetes.io/bound-by-controller: yes
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Bound
Claim: default/alluxio-journal-alluxio-master-0
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 20Gi
Node Affinity: <none>
Message:
Source:
Type: HostPath (bare host directory volume)
Path: /tmp/alluxio-journal-0
HostPathType:
Events: <none>
root@zhiyong-ksp1:/tmp# kubectl describe pv alluxio-journal-1
Name: alluxio-journal-1
Labels: type=local
Annotations: <none>
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Available
Claim:
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 30Gi
Node Affinity: <none>
Message:
Source:
Type: HostPath (bare host directory volume)
Path: /tmp/alluxio-journal-1
HostPathType:
Events: <none>
root@zhiyong-ksp1:/tmp# kubectl edit pv/alluxio-journal-0 -o yaml -ndefault
apiVersion: v1
kind: PersistentVolume
metadata:
annotations:
pv.kubernetes.io/bound-by-controller: "yes"
creationTimestamp: "2022-09-11T16:31:05Z"
finalizers:
- kubernetes.io/pv-protection
labels:
type: local
name: alluxio-journal-0
resourceVersion: "228631"
uid: 46366084-9f20-434c-9504-9585707c92ad
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 20Gi
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: alluxio-journal-alluxio-master-0
namespace: default
resourceVersion: "228627"
uid: 3d78a2dd-88e6-4245-b357-c445a580301a
hostPath:
path: /tmp/alluxio-journal-0
type: ""
persistentVolumeReclaimPolicy: Retain
storageClassName: standard
volumeMode: Filesystem
status:
phase: Bound
root@zhiyong-ksp1:/tmp# kubectl edit pv/alluxio-journal-1 -o yaml -ndefault
apiVersion: v1
kind: PersistentVolume
metadata:
creationTimestamp: "2022-09-11T22:11:14Z"
finalizers:
- kubernetes.io/pv-protection
labels:
type: local
name: alluxio-journal-1
resourceVersion: "293899"
uid: f913bcfd-3916-4ef5-b99a-bb57db5c008f
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 30Gi
hostPath:
path: /tmp/alluxio-journal-1
type: ""
persistentVolumeReclaimPolicy: Retain
storageClassName: standard
volumeMode: Filesystem
status:
phase: Available
You can see that the bound PV has an extra section:
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: alluxio-journal-alluxio-master-0
namespace: default
resourceVersion: "228627"
uid: 3d78a2dd-88e6-4245-b357-c445a580301a
Naturally, the same kind of section can be added to the unbound PV, pointing it at the worker's claim:
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: alluxio-worker-domain-socket
namespace: default
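Equivalently, the claimRef can be injected without opening an editor; a minimal sketch with kubectl patch, pre-binding the Available alluxio-journal-1 PV to the pending claim:
kubectl patch pv alluxio-journal-1 -p '{"spec":{"claimRef":{"apiVersion":"v1","kind":"PersistentVolumeClaim","name":"alluxio-worker-domain-socket","namespace":"default"}}}'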
Then:
root@zhiyong-ksp1:/tmp# chmod 777 -R /tmp/alluxio-journal-1
root@zhiyong-ksp1:/tmp# ll
总用量 84
drwxrwxrwt 19 root root 4096 9月 12 07:08 ./
drwxr-xr-x 22 root root 4096 8月 17 01:42 ../
drwxrwxrwx 5 root root 4096 9月 12 05:58 alluxio-journal-0/
drwxrwxrwx 2 root root 4096 9月 12 07:08 alluxio-journal-1/
root@zhiyong-ksp1:/tmp# kubectl get pods -owide --all-namespaces | grep alluxio
default alluxio-master-0 2/2 Running 4 (71m ago) 75m 10.233.107.194 zhiyong-ksp1 <none> <none>
default alluxio-worker-n8qh7 2/2 Running 4 (87s ago) 75m 10.233.107.198 zhiyong-ksp1 <none> <none>
You can see the Alluxio Worker pod's two containers are now running as well.
Formatting the journal
Since formatting was not enabled at install time with:
helm install alluxio -f config.yaml --set journal.format.runFormat=true alluxio-charts/alluxio
and the release is already deployed, the only option left is to upgrade the existing Helm release with journal.format.runFormat=true to trigger journal formatting.
root@zhiyong-ksp1:/tmp# helm upgrade alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml --set journal.format.runFormat=true alluxio-charts/alluxio
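The upgrade recreates the master pod, which should trigger journal formatting on startup. A rough way to confirm is to grep the master log, e.g.:
kubectl logs alluxio-master-0 -c alluxio-master | grep -i format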
Configuring port forwarding
root@zhiyong-ksp1:/tmp# kubectl port-forward alluxio-master-0 19999:19999
Forwarding from 127.0.0.1:19999 -> 19999
Forwarding from [::1]:19999 -> 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Handling connection for 19999
Now, entering this address in Firefox inside the Ubuntu VM:
127.0.0.1:19999
it is reachable:
The familiar Alluxio web UI shows up.
But this clearly isn't good enough: nobody wants to be limited to Firefox inside the Ubuntu VM, so the port still has to be exposed for the external host machine to reach.
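As an aside, kubectl port-forward can also listen on all interfaces, which already makes the UI reachable from outside the VM without creating a Service; a sketch:
kubectl port-forward --address 0.0.0.0 alluxio-master-0 19999:19999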
Exposing the ports
The quick-and-dirty way: use a NodePort Service.
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# pwd
/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-web-ui-service.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl apply -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxio-web-ui-service.yaml
The Service "alluxio-web-ui-service" is invalid: spec.ports[0].nodePort: Invalid value: 19999: provided port is not in the valid range. The range of valid ports is 30000-32767
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
zhiyong-ksp1 Ready control-plane,worker 35d v1.24.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=zhiyong-ksp1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/worker=,node.kubernetes.io/exclude-from-external-load-balancers=
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
alluxio-master-0 2/2 Running 0 4h10m app=alluxio,chart=alluxio-0.6.48,controller-revision-hash=alluxio-master-7c4cd554c8,heritage=Helm,name=alluxio-master,release=alluxio,role=alluxio-master,statefulset.kubernetes.io/pod-name=alluxio-master-0
alluxio-worker-n8qh7 2/2 Running 4 (4h17m ago) 5h31m app=alluxio,chart=alluxio-0.6.48,controller-revision-hash=c6dcc876c,heritage=Helm,pod-template-generation=1,release=alluxio,role=alluxio-worker
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
Its content:
apiVersion: v1
kind: Service
metadata:
name: alluxio-web-ui-service
spec:
type: NodePort
ports:
- port: 19999
targetPort: 19999
nodePort: 19999
selector:
app: alluxio
The selector is a must here!!! Without it the Endpoints stay at <none>, the Service never attaches to the pods, and every request ends in connection refused:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# curl http://192.168.88.20:32634
curl: (7) Failed to connect to 192.168.88.20 port 32634: 拒绝连接
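Whether a Service is actually attached to any pods can be checked by listing its endpoints; an empty ENDPOINTS column means the selector matched nothing:
kubectl get endpoints alluxio-web-ui-service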
Clearly, K8S by default refuses nodePorts outside the 30000-32767 range...
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep service-cluster-ip-range
- --service-cluster-ip-range=10.233.0.0/18
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim /etc/kubernetes/manifests/kube-apiserver.yaml
On the line right after the configuration above, add:
- --service-cluster-ip-range=10.233.0.0/18
- --service-node-port-range=1-65535
After adding the new flag, restart the kubelet:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# systemctl daemon-reload
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# systemctl restart kubelet
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl apply -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxio-web-ui-service.yaml
service/alluxio-web-ui-service created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alluxio-master-0 ClusterIP None <none> 19998/TCP,19999/TCP,20001/TCP,20002/TCP,19200/TCP,20003/TCP 151m
alluxio-web-ui-service NodePort 10.233.36.0 <none> 19999:19999/TCP 30s
kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 34d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
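Since kube-apiserver runs as a static pod, restarting the kubelet is enough for it to pick up the manifest change. A quick check that the new flag is live, a sketch:
kubectl -n kube-system get pod kube-apiserver-zhiyong-ksp1 -o yaml | grep service-node-port-range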
Check the Service created by Alluxio by default:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl edit service/alluxio-master-0 -o yaml -ndefault
The original YAML:
apiVersion: v1
kind: Service
metadata:
annotations:
meta.helm.sh/release-name: alluxio
meta.helm.sh/release-namespace: default
creationTimestamp: "2022-09-11T21:53:58Z"
labels:
app: alluxio
app.kubernetes.io/managed-by: Helm
chart: alluxio-0.6.48
heritage: Helm
release: alluxio
role: alluxio-master
name: alluxio-master-0
namespace: default
resourceVersion: "290588"
uid: aa0828ae-b8f4-4533-9b7b-cae8b3094ae7
spec:
clusterIP: None
clusterIPs:
- None
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- name: rpc
port: 19998
protocol: TCP
targetPort: 19998
- name: web
port: 19999
protocol: TCP
targetPort: 19999
- name: job-rpc
port: 20001
protocol: TCP
targetPort: 20001
- name: job-web
port: 20002
protocol: TCP
targetPort: 20002
- name: embedded
port: 19200
protocol: TCP
targetPort: 19200
- name: job-embedded
port: 20003
protocol: TCP
targetPort: 20003
selector:
app: alluxio
release: alluxio
role: alluxio-master
statefulset.kubernetes.io/pod-name: alluxio-master-0
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
Following that as a template, all of the ports can be exposed:
apiVersion: v1
kind: Service
metadata:
name: alluxio-master-service
spec:
type: NodePort
ports:
- name: rpc
port: 19998
protocol: TCP
targetPort: 19998
nodePort: 19998
- name: web
port: 19999
protocol: TCP
targetPort: 19999
nodePort: 19999
- name: job-rpc
port: 20001
protocol: TCP
targetPort: 20001
nodePort: 20001
- name: job-web
port: 20002
protocol: TCP
targetPort: 20002
nodePort: 20002
- name: embedded
port: 19200
protocol: TCP
targetPort: 19200
nodePort: 19200
- name: job-embedded
port: 20003
protocol: TCP
targetPort: 20003
nodePort: 20003
selector:
app: alluxio
Run it from the command line:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# vim alluxio-master-service.yaml
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl apply -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxio-master-service.yaml
service/alluxio-master-service created
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alluxio-master-0 ClusterIP None <none> 19998/TCP,19999/TCP,20001/TCP,20002/TCP,19200/TCP,20003/TCP 5h41m
alluxio-master-service NodePort 10.233.59.39 <none> 19998:19998/TCP,19999:19999/TCP,20001:20001/TCP,20002:20002/TCP,19200:19200/TCP,20003:20003/TCP 3s
kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 35d
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# kubectl describe service alluxio-master-service
Name: alluxio-master-service
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=alluxio
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.233.59.39
IPs: 10.233.59.39
Port: rpc 19998/TCP
TargetPort: 19998/TCP
NodePort: rpc 19998/TCP
Endpoints: 10.233.107.198:19998,10.233.107.199:19998
Port: web 19999/TCP
TargetPort: 19999/TCP
NodePort: web 19999/TCP
Endpoints: 10.233.107.198:19999,10.233.107.199:19999
Port: job-rpc 20001/TCP
TargetPort: 20001/TCP
NodePort: job-rpc 20001/TCP
Endpoints: 10.233.107.198:20001,10.233.107.199:20001
Port: job-web 20002/TCP
TargetPort: 20002/TCP
NodePort: job-web 20002/TCP
Endpoints: 10.233.107.198:20002,10.233.107.199:20002
Port: embedded 19200/TCP
TargetPort: 19200/TCP
NodePort: embedded 19200/TCP
Endpoints: 10.233.107.198:19200,10.233.107.199:19200
Port: job-embedded 20003/TCP
TargetPort: 20003/TCP
NodePort: job-embedded 20003/TCP
Endpoints: 10.233.107.198:20003,10.233.107.199:20003
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
You can see every port now has Endpoints attached behind the Service's virtual IP, so from the host machine enter:
192.168.88.20:19999
and the Alluxio web UI opens:
This shows that the Master's other ports have all been exposed successfully as well.
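One caveat: the selector app: alluxio also matches the worker pod (both pods carry that label), which is why every port above lists two endpoint IPs, one of them being the worker, which does not serve these master ports. A tighter selector, sketched below, keeps only the master behind the NodePorts:
  selector:
    app: alluxio
    role: alluxio-master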
Starting the FUSE daemon
Needed for production environments:
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911# helm upgrade alluxio -f /home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911/alluxioconfig.yaml --set fuse.enabled=true --set fuse.clientEnabled=true alluxio-charts/alluxio
Release "alluxio" has been upgraded. Happy Helming!
NAME: alluxio
LAST DEPLOYED: Mon Sep 12 11:41:25 2022
NAMESPACE: default
STATUS: deployed
REVISION: 3
TEST SUITE: None
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall/alluxio2.8.1/day20220911#
For a personal playground it hardly matters... the benefit of this step is that if the process inside the pod dies, the daemon can pull it back up automatically.
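After this upgrade a FUSE pod should show up next to the master and worker; a rough check that filters by name rather than assuming any chart labels:
kubectl get pods -o wide | grep -i fuse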
Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/126815426
With that, Alluxio 2.8.1 on K8S 1.24 is finally deployed!!! Ill-starred and full of twists along the way...