If you're new to Flink, the CLI parameters and commands can be hard to make sense of; they were for me too, so I'm recording them here as plainly as I can.
Once Flink is installed, if you can't remember a parameter, just print the built-in help; most options are understandable straight from their descriptions:

```bash
flink run --help
```
Syntax: run [OPTIONS] <jar-file> <arguments>

| "run" action options | Description | Notes |
| --- | --- | --- |
| -c,--class <classname> | Class with the program entry point ("main()" method). Only needed if the JAR file does not specify the class in its manifest. | Specifies which main class to run when the JAR's manifest does not name an entry point, eg: -c com.test.StreamingJob (a combined example follows this table) |
| -C,--classpath <url> | Adds a URL to each user code classloader on all nodes in the cluster. The paths must specify a protocol (e.g. file://) and be accessible on all nodes (e.g. by means of an NFS share). You can use this option multiple times for specifying more than one URL. The protocol must be supported by the {@link java.net.URLClassLoader}. | Adds a URL that every node in the cluster can reach to the user classpath; the URL must carry a protocol such as file://, eg: flink run -C file:///usr/local/flink/examples/streaming/SocketWindowWordCount.jar |
| -d,--detached | If present, runs the job in detached mode | Submits the job in detached mode: the client disconnects after submission and the job keeps running on the cluster, eg: flink run -d |
| -n,--allowNonRestoredState | Allow to skip savepoint state that cannot be restored. You need to allow this if you removed an operator from your program that was part of the program when the savepoint was triggered. | Skips savepoint state that can no longer be restored (e.g. an operator was removed after the savepoint was taken), eg: flink run -n |
| -p,--parallelism <parallelism> | The parallelism with which to run the program. Optional flag to override the default value specified in the configuration. | Sets the job parallelism (default 1); increase it if the job cannot keep up with upstream messages, eg: flink run -p 5 |
| -py,--python <pythonFile> | Python script with the program entry point. The dependent resources can be configured with the `--pyFiles` option. | For Python jobs: specifies the script path, eg: flink run -py /usr/local/python/test.py (a PyFlink sketch follows this table) |
| -pyarch,--pyArchives <arg> | Add python archive files for job. The archive files will be extracted to the working directory of python UDF worker. Currently only zip-format is supported. For each archive file, a target directory can be specified; if a target directory name is given, the archive file will be extracted to a directory with that name, otherwise it will be extracted to a directory with the same name as the archive file. The files uploaded via this option are accessible via relative path. '#' could be used as the separator of the archive file path and the target directory name. Comma (',') could be used as the separator to specify multiple archive files. This option can be used to upload the virtual environment and the data files used in Python UDF (e.g.: --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable py37.zip/py37/bin/python). The data files could be accessed in Python UDF, e.g.: f = open('data/data.txt', 'r'). | Specifies archive files for use by Python UDFs; currently only zip format is supported, e.g.: --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable py37.zip/py37/bin/python |
| -pyexec,--pyExecutable <arg> | Specify the path of the python interpreter used to execute the python UDF worker (e.g.: --pyExecutable /usr/local/bin/python3). The python UDF worker depends on Python 3.5+, Apache Beam (version == 2.15.0), Pip (version >= 7.1.0) and SetupTools (version >= 37.0.0). Please ensure that the specified environment meets the above requirements. | eg: flink run -m localhost:8081 -pyarch venv.zip -pyexec venv.zip/venv/bin/python3 -py test_split_label.py |
| -pyfs,--pyFiles <pythonFiles> | Attach custom python files for job. These files will be added to the PYTHONPATH of both the local client and the remote python UDF worker. The standard python resource file suffixes such as .py/.egg/.zip or directory are all supported. Comma (',') could be used as the separator to specify multiple files (e.g.: --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip). | eg: --pyFiles file:///tmp/myresource.zip |
| -pym,--pyModule <pythonModule> | Python module with the program entry point. This option must be used in conjunction with `--pyFiles`. | Must be used together with --pyFiles |
| -pyreq,--pyRequirements <arg> | Specify a requirements.txt file which defines the third-party dependencies. These dependencies will be installed and added to the PYTHONPATH of the python UDF worker. A directory which contains the installation packages of these dependencies could be specified optionally. Use '#' as the separator if the optional parameter exists (e.g.: --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir). | eg: --pyRequirements file:///tmp/requirements.txt |
| -s,--fromSavepoint <savepointPath> | Path to a savepoint to restore the job from (for example hdfs:///flink/savepoint-1537). | Restores the job from the state stored in a savepoint, eg: flink run -s hdfs:///flink/savepoint-1537 |
| -sae,--shutdownOnAttachedExit | If the job is submitted in attached mode, perform a best-effort cluster shutdown when the CLI is terminated abruptly, e.g., in response to a user interrupt, such as typing Ctrl + C. | Best-effort cluster shutdown when an attached CLI is interrupted (e.g. Ctrl + C) |
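Putting the common run options together, here is a minimal submission sketch; the JAR path, the job arguments, and the class name com.test.StreamingJob are placeholders carried over from the examples above, not real artifacts:

```bash
# Submit a JAR in detached mode with an explicit entry class and parallelism,
# restoring from a savepoint and skipping state that no longer matches (-n).
flink run -d \
  -c com.test.StreamingJob \
  -p 5 \
  -s hdfs:///flink/savepoint-1537 \
  -n \
  /path/to/your-job.jar
```

Likewise, a hedged PyFlink sketch combining the -py* options; venv.zip, test_split_label.py, and the resource/requirements files come from the rows above and are assumed to exist:

```bash
# Run a Python job with a shipped virtualenv, an extra resource file,
# and third-party requirements installed for the UDF workers.
flink run -m localhost:8081 \
  -pyarch venv.zip \
  -pyexec venv.zip/venv/bin/python3 \
  -pyfs file:///tmp/myresource.zip \
  -pyreq file:///tmp/requirements.txt \
  -py test_split_label.py
```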
| Options for executor mode | Description | Notes |
| --- | --- | --- |
| -D <property=value> | Generic configuration options for execution/deployment and for the configured executor. The available options can be found at https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html | Dynamic properties (see the sketch after this table) |
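As a quick illustration of -D, a minimal sketch that overrides two configuration keys at submission time; parallelism.default and taskmanager.numberOfTaskSlots are standard keys from the configuration page linked above, while the JAR path is a placeholder. Note that depending on your Flink version, -D may only take effect once an executor/target has been selected:

```bash
# Override configuration values for this submission only,
# instead of editing flink-conf.yaml.
flink run \
  -D parallelism.default=4 \
  -D taskmanager.numberOfTaskSlots=2 \
  /path/to/your-job.jar
```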
| Options for yarn-cluster mode | Description | Notes |
| --- | --- | --- |
| -d,--detached | If present, runs the job in detached mode | Same as above: the client disconnects after submission and the job keeps running, eg: flink run -d |
| -m,--jobmanager <arg> | Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration. | eg: flink run -m yarn-cluster (see the YARN sketch after this table) |
| -yat,--yarnapplicationType <arg> | Set a custom application type for the application on YARN | Sets the application type shown in YARN |
| -yD <property=value> | Use value for given property | Sets the given property to the given value |
| -yd,--yarndetached | If present, runs the job in detached mode (deprecated; use non-YARN specific option instead) | Deprecated; use -d instead |
| -yh,--yarnhelp | Help for the Yarn session CLI | Prints the YARN session CLI help, eg: flink run -yh |
| -yid,--yarnapplicationId <arg> | Attach to running YARN session | Attaches to a running YARN session by application id (yarn-session mode) |
| -yj,--yarnjar <arg> | Path to Flink jar file | Path to the Flink jar file, eg: flink run -yj /xxx/WordCount.jar |
| -yjm,--yarnjobManagerMemory <arg> | Memory for JobManager Container with optional unit (default: MB) | Memory for the JobManager container, default unit MB, eg: flink run -yjm 2048 |
| -ynl,--yarnnodeLabel <arg> | Specify YARN node label for the YARN application | Assigns a YARN node label to the application |
| -ynm,--yarnname <arg> | Set a custom name for the application on YARN | Sets the application name, eg: flink run -ynm WordCount |
| -yq,--yarnquery | Display available YARN resources (memory, cores) | Queries the YARN resources (memory, cores) currently available |
| -yqu,--yarnqueue <arg> | Specify YARN queue | Specifies the YARN queue to submit to |
| -ys,--yarnslots <arg> | Number of slots per TaskManager | Number of slots per TaskManager |
| -yt,--yarnship <arg> | Ship files in the specified directory (t for transfer) | Ships the files in the given directory to the cluster |
| -ytm,--yarntaskManagerMemory <arg> | Memory per TaskManager Container with optional unit (default: MB) | Memory per TaskManager container, default unit MB |
| -yz,--yarnzookeeperNamespace <arg> | Namespace to create the Zookeeper sub-paths for high availability mode | Namespace for the ZooKeeper sub-paths used in HA mode |
| -z,--zookeeperNamespace <arg> | Namespace to create the Zookeeper sub-paths for high availability mode | Namespace for the ZooKeeper sub-paths used in HA mode |
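Pulling the common -y* options together, a hedged per-job YARN submission sketch; the queue name, application name, and JAR path reuse the placeholder values from the rows above:

```bash
# Submit straight to YARN (per-job) in detached mode, sizing the
# containers and naming the application so it is easy to find in the YARN UI.
flink run -m yarn-cluster -d \
  -yjm 1024 -ytm 2048 -ys 2 \
  -ynm WordCount -yqu default \
  /xxx/WordCount.jar
```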
| Options for default mode | Description | Notes |
| --- | --- | --- |
| -m,--jobmanager <arg> | Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration. | eg: flink run -m localhost:6123, eg: flink run -m yarn-cluster (see the sketch after this table) |
| -z,--zookeeperNamespace <arg> | Namespace to create the Zookeeper sub-paths for high availability mode | Namespace for the ZooKeeper sub-paths used in HA mode |
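Finally, a minimal sketch of a submission against a standalone cluster in default mode; localhost:6123 comes from the row above, while the ZooKeeper namespace (only meaningful on an HA-enabled cluster) and the JAR path are placeholders:

```bash
# Submit to a specific standalone JobManager, pinning the
# ZooKeeper namespace used for high availability.
flink run -m localhost:6123 \
  -z my-flink-ha \
  /path/to/your-job.jar
```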