Table of Contents
- Preface
- Problem 1: TypeError: Could not found the Java class 'org.apache.flink.connector.pulsar.source.PulsarSource.builder'. The Java dependencies could be specified via command line argument '--jarfile'
- Solution
- Problem 2: py4j.protocol.Py4JError: org.apache.flink.connector.pulsar.source.PulsarSource.builder does not exist in the JVM
- Solution
- Problem 3: Caused by: java.io.IOException: Cannot run program "python": error=2, No such file or directory
- Solution
- Problem 5: Caused by: java.io.IOException: unable to open JDBC writer
- Solution
- Appendix: Dockerfile
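Preface
After finishing the project from my earlier article https://blog.csdn.net/zzp28218/article/details/140770338, I needed to package the crawler project as a Docker image. I ran into several problems along the way and record them here. My Dockerfile is attached at the end of the article.
- The Docker image needs Flink 1.18, Java 11, and Python 3.10.
- The Dockerfile needs a base image. I chose flink:1.18-java11, available at https://hub.docker.com/_/flink/tags?page=3&page_size=&name=&ordering=
- The flink:1.18-java11 base image already ships with Java 11, so the next step is to install Python 3.10 (the version I chose).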
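Before going further, you can confirm inside the container that the interpreter and Java match what the image is supposed to provide. The script below is a minimal sketch of such a check. The function names and the exact-match rule for Python 3.10 are my own assumptions. The lookup of a plain python command is included because the Problem 3 error shows Flink trying to start a program with that exact name.

```python
import shutil
import subprocess
import sys

# Interpreter version this image is meant to provide (assumption: exact major.minor match).
REQUIRED_PYTHON = (3, 10)


def python_ok(required=REQUIRED_PYTHON):
    """Return True if the running interpreter is exactly the required major.minor version."""
    return tuple(sys.version_info[:2]) == tuple(required)


def java_version_line():
    """Return the first line of `java -version`, or None when no java binary is on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    lines = result.stderr.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print("interpreter:", sys.executable, sys.version.split()[0],
          "OK" if python_ok() else "version mismatch")
    # The Problem 3 error shows Flink trying to run a program literally named `python`.
    print("python on PATH:", shutil.which("python"))
    print("java:", java_version_line())
```

Run it with the same interpreter your job uses, for example inside docker run. If "python on PATH" prints None, the container has no plain python command, even when a python3 or python3.10 binary exists.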
The problems I ran into are listed below.
Problem 1: TypeError: Could not found the Java class 'org.apache.flink.connector.pulsar.source.PulsarSource.builder'. The Java dependencies could be specified via command line argument '--jarfile' or the config option 'pipeline.jars'
    pulsar_source = PulsarSource.builder() \
  File "/usr/local/lib/python3.9/dist-packages/pyflink/datastream/connectors/pulsar.py", line 305, in builder
    return PulsarSourceBuilder()
  File "/usr/local/lib/python3.9/dist-packages/pyflink/datastream/connectors/pulsar.py", line 357, in __init__
    self._j_pulsar_source_builder = JPulsarSource.builder()
  File "/usr/local/lib/python3.9/dist-packages/pyflink/util/exceptions.py", line 185, in wrapped_call
    raise TypeError(
TypeError: Could not found the Java class 'org.apache.flink.connector.pulsar.source.PulsarSource.builder'. The Java dependencies could be specified via command line argument '--jarfile' or the config option 'pipeline.jars'
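Solution
The jar paths were added in the wrong format. The value given to env.add_jars and to pipeline.jars should be a list of absolute file:/// URLs separated by semicolons, so build that string carefully.
Code: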
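import os

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

# Create the DataStream and Table environments used below
env = StreamExecutionEnvironment.get_execution_environment()
table_env = StreamTableEnvironment.create(env)

current_dir = os.path.abspath(os.path.dirname(__file__))  # absolute path of the directory containing this script
jar_path = os.path.join(current_dir, "libs")  # the connector jars live in ./libs
jars = []
for file in os.listdir(jar_path):
    if file.endswith('.jar'):
        file_path = os.path.join(jar_path, file)
        jars.append(file_path.replace('\\', '/'))  # turn Windows backslashes into forward slashes
# pipeline.jars expects URLs separated by ';'
str_jars = ';'.join(['file:///' + jar for jar in jars])
print(f"str_jars:{str_jars}")
table_jars = ';'.join(['file:///' + jar for jar in jars if 'pulsar' not in jar])
print(f"table_jars:{table_jars}")
# env.add_jars(str_jars)
# table_env.get_config().get_configuration().set_string("pipeline.jars", str_jars)
env.add_jars(table_jars)  # on Windows
table_env.get_config().get_configuration().set_string("pipeline.jars", table_jars)  # on Windows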
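You can also build and check the same semicolon-separated string with only the standard library. The sketch below is an alternative, not the method used above. It relies on pathlib's Path.as_uri(), which turns a resolved absolute path into a file:/// URL on both Windows and Linux, so no manual slash handling is needed. The helper names build_jar_urls, to_pipeline_jars and check_pipeline_jars are hypothetical. The checker flags entries that are not absolute file:/// URLs or that point to missing files, such as the relative file:mycode/libs/... entries printed under Problem 2.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def build_jar_urls(lib_dir, exclude=()):
    """Return absolute file: URLs for every .jar in lib_dir, sorted for a stable order.

    Jars whose file name contains any substring in `exclude` are skipped.
    """
    lib = Path(lib_dir).resolve()  # absolute path, so the URL can never be relative
    urls = []
    for jar in sorted(lib.glob("*.jar")):
        if any(token in jar.name for token in exclude):
            continue
        urls.append(jar.as_uri())  # file:///C:/... on Windows, file:///opt/... on Linux
    return urls


def to_pipeline_jars(urls):
    """Join URLs with ';', the separator pipeline.jars uses."""
    return ";".join(urls)


def check_pipeline_jars(value):
    """Return the entries that are not absolute file:/// URLs pointing to existing files."""
    bad = []
    for entry in filter(None, value.split(";")):
        if not entry.startswith("file:///"):
            bad.append(entry)  # e.g. 'file:mycode/libs/x.jar' is a relative URL
            continue
        local_path = Path(url2pathname(urlparse(entry).path))
        if not local_path.is_file():
            bad.append(entry)  # well-formed URL, but the jar is not there
    return bad


if __name__ == "__main__":
    libs = Path(__file__).resolve().parent / "libs"  # same ./libs layout as the code above
    pipeline_jars = to_pipeline_jars(build_jar_urls(libs))
    print("pipeline.jars:", pipeline_jars)
    print("invalid entries:", check_pipeline_jars(pipeline_jars))
```

Pass the resulting string to env.add_jars(...) and to the pipeline.jars option, as in the code above. Running check_pipeline_jars inside the container before submitting the job shows at once whether any jar path is wrong. Note that as_uri() percent-encodes characters such as spaces. I have assumed that is acceptable for your paths.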
Problem 2: py4j.protocol.Py4JError: org.apache.flink.connector.pulsar.source.PulsarSource.builder does not exist in the JVM
file:mycode/libs/flink-connector-jdbc-3.1.2-1.18.jar;file:mycode/libs/mysql-connector-java-5.1.9.jar;file:mycode/libs/flink-connector-pulsar-4.1.0-1.18.jar
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py", line 1224, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
  File "/usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.9/dist-packages/py4j/java_gateway.