Problems encountered when packaging a PyFlink crawler project with Docker

Preface

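After implementing the project described in my article https://blog.csdn.net/zzp28218/article/details/140770338, I needed to package the crawler project as a Docker image. I ran into several problems along the way and record them here. My Dockerfile is attached at the end of the article.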

  1. The Docker image needs Flink 1.18, Java 11, and Python 3.10.
  2. The Dockerfile needs a base image. I chose flink:1.18-java11, link: https://hub.docker.com/_/flink/tags?page=3&page_size=&name=&ordering=
  3. Because the flink:1.18-java11 base image already ships with Java 11, the next step is to install Python (I chose version 3.10).
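The three requirements above can be checked from inside the container before the job starts. The following is a minimal sketch, not part of the original project: the function name `check_runtime` is hypothetical, and it only verifies the interpreter version and that a `java` executable is on the PATH. It may be useful here because the tracebacks later in this post show `python3.9` paths even though Python 3.10 was the intended version.

```python
import shutil
import sys


def check_runtime(required_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(required_python):
        problems.append(
            f"Python {found[0]}.{found[1]} found, "
            f"expected {required_python[0]}.{required_python[1]}"
        )
    # The Flink base image is expected to provide Java; this only checks the PATH.
    if shutil.which("java") is None:
        problems.append("java executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_runtime():
        print("WARNING:", problem)
```

Running this as the first step of the container's entry script would surface a version mismatch before PyFlink is imported.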

The problems I encountered are listed below.

Problem 1: TypeError: Could not found the Java class 'org.apache.flink.connector.pulsar.source.PulsarSource.builder'. The Java dependencies could be specified via command line argument '--jarfile' or the config option 'pipeline.jars'

What's next:
    View a summary of image vulnerabilities and recommendations → docker scout quickview
    pulsar_source = PulsarSource.builder() \
  File "/usr/local/lib/python3.9/dist-packages/pyflink/datastream/connectors/pulsar.py", line 305, in builder
    return PulsarSourceBuilder()
  File "/usr/local/lib/python3.9/dist-packages/pyflink/datastream/connectors/pulsar.py", line 357, in __init__
    self._j_pulsar_source_builder = JPulsarSource.builder()
  File "/usr/local/lib/python3.9/dist-packages/pyflink/util/exceptions.py", line 185, in wrapped_call
    raise TypeError(
TypeError: Could not found the Java class 'org.apache.flink.connector.pulsar.source.PulsarSource.builder'. The Java dependencies could be specified via command line argument '--jarfile' 
or the config option 'pipeline.jars'

Solution

This error occurs when the jar paths are added in the wrong format. Make sure each path is well-formed; this is easy to get wrong.
Code:

    import os

    # env and table_env are the stream and table environments created elsewhere in the job
    current_dir = os.path.abspath(os.path.dirname(__file__))  # absolute path of the directory containing this script
    jar_path = os.path.join(current_dir, "libs")
    jars = []
    for file in os.listdir(jar_path):
        if file.endswith('.jar'):
            file_path = os.path.join(jar_path, file)
            # use forward slashes so the path is valid inside a file URL
            jars.append(file_path.replace('\\', '/'))

    # all jars, as a ';'-separated list of file URLs
    str_jars = ';'.join(['file:///' + jar for jar in jars])
    print(f"str_jars:{str_jars}")
    # the same list without the Pulsar connector jar
    table_jars = ';'.join(['file:///' + jar for jar in jars if 'pulsar' not in jar])
    print(f"table_jars:{table_jars}")
    # env.add_jars(str_jars)
    # table_env.get_config().get_configuration().set_string("pipeline.jars", str_jars)
    env.add_jars(table_jars)        # on Windows
    table_env.get_config().get_configuration().set_string("pipeline.jars", table_jars)  # on Windows
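As a hedged alternative to concatenating 'file:///' by hand, the standard library's `pathlib.Path.as_uri()` produces a well-formed file URL on both Windows and Linux. The sketch below is not the author's code: the helper names `collect_jar_urls` and `to_pipeline_jars` are hypothetical, and the `libs` directory layout is assumed to match the code above. The PyFlink calls are left as comments because they need `env` and `table_env` from the surrounding job.

```python
from pathlib import Path


def collect_jar_urls(lib_dir):
    """Return sorted file:// URLs for every .jar directly inside lib_dir."""
    lib_path = Path(lib_dir).resolve()  # as_uri() requires an absolute path
    return sorted(p.as_uri() for p in lib_path.glob("*.jar") if p.is_file())


def to_pipeline_jars(jar_urls):
    """Join jar URLs with ';', the separator used in the code above."""
    return ";".join(jar_urls)


# Hypothetical usage inside the job script (env / table_env come from the job):
#   jar_urls = collect_jar_urls(Path(__file__).parent / "libs")
#   env.add_jars(*jar_urls)  # add_jars accepts one URL per argument
#   table_env.get_config().get_configuration().set_string(
#       "pipeline.jars", to_pipeline_jars(jar_urls))
```

Because the URL is derived from a resolved absolute path, the same script can run unchanged on a Windows development machine and inside the Linux container.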

Problem 2: py4j.protocol.Py4JError: org.apache.flink.connector.pulsar.source.PulsarSource.builder does not exist in the JVM

file:mycode/libs/flink-connector-jdbc-3.1.2-1.18.jar;file:mycode/libs/mysql-connector-java-5.1.9.jar;file:mycode/libs/flink-connector-pulsar-4.1.0-1.18.jar
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py", line 1224, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

  File "/usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.9/dist-packages/py4j/java_gateway.
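The log above is cut off before any fix is shown, so what follows is only a diagnostic sketch and not the author's solution. One thing the first log line does show is that the jar entries are relative URLs (`file:mycode/libs/...`), unlike the 'file:///' plus absolute path form built in the Problem 1 code. Assuming such entries may fail to resolve from the JVM's working directory, a small check can flag them before the job starts. The function name `find_bad_jar_entries` is hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def find_bad_jar_entries(pipeline_jars):
    """Return (entry, reason) pairs for suspicious entries in a ';'-separated jar list."""
    problems = []
    for entry in (e.strip() for e in pipeline_jars.split(";")):
        if not entry:
            continue
        parsed = urlparse(entry)
        if parsed.scheme != "file":
            problems.append((entry, "not a file: URL"))
        elif not parsed.path.startswith("/"):
            problems.append((entry, "relative path in file URL"))
        elif not Path(url2pathname(parsed.path)).is_file():
            problems.append((entry, "file does not exist"))
    return problems


if __name__ == "__main__":
    # Entry copied from the log above; it is reported as a relative path.
    sample = "file:mycode/libs/flink-connector-pulsar-4.1.0-1.18.jar"
    for entry, reason in find_bad_jar_entries(sample):
        print(f"{entry}: {reason}")
```

An empty result does not prove the jars are loaded by the JVM; it only rules out malformed or missing paths as the cause.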