I recently started using Flink for data processing.
When I try to run a Table API job that counts words,
I cannot import OldCsv and FileSystem from pyflink.table.descriptors.
I installed apache-flink with: pip install apache-flink
[root@master flink]# pip3 list | grep flink
apache-flink 1.17.0
apache-flink-libraries 1.17.0
flink 1.0
pyflink 1.0
Libraries imported:
from pyflink.table import DataTypes, TableEnvironment, EnvironmentSettings
from pyflink.table.descriptors import Schema, OldCsv , FileSystem
from pyflink.table.expressions import lit
Maybe it works in the newest version and you have to update the module. Or maybe it was available in an old version and was removed in a newer one; your links show a message saying it is old documentation. Or maybe you need to install another module to get this function. –
furas
Nov 23, 2021 at 7:21
The newest documentation is for version 1.15, and I can't find OldCsv in that version,
but I could find it in the documentation for 1.13. –
furas
Nov 23, 2021 at 7:26
pip3 install apache-flink==1.13
[root@master ~]# pip3 list | grep flink
WARNING: Ignoring invalid distribution -andas (/usr/local/python38/lib/python3.8/site-packages)
apache-flink 1.13.0
apache-flink-libraries 1.13.0
flink 1.0
pyflink 1.0
[root@master ~]#
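The same version check can be done from Python before attempting the import. This is only a sketch: the helper names `has_legacy_descriptors` and `installed_flink_version` are made up for illustration, and the cutoff assumes that `connect()`, `OldCsv` and `FileSystem` were removed in Flink 1.14.

```python
from importlib.metadata import PackageNotFoundError, version


def has_legacy_descriptors(flink_version):
    """Return True if this apache-flink version predates 1.14.

    Assumption: the legacy descriptor API (OldCsv, FileSystem, connect())
    is only present in releases before 1.14.
    """
    major, minor = (int(part) for part in flink_version.split(".")[:2])
    return (major, minor) < (1, 14)


def installed_flink_version():
    """Return the installed apache-flink version string, or None if absent."""
    try:
        return version("apache-flink")
    except PackageNotFoundError:
        return None
```

For the two installs shown in this thread, `has_legacy_descriptors("1.17.0")` is False and `has_legacy_descriptors("1.13.0")` is True.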
[root@master flink]# cat t3.py
from pyflink.table import DataTypes, TableEnvironment, EnvironmentSettings
from pyflink.table.descriptors import Schema, OldCsv, FileSystem
from pyflink.table.expressions import lit
settings = EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build()
t_env = TableEnvironment.create(settings)
# write all the data to one file
t_env.get_config().get_configuration().set_string("parallelism.default", "1")
t_env.connect(FileSystem().path('/tmp/input')) \
    .with_format(OldCsv().field('word', DataTypes.STRING())) \
    .with_schema(Schema().field('word', DataTypes.STRING())) \
    .create_temporary_table('mySource')
t_env.connect(FileSystem().path('/tmp/output')) \
    .with_format(OldCsv().field_delimiter('\t') \
        .field('word', DataTypes.STRING()) \
        .field('count', DataTypes.BIGINT())) \
    .with_schema(Schema() \
        .field('word', DataTypes.STRING()) \
        .field('count', DataTypes.BIGINT())) \
    .create_temporary_table('mySink')
tab = t_env.from_path('mySource')
tab.group_by(tab.word).select(tab.word, lit(1).count).execute_insert('mySink').wait()
[root@master flink]# python3 t3.py
[root@master flink]# cat /tmp/output
flink 2
pyflink 1
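If downgrading is not an option, the same job can probably be written for newer releases such as 1.17 by declaring the tables with SQL DDL instead of `connect()` and the descriptor classes. The following is a minimal sketch and has not been run against every release. The helper names `filesystem_csv_ddl` and `run_word_count` are hypothetical, the table definitions mirror the ones in t3.py, and the output uses the csv format's default comma delimiter rather than a tab.

```python
def filesystem_csv_ddl(table_name, path, columns):
    """Build a CREATE TEMPORARY TABLE statement for a CSV filesystem table.

    `columns` is a list of (name, sql_type) pairs. Column names are
    backtick-quoted because some of them (for example `count`) are
    reserved words in Flink SQL.
    """
    cols = ", ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TEMPORARY TABLE {table_name} ({cols}) "
        f"WITH ('connector' = 'filesystem', 'path' = '{path}', 'format' = 'csv')"
    )


def run_word_count(source_path="/tmp/input", sink_path="/tmp/output"):
    """Count words from source_path into sink_path using SQL DDL tables."""
    # Imported lazily so the DDL helper can be inspected without PyFlink.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    settings = EnvironmentSettings.new_instance().in_batch_mode().build()
    t_env = TableEnvironment.create(settings)

    # Declare source and sink with DDL instead of connect()/OldCsv/FileSystem.
    t_env.execute_sql(
        filesystem_csv_ddl("mySource", source_path, [("word", "STRING")])
    )
    t_env.execute_sql(
        filesystem_csv_ddl(
            "mySink", sink_path, [("word", "STRING"), ("count", "BIGINT")]
        )
    )

    # Same aggregation as tab.group_by(tab.word).select(tab.word, lit(1).count).
    t_env.execute_sql(
        "INSERT INTO mySink SELECT word, COUNT(*) FROM mySource GROUP BY word"
    ).wait()
```

Calling `run_word_count()` submits the job. With the filesystem connector, the sink path is expected to be a directory of part files rather than a single file, so the result may need to be read with something like `cat /tmp/output/*`.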