Developing with the Apache Flink Python client

1 Installing the Flink development environment

1.1 Building from the cmd command line:

set MAVEN_OPTS="-Xmx4G"

mvn clean install package -Dmaven.test.skip=true

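1.2 Building from PowerShell (in PowerShell, set is an alias for Set-Variable and does not set an environment variable, so use $env: instead):

$env:MAVEN_OPTS = "-Xmx4G"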

mvn clean install package '-Dmaven.test.skip=true'

1.3 Flink directory structure

cd E:\app-installtools\flink\flink-dist\target\flink-1.9-SNAPSHOT-bin\flink-1.9-SNAPSHOT\bin

Mode    LastWriteTime     Length  Name
----    -------------     ------  ----
-a----  2019/4/24 11:50    29647  config.sh
-a----  2019/4/24 11:50     2279  flink
-a----  2019/4/24 11:50     2847  flink-console.sh
-a----  2019/4/24 11:50     6545  flink-daemon.sh
-a----  2019/4/24 11:50     1271  flink.bat
-a----  2019/4/24 11:50     1603  historyserver.sh
-a----  2019/4/24 11:50     2967  jobmanager.sh
-a----  2019/4/24 11:50     1849  mesos-appmaster-job.sh
-a----  2019/4/24 11:50     1883  mesos-appmaster.sh
-a----  2019/4/24 11:50     1935  mesos-taskmanager.sh
-a----  2019/4/24 11:50     1207  pyflink-stream.sh
-a----  2019/4/24 11:50     1166  pyflink.bat
-a----  2019/4/24 11:50     1132  pyflink.sh
-a----  2019/4/24 11:50     3517  sql-client.sh
-a----  2019/4/24 11:50     2597  standalone-job.sh
-a----  2019/4/24 11:50     3364  start-cluster.bat
-a----  2019/4/24 11:50     1889  start-cluster.sh
-a----  2019/4/24 11:50     3538  start-scala-shell.sh
-a----  2019/4/24 11:50     1900  start-zookeeper-quorum.sh
-a----  2019/4/24 11:50     1663  stop-cluster.sh
-a----  2019/4/24 11:50     1891  stop-zookeeper-quorum.sh
-a----  2019/4/24 11:50     3941  taskmanager.sh
-a----  2019/4/24 11:50     1714  yarn-session.sh
-a----  2019/4/24 11:50     2346  zookeeper.sh

1.4 Starting Flink

E:\app-installtools\flink\flink-dist\target\flink-1.9-SNAPSHOT-bin\flink-1.9-SNAPSHOT\bin\start-cluster.bat

Once the cluster is running, the web interface is available by default at http://localhost:8081/.
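To confirm that the cluster actually came up, the web interface URL can be probed from a script. The following is only a minimal sketch using the Python standard library; the function name web_ui_is_up and the 3-second timeout are assumptions made for illustration and are not part of Flink.

```python
import http.client
import urllib.error
import urllib.request


def web_ui_is_up(url="http://localhost:8081/", timeout=3.0):
    """Return True if an HTTP server answers at `url` with a 2xx status.

    `web_ui_is_up` is a hypothetical helper, not a Flink API.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, malformed reply, ...
        return False
```

Calling web_ui_is_up() right after start-cluster.bat may return False for the first few seconds while the JobManager is still starting, so a short retry loop around it is usually a sensible addition.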

2 Word count with Flink

2.1 Word count example 1

from flink.plan.Environment import get_environment
from flink.functions.GroupReduceFunction import GroupReduceFunction


class Adder(GroupReduceFunction):
    def reduce(self, iterator, collector):
        # Each element is a (count, word) tuple; all elements of a group share the same word.
        count, word = iterator.next()
        count += sum([x[0] for x in iterator])
        collector.collect((count, word))


env = get_environment()

data = env.from_elements("Who's there?",
                         "I think I hear them. Stand, ho! Who's there?")

# Split each line into (1, word) pairs, group by the word (field 1), and sum the counts.
data \
    .flat_map(lambda x, c: [(1, word) for word in x.lower().split()]) \
    .group_by(1) \
    .reduce_group(Adder(), combinable=True) \
    .map(lambda y: 'Count: %s Word: %s' % (y[0], y[1])) \
    .output()

env.execute(local=True)

2.2 Program output

Count: 2 Word: i

Count: 1 Word: ho!

Count: 1 Word: hear

Count: 1 Word: them.

Count: 1 Word: think

Count: 2 Word: who's

Count: 1 Word: stand,

Count: 2 Word: there?
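The counts listed here can be cross-checked without a Flink cluster. The sketch below is plain Python, not the Flink API: it applies the same lower-case and whitespace split to the same two input lines and counts the tokens with collections.Counter. The function name word_count is chosen for illustration only.

```python
from collections import Counter

# The same two input lines that the Flink job reads.
lines = [
    "Who's there?",
    "I think I hear them. Stand, ho! Who's there?",
]


def word_count(lines):
    """Count lower-cased, whitespace-separated tokens (punctuation is kept)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


counts = word_count(lines)
for word, count in sorted(counts.items()):
    print("Count: %s Word: %s" % (count, word))
```

Because the tokens are split on whitespace only, punctuation stays attached to the words, which is why entries such as "ho!" and "them." appear in the result. The order of the lines printed by Flink may differ from the sorted order used here.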

2.3 Code walkthrough

2.3.1 Map

Takes one element and produces one element.

data.map(lambda x: x * 2)

2.3.2 FlatMap

Takes one element and produces zero, one, or more elements.

data.flat_map(lambda x, c: [(1, word) for word in x.lower().split()])
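The difference from Map is that the results of every call are flattened into one output stream, so a single input can contribute nothing at all. A minimal plain-Python sketch of that behaviour follows; flat_map and tokenize here are illustrative stand-ins, not Flink functions, and they omit the collector argument that the Flink lambda receives.

```python
def flat_map(function, elements):
    """Apply `function` to each element and flatten the returned lists."""
    results = []
    for element in elements:
        results.extend(function(element))
    return results


def tokenize(line):
    """Emit one (1, word) pair per whitespace-separated token."""
    return [(1, word) for word in line.lower().split()]


# The empty second line produces zero output elements.
pairs = flat_map(tokenize, ["Who's there?", ""])
print(pairs)
```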

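2.3.3 MapPartition

Transforms a parallel partition in a single function call. The function receives the partition as an iterator and can produce any number of result values. The number of elements in each partition depends on the parallelism and on the preceding operations.

data.map_partition(lambda x, c: [value * 2 for value in x])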
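To make the "one call per partition" idea concrete, here is a small plain-Python simulation. It is a sketch under stated assumptions: the round-robin split in split_into_partitions is a hypothetical helper chosen for simplicity, and Flink's real distribution of elements across partitions may differ.

```python
def split_into_partitions(elements, parallelism):
    """Deal elements round-robin into `parallelism` lists (assumed strategy)."""
    partitions = [[] for _ in range(parallelism)]
    for index, element in enumerate(elements):
        partitions[index % parallelism].append(element)
    return partitions


def map_partition(function, partitions):
    """Call `function` once per partition, passing an iterator over it."""
    results = []
    for partition in partitions:
        results.extend(function(iter(partition)))
    return results


partitions = split_into_partitions([1, 2, 3, 4, 5], parallelism=2)
doubled = map_partition(lambda values: [value * 2 for value in values], partitions)
print(partitions, doubled)
```

The function is invoked twice here (once per partition) rather than five times, which is the main reason to prefer MapPartition when per-call setup is expensive.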

2.3.4 Filter

Evaluates a boolean function for each element and keeps the elements for which the function returns true.

data.filter(lambda x: x > 1000)

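2.3.5 Reduce

Combines a group of elements into a single element by repeatedly combining two elements into one. Reduce can be applied to a full data set or to a grouped data set.

data.reduce(lambda x, y: x + y)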
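The pairwise combination can be illustrated with functools.reduce from the Python standard library. This is a plain-Python sketch, not the Flink API; reduce_by_key is a hypothetical helper that mimics applying the same function to a grouped data set.

```python
from functools import reduce

# Full data set: the function is applied pairwise, ((1 + 2) + 3) + 4.
values = [1, 2, 3, 4]
total = reduce(lambda x, y: x + y, values)


def reduce_by_key(function, records):
    """Reduce the values of (key, value) records separately for each key."""
    reduced = {}
    for key, value in records:
        reduced[key] = function(reduced[key], value) if key in reduced else value
    return reduced


# Grouped data set: one reduced value per key.
per_key = reduce_by_key(lambda x, y: x + y, [("a", 1), ("b", 5), ("a", 2)])
print(total, per_key)
```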

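2.3.6 ReduceGroup

Combines a group of elements into one or more elements. ReduceGroup can be applied to a full data set or to a grouped data set.

class Adder(GroupReduceFunction):
    def reduce(self, iterator, collector):
        # Take the first (count, word) pair, then add the counts of the remaining pairs.
        count, word = iterator.next()
        count += sum([x[0] for x in iterator])
        collector.collect((count, word))

data.reduce_group(Adder())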
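The logic of Adder can be tried out locally with a small simulation of grouping. Everything in this sketch is an assumption made for illustration: ListCollector stands in for Flink's collector, add_counts repeats the body of Adder.reduce using the built-in next(), and reduce_group groups records with itertools.groupby instead of Flink's runtime.

```python
from itertools import groupby


class ListCollector:
    """Hypothetical stand-in for Flink's collector: stores emitted values."""

    def __init__(self):
        self.values = []

    def collect(self, value):
        self.values.append(value)


def add_counts(iterator, collector):
    """Same logic as Adder.reduce, written with the built-in next()."""
    count, word = next(iterator)
    count += sum(x[0] for x in iterator)
    collector.collect((count, word))


def reduce_group(function, records, key_index):
    """Group records by the field at `key_index` and call `function` per group."""
    collector = ListCollector()
    ordered = sorted(records, key=lambda record: record[key_index])
    for _, group in groupby(ordered, key=lambda record: record[key_index]):
        function(iter(group), collector)
    return collector.values


pairs = [(1, "who's"), (1, "there?"), (1, "i"), (1, "i"), (1, "who's")]
result = reduce_group(add_counts, pairs, key_index=1)
print(result)
```

Unlike Reduce, the function sees the whole group at once through an iterator and may emit any number of results through the collector; here it emits exactly one tuple per word.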

2.4 Improved version: writing the results to a file

from flink.plan.Environment import get_environment
from flink.plan.Constants import WriteMode
from flink.functions.GroupReduceFunction import GroupReduceFunction


class Adder(GroupReduceFunction):
    def reduce(self, iterator, collector):
        count, word = iterator.next()
        count += sum([x[0] for x in iterator])
        collector.collect((count, word))


env = get_environment()

data = env.from_elements("Who's there?",
                         "I think I hear them. Stand, ho! Who's there?")

# Output location; a file URI such as 'file:///../examples/out.txt' can be used instead.
output_file = 'out.txt'

# Same pipeline as before, but the result is written to a file instead of standard output.
data \
    .flat_map(lambda x, c: [(1, word) for word in x.lower().split()]) \
    .group_by(1) \
    .reduce_group(Adder(), combinable=True) \
    .map(lambda y: 'Count: %s Word: %s' % (y[0], y[1])) \
    .write_text(output_file, write_mode=WriteMode.OVERWRITE)

env.execute(local=True)
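Once the job has written its result, the text can be read back into a dictionary for further processing. The following is a minimal sketch in plain Python; parse_counts and LINE_PATTERN are illustrative names, and the pattern assumes lines of the form "Count: N Word: W" (it also tolerates the variant without spaces).

```python
import re

# Matches "Count: 2 Word: i" as well as "Count:2Word:i".
LINE_PATTERN = re.compile(r"Count:\s*(\d+)\s*Word:\s*(.+)")


def parse_counts(text):
    """Turn 'Count: N Word: W' lines into a {word: count} dictionary."""
    counts = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


sample = "Count: 2 Word: i\nCount: 1 Word: ho!\n"
parsed = parse_counts(sample)
print(parsed)
```

In practice the text would come from reading the output file, for example with open('out.txt').read(). Depending on the parallelism of the job, the output path may turn out to be a directory containing several part files, in which case each file would need to be parsed and the results merged.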
