I've been trying to get the Apache Beam Portability Framework working with Python and Apache Flink, and I can't find a complete set of instructions for setting up the environment. Is there a reference with a complete list of prerequisites and steps to get a simple Python pipeline running?
Solution
For the local portable runner (Universal Local Runner, ULR), see the wiki. Quoting from there:
Run a Python-SDK Pipeline:
1. Compile the SDK container as a local build: ./gradlew :beam-sdks-python-container:docker
2. Start the ULR job server, for example: ./gradlew :beam-runners-reference-job-server:run -PlogLevel=debug -PvendorLogLevel=warning. For details see the Java section in the above link.
3. Set up the Python environment properly. More details can be found here.
4. Run the pipeline from the sdks/python folder, for example:
python -m apache_beam.examples.wordcount \
--input=gs://dataflow-samples/shakespeare/kinglear.txt \
--output=/tmp/output \
--runner=PortableRunner \
--job_endpoint=localhost:8099 \
--experiments beam_fn_api
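If you would rather launch the same run from a Python script (for example, to reuse it against different job servers), the flags from the command above can be assembled programmatically. This is a minimal sketch, assuming it is run from sdks/python with apache_beam importable and a job server already listening; the helper names portable_wordcount_args and run_wordcount are hypothetical, and the flags are copied from the command above.

```python
import subprocess
import sys
from typing import List


def portable_wordcount_args(input_path: str, output_path: str,
                            job_endpoint: str = "localhost:8099") -> List[str]:
    """Hypothetical helper: build the wordcount command line shown above."""
    return [
        sys.executable, "-m", "apache_beam.examples.wordcount",
        f"--input={input_path}",
        f"--output={output_path}",
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        "--experiments", "beam_fn_api",
    ]


def run_wordcount(input_path: str, output_path: str,
                  job_endpoint: str = "localhost:8099") -> None:
    """Hypothetical helper: submit the pipeline to the job server at job_endpoint.

    Raises subprocess.CalledProcessError if the pipeline exits non-zero.
    """
    subprocess.run(
        portable_wordcount_args(input_path, output_path, job_endpoint),
        check=True,
    )
```

For example, run_wordcount("gs://dataflow-samples/shakespeare/kinglear.txt", "/tmp/output") mirrors the shell command. Building a plain argument list (rather than a shell string) avoids the quoting and line-continuation pitfalls of multi-line shell commands.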
For Flink you need a different job server: ./gradlew beam-runners-flink_2.11-job-server:runShadow. Its host:port is also localhost:8099.
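Since the job server (ULR or Flink) runs as a separate process and submission fails if nothing is listening, it can help to check the endpoint before launching the pipeline. This is a minimal standard-library sketch; job_server_is_up is a hypothetical helper, and it only confirms that something accepts TCP connections on the port, not that the process behind it is a Beam job server.

```python
import socket


def job_server_is_up(host: str = "localhost", port: int = 8099,
                     timeout: float = 2.0) -> bool:
    """Hypothetical helper: return True if host:port accepts TCP connections."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, call job_server_is_up() before submitting; its defaults match the localhost:8099 endpoint used by both job servers above.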
Relevant email discussions: one, two.
Some code that may be worth looking at: one, two.