I've seen the documentation here, but I confess that I find it rather lacking. I was wondering if anyone could give me a collection of examples of incorporating Python UDFs into Pig. In particular:
Prior to Pig 0.10 there is no boolean type, but a FILTER operation requires its condition to resolve to a boolean. Am I forever cursed with returning 1 or 0 and writing FILTER alias BY py_udf.f(field) > 0 if I don't have the latest version?
Are the Algebraic, Accumulator, and Filter interfaces inaccessible from Python?
Can I not access the Distributed Cache either?
What about Store/Load functions?
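For reference, the pre-0.10 workaround described in the first question might look roughly like the sketch below. It assumes a hypothetical script my_udfs.py and a hypothetical predicate is_long_word; the outputSchema decorator is normally supplied by Pig's Jython engine, and the fallback definition is an assumption added only so the file can be exercised outside Pig. The Pig Latin usage is shown as comments.

```python
# Hypothetical file: my_udfs.py
# Assumed Pig usage (pre-0.10, no boolean type):
#   REGISTER 'my_udfs.py' USING jython AS py_udf;
#   words      = LOAD 'words.txt' AS (word:chararray);
#   long_words = FILTER words BY py_udf.is_long_word(word) > 0;

try:
    outputSchema  # provided by Pig's Jython script engine when registered
except NameError:
    # Assumption: no-op stand-in so the module can be tested outside Pig
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("keep:int")
def is_long_word(word):
    # Return 1 to keep the row and 0 to drop it, since FILTER cannot
    # consume a boolean returned from a UDF before Pig 0.10
    if word is None:
        return 0
    if len(word) > 5:
        return 1
    return 0
```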
Solution
Python UDFs are quite limited. You cannot use Algebraic or Accumulator interfaces, nor can you write a LoadFunc in Python. For anything more complicated than a map operation you will likely need to resort to a Java UDF.
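To illustrate the limitation: without Algebraic or Accumulator, a Python aggregate can still be written as a plain function over the grouped bag, but Pig then hands it the whole bag at once (no combiner, no incremental accumulation). The sketch below is a minimal example under that assumption; sum_values and the field names are hypothetical, the bag is assumed to arrive as a list of tuples, and the outputSchema fallback is only for running outside Pig.

```python
# Hypothetical file: my_udfs.py
# Assumed Pig usage:
#   REGISTER 'my_udfs.py' USING jython AS py_udf;
#   data    = LOAD 'data.txt' AS (key:chararray, value:long);
#   grouped = GROUP data BY key;
#   sums    = FOREACH grouped GENERATE group, py_udf.sum_values(data.value);

try:
    outputSchema  # provided by Pig's Jython script engine when registered
except NameError:
    # Assumption: no-op stand-in so the module can be tested outside Pig
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("total:long")
def sum_values(bag):
    # The bag is iterated as a sequence of tuples; data.value projects
    # single-field tuples, so each value sits at index 0.
    # The entire bag is materialized before this function runs.
    total = 0
    for t in bag:
        value = t[0]
        if value is not None:  # skip nulls, as Pig's built-in SUM does
            total += value
    return total
```

Because the whole bag is passed in one call, this approach can be slow or memory-hungry for very large groups; that is the case where a Java UDF implementing Algebraic or Accumulator pays off.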
That said, a more complex Python UDF with a dynamic outputSchema can be found at http://ragrawal.wordpress.com/2013/02/24/on-writing-python-udf-for-pig-a-perspective/. This likely won't help you, but it will give you a better understanding of what Python UDFs can do.
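As a rough idea of what a dynamic output schema looks like, the Pig UDF documentation describes outputSchemaFunction and schemaFunction decorators for Jython UDFs; the sketch below mirrors that pattern with a square function whose output schema is taken from its input (so an int argument yields an int, a double yields a double). The fallback decorator definitions are an assumption for running outside Pig, and the linked post may use a different approach.

```python
# Hypothetical file: my_udfs.py
# Assumed Pig usage:
#   REGISTER 'my_udfs.py' USING jython AS py_udf;
#   nums    = LOAD 'nums.txt' AS (n:int);
#   squares = FOREACH nums GENERATE py_udf.square(n);

try:
    outputSchemaFunction  # provided by Pig's Jython script engine
    schemaFunction
except NameError:
    # Assumption: no-op stand-ins so the module can be tested outside Pig
    def outputSchemaFunction(schema_func_name):
        def decorate(func):
            func.outputSchemaFunction = schema_func_name
            return func
        return decorate

    def schemaFunction(name):
        def decorate(func):
            func.schemaFunction = name
            return func
        return decorate


@outputSchemaFunction("squareSchema")
def square(num):
    # The output type follows the input type via squareSchema below
    if num is None:
        return None
    return num * num


@schemaFunction("squareSchema")
def squareSchema(input_schema):
    # Pig calls this with the input schema; returning it unchanged makes
    # the output schema match the input
    return input_schema
```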