Writing Python UDFs
1. Write the Python script:
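# The decorators below are supplied by Pig when the script is registered
# with "using jython"; no import is needed inside Pig.

# Returns a single chararray field named "word".
@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'

# Returns two fields: the word and its length.
@outputSchema("word:chararray,num:long")
def complex(word):
    return str(word), len(word)

# The output schema is computed by the schema function named "squareSchema".
@outputSchemaFunction("squareSchema")
def square(num):
    return num * num

# Schema function for square: the output schema is the same as the input schema.
@schemaFunction("squareSchema")
def squareSchema(input_schema):
    return input_schema

# No decorator: Pig treats the return value as bytearray.
def concat(s):
    return s + s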
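The decorators exist only inside Pig's Jython environment, so the script above fails with a NameError if run directly with a Python interpreter. A minimal sketch for trying the functions locally follows. It assumes that recording the schema string on the function is enough for a local check. The three decorator definitions are hypothetical stand-ins written for this example, not Pig's own code.

```python
# Hypothetical stand-ins for the decorators Pig provides to Jython scripts.
# They only attach the schema information to the function and return it unchanged.
def outputSchema(schema):
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


def outputSchemaFunction(schema_function_name):
    def wrap(func):
        func.outputSchemaFunction = schema_function_name
        return func
    return wrap


def schemaFunction(name):
    def wrap(func):
        func.schemaFunction = name
        return func
    return wrap


@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'


@outputSchema("word:chararray,num:long")
def complex(word):
    return str(word), len(word)


@outputSchemaFunction("squareSchema")
def square(num):
    return num * num


@schemaFunction("squareSchema")
def squareSchema(input_schema):
    return input_schema


def concat(s):
    return s + s


if __name__ == "__main__":
    # Call each UDF the way the Pig statement in step 3 does.
    print(helloworld())
    print(square(4))
    print(complex('data'))
    print(concat('ab'))
```

Running the functions this way checks only the Python logic. How Pig converts the return values into its own types is not covered by this sketch.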
2. Register the Python script as myFuncs:
register './Desktop/test.py' using jython as myFuncs;
3. Use the Python script:
python_records = foreach records generate myFuncs.helloworld(), myFuncs.square(4), myFuncs.complex('data');
4. Check the data:
dump python_records;
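5. Analysis:
outputSchema – declares the schema of the function's return value in a form that Pig can parse.
outputSchemaFunction – names a schema function that decides the return schema from the arguments the caller passes in; use it when the function has to accept arguments of generic type.
schemaFunction – marks a function as a schema function; functions carrying this decorator are not registered in Pig as UDFs.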
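The relationship between outputSchemaFunction and schemaFunction can be pictured with a small model in plain Python. This is an illustrative sketch only and not how Pig is implemented: schemas are represented as plain strings, and the registry `SCHEMA_FUNCTIONS` and the helper `resolve_output_schema` are hypothetical names introduced for the example.

```python
# Illustrative model only; this is not Pig's implementation.

def square_schema(input_schema):
    # Mirrors squareSchema from the script: the output schema equals the input schema.
    return input_schema


# Hypothetical registry: schema-function name -> schema function.
SCHEMA_FUNCTIONS = {"squareSchema": square_schema}


def resolve_output_schema(schema_function_name, input_schema):
    """Look up the named schema function and apply it to the input schema."""
    return SCHEMA_FUNCTIONS[schema_function_name](input_schema)


if __name__ == "__main__":
    # The same UDF yields a different output schema depending on its input.
    print(resolve_output_schema("squareSchema", "num:long"))
    print(resolve_output_schema("squareSchema", "num:double"))
```

The point of the model is that square does not fix its return type in advance: the type follows whatever the input was, which is why a fixed outputSchema string would not fit here.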
6. Practical use, applying the square function:
python_records = foreach records generate name, myFuncs.square(age), sex;
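For readers more familiar with Python than Pig Latin, the statement above can be read as a per-row transformation. The sketch below assumes that each row of records has the three fields name, age and sex, as the statement implies. The sample rows are made up for illustration and do not come from any real data set.

```python
def square(num):
    return num * num


# Hypothetical sample rows in the assumed (name, age, sex) layout.
records = [("alice", 30, "F"), ("bob", 25, "M")]

# Plain-Python equivalent of:
#   foreach records generate name, myFuncs.square(age), sex;
python_records = [(name, square(age), sex) for name, age, sex in records]

if __name__ == "__main__":
    for row in python_records:
        print(row)
```

Each output row keeps name and sex unchanged and replaces age with its square.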