I am writing a MapReduce job in Python and want to use some third-party libraries such as chardet.
I know that we can use the option -libjars=... to include them for a Java MapReduce job.
But how can third-party libraries be included in a Python MapReduce job?
Thank you!
Solution
The problem was solved with zipimport.
I zipped the chardet package into a file named module.mod and loaded it like this:
import zipimport

# module.mod is a zip archive with the chardet package at its root
importer = zipimport.zipimporter('module.mod')
chardet = importer.load_module('chardet')
Add -file module.mod to the Hadoop streaming command so the archive is shipped with the job.
chardet can then be used in the script.
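
The packaging and loading steps above can be sketched end to end as follows. This is a minimal illustration, not the exact procedure used in the answer: demo_pkg and its guess function are hypothetical stand-ins for chardet, the helper names build_archive and import_from_archive are made up for this example, and the import goes through sys.path (which uses zipimport behind the scenes) rather than calling zipimporter directly.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_archive(package_dir, archive_path):
    """Zip a package so that the package folder sits at the archive root."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(archive_path, "w") as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths such as "demo_pkg/__init__.py"
                zf.write(full, os.path.relpath(full, parent))


def import_from_archive(archive_path, module_name):
    """Put the archive on sys.path and import a module from it."""
    sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Hypothetical stand-in for a pure-Python third-party package.
work_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(work_dir, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write(
        "def guess(data):\n"
        "    return 'ascii' if all(b < 128 for b in data) else 'unknown'\n"
    )

# The file extension does not matter; only the zip format does.
archive = os.path.join(work_dir, "module.mod")
build_archive(pkg_dir, archive)

detector = import_from_archive(archive, "demo_pkg")
result = detector.guess(b"hello")
```

In a streaming job, the archive would be the file passed with -file, and the script would refer to it by its bare name, as in the answer. Note that zipimport only loads Python source and bytecode files, so this approach fits pure-Python packages and not ones that rely on compiled extension modules.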