While testing a mapred program (a Python streaming job) that connects to MongoDB, the program threw the following error:
Traceback (most recent call last):
File "/data3/hadoop/mapred/mrlocal/taskTracker/hadoop/jobcache/job_201201040946_65346/attempt_201201040946_65346_m_000000_0/work/./ExampleDataPreproMap.py", line 7, in ?
import pymongo
File "build/bdist.linux-x86_64/egg/pymongo/__init__.py", line 104, in ?
File "build/bdist.linux-x86_64/egg/pymongo/connection.py", line 45, in ?
File "build/bdist.linux-x86_64/egg/pymongo/common.py", line 20, in ?
File "build/bdist.linux-x86_64/egg/pymongo/errors.py", line 17, in ?
File "build/bdist.linux-x86_64/egg/bson/__init__.py", line 39, in ?
File "build/bdist.linux-x86_64/egg/bson/_cbson.py", line 7, in ?
File "build/bdist.linux-x86_64/egg/bson/_cbson.py", line 4, in __bootstrap__
File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 881, in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 1351, in get_resource_filename
self._extract_resource(manager, self._eager_to_zip(name))
File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 1372, in _extract_resource
real_path = manager.get_cache_path(
File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 962, in get_cache_path
self.extraction_error()
File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 928, in extraction_error
raise err
pkg_resources.ExtractionError: Can't extract file(s) to egg cache
The following error occurred while trying to extract file(s) to the Python egg
cache:
[Errno 13] Permission denied: '/homes'
The Python egg cache directory is currently set to:
/homes/.python-eggs
Perhaps your account does not have write access to this directory? You can
change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.
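According to the error message above, the Python program failed while importing the pymongo module: a permission error occurred while the pymongo egg file was being extracted.
Following the fix suggested in the log, I set the PYTHON_EGG_CACHE environment variable on the Hadoop cluster servers (in the hadoop user's .bashrc file). After rerunning the mapred program, it still reported the same error. A possible cause is that the Hadoop tasktracker process keeps the environment variable values it had when it was started (untested).
Since changing the environment variable on the servers did not take effect, I set PYTHON_EGG_CACHE inside the mapred program itself, as follows: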
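The guess about the tasktracker environment can be checked from inside a task. The following is a minimal sketch, not something that was run on this cluster: `describe_egg_cache` is a hypothetical helper name, and it only reports what the current process sees. Calling it at the top of the mapper and writing the result to stderr should make the value visible in the task logs.

```python
import os
import sys

def describe_egg_cache(environ=None):
    """Return a one-line report of the egg cache setting seen by this process.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    if environ is None:
        environ = os.environ
    value = environ.get('PYTHON_EGG_CACHE')
    if value is None:
        return 'PYTHON_EGG_CACHE is not set'
    # os.access returns False for a missing or unwritable path
    if os.access(value, os.W_OK):
        state = 'writable'
    else:
        state = 'not writable'
    return 'PYTHON_EGG_CACHE=%s (%s)' % (value, state)

if __name__ == '__main__':
    # Hadoop streaming collects a task's stderr in the task logs
    sys.stderr.write(describe_egg_cache() + '\n')
```

If the task reports the variable as not set even though .bashrc exports it, that would support the explanation that the task inherits the environment the tasktracker was started with.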
import os
# Must be set before importing any module that is installed as a zipped egg
os.environ['PYTHON_EGG_CACHE'] = '/tmp/'
import pymongo
After this change, I ran the mapred program again and the error was gone.
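Pointing the cache at /tmp/ itself works, but the extracted files then sit directly in a directory shared by every user. A slightly more defensive variant is sketched below, assuming a Unix host: it creates a private per-user subdirectory under the system temp directory and sets the variable to that. `use_writable_egg_cache` and the `python-eggs-<uid>` directory name are hypothetical choices for illustration, not part of the original fix.

```python
import os
import stat
import tempfile

def use_writable_egg_cache(base=None):
    """Point PYTHON_EGG_CACHE at a per-user directory this process can write to.

    Hypothetical helper; `base` defaults to the system temp directory.
    """
    if base is None:
        base = tempfile.gettempdir()
    cache_dir = os.path.join(base, 'python-eggs-%d' % os.getuid())
    if not os.path.isdir(cache_dir):
        try:
            # Owner-only permissions, so other users cannot tamper with the cache
            os.makedirs(cache_dir, stat.S_IRWXU)
        except OSError:
            # Another task may have created it at the same moment
            if not os.path.isdir(cache_dir):
                raise
    os.environ['PYTHON_EGG_CACHE'] = cache_dir
    return cache_dir

cache_dir = use_writable_egg_cache()
# import pymongo  # egg-based imports must come after the call above
```

As with the original fix, the call has to happen before the first import of an egg-based module, because the cache location is read when the egg is extracted.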
Note: a solution found online:
From my investigations it turns out that some eggs are packaged as zip files, and are saved as such in Python's site-packages directory.
These zipped eggs need to be unzipped before they can be executed, so are expanded into the PYTHON_EGG_CACHE directory which by default is ~/.python-eggs (located in the user's home directory). If this doesn't exist it causes problems when trying to run applications.
There are a number of fixes:
1. Create a .python-eggs directory in the user's home directory and make it writable for the user.
2. Create a global directory for unzipping (eg. /tmp/python-eggs) and set the environment variable PYTHON_EGG_CACHE to this directory.
3. Use the -Z switch when using easy_install to unzip the package when installing.
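Fixes 1 and 2 can also be combined in code by trying the default location first and falling back to a global directory. This is only a sketch under that assumption: `first_writable_dir` is a hypothetical helper, and the candidate paths simply mirror the two directories named in the list.

```python
import os
import tempfile

def first_writable_dir(candidates):
    """Return the first candidate that exists (or can be created) and is writable.

    Hypothetical helper; returns None when no candidate is usable.
    """
    for path in candidates:
        if not os.path.isdir(path):
            try:
                os.makedirs(path)
            except OSError:
                pass  # not creatable, or created concurrently; checked below
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return None

def pick_egg_cache():
    """Try ~/.python-eggs (fix 1), then a directory under the temp dir (fix 2)."""
    candidates = [
        os.path.join(os.path.expanduser('~'), '.python-eggs'),
        os.path.join(tempfile.gettempdir(), 'python-eggs'),
    ]
    return first_writable_dir(candidates)
```

A script would call `pick_egg_cache()` and, if the result is not None, assign it to `os.environ['PYTHON_EGG_CACHE']` before importing pymongo. Fix 3 avoids the cache entirely, since an unzipped install needs no extraction at import time.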