Common Ray errors and how to fix them
Error 1:
Error when running ray start:
Could not terminate `"/usr/bin/redis-server 127.0.0.1:6379" "" "" "" "" "" "" ""` due to psutil.AccessDenied (pid=1866, name='redis-server')
Stopped only 0 out of 1 Ray processes. Set `-v` to see more details.
Try running the command again, or use `--force`.
Solution: stop the system Redis service:
sudo service redis-server stop
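
The log above suggests why this works: a redis-server that was not started by Ray is listening on 127.0.0.1:6379, and Ray's cleanup cannot kill it (psutil.AccessDenied), most likely because the process belongs to a different user. The following is a minimal sketch of the check-then-stop sequence, not an official procedure. The helper names run and free_redis_port and the DRY_RUN switch are hypothetical, and the final ray stop is an assumption based on the "Try running the command again" hint in the log.

```shell
# Hypothetical helper: with DRY_RUN=1 print the command, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: inspect the stray redis-server, stop the system
# service, then retry the Ray cleanup.
free_redis_port() {
    # Show owner, PID and command line of every redis-server process.
    run ps -C redis-server -o user=,pid=,args= || echo "no redis-server process found"
    # The fix from this note: stop the system-managed Redis service.
    run sudo service redis-server stop
    # Assumption: rerun the cleanup that failed before starting Ray again.
    run ray stop
}
```

Running DRY_RUN=1 free_redis_port only prints the three commands, which is a safe way to review them before executing them on a shared machine.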
Error 2:
Error when calling ray.init():
[raylet_client.cc:54] Could not connect to socket /tmp/ray/session_xxx/sockets/raylet
*** Check failure stack trace: ***
@ 0x7fa83d5ddf5d google::LogMessage::Fail()
@ 0x7fa83d5df0bc google::LogMessage::SendToLog()
@ 0x7fa83d5ddc39 google::LogMessage::Flush()
@ 0x7fa83d5dde51 google::LogMessage::~LogMessage()
@ 0x7fa83d594ff9 ray::RayLog::~RayLog()
@ 0x7fa83d2bc905 ray::raylet::RayletConnection::RayletConnection()
@ 0x7fa83d2bdc6e ray::raylet::RayletClient::RayletClient()
@ 0x7fa83d25c2d7 ray::CoreWorker::CoreWorker()
@ 0x7fa83d260484 ray::CoreWorkerProcess::CreateWorker()
@ 0x7fa83d2616f2 ray::CoreWorkerProcess::CoreWorkerProcess()
@ 0x7fa83d2620bb ray::CoreWorkerProcess::Initialize()
@ 0x7fa83d19bdce __pyx_pw_3ray_7_raylet_10CoreWorker_1__cinit__()
@ 0x7fa83d19d5b5 __pyx_tp_new_3ray_7_raylet_CoreWorker()
@ 0x5653e65a4c49 _PyObject_FastCallKeywords
@ 0x5653e66099a1 _PyEval_EvalFrameDefault
@ 0x5653e654db00 _PyEval_EvalCodeWithName
@ 0x5653e659d497 _PyFunction_FastCallKeywords
@ 0x5653e6605cba _PyEval_EvalFrameDefault
@ 0x5653e654db00 _PyEval_EvalCodeWithName
@ 0x5653e659d497 _PyFunction_FastCallKeywords
@ 0x5653e6605cba _PyEval_EvalFrameDefault
@ 0x5653e654d59a _PyEval_EvalCodeWithName
@ 0x5653e654e610 _PyFunction_FastCallDict
@ 0x5653e656cb93 _PyObject_Call_Prepend
@ 0x5653e65a40aa slot_tp_init
@ 0x5653e65a4a47 type_call
@ 0x5653e655f95e PyObject_Call
@ 0x5653e660651a _PyEval_EvalFrameDefault
@ 0x5653e654d2b9 _PyEval_EvalCodeWithName
@ 0x5653e659d497 _PyFunction_FastCallKeywords
@ 0x5653e6605cba _PyEval_EvalFrameDefault
@ 0x5653e654d2b9 _PyEval_EvalCodeWithName
/tmp/usernamexxx.sh: line 35: 7004 Aborted (core dumped) xxx
Build step 'Execute shell' marked build as failure
Archiving artifacts
Finished: FAILURE
Deleting the Ray temporary directory with
sudo rm -r /tmp/ray
did not help; the same error appeared afterwards.
Solution:
Restart the Ray cluster, that is, the head node and every worker node:
ray stop
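
A full restart means running ray stop on every node and then starting the head before the workers. The sketch below shows one way to script that per node; it is an illustration under assumptions, not the author's exact procedure. The helpers run and restart_node, the DRY_RUN switch, and the address 10.0.0.1:6379 are hypothetical, while --head, --port and --address are standard ray start options.

```shell
# Hypothetical helper: with DRY_RUN=1 print the command, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: restart Ray on the current node.
#   $1 = role: head or worker
#   $2 = head address as host:port, e.g. 10.0.0.1:6379 (placeholder)
restart_node() {
    local role="$1" head_addr="$2"
    # Stop any Ray processes still running on this node.
    run ray stop
    case "$role" in
        head)
            # Start the head on the port part of host:port.
            run ray start --head --port="${head_addr##*:}"
            ;;
        worker)
            # Join the cluster through the head's address.
            run ray start --address="$head_addr"
            ;;
        *)
            echo "usage: restart_node head|worker host:port" >&2
            return 1
            ;;
    esac
}
```

Run restart_node head 10.0.0.1:6379 on the head node first, then restart_node worker 10.0.0.1:6379 on each worker. Prefixing either call with DRY_RUN=1 prints the commands without executing them.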