Errors when running the reinforcement learning framework Ray: Redis-related problems, redis.exceptions...

(Continuously updated...)

 

  • Ray Version: 0.5.3
  • Python Version: 3.5.6

Import ray and initialize the execution environment:

import ray
ray.init(use_raylet=True)

This produces Error 1:

redis.exceptions.DataError: Invalid input of type: 'NoneType'. Convert to a byte, string or number first.

Process STDOUT and STDERR is being redirected to /tmp/raylogs/.
Waiting for redis server at 127.0.0.1:21072 to respond...
Waiting for redis server at 127.0.0.1:56491 to respond...
Starting the Plasma object store with 54.00 GB memory.
Starting local scheduler with the following resources: {'CPU': 40, 'GPU': 6}.

======================================================================
View the web UI at http://localhost:8888/notebooks/ray_ui29035.ipynb?token=a281378b39e3d5736fa2fbe58e2235bebd445fa8347d040f
======================================================================

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 1866, in init
    use_raylet=use_raylet)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 1727, in _init
    use_raylet=use_raylet)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 2165, in connect
    worker.redis_client.hmset(b"Drivers:" + worker.worker_id, driver_info)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/utils.py", line 387, in _wrapper
    return orig_attr(*args, **kwargs)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/client.py", line 2636, in hmset
    return self.execute_command('HMSET', name, *items)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/client.py", line 754, in execute_command
    connection.send_command(*args)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 619, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 659, in pack_command
    for arg in imap(self.encoder.encode, args):
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 124, in encode
    "byte, string or number first." % typename)
redis.exceptions.DataError: Invalid input of type: 'NoneType'. Convert to a byte, string or number first.

This is a Redis exception. After adding the Redis address:

ray.init(redis_address='127.0.0.1:6379',use_raylet=True)

This produces Error 2:

redis.exceptions.ConnectionError: Error 111 connecting to 10.36.3.117:6379. Connection refused.

 

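Solutions:

Problem 1: ~$ pip install redis installs the latest release, redis-3.0, and Ray 0.5.3 does not work with that version of the redis package.

Solution 1: downgrade with ~$ pip install -U redis==2.10.6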
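Before downgrading, it can help to confirm which redis-py version is actually installed in the environment. The following is a minimal sketch of such a check. The helper names `parse_version` and `works_with_ray_053` are made up for illustration, and the "major version below 3" rule is only the observation from this post, not an official compatibility statement.

```python
def parse_version(version):
    """Turn a string such as '2.10.6' into the tuple (2, 10, 6)."""
    parts = []
    for piece in version.split("."):
        # Keep only the digits so suffixes like 'rc1' do not break int().
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)


def works_with_ray_053(redis_py_version):
    """True when the redis-py major version is below 3 (assumption from this post)."""
    return parse_version(redis_py_version) < (3,)


print(works_with_ray_053("2.10.6"))
print(works_with_ray_053("3.0.0"))
```

In a real environment the argument would be `redis.__version__` after `import redis`.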

 

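Problem 2: Passing redis_address tells ray.init to connect to an existing Ray cluster, so that cluster must already be running and bound to Redis at that address.

Solution 2: $ ray start --redis-address 127.0.0.1:6379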
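"Connection refused" in Error 2 means nothing was listening at the address given to ray.init. A quick way to check this before calling ray.init is to try a plain TCP connection. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, not part of Ray or redis-py.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


print(port_is_open("127.0.0.1", 6379))
```

If this prints False, start the cluster (or the Redis server) first and only then pass redis_address to ray.init.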

 

Problem 3:

redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

ConnectionResetError: [Errno 104] Connection reset by peer   --- the data being sent was too large, so the server reset the connection

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/client.py", line 1032, in keys
    return self.execute_command('KEYS', pattern)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

Solution:

~$ sudo apt-get install -f redis-server

Test that Redis is available (the output shows no error):

>>> import redis
>>> conn = redis.Redis()
>>> conn.keys('*')
[]

However, the problem persisted in Ray.

Cause:

      The amount of data being sent was too large. Trying again with a small dataset resolved the problem.
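To see whether an object is likely to be "too large" before handing it to Ray, one option is to measure its serialized size. The sketch below uses the standard pickle module as a rough stand-in for Ray's own serialization. The helper names and the default `limit_bytes` value are arbitrary placeholders chosen for illustration, not a Redis or Ray constant.

```python
import pickle


def payload_size_bytes(obj):
    """Size of obj once pickled; only a rough proxy for what Ray would send."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def check_payload(obj, limit_bytes=64 * 1024 * 1024):
    """Return (fits, size): whether obj stays under the placeholder limit."""
    size = payload_size_bytes(obj)
    return size <= limit_bytes, size


fits, size = check_payload({"lr": 0.01, "batch_size": 32})
print(fits, size)
```

A config dict of hyperparameters is a few dozen bytes; an entire training set embedded in the same dict is what pushes the payload into problem territory.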

      Check the log:

byz@ubuntu:~$ cat /tmp/raylogs/redis-0-2018-11-29_16-36-35-01614.out
39660:M 29 Nov 16:36:35.321 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
39660:M 29 Nov 16:36:35.321 # Server started, Redis version 3.9.102
39660:M 29 Nov 16:36:35.321 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
39660:M 29 Nov 16:36:35.321 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
39660:signal-handler (1543480675) Received SIGTERM scheduling shutdown...
39660:M 29 Nov 16:37:55.635 # User requested shutdown...
39660:M 29 Nov 16:37:55.635 # Redis is now ready to exit, bye bye...

 

Fixes for the warnings in the log:

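First warning: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

Meaning: the TCP backlog value of 511 could not be applied because /proc/sys/net/core/somaxconn is set to the smaller value 128.

This kernel parameter usually defaults to 128, which is far too small for heavily loaded server programs. It is commonly raised to 2048 or higher.

Temporary fix (the value has to be set again after the next reboot):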

echo 511 > /proc/sys/net/core/somaxconn

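Permanent fix (the value does not have to be set again after a reboot):

Add the following line to /etc/sysctl.conf (for /etc/rc.local, use the echo command above instead):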

     net.core.somaxconn = 2048

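The backlog parameter actually controls the size of the accept queue, which holds connections that have completed the three-way handshake but have not yet been accepted by the application.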
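The capping behaviour behind this warning can be sketched in a few lines: whatever backlog a server requests, the kernel limits it to net.core.somaxconn. The helpers `read_kernel_int` and `effective_backlog` below are hypothetical names for illustration, and the 511 is simply the value from the Redis log above.

```python
def read_kernel_int(path):
    """Read an integer kernel setting; return None if the file is unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """The kernel caps a listen() backlog at net.core.somaxconn."""
    if somaxconn is None:
        # Setting could not be read (for example, not on Linux).
        return requested
    return min(requested, somaxconn)


somaxconn = read_kernel_int("/proc/sys/net/core/somaxconn")
print("somaxconn:", somaxconn)
print("effective backlog for a requested 511:", effective_backlog(511, somaxconn))
```

With the default of 128, the requested 511 is cut to 128, which is exactly what the warning reports.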

Reference: a detailed explanation of backlog in Linux

 

Second warning: overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

Meaning: overcommit_memory is set to 0, so a background save may fail when memory is low. The recommendation is to set vm.overcommit_memory to 1 in /etc/sysctl.conf.

Temporary fix (as root): sysctl vm.overcommit_memory=1

Permanent fix: add the line vm.overcommit_memory = 1 to /etc/sysctl.conf. Append it to the file; redirecting with a single > would overwrite the whole file.

Reference: on the Redis overcommit_memory issue under Linux

 

Third warning: you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

Meaning: Transparent Huge Pages are enabled, which can cause latency and memory usage problems for Redis.

Temporary fix:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
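To confirm that the change took effect, read the same file back: the kernel lists all modes and marks the active one with square brackets, for example `always madvise [never]`. The following minimal sketch extracts the active mode. `active_thp_mode` and `read_thp_mode` are hypothetical helper names for illustration.

```python
import re


def active_thp_mode(text):
    """Return the bracketed (active) mode from the THP 'enabled' file contents."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the active THP mode; return None if the file cannot be read."""
    try:
        with open(path) as handle:
            return active_thp_mode(handle.read())
    except OSError:
        return None


print(active_thp_mode("always madvise [never]"))
print(read_thp_mode())
```

After the echo command above, the second call should report never on a Linux machine with THP support.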

Permanent fix:

      THP is not a sysctl setting, so add the following to /etc/rc.local (not /etc/sysctl.conf) to keep the setting after a reboot:

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

      Redis must be restarted after THP is disabled:

      ~$ systemctl status redis-server.service

      ~$ sudo /etc/init.d/redis-server restart

Finally, reload the settings written to /etc/sysctl.conf by running this in a terminal:

     $ sysctl -p

Reference: an introduction to Transparent Huge Pages

            

      

      

Problem 4:

Traceback (most recent call last):
  File "<stdin>", line 35, in <module>
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/tune.py", line 118, in run_experiments
    raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', )

Solution:

       Haha, after checking the code carefully, it turned out the label format for the neural network was wrong.

 

Problem 5:

redis.exceptions.ConnectionError: Error 104 while writing to socket. Connection reset by peer.

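Description: This error occurs when PySpark (spark = SparkSession.. ) and the data loading are placed outside the Ray code as a data source and the result is then passed into the Ray code.

Solution: Move the Spark initialization into Ray's initialization (_setup(self, config)) and define a data-loading function inside the Ray class.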
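The shape of that fix can be sketched without Ray or Spark installed. In the stand-in below, the class only receives small config values and builds its data source inside `_setup`, so nothing large has to be passed in from the driver script. `FabricTrainable`, `_open_source`, `_fetch_rows` and the config keys are all hypothetical names. In real code the class would subclass Tune's Trainable, and the placeholders would create the SparkSession and run the query.

```python
class FabricTrainable:
    """Stand-in for a ray.tune Trainable subclass; the name is hypothetical."""

    def __init__(self, config):
        # Assumption: Tune runs _setup when it builds the trainable on a worker.
        # The stand-in simply calls it directly.
        self._setup(config)

    def _setup(self, config):
        # Create the data source here, inside the worker process, instead of
        # building it in the driver script and passing the result in.
        self.source = self._open_source(config["table"])
        self.rows = self._fetch_rows(config["limit"])

    def _open_source(self, table):
        # Placeholder for creating the SparkSession.
        return {"table": table}

    def _fetch_rows(self, limit):
        # Placeholder for the Spark query; returns synthetic (id, label) rows.
        return [(i, i % 2) for i in range(limit)]


config = {"table": "fabric_images", "limit": 4}
trainable = FabricTrainable(config)
print(trainable.rows)
```

Only the small `config` dict crosses the process boundary in this layout, which is consistent with the "data too large" cause found in Problem 3.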

Traceback (most recent call last):
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 590, in send_packed_command
    self._sock.sendall(item)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/client.py", line 667, in execute_command
    connection.send_command(*args)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
redis.exceptions.ConnectionError: Error 104 while writing to socket. Connection reset by peer.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 590, in send_packed_command
    self._sock.sendall(item)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pbt_fabric_8/Fabric_Ray_inception_resnet_v2.py", line 431, in <module>
    trials = tune.run_experiments({"pbt_fabric_8": train_spec}, scheduler=pbt, verbose=0) #verbose=False
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/tune.py", line 93, in run_experiments
    search_alg.add_configurations(experiments)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/suggest/basic_variant.py", line 38, in add_configurations
    experiment_list = convert_to_experiment_list(experiments)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/experiment.py", line 202, in convert_to_experiment_list
    for name, spec in experiments.items()
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/experiment.py", line 202, in <listcomp>
    for name, spec in experiments.items()
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/experiment.py", line 139, in from_json
    exp = cls(name, run_value, **spec)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/experiment.py", line 94, in __init__
    "run": self._register_if_needed(run),
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/experiment.py", line 173, in _register_if_needed
    register_trainable(name, run_object)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/registry.py", line 38, in register_trainable
    _global_registry.register(TRAINABLE_CLASS, name, trainable)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/registry.py", line 79, in register
    self.flush_values()
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/tune/registry.py", line 101, in flush_values
    _internal_kv_put(_make_key(category, key), value, overwrite=True)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/experimental/internal_kv.py", line 42, in _internal_kv_put
    updated = worker.redis_client.hset(key, "value", value)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/ray/utils.py", line 404, in _wrapper
    return orig_attr(*args, **kwargs)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/client.py", line 1992, in hset
    return self.execute_command('HSET', name, key, value)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/byz/anaconda3/envs/py35/lib/python3.5/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
redis.exceptions.ConnectionError: Error 104 while writing to socket. Connection reset by peer.

 

 

 

 

 

 

 
