Approach
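- Reading large volumes of data from Redis is faster with a pipeline, because the commands are sent in batches rather than one round trip per key.
- After reading, the failed reads must be removed. With raise_on_error=False the pipeline does not raise an error; instead, the failed entry comes back as a result for which type(line).__name__ == "ResponseError".
- Redis returns data as bytes, so it must be converted to str.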
Example
import redis
import pandas as pd
from tqdm import tqdm
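# Connect to a local Redis instance (database 5)
pool = redis.ConnectionPool(host='127.0.0.1', db=5)
redis_cli = redis.Redis(connection_pool=pool)
# Write three sample hashes that expire after 30 seconds.
# hset(..., mapping=...) replaces hmset, which is deprecated in recent redis-py releases.
redis_cli.hset("value1", mapping={"k1": "v1", "k2": "v2", "k3": "v3"})
redis_cli.expire('value1', 30)
redis_cli.hset("value2", mapping={"k1": "v1", "k2": "v2", "k3": "v3"})
redis_cli.expire('value2', 30)
redis_cli.hset("value3", mapping={"k1": "v1", "k2": "v2", "k3": "v3"})
redis_cli.expire('value3', 30)
# A plain string key: HGETALL on it fails, which produces the error entry filtered out later
redis_cli.set('奇奇怪怪的key', '奇奇怪怪的value')
redis_cli.expire('奇奇怪怪的key', 30)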
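# Queue one HGETALL per key; nothing is sent until execute() is called
pipe = redis_cli.pipeline()
key_list = []
keys = redis_cli.keys()
for key in keys:
    key_list.append(key)
    pipe.hgetall(key)
# With raise_on_error=False, failed commands are returned as exception objects
# in the result list instead of being raised
value_list = pipe.execute(raise_on_error=False)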
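drop_index = []
for i, line in tqdm(enumerate(value_list), total=len(value_list)):
    if type(line).__name__ == "ResponseError":
        # Failed read (for example HGETALL on a string key): mark it for removal
        drop_index.append(i)
        continue
    else:
        # Convert both field names and values from bytes to str
        value_list[i] = {k.decode('utf8'): v.decode('utf8') for k, v in line.items()}
# Pop from the end so that earlier removals do not shift the remaining indexes
for index in reversed(drop_index):
    value_list.pop(index)
    key_list.pop(index)
df = pd.DataFrame(value_list)
print(df)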
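
The cleanup step can also be pulled out into a small helper, which makes it easy to check without a running Redis server. The following is only a minimal sketch: clean_pipeline_results is a hypothetical name, not part of redis-py, and it assumes that every failed command appears in the result list as an exception instance, which is how the pipeline behaves with raise_on_error=False. The sample data stands in for real pipeline output.

```python
from typing import Any, Dict, List, Tuple


def clean_pipeline_results(
    keys: List[bytes], results: List[Any], encoding: str = "utf8"
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Drop failed replies and decode the remaining hashes from bytes to str.

    Hypothetical helper for illustration; not part of redis-py.
    """
    kept_keys: List[str] = []
    rows: List[Dict[str, str]] = []
    for key, result in zip(keys, results):
        # Assumption: a failed command is returned as an exception instance
        # (for example redis.ResponseError) rather than being raised.
        if isinstance(result, Exception):
            continue
        kept_keys.append(key.decode(encoding))
        rows.append(
            {k.decode(encoding): v.decode(encoding) for k, v in result.items()}
        )
    return kept_keys, rows


# Simulated pipeline output: two hashes with one failed reply in between
sample_keys = [b"value1", b"odd_key", b"value2"]
sample_results = [
    {b"k1": b"v1"},
    Exception("WRONGTYPE Operation against a key holding the wrong kind of value"),
    {b"k1": b"v2"},
]
kept, rows = clean_pipeline_results(sample_keys, sample_results)
print(kept)
print(rows)
```

Building new lists instead of popping by index avoids index-shifting mistakes and keeps the keys aligned with their rows, so pd.DataFrame(rows, index=kept) would label each row with its Redis key. If decoding by hand is not needed, creating the connection with decode_responses=True should make redis-py return str directly.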