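(I) Problem description: with close to a hundred million records to load, how can the data be written into Redis quickly? This turns out to be a fairly tricky problem.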
Approach (1): write the records one command at a time
Approach (2): use pipelining
Approach (3): use the redis-cli pipe mode (--pipe):
cat data.txt | redis-cli --pipe
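In terms of time taken, the three approaches rank (1) > (2) > (3): approach (1) is the slowest and approach (3) the fastest. Approach (1) is slow because each command is synchronous. The client sends a command and waits for its reply before sending the next one, so every command pays a full network round trip, and with this many records the round trips dominate the total time. Approach (2) sends many commands without waiting for the individual replies and reads the replies afterwards, which removes most of that per-command waiting. Approach (3) is faster than approach (2) because redis-cli --pipe sends commands that are already encoded in the raw Redis protocol straight to the server, and it keeps writing while it reads the replies.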
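To put rough numbers on the round-trip argument, the following Python sketch counts only the time spent waiting on the network. The RTT of 0.25 ms and the batch size of 10,000 are hypothetical values chosen for illustration, not measurements, and server processing time is deliberately ignored.

```python
def round_trips_one_by_one(n_commands: int) -> int:
    # Approach (1): each command waits for its own reply, so every command
    # costs one full network round trip.
    return n_commands


def round_trips_pipelined(n_commands: int, batch_size: int) -> int:
    # Approach (2): a whole batch is sent before its replies are read, so one
    # round trip is paid per batch. Integer ceiling division avoids float error.
    return (n_commands + batch_size - 1) // batch_size


def network_wait_seconds(round_trips: int, rtt_ms: float) -> float:
    # Only time spent waiting on the network; server-side work is ignored.
    return round_trips * rtt_ms / 1000.0


if __name__ == "__main__":
    n = 100_000_000  # about the "hundred million" records in the problem statement
    rtt_ms = 0.25    # hypothetical round-trip time on a fast LAN
    batch = 10_000   # hypothetical pipeline batch size

    one_by_one = network_wait_seconds(round_trips_one_by_one(n), rtt_ms)
    pipelined = network_wait_seconds(round_trips_pipelined(n, batch), rtt_ms)
    print(f"one by one: {one_by_one:,.0f} s of waiting")  # 25,000 s, roughly 7 hours
    print(f"pipelined:  {pipelined:,.1f} s of waiting")   # 2.5 s
```

Even under these optimistic assumptions, the one-by-one approach spends hours just waiting for replies, while the pipelined approach pays that cost only once per batch.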
The official documentation on approach (1):
Redis is a TCP server using the client-server model and what is called a Request/Response protocol.
This means that usually a request is accomplished with the following steps:
- The client sends a query to the server, and reads from the socket, usually in a blocking way, for the server response.
- The server processes the command and sends the response back to the client.
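The request/response steps above can be sketched in Python with a raw socket. This is a minimal illustration of approach (1), not a real client: encode_command, set_one_by_one, fake_redis_server and run_demo are hypothetical names, and the fake server only skips over each command and answers +OK, so the example runs without a Redis instance. Commands are encoded in RESP, the Redis protocol, as arrays of bulk strings.

```python
import socket
import threading
from typing import Dict, List


def encode_command(*args: str) -> bytes:
    # RESP array of bulk strings: *<argc>\r\n, then $<byte length>\r\n<arg>\r\n per argument.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def set_one_by_one(sock: socket.socket, items: Dict[str, str]) -> List[bytes]:
    # Approach (1): send one SET, block until its reply arrives, then send the next.
    # Only single-line replies (such as +OK from SET) are handled here.
    replies = []
    with sock.makefile("rb") as reader:
        for key, value in items.items():
            sock.sendall(encode_command("SET", key, value))
            replies.append(reader.readline())  # blocking read: one round trip per command
    return replies


def fake_redis_server(conn: socket.socket, n_commands: int) -> None:
    # Hypothetical stand-in for a Redis server: it skips over each RESP command
    # without executing it and answers +OK.
    with conn.makefile("rb") as reader:
        for _ in range(n_commands):
            argc = int(reader.readline()[1:].strip())        # b"*3\r\n" -> 3
            for _ in range(argc):
                length = int(reader.readline()[1:].strip())  # b"$3\r\n" -> 3
                reader.read(length + 2)                      # argument plus trailing \r\n
            conn.sendall(b"+OK\r\n")


def run_demo(items: Dict[str, str]) -> List[bytes]:
    # Connects the client to the fake server over a local socket pair.
    client, server = socket.socketpair()
    with client, server:
        worker = threading.Thread(target=fake_redis_server, args=(server, len(items)))
        worker.start()
        replies = set_one_by_one(client, items)
        worker.join()
    return replies


if __name__ == "__main__":
    print(run_demo({"key:1": "a", "key:2": "b"}))  # [b'+OK\r\n', b'+OK\r\n']
```

The readline call inside the loop is the blocking read described in the first step: nothing else is sent until the previous reply has come back.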
The official documentation on approach (2):
A Request/Response server can be implemented so that it is able to process new requests even if the client didn't already read the old responses. This way it is possible to send multiple commands to the server without waiting for the replies at all, and finally read the replies in a single step.
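Pipelining (approach (2)) changes only the client loop: a whole batch of commands is written before any reply is read, so one round trip is paid per batch. The Python sketch below repeats the same hypothetical helpers so it runs on its own against a fake server; the default batch size of 1000 is an arbitrary assumption. Bounded batches matter because a client that writes all commands before reading anything can stall once the socket buffers fill with unread replies, which is the limitation the documentation raises for mass insertion.

```python
import socket
import threading
from typing import Dict, List


def encode_command(*args: str) -> bytes:
    # RESP array of bulk strings: *<argc>\r\n, then $<byte length>\r\n<arg>\r\n per argument.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def set_pipelined(sock: socket.socket, items: Dict[str, str], batch_size: int = 1000) -> List[bytes]:
    # Approach (2): write a whole batch without waiting, then read its replies together.
    pairs = list(items.items())
    replies: List[bytes] = []
    with sock.makefile("rb") as reader:
        for start in range(0, len(pairs), batch_size):
            batch = pairs[start:start + batch_size]
            sock.sendall(b"".join(encode_command("SET", k, v) for k, v in batch))
            # One round trip per batch: read exactly one single-line reply per command sent.
            replies.extend(reader.readline() for _ in batch)
    return replies


def fake_redis_server(conn: socket.socket, n_commands: int) -> None:
    # Hypothetical stand-in for a Redis server: skips each RESP command and answers +OK.
    with conn.makefile("rb") as reader:
        for _ in range(n_commands):
            argc = int(reader.readline()[1:].strip())
            for _ in range(argc):
                length = int(reader.readline()[1:].strip())
                reader.read(length + 2)  # argument plus trailing \r\n
            conn.sendall(b"+OK\r\n")


def run_demo(items: Dict[str, str], batch_size: int = 1000) -> List[bytes]:
    client, server = socket.socketpair()
    with client, server:
        worker = threading.Thread(target=fake_redis_server, args=(server, len(items)))
        worker.start()
        replies = set_pipelined(client, items, batch_size)
        worker.join()
    return replies


if __name__ == "__main__":
    data = {f"key:{i}": f"value:{i}" for i in range(5)}
    print(run_demo(data, batch_size=2))  # five b'+OK\r\n' replies, read in 3 batches
```

Compared with the one-by-one loop, the only difference is where the reads happen: after each batch instead of after each command.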
The official documentation on approach (3):
Using a normal Redis client to perform mass insertion is not a good idea for a few reasons: the naive approach of sending one command after the other is slow because you have to pay for the round trip time for every command. It is possible to use pipelining, but for mass insertion of many records you need to write new commands while you read replies at the same time to make sure you are inserting as fast as possible.
Only a small percentage of clients support non-blocking I/O, and not all the clients are able to parse the replies in an efficient way in order to maximize throughput. For all these reasons the preferred way to mass import data into Redis is to generate a text file containing the Redis protocol, in raw format, in order to call the commands needed to insert the required data.
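The raw-protocol file that redis-cli --pipe reads can be generated with a short script. This Python sketch writes one RESP-encoded SET per record; records() is a hypothetical data source standing in for the real data, and data.txt matches the file name used in approach (3).

```python
from typing import Iterable, Iterator, Tuple


def encode_command(*args: str) -> bytes:
    # RESP array of bulk strings: *<argc>\r\n, then $<byte length>\r\n<arg>\r\n per argument.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def records(n: int) -> Iterator[Tuple[str, str]]:
    # Hypothetical data source; replace with the real records to import.
    for i in range(n):
        yield f"key:{i}", f"value:{i}"


def write_protocol_file(path: str, pairs: Iterable[Tuple[str, str]]) -> int:
    # Streams SET commands in raw protocol form to `path` one record at a time,
    # so ~10^8 records never have to be held in memory. Returns the command count.
    count = 0
    with open(path, "wb") as out:  # binary mode keeps \r\n unchanged on every OS
        for key, value in pairs:
            out.write(encode_command("SET", key, value))
            count += 1
    return count


if __name__ == "__main__":
    written = write_protocol_file("data.txt", records(1_000))
    print(f"wrote {written} commands; now run: cat data.txt | redis-cli --pipe")
```

Lengths are counted in UTF-8 bytes rather than characters, so non-ASCII values still produce a valid protocol stream. The generated file is then loaded exactly as shown in approach (3).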