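Introduction
This post walks through a hands-on project built on Redis Streams.
We pull tweets from Twitter, pick out the influencers among their authors, and store them by category.
The project therefore has two ends: the Twitter ingest stream and the Twitter influencer classifier.
The first reads the data in; the second consumes it.
Both ends work on the stream data type.
The benefit of using a stream:
Production and consumption are asynchronous; the consumer simply waits until the producer has added data.
The data lifecycle:
This is how a consumer group consumes data from a stream.
The "safety" in the figure above refers to the XACK command. An entry delivered to a consumer stays in the group's pending list until it is acknowledged, so data is not lost if a consumer fails midway.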
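To make the acknowledge step concrete, here is a minimal in-memory sketch of the lifecycle. It is only a toy model for illustration, not how Redis implements consumer groups: the class ConsumerGroupModel and its methods are hypothetical names, entry positions stand in for entry IDs, and it models a single consumer only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of one consumer group with one consumer; NOT the Redis implementation.
final class ConsumerGroupModel {
    private final List<String> stream = new ArrayList<>();               // entries in arrival order
    private final Map<Integer, String> pending = new LinkedHashMap<>();  // delivered, not yet acknowledged
    private int nextToDeliver = 0;                                       // position of the next undelivered entry

    // Producer side: append an entry and return its position (stands in for the entry ID).
    int add(String payload) {
        stream.add(payload);
        return stream.size() - 1;
    }

    // Consumer side: deliver the next never-delivered entry, or return -1 if there is none.
    int readNew() {
        if (nextToDeliver >= stream.size()) {
            return -1;
        }
        int id = nextToDeliver++;
        pending.put(id, stream.get(id));   // delivered entries are tracked until acknowledged
        return id;
    }

    // Acknowledge an entry (the role XACK plays); returns true if it was pending.
    boolean ack(int id) {
        return pending.remove(id) != null;
    }

    // Entries that were delivered but never acknowledged, e.g. after a consumer crash.
    List<Integer> pendingIds() {
        return new ArrayList<>(pending.keySet());
    }

    public static void main(String[] args) {
        ConsumerGroupModel group = new ConsumerGroupModel();
        group.add("tweet-1");
        group.add("tweet-2");
        int first = group.readNew();   // delivered and processed
        group.ack(first);
        group.readNew();               // delivered, but the consumer "crashes" before acknowledging
        System.out.println("still pending: " + group.pendingIds());
    }
}
```

The unacknowledged entry remains in the pending list, which is what lets a restarted consumer pick it up again instead of losing it.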
Once consumed, the data is classified and stored using two data types, sorted set and hash:
a sorted set holds the influencers, and a hash holds the details of each one.
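The layout can be pictured with plain Java collections. This is a rough sketch under assumptions, not how Redis stores the data: InfluencerLayoutModel is a hypothetical class, the accounts are made up, and the key names "influencers" and "influencer:<screenName>" follow the ones used in the processing code later in this post.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory picture of the two Redis structures; NOT how Redis stores them.
final class InfluencerLayoutModel {
    // Stands in for the sorted set "influencers": member (screen name) -> score (follower count).
    private final Map<String, Double> ranking = new HashMap<>();
    // Stands in for the hashes "influencer:<screenName>": key -> field/value pairs.
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    void store(String screenName, String name, int followers) {
        // Like adding to a sorted set: adding an existing member again updates its score.
        ranking.put(screenName, (double) followers);
        // Simplification: this replaces the whole field map for the key.
        Map<String, String> fields = new HashMap<>();
        fields.put("name", name);
        fields.put("followers_count", Integer.toString(followers));
        hashes.put("influencer:" + screenName, fields);
    }

    // Screen names ordered by follower count, highest first, limited to n.
    List<String> top(int n) {
        List<String> names = new ArrayList<>(ranking.keySet());
        names.sort(Comparator.comparingDouble((String s) -> ranking.get(s)).reversed());
        return names.subList(0, Math.min(n, names.size()));
    }

    // Details of one influencer, looked up by the hash key.
    Map<String, String> details(String screenName) {
        return hashes.get("influencer:" + screenName);
    }

    public static void main(String[] args) {
        InfluencerLayoutModel model = new InfluencerLayoutModel();
        model.store("alice_dev", "Alice", 25000);    // hypothetical accounts
        model.store("bob_news", "Bob", 120000);
        System.out.println("top: " + model.top(2));
        System.out.println("details: " + model.details("alice_dev"));
    }
}
```

The sorted set answers "who are the biggest influencers" by score, while the per-influencer hash answers "what do we know about this one".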
Environment and setup
All the classes:
The project is managed with Maven. The required dependencies:
<!-- https://mvnrepository.com/artifact/com.pubnub/pubnub-gson -->
<dependency>
<groupId>com.pubnub</groupId>
<artifactId>pubnub-gson</artifactId>
<version>4.19.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.lettuce/lettuce-core -->
<dependency>
<groupId>io.lettuce</groupId>
<artifactId>lettuce-core</artifactId>
<version>5.2.2.RELEASE</version>
</dependency>
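One is a Redis client (Lettuce); the other is the API for working with PubNub.
The Twitter data is fetched from PubNub.
First, register a PubNub account.
Then find the Twitter channel (pubnub-twitter), because the code subscribes to it:
It gives you a subscribe key:
It also provides sample JavaScript code; the Java version works the same way.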
1. Load the PubNub javascript SDK:
<script src="https://cdn.pubnub.com/sdk/javascript/pubnub.4.3.2.min.js"></script>
2. Init, Listen, and Subscribe!
var pubnub = new PubNub({
subscribe_key: 'sub-c-78806dd4-42a6-11e4-aed8-02ee2ddab7fe'
});
pubnub.addListener({
message: function(message) {
console.log(message.message); }
});
pubnub.subscribe({
channels: ['pubnub-twitter']
});
Preparing the consumer group
First, connect to the Redis server:
/**
* This is a wrapper class around Lettuce library.
*
*/
public class LettuceConnection{
private RedisClient client = null;
private StatefulRedisConnection<String, String> connection = null;
private LettuceConnection() {
}
public synchronized static LettuceConnection getInstance() throws Exception{
LettuceConnection lettuceConnection = new LettuceConnection();
lettuceConnection.init();
return lettuceConnection;
}
private void init() throws Exception{
try {
// Make sure to change the URL if it is different in your case
client = RedisClient.create("redis://hostname:port");
connection = client.connect();
}catch(Exception e) {
e.printStackTrace();
throw e;
}
}
public StatefulRedisConnection<String, String> getRedisConnection() throws Exception{
if(connection == null) {
this.init();
}
return connection;
}
public RedisCommands<String, String> getRedisCommands() throws Exception{
if(connection == null) {
this.init();
}
return connection.sync();
}
public void close() throws Exception{
if(connection != null) {
connection.close();
}
if(client != null) {
client.shutdown();
}
}
}
Remember to replace hostname and port with your own.
Next we prepare the stream to consume from and the group attached to it:
/**
* Redis Stream, in general, doesn't require initialization. In our demo,
* we show how you could use a consumer group to read the data. By default,
* XGROUP CREATE fails when the stream key does not exist (unless the
* MKSTREAM option is given). Therefore, we add a line of dummy data to the
* stream and then create the consumer group.
*
* IMPORTANT: Run this program only once before running other programs.
*
*/
public class InitializeConsumerGroup{
public static final String STREAM_ID = "twitterstream";
public static final String GROUP_ID = "influencer";
private static LettuceConnection connection = null;
private static RedisCommands<String, String> commands = null;
public static void main(String[] args) throws Exception{
connection = LettuceConnection.getInstance();
commands = connection.getRedisCommands();
initStream();
initGroup();
}
private static void initStream() throws Exception{
// TYPE returns "none" (not null) when the key does not exist
String type = commands.type(STREAM_ID);
if(!"stream".equals(type)) {
if(!"none".equals(type)) {
// The key holds some other data type; remove it first
commands.del(STREAM_ID);
}
addRawData();
}
}
private static void addRawData() throws Exception{
HashMap<String, String> map = new HashMap<String, String>();
map.put("start", "stream");
commands.xadd(STREAM_ID, map);
}
private static void initGroup() {
try {
commands.xgroupCreate(XReadArgs.StreamOffset.latest(STREAM_ID), GROUP_ID);
}catch(Exception e) {
System.out.println(e.getMessage());
}
}
}
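By default a group cannot be created on a stream key that does not exist, so we add one entry to the stream first.
Run the main method; the redis-server now has the twitterstream key:
Producing data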
/**
* IngestStream class allows you to write data to a Redis Stream.
* You can run this class to test whether you have the right version
* of Redis that supports Redis Stream. Typically you extend IngestStream
* to provide your own implementation. For example, TwitterIngestStream
* extends IngestStream.
*
*/
public class IngestStream{
protected String streamId = null;
protected LettuceConnection connection = null;
protected RedisCommands<String, String> commands = null;
// Hide the constructor and force external objects to instantiate
// via the factory method
protected IngestStream() {
}
// Factory method to instantiate the object. This method instantiates the object
// and creates the connection to the Redis database
public synchronized static IngestStream getInstance(String streamId) throws Exception{
IngestStream ingestStream = new IngestStream();
ingestStream.streamId = streamId;
ingestStream.init();
return ingestStream;
}
// Initializes the Lettuce library
protected void init() throws Exception{
connection = LettuceConnection.getInstance();
commands = connection.getRedisCommands();
}
// Adds the key-value pair as the stream data
// In Redis Stream, you could pass multiple key-value pairs
// for a single data object. For simplicity, we will save one
// object per line.
public void add(String key, String message) throws Exception{
commands.xadd(streamId, key, message);
}
// Use this for testing only
public static void main(String[] args) throws Exception{
IngestStream ingest = IngestStream.getInstance("mystream");
for(int i=20; i<30; i++) {
ingest.add("k"+i, "v"+i);
}
}
}
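This is a general-purpose class for producing data; extend it to implement your own way of producing data.
Run the main method to test it:
This confirms that we can indeed use it to add entries to a stream.
Now we connect to PubNub to fetch the data: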
/**
* This is the main producer class. When you run this program, it collects
* Twitter data from the PubNub channel and adds them to the Redis Stream
*
*/
public class TwitterIngestStream extends IngestStream{
// Follow instructions on PubNub to get your own key
final static String SUB_KEY_TWITTER = "sub-c-78806dd4-42a6-11e4-aed8-02ee2ddab7fe"; // Change the key
final static String CHANNEL_TWITTER = "pubnub-twitter";
// Factory method
public synchronized static TwitterIngestStream getInstance(String streamId) throws Exception{
TwitterIngestStream ingestStream = new TwitterIngestStream();
ingestStream.streamId = streamId;
ingestStream.init();
return ingestStream;
}
// Making the constructor private to force creating new objects through the factory method
private TwitterIngestStream() {
}
// The main method
public static void main(String[] args) throws Exception{
TwitterIngestStream twitterIngestStream = TwitterIngestStream.getInstance(InitializeConsumerGroup.STREAM_ID);
twitterIngestStream.start();
}
// Following PubNub's example
public void start() throws Exception{
final TwitterIngestStream ingestStream = this;
PNConfiguration pnConfig = new PNConfiguration();
pnConfig.setSubscribeKey(SUB_KEY_TWITTER);
pnConfig.setSecure(false);
PubNub pubnub = new PubNub(pnConfig);
pubnub.subscribe().channels(Arrays.asList(CHANNEL_TWITTER)).execute();
// PubNub event callback
SubscribeCallback subscribeCallback = new SubscribeCallback() {
@Override
public void status(PubNub pubnub, PNStatus status) {
if (status.getCategory() == PNStatusCategory.PNUnexpectedDisconnectCategory) {
// internet got lost, do some magic and call reconnect when ready
pubnub.reconnect();
} else if (status.getCategory() == PNStatusCategory.PNTimeoutCategory) {
// do some magic and call reconnect when ready
pubnub.reconnect();
} else {
System.out.println(status.toString());
}
}
// Receive the message and add to the RedisStream
@Override
public void message(PubNub pubnub, PNMessageResult message) {
try{
JsonObject json = message.getMessage().getAsJsonObject();
// Delete this line if you don't need this log
System.out.println(json.toString());
// Each line or data entry of a Redis Stream is a collection of key-value pairs
// For simplicity, we store only one key-value pair per line. "tweet" is the key
// for each line. Note, that it's not the entry id, because Redis Streams
// autogenerates the entry id.
//
// Example of a Redis Stream:
// twitterstream
// 1837847490983-0 tweet {....}
// 1837847490984-0 tweet {....}
// 1837847490986-0 tweet {....}
// 1837847490987-0 tweet {....}
ingestStream.add("tweet", json.toString());
}catch(Exception e){
e.printStackTrace();
}
}
@Override
public void presence(PubNub pubnub, PNPresenceEventResult presence) {
}
};
// Add callback as a listener (PubNub code)
pubnub.addListener(subscribeCallback);
}
}
The SUB_KEY_TWITTER value comes from PubNub's Twitter stream.
Run the main method; the console keeps printing the messages it receives, each structured like this:
{
"created_at":"Fri Jun 12 06:55:40 +0000 2020",
"id":1271335340767355000,
"id_str":"1271335340767354880",
"text":"#leeminhessa",
"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
"truncated":false,
"in_reply_to_status_id":null,
"in_reply_to_status_id_str":null,
"in_reply_to_user_id":null,
"in_reply_to_user_id_str":null,
"in_reply_to_screen_name":null,
"user":Object{...},
"geo":null,
"coordinates":null,
"place":Object{...},
"contributors":null,
"quoted_status_id":1271329242563768300,
"quoted_status_id_str":"1271329242563768321",
"quoted_status":Object{...},
"quoted_status_permalink":{
"url":"https://t.co/Cat3CE9r7g",
"expanded":"https://twitter.com/ActorLeeMinHo/status/1271329242563768321",
"display":"twitter.com/ActorLeeMinHo/…"
},
"is_quote_status":true,
"quote_count":0,
"reply_count":0,
"retweet_count":0,
"favorite_count":0,
"entities":Object{...},
"favorited":false,
"retweeted":false,
"filter_level":"low",
"lang":"und",
"timestamp_ms":"1591944940164"
}
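This is the message; the consumer side will parse and store it.
What it looks like in the redis-server:
Each entry's ID is not random: Redis generates it automatically in the form <millisecond timestamp>-<sequence number>. The key of every entry is tweet, and the message is a JSON string.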
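Since an auto-generated entry ID has the form <milliseconds>-<sequence>, it can be split back into its parts. The helper below is a small illustrative sketch; StreamEntryId is a hypothetical class and not part of Lettuce, and it assumes both parts fit into a signed long, which holds for IDs based on real timestamps.

```java
import java.time.Instant;

// Splits a stream entry ID of the form "<milliseconds>-<sequence>" into its two parts.
final class StreamEntryId {
    final long millis;     // Unix time in milliseconds at which the entry was added
    final long sequence;   // distinguishes entries added within the same millisecond

    private StreamEntryId(long millis, long sequence) {
        this.millis = millis;
        this.sequence = sequence;
    }

    static StreamEntryId parse(String id) {
        int dash = id.indexOf('-');
        if (dash <= 0 || dash == id.length() - 1) {
            throw new IllegalArgumentException("not a stream entry ID: " + id);
        }
        // Assumption: both parts fit into a signed long.
        return new StreamEntryId(
                Long.parseLong(id.substring(0, dash)),
                Long.parseLong(id.substring(dash + 1)));
    }

    public static void main(String[] args) {
        StreamEntryId id = parse("1591944946001-0");
        System.out.println(Instant.ofEpochMilli(id.millis) + " seq=" + id.sequence);
    }
}
```

Because the first part is a timestamp, entries in a stream are naturally ordered by arrival time.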
Consuming data
The consumer side is more involved, because it also has to process the data and store it.
/**
* This is the consumer class that reads the data from RedisStream.
* In our example, InfluencerCollectorMain initiates StreamConsumer
* and starts it as a separate thread. The thread waits for a new
* message via a blocking call. It expires every 5 seconds and
* rechecks for a new message.
*
*/
public class StreamConsumer implements Runnable{
public static final String READ_FROM_START = "0";
public static final String READ_NEW = "$";
String streamId = null;
String groupId = null;
String consumerId = null;
String readFrom = READ_NEW;
MessageProcessor messageProcessor = null;
LettuceConnection connection = null;
RedisCommands<String, String> commands = null;
public StreamConsumer(String streamId, String groupId, String consumerId,
String readFrom, MessageProcessor messageProcessor) throws Exception{
this.streamId = streamId;
this.groupId = groupId;
this.consumerId = consumerId;
this.readFrom = readFrom;
this.messageProcessor = messageProcessor;
connection = LettuceConnection.getInstance();
commands = connection.getRedisCommands();
}
public void readStream() throws Exception{
boolean reachedEndOfTheStream = false;
while(!reachedEndOfTheStream) {
List<StreamMessage<String, String>> msgList = getNextMessageList();
if(msgList.size()==0) {
reachedEndOfTheStream = true;
}else {
processMessageList(msgList);
}
}
}
// Non-blocking call
private List<StreamMessage<String, String>> getNextMessageList() throws Exception{
return commands.xreadgroup(
Consumer.from(groupId, consumerId),
XReadArgs.Builder.count(1),
XReadArgs.StreamOffset.from(streamId, "0"));
}
// Blocking call; blocks for 5 seconds
private List<StreamMessage<String, String>> getNextMessageListBlocking() throws Exception{
return commands.xreadgroup(
Consumer.from(groupId, consumerId),
XReadArgs.Builder.count(1).block(Duration.ofSeconds(5)),
XReadArgs.StreamOffset.lastConsumed(streamId));
}
// processes the message and reports back to Redis Stream with XACK
private void processMessageList(List<StreamMessage<String, String>> msgList) {
if(msgList.size()> 0) {
Iterator itr = msgList.iterator();
while(itr.hasNext()) {
StreamMessage<String, String> message =
(StreamMessage<String, String>)itr.next();
Map<String, String> body = message.getBody();
String msgId = message.getId();
System.out.println("message id----->" + msgId);
System.out.println("message body---->" + body);
Iterator keyItr = body.keySet().iterator();
while(keyItr.hasNext()) {
String key = (String)keyItr.next();
String value = (String)body.get(key);
try {
messageProcessor.processMessage(value);
commands.xack(streamId, groupId, msgId);
}catch(Exception e) {
System.out.println(e.getMessage());
}
}
}
}
}
// volatile: written by close() on one thread, read by run() on another
private volatile boolean stopThread = false;
public void close() throws Exception{
stopThread = true;
if(connection != null) {
connection.close();
}
}
// This is helpful during the startup. It helps the consumer
// to catch up with the messages that it has not read so far
private boolean processPendingMessages() throws Exception{
boolean pendingMessages = true;
List<StreamMessage<String, String>> msgList = getNextMessageList();
if(msgList.size()!=0) {
processMessageList(msgList);
}else {
System.out.println("Done processing pending messages");
pendingMessages = false;
}
return pendingMessages;
}
// Read messages at runtime
private void processOngoingMessages() throws Exception{
List<StreamMessage<String, String>> msgList = getNextMessageListBlocking();
if(msgList.size()!=0) {
processMessageList(msgList);
}else {
System.out.println("******Group: "+groupId+" waiting. No new message*****");
}
}
// Thread function
@Override
public void run() {
try {
boolean pendingMessages = true;
while(pendingMessages) {
pendingMessages = processPendingMessages();
}
while(!stopThread) {
processOngoingMessages();
}
}catch(Exception e) {
e.printStackTrace();
}
}
}
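StreamConsumer is a lot of code (I show it in two parts).
Let's take it apart piece by piece.
First, it implements Runnable, so it is a task that some thread will run later.
Since it implements Runnable, we start with the overridden run method: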
// Thread function
@Override
public void run() {
try {
boolean pendingMessages = true;
while(pendingMessages) {
pendingMessages = processPendingMessages();
}
while(!stopThread) {
processOngoingMessages();
}
}catch(Exception e) {
e.printStackTrace();
}
}
It first enters a loop that keeps calling this method until it returns false:
processPendingMessages()
// This is helpful during the startup. It helps the consumer
// to catch up with the messages that it has not read so far
private boolean processPendingMessages() throws Exception{
boolean pendingMessages = true;
List<StreamMessage<String, String>> msgList = getNextMessageList();
if(msgList.size()!=0) {
processMessageList(msgList);
}else {
System.out.println("Done processing pending messages");
pendingMessages = false;
}
return pendingMessages;
}
That method in turn calls:
List<StreamMessage<String, String>> msgList = getNextMessageList();
// Non-blocking call
private List<StreamMessage<String, String>> getNextMessageList() throws Exception{
return commands.xreadgroup(
Consumer.from(groupId, consumerId),
XReadArgs.Builder.count(1),
XReadArgs.StreamOffset.from(streamId, "0"));
}
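Now we have finally reached code at the level of a Redis command.
This call reads one entry from twitterstream. Because the ID passed is "0", XREADGROUP returns entries that were already delivered to this consumer but not yet acknowledged, rather than new ones.
The result comes back through List<StreamMessage<String, String>> msgList = getNextMessageList(); as StreamMessage objects.
Then processMessageList(msgList); processes the data: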
// processes the message and reports back to Redis Stream with XACK
private void processMessageList(List<StreamMessage<String, String>> msgList) {
if(msgList.size()> 0) {
Iterator itr = msgList.iterator();
while(itr.hasNext()) {
StreamMessage<String, String> message =
(StreamMessage<String, String>)itr.next();
Map<String, String> body = message.getBody();
String msgId = message.getId();
System.out.println("message id----->" + msgId);
System.out.println("message body---->" + body);
Iterator keyItr = body.keySet().iterator();
while(keyItr.hasNext()) {
String key = (String)keyItr.next();
String value = (String)body.get(key);
try {
messageProcessor.processMessage(value);
commands.xack(streamId, groupId, msgId);
}catch(Exception e) {
System.out.println(e.getMessage());
}
}
}
}
}
First it takes the id and body of the StreamMessage:
message id----->1591944946001-0
body:
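The body is a set of key-value pairs, and every key is tweet.
Then it takes the value from the body (the JSON string shown above) and hands it to messageProcessor.processMessage(value);.
This brings us to the message-processing classes: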
/**
* MessageProcessor type declares a method, processMessage. This
* data type is passed on to the StreamConsumer object. StreamConsumer
* calls the processMessage method for every data item in the stream.
* You should provide your own implementation of how to process the data.
*
* In our example, InfluencerMessageProcessor implements MessageProcessor
*/
public interface MessageProcessor{
public void processMessage(String message) throws Exception;
}
/**
* This is a message processor object that reads the twitter stream,
* collects influencer information, and stores it back in Redis.
*
*/
public class InfluencerMessageProcessor implements MessageProcessor{
LettuceConnection connection = null;
RedisCommands<String, String> commands = null;
// Factory method
public synchronized static InfluencerMessageProcessor getInstance() throws Exception{
InfluencerMessageProcessor processor = new InfluencerMessageProcessor();
processor.init();
return processor;
}
// Suppress instantiation outside the factory method
private InfluencerMessageProcessor() {
}
// Initialize Redis connections
private void init() throws Exception{
connection = LettuceConnection.getInstance();
commands = connection.getRedisCommands();
}
@Override
public void processMessage(String message) throws Exception {
try {
JsonParser jsonParser = new JsonParser();
JsonElement jsonElement = jsonParser.parse(message);
JsonObject jsonObject = jsonElement.getAsJsonObject();
JsonObject userObject = jsonObject.get("user").getAsJsonObject();
JsonElement followerCountElm = userObject.get("followers_count");
// 10,000 is just an arbitrary number. We are marking any handle with
// more than 10,000 followers as an influencer.
if (followerCountElm != null && followerCountElm.getAsDouble() > 10000) {
String name = userObject.get("name").getAsString();
String screenName = userObject.get("screen_name").getAsString();
int followerCount = userObject.get("followers_count").getAsInt();
int friendCount = userObject.get("friends_count").getAsInt();
HashMap<String, String> map = new HashMap<String, String>();
map.put("name", name);
map.put("screen_name", screenName);
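// location may be present but JSON null, in which case getAsString() would throw
if (userObject.get("location") != null && !userObject.get("location").isJsonNull()) {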
map.put("location", userObject.get("location").getAsString());
}
map.put("followers_count", Integer.toString(followerCount));
map.put("friendCount", Integer.toString(friendCount));
// Lettuce commands that store influencer information in Redis
commands.zadd("influencers", followerCount, screenName);
commands.hmset("influencer:" + screenName, map);
// Remove this line if you don't want to read the data
System.out.println(userObject.get("screen_name").getAsString() + "| Followers:"
+ userObject.get("followers_count").getAsString());
}
} catch (Exception e) {
System.out.println("ERROR: " + e.getMessage());
}
}
}
For processing, we only need the user field:
Anyone with more than 10,000 followers counts as an influencer.
Then the calls
// Lettuce commands that store influencer information in Redis
commands.zadd("influencers", followerCount, screenName);
commands.hmset("influencer:" + screenName, map);
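store each influencer in the sorted set and each influencer's details in a hash.
The result is roughly this structure:
With this loop in StreamConsumer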
while(pendingMessages) {
pendingMessages = processPendingMessages();
}
every entry is processed by the logic above,
until no more entries can be fetched:
if(msgList.size()!=0) {
processMessageList(msgList);
}else {
System.out.println("Done processing pending messages");
pendingMessages = false;
}
pendingMessages = false;
makes the first loop in the run method exit.
Execution then enters a second loop, which runs until the consumer is closed:
while(!stopThread) {
processOngoingMessages();
}
// Read messages at runtime
private void processOngoingMessages() throws Exception{
List<StreamMessage<String, String>> msgList = getNextMessageListBlocking();
if(msgList.size()!=0) {
processMessageList(msgList);
}else {
System.out.println("******Group: "+groupId+" waiting. No new message*****");
}
}
// Blocking call; blocks for 5 seconds
private List<StreamMessage<String, String>> getNextMessageListBlocking() throws Exception{
return commands.xreadgroup(
Consumer.from(groupId, consumerId),
XReadArgs.Builder.count(1).block(Duration.ofSeconds(5)),
XReadArgs.StreamOffset.lastConsumed(streamId));
}
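This part handles newly added entries. Each read blocks for up to 5 seconds waiting for a new entry and returns as soon as one arrives; if none arrives in that time, the loop simply tries again.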
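The behaviour of a blocking read with a timeout can be tried out without Redis. The sketch below is only a JDK analogy: it uses a BlockingQueue in place of the stream, and BlockingReadDemo and readNext are hypothetical names introduced for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// JDK-only analogy for a blocking read with a timeout; it does not talk to Redis.
final class BlockingReadDemo {
    // Waits up to timeoutMillis for the next item; returns null if nothing arrived in time.
    static String readNext(BlockingQueue<String> queue, long timeoutMillis) throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // A producer thread delivers one item after a short delay.
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(200);
                queue.put("tweet");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long start = System.nanoTime();
        String item = readNext(queue, 5000);   // returns once the item arrives, well before 5 s
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("got " + item + " after about " + waitedMs + " ms");

        // Nothing else is produced, so this read gives up after the timeout and yields null.
        System.out.println("second read: " + readNext(queue, 300));
        producer.join();
    }
}
```

The timeout is therefore an upper bound on the wait, not a polling interval: a consumer that blocks this way reacts to new data immediately and only wakes up empty-handed when the stream is idle.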
Starting the consumer
We use one more class to start the task above:
/**
* This is the main consumer class. It does the following:
* a. Initiates a StreamConsumer object to read data from the Redis Stream named
* "twitterstream", consumer group called "influencer" and consumer "a"
* b. Starts a StreamConsumer in a separate thread
* c. Reads only new messages
*
*/
public class InfluencerCollectorMain{
public static void main(String[] args) throws Exception{
StreamConsumer influencerStreamGroupReader = null;
try {
InfluencerMessageProcessor imProcessor = InfluencerMessageProcessor.getInstance();
/*
* Redis Stream name = twitterstream (InitializeConsumerGroup.STREAM_ID)
* Consumer group = influencer (InitializeConsumerGroup.GROUP_ID)
* Consumer = a
* Message processor = InfluencerMessageProcessor object
*/
influencerStreamGroupReader = new StreamConsumer(InitializeConsumerGroup.STREAM_ID,InitializeConsumerGroup.GROUP_ID,"a",
StreamConsumer.READ_NEW, imProcessor);
Thread t = new Thread((Runnable) influencerStreamGroupReader);
t.start();
}catch(Exception e) {
e.printStackTrace();
}
}
}
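The main class starts the consumer thread but never stops it. StreamConsumer exposes close(), which sets a stop flag that the run loop checks, and such a flag needs to be visible across threads. Below is a minimal sketch of that cooperative shutdown pattern using a volatile flag; StoppableWorker is a hypothetical stand-in for StreamConsumer and does not touch Redis.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for StreamConsumer: loops until asked to stop. No Redis involved.
final class StoppableWorker implements Runnable {
    // volatile so that a write from another thread is visible to the loop
    private volatile boolean stopRequested = false;
    private final AtomicInteger iterations = new AtomicInteger();

    @Override
    public void run() {
        while (!stopRequested) {
            iterations.incrementAndGet();   // placeholder for one blocking read plus processing
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Called from another thread to end the loop after the current iteration.
    void requestStop() {
        stopRequested = true;
    }

    int iterations() {
        return iterations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(worker);
        t.start();
        Thread.sleep(100);
        worker.requestStop();   // ask the loop to finish
        t.join();               // wait for the thread to exit
        System.out.println("worker stopped after " + worker.iterations() + " iterations");
    }
}
```

In the real consumer the loop would notice the flag only after the current blocking read returns, so shutdown can take up to the read timeout.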
And that's it.