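Applying Redis Streams

Introduction

This post walks through a hands-on project built on Redis Streams.

We pull data about celebrities (influencers) from Twitter and store it by category.

The project therefore has two ends: the Twitter ingest stream and the Twitter influencer classifier.

The first reads the data in; the second digests it.

Both ends work with the stream data type.

Benefits of using a stream:

Producing and consuming data are asynchronous; the consumer simply waits until the producer has added data.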


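The data life cycle:

This is how a consumer group consumes data from a stream.

The "safety" mentioned in the figure above refers to the XACK command. An entry delivered to a consumer stays in the group's pending list until it is acknowledged with XACK, which is what keeps data from being lost.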
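To make the role of XACK more concrete, here is a minimal in-memory sketch of the pending-list idea. It is not the Redis or Lettuce API: the class PendingListSketch and its methods add, deliver, and ack are hypothetical names that only loosely mirror XADD, XREADGROUP, and XACK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only; this is NOT Redis or Lettuce.
// All names (PendingListSketch, add, deliver, ack) are hypothetical.
final class PendingListSketch {
    // Entries that no consumer has received yet, in insertion order
    private final Map<String, String> undelivered = new LinkedHashMap<>();
    // Entries handed to a consumer but not yet acknowledged
    private final Map<String, String> pending = new LinkedHashMap<>();

    // Producer side: loosely what XADD does
    void add(String id, String payload) {
        undelivered.put(id, payload);
    }

    // Loosely XREADGROUP: hand out the oldest undelivered entry and record it as pending
    String deliver() {
        if (undelivered.isEmpty()) {
            return null;
        }
        String id = undelivered.keySet().iterator().next();
        pending.put(id, undelivered.remove(id));
        return id;
    }

    // Loosely XACK: returns true only if the entry was still pending
    boolean ack(String id) {
        return pending.remove(id) != null;
    }

    List<String> pendingIds() {
        return new ArrayList<>(pending.keySet());
    }

    public static void main(String[] args) {
        PendingListSketch group = new PendingListSketch();
        group.add("1-0", "first tweet");
        group.add("2-0", "second tweet");

        String id = group.deliver();
        System.out.println("delivered " + id + ", pending = " + group.pendingIds());

        // If the consumer crashed here, the entry would still be listed as pending,
        // so it could be picked up and processed again after a restart.
        group.ack(id);
        System.out.println("after ack, pending = " + group.pendingIds());
    }
}
```

The point of the sketch is only the ordering: an entry leaves the pending list when it is acknowledged, not when it is read.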


The digested data is stored by category, using two data types, sorted set and hash:

A sorted set holds the influencers, and each influencer's details are stored in a hash.
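The layout can be pictured with a small in-memory model. This is only a sketch under stated assumptions: InfluencerLayoutSketch and its methods are hypothetical, the sample accounts are made up, and plain Java maps stand in for the Redis sorted set and hashes. The key names "influencers" and "influencer:<screen_name>" are the ones used by the processor code in this post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory picture of the key layout; NOT Redis itself.
// Class and method names are hypothetical.
final class InfluencerLayoutSketch {
    // Plays the role of the sorted set "influencers":
    // member = screen name, score = follower count
    private final Map<String, Integer> influencers = new HashMap<>();
    // Plays the role of the hashes stored under "influencer:<screen_name>"
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    void store(String screenName, String name, int followers) {
        // Like ZADD, storing an existing member again just updates its score
        influencers.put(screenName, followers);
        Map<String, String> fields = new HashMap<>();
        fields.put("name", name);
        fields.put("screen_name", screenName);
        fields.put("followers_count", Integer.toString(followers));
        hashes.put("influencer:" + screenName, fields);
    }

    // Screen names ordered by follower count, highest first
    List<String> top(int n) {
        List<String> names = new ArrayList<>(influencers.keySet());
        names.sort((a, b) -> Integer.compare(influencers.get(b), influencers.get(a)));
        return new ArrayList<>(names.subList(0, Math.min(n, names.size())));
    }

    Map<String, String> details(String screenName) {
        return hashes.get("influencer:" + screenName);
    }

    public static void main(String[] args) {
        InfluencerLayoutSketch store = new InfluencerLayoutSketch();
        // Made-up sample accounts
        store.store("alice_dev", "Alice", 52000);
        store.store("bob_news", "Bob", 18000);
        store.store("carol_tv", "Carol", 97000);

        for (String screenName : store.top(2)) {
            System.out.println(screenName + " -> " + store.details(screenName));
        }
    }
}
```

Splitting the data this way keeps ranking cheap (the sorted set only carries a name and a score) while the per-account details live in their own hash.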


Environment and setup

All the classes:

The project is managed with Maven. The required dependencies:


	<!-- https://mvnrepository.com/artifact/com.pubnub/pubnub-gson -->
	<dependency>
		<groupId>com.pubnub</groupId>
		<artifactId>pubnub-gson</artifactId>
		<version>4.19.0</version>
	</dependency>



	<!-- https://mvnrepository.com/artifact/io.lettuce/lettuce-core -->
	<dependency>
		<groupId>io.lettuce</groupId>
		<artifactId>lettuce-core</artifactId>
		<version>5.2.2.RELEASE</version>
	</dependency>

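One is the API for working with PubNub; the other is a Redis client (Lettuce).

The Twitter data is fetched from PubNub.

First, register a PubNub account.

Then locate the twitter-channel channel, because the code subscribes to it:

It gives you the subscribe key:

It also provides sample JavaScript code; the Java version follows the same steps.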

1. Load the PubNub javascript SDK:
 
<script src="https://cdn.pubnub.com/sdk/javascript/pubnub.4.3.2.min.js"></script>
 
2. Init, Listen, and Subscribe!
 
var pubnub = new PubNub({
    subscribe_key: 'sub-c-78806dd4-42a6-11e4-aed8-02ee2ddab7fe'
});
pubnub.addListener({
    message: function(message) {
        console.log(message.message);
    }
});
pubnub.subscribe({
    channels: ['pubnub-twitter']
});

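Preparing the consumer group

First, connect to the Redis server:

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;

/**
 * This is a wrapper class around the Lettuce library.
 */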
public class LettuceConnection{

	private RedisClient client = null;
	private StatefulRedisConnection<String, String> connection = null;

	private LettuceConnection() {

	}

	public synchronized static LettuceConnection getInstance() throws Exception{
		LettuceConnection lettuceConnection = new LettuceConnection();
		lettuceConnection.init();
		return lettuceConnection;
	}

	private void init() throws Exception{
		try {
			// Make sure to change the URL if it is different in your case
			client = RedisClient.create("redis://hostname:port");
			connection = client.connect();
		}catch(Exception e) {
			e.printStackTrace();
			throw e;
		}
	}

	public StatefulRedisConnection<String, String> getRedisConnection() throws Exception{
		if(connection == null) {
			this.init();
		}
		return connection;
	}

	public RedisCommands<String, String> getRedisCommands() throws Exception{
		if(connection == null) {
			this.init();
		}

		return connection.sync();
	}

	public void close() throws Exception{
		if(connection != null) {
			connection.close();
		}

		if(client != null) {
			client.shutdown();
		}
	}

}

Remember to replace hostname and port with your own values.

Next, prepare the stream to be consumed and the consumer group attached to it:

import java.util.HashMap;

import io.lettuce.core.XReadArgs;
import io.lettuce.core.api.sync.RedisCommands;

/**
 * Redis Stream, in general, doesn't require initialization. In our demo,
 * we show how you could use a consumer group to read the data. XGROUP CREATE
 * fails if the stream key does not exist yet (unless the MKSTREAM option is
 * used). Therefore, we add a line of dummy data to the stream and then create
 * the consumer group.
 *
 * IMPORTANT: Run this program only once before running other programs.
 *
 */

public class InitializeConsumerGroup{

	public static final String STREAM_ID = "twitterstream";
	public static final String GROUP_ID = "influencer";


	private static LettuceConnection connection = null;
	private static RedisCommands<String, String> commands = null;

	public static void main(String[] args) throws Exception{

		connection = LettuceConnection.getInstance();
		commands = connection.getRedisCommands();

		initStream();
		initGroup();

	}

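	private static void initStream() throws Exception{
		// TYPE returns "none" (not null) when the key does not exist
		String type = commands.type(STREAM_ID);

		if(!"stream".equals(type)) {
			// Drop a key of another type (a no-op if the key is absent),
			// then seed the stream with one dummy entry
			commands.del(STREAM_ID);
			addRawData();
		}
	}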

	private static void addRawData() throws Exception{
		HashMap<String, String> map = new HashMap<String, String>();
		map.put("start", "stream");
		commands.xadd(STREAM_ID, map);
	}

	private static void initGroup() {
		try {
			commands.xgroupCreate(XReadArgs.StreamOffset.latest(STREAM_ID), GROUP_ID);
		}catch(Exception e) {
			System.out.println(e.getMessage());
		}
	}

}

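XGROUP CREATE fails when the stream key does not exist yet, so we first add one entry to the stream.

Run the main method; the Redis server now contains the twitterstream key:


Producing data

import io.lettuce.core.api.sync.RedisCommands;

/**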
 * IngestStream class allows you to write data to a Redis Stream.
 * You can run this class to test whether you have the right version
 * of Redis that supports Redis Stream. Typically you extend IngestStream
 * to provide your own implementation. For example, TwitterIngestStream
 * extends IngestStream.
 *
 */
public class IngestStream{

	protected String streamId = null;

	protected LettuceConnection connection = null;
	protected RedisCommands<String, String> commands = null;

	// Hide the constructor and force external objects to instantiate
	// via the factory method
	protected IngestStream() {

	}

	// Factory method to instantiate the object. This method instantiates the object
	// and creates the connection to the Redis database
	public synchronized static IngestStream getInstance(String streamId) throws Exception{
		IngestStream ingestStream = new IngestStream();
		ingestStream.streamId = streamId;
		ingestStream.init();
		return ingestStream;
	}

	// Initializes the Lettuce library
	protected void init() throws Exception{
		connection = LettuceConnection.getInstance();
		commands =  connection.getRedisCommands();
	}

	// Adds the key-value pair as the stream data
	// In Redis Stream, you could pass multiple key-value pairs
	// for a single data object. For simplicity, we will save one
	// object per line.
	public void add(String key, String message) throws Exception{
		commands.xadd(streamId, key, message);
	}


	// Use this for testing only
	public static void main(String[] args) throws Exception{
		IngestStream ingest = IngestStream.getInstance("mystream");

		for(int i=20; i<30; i++) {
			ingest.add("k"+i, "v"+i);
		}
	}

}

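This is a generic producer class; extend it to implement your own way of producing data.

Run the main method to test it:

This confirms that the class can indeed add entries to a stream.


Now we connect to PubNub to fetch the data:

import java.util.Arrays;

import com.google.gson.JsonObject;
import com.pubnub.api.PNConfiguration;
import com.pubnub.api.PubNub;
import com.pubnub.api.callbacks.SubscribeCallback;
import com.pubnub.api.enums.PNStatusCategory;
import com.pubnub.api.models.consumer.PNStatus;
import com.pubnub.api.models.consumer.pubsub.PNMessageResult;
import com.pubnub.api.models.consumer.pubsub.PNPresenceEventResult;

/**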
 * This is the main producer class. When you run this program, it collects
 * Twitter data from the PubNub channel and adds them to the Redis Stream
 *
 */
public class TwitterIngestStream extends IngestStream{

	// Follow instructions on PubNub to get your own key
	final static String SUB_KEY_TWITTER = "sub-c-78806dd4-42a6-11e4-aed8-02ee2ddab7fe"; // Change the key
	final static String CHANNEL_TWITTER = "pubnub-twitter";

	// Factory method
	public synchronized static TwitterIngestStream getInstance(String streamId) throws Exception{
		TwitterIngestStream ingestStream = new TwitterIngestStream();
		ingestStream.streamId = streamId;
		ingestStream.init();
		return ingestStream;
	}

	// Making the constructor private to force creating new objects through the factory method
	private TwitterIngestStream() {

	}

	// The main method
	public static void main(String[] args) throws Exception{

		TwitterIngestStream twitterIngestStream = TwitterIngestStream.getInstance(InitializeConsumerGroup.STREAM_ID);
		twitterIngestStream.start();
	}

	// Following PubNub's example
	public void start() throws Exception{
		final TwitterIngestStream ingestStream = this;
		PNConfiguration pnConfig = new PNConfiguration();
		pnConfig.setSubscribeKey(SUB_KEY_TWITTER);
		pnConfig.setSecure(false);

		PubNub pubnub = new PubNub(pnConfig);

		pubnub.subscribe().channels(Arrays.asList(CHANNEL_TWITTER)).execute();


		// PubNub event callback
		SubscribeCallback subscribeCallback = new SubscribeCallback() {
			@Override
			public void status(PubNub pubnub, PNStatus status) {
				if (status.getCategory() == PNStatusCategory.PNUnexpectedDisconnectCategory) {
					// internet got lost, do some magic and call reconnect when ready
					pubnub.reconnect();
				} else if (status.getCategory() == PNStatusCategory.PNTimeoutCategory) {
					// do some magic and call reconnect when ready
					pubnub.reconnect();
				} else {
					System.out.println(status.toString());
				}
			}

			// Receive the message and add to the RedisStream
			@Override
			public void message(PubNub pubnub, PNMessageResult message) {
				try{
					JsonObject json = message.getMessage().getAsJsonObject();

					// Delete this line if you don't need this log
					System.out.println(json.toString());

					// Each line or data entry of a Redis Stream is a collection of key-value pairs
					// For simplicity, we store only one key-value pair per line. "tweet" is the key
					// for each line. Note, that it's not the entry id, because Redis Streams
					// autogenerates the entry id.
					//
					// Example of a Redis Stream:
					// twitterstream
					//		1837847490983-0 tweet {....}
					//		1837847490984-0 tweet {....}
					//		1837847490986-0 tweet {....}
					//		1837847490987-0 tweet {....}
					ingestStream.add("tweet", json.toString());
				}catch(Exception e){
					e.printStackTrace();
				}


			}

			@Override
			public void presence(PubNub pubnub, PNPresenceEventResult presence) {
			}
		};

		// Add callback as a listener (PubNub code)
		pubnub.addListener(subscribeCallback);

	}




}

SUB_KEY_TWITTER comes from PubNub's Twitter stream.

Run the main method. The console keeps printing the messages it receives, and each message has this structure:

{
    "created_at":"Fri Jun 12 06:55:40 +0000 2020",
    "id":1271335340767355000,
    "id_str":"1271335340767354880",
    "text":"#leeminhessa",
    "source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
    "truncated":false,
    "in_reply_to_status_id":null,
    "in_reply_to_status_id_str":null,
    "in_reply_to_user_id":null,
    "in_reply_to_user_id_str":null,
    "in_reply_to_screen_name":null,
    "user":Object{...},
    "geo":null,
    "coordinates":null,
    "place":Object{...},
    "contributors":null,
    "quoted_status_id":1271329242563768300,
    "quoted_status_id_str":"1271329242563768321",
    "quoted_status":Object{...},
    "quoted_status_permalink":{
        "url":"https://t.co/Cat3CE9r7g",
        "expanded":"https://twitter.com/ActorLeeMinHo/status/1271329242563768321",
        "display":"twitter.com/ActorLeeMinHo/…"
    },
    "is_quote_status":true,
    "quote_count":0,
    "reply_count":0,
    "retweet_count":0,
    "favorite_count":0,
    "entities":Object{...},
    "favorited":false,
    "retweeted":false,
    "filter_level":"low",
    "lang":"und",
    "timestamp_ms":"1591944940164"
}

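This is the message; the consumer side parses and stores it.

What it looks like in the Redis server:

Each entry's ID is generated automatically by Redis (a millisecond timestamp plus a sequence number, not a random value), the field name of every entry is tweet, and the value is a JSON string.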
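Because an auto-generated entry ID has the form <millisecondsTime>-<sequenceNumber>, the time at which an entry was added can be recovered from the ID alone. A minimal sketch, assuming a hypothetical helper class StreamIdSketch (not part of Lettuce), applied to the ID 1591944946001-0 that appears in this post's consumer output:

```java
import java.time.Instant;

// Hypothetical helper for illustration; not part of Lettuce or Redis.
final class StreamIdSketch {
    final long millis;    // Unix time in milliseconds
    final long sequence;  // distinguishes entries added within the same millisecond

    private StreamIdSketch(long millis, long sequence) {
        this.millis = millis;
        this.sequence = sequence;
    }

    // Splits "<millisecondsTime>-<sequenceNumber>" into its two parts
    static StreamIdSketch parse(String id) {
        int dash = id.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("not a stream entry ID: " + id);
        }
        long millis = Long.parseLong(id.substring(0, dash));
        long sequence = Long.parseLong(id.substring(dash + 1));
        return new StreamIdSketch(millis, sequence);
    }

    Instant createdAt() {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        StreamIdSketch id = parse("1591944946001-0");
        // prints: 1591944946001 / 0 / 2020-06-12T06:55:46.001Z
        System.out.println(id.millis + " / " + id.sequence + " / " + id.createdAt());
    }
}
```

The timestamp lands a few seconds after the sample tweet's created_at value, which is consistent with the entry having been added shortly after the tweet was published.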


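Consuming data

The consumer side is more involved, because it also has to process the data and store the result.

import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import io.lettuce.core.Consumer;
import io.lettuce.core.StreamMessage;
import io.lettuce.core.XReadArgs;
import io.lettuce.core.api.sync.RedisCommands;

/**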
 * This is the consumer class that reads the data from RedisStream.
 * In our example, InfluencerCollectorMain initiates StreamConsumer
 * and starts it as a separate thread. The thread waits for a new
 * message via a blocking call. It expires every 5 seconds and
 * rechecks for a new message.
 *
 */
public class StreamConsumer implements Runnable{

	public static final String READ_FROM_START = "0";
	public static final String READ_NEW = "$";

	String streamId = null;
	String groupId = null;
	String consumerId = null;
	String readFrom = READ_NEW;
	MessageProcessor messageProcessor = null;

	LettuceConnection connection = null;
	RedisCommands<String, String> commands = null;

	public StreamConsumer(String streamId, String groupId, String consumerId,
			String readFrom, MessageProcessor messageProcessor) throws Exception{
		this.streamId = streamId;
		this.groupId = groupId;
		this.consumerId = consumerId;
		this.readFrom = readFrom;
		this.messageProcessor = messageProcessor;

		connection = LettuceConnection.getInstance();
		commands =  connection.getRedisCommands();

	}

	public void readStream() throws Exception{

		boolean reachedEndOfTheStream = false;
		while(!reachedEndOfTheStream) {
			List<StreamMessage<String, String>> msgList = getNextMessageList();

			if(msgList.size()==0) {
				reachedEndOfTheStream = true;
			}else {
				processMessageList(msgList);
			}
		}

	}

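	// Non-blocking call. With the ID "0", XREADGROUP returns entries that were
	// already delivered to this consumer but never acknowledged (its pending entries)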
	private List<StreamMessage<String, String>> getNextMessageList() throws Exception{
		return commands.xreadgroup(
				Consumer.from(groupId, consumerId),
				XReadArgs.Builder.count(1),
				XReadArgs.StreamOffset.from(streamId, "0"));
	}


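	// Blocking call; blocks for up to 5 seconds. lastConsumed() is the special ID ">",
	// which asks for entries not yet delivered to any consumer in the group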
	private List<StreamMessage<String, String>> getNextMessageListBlocking() throws Exception{
		return commands.xreadgroup(
				Consumer.from(groupId, consumerId),
				XReadArgs.Builder.count(1).block(Duration.ofSeconds(5)),
				XReadArgs.StreamOffset.lastConsumed(streamId));

	}


	// processes the message and reports back to Redis Stream with XACK
	private void processMessageList(List<StreamMessage<String, String>> msgList) {

		if(msgList.size()> 0) {
			Iterator itr = msgList.iterator();
			while(itr.hasNext()) {
				StreamMessage<String, String> message =
						(StreamMessage<String, String>)itr.next();

				Map<String, String> body = message.getBody();
				String msgId = message.getId();
				System.out.println("message id----->" + msgId);
				System.out.println("message body---->" + body);
				Iterator keyItr = body.keySet().iterator();
				while(keyItr.hasNext()) {
					String key = (String)keyItr.next();
					String value = (String)body.get(key);
					try {
						messageProcessor.processMessage(value);
						commands.xack(streamId, groupId, msgId);
					}catch(Exception e) {
						System.out.println(e.getMessage());
					}
				}

			}
		}
	}

	// volatile so that the reader thread sees the update made by close()
	private volatile boolean stopThread = false;

	public void close() throws Exception{
		stopThread = true;
		if(connection != null) {
			connection.close();
		}
	}

	// This is helpful during the startup. It helps the consumer
	// to catch up with the messages that it has not read so far
	private boolean processPendingMessages() throws Exception{

		boolean pendingMessages = true;

		List<StreamMessage<String, String>> msgList = getNextMessageList();

		if(msgList.size()!=0) {
			processMessageList(msgList);
		}else {
			System.out.println("Done processing pending messages");
			pendingMessages = false;
		}

		return pendingMessages;
	}

	// Read messages at runtime
	private void processOngoingMessages() throws Exception{
		List<StreamMessage<String, String>> msgList = getNextMessageListBlocking();

		if(msgList.size()!=0) {
			processMessageList(msgList);
		}else {
			System.out.println("******Group: "+groupId+" waiting. No new message*****");
		}
	}

	// Thread function
	@Override
	public void run() {
		try {
			boolean pendingMessages = true;
			while(pendingMessages) {
				pendingMessages = processPendingMessages();
			}

			while(!stopThread) {
				processOngoingMessages();
			}
		}catch(Exception e) {
			e.printStackTrace();
		}
	}
}





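StreamConsumer is a long class (it was split into two parts for display).

Let's take it apart.

First, it implements Runnable, so it is a task that some thread will run later.

Since it implements Runnable, we start with the overridden run method: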

// Thread function
	@Override
	public void run() {
		try {
			boolean pendingMessages = true;
			while(pendingMessages) {
				pendingMessages = processPendingMessages();
			}

			while(!stopThread) {
				processOngoingMessages();
			}
		}catch(Exception e) {
			e.printStackTrace();
		}
	}

It first enters a loop that keeps calling this method:

processPendingMessages()
	// This is helpful during the startup. It helps the consumer
	// to catch up with the messages that it has not read so far
	private boolean processPendingMessages() throws Exception{

		boolean pendingMessages = true;

		List<StreamMessage<String, String>> msgList = getNextMessageList();

		if(msgList.size()!=0) {
			processMessageList(msgList);
		}else {
			System.out.println("Done processing pending messages");
			pendingMessages = false;
		}

		return pendingMessages;
	}

That method in turn contains:

List<StreamMessage<String, String>> msgList = getNextMessageList();

	// Non-blocking call
	private List<StreamMessage<String, String>> getNextMessageList() throws Exception{
		return commands.xreadgroup(
				Consumer.from(groupId, consumerId),
				XReadArgs.Builder.count(1),
				XReadArgs.StreamOffset.from(streamId, "0"));
	}

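Finally we have reached code that maps directly onto a Redis command.

This call issues XREADGROUP against twitterstream with COUNT 1. Because the ID is "0", it returns at most one entry that was already delivered to this consumer but not yet acknowledged, rather than a new entry.

The result is held as StreamMessage objects via List<StreamMessage<String, String>> msgList = getNextMessageList();

processMessageList(msgList); then processes the data: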

	// processes the message and reports back to Redis Stream with XACK
	private void processMessageList(List<StreamMessage<String, String>> msgList) {

		if(msgList.size()> 0) {
			Iterator itr = msgList.iterator();
			while(itr.hasNext()) {
				StreamMessage<String, String> message =
						(StreamMessage<String, String>)itr.next();

				Map<String, String> body = message.getBody();
				String msgId = message.getId();
				System.out.println("message id----->" + msgId);
				System.out.println("message body---->" + body);
				Iterator keyItr = body.keySet().iterator();
				while(keyItr.hasNext()) {
					String key = (String)keyItr.next();
					String value = (String)body.get(key);
					try {
						messageProcessor.processMessage(value);
						commands.xack(streamId, groupId, msgId);
					}catch(Exception e) {
						System.out.println(e.getMessage());
					}
				}

			}
		}
	}

It first obtains the id and body of the StreamMessage:

message id----->1591944946001-0

body:

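The body consists of key-value pairs, and every key is tweet.

It then takes the value from the body (the JSON string shown above) and hands it to messageProcessor.processMessage(value); for processing.


This brings us to the message-processing classes: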

/**
 * MessageProcessor type declares a method, processMessage. This
 * data type is passed on to the StreamConsumer object. StreamConsumer
 * calls the processMessage method for every data item in the stream.
 * You should provide your own implementation of how to process the data.
 *
 * In our example, InfluencerMessageProcessor implements MessageProcessor
 */
public interface MessageProcessor{

	public void processMessage(String message) throws Exception;

}
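
import java.util.HashMap;

import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import io.lettuce.core.api.sync.RedisCommands;

/**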
 * This is a message processor object that reads the twitter stream,
 * collects influencer information, and stores it back in Redis.
 *
 */
public class InfluencerMessageProcessor implements MessageProcessor{

	LettuceConnection connection = null;
	RedisCommands<String, String> commands = null;

	// Factory method
	public synchronized static InfluencerMessageProcessor getInstance() throws Exception{
		InfluencerMessageProcessor processor = new InfluencerMessageProcessor();
		processor.init();
		return processor;
	}

	// Suppress instantiation outside the factory method
	private InfluencerMessageProcessor() {

	}

	// Initialize Redis connections
	private void init() throws Exception{
		connection = LettuceConnection.getInstance();
		commands =  connection.getRedisCommands();
	}


	@Override
	public void processMessage(String message) throws Exception {
		try {
			JsonParser jsonParser = new JsonParser();
			JsonElement jsonElement = jsonParser.parse(message);
			JsonObject jsonObject = jsonElement.getAsJsonObject();
			JsonObject userObject = jsonObject.get("user").getAsJsonObject();

			JsonElement followerCountElm = userObject.get("followers_count");

			// 10,000 is just an arbitrary number. We are marking any handle with
			// more than 10,000 followers as an influencer.
			if (followerCountElm != null && followerCountElm.getAsDouble() > 10000) {
				String name = userObject.get("name").getAsString();
				String screenName = userObject.get("screen_name").getAsString();
				int followerCount = userObject.get("followers_count").getAsInt();
				int friendCount = userObject.get("friends_count").getAsInt();

				HashMap<String, String> map = new HashMap<String, String>();
				map.put("name", name);
				map.put("screen_name", screenName);
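				// Gson returns JsonNull (not Java null) for a JSON null value,
				// and calling getAsString() on JsonNull throws
				JsonElement location = userObject.get("location");
				if (location != null && !location.isJsonNull()) {
					map.put("location", location.getAsString());
				}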
				map.put("followers_count", Integer.toString(followerCount));
				map.put("friends_count", Integer.toString(friendCount));


				// Lettuce commands that store influencer information in Redis
				commands.zadd("influencers", followerCount, screenName);
				commands.hmset("influencer:" + screenName, map);

				// Remove this line if you don't want to read the data
				System.out.println(userObject.get("screen_name").getAsString() + "| Followers:"
						+ userObject.get("followers_count").getAsString());
			}

		} catch (Exception e) {
			System.out.println("ERROR: " + e.getMessage());
		}
	}
}

For processing, we need the user field:

Anyone with more than 10,000 followers counts as an influencer.

Then these calls

// Lettuce commands that store influencer information in Redis
				commands.zadd("influencers", followerCount, screenName);
				commands.hmset("influencer:" + screenName, map);

store each influencer along with that influencer's details.

The end result is roughly this structure:

Through this loop in StreamConsumer,

	while(pendingMessages) {
				pendingMessages = processPendingMessages();
			}

each entry that is read is processed with the logic above,

until no more data can be read:

if(msgList.size()!=0) {
			processMessageList(msgList);
		}else {
			System.out.println("Done processing pending messages");
			pendingMessages = false;
		}
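
Setting pendingMessages = false; makes the first loop in the run method exit.

It then enters a second loop, which keeps running until stopThread is set: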

while(!stopThread) {
				processOngoingMessages();
			}
// Read messages at runtime
	private void processOngoingMessages() throws Exception{
		List<StreamMessage<String, String>> msgList = getNextMessageListBlocking();

		if(msgList.size()!=0) {
			processMessageList(msgList);
		}else {
			System.out.println("******Group: "+groupId+" waiting. No new message*****");
		}
	}
// Blocking call; blocks for 5 seconds
	private List<StreamMessage<String, String>> getNextMessageListBlocking() throws Exception{
		return commands.xreadgroup(
				Consumer.from(groupId, consumerId),
				XReadArgs.Builder.count(1).block(Duration.ofSeconds(5)),
				XReadArgs.StreamOffset.lastConsumed(streamId));

	}

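This part handles newly added data. Each read blocks for up to 5 seconds: it returns as soon as a new entry arrives, and comes back empty if none does, after which the loop tries again.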
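The "block for up to N seconds" behaviour is the same idea as a timed poll on a Java BlockingQueue. The sketch below is only an analogy using the standard library, not Redis or Lettuce; BlockingReadSketch and readNext are hypothetical names, and the queue stands in for the stream.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Analogy only: a BlockingQueue stands in for the stream. NOT Redis or Lettuce.
final class BlockingReadSketch {

    // Waits up to timeoutMillis for the next item; null means the wait timed out
    static String readNext(BlockingQueue<String> stream, long timeoutMillis) {
        try {
            return stream.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> stream = new LinkedBlockingQueue<>();

        // The producer publishes one item after roughly 200 ms
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(200);
                stream.put("tweet");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // The consumer is willing to wait 5 seconds but wakes up as soon as the item arrives
        long start = System.nanoTime();
        String item = readNext(stream, 5000);
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("got " + item + " after about " + waitedMs + " ms");

        // Nothing else arrives, so this read gives up once its timeout elapses
        System.out.println("second read: " + readNext(stream, 300));
        producer.join();
    }
}
```

In other words, the 5 seconds is an upper bound on how long one read waits, not a fixed polling interval.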


Starting the consumer

A separate class starts the task above:

/**
 * This is the main consumer class. It does the following:
 * a. Initiates a StreamConsumer object to read data from the Redis Stream named
 *    "twitterstream", consumer group called "influencer" and consumer "a"
 * b. Starts a StreamConsumer in a separate thread
 * c. Reads only new messages
 *
 */
public class InfluencerCollectorMain{

	public static void main(String[] args) throws Exception{
		StreamConsumer influencerStreamGroupReader  = null;

		try {
			InfluencerMessageProcessor imProcessor = InfluencerMessageProcessor.getInstance();
			/*
			 * Redis Stream name = twitterstream (InitializeConsumerGroup.STREAM_ID)
			 * Consumer group = influencer (InitializeConsumerGroup.GROUP_ID)
			 * Consumer = a
			 * Message processor = InfluencerMessageProcessor object
			 */
			influencerStreamGroupReader = new StreamConsumer(InitializeConsumerGroup.STREAM_ID,InitializeConsumerGroup.GROUP_ID,"a",
					StreamConsumer.READ_NEW, imProcessor);
			Thread t = new Thread((Runnable) influencerStreamGroupReader);
			t.start();
		}catch(Exception e) {
			e.printStackTrace();
		}
	}

}

And that's it.
