Reading data from Cassandra for processing in Flink

I need to process data streams from Kafka using Flink as the streaming engine. To analyze the data, I have to query some tables in Cassandra. What is the best way to do this? I have looked for Scala examples of this case but could not find any. How can data from Cassandra be read in Flink using Scala as the programming language?

A related question, "Read & write data into cassandra using apache flink Java API", lists several approaches in its answers. I would like to know which approach is best in my case. Also, most of the available examples are in Java; I am looking for Scala examples.

Solution

I currently read from Cassandra using Async I/O in Flink 1.3. Here is the documentation on it:

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html (where it uses DatabaseClient, you would use com.datastax.driver.core.Cluster instead)

Let me know if you need a more in-depth example of using it to read from Cassandra specifically, but unfortunately I can only provide an example in Java.

EDIT 1

Here is an example of the code I am using to read from Cassandra with Flink's Async I/O. I am still working on identifying and fixing an issue where, when a single query returns a large amount of data, the async data stream's timeout is triggered even though Cassandra appears to return the data fine and well before the timeout. Assuming that is a bug in other parts of my job and not in this code, this should work fine for you (it has worked fine for me for months):

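import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;

import java.util.Collections;

// Properties and CustomInputObject are application-specific classes that are not shown here.
public class GenericCassandraReader extends RichAsyncFunction<CustomInputObject, ResultSet> {

    private final Properties props;
    // Created in open() on the task manager, so it is not serialized with the function.
    private transient Session client;

    public GenericCassandraReader(Properties props) {
        super();
        this.props = props;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // One Cluster and Session per parallel instance of this function.
        client = Cluster.builder()
                .addContactPoint(props.cassandraUrl)
                .withPort(props.cassandraPort)
                .build()
                .connect(props.cassandraKeyspace);
    }

    @Override
    public void close() throws Exception {
        if (client != null) {
            // Closing the Cluster also closes the Session created from it.
            client.getCluster().close();
        }
    }

    @Override
    public void asyncInvoke(final CustomInputObject customInputObject, final AsyncCollector<ResultSet> asyncCollector) throws Exception {
        String queryString = "select * from table where fieldToFilterBy='" + customInputObject.id() + "';";
        // executeAsync returns immediately; the callback below runs once Cassandra responds.
        ListenableFuture<ResultSet> resultSetFuture = client.executeAsync(queryString);
        Futures.addCallback(resultSetFuture, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet resultSet) {
                asyncCollector.collect(Collections.singleton(resultSet));
            }

            @Override
            public void onFailure(Throwable t) {
                // Hand the error to Flink instead of dropping it.
                asyncCollector.collect(t);
            }
        });
    }
}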

Again, sorry for the delay. I was hoping to have the bug resolved so I could be certain, but figured that at this point having some reference would be better than nothing.

EDIT 2

We finally determined that the issue isn't with the code but with network throughput. A lot of bytes were trying to come through a pipe that isn't large enough to handle them, so responses started backing up. Some trickled in, but the time it took to receive the result of each query climbed to 4 seconds, then 6, then 8, and so on (we could see this thanks to the DataStax Cassandra driver's QueryLogger).

TL;DR: the code is fine, but be aware that if you see TimeoutExceptions from Flink's AsyncWaitOperator, it could be a network issue.
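To make the backlog effect concrete, here is a minimal Java sketch that is not taken from the job above. It assumes, purely for illustration, that responses are transferred one at a time over a single link, that a query is issued every 2 seconds, and that each response needs 4 seconds of transfer time. BackpressureSketch, latenciesMillis and firstTimedOut are hypothetical names, and the numbers are chosen only to reproduce the 4, 6, 8 second pattern described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: responses queue behind each other on one shared link.
final class BackpressureSketch {

    // Returns the observed latency (ms) of each query, in issue order.
    static List<Long> latenciesMillis(int queries, long issueIntervalMs, long transferMs) {
        List<Long> latencies = new ArrayList<>();
        long linkFreeAt = 0; // time at which the link finishes the previous response
        for (int i = 0; i < queries; i++) {
            long issuedAt = i * issueIntervalMs;
            long transferStart = Math.max(issuedAt, linkFreeAt);
            long finishedAt = transferStart + transferMs;
            linkFreeAt = finishedAt;
            latencies.add(finishedAt - issuedAt);
        }
        return latencies;
    }

    // Index of the first query whose latency exceeds the timeout, or -1 if none does.
    static int firstTimedOut(List<Long> latencies, long timeoutMs) {
        for (int i = 0; i < latencies.size(); i++) {
            if (latencies.get(i) > timeoutMs) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Long> latencies = latenciesMillis(4, 2000, 4000);
        System.out.println(latencies);                      // [4000, 6000, 8000, 10000]
        System.out.println(firstTimedOut(latencies, 9000)); // 3
    }
}
```

Under these assumptions every query looks fine on the Cassandra side, yet the fourth one exceeds a 9 second timeout simply because it waited behind the earlier responses.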

EDIT 2.5

I also realized it might be worth mentioning that, because of the network latency issue, we ended up switching to a RichMapFunction that holds the data we were reading from Cassandra in state. The job now keeps track of all the records that come through it, instead of reading from the table every time a new record arrives to get everything that is in there.
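The idea can be sketched in plain Java as follows. This is not the actual job code: SeenRecordsCache and addAndGet are hypothetical names, records are simplified to strings, and an ordinary HashMap stands in for the state. In a real Flink job you would more likely keep this in Flink-managed keyed state inside the RichMapFunction, so that it is checkpointed and restored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the state held by the map function.
final class SeenRecordsCache {

    private final Map<String, List<String>> recordsById = new HashMap<>();

    // Remembers the record under its id and returns everything seen so far for that id,
    // which replaces the "select * ... where id = ..." round trip to Cassandra.
    List<String> addAndGet(String id, String record) {
        List<String> records = recordsById.computeIfAbsent(id, k -> new ArrayList<>());
        records.add(record);
        // Return a read-only copy so callers cannot modify the cached list.
        return Collections.unmodifiableList(new ArrayList<>(records));
    }
}
```

The trade-off is memory for latency: each lookup becomes a local map access, but the job has to hold every record it needs to look up later.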
