Educoder (头歌): Trident State API

m0_62103032

Last modified 2023-11-23 11:30:57

Tags: storm, distributed systems, java

First published 2023-11-23 11:20:46

Article link: https://blog.csdn.net/m0_62103032/article/details/134572852

Task description

Task for this level: use Storm Trident to interact with a database.

Background

1. Calling the Trident API
2. The State interface
3. The StateFactory interface
4. The QueryFunction interface
5. The StateUpdater interface
6. The MapState interface

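Calling the Trident API

The Trident State API implements all of the state-management logic internally. We no longer need to compare txids or store multiple values in the database ourselves; simply calling the Trident API is enough, for example: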

TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(MemcachedState.opaque(serverLocations), new Count(), new Fields("count"))
        .parallelismHint(6);

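All of the logic for managing opaque transactional state is implemented inside MemcachedState.opaque(). Every update is applied one batch at a time, which reduces the number of database calls and greatly improves efficiency.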
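Trident hides the txid bookkeeping, but it helps to see what it amounts to. Below is a minimal, JDK-only sketch of the per-key rule an opaque transactional state applies to a counter. It is not MemcachedState's code: `OpaqueSketch`, `apply()` and `get()` are hypothetical names, and the stored layout (last txid, current value, previous value) follows the scheme described in Storm's Trident state documentation.

```java
import java.util.HashMap;
import java.util.Map;

public class OpaqueSketch {
    // Per-key record: txid of the last batch that wrote the key, current value, previous value.
    static final class OpaqueValue {
        final long txid;
        final long curr;
        final Long prev; // null until an earlier batch has written the key

        OpaqueValue(long txid, long curr, Long prev) {
            this.txid = txid;
            this.curr = curr;
            this.prev = prev;
        }
    }

    private final Map<String, OpaqueValue> store = new HashMap<>();

    // Adds one batch's partial count for a key, tolerating replays of the same txid.
    public void apply(long txid, String key, long delta) {
        OpaqueValue old = store.get(key);
        OpaqueValue next;
        if (old == null) {
            next = new OpaqueValue(txid, delta, null);
        } else if (old.txid == txid) {
            // Same batch again (possibly with different contents): recompute from prev, not curr.
            long base = (old.prev == null) ? 0L : old.prev;
            next = new OpaqueValue(txid, base + delta, old.prev);
        } else {
            // A newer batch: the current value becomes the previous one.
            next = new OpaqueValue(txid, old.curr + delta, old.curr);
        }
        store.put(key, next);
    }

    public long get(String key) {
        OpaqueValue v = store.get(key);
        return (v == null) ? 0L : v.curr;
    }
}
```

Replaying txid 2 with a different partial count (4 instead of 3) yields 2 + 4 = 6 rather than 9, because the replay starts from the previous value. That is what lets an opaque state tolerate a batch whose contents change when it is replayed.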

State interface

The basic State interface has only two methods:

public interface State {
    void beginCommit(Long txid);
    void commit(Long txid);
}

Suppose we have a Location database that we want to query and update through Trident. We can implement our own LocationDB State for it; since we need both queries and updates, we give LocationDB methods to get and set a Location:

public class LocationDB implements State {
    public void beginCommit(Long txid) {
    }

    public void commit(Long txid) {
    }

    public void setLocation(long userId, String location) {
        // code to access database and set location
    }

    public String getLocation(long userId) {
        // code to get location from database
        return null; // placeholder until the real lookup is written
    }
}
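For experimenting outside a cluster, here is a minimal sketch of LocationDB backed by a HashMap. It assumes a local `State` interface with the same two methods in place of Storm's `org.apache.storm.trident.state.State`, so it compiles with the JDK alone; the class `InMemoryLocationDemo` and the txid tracking are illustrative choices, not something Trident requires.

```java
import java.util.HashMap;
import java.util.Map;

public class InMemoryLocationDemo {
    // Local stand-in with the same two methods as Trident's State interface.
    public interface State {
        void beginCommit(Long txid);
        void commit(Long txid);
    }

    // LocationDB backed by a HashMap instead of a real database (hypothetical, for illustration).
    public static class LocationDB implements State {
        private final Map<Long, String> locations = new HashMap<>();
        private Long lastCommittedTxid = null;

        @Override
        public void beginCommit(Long txid) {
            // A database-backed state could open a transaction here.
        }

        @Override
        public void commit(Long txid) {
            // Remember which batch was committed last.
            lastCommittedTxid = txid;
        }

        public void setLocation(long userId, String location) {
            locations.put(userId, location);
        }

        public String getLocation(long userId) {
            return locations.get(userId);
        }

        public Long getLastCommittedTxid() {
            return lastCommittedTxid;
        }
    }
}
```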

StateFactory interface

Trident provides the StateFactory interface; once we implement it, Trident can obtain concrete State instances through it. Below is a LocationDBFactory that creates LocationDB instances:

public class LocationDBFactory implements StateFactory {
    // Trident calls makeState for each partition of the state
    public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
        return new LocationDB();
    }
}
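Because makeState() receives partitionIndex and numPartitions, a factory can use them to decide what each partition connects to. The JDK-only sketch below assigns partitions to database shards round-robin; `ShardedFactorySketch`, `ShardConnection`, the `location.db.shards` config key and the shard names are all hypothetical, and Storm's IMetricsContext parameter is left out so the code compiles without Storm.

```java
import java.util.List;
import java.util.Map;

public class ShardedFactorySketch {
    // Hypothetical handle that a real factory would turn into a database client.
    public static final class ShardConnection {
        public final String url;
        public final int partitionIndex;

        ShardConnection(String url, int partitionIndex) {
            this.url = url;
            this.partitionIndex = partitionIndex;
        }
    }

    // Same partition parameters as makeState() above; "location.db.shards" is a hypothetical key.
    public static ShardConnection makeState(Map<String, Object> conf, int partitionIndex, int numPartitions) {
        if (partitionIndex < 0 || partitionIndex >= numPartitions) {
            throw new IllegalArgumentException("partitionIndex out of range: " + partitionIndex);
        }
        @SuppressWarnings("unchecked")
        List<String> shards = (List<String>) conf.get("location.db.shards");
        // Spread partitions over the available shards round-robin.
        return new ShardConnection(shards.get(partitionIndex % shards.size()), partitionIndex);
    }
}
```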

QueryFunction interface

This interface helps Trident query a State. It defines two methods:

public interface QueryFunction<S extends State, T> extends EachOperation {
    List<T> batchRetrieve(S state, List<TridentTuple> args);
    void execute(TridentTuple tuple, T result, TridentCollector collector);
}

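The first method, batchRetrieve(), takes two arguments: the State to query and the query arguments. Because Trident processes everything in batches, the query arguments arrive as a List<TridentTuple>.

The second method, execute(), takes three arguments: one tuple from the query arguments, the query result for that tuple, and a collector for emitting output. Here is a QueryLocation example: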

public class QueryLocation extends BaseQueryFunction<LocationDB, String> {
    public List<String> batchRetrieve(LocationDB state, List<TridentTuple> inputs) {
        List<String> ret = new ArrayList<String>();
        // One State lookup per input tuple
        for (TridentTuple input : inputs) {
            ret.add(state.getLocation(input.getLong(0)));
        }
        return ret;
    }

    public void execute(TridentTuple tuple, String location, TridentCollector collector) {
        collector.emit(new Values(location));
    }
}

QueryLocation receives the query arguments from Trident as a batch whose tuples carry userIds; batchRetrieve() then fetches the location for each userId from the State.

Finally, execute() emits the results returned by batchRetrieve().
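The division of labour between the two methods can be simulated without Storm. In this sketch, `QueryFn` and `runBatch()` are hypothetical stand-ins: plain `Long` userIds replace `TridentTuple`s, a `Map` replaces LocationDB, and a `Consumer` replaces `TridentCollector`. `runBatch()` calls batchRetrieve() once for the whole batch, checks that it returned one result per input, and then calls execute() once per (input, result) pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class StateQuerySketch {
    // Simplified stand-in for QueryFunction.
    public interface QueryFn<S, T> {
        List<T> batchRetrieve(S state, List<Long> inputs);
        void execute(Long input, T result, Consumer<Object> collector);
    }

    // One batchRetrieve() per batch, then one execute() per input/result pair.
    public static <S, T> List<Object> runBatch(S state, List<Long> batch, QueryFn<S, T> fn) {
        List<T> results = fn.batchRetrieve(state, batch);
        if (results.size() != batch.size()) {
            throw new IllegalStateException("batchRetrieve must return one result per input");
        }
        List<Object> emitted = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            fn.execute(batch.get(i), results.get(i), emitted::add);
        }
        return emitted;
    }

    // QueryLocation rewritten against the stand-ins.
    public static final class QueryLocation implements QueryFn<Map<Long, String>, String> {
        @Override
        public List<String> batchRetrieve(Map<Long, String> db, List<Long> userIds) {
            List<String> ret = new ArrayList<>();
            for (Long id : userIds) {
                ret.add(db.get(id));
            }
            return ret;
        }

        @Override
        public void execute(Long userId, String location, Consumer<Object> collector) {
            collector.accept(location);
        }
    }
}
```

Positional alignment is the key design point: execute() never looks anything up itself, it only receives the result that batchRetrieve() produced at the same index.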

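There is a problem here, however: batchRetrieve() queries the State once for every userid. This is clearly inefficient and goes against Trident's principle that every operation works on a whole batch.

So we rework the LocationDB State, replacing getLocation() with a bulkGetLocations() method (and, likewise, setLocation() with setLocationsBulk()). Here is the revised LocationDB: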

public class LocationDB implements State {
    public void beginCommit(Long txid) {
    }

    public void commit(Long txid) {
    }

    public void setLocationsBulk(List<Long> userIds, List<String> locations) {
        // set locations in bulk
    }

    public List<String> bulkGetLocations(List<Long> userIds) {
        // get locations in bulk; must return one location per userId, in the same order
        return null; // placeholder until the real bulk query is written
    }
}

The revised LocationDB queries and updates Locations in bulk, which clearly improves efficiency. Now we only need a small change to the batchRetrieve() method of the QueryFunction:

public class QueryLocation extends BaseQueryFunction<LocationDB, String> {
    public List<String> batchRetrieve(LocationDB state, List<TridentTuple> inputs) {
        // Collect all userIds of the batch, then query the State once
        List<Long> userIds = new ArrayList<Long>();
        for (TridentTuple input : inputs) {
            userIds.add(input.getLong(0));
        }
        return state.bulkGetLocations(userIds);
    }

    public void execute(TridentTuple tuple, String location, TridentCollector collector) {
        collector.emit(new Values(location));
    }
}
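The payoff of the bulk version is fewer round trips to the database. The JDK-only sketch below counts calls against a fake in-memory database; `CountingLocationDB`, `queryPerKey()` and `queryBulk()` are hypothetical, and one method call stands in for one network round trip.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BulkQuerySketch {
    // Hypothetical in-memory database that counts the calls (round trips) it receives.
    public static final class CountingLocationDB {
        private final Map<Long, String> rows = new HashMap<>();
        public int roundTrips = 0;

        public void put(long userId, String location) {
            rows.put(userId, location);
        }

        // Per-key lookup, as in the first QueryLocation: one round trip per userId.
        public String getLocation(long userId) {
            roundTrips++;
            return rows.get(userId);
        }

        // Bulk lookup, as in bulkGetLocations(): one round trip per batch.
        public List<String> bulkGetLocations(List<Long> userIds) {
            roundTrips++;
            List<String> ret = new ArrayList<>(userIds.size());
            for (Long id : userIds) {
                ret.add(rows.get(id));
            }
            return ret;
        }
    }

    public static List<String> queryPerKey(CountingLocationDB db, List<Long> userIds) {
        List<String> ret = new ArrayList<>();
        for (Long id : userIds) {
            ret.add(db.getLocation(id));
        }
        return ret;
    }

    public static List<String> queryBulk(CountingLocationDB db, List<Long> userIds) {
        return db.bulkGetLocations(userIds);
    }
}
```

For a batch of n userIds the per-key version makes n calls while the bulk version makes one, and both return the same locations in the same order.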

QueryLocation can be used in a topology like this:

TridentTopology topology = new TridentTopology();
TridentState locations = topology.newStaticState(new LocationDBFactory());
topology.newStream("myspout", spout)
        .stateQuery(locations, new Fields("userid"), new QueryLocation(), new Fields("location"));

StateUpdater interface

To update a State, we implement the StateUpdater interface, which provides only one method:

public interface StateUpdater<S extends State> extends Operation {
    void updateState(S state, List<TridentTuple> tuples, TridentCollector collector);
}

Here is the LocationUpdater implementation:

public class LocationUpdater extends BaseStateUpdater<LocationDB> {
    public void updateState(LocationDB state, List<TridentTuple> tuples, TridentCollector collector) {
        List<Long> ids = new ArrayList<Long>();
        List<String> locations = new ArrayList<String>();
        // Split each (userid, location) tuple into two parallel lists
        for (TridentTuple t : tuples) {
            ids.add(t.getLong(0));
            locations.add(t.getString(1));
        }
        // One bulk write for the whole batch
        state.setLocationsBulk(ids, locations);
    }
}

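LocationUpdater can be used in a topology like this:

TridentTopology topology = new TridentTopology();
TridentState locations =
    topology.newStream("locations", locationsSpout)
        .partitionPersist(new LocationDBFactory(), new Fields("userid", "location"), new LocationUpdater());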

Calling partitionPersist on a Trident Stream updates a State.

In the example above, LocationUpdater receives a State and the batch to be written, and calls setLocationsBulk() on the LocationDB produced by LocationDBFactory to write every userid in the batch, together with its location, to the State in one bulk operation.

partitionPersist returns a TridentState object representing the LocationDB as updated by the topology, so the returned State can be queried further elsewhere in the topology.

Note also that, as the StateUpdater interface above shows, updateState() receives a TridentCollector, so a state update can still emit tuples that form a new Stream.

To work with the Stream produced by a StateUpdater, call TridentState.newValuesStream().
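The JDK-only sketch below shows both halves of the flow just described: the batch is turned into a single setLocationsBulk() call, and the updater can additionally produce output tuples. `UpdaterSketch`, its `LocationDB`, `Row` and `updateState()` are simplified stand-ins rather than Storm classes; the returned list only models what updateState() would emit to its TridentCollector, and the "userId->location" format is a hypothetical choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpdaterSketch {
    // Stand-in for LocationDB exposing only the bulk setter.
    public static final class LocationDB {
        public final Map<Long, String> rows = new HashMap<>();
        public int bulkCalls = 0;

        public void setLocationsBulk(List<Long> userIds, List<String> locations) {
            bulkCalls++;
            for (int i = 0; i < userIds.size(); i++) {
                rows.put(userIds.get(i), locations.get(i));
            }
        }
    }

    // Stand-in for a (userid, location) TridentTuple.
    public static final class Row {
        final long userId;
        final String location;

        public Row(long userId, String location) {
            this.userId = userId;
            this.location = location;
        }
    }

    // Mirrors LocationUpdater.updateState(): one bulk write per batch, plus optional output.
    public static List<String> updateState(LocationDB state, List<Row> batch) {
        List<Long> ids = new ArrayList<>();
        List<String> locations = new ArrayList<>();
        for (Row r : batch) {
            ids.add(r.userId);
            locations.add(r.location);
        }
        state.setLocationsBulk(ids, locations);

        // What the updater would emit to its collector (hypothetical format).
        List<String> emitted = new ArrayList<>();
        for (Row r : batch) {
            emitted.add(r.userId + "->" + r.location);
        }
        return emitted;
    }
}
```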

MapState interface

Another way for Trident to update State is persistentAggregate; consider the word count example below:

TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));

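persistentAggregate is another abstraction built on top of partitionPersist: it first aggregates the Trident Stream and then writes the aggregated results to the State.

In this example, because a grouped stream is being aggregated, Trident requires the State to implement the MapState interface: the grouping fields become the MapState keys, and the aggregated result for each group becomes the value. MapState is defined as follows: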

public interface MapState<T> extends State {
    List<T> multiGet(List<List<Object>> keys);
    List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters);
    void multiPut(List<List<Object>> keys, List<T> vals);
}
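To see how a grouped aggregation can drive such a state, here is an in-memory sketch. It is not Storm's MemoryMapState: `MapStateSketch`, `InMemoryMapState` and `addCount()` are hypothetical, and the local `ValueUpdater` mirrors the shape of Trident's ValueUpdater (a single `update(stored)` method). multiUpdate() reads all keys of a batch with one multiGet(), applies each updater, and writes everything back with one multiPut().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapStateSketch {
    // Local stand-in for Trident's ValueUpdater: turns the stored value into the new one.
    public interface ValueUpdater<T> {
        T update(T stored);
    }

    // In-memory map state; each key is the list of grouping-field values.
    public static final class InMemoryMapState<T> {
        private final Map<List<Object>, T> store = new HashMap<>();

        public List<T> multiGet(List<List<Object>> keys) {
            List<T> ret = new ArrayList<>(keys.size());
            for (List<Object> k : keys) {
                ret.add(store.get(k)); // null when the key has never been written
            }
            return ret;
        }

        public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater<T>> updaters) {
            List<T> current = multiGet(keys);        // one bulk read
            List<T> next = new ArrayList<>(keys.size());
            for (int i = 0; i < keys.size(); i++) {
                next.add(updaters.get(i).update(current.get(i)));
            }
            multiPut(keys, next);                    // one bulk write
            return next;
        }

        public void multiPut(List<List<Object>> keys, List<T> vals) {
            for (int i = 0; i < keys.size(); i++) {
                store.put(keys.get(i), vals.get(i));
            }
        }
    }

    // Updater that adds a batch's partial count to the stored count (null counts as 0).
    public static ValueUpdater<Long> addCount(long delta) {
        return stored -> (stored == null ? 0L : stored) + delta;
    }
}
```

In the word count above, a batch containing "the" twice and "cow" once would arrive as the keys [the] and [cow] with partial counts 2 and 1.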

Snapshottable interface

If the stream being aggregated is not grouped, Trident requires the State to implement the Snapshottable interface:

public interface Snapshottable<T> extends State {
    T get();
    T update(ValueUpdater updater);
    void set(T o);
}
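Unlike MapState, a Snapshottable holds a single value for the whole stream, for example a global word count. The sketch below is an in-memory illustration with hypothetical names (`SnapshottableSketch`, `InMemorySnapshottable`) and a local ValueUpdater stand-in, not a Storm class.

```java
public class SnapshottableSketch {
    // Local stand-in with the same shape as Trident's ValueUpdater.
    public interface ValueUpdater<T> {
        T update(T stored);
    }

    // Keeps one value: the running result of an aggregation over the whole (ungrouped) stream.
    public static final class InMemorySnapshottable<T> {
        private T value;

        public T get() {
            return value;
        }

        public T update(ValueUpdater<T> updater) {
            value = updater.update(value);
            return value;
        }

        public void set(T o) {
            value = o;
        }
    }
}
```

Each batch's partial aggregate would be folded in through update(), much as MapState's multiUpdate() does for every key.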

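Programming requirements

Following the hints, complete the code in the editor on the right so that it interacts with a database using Storm Trident.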

Test description

The platform will test the code you write:

Input:

the cow
the man
four score
many apples

Expected output:

the
success0
cow
success1
the
success2
man
success3
four
success0
score
success1
many
success2
apples
success3
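Why do the counters restart at success0 for the second half? TestQueryLocation (in the code below) numbers its results by position within the batch it receives. The JDK-only sketch below reproduces the expected output under two assumptions: FixedBatchSpout emits batches of 2 sentences, and there is a single partition, so each batch reaches batchRetrieve() in one call. `ExpectedOutputSketch` and `simulate()` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExpectedOutputSketch {
    // Simulates Split + TestQueryLocation batch by batch, without Storm.
    public static List<String> simulate(List<String> sentences, int batchSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < sentences.size(); start += batchSize) {
            List<String> words = new ArrayList<>();
            for (String s : sentences.subList(start, Math.min(start + batchSize, sentences.size()))) {
                words.addAll(Arrays.asList(s.split(" "))); // what Split emits
            }
            for (int i = 0; i < words.size(); i++) {       // index restarts for every batch
                out.add(words.get(i));                      // execute() prints the word...
                out.add("success" + i);                     // ...then its query result
            }
        }
        return out;
    }

    public static void main(String[] args) {
        simulate(Arrays.asList("the cow", "the man", "four score", "many apples"), 2)
                .forEach(System.out::println);
    }
}
```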

The code is as follows:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.task.IMetricsContext;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseQueryFunction;
import org.apache.storm.trident.state.State;
import org.apache.storm.trident.state.StateFactory;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class StateTopology {

    public static void main(String[] args) throws Exception {
        FixedBatchSpout spout = new FixedBatchSpout(
                new Fields("sentence"), 2,
                new Values("the cow"),
                new Values("the man"),
                new Values("four score"),
                new Values("many apples"));
        spout.setCycle(false);

        TridentTopology topology = new TridentTopology();

        //**** Complete the Topology program following the hints ****//
        /*********begin*********/

        // newStream reads from the input source (the spout) and creates a new stream in the topology
        topology.newStream("spout", spout)
                // .each() passes the "sentence" field through Split() and emits a "word" field
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                // Stream.stateQuery(TridentState state, Fields inputFields, QueryFunction function, Fields functionFields)
                // reads data from a persistent store based on the input and exposes it to the topology as a stream.
                // topology.newStaticState(new TestStateFactory()) creates the external "database" State;
                // the "word" field is the query input, and TestQueryLocation's output is declared as the "test" field.
                .stateQuery(topology.newStaticState(new TestStateFactory()), new Fields("word"),
                        new TestQueryLocation(), new Fields("test"));

        /*********end*********/

        StormTopology stormTopology = topology.build();
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();
        cluster.submitTopology("test", conf, stormTopology);
    }

    public static class Split extends BaseFunction {
        // Emits one tuple per space-separated word of the sentence
        public void execute(TridentTuple tuple, TridentCollector collector) {
            String sentence = tuple.getString(0);
            for (String word : sentence.split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }

    // A fake "database": getDBOption(i) returns "success" + i
    public static class TestState implements State {
        @Override
        public void beginCommit(Long txid) {
            // Nothing to do: this State is read-only
        }

        @Override
        public void commit(Long txid) {
            // Nothing to do: this State is read-only
        }

        public String getDBOption(int i) {
            return "success" + i;
        }
    }

    public static class TestStateFactory implements StateFactory {
        @Override
        public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
            return new TestState();
        }
    }

    public static class TestQueryLocation extends BaseQueryFunction<TestState, String> {
        // Returns one result per input tuple; i is the tuple's position within the batch
        @Override
        public List<String> batchRetrieve(TestState state, List<TridentTuple> inputs) {
            List<String> list = new ArrayList<String>();
            for (int i = 0; i < inputs.size(); i++) {
                list.add(state.getDBOption(i));
            }
            return list;
        }

        // Prints the queried word and its result instead of emitting them
        @Override
        public void execute(TridentTuple tuple, String result, TridentCollector collector) {
            System.out.println(tuple.getString(0));
            System.out.println(result);
        }
    }
}
