Introduction
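A dimension table is a data warehouse concept: a collection of dimension attributes, such as a time dimension or a location dimension. It can live in a store such as MySQL, Cassandra, or Redis, or even behind a custom API.
A stream table is streaming data, for example from Kafka.
A stream-dimension join takes the join key from each stream record and uses it to query the dimension table asynchronously.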
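An example
Stream table: Kafka, with three columns id1, id2, id3
Dimension table: MySQL, with columns id, age, name
SQL: select id1,id2,id3,age,name from kafka join mysql on id1=id;
The join result is id1, id2, id3, age, name: the stream table's fields plus the MySQL dimension table's fields.
The stream side supplies id1 to the dimension side, which in effect runs select * from mysql where id=id1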
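The lookup described above can be sketched in plain Java without Flink. This is only an illustration under assumptions: an in-memory `Map` stands in for the MySQL dimension table, `CompletableFuture.supplyAsync` stands in for an asynchronous database client, and the names `LookupJoinSketch`, `lookup` and `join` as well as the sample rows are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: class name, method names and data are made up for illustration.
final class LookupJoinSketch {
    // Stand-in for the MySQL dimension table: id -> {age, name}.
    static final Map<String, String[]> DIM = new HashMap<>();
    static {
        DIM.put("1", new String[] {"30", "alice"});
        DIM.put("2", new String[] {"25", "bob"});
    }

    // Plays the role of "select age, name from mysql where id = ?",
    // executed on another thread so the caller is not blocked.
    static CompletableFuture<String[]> lookup(String id) {
        return CompletableFuture.supplyAsync(() -> DIM.get(id));
    }

    // Joins one stream row (id1, id2, id3) with the dimension row whose id equals id1.
    // Yields null when no dimension row matches, i.e. the row is dropped as in an inner join.
    static CompletableFuture<List<String>> join(String id1, String id2, String id3) {
        return lookup(id1).thenApply(dim ->
                dim == null ? null : Arrays.asList(id1, id2, id3, dim[0], dim[1]));
    }

    public static void main(String[] args) {
        System.out.println(join("1", "x", "y").join()); // prints [1, x, y, 30, alice]
        System.out.println(join("9", "x", "y").join()); // prints null (no match)
    }
}
```

Each stream row triggers exactly one keyed lookup, which is why a slow dimension store is better queried asynchronously than inside a blocking map step.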
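Hands-on
Stream table: a delimited text file containing uid and phone
Dimension table: Elasticsearch documents containing uid and username
The goal is to join the stream table with the dimension table to produce uid, username, phone.
Step 1: read the stream data from a text file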
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Test {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Read the stream table line by line from a text file.
        DataStreamSource<String> dataStreamSource = env.readTextFile("/mytextFile.txt");
        // Parse each tab-separated line into (uid, phone).
        SingleOutputStreamOperator<Tuple2<String, String>> map = dataStreamSource.map(new MapFunction<String, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, String> map(String s) throws Exception {
                String[] splits = s.split("\t");
                String uid = splits[0];
                String phone = splits[1];
                return new Tuple2<>(uid, phone);
            }
        });
        // The async join against Elasticsearch, shown later, goes here:
        //SingleOutputStreamOperator<Tuple3<String, String, String>> renyuanku = AsyncDataStream.unorderedWait(map, new AsyncEsDataRequest(), 2, TimeUnit.SECONDS, 100);
        //renyuanku.writeAsText("E:/test/renyuanku.txt").setParallelism(1);
        env.execute("Test");
    }
}
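The map step above indexes into the split result without checking it, so a blank or malformed line would throw `ArrayIndexOutOfBoundsException`. A minimal, Flink-free sketch of a more defensive parser might look like this; the class name `LineParser` and the choice to return `null` for bad lines are assumptions, not part of the original job.

```java
// Hypothetical helper, not part of the original job.
final class LineParser {
    // Parses "uid<TAB>phone" into {uid, phone}.
    // Returns null when the line is null, has fewer than two fields,
    // or when either of the first two fields is empty.
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        // Limit -1 keeps trailing empty fields so "uid\t" still yields two parts.
        String[] parts = line.split("\t", -1);
        if (parts.length < 2) {
            return null;
        }
        String uid = parts[0].trim();
        String phone = parts[1].trim();
        if (uid.isEmpty() || phone.isEmpty()) {
            return null;
        }
        return new String[] {uid, phone};
    }

    public static void main(String[] args) {
        String[] ok = parse("u001\t13800000000");
        System.out.println(ok[0] + " / " + ok[1]);        // prints u001 / 13800000000
        System.out.println(parse("broken line") == null); // prints true
    }
}
```

In the Flink job, a helper like this could be called from a `flatMap` that simply skips the `null` results instead of failing.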
Step 2: fetch data from Elasticsearch asynchronously
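import com.alibaba.fastjson.JSONObject; // assumption: JSONObject comes from fastjson
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.http.HttpHost;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.util.Collections;

public class AsyncEsDataRequest extends RichAsyncFunction<Tuple2<String, String>, Tuple3<String, String, String>> {
    private transient RestHighLevelClient restHighLevelClient;

    @Override
    public void open(Configuration parameters) throws Exception {
        HttpHost httpHost = new HttpHost("swarm-manager", 9200, "http");
        // Initialize the Elasticsearch client once per parallel instance.
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(httpHost));
    }

    @Override
    public void close() throws Exception {
        if (restHighLevelClient != null) {
            restHighLevelClient.close();
        }
    }

    @Override
    public void asyncInvoke(Tuple2<String, String> input, ResultFuture<Tuple3<String, String, String>> resultFuture) throws Exception {
        search(input, resultFuture);
    }

    // Query the Elasticsearch dimension table asynchronously.
    private void search(Tuple2<String, String> input, ResultFuture<Tuple3<String, String, String>> resultFuture) {
        // The input tuple is (uid, phone), so the fields are f0 and f1.
        final String uid = input.f0;
        final String phone = input.f1;
        SearchRequest searchRequest = new SearchRequest("renyuanku");
        QueryBuilder builder = QueryBuilders.boolQuery().must(QueryBuilders.termQuery("uid", uid));
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(builder);
        searchRequest.source(sourceBuilder);
        ActionListener<SearchResponse> listener = new ActionListener<SearchResponse>() {
            // Success: take username from the first hit, if any.
            @Override
            public void onResponse(SearchResponse searchResponse) {
                // Empty string marks "no match"; Flink's tuple serializer does not accept null fields.
                String username = "";
                SearchHit[] searchHits = searchResponse.getHits().getHits();
                if (searchHits.length > 0) {
                    JSONObject jsonObject = JSONObject.parseObject(searchHits[0].getSourceAsString());
                    String found = jsonObject.getString("username");
                    if (found != null) {
                        username = found;
                    }
                }
                resultFuture.complete(Collections.singleton(Tuple3.of(uid, username, phone)));
            }

            // Failure: log the error and still emit the record, with an empty username.
            @Override
            public void onFailure(Exception e) {
                System.out.println(e.getMessage());
                resultFuture.complete(Collections.singleton(Tuple3.of(uid, "", phone)));
            }
        };
        // Newer client versions take request options as well:
        // restHighLevelClient.searchAsync(searchRequest, RequestOptions.DEFAULT, listener);
        restHighLevelClient.searchAsync(searchRequest, listener);
    }
}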
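Both callbacks above are meant to complete the `ResultFuture` exactly once; a record whose future is never completed just waits for the async timeout. The same "always emit, with a fallback on failure" pattern can be shown without Flink or Elasticsearch. In this sketch `CompletableFuture` stands in for the async client, and the names `EnrichWithFallback` and `enrich` plus the empty-string fallback are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: not the Flink or Elasticsearch API, only the completion pattern.
final class EnrichWithFallback {
    // Combines (uid, phone) with the username produced by an async lookup.
    // If the lookup fails or returns null, an empty username is used,
    // so a result is emitted in every case.
    static CompletableFuture<List<String>> enrich(String uid, String phone,
                                                  CompletableFuture<String> usernameLookup) {
        return usernameLookup.handle((username, error) -> {
            String resolved = (error == null && username != null) ? username : "";
            return Arrays.asList(uid, resolved, phone);
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("lookup failed"));

        // prints [u001, alice, 138]
        System.out.println(enrich("u001", "138", CompletableFuture.completedFuture("alice")).join());
        // prints [u002, , 139]
        System.out.println(enrich("u002", "139", failed).join());
    }
}
```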
Step 3: join the two streams and write the result to a file
// Timeout of 2 seconds per request, at most 100 requests in flight; results may arrive out of order.
SingleOutputStreamOperator<Tuple3<String, String, String>> renyuanku = AsyncDataStream.unorderedWait(map, new AsyncEsDataRequest(), 2, TimeUnit.SECONDS, 100);
renyuanku.writeAsText("E:/test/renyuanku.txt").setParallelism(1);
With that, the stream table and the dimension table are joined into a single stream.