saiku在公司内部推广开来之后,得到了产品、运营、策略等同事一致好评,由此带来了一大波需求,就连打点、日志数据的需求都来了,笔者觉得这种数据不适合用来做多维分析,但是迫于公司内部压力,又不得不做,只好硬头皮上了;日志类型的数据每天大概有几个亿条,对kylin的压力太大(目前kylin集群只有3台),由此只好限制用户查询数据时间的跨度;在大数据量的情况下,估计网友也会有这种情况的,由此笔者把自己的思路与实现方式分享出来,或许有其它更好的实现方式,望网友不吝赐教;
方案选择
经过分析saiku的代码,结合笔者使用saiku的经验,笔者针对这个问题想到了3个方向:
1.直接在MDX语句上加入限制条件 ;
2.在MDX转换后的SQL上加上限制条件;
3.直接加入日期维度限制;
经过笔者持续研究,最终选择了方案3,;说下放弃方案1、2的理由:方案1,由于对MDX语法不太熟悉,而将前端的参数传到后台之后,转换出来的MDX语句,比较灵活多变,这时就需要判断多种情况了,此时便有可能不会兼顾所有的case;方案2与方案1有点类似,其实更甚,在SQL上添加限制条件其实就是在where后面加入and 【字段】的条件,经过mondrian之后转换出的sql可能会有多条,而且SQL的结构有可能不同,在where后面加入条件就需要考虑更多的case了(至于详细的分析过程,笔者就不在赘述了);
解决过程
方案3的处理思想很简单:用户在页面中选择指标和维度时,如果选用了【日期】维度,就限制该维度的跨度,若没有选用,则加入该维度,并限制查询日期为今日;详细的实现逻辑是:用户点击维度,添加查询维度时,会自动添加到【列】上,当然用户也可以自己拖拽到【行】或者【过滤】上,在这3条件的处理方式是不同的需要分别处理;思路有了就直接上代码,saiku处理和执行查询的逻辑的代码在Query2Resource.java中,执行的接口为:execute(),代码如下:
@POST
@Consumes({"application/json" })
@Path("/execute")
public QueryResult execute(ThinQuery tq) {
//限制查询时间区间
tq = this.restrictQueryByDate(tq);
try {
if (thinQueryService.isMdxDrillthrough(tq)) {
Long start = (new Date()).getTime();
ResultSet rs = thinQueryService.drillthrough(tq);
QueryResult rsc = RestUtil.convert(rs);
rsc.setQuery(tq);
Long runtime = (new Date()).getTime()- start;
rsc.setRuntime(runtime.intValue());
return rsc;
}
QueryResult qr = RestUtil.convert(thinQueryService.execute(tq));
ThinQuery tqAfter = thinQueryService.getContext(tq.getName()).getOlapQuery();
qr.setQuery(tqAfter);
return qr;
}
catch (Exception e) {
log.error("Cannot execute query (" + tq + ")",e);
String error = ExceptionUtils.getRootCauseMessage(e);
return new QueryResult(error);
}
}
/***
* 该方法为限制在查询中的日期:若不限制日期,则选取当天日期加以限制
* zst create 20160802
* */
private ThinQuery restrictQueryByDate(ThinQuery tq) {
ThinQueryModel queryModel = tq.getQueryModel();
Map<AxisLocation, ThinAxis> axesMap = queryModel.getAxes();
NamedList<ThinHierarchy> namedList = new NamedListImpl<ThinHierarchy>();
ThinAxis filterAxis = axesMap.get(AxisLocation.FILTER);
List<ThinHierarchy> filterHie = filterAxis.getHierarchies();
namedList = this.resetThinHierachy(filterHie);
//将修改后的Row重新set到queryModel
if(namedList.size() > 0) {
ThinAxis newFilterAxis = new ThinAxis(
AxisLocation.FILTER,
namedList,
filterAxis.isNonEmpty(),
filterAxis.getAggregators()
);
axesMap.put(AxisLocation.FILTER,newFilterAxis);
}
//将修改后的Row重新set到queryModel
if(namedList.size() == 0) {
ThinAxis rowAxis = axesMap.get(AxisLocation.ROWS);
List<ThinHierarchy> rowHie = rowAxis.getHierarchies();
namedList = this.resetThinHierachy(rowHie);
if(namedList.size() > 0) {
ThinAxis newRowsAxis = new ThinAxis(
AxisLocation.ROWS,
namedList,
rowAxis.isNonEmpty(),
rowAxis.getAggregators()
);
axesMap.put(AxisLocation.ROWS,newRowsAxis);
}
}
//若rows中不包含[日期]维度,则判断culowns中是否包含;
//若columns中包含[日期]维度,且没有限制时间区间,则限制为当天;不包含[日期]维度,则强制添加[日期]维度,并限制时间为当天;
if(namedList.size() == 0) {
// namedList.clear();
ThinAxis colAxis = axesMap.get(AxisLocation.COLUMNS);
List<ThinHierarchy> colHie = colAxis.getHierarchies();
namedList = this.resetThinHierachy(colHie);
if(namedList.size() == 0) {
//若list为空,则说明columns中不包含日期维度,将colHie加入list中,并强制添加日期维度
namedList.addAll(colHie);
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
String yesterday = format.format(new Date(new Date().getTime() - 24 * 60 * 60 * 1000));
String newDateMdx = "[日期].[日期].["+yesterday+"]";
ThinMember thinMember = new ThinMember(yesterday,newDateMdx,yesterday);
ThinHierarchy thinHie = new ThinHierarchy();
thinHie.setName("[日期].[日期]");
List<ThinMember> thinMemberList = new ArrayList<ThinMember>();
thinMemberList.add(thinMember);
ThinSelection selection = new ThinSelection();
selection.setMembers(thinMemberList);
selection.setType(ThinSelection.Type.INCLUSION);
ThinLevel thinLevel = new ThinLevel(yesterday,yesterday,selection,null);
// thinLevel.setSelection(selection);
Map<String,ThinLevel> mapLevel = new LinkedHashMap<String,ThinLevel>();
mapLevel.put("日期", thinLevel);
thinHie.setLevels(mapLevel);
namedList.add(thinHie);
ThinAxis newColAxis = new ThinAxis(
AxisLocation.COLUMNS,
namedList,
colAxis.isNonEmpty(),
colAxis.getAggregators()
);
axesMap.put(AxisLocation.COLUMNS,newColAxis);
}
}
return tq;
}
private NamedList<ThinHierarchy> resetThinHierachy(List<ThinHierarchy> hieList) {
NamedList<ThinHierarchy> namedList = new NamedListImpl<ThinHierarchy>();
boolean flag = false;
for(ThinHierarchy hie : hieList) {
if(hie.getName().equals("[日期].[日期]")) {
if(hie.getLevels().get("日期").getSelection() == null) {
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
String yesterday = format.format(new Date(new Date().getTime() - 24 * 60 * 60 * 1000));
String newDateMdx = "[日期].[日期].["+yesterday+"]";
ThinMember thinMember = new ThinMember(null,newDateMdx,yesterday);
List<ThinMember> thinMemberList = new ArrayList<ThinMember>();
thinMemberList.add(thinMember);
ThinSelection selection = new ThinSelection();
selection.setMembers(thinMemberList);
hie.getLevels().get("日期").setSelection(selection);
}
flag = true;
}
namedList.add(hie);
}
if(flag) return namedList;
namedList.clear();
return namedList;
}
总结
按照上述方法,实现了笔者的需求:并没有加入【日期】维度,但是saiku默认将查询日期限制为当日,如下图:
需要特别强调的一点是:在配置schema时,必须要单独加入【日期】维度配置