Extracting data from ES
I. Set the number of records fetched per request
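1. Get the document count of the ES index
The request is made the same way as in my earlier article on extracting data from ES with Kettle. The only difference is that this time the goal is just the total number of documents in the index, i.e. the total field.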
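1. Set the request header parameters
2. Fetch the data (the request goes directly to ES, so the response is not wrapped by any other service and keeps ES's raw response structure)
3. Compute the number of pages from the total count
The code for step 3, written in a User Defined Java Class step, is as follows: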
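Steps 2 and 3 depend on reading the total correctly from the raw response. The Java sketch below is hypothetical: EsTotalExtractor, countRequestBody, extractTotal and isLowerBound are illustrative names, not Kettle or Elasticsearch APIs, and regular expressions stand in for a real JSON parser for brevity. It assumes the raw _search response format: on ES 6.x and earlier hits.total is a plain number, while on ES 7.x and later it is an object {"value": ..., "relation": ...}, and by default the count stops at 10,000 (relation "gte") unless the request sets track_total_hits to true.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Kettle or any ES client): builds a request body that
// asks only for the hit count and pulls hits.total out of a raw _search response.
// Regular expressions are a simplification; a real job should use a JSON parser.
public class EsTotalExtractor {

    // size=0: return no documents, only metadata.
    // track_total_hits=true: on ES 7.x+, count every hit instead of stopping at 10,000.
    static String countRequestBody() {
        return "{\"size\":0,\"track_total_hits\":true}";
    }

    // Matches both "total":123 (ES 6.x and earlier) and "total":{"value":123,...} (ES 7.x+).
    private static final Pattern TOTAL =
            Pattern.compile("\"total\"\\s*:\\s*(?:\\{\\s*\"value\"\\s*:\\s*)?(\\d+)");
    // ES 7.x+ reports "gte" when the value is only a lower bound.
    private static final Pattern RELATION_GTE =
            Pattern.compile("\"relation\"\\s*:\\s*\"gte\"");

    static long extractTotal(String rawResponse) {
        // Start searching at "hits" so that _shards.total is not picked up by mistake.
        int hitsIdx = rawResponse.indexOf("\"hits\"");
        if (hitsIdx < 0) {
            throw new IllegalArgumentException("no \"hits\" object in response");
        }
        Matcher m = TOTAL.matcher(rawResponse);
        if (m.find(hitsIdx)) {
            return Long.parseLong(m.group(1));
        }
        throw new IllegalArgumentException("no hits.total in response");
    }

    static boolean isLowerBound(String rawResponse) {
        return RELATION_GTE.matcher(rawResponse).find();
    }

    public static void main(String[] args) {
        String es7 = "{\"took\":2,\"_shards\":{\"total\":1,\"successful\":1},"
                + "\"hits\":{\"total\":{\"value\":25000,\"relation\":\"eq\"},\"hits\":[]}}";
        System.out.println(countRequestBody());
        System.out.println(extractTotal(es7) + " exact=" + !isLowerBound(es7));
    }
}
```

If isLowerBound returns true, the total is capped and the page count computed from it will be too small, so the count request needs track_total_hits (or the _count API) to get an exact number.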
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
    if (first) {
        first = false;
        /* TODO: Your code here. (Using info fields)
        FieldHelper infoField = get(Fields.Info, "info_field_name");
        RowSet infoStream = findInfoRowSet("info_stream_tag");
        Object[] infoRow = null;
        int infoRowCount = 0;
        // Read all rows from info step before calling getRow() method, which returns first row from any
        // input rowset. As rowMeta for info and input steps varies getRow() can lead to errors.
        while((infoRow = getRowFrom(infoStream)) != null){
            // do something with info data
            infoRowCount++;
        }
        */
    }
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    r = createOutputRow(r, data.outputRowMeta.size());
    double num = get(Fields.In, "total").getNumber(r);  // total document count of the index
    int pageSize = 10000;                                // documents per request
    int pages = (int) Math.ceil(num / pageSize);         // total number of pages, rounded up
    // Generate one paging request body per page and output each as its own row
    for (int i = 0; i < pages; i++) {
        // r acts like an output buffer: to output every page, each row must be created separately.
        // putRow() hands this Object[] to the next step, so reusing one array would overwrite
        // earlier rows; createOutputRow() returns a new copy. (Found through my own testing.)
        r = createOutputRow(r, data.outputRowMeta.size());
        get(Fields.Out, "PAGE").setValue(r, "{\"size\":" + pageSize + ",\"from\":" + i * pageSize + "}");
        //get(Fields.Out, "PAGE").setValue(r, "http://32.1.0.238:9096/api/resource/query/standard_address2/"+i+"/100"); // assign the page URL to PAGE
        //get(Fields.Out, "Content-Type").setValue(r,"application/json; charset=UTF-8");
        //get(Fields.Out, "username").setValue(r,"spp_ent");
        //get(Fields.Out, "key_token").setValue(r,"bcb38cc2-927d-4927-8076-7a7d4c2a972b");
        //get(Fields.Out, "json").setValue(r,"{}");
        putRow(data.outputRowMeta, r);
    }
    return true;
}
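To check the paging arithmetic outside Kettle, the loop above can be pulled into a plain Java class. This is a sketch; PagePlanner, pageCount, pageBodies and exceedsDefaultWindow are hypothetical names. It uses integer ceiling division, so a total that is an exact multiple of the page size does not produce an extra empty page. It also flags requests that go past Elasticsearch's default index.max_result_window of 10,000 (from + size must not exceed it). With a page size of 10,000, every page after the first exceeds that default, so this approach assumes the index setting has been raised; otherwise ES rejects the deeper pages and the scroll API or search_after is the usual alternative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone version of the paging logic in the processRow() code above.
public class PagePlanner {

    // Elasticsearch's default index.max_result_window; from + size may not exceed it
    // unless the index setting has been raised.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    static int pageCount(long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (int) ((total + pageSize - 1) / pageSize); // ceiling division
    }

    // One {"size":..,"from":..} request body per page, same format as the PAGE field above.
    static List<String> pageBodies(long total, int pageSize) {
        List<String> bodies = new ArrayList<>();
        int pages = pageCount(total, pageSize);
        for (int i = 0; i < pages; i++) {
            long from = (long) i * pageSize;
            bodies.add("{\"size\":" + pageSize + ",\"from\":" + from + "}");
        }
        return bodies;
    }

    static boolean exceedsDefaultWindow(long from, int size) {
        return from + size > DEFAULT_MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        for (String body : pageBodies(25_000, 10_000)) {
            System.out.println(body);
        }
    }
}
```

For a total of 25,000 this yields three bodies with from = 0, 10000 and 20000.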
4. Add a "Copy rows to result" step
II. Dynamic extraction