Reading the Code: TopKStringPatterns

package org.apache.mahout.fpm.pfpgrowth.convertors.string;
public final class TopKStringPatterns implements Writable

This class stores frequent patterns and merges pattern lists to find the top-k patterns.

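At its core is a list of pairs. Each pair consists of a pattern, stored as a list of strings, and its support value, stored as a long.
In other words, it is a list of <pattern, support> entries.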

private final List<Pair<List<String>,Long>> frequentPatterns;
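
To make the shape of this field concrete, here is a minimal sketch that builds such a list. It assumes `Map.Entry` as a stand-in for Mahout's `Pair`, and the class name `PatternListSketch`, the helper `pair`, and the sample items are all hypothetical, chosen only for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: Map.Entry stands in for Mahout's Pair<List<String>,Long>.
final class PatternListSketch {

  // Builds one <pattern, support> entry.
  static Map.Entry<List<String>, Long> pair(long support, String... items) {
    return new SimpleImmutableEntry<List<String>, Long>(Arrays.asList(items), support);
  }

  // A small list in descending order of support, like the one the class holds.
  static List<Map.Entry<List<String>, Long>> sample() {
    List<Map.Entry<List<String>, Long>> patterns = new ArrayList<>();
    patterns.add(pair(12L, "milk", "bread"));
    patterns.add(pair(9L, "milk"));
    return patterns;
  }

  public static void main(String[] args) {
    for (Map.Entry<List<String>, Long> p : sample()) {
      System.out.println(p.getKey() + " -> support " + p.getValue());
    }
  }
}
```

Each entry pairs the items of one pattern with the number of transactions that contain it.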



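Reading and writing. Serialized format: the total list length, then for each pattern its length, its support value, and the items it contains.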

@Override
public void readFields(DataInput in) throws IOException {
  frequentPatterns.clear();
  int length = in.readInt();
  for (int i = 0; i < length; i++) {
    List<String> items = new ArrayList<String>();
    int itemsetLength = in.readInt();
    long support = in.readLong();
    for (int j = 0; j < itemsetLength; j++) {
      items.add(in.readUTF());
    }
    frequentPatterns.add(new Pair<List<String>,Long>(items, support));
  }
}

@Override
public void write(DataOutput out) throws IOException {
  out.writeInt(frequentPatterns.size());
  for (Pair<List<String>,Long> pattern : frequentPatterns) {
    out.writeInt(pattern.getFirst().size());
    out.writeLong(pattern.getSecond());
    for (String item : pattern.getFirst()) {
      out.writeUTF(item);
    }
  }
}
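
The format described above can be exercised without Hadoop or Mahout on the classpath. The following is a rough sketch, not the library's code: it writes the same field order with the JDK's `DataOutputStream` and reads it back with `DataInputStream`. The class name `WireFormatSketch`, its two methods, and the parallel-list representation of patterns and supports are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the layout: [count] then, per pattern,
// [itemCount][support][item 1]...[item n].
final class WireFormatSketch {

  // itemsets.get(i) is the i-th pattern, supports.get(i) its support.
  static byte[] encode(List<List<String>> itemsets, List<Long> supports) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(itemsets.size());
      for (int i = 0; i < itemsets.size(); i++) {
        out.writeInt(itemsets.get(i).size());
        out.writeLong(supports.get(i));
        for (String item : itemsets.get(i)) {
          out.writeUTF(item);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the bytes back and renders each pattern as "[items]:support;".
  static String decode(byte[] data) {
    StringBuilder sb = new StringBuilder();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        int itemCount = in.readInt();
        long support = in.readLong();
        List<String> items = new ArrayList<>();
        for (int j = 0; j < itemCount; j++) {
          items.add(in.readUTF());
        }
        sb.append(items).append(':').append(support).append(';');
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sb.toString();
  }
}
```

Because the item count is written before the support, the reader has to consume both fixed-width fields before it can start reading the variable number of strings.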



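The merge operation combines two TopKStringPatterns into a new TopKStringPatterns containing at most heapSize entries.
It takes one entry from each side and compares them first by support, then by pattern length, then item by item.
The winner goes into the new TopKStringPatterns, and the loser is compared against the next entry from the winner's side. If the two entries are equal, only one copy is kept and both sides advance.
The overall effect is similar to a heap.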

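Question: have both lists already been sorted in descending order beforehand? The merge only yields the true top k if they have, because it never looks past the current head of either list.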

public TopKStringPatterns merge(TopKStringPatterns pattern, int heapSize) {
  List<Pair<List<String>,Long>> patterns = new ArrayList<Pair<List<String>,Long>>();
  Iterator<Pair<List<String>,Long>> myIterator = frequentPatterns.iterator();
  Iterator<Pair<List<String>,Long>> otherIterator = pattern.iterator();
  Pair<List<String>,Long> myItem = null;
  Pair<List<String>,Long> otherItem = null;
  // Each iteration emits at most one entry, so the result never exceeds heapSize.
  for (int i = 0; i < heapSize; i++) {
    // Refill whichever head was consumed in the previous iteration.
    if (myItem == null && myIterator.hasNext()) {
      myItem = myIterator.next();
    }
    if (otherItem == null && otherIterator.hasNext()) {
      otherItem = otherIterator.next();
    }
    if (myItem != null && otherItem != null) {
      // Compare by support, then by pattern length, then item by item.
      int cmp = myItem.getSecond().compareTo(otherItem.getSecond());
      if (cmp == 0) {
        cmp = myItem.getFirst().size() - otherItem.getFirst().size();
        if (cmp == 0) {
          for (int j = 0; j < myItem.getFirst().size(); j++) {
            cmp = myItem.getFirst().get(j).compareTo(
                otherItem.getFirst().get(j));
            if (cmp != 0) {
              break;
            }
          }
        }
      }
      if (cmp <= 0) {
        patterns.add(otherItem);
        // Identical entries: drop our copy too, so the pattern appears once.
        if (cmp == 0) {
          myItem = null;
        }
        otherItem = null;
      } else if (cmp > 0) {
        patterns.add(myItem);
        myItem = null;
      }
    } else if (myItem != null) {
      // The other list is exhausted.
      patterns.add(myItem);
      myItem = null;
    } else if (otherItem != null) {
      // Our list is exhausted.
      patterns.add(otherItem);
      otherItem = null;
    } else {
      // Both lists are exhausted.
      break;
    }
  }
  return new TopKStringPatterns(patterns);
}
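
The same merge can be tried in isolation. The following is a simplified sketch rather than Mahout's code: it uses index-based access instead of iterators, and the names `TopKMergeSketch`, `Pattern`, `ORDER`, and `demo` are hypothetical. It assumes both inputs are already in descending order, which is exactly the assumption questioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TopKMergeSketch {

  // Hypothetical stand-in for Pair<List<String>,Long>.
  static final class Pattern {
    final List<String> items;
    final long support;

    Pattern(long support, String... items) {
      this.items = Arrays.asList(items);
      this.support = support;
    }

    @Override
    public String toString() {
      return items + ":" + support;
    }
  }

  // Same ordering as the merge above: support, then length, then item by item.
  static final Comparator<Pattern> ORDER = (a, b) -> {
    int cmp = Long.compare(a.support, b.support);
    if (cmp != 0) return cmp;
    cmp = Integer.compare(a.items.size(), b.items.size());
    if (cmp != 0) return cmp;
    for (int j = 0; j < a.items.size(); j++) {
      cmp = a.items.get(j).compareTo(b.items.get(j));
      if (cmp != 0) return cmp;
    }
    return 0;
  };

  // Merges two descending lists and keeps at most k entries.
  static List<Pattern> merge(List<Pattern> mine, List<Pattern> other, int k) {
    List<Pattern> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (out.size() < k && (i < mine.size() || j < other.size())) {
      if (i == mine.size()) { out.add(other.get(j++)); continue; }
      if (j == other.size()) { out.add(mine.get(i++)); continue; }
      int cmp = ORDER.compare(mine.get(i), other.get(j));
      if (cmp > 0) {
        out.add(mine.get(i++));
      } else {
        out.add(other.get(j++));
        if (cmp == 0) i++; // identical entry on both sides: keep one copy
      }
    }
    return out;
  }

  // Two small descending lists that share the entry [a, b]:5.
  static List<Pattern> demo() {
    List<Pattern> mine = Arrays.asList(new Pattern(5, "a", "b"), new Pattern(3, "a"));
    List<Pattern> other = Arrays.asList(
        new Pattern(5, "a", "b"), new Pattern(4, "c"), new Pattern(1, "d"));
    return merge(mine, other, 3);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [[a, b]:5, [c]:4, [a]:3]
  }
}
```

The shared entry appears once in the result, and the entry with support 1 is cut off by the limit of 3. If either input were not sorted, a high-support entry further down a list could be missed, since only the heads are ever compared.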