A Killer Caching Solution for Hit Counts on Posts, Blogs, and Other Resources

Yes, the title is clickbait :lol:

I posted about hit counts a few years ago: [url]http://www.iteye.com/topic/171240[/url]
Looking back, that approach was too simplistic and had plenty of problems.

Recently I worked out a new scheme.

[b]Getting to the point[/b]

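The usual way to handle post hit counts is to cache them in memory and write them to the database at a suitable moment. Typically a threshold is set, and the database is updated once the count reaches it.
This approach runs into the following problems:
1 What about posts that never reach the threshold? Say the threshold is 10, but after 9 hits nobody views the post again.
2 How large should the threshold be? Too large, and a server crash loses a lot of data; too small, and the cache is pointless.
3 Every post that reaches the threshold triggers its own database access. Can these be merged into a single DB access?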
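
To make problems 1 and 3 concrete, here is a minimal sketch of the threshold approach. The class name ThresholdCounter, the threshold of 10, and the in-memory map standing in for the database are all assumptions made for illustration, not code from the original post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical threshold-based counter: pending hits are flushed
// only when they reach THRESHOLD.
class ThresholdCounter {
    static final int THRESHOLD = 10; // assumed value

    private final Map<Integer, Integer> pending = new HashMap<>();
    private final Map<Integer, Integer> db = new HashMap<>(); // stands in for the database

    synchronized void hit(int resourceId) {
        int count = pending.merge(resourceId, 1, Integer::sum);
        if (count >= THRESHOLD) {
            // Problem 3: one separate "DB write" per resource that reaches the threshold
            db.merge(resourceId, count, Integer::sum);
            pending.remove(resourceId);
        }
    }

    // Hits that have reached the database
    synchronized int persisted(int resourceId) {
        return db.getOrDefault(resourceId, 0);
    }

    // Problem 1: hits that stay in memory until the threshold is reached
    synchronized int stranded(int resourceId) {
        return pending.getOrDefault(resourceId, 0);
    }
}
```

After 9 hits on a resource, persisted() is still 0 and all 9 hits sit in memory, where a crash would lose them.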

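[color=red]The threshold approach is passive; the problem should be solved actively.[/color]
The active approach works like this: cache ---> file ---> DB
[color=blue]1 The cache holds two hit counts, which we call todayHits and yestodayHits[/color]
todayHits holds the resource's hit count as of today
yestodayHits holds the hit count as of yesterday (in the code below, as of the last export)

[color=blue]2 Periodically export the cache entries that have changed to a file, and evict the ones that have not[/color]
[color=blue]3 Import the exported file into the database. Most databases provide a command for this; in MySQL it is load data local infile[/color]
This command appends the file's rows to the end of the table, so one resource ends up with several hit-count rows; we use the import time to pick the latest one.
[color=blue]4 Delete stale data. Each append makes some of the earlier rows meaningless, so they need to be cleaned up[/color]
The active approach scans the cached data on a schedule.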
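
The scheduled scan could be driven by a ScheduledExecutorService. This is only a sketch: the class name HitsFlushScheduler and the exportAndImport task (which would perform steps 2 to 4) are hypothetical, and the original post does not say which scheduler it uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical driver for the periodic cache ---> file ---> DB cycle.
class HitsFlushScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // exportAndImport is an assumed task that exports the cache and imports the file.
    void start(Runnable exportAndImport, long period, TimeUnit unit) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                exportAndImport.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                e.printStackTrace();
            }
        }, period, period, unit);
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

scheduleWithFixedDelay waits for one run to finish before timing the next, so a slow export never overlaps with the following one.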

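[b]Talk is too abstract, so here is the code[/b]
Unimportant methods are omitted.
First, design the cache key and value.
HitKey wraps the resource id and the resource type.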

public final class HitKey implements Serializable {
	private Integer id;
	private HitType type;
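	// setters and getters omitted
	// equals() and hashCode() over (id, type) are also required, since HitKey is used as a HashMap key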
}
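
Because a fresh HitKey is built for every lookup (see the HitKey.valueOf usage further down), the map only finds an entry if equal keys compare equal. Below is one possible fuller version of the class. It is a sketch under assumptions: the HitType values other than BLOG are invented, and the fields are made final here even though the original mentions setters.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical resource types; only BLOG appears in the original usage examples.
enum HitType { BLOG, TOPIC }

final class HitKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Integer id;
    private final HitType type;

    private HitKey(Integer id, HitType type) {
        this.id = id;
        this.type = type;
    }

    static HitKey valueOf(Integer id, HitType type) {
        return new HitKey(id, type);
    }

    Integer getId() { return id; }

    HitType getType() { return type; }

    // Two keys are equal when both the id and the type match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HitKey)) return false;
        HitKey other = (HitKey) o;
        return Objects.equals(id, other.id) && type == other.type;
    }

    // Must be consistent with equals() so that equal keys land in the same bucket.
    @Override
    public int hashCode() {
        return Objects.hash(id, type);
    }
}
```

An immutable key is the safer choice for a map: changing a field after insertion would change the hash code and make the entry unreachable.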

/**
 * Holds the hit counts of a resource.
 * @author xuliangyong
 * 2009-7-12
 */
public class HitValue implements Serializable {
	/**
	 * Today's total hit count
	 */
	private Integer todayHits;
	/**
	 * Yesterday's total hit count
	 */
	private Integer yestodayHits;

	public HitValue(){}

	private HitValue(Integer todayHits, Integer yestodayHits){
		this.todayHits = todayHits;
		this.yestodayHits = yestodayHits;
	}

	/**
	 * Factory method
	 */
	public static HitValue valueOf(Integer todayHits, Integer yestodayHits){
		return new HitValue(todayHits, yestodayHits);
	}

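	/**
	 * Adds to the hit count.
	 * @param hit number of hits to add
	 */
	public void addHits(Integer hit){
		if(todayHits == null){
			todayHits = 0;
		}
		todayHits += hit;
	}

	/**
	 * Adds 1 to the hit count.
	 */
	public void addHits(){
		addHits(1);
	}

	/**
	 * Synchronizes yesterday's hit count with today's.
	 * Usually called after the log file has been written.
	 */
	public void synchronize(){
		yestodayHits = todayHits;
	}

	/**
	 * Tests whether the hit count has changed.
	 * Compares by value: != on Integer objects would compare references.
	 */
	public boolean isChanged(){
		return todayHits == null ? yestodayHits != null : !todayHits.equals(yestodayHits);
	}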

}
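
One detail of isChanged() is worth spelling out: boxed Integer fields should be compared by value. The == and != operators compare object references, and autoboxing is only guaranteed to reuse the same object for values from -128 to 127. A small null-safe helper (the name HitCompare is made up for this sketch) shows the value-based check:

```java
// Hypothetical helper illustrating a null-safe, value-based comparison
// of two boxed hit counters.
class HitCompare {
    static boolean changed(Integer yesterday, Integer today) {
        if (today == null) {
            // Changed only if yesterday held a value
            return yesterday != null;
        }
        // Integer.equals compares the numeric value and returns false for null
        return !today.equals(yesterday);
    }
}
```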

A plain Map serves as the cache here. It can be swapped for a third-party cache, preferably one that supports regions.

public class HitsFacade {

	private static final Map<HitKey, HitValue>  HITS_CACHE = Collections.synchronizedMap(new HashMap<HitKey, HitValue>());

	private HitsManager hitsManager;

	/**
	 * Gets the hit count of a resource.
	 * 1 Read from the cache
	 * 2 On a miss, read from persistent storage
	 */
	public Integer get(HitKey hitKey){
		HitValue hitValue = HITS_CACHE.get(hitKey);

		if(hitValue == null){
			Integer hits = getHits(hitKey);
			hitValue = HitValue.valueOf(hits, hits);
			HITS_CACHE.put(hitKey, hitValue);
		}

		return hitValue.getTodayHits();
	}

	/**
	 * Adds 1 hit.
	 * Usage:
	 * hitsFacade.add( HitKey.valueOf(blogId, HitType.BLOG) );
	 * @param hitKey key of the resource that was hit
	 */
	public void add(HitKey hitKey){
		add(hitKey, 1);
	}

	/**
	 * Adds hits.
	 * Usage:
	 *  hitsFacade.add( HitKey.valueOf(blogId, HitType.BLOG), 10 );
	 * @param hits number of hits to add
	 */
	public void add(HitKey hitKey, Integer hits){
		HitValue hitValue = HITS_CACHE.get(hitKey);
		if(hitValue == null){
			get(hitKey);
			hitValue = HITS_CACHE.get(hitKey);
		}

		hitValue.addHits(hits);
		HITS_CACHE.put(hitKey, hitValue);
	}

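	/**
	 * Loads the hit count from persistent storage.
	 * The synchronized keyword is meant to keep concurrent requests from hitting
	 * persistent storage several times. Note that it only serializes the loads:
	 * two threads that both miss the cache in get() will still each load the value.
	 */
	//TODO how should concurrency be handled??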
	protected synchronized Integer getHits(HitKey hitKey) {
		return hitsManager.getHits(hitKey);
	}
}
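
As one possible answer to the concurrency TODO, the cache could be built on ConcurrentHashMap and AtomicInteger. This is a sketch of a different technique, not the original author's code: the class name ConcurrentHitsCache is invented, the loader function stands in for hitsManager.getHits, and the today/yesterday pair is left out for brevity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

// Hypothetical thread-safe hit cache keyed by any type with proper equals/hashCode.
class ConcurrentHitsCache<K> {
    private final ConcurrentHashMap<K, AtomicInteger> cache = new ConcurrentHashMap<>();
    private final ToIntFunction<K> loader; // assumed stand-in for hitsManager.getHits

    ConcurrentHitsCache(ToIntFunction<K> loader) {
        this.loader = loader;
    }

    int get(K key) {
        return counter(key).get();
    }

    // Atomically adds hits and returns the new total.
    int add(K key, int hits) {
        return counter(key).addAndGet(hits);
    }

    private AtomicInteger counter(K key) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per
        // absent key, so concurrent misses do not load the value twice.
        return cache.computeIfAbsent(key, k -> new AtomicInteger(loader.applyAsInt(k)));
    }
}
```

The sketch does not cover eviction. If unchanged entries are removed during export, as in the scheme described here, an add() that races with the removal could still lose hits and would need extra coordination.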

That completes the cache code.

Next comes cache ---> file.


	/**
	 * Exports the data in the hits cache to a log file.
	 */
	public File exportHitsCacheToLog() throws IOException{
		Map<HitKey, HitValue> hitsCache = hitsFacade.getCache();

		// Create the file
		File logFile = createFile();
		BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(logFile));

		try {
			// A Collections.synchronizedMap must be locked manually while iterating over its views
			synchronized (hitsCache) {
				Iterator<HitKey> hitKeyIterator = hitsCache.keySet().iterator();
				while (hitKeyIterator.hasNext()) {
					HitKey hitKey = hitKeyIterator.next();
					HitValue hitValue = hitsCache.get(hitKey);

					if( !hitValue.isChanged() ){
						hitKeyIterator.remove();
					}else{
						StringBuilder sb = new StringBuilder();
						sb.append(hitKey.getId()).append("\t")
							.append(hitKey.getType()).append("\t")
							.append(hitValue.getTodayHits()).append("\t")
							.append(DateFormatUtils.format(new Date(), "yyyy-MM-dd HH:mm:ss"))
							.append("\n");
						bufferedWriter.write(sb.toString());

						hitValue.synchronize();
						hitsCache.put(hitKey, hitValue);
					}
				}
			}
			bufferedWriter.flush();
		} finally {
			// Close the file even if a write fails
			bufferedWriter.close();
		}
		return logFile;
	}

This method is the core of the scheme, especially these lines:

if( !hitValue.isChanged() ){
//Evict the entry if today's count has not changed since yesterday
				hitKeyIterator.remove();
			}else{
				StringBuilder sb = new StringBuilder();
				sb.append(hitKey.getId()).append("\t")
					.append(hitKey.getType()).append("\t")
					.append(hitValue.getTodayHits()).append("\t")
					.append(DateFormatUtils.format(new Date(), "yyyy-MM-dd HH:mm:ss"))
					.append("\n");
				bufferedWriter.write(sb.toString());
				//Today's count differs from yesterday's: it is written to the file, then yesterday's count is synced with today's
				hitValue.synchronize();
				hitsCache.put(hitKey, hitValue);
			}

Then comes file ---> DB.
Why import into the DB from a file rather than write from the cache straight to the DB? See [url]http://xuliangyong.iteye.com/admin/blogs/424921[/url]

importDB("load data local infile '" + path + "' into table " + tableName);
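
For reference, here is a sketch of how the import statement and the "latest row by import time" lookup could be built. The class name HitsImportSql, the table name, and the column names are assumptions, since the original does not show the schema. The field and line terminators are written out explicitly, although tab and newline are also MySQL's defaults for load data.

```java
// Hypothetical SQL builders for the append-style import.
class HitsImportSql {
    // Assumed column names, in the same order as the fields in the log file.
    static final String COLUMNS = "(resource_id, resource_type, hits, import_time)";

    static String loadStatement(String path, String tableName) {
        // Escape backslashes and single quotes so the path stays a valid string literal.
        String escaped = path.replace("\\", "\\\\").replace("'", "\\'");
        return "load data local infile '" + escaped + "' into table " + tableName
                + " fields terminated by '\\t' lines terminated by '\\n' " + COLUMNS;
    }

    // After several appends a resource has many rows; the newest import wins.
    static String latestHitsQuery(String tableName) {
        return "select hits from " + tableName
                + " where resource_id = ? and resource_type = ?"
                + " order by import_time desc limit 1";
    }
}
```

The table name cannot be passed as a bind parameter, so it should come from trusted configuration rather than user input.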

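The last step is cleaning up stale, invalid data. This can be done at some other time and does not need to run in sync with the steps above.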

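This caching scheme is now basically complete, and it will be adjusted as the project evolves.
How well it performs, and whether it can cope with massive amounts of data, remains to be tested.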

[b]Update, 2009-09-20[/b]
The last step, cleaning up stale data, now uses a different method.
Use

load data local infile ... replace into ...

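This automatically overwrites the old hit counts, so the invalid data no longer needs to be cleaned up. It relies on the table having a primary key or unique index that identifies the resource, because replace only overwrites rows that collide on such a key.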
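
A sketch of the replace variant follows. The schema is hypothetical (the original does not show the table); the point is the primary key on the resource id and type, which is what lets MySQL match an incoming row to the old one.

```java
// Hypothetical schema and statement builder for the replace-style import.
class HitsReplaceSql {
    // Assumed table layout; column order matches the fields in the log file.
    static final String CREATE_TABLE =
            "create table resource_hits ("
            + " resource_id int not null,"
            + " resource_type varchar(16) not null,"
            + " hits int not null,"
            + " import_time datetime not null,"
            + " primary key (resource_id, resource_type))";

    static String replaceStatement(String path, String tableName) {
        // Escape backslashes and single quotes so the path stays a valid string literal.
        String escaped = path.replace("\\", "\\\\").replace("'", "\\'");
        return "load data local infile '" + escaped + "' replace into table " + tableName;
    }
}
```

With this layout each resource keeps exactly one row, so reading the current count no longer needs an order by on the import time.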
