New Plan for the Scheduled Visit-Count Job

figo_2009

Category: Company Project Summaries
Tags: Hadoop, Mapreduce, Apache, Cache, Blog

Article link: https://blog.csdn.net/figo_2009/article/details/83875294


http://guafei.iteye.com/blog/938094
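The link above describes the previous plan, which ran into some problems. The plan has now changed: the visit counts will no longer be written to the database, because the data volume is too large and the writes are far too slow. New plan: put the visit counts from the cache servers into isearch's XML.
Data source 1: data from Luobo's (萝卜) cache servers. Format: key (MD5 of city_storeId) + visit count.
Data source 2: the store master table. Format: key (city_storeId), storeId.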
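
The two data sources only line up if both sides produce the same key, so here is a minimal sketch of how a key of the form MD5(city_storeId) could be built with the JDK alone. The class and method names (`VisitKeyDemo`, `md5Hex`, `md5Key`), the UTF-8 encoding, the lowercase hex output, and the sample city and store id are all assumptions for illustration; the cache servers may encode their keys differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class VisitKeyDemo {
    // Hypothetical helper: returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to ship MD5, so this is not expected to happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hypothetical helper: builds the join key from "city_storeId", as in data source 1.
    static String md5Key(String city, String storeId) {
        return md5Hex(city + "_" + storeId);
    }

    public static void main(String[] args) {
        // Made-up sample values, only to show the shape of the key.
        System.out.println(md5Key("hangzhou", "12345"));
    }
}
```

Whatever the real encoding is, the store side has to hash city_storeId exactly the way the cache side did; otherwise no keys will match in the join.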

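Cache server analysis: there are 2 cache servers in production. Each one also runs a slave service, and that slave holds the other server's visit-count data. The visit counts are updated in real time and are stored as a single file, so all we need to do is get hold of that file and process it. Option 1: stop the slave service, copy the data to the isearch master server, then start the slave service again. Option 2: copy the data directly from the running slave service. The advantage of option 1 is that the data is relatively complete, but it carries a risk of downtime; option 2 is exactly the opposite. For the sake of stability, option 2 was chosen.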

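After the cache server data has been copied, it is placed alongside the store master table data and the two are processed together with MapReduce.
The code: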
package com.koubei.store.fullbuild;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.alibaba.asc.ajoin.Utilities;
import com.koubei.store.fullbuild.common.IsearchDbInputFormat;
import com.koubei.store.isearch.common.IsearchUtil;
import com.koubei.store.isearch.common.Md5;

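/**
 * Fetches Luobo's data by running the C program he gave me. Originally the
 * slave node had to be shut down before running it; the current plan leaves
 * the slave service running and copies the data directly.
 *
 * @author guafei.wgf
 */
public class VisitCountMapper extends BaseDbMapper implements Tool {
    private static final String NAME = "VisitCountMapper";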

    /**
     * Inputs 1 and 2 must be placed in the same directory.
     * Input 1: id,key,store_id
     * Input 2: memcache,key(MD5),value(score); the first field marks the
     * record as coming from memcache.
     * Output 1 (for input 1): key(MD5) id
     * Output 2 (for input 2): key(MD5) score
     */
    // (The MD5 jar also needs to be removed, keeping just one method in the isearch-common project.)
    public static class InnerMapper extends
            Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Text[] values = Utilities.textToTexts(value, line_separator);
            for (Text sData : values) {
                Text[] data = Utilities.textToTexts(sData, filed_separator);
                if (data != null && data.length >= 3) {
                    String id = IsearchUtil.getString(data[0].toString());
                    // Tag the two inputs: a first field of "memcache" means the record comes from input 2
                    if ("memcache".equals(id)) {
                        context.write(data[1], data[2]);
                    } else if (data.length >= 7) {
                        // Store rows read the city from field 6, so shorter rows are skipped
                        // instead of failing with ArrayIndexOutOfBoundsException
                        String city = IsearchUtil.getString(data[6].toString());
                        String d_key = city + "_" + id;
                        // TODO: this key still needs to be MD5-hashed
                        d_key = new String(Md5.digest(d_key));
                        context.write(new Text(d_key), new Text(id));
                    }
                }
            }
        }
    }

    /**
     * Input: the mapper output (outputs 1 and 2). Output: id score
     */
    public static class InnerReducer extends
            Reducer<Text, Text, NullWritable, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> texts, Context context)
                throws IOException, InterruptedException {
            String id = null;
            String score = null;
            for (Text text : texts) {
                String value = text.toString();
                if (value.length() == 32) {
                    // A 32-character value is treated as the store id
                    id = value;
                } else if (IsearchUtil.isNum(value)) {
                    // A numeric value is the visit count from memcache
                    score = value;
                }
                // Emit as soon as both halves of the join have been seen for this key
                if (id != null && score != null) {
                    String result = id + filed_separator + score + line_separator;
                    context.write(NullWritable.get(), new Text(result));
                    id = null;
                    score = null;
                }
            }
        }
    }

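    @Override
    public int run(String[] args) throws Exception {
        Job job = getJob(args, NAME);
        if (job == null) {
            return printUsage(NAME);
        }
        // Mainly used to split the input files
        job.setInputFormatClass(IsearchDbInputFormat.class);
        // The mapper emits Text/Text, while the reducer emits NullWritable/Text
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(InnerMapper.class);
        job.setReducerClass(InnerReducer.class);
        job.setJarByClass(VisitCountMapper.class);
        // Return the status instead of exiting here; main() turns it into the exit code
        return job.waitForCompletion(true) ? 0 : 1;
    }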

    public static void main(String[] args) throws Exception {
        int errCode = ToolRunner.run(new Configuration(),
                new VisitCountMapper(), args);
        System.exit(errCode);
    }
}
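
The job above is a reduce-side join: both inputs are mapped to the same MD5 key, and the reducer pairs the store id with the visit count that arrives under that key. The following is a minimal in-memory sketch of that idea without Hadoop. The names `ReduceSideJoinDemo`, `shuffle`, and `join`, the tab separator, and the sample records are assumptions for illustration. Unlike the reducer above, the sketch emits at most one line per key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReduceSideJoinDemo {
    // Groups (key, value) pairs by key, roughly what the shuffle phase does between map and reduce.
    static Map<String, List<String>> shuffle(List<String[]> mapOutput) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Applies the same rule as the reducer: a 32-character value is a store id,
    // an all-digit value is a score. Keys that lack either side produce no output.
    static List<String> join(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            String id = null;
            String score = null;
            for (String v : values) {
                if (v.length() == 32) {
                    id = v;
                } else if (v.matches("\\d+")) {
                    score = v;
                }
            }
            if (id != null && score != null) {
                out.add(id + "\t" + score); // tab separator is an assumption
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = new ArrayList<>();
        // Made-up records: "k1" has both a store id and a score, "k2" has only a score.
        mapOutput.add(new String[] {"k1", "0123456789abcdef0123456789abcdef"});
        mapOutput.add(new String[] {"k2", "7"});
        mapOutput.add(new String[] {"k1", "42"});
        for (String line : join(shuffle(mapOutput))) {
            System.out.println(line);
        }
    }
}
```

Telling the two sides apart by value shape (length 32 versus numeric) works only as long as a score can never be 32 characters long; tagging each value with its source would be a more robust alternative.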

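When writing shell scripts, spaces cannot be used carelessly, especially when defining variables: a stray space there makes the script fail with an error such as syntax error near unexpected token.
The data produced by the MapReduce job still has to go through a join step, which requires changes to the /store-ajoin-conf.xml configuration file and a few class files, because the total visit count already exists in isearch's XML.
The changes needed in store-ajoin-conf.xml are: add one table entry (not a table that actually exists in the database), add the attribute score.total to select_fields, and add the attribute visits to as (because this value is already configured in Fileds.java).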

The process method of the IsearchXMLFileUDP class populates the map of IsearchStoreVO.
The IsearchDbReader class implements the file-splitting logic.

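Previously, running the MapReduce program on my local machine often ended in out-of-memory errors.
IsearchDbInputFormat is what splits the files, and its buffer is capped at 64 MB. On my local machine, reading a full buffer of that size is already enough to run out of memory. In our MapReduce programs the files are split by reading records with the same key together, so it is rare for the content under one key to reach 64 MB. This time, however, we prepended the memcache prefix to Luobo's batch of data (98 MB), so his data gets read in at the full 64 MB, which causes the out-of-memory error.
Also, Luobo's data is delimited by \0\b, which is not really necessary. This batch could be split with Hadoop's built-in delimiter (\n), because it cannot contain characters such as \n. Our store data, on the other hand, can contain them, so the only option was to change the code to split on \b.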
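
To illustrate why a custom delimiter matters when records may contain \n, here is a small sketch that splits a byte array on the two-byte sequence \0\b (0x00 0x08). This is not the IsearchDbReader implementation; the class name `DelimitedRecordDemo`, the method `splitRecords`, the UTF-8 decoding, and the sample bytes are assumptions. A real reader would hand records on one at a time rather than collect them in a list.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DelimitedRecordDemo {
    // Splits data into records separated by the two bytes 0x00 0x08 ("\0\b").
    // Newlines inside a record are kept, because \n is not treated as a delimiter.
    static List<String> splitRecords(byte[] data) {
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int prev = -1;
        for (byte raw : data) {
            int b = raw & 0xff;
            if (prev == 0x00 && b == 0x08) {
                // Delimiter complete: drop the 0x00 that was already buffered.
                byte[] bytes = current.toByteArray();
                records.add(new String(bytes, 0, bytes.length - 1, StandardCharsets.UTF_8));
                current.reset();
                prev = -1;
            } else {
                current.write(b);
                prev = b;
            }
        }
        // A trailing record without a closing delimiter is still returned.
        if (current.size() > 0) {
            records.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: the first record contains an embedded newline.
        byte[] sample = {'a', '\n', 'b', 0, 8, 'c', 0, 8};
        System.out.println(splitRecords(sample).size() + " records");
    }
}
```

With a line-based reader the first record in the sample would be cut in two at the embedded newline, which is the situation described for the store data.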