在debug hive的问题时,我们经常需要在源码里面增加必要的日志信息,来跟踪变量的变化或者是方法执行的情况,性能等。。hive里面有下面几个日志的实现方式。
1种是常用的通过
org.apache.commons.logging.Log
和
org.apache.commons.logging.LogFactory
类实现的。
比如要打印类的info信息时,只需要下面这样就可以。
static final private Log LOG = LogFactory.getLog(CLASS_NAME)
LOG.info("xxxxxx");
如果要跟踪一个方法的执行时间,只需要在在方法调用前后各加一条日志信息记录当前时间戳,然后取差值即可。
另一种是LogHelper
org.apache.hadoop.hive.ql.session.SessionState.LogHelper
LogHelper类是SessionState的内部类
输出日志最终调用的是LOG.xxx和对应日志输出的println方法(比如System.out.println/log4j等)
但是日志只有两个级别,info和error。
以printInfo方法为例,具体实现:
public void printInfo(String info, String detail) {
if (!getIsSilent()) {
getInfoStream().println(info);
}
LOG.info(info + StringUtils.defaultString(detail)); //调用LOG.info(即第一种方法)
}
使用的sample:
static final private LogHelper console = new LogHelper (LOG);
console.printError("Authorization failed:" + authExp.getMessage() + ". Use SHOW GRANT to get more details." );
还有一种重要的类是PerfLogger类,可以实现记录方法执行时间的信息,在一些情况下可以取代第一种记录性能的方法。
PerfLogger类主要是用在Driver类的compile方法中,用来记录各个step使用的时间。
看下具体实现:
首先声明存储starttime和endtime的map:
protected final Map<String, Long> startTimes = new HashMap<String, Long>();
protected final Map<String, Long> endTimes = new HashMap<String, Long>();
主要方法:
1).构造方法可以传入一个boolean参数,控制是否reset(默认是false)
protected static final ThreadLocal<PerfLogger> perfLogger = new ThreadLocal<PerfLogger>();
public static PerfLogger getPerfLogger(boolean resetPerfLogger) {
if (SessionState.get() == null) {
if (perfLogger.get() == null || resetPerfLogger) {
perfLogger.set(new PerfLogger());
}
return perfLogger.get();
} else {
return SessionState.get().getPerfLogger(resetPerfLogger);
}
}
2).PerfLogBegin 用来记录开始时间,主要是使用了LOG.info(第一种记录日志的方法)
public void PerfLogBegin(String callerName, String method) {
long startTime = System.currentTimeMillis();
LOG.info( "<PERFLOG method=" + method + " from=" + callerName + ">");
startTimes.put(method, new Long(startTime)); //将方法和对应的startTime放在startTimes的map中
}
3).相应的PerfLogEnd用来记录结束时间
public long PerfLogEnd(String callerName, String method) {
Long startTime = startTimes.get(method);
long endTime = System. currentTimeMillis();
long duration = -1;
endTimes.put(method, new Long(endTime));
StringBuilder sb = new StringBuilder("</PERFLOG method=" ).append(method);
if (startTime != null) {
sb.append( " start=").append(startTime);
}
sb.append(" end=").append(endTime);
if (startTime != null) {
duration = endTime - startTime.longValue();
sb.append( " duration=").append(duration);
}
sb.append(" from=").append(callerName).append( ">");
LOG.info(sb);
return duration;
}
getStartTime和getEndTime分别通过对应的map来获取starttime和endtime
4.是不是所有的method都可以用这种方法获取执行的性能?答案是no,在PerfLogger类中定义了部分属性,对应于method
public static final String ACQUIRE_READ_WRITE_LOCKS = "acquireReadWriteLocks";
public static final String COMPILE = "compile";
public static final String PARSE = "parse";
public static final String ANALYZE = "semanticAnalyze";
public static final String DO_AUTHORIZATION = "doAuthorization";
public static final String DRIVER_EXECUTE = "Driver.execute";
public static final String INPUT_SUMMARY = "getInputSummary";
public static final String GET_SPLITS = "getSplits";
public static final String RUN_TASKS = "runTasks";
public static final String SERIALIZE_PLAN = "serializePlan";
public static final String DESERIALIZE_PLAN = "deserializePlan";
public static final String CLONE_PLAN = "clonePlan";
public static final String TASK = "task.";
public static final String RELEASE_LOCKS = "releaseLocks";
public static final String PRUNE_LISTING = "prune-listing";
public static final String PARTITION_RETRIEVING = "partition-retrieving";
public static final String PRE_HOOK = "PreHook.";
public static final String POST_HOOK = "PostHook.";
public static final String FAILURE_HOOK = "FailureHook.";
public static final String DRIVER_RUN = "Driver.run";
public static final String TIME_TO_SUBMIT = "TimeToSubmit";
public static final String TEZ_SUBMIT_TO_RUNNING = "TezSubmitToRunningDag";
public static final String TEZ_BUILD_DAG = "TezBuildDag";
public static final String TEZ_SUBMIT_DAG = "TezSubmitDag";
public static final String TEZ_RUN_DAG = "TezRunDag";
public static final String TEZ_CREATE_VERTEX = "TezCreateVertex.";
public static final String TEZ_RUN_VERTEX = "TezRunVertex.";
public static final String TEZ_INITIALIZE_PROCESSOR = "TezInitializeProcessor";
public static final String TEZ_RUN_PROCESSOR = "TezRunProcessor";
public static final String TEZ_INIT_OPERATORS = "TezInitializeOperators";
public static final String LOAD_HASHTABLE = "LoadHashtable";
public static final String ORC_GET_SPLITS = "OrcGetSplits";
再来看一下其调用方式:
比如org.apache.hadoop.hive.ql.Driver类中的compile方法:
static final private String CLASS_NAME = Driver. class.getName();
public int compile(String command, boolean resetTaskIds) {
PerfLogger perfLogger = PerfLogger.getPerfLogger(); //实例化一个PerfLogger的对象
.......
perfLogger.PerfLogBegin( CLASS_NAME, PerfLogger.PARSE); //记录开始时间,传入本类的类名记录的方法(方法对应了PerfLogger类中对应的属性)
ParseDriver pd = new ParseDriver();
ASTNode tree = pd.parse(command, ctx);
tree = ParseUtils.findRootNonNullToken(tree);
perfLogger.PerfLogEnd( CLASS_NAME, PerfLogger.PARSE);
记录parse消耗时间的对应日志:
14/07/31 14:33:21 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
.....
14/07/31 14:33:21 INFO log.PerfLogger: </PERFLOG method=parse start=1406788401914 end=1406788401916 duration=2 from=org.apache.hadoop.hive.ql.Driver>
3种方法可以根据需要选择使用。
转载于:https://blog.51cto.com/caiguangguang/1541771