Environment: Ubuntu 12.04 + Spark 1.0.0
Goal: Spark emits log output while a job runs. To let clients watch task execution in real time, we need to intercept those log messages and present them.
eg:Starting task 17.0:0 as TID 28 on executor localhost: localhost (PROCESS_LOCAL)
Finished TID 28 in 28 ms on localhost (progress: 1/1)
We need to intercept the two log lines above and extract the task ID and the running time.
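For illustration, the task ID and elapsed time can be pulled out of those two lines with a small regex matcher (a sketch; the object name LogParse and the returned tuple shape are my own, not from the original code):

```scala
object LogParse {
  // Patterns for the two Spark 1.0.0 log lines shown above.
  private val Starting = """Starting task \S+ as TID (\d+) on .*""".r
  private val Finished = """Finished TID (\d+) in (\d+) ms on .*""".r

  // Returns (tid, info) when the line is one of the two we care about.
  def parse(line: String): Option[(String, String)] = line match {
    case Starting(tid)     => Some((tid, "started"))
    case Finished(tid, ms) => Some((tid, ms + " ms"))
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("Starting task 17.0:0 as TID 28 on executor localhost: localhost (PROCESS_LOCAL)"))
    println(parse("Finished TID 28 in 28 ms on localhost (progress: 1/1)"))
  }
}
```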
The configuration template lives at /usr/local/spark/spark-1.0.0-bin-hadoop1/conf/log4j.properties.template; copy it to log4j.properties in the same directory so that log4j actually loads it. The configuration is as follows:
log4j.logger.org.apache.spark.scheduler.TaskSetManager=INFO,A5 (capture only logs from the org.apache.spark.scheduler.TaskSetManager class, at level INFO)
log4j.appender.A5=org.apache.log4j.WriterAppender (write the output to a character stream)
log4j.appender.A5.Threshold=INFO
log4j.appender.A5.layout=org.apache.log4j.PatternLayout
log4j.appender.A5.layout.ConversionPattern=%m %n
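The WriterAppender configured above becomes useful once it is attached to a PipedWriter: whatever the appender writes can then be read back line by line from the paired PipedReader, which is why the code below imports PipedWriter and PipedReader. A minimal sketch of that plumbing using plain java.io, with a PrintWriter standing in for the appender:

```scala
import java.io.{BufferedReader, PipedReader, PipedWriter, PrintWriter}

object PipedLogDemo {
  // Write one message into a PipedWriter and read it back from the
  // paired PipedReader; in the real setup the "A5" WriterAppender
  // plays the role of the PrintWriter used here.
  def capture(msg: String): String = {
    val writer = new PipedWriter()
    val reader = new BufferedReader(new PipedReader(writer))
    val out    = new PrintWriter(writer, true) // stand-in for the appender
    out.println(msg)                           // appender emits "%m %n"
    reader.readLine()                          // consumer side sees the line
  }

  def main(args: Array[String]): Unit =
    println(capture("Finished TID 28 in 28 ms on localhost (progress: 1/1)"))
}
```

In the real program a separate thread would block on readLine() and forward each captured line to the client.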
The code is as follows:
import java.io.{IOException, PipedWriter, Writer, PipedReader}
import java.sql.{DriverManager, ResultSet}
import java.util.Scanner
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.log4j._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import spark.jobserver.{SparkJobValid, SparkJobValidation, SparkJob}
class CourseCF(rank: Int = 3, numIterations: Int = 10, lambda: Double = 0.01)
{
/**
*
* Class description:
* CourseCF invokes the collaborative filtering algorithm
* @author gongxuan
* @note created 2015-3-12
* @version 1.0