Hadoop iterates so quickly that plenty of bugs slip through. While running SLS in SYNTH mode I ran into two problems.
1. The official SYNTH JSON example:
{
"description" : "tiny jobs workload", //description of the meaning of this collection of workloads
"num_nodes" : 10, //total nodes in the simulated cluster
"nodes_per_rack" : 4, //number of nodes in each simulated rack
"num_jobs" : 10, // total number of jobs being simulated
"rand_seed" : 2, //the random seed used for deterministic randomized runs
// a list of “workloads”, each of which has job classes, and temporal properties
"workloads" : [
{
"workload_name" : "tiny-test", // name of the workload
"workload_weight": 0.5, // used for weighted random selection of which workload to sample from
"queue_name" : "sls_queue_1", //queue the job will be submitted to
//different classes of jobs for this workload
"job_classes" : [
{
"class_name" : "class_1", //name of the class
"class_weight" : 1.0, //used for weighted random selection of class within workload
//next group controls average and standard deviation of a LogNormal distribution that
//determines the number of mappers and reducers for the job.
"mtasks_avg" : 5,
"mtasks_stddev" : 1,
"rtasks_avg" : 5,
"rtasks_stddev" : 1,
//average and stdev input param of LogNormal distribution controlling job duration
"dur_avg" : 60,
"dur_stddev" : 5,
//average and stdev input param of LogNormal distribution controlling mapper and reducer durations
"mtime_avg" : 10,
"mtime_stddev" : 2,
"rtime_avg" : 20,
"rtime_stddev" : 4,
//average and stdev input param of LogNormal distribution controlling memory and cores for map and reduce
"map_max_memory_avg" : 1024,
"map_max_memory_stddev" : 0.001,
"reduce_max_memory_avg" : 2048,
"reduce_max_memory_stddev" : 0.001,
"map_max_vcores_avg" : 1,
"map_max_vcores_stddev" : 0.001,
"reduce_max_vcores_avg" : 2,
"reduce_max_vcores_stddev" : 0.001,
//probability of running this job with a reservation
"chance_of_reservation" : 0.5,
//input parameters of LogNormal distribution that determines the deadline slack (as a multiplier of job duration)
"deadline_factor_avg" : 10.0,
"deadline_factor_stddev" : 0.001,
}
],
// for each workload, determines with what probability each time bucket is picked to choose the job start time.
// In the example below the jobs have twice as much chance to start in the first minute than in the second minute
// of simulation, and then zero chance thereafter.
"time_distribution" : [
{ "time" : 1, "weight" : 66 },
{ "time" : 60, "weight" : 33 },
{ "time" : 120, "jobs" : 0 }
]
}
]
}
First, a JSON file may not contain comments, so every comment above has to be deleted before the file will parse. Second, the trailing comma at the end of the line
"deadline_factor_stddev" : 0.001,
is not valid JSON either, and makes the run fail with a parse error.
A handy online JSON validator:
https://jsonlint.com/
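For convenience, here is the same trace with every comment and the trailing comma removed; this version passes jsonlint. (I have kept the "jobs" key in the last time_distribution entry exactly as the official doc writes it, though it looks like it should be "weight" : 0, so adjust it if your SLS build rejects it.)

{
  "description" : "tiny jobs workload",
  "num_nodes" : 10,
  "nodes_per_rack" : 4,
  "num_jobs" : 10,
  "rand_seed" : 2,
  "workloads" : [
    {
      "workload_name" : "tiny-test",
      "workload_weight" : 0.5,
      "queue_name" : "sls_queue_1",
      "job_classes" : [
        {
          "class_name" : "class_1",
          "class_weight" : 1.0,
          "mtasks_avg" : 5,
          "mtasks_stddev" : 1,
          "rtasks_avg" : 5,
          "rtasks_stddev" : 1,
          "dur_avg" : 60,
          "dur_stddev" : 5,
          "mtime_avg" : 10,
          "mtime_stddev" : 2,
          "rtime_avg" : 20,
          "rtime_stddev" : 4,
          "map_max_memory_avg" : 1024,
          "map_max_memory_stddev" : 0.001,
          "reduce_max_memory_avg" : 2048,
          "reduce_max_memory_stddev" : 0.001,
          "map_max_vcores_avg" : 1,
          "map_max_vcores_stddev" : 0.001,
          "reduce_max_vcores_avg" : 2,
          "reduce_max_vcores_stddev" : 0.001,
          "chance_of_reservation" : 0.5,
          "deadline_factor_avg" : 10.0,
          "deadline_factor_stddev" : 0.001
        }
      ],
      "time_distribution" : [
        { "time" : 1, "weight" : 66 },
        { "time" : 60, "weight" : 33 },
        { "time" : 120, "jobs" : 0 }
      ]
    }
  ]
}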
2. While running
$HADOOP_HOME/share/hadoop/tools/sls/bin/slsrun.sh --tracetype=SYNTH --tracelocation=/home/c/sls/output2/SYNTH.json --output-dir=/home/c/sls/output1 --print-simulation
the simulator throws:
java.lang.IllegalArgumentException: Null user
at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1225)
at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1212)
at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:177)
at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:154)
at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The exception complains that a null user was handed to UserGroupInformation.createRemoteUser.
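For context, createRemoteUser rejects a null or empty user name up front; a paraphrased sketch of that guard (quoted from memory of UserGroupInformation, not verbatim):

  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }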
Tracing back to where that user string originates, in SLSRunner.startAMFromSynthGenerator():
private void startAMFromSynthGenerator() throws YarnException, IOException {
  Configuration localConf = new Configuration();
  localConf.set("fs.defaultFS", "file:///");
  long baselineTimeMS = 0;
  // if we use the nodeFile this could have been not initialized yet.
  if (stjp == null) {
    stjp = new SynthTraceJobProducer(getConf(), new Path(inputTraces[0]));
  }
  SynthJob job = null;
  // we use stjp, a reference to the job producer instantiated during node
  // creation
  while ((job = (SynthJob) stjp.getNextJob()) != null) {
    // only support MapReduce currently
    String user = job.getUser();
The value returned by getUser() is never checked for null, and that null eventually reaches createRemoteUser, hence the error. By contrast, the code paths that build AMs from SLS and Rumen traces do guard against a missing user:
private void createAMForJob(Map jsonJob) throws YarnException {
  long jobStartTime = Long.parseLong(
      jsonJob.get(SLSConfiguration.JOB_START_MS).toString());
  long jobFinishTime = 0;
  if (jsonJob.containsKey(SLSConfiguration.JOB_END_MS)) {
    jobFinishTime = Long.parseLong(
        jsonJob.get(SLSConfiguration.JOB_END_MS).toString());
  }
  String user = (String) jsonJob.get(SLSConfiguration.JOB_USER);
  if (user == null) {
    user = "default";
  }
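Given that, a minimal local workaround (my own sketch against SLSRunner, not an upstream patch) is to give startAMFromSynthGenerator the same fallback:

  // hypothetical patch inside the while loop of startAMFromSynthGenerator:
  String user = job.getUser();
  if (user == null) {
    user = "default"; // same fallback as createAMForJob, avoids "Null user"
  }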
So I suspect that Hadoop's official support for SYNTH mode is simply not very mature yet.