14/10/11 10:00:04 WARN crawl.Crawl: solrUrl is not set, indexing will be skipped...
14/10/11 10:00:06 INFO crawl.Crawl: crawl started in: crawl
14/10/11 10:00:06 INFO crawl.Crawl: rootUrlDir = data/urls
14/10/11 10:00:06 INFO crawl.Crawl: threads = 100
14/10/11 10:00:06 INFO crawl.Crawl: depth = 3
14/10/11 10:00:06 INFO crawl.Crawl: solrUrl=null
14/10/11 10:00:06 INFO crawl.Crawl: topN = 100
14/10/11 10:00:07 INFO crawl.Injector: Injector: starting at 2014-10-11 10:00:07
14/10/11 10:00:07 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
14/10/11 10:00:07 INFO crawl.Injector: Injector: urlDir: data/urls
14/10/11 10:00:07 INFO Configuration.deprecation: mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
14/10/11 10:00:07 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
14/10/11 10:00:09 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:00:10 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:00:20 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/11 10:00:20 INFO mapreduce.JobSubmitter: number of splits:2
14/10/11 10:00:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412989543453_0001
14/10/11 10:00:21 INFO impl.YarnClientImpl: Submitted application application_1412989543453_0001
14/10/11 10:00:21 INFO mapreduce.Job: The url to track the job: http://idc66:8088/proxy/application_1412989543453_0001/
14/10/11 10:00:21 INFO mapreduce.Job: Running job: job_1412989543453_0001
14/10/11 10:00:39 INFO mapreduce.Job: Job job_1412989543453_0001 running in uber mode : false
14/10/11 10:00:39 INFO mapreduce.Job: map 0% reduce 0%
14/10/11 10:01:26 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 10:01:51 INFO mapreduce.Job: map 100% reduce 100%
14/10/11 10:01:52 INFO mapreduce.Job: Job job_1412989543453_0001 completed successfully
14/10/11 10:01:52 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=351842
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=245
        HDFS: Number of bytes written=86
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=179870
        Total time spent by all reduces in occupied slots (ms)=53334
        Total time spent by all map tasks (ms)=89935
        Total time spent by all reduce tasks (ms)=17778
        Total vcore-seconds taken by all map tasks=89935
        Total vcore-seconds taken by all reduce tasks=17778
        Total megabyte-seconds taken by all map tasks=138140160
        Total megabyte-seconds taken by all reduce tasks=54614016
    Map-Reduce Framework
        Map input records=2
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=12
        Input split bytes=200
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=12
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=8397
        CPU time spent (ms)=8130
        Physical memory (bytes) snapshot=1627553792
        Virtual memory (bytes) snapshot=7534977024
        Total committed heap usage (bytes)=1441071104
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    injector
        urls_filtered=2
    File Input Format Counters
        Bytes Read=45
    File Output Format Counters
        Bytes Written=86
14/10/11 10:01:52 INFO crawl.Injector: Injector: total number of urls rejected by filters: 2
14/10/11 10:01:52 INFO crawl.Injector: Injector: total number of urls injected after normalization and filtering: 0
14/10/11 10:01:52 INFO crawl.Injector: Injector: Merging injected urls into crawl db.
14/10/11 10:01:52 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:01:52 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:01:58 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/11 10:01:58 INFO mapreduce.JobSubmitter: number of splits:1
14/10/11 10:01:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412989543453_0002
14/10/11 10:01:58 INFO impl.YarnClientImpl: Submitted application application_1412989543453_0002
14/10/11 10:01:58 INFO mapreduce.Job: The url to track the job: http://idc66:8088/proxy/application_1412989543453_0002/
14/10/11 10:01:58 INFO mapreduce.Job: Running job: job_1412989543453_0002
14/10/11 10:02:33 INFO mapreduce.Job: Job job_1412989543453_0002 running in uber mode : false
14/10/11 10:02:33 INFO mapreduce.Job: map 0% reduce 0%
14/10/11 10:02:40 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 10:02:49 INFO mapreduce.Job: map 100% reduce 100%
14/10/11 10:02:49 INFO mapreduce.Job: Job job_1412989543453_0002 completed successfully
14/10/11 10:02:49 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=234971
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=230
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=7
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=10712
        Total time spent by all reduces in occupied slots (ms)=17775
        Total time spent by all map tasks (ms)=5356
        Total time spent by all reduce tasks (ms)=5925
        Total vcore-seconds taken by all map tasks=5356
        Total vcore-seconds taken by all reduce tasks=5925
        Total megabyte-seconds taken by all map tasks=8226816
        Total megabyte-seconds taken by all reduce tasks=18201600
    Map-Reduce Framework
        Map input records=0
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=6
        Input split bytes=144
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=44
        CPU time spent (ms)=2470
        Physical memory (bytes) snapshot=448868352
        Virtual memory (bytes) snapshot=5610287104
        Total committed heap usage (bytes)=724303872
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=86
    File Output Format Counters
        Bytes Written=215
14/10/11 10:02:49 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:02:49 INFO crawl.Injector: Injector: finished at 2014-10-11 10:02:49, elapsed: 00:02:41
14/10/11 10:02:49 INFO crawl.Generator: Generator: starting at 2014-10-11 10:02:49
14/10/11 10:02:49 INFO crawl.Generator: Generator: Selecting best-scoring urls due for fetch.
14/10/11 10:02:49 INFO crawl.Generator: Generator: filtering: true
14/10/11 10:02:49 INFO crawl.Generator: Generator: normalizing: true
14/10/11 10:02:49 INFO crawl.Generator: Generator: topN: 100
14/10/11 10:02:49 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
14/10/11 10:02:49 INFO crawl.Generator: Generator: jobtracker is 'local', generating exactly one partition.
14/10/11 10:02:49 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:02:49 INFO client.RMProxy: Connecting to ResourceManager at idc66/192.168.56.66:8080
14/10/11 10:02:55 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/11 10:02:55 INFO mapreduce.JobSubmitter: number of splits:1
14/10/11 10:02:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412989543453_0003
14/10/11 10:02:55 INFO impl.YarnClientImpl: Submitted application application_1412989543453_0003
14/10/11 10:02:55 INFO mapreduce.Job: The url to track the job: http://idc66:8088/proxy/application_1412989543453_0003/
14/10/11 10:02:55 INFO mapreduce.Job: Running job: job_1412989543453_0003
14/10/11 10:03:08 INFO mapreduce.Job: Job job_1412989543453_0003 running in uber mode : false
14/10/11 10:03:08 INFO mapreduce.Job: map 0% reduce 0%
14/10/11 10:03:17 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 10:04:15 INFO mapreduce.Job: map 100% reduce 100%
14/10/11 10:04:15 INFO mapreduce.Job: Job job_1412989543453_0003 completed successfully
14/10/11 10:04:15 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=237427
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=205
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=12016
        Total time spent by all reduces in occupied slots (ms)=164820
        Total time spent by all map tasks (ms)=6008
        Total time spent by all reduce tasks (ms)=54940
        Total vcore-seconds taken by all map tasks=6008
        Total vcore-seconds taken by all reduce tasks=54940
        Total megabyte-seconds taken by all map tasks=9228288
        Total megabyte-seconds taken by all reduce tasks=168775680
    Map-Reduce Framework
        Map input records=0
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=6
        Input split bytes=119
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=289
        CPU time spent (ms)=3510
        Physical memory (bytes) snapshot=929054720
        Virtual memory (bytes) snapshot=5619216384
        Total committed heap usage (bytes)=763297792
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=86
    File Output Format Counters
        Bytes Written=0
14/10/11 10:04:15 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...
14/10/11 10:04:15 INFO crawl.Crawl: Stopping at depth=0 - no more URLs to fetch.
14/10/11 10:04:15 WARN crawl.Crawl: No URLs to fetch - check your seed list and URL filters.
14/10/11 10:04:15 INFO crawl.Crawl: crawl finished: crawl