1. Querying the Nutch crawl results through Solr
{
"responseHeader": {
"status": 0,
"QTime": 2,
"params": {
"indent": "true",
"q": "title:幻想",
"_": "1418266706916",
"wt": "json"
}
},
"response": {
"numFound": 7,
"start": 0,
"docs": [
{
"content": "幻想江湖-2.2资料片,巅峰对决,震撼来袭! 跳转官网 装备凝练 巅峰擂台 万圣之夜 新版时装",
"id": "http://hxjh.zqgame.com/",
"title": "幻想江湖-2.2资料片,巅峰对决,震撼来袭!",
"segment": "20141211104005",
"boost": 0,
"digest": "c61521c1861b1a7574c8920fd27d0155",
"tstamp": "2014-12-11T02:40:14.477Z",
"url": "http://hxjh.zqgame.com/",
"anchor": [
"幻想江湖",
"幻想江湖"
],
"_version_": 1487159323035435000
},
{
"content": "幻想江湖-鬼灵精怪万圣节 开启时间 : 10 月 30 日 万圣节礼包领取> 万圣节前夕,为了避免恶灵干扰,大侠们纷纷挂起了南瓜灯,驱逐鬼怪。江湖有一传闻,一群糖果商人行经龙脉岭时,因为身上的糖果、饼干、宝石而找来鬼魂附身,如果帮助他们驱逐了附身邪灵,将会获得他们道谢的礼物哦~! 1 万圣节天天有礼 2 练级打宝两不误 3 节日消费奖励翻倍 4 奖励兑换惊喜不断 5 洗炼折扣大放送 温馨提示: 活动期间,大侠们请每天提着南瓜灯,穿上蝙蝠衫,去龙脉去收集糖果饼干,要不停地说:“trick or treat.”(意思是给不给,不给就捣蛋)。要是不肯给的话,就用各种方法去惩罚他,例如:一招一个怪,“唰唰唰————”把龙脉挂个三小时! 关闭 恭喜你获得幻想江湖万圣节礼包! IOS用户领取: 安卓和越狱用户领取: 有效时间: 即日-2014.11.30 兑换次数: 只限兑换一次 兑换范围: 全服 礼包使用方法: 登录游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入正确礼品卡号领取礼包奖励!",
"id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html",
"title": "幻想江湖-鬼灵精怪万圣节",
"segment": "20141211104057",
"boost": 0,
"digest": "5ae39251ad06017e4e1854aae9129126",
"tstamp": "2014-12-11T02:41:37.669Z",
"url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html",
"anchor": [
"万圣之夜"
],
"_version_": 1487159633802952700
},
{
"content": "幻想江湖-优雅转身华丽时装首曝 夜魔游龙 西式时装 全新时装新品上架啦,这批时装看上去是不是和以前大有不同呢,此次大胆革新,看到下面的时装,不禁令人想到后面可能真的会有结婚系统咯,新版本新换装,不走平凡路~我们就是这样的与众不同! 进入官网 返回活动首页",
"id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html",
"title": "幻想江湖-优雅转身华丽时装首曝",
"segment": "20141211104057",
"boost": 0,
"digest": "e086540bf0f721f39560440c85d2161f",
"tstamp": "2014-12-11T02:41:47.879Z",
"url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html",
"anchor": [
"新版时装"
],
"_version_": 1487159633805049900
},
{
"content": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公 首页 新闻中心 游戏资料 游戏论坛 分享到: 安卓下载 ios越狱下载 ios正版下载 礼包领取 1 2 3 4 幻想江湖绝尚发布会精彩视频 最新 新闻 公告 活动 《幻想江湖》IOS18区“美人天下”12月10日火爆开启 2014史上最萌武侠手游来袭!不用吃药,放弃治疗,12月10日上午11:00新区“美人天下”火爆开启!快来没日没夜一起萌萌哒!... 查看详情 > 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-08 • [新闻] 新版“姑姑”遭吐槽 《幻想江湖》还你女神梦 2014-12-05 • [新闻] 《幻想江湖》我们结婚吧!——订婚篇 2014-12-03 • [公告] 幻想江湖-公测9~14区 数据互通公告 2014-12-01 • [公告] 《幻想江湖》12月2日临时维护公告 2014-11-26 • [公告] 《幻想江湖》2.4版本更新 2014-11-25 • [公告] 幻想江湖临时维护公告 2014-11-25 • [公告] 《幻想江湖》appstore1~8服数据互通完毕 2014-11-25 • [公告] 幻想江湖-appstore数据互通延长公告 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [活动] 周末齐消费 欢乐享不停 2014-12-08 • [活动] 《幻想江湖》美女主播齐聚乐———回顾 2014-12-08 • [活动] 《幻想江湖》25区”独步江湖”十六大活动 2014-12-05 • [活动] 《幻想江湖》玩家体验指南——做好产品,专注体验 联系人:施若熙 联系QQ:744415486 手机:13510624817 邮箱:ruoxi.shi@zqgame.com 联系人:方彦琼 联系QQ:611535985 手机:13603061895 邮箱:yanqiong.fang@zqgame.com 玩家群② 264103428 企业客服QQ:800056019 客服热线:0755-86160520 特色玩法 玩家攻略 职业介绍 明教 唐门 天山 逍遥 18183 766 91手游网 合作媒体 ———————————————————— 微信公众号 新浪微博 腾讯微博 扫描二维码下载 快速注册 通行证: 密 码: 确认密码: 验证码: 立即注册 用户名 恭喜你已经注册成功! 关闭 恭喜您获得幻想江湖公测新手礼包! 你的礼包卡号是: 礼包使用方法: 登陆游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入8位的礼包卡号领取礼包奖励!内容包含:止血丹*2、白色强化石*20、成长丹*5、易功丹*10、进阶丹*5。 关闭 微信公众号",
"id": "http://hxjh.zqgame.com/index.html",
"title": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公",
"segment": "20141211104057",
"boost": 0,
"digest": "3f9a2060e12f95316ee0201ce8a21da0",
"tstamp": "2014-12-11T02:41:01.462Z",
"url": "http://hxjh.zqgame.com/index.html",
"anchor": [
"进入官网"
],
"_version_": 1487159633828118500
},
{
"content": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费! 进入官网 论坛中心 游戏下载 购卡充值 1 2 3 4 5 媒体友链 通行证账号: 通行证密码: 确认密码: 验证码: 同意 《中青宝》协议 恭喜你!注册成功! 用户名是: 客户端 立即下载 获取特权礼包 版权所有:深圳中青宝互动网络股份有限公司 客服传真:0755-86368269 中华人民共和国增值电信业务经营许可证:粤B2-20030216 粤ICP备:09057836 网络文化经营许可证:文网文[2008]088号 中华人民共和国互联网出版许可证:新出网证(粤)字017号 每个IP只能参加一次抽奖, 谢谢您的参与! ",
"id": "http://xh.zqgame.com/",
"title": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费!",
"segment": "20141211104057",
"boost": 0,
"digest": "471def081683b7c5f94a39382e4c00a1",
"tstamp": "2014-12-11T02:41:02.165Z",
"url": "http://xh.zqgame.com/",
"anchor": [
"仙幻奇缘",
"仙幻奇缘"
],
"_version_": 1487159634570510300
},
{
"content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: * 以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: * 6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: * 请再一次输入密码 1 2 3 4 最新 新闻 活动 公告 攻略 诸神世界混服部分区服数据互通公告 公告 06-04 诸神世界混服部分区服数据互通公告 公告 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 新闻 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 公告 04-18 诸神世界混服合服活动精彩上线 公告 04-18 诸神世界混服部分区服数据互通公告 【新闻】 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 【新闻】 04-16 4月17日12点诸神新区-呼啸沙漠火爆开启 【新闻】 03-24 3月27日12点诸神新区-巨龙之吼火爆开启 【新闻】 03-17 3月20日12点诸神新区-尘风峡谷火爆开启 【新闻】 03-11 3月13日12点诸神新区-耳语海岸火爆开启 【活动】 04-02 《诸神世界》十大开服活动 【活动】 02-13 《诸神世界》元宵&情人节活动 【活动】 01-26 《诸神世界》春节活动 【活动】 11-21 诸神世界周末限时活动火爆上线 【活动】 11-08 双十一《诸神世界》劲爆大酬宾 【公告】 06-04 诸神世界混服部分区服数据互通公告 【公告】 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 【公告】 04-18 诸神世界混服合服活动精彩上线 【公告】 04-18 诸神世界混服部分区服数据互通公告 【公告】 03-25 3月28日平台网络升级公告 魔 牧 枪 炮 术 战 魔 刃 狩猎灵魂 攻击方式:近程魔法攻击 核心属性:智力 敏捷 职业特质:隐匿暗杀能力 职业说明:刀锋舞者,狩猎着生者的灵魂。隐没于黑暗,游走于光明。不被历史描述,却是历史的主宰! 点击查看详情 牧 师 神的宠儿 攻击方式:中程魔法攻击 核心属性:精神 智力 职业特质:恢复治愈能力 职业说明:神之使徒,捍卫生者,拯救死者。信者永生,不信者也救赎。虔诚的信徒,是神的宠儿! 点击查看详情 枪 手 一击必杀 攻击方式:远程物理攻击 核心属性:力量 精神 职业特质:伤害输出 职业说明:猎命王者,半边恶魔半边天使。沉着冷静,是他们的特质;一击必杀,是他们的实力! 点击查看详情 魔 炮 焚天怒焰 攻击方式:远程魔法攻击 核心属性:智力 精神 职业特质:群体伤害 职业说明:焚天烈焰,吞噬罪孽与苍生。沉稳步伐,吼出战歌嘹亮;怒放炮火,点亮生命奇迹! 点击查看详情 术 士 破碎虚空 攻击方式:中程魔法攻击 核心属性:智力 精神 职业特质:战斗节奏控制能力 职业说明:掌握法则,智慧象征。探索真理,识古通今,洞悉未来。以世间威能,抑恶扬善,改天逆命,破碎虚空! 点击查看详情 战 士 金刚不坏 攻击方式:近程物理攻击 核心属性:体质 力量 职业特质:生存能力 职业说明:移动城墙,金刚不坏。战,则掠地千里;守,则万夫莫开。英勇的灵魂铸造不灭传奇! 点击查看详情 系统介绍 进阶指导 特色系统 活动玩法 结婚系统 | 职业介绍 | FAQ | VIP如何获得 | 坐骑强化 | 转职重修 | 战友系统 | 升级送祝福 | 日常任务 | 拍卖寄售 | 技能遗忘重生 | 道具商城 | 财产保护 炼金系统 | 星耀石 | 装备镶嵌 | 装备升阶 | 装备打孔 | 要塞守卫站 | 神器合成 | 宠物潜力修改 | 宝石摘除 斗气系统 | 羽翼系统 | 1V1模拟战 | 移民系统 | 击鼓传花 | 情缘任务 | 神圣血脉 | 军衔系统 | 钓鱼系统 | 称号系统 | 封印进度 | 离线经验 巴比伦塔 | 跨区国战 | 跨区巡游 | 跨区极速狂飙 | 跨区组队争夺战 | 超级血战到底 | 血战到底 | 小丑的梦境 | 王者试炼 | 探险者地宫 | 前线速递 | 骑魂谷 | 冒险岛 | 极速狂飙 | 毁灭神迹 | 国家正式战争 | 国家远征 | 国家情报 | 国家BOSS | 藏宝峡谷 游戏壁纸 游戏截图 玩家相册 MORE 265G百科 073专区 新浪爱问 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy",
"id": "http://zs.ucjoy.com/",
"title": "《诸神世界》官方网站—3D魔幻战争网游",
"cache": "content",
"segment": "20141211104057",
"boost": 0,
"digest": "8d00af8aaa03c2cf68a69dc68892b764",
"tstamp": "2014-12-11T02:41:18.686Z",
"url": "http://zs.ucjoy.com/",
"anchor": [
"官网",
"诸神世界"
],
"_version_": 1487159634641813500
},
{
"content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: * 以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: * 6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: * 请再一次输入密码 您所在的位置: 首页 > 服务器列表 推荐服务器列表 风暴荒漠 火爆 战争血径 火爆 我的服务器列表 你还未进入过游戏,请先登录游戏! 所有服务器 1-10 11-20 诸神混服 双线1-16服 火爆 风暴荒漠 火爆 战争血径 火爆 无尽沙海 火爆 燃烧平原 火爆 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy",
"id": "http://zs.ucjoy.com/serverlist.app",
"title": "《诸神世界》官方网站—3D魔幻战争网游",
"cache": "content",
"segment": "20141211104057",
"boost": 0,
"digest": "30a836aae5886924d1a87d3ab1ad42c8",
"tstamp": "2014-12-11T02:41:13.476Z",
"url": "http://zs.ucjoy.com/serverlist.app",
"anchor": [
"进入新服",
"开始游戏"
],
"_version_": 1487159634643910700
}
]
}
}
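You can also send the same query without the admin UI. The minimal Python sketch below uses only the standard library. It builds a `/select` URL with the `q=title:幻想`, `wt=json` and `indent=true` parameters from `params` above, then extracts `(url, title)` pairs from `response.docs`.

The sketch makes a few assumptions:

- The default core answers at `<base>/select`, as the bare `/solr` URL in the crawl command suggests for this Solr 4.x-era setup. On a multi-core or newer Solr, the core name probably needs to go into the path.
- `rows=10` is Solr's default page size, passed here explicitly.
- `SOLR_BASE` and the helper names are hypothetical placeholders. The post masks the real host.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: the post masks the real host as xx.xx.xx.xx.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(base: str, query: str, rows: int = 10) -> str:
    """Build a /select URL returning indented JSON (assumes the default core)."""
    params = {"q": query, "wt": "json", "indent": "true", "rows": rows}
    return f"{base.rstrip('/')}/select?{urlencode(params)}"


def query_solr(base: str, query: str, rows: int = 10) -> dict:
    """Send the query and decode the JSON body (needs a reachable Solr)."""
    with urlopen(build_select_url(base, query, rows), timeout=10) as resp:
        return json.load(resp)


def titles_by_url(response: dict) -> list[tuple[str, str]]:
    """Extract (url, title) pairs, falling back to the 'id' field when 'url' is absent."""
    docs = response["response"]["docs"]
    return [(d.get("url") or d.get("id", ""), d.get("title", "")) for d in docs]
```

For the index shown above, `titles_by_url(query_solr(SOLR_BASE, "title:幻想"))` should return seven pairs. `numFound` is 7, so all of them fit in the default ten rows.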
2. Screenshot of the results as displayed in Solr
bin/crawl urls crawl http://xx.xx.xx.xx:8983/solr 5
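The command above runs the Nutch 1.9 `bin/crawl` script (the log's `http.agent` reports Nutch-1.9). Its four positional arguments are, in order:

1. the seed directory (`urls`)
2. the crawl directory (`crawl`)
3. the Solr URL
4. the number of rounds (`5`)

Later Nutch releases changed this signature. The small Python wrapper sketched below makes the argument order explicit. `crawl_command` and `run_crawl` are hypothetical helper names. `nutch_home` is assumed to be the runtime directory the command was typed from.

```python
import subprocess


def crawl_command(seed_dir: str, crawl_dir: str, solr_url: str, rounds: int) -> list[str]:
    """argv for Nutch 1.9 bin/crawl: <seedDir> <crawlDir> <solrURL> <numberOfRounds>."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return ["bin/crawl", seed_dir, crawl_dir, solr_url, str(rounds)]


def run_crawl(seed_dir: str, crawl_dir: str, solr_url: str, rounds: int,
              nutch_home: str = ".") -> None:
    """Run the crawl; raises CalledProcessError if the script exits non-zero."""
    # Relative paths (bin/crawl, urls, crawl) resolve against nutch_home,
    # as when the command is typed from the Nutch runtime directory.
    subprocess.run(crawl_command(seed_dir, crawl_dir, solr_url, rounds),
                   cwd=nutch_home, check=True)
```

In the log below, an Injector run is followed by rounds of jobs, each running in this order:

1. Generator
2. Fetcher
3. ParseSegment
4. CrawlDb update
5. LinkDb
6. DeduplicationJob
7. IndexingJob
8. CleaningJob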
3. Nutch log output during the crawl:
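The log below is long and repetitive. As a reading aid, the Python sketch below extracts two things from it:

- the `elapsed` time of each finished job
- the counts on the `Indexing N documents` lines

It tolerates an optional leading line number on each log line.

The sketch also decodes the `Fetcher Timelimit set for` value, which is a deadline in epoch milliseconds. It is read in UTC+8, the offset between the segment names and the Solr `tstamp` values above. Read that way, the deadline falls exactly three hours after each fetch starts, which suggests a 180-minute fetch time limit. `summarize_log` and `fetch_time_budget` are hypothetical helpers, not part of Nutch.

```python
import re
from datetime import datetime, timedelta, timezone

# Optional line number, then "date time,millis LEVEL logger - message".
LINE_RE = re.compile(
    r"^(?:\d+\s+)?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(\S+)\s+-\s+(.*)$"
)
ELAPSED_RE = re.compile(r"finished at .*elapsed: (\d+):(\d{2}):(\d{2})")
INDEXING_RE = re.compile(r"Indexing (\d+) documents")


def summarize_log(lines):
    """Return ([(logger, elapsed_seconds), ...], [indexing counts]) in log order."""
    jobs, indexing_counts = [], []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # e.g. the multi-line IndexWriter parameter help
        _timestamp, _level, logger, message = m.groups()
        if (e := ELAPSED_RE.search(message)) is not None:
            h, mins, secs = (int(g) for g in e.groups())
            jobs.append((logger, h * 3600 + mins * 60 + secs))
        elif (i := INDEXING_RE.search(message)) is not None:
            indexing_counts.append(int(i.group(1)))
    return jobs, indexing_counts


def fetch_time_budget(started: str, limit_ms: int, utc_offset_hours: int = 8) -> timedelta:
    """Time from the timestamp of the 'Fetcher Timelimit set for' line to the deadline it prints."""
    tz = timezone(timedelta(hours=utc_offset_hours))  # assumed local zone of the log
    start = datetime.strptime(started, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=tz)
    # Split seconds and milliseconds so the millisecond part stays exact.
    deadline = (datetime.fromtimestamp(limit_ms // 1000, tz=tz)
                + timedelta(milliseconds=limit_ms % 1000))
    return deadline - start
```

To read a whole log file such as `logs/hadoop.log`, pass an open file object (for example `open(path, encoding="utf-8")`) to `summarize_log`.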
<pre name="code" class="plain">2014-12-11 10:23:02,927 INFO crawl.Injector - Injector: starting at 2014-12-11 10:23:02
2014-12-11 10:23:02,928 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:23:02,928 INFO crawl.Injector - Injector: urlDir: urls
2014-12-11 10:23:02,929 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:23:03,210 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:03,266 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:23:03,748 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:23:04,779 INFO crawl.Injector - Injector: overwrite: false
2014-12-11 10:23:04,779 INFO crawl.Injector - Injector: update: false
2014-12-11 10:23:05,606 INFO crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:23:05,611 INFO crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:23:05,612 INFO crawl.Injector - Injector: finished at 2014-12-11 10:23:05, elapsed: 00:00:02
2014-12-11 10:23:06,551 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: starting at 2014-12-11 10:23:06
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:23:07,201 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,202 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,202 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,211 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:07,267 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,267 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,267 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,272 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:23:07,875 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:23:08,875 INFO crawl.Generator - Generator: segment: crawl/segments/20141211102308
2014-12-11 10:23:09,051 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:09,993 INFO crawl.Generator - Generator: finished at 2014-12-11 10:23:09, elapsed: 00:00:03
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:23:10
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211102308
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418275390681
2014-12-11 10:23:10,956 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:23:11,435 INFO fetcher.Fetcher - QueueFeeder finished: total 18 records + hit by time limit :0
2014-12-11 10:23:11,585 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/5/001 (queue crawl delay=5000ms)
2014-12-11 10:23:11,587 INFO fetcher.Fetcher - Using queue mode : byHost
...
2014-12-11 10:23:11,597 INFO http.Http - http.proxy.host = null
2014-12-11 10:23:11,597 INFO http.Http - http.proxy.port = 8080
2014-12-11 10:23:11,597 INFO http.Http - http.timeout = 10000
2014-12-11 10:23:11,597 INFO http.Http - http.content.limit = 65536
2014-12-11 10:23:11,597 INFO http.Http - http.agent = My Nutch Spider/Nutch-1.9
2014-12-11 10:23:11,597 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-12-11 10:23:11,597 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-12-11 10:23:11,597 INFO fetcher.Fetcher - Using queue mode : byHost
...
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - Fetcher: throughput threshold: -1
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - Fetcher: throughput threshold retries: 5
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - fetcher.maxNum.threads can't be < than 50 : using 50 instead
2014-12-11 10:23:12,622 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:13,622 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:14,623 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:15,623 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,624 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,891 INFO fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/3/3 (queue crawl delay=5000ms)
2014-12-11 10:23:17,624 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:18,625 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:19,625 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:20,626 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,626 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,935 INFO fetcher.Fetcher - fetching http://v.zqgame.com/view/index (queue crawl delay=5000ms)
2014-12-11 10:23:22,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:23,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:24,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:25,628 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
...
2014-12-11 10:27:15,997 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:27:15,997 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2014-12-11 10:27:16,004 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2014-12-11 10:27:16,005 INFO fetcher.Fetcher - -activeThreads=0
2014-12-11 10:27:16,629 INFO fetcher.Fetcher - Fetcher: finished at 2014-12-11 10:27:16, elapsed: 00:00:07
2014-12-11 10:27:17,320 INFO parse.ParseSegment - ParseSegment: starting at 2014-12-11 10:27:17
2014-12-11 10:27:17,320 INFO parse.ParseSegment - ParseSegment: segment: crawl/segments/20141211102707
2014-12-11 10:27:17,591 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:18,518 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2014-12-11 10:27:18,528 INFO parse.ParseSegment - Parsed (12ms):http://v.zqgame.com/indexmain
2014-12-11 10:27:18,571 INFO parse.ParseSegment - Parsed (1ms):http://v.zqgame.com/moviePlay/goMoviePlay/4/4
2014-12-11 10:27:18,659 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2014-12-11 10:27:18,871 INFO parse.ParseSegment - ParseSegment: finished at 2014-12-11 10:27:18, elapsed: 00:00:01
2014-12-11 10:27:19,794 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:19,810 INFO crawl.CrawlDb - CrawlDb update: starting at 2014-12-11 10:27:19
2014-12-11 10:27:19,810 INFO crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20141211102707]
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: false
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: URL filtering: false
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: 404 purging: false
2014-12-11 10:27:19,812 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2014-12-11 10:27:20,639 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:27:20,639 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:27:20,639 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:27:21,120 INFO crawl.CrawlDb - CrawlDb update: finished at 2014-12-11 10:27:21, elapsed: 00:00:01
2014-12-11 10:27:22,066 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: starting at 2014-12-11 10:27:22
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: URL normalize: true
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: URL filter: true
2014-12-11 10:27:22,068 INFO crawl.LinkDb - LinkDb: internal links will be ignored.
2014-12-11 10:27:22,068 INFO crawl.LinkDb - LinkDb: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:23,376 INFO crawl.LinkDb - LinkDb: merging with existing linkdb: crawl/linkdb
2014-12-11 10:27:23,688 INFO regex.RegexURLNormalizer - can't find rules for scope 'linkdb', using default
2014-12-11 10:27:24,510 INFO crawl.LinkDb - LinkDb: finished at 2014-12-11 10:27:24, elapsed: 00:00:02
2014-12-11 10:27:25,209 INFO crawl.DeduplicationJob - DeduplicationJob: starting at 2014-12-11 10:27:25
2014-12-11 10:27:25,483 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:26,760 INFO crawl.DeduplicationJob - Deduplication: 2 documents marked as duplicates
2014-12-11 10:27:26,760 INFO crawl.DeduplicationJob - Deduplication: Updating status of duplicate urls into crawl db.
2014-12-11 10:27:27,931 INFO crawl.DeduplicationJob - Deduplication finished at 2014-12-11 10:27:27, elapsed: 00:00:02
2014-12-11 10:27:28,623 INFO indexer.IndexingJob - Indexer: starting at 2014-12-11 10:27:28
2014-12-11 10:27:28,711 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
2014-12-11 10:27:28,711 INFO indexer.IndexingJob - Indexer: URL filtering: false
2014-12-11 10:27:28,718 INFO indexer.IndexingJob - Indexer: URL normalizing: false
2014-12-11 10:27:28,933 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:28,933 INFO indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication


2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:29,087 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:29,585 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:27:29,995 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:30,022 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:27:30,054 INFO solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:27:30,175 INFO solr.SolrIndexWriter - Indexing 2 documents
...
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: starting at 2014-12-11 10:39:34
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: urlDir: urls
2014-12-11 10:39:34,708 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:39:34,989 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:35,046 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:39:35,528 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:39:36,577 INFO crawl.Injector - Injector: overwrite: false
2014-12-11 10:39:36,577 INFO crawl.Injector - Injector: update: false
2014-12-11 10:39:37,387 INFO crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:39:37,392 INFO crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:39:37,392 INFO crawl.Injector - Injector: finished at 2014-12-11 10:39:37, elapsed: 00:00:02
2014-12-11 10:39:38,327 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: starting at 2014-12-11 10:39:38
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:39:38,978 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:38,978 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:38,978 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:38,987 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:39,040 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:39,040 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:39,040 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:39,045 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:39:39,649 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:39:40,649 INFO crawl.Generator - Generator: segment: crawl/segments/20141211103940
2014-12-11 10:39:40,814 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:41,755 INFO crawl.Generator - Generator: finished at 2014-12-11 10:39:41, elapsed: 00:00:03
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:39:42
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211103940
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418276382447
2014-12-11 10:39:42,720 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:39:43,182 INFO fetcher.Fetcher - QueueFeeder finished: total 1 records + hit by time limit :0
2014-12-11 10:39:43,336 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - fetching http://passport.zqgame.com/common/agreement.jsp (queue crawl delay=5000ms)
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,341 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
...
2014-12-11 10:39:57,352 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
3511 2014-12-11 10:39:57,352 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
3512 2014-12-11 10:39:57,353 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211103940
3513 2014-12-11 10:39:57,501 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
3514 2014-12-11 10:39:57,970 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
3515 2014-12-11 10:39:58,376 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
3516 2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: content dest: content
3517 2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: title dest: title
3518 2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: host dest: host
3519 2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:39:58,434 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:00,130 INFO indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:03
2014-12-11 10:40:00,830 INFO indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:00
2014-12-11 10:40:01,101 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:01,748 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:01,963 INFO indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:02,382 INFO indexer.CleaningJob - CleaningJob: finished at 2014-12-11 10:40:02, elapsed: 00:00:01
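Each SolrMappingReader line reports one Nutch-to-Solr field mapping. In Nutch 1.x these normally come from `conf/solrindex-mapping.xml`. In this run every field keeps its own name in Solr. That is why the documents returned by the Solr query above carry `content`, `title`, `segment`, `boost`, `digest` and `tstamp` under their Nutch names.

The sketch below extracts the pairs from a log excerpt and checks whether any field is renamed. It is a minimal illustration, and `parse_field_mapping` is a hypothetical helper, not part of Nutch.

```python
import re
from typing import Dict

# Hypothetical helper: pull "source: X dest: Y" pairs out of a Nutch hadoop.log excerpt.
MAPPING_RE = re.compile(r"SolrMappingReader - source: (\S+) dest: (\S+)")


def parse_field_mapping(log_text: str) -> Dict[str, str]:
    """Return {nutch_field: solr_field} for every SolrMappingReader line."""
    mapping: Dict[str, str] = {}
    for line in log_text.splitlines():
        match = MAPPING_RE.search(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


# Lines copied from the IndexingJob output above.
LOG = """\
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
"""

mapping = parse_field_mapping(LOG)
renamed = {src: dst for src, dst in mapping.items() if src != dst}
print(sorted(mapping))  # ['boost', 'content', 'digest', 'host', 'segment', 'title', 'tstamp']
print(renamed)          # {}  (no field is renamed on its way into Solr)
```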
2014-12-11 10:40:03,313 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: starting at 2014-12-11 10:40:03
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:40:03,315 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:40:03,963 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:03,964 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:03,964 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:03,972 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:04,062 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:04,062 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:04,062 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:04,067 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:40:04,635 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:40:05,636 INFO crawl.Generator - Generator: segment: crawl/segments/20141211104005
2014-12-11 10:40:05,803 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:06,747 INFO crawl.Generator - Generator: finished at 2014-12-11 10:40:06, elapsed: 00:00:03
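Two values in the Generator output are easier to read after a quick conversion.

- **Segment name.** The segment directory is named after its creation time in `yyyyMMddHHmmss` form. So `crawl/segments/20141211104005` was created at 10:40:05, which matches the log line that announces it. It is also the `segment` value stored in the Solr document for `http://hxjh.zqgame.com/` shown above.
- **Fetch intervals.** `defaultInterval` and `maxInterval` are in seconds, corresponding (as far as I know) to the `db.fetch.interval.default` and `db.fetch.interval.max` settings. That gives 30 and 90 days.

The helper names below are illustrative only.

```python
from datetime import datetime, timedelta


def segment_time(segment_name: str) -> datetime:
    """Parse a Nutch segment directory name (yyyyMMddHHmmss) into a naive local datetime."""
    return datetime.strptime(segment_name, "%Y%m%d%H%M%S")


def seconds_to_days(seconds: int) -> int:
    """Convert an AbstractFetchSchedule interval (logged in seconds) to whole days."""
    return timedelta(seconds=seconds).days


created = segment_time("20141211104005")  # "Generator: segment: crawl/segments/20141211104005"
print(created)                   # 2014-12-11 10:40:05
print(seconds_to_days(2592000))  # 30  (defaultInterval)
print(seconds_to_days(7776000))  # 90  (maxInterval)
```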
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:40:07
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211104005
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418276407435
2014-12-11 10:40:07,707 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:08,157 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,158 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:40:08,158 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:40:08,187 INFO fetcher.Fetcher - QueueFeeder finished: total 40 records + hit by time limit :0
2014-12-11 10:40:08,326 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - fetching http://lt.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - fetching http://zscq.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,329 INFO fetcher.Fetcher - fetching http://lj2.zqgame.com/ (queue crawl delay=5000ms)
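The `Fetcher Timelimit` value is an epoch timestamp in milliseconds. Converting it shows that this fetch round was given a deadline exactly three hours after it started. That is consistent with the 180-minute `fetcher.timelimit.mins` that, as far as I know, the Nutch 1.x `bin/crawl` script passes by default; I have not checked the script actually used here.

The UTC+8 offset is an assumption. It is supported by the Solr `tstamp` of `2014-12-11T02:40:14.477Z` for a page fetched at about 10:40 local time. `epoch_ms_to_local` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log timestamps are local time in UTC+8 (China Standard Time).
CST = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_local(ms: int) -> datetime:
    """Convert a Java-style epoch-millisecond value to an aware UTC+8 datetime (exact, no float rounding)."""
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(CST)


# "Fetcher: starting at ..." was logged at 2014-12-11 10:40:07,435 local time.
start = datetime(2014, 12, 11, 10, 40, 7, 435000, tzinfo=CST)
deadline = epoch_ms_to_local(1418276407435)  # "Fetcher Timelimit set for : 1418276407435"

print(deadline.strftime("%Y-%m-%d %H:%M:%S"))      # 2014-12-11 13:40:07
print((deadline - start) // timedelta(minutes=1))  # 180
```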
...
2014-12-11 10:59:29,551 INFO fetcher.Fetcher - fetching http://pay.zqgame.com/pay/toPayPage/dxpc/107 (queue crawl delay=5000ms)
2014-12-11 10:59:29,703 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49, fetchQueues.getQueueCount=1
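The last status line shows 49 of the 50 fetcher threads spin-waiting. All 49 queued URLs sit in a single host queue, presumably `pay.zqgame.com`, the host being fetched on the line before.

In `byHost` mode, and assuming the default of one thread per queue (`fetcher.threads.per.queue=1`), URLs from the same queue are fetched one at a time with the 5000 ms crawl delay between them. Draining that queue therefore takes on the order of 49 × 5 s ≈ 4 minutes, no matter how many threads are configured.

Below is a rough sketch of that estimate. `parse_status` is a hypothetical helper, and the result ignores the time each fetch itself takes.

```python
import re
from typing import Dict

STATUS = ("-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49, "
          "fetchQueues.getQueueCount=1")
CRAWL_DELAY_MS = 5000  # "(queue crawl delay=5000ms)"


def parse_status(line: str) -> Dict[str, int]:
    """Turn the key=value pairs of a Fetcher status line into ints."""
    return {key: int(value) for key, value in re.findall(r"([\w.]+)=(\d+)", line)}


s = parse_status(STATUS)
busy = s["activeThreads"] - s["spinWaiting"]  # threads actually fetching right now
# Rough estimate: with one thread per host queue, URLs in the same queue are
# fetched sequentially, roughly CRAWL_DELAY_MS apart.
per_queue = s["fetchQueues.totalSize"] / s["fetchQueues.getQueueCount"]
min_drain_s = per_queue * CRAWL_DELAY_MS / 1000
print(busy, min_drain_s)  # 1 245.0
```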