www.hexaworld.net collection rule update: the old rule still works, but it frequently collects empty chapters

a8849516

Article link: https://blog.csdn.net/a8849516/article/details/103719482

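This novel site was updated, so I reworked the rule. If your old rule no longer collects, switch to this one; it tested fine for me.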

I did not have much time to look at the filters. If you can, check the site's chapter pages yourself.

<?xml version="1.0" encoding="UTF-8"?>
<RuleConfigInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<NovelIntro>
<RegexName>NovelIntro</RegexName>
<Pattern>&lt;meta property="og:description" content="((.|\n)*?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelIntro>
<PubContentText>
<RegexName>PubContentText</RegexName>
<Pattern>&lt;div id="content"&gt;((.|\n)*?)&lt;/div&gt;</Pattern>
<Method/>
<FilterPattern>积极小说
www.hexaworld.net
m.hexaworld.net
&lt;script.+?&lt;/script&gt;|&lt;div.+?&gt;|&lt;/div&gt;|&lt;p&gt;|&lt;/p&gt;
【&lt;b&gt;(.|\n)*?&lt;/B&gt;】♂</FilterPattern>
<Options/>
</PubContentText>
<NovelSearchUrl>
<RegexName>NovelSearchUrl</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</NovelSearchUrl>
<NovelList_GetNovelKey>
<RegexName>NovelList_GetNovelKey</RegexName>
<Pattern>&lt;span class="s2"&gt;&lt;a href="/info/.+?/(.+?).html"&gt;.+?&lt;/a&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelList_GetNovelKey>
<NovelListUrl>
<RegexName>NovelListUrl</RegexName>
<Pattern>https://www.hexaworld.net/list/1.html
https://www.hexaworld.net/list/2.html
https://www.hexaworld.net/list/3.html
https://www.hexaworld.net/list/4.html
https://www.hexaworld.net/list/5.html
https://www.hexaworld.net/list/6.html
https://www.hexaworld.net/list/7.html
https://www.hexaworld.net/list/8.html
https://www.hexaworld.net/list/9.html
https://www.hexaworld.net/list/10.html</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelListUrl>
<PubChapterRegion>
<RegexName>PubChapterRegion</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</PubChapterRegion>
<NovelName>
<RegexName>NovelName</RegexName>
<Pattern>&lt;meta property="og:title" content="(.+?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelName>
<NovelSearch_GetNovelName>
<RegexName>NovelSearch_GetNovelName</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</NovelSearch_GetNovelName>
<NovelList_GetNovelKey2>
<RegexName>NovelList_GetNovelKey2</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</NovelList_GetNovelKey2>
<LagerSort>
<RegexName>LagerSort</RegexName>
<Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</LagerSort>
<SmallSort>
<RegexName>SmallSort</RegexName>
<Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</SmallSort>
<GetSiteUrl>
<RegexName>GetSiteUrl</RegexName>
<Pattern>https://www.hexaworld.net</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</GetSiteUrl>
<TestSearchNovelName>
<RegexName>TestSearchNovelName</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</TestSearchNovelName>
<NovelDegree>
<RegexName>NovelDegree</RegexName>
<Pattern>&lt;meta property="og:novel:status" content="(.+?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelDegree>
<PubContentText_FT2JT>
<RegexName>PubContentText_FT2JT</RegexName>
<Pattern>false</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</PubContentText_FT2JT>
<NovelAuthor>
<RegexName>NovelAuthor</RegexName>
<Pattern>&lt;meta property="og:novel:author" content="(.+?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelAuthor>
<NovelInfo_GetNovelPubKey>
<RegexName>NovelInfo_GetNovelPubKey</RegexName>
<Pattern>&lt;meta property="og:novel:read_url" content="(.+?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelInfo_GetNovelPubKey>
<PubContentText_ASCII>
<RegexName>PubContentText_ASCII</RegexName>
<Pattern>false</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</PubContentText_ASCII>
<NovelCover>
<RegexName>NovelCover</RegexName>
<Pattern>&lt;meta property="og:image" content="(.+?)"/&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelCover>
<RuleVersion>
<RegexName>RuleVersion</RegexName>
<Pattern>2</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</RuleVersion>
<PubContentText_BJ2QJ>
<RegexName>PubContentText_BJ2QJ</RegexName>
<Pattern>false</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</PubContentText_BJ2QJ>
<NovelInfoExtra>
<RegexName>NovelInfoExtra</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</NovelInfoExtra>
<PubIndexUrl>
<RegexName>PubIndexUrl</RegexName>
<Pattern>{NovelPubKey}</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</PubIndexUrl>
<NovelDefaultCoverUrl>
<RegexName>NovelDefaultCoverUrl</RegexName>
<Pattern>https://www.hexaworld.net/cover/nocover.jpg</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelDefaultCoverUrl>
<PubContentUrl2>
<RegexName>PubContentUrl2</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</PubContentUrl2>
<PubContentUrl>
<RegexName>PubContentUrl</RegexName>
<Pattern>{ChapterKey}</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</PubContentUrl>
<GetSiteName>
<RegexName>GetSiteName</RegexName>
<Pattern>hexaworld</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</GetSiteName>
<PubChapterName>
<RegexName>PubChapterName</RegexName>
<Pattern>&lt;a href=".+?" title=".+?"&gt;(.+?)&lt;/a&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</PubChapterName>
<GetSiteCharset>
<RegexName>GetSiteCharset</RegexName>
<Pattern>utf8</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</GetSiteCharset>
<PubChapter_GetChapterKey>
<RegexName>PubChapter_GetChapterKey</RegexName>
<Pattern>&lt;a href="(.+?)" title=".+?"&gt;.+?&lt;/a&gt;</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</PubChapter_GetChapterKey>
<NovelSearch_GetNovelKey>
<RegexName>NovelSearch_GetNovelKey</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</NovelSearch_GetNovelKey>
<NovelKeyword>
<RegexName>NovelKeyword</RegexName>
<Pattern/>
<Method/>
<FilterPattern/>
<Options/>
</NovelKeyword>
<NovelUrl>
<RegexName>NovelUrl</RegexName>
<Pattern>https://www.hexaworld.net/info/10/{NovelKey}.html</Pattern>
<Method/>
<FilterPattern/>
<Options/>
</NovelUrl>
</RuleConfigInfo>
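
Because the patterns contain HTML, the rule file only parses as XML when `<` and `>` inside a pattern are written as `&lt;` and `&gt;`. Here is a rough Python sketch for checking that a saved rule file parses and for seeing which rules are still empty. It is not part of the collector: the `list_patterns` helper and the shortened sample rule are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up sample in the same shape as the rule above.
SAMPLE_RULE = """<?xml version="1.0" encoding="UTF-8"?>
<RuleConfigInfo>
<NovelName>
<RegexName>NovelName</RegexName>
<Pattern>&lt;meta property="og:title" content="(.+?)"/&gt;</Pattern>
</NovelName>
<NovelKeyword>
<RegexName>NovelKeyword</RegexName>
<Pattern/>
</NovelKeyword>
</RuleConfigInfo>
"""

def list_patterns(rule_xml: str) -> dict:
    """Map each rule name to its pattern text (hypothetical helper).

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed.
    """
    root = ET.fromstring(rule_xml.encode("utf-8"))
    return {rule.tag: (rule.findtext("Pattern") or "") for rule in root}

if __name__ == "__main__":
    for name, pattern in list_patterns(SAMPLE_RULE).items():
        print(name, "->", pattern if pattern else "(empty)")
```

To check your own file, read it into a string and pass it to `list_patterns` in place of `SAMPLE_RULE`.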

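I have not looked closely at the filters. If you use this rule, browse the novel chapter pages on www.hexaworld.net and see what ad text the site inserts.

Check several of its chapter pages; I cannot catch everything myself.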
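
One way to check the filters is to run the content pattern and the filter lines against a saved chapter page and read what is left over. The Python sketch below is only an illustration under assumptions: the sample HTML is made up, `extract_content` is a hypothetical helper, and it assumes each filter line is a regex whose matches are deleted, which may not be exactly how the collector applies them. The last filter line (the one ending in `♂`) is left out because its exact meaning in the collector is not confirmed here.

```python
import re

# Content pattern from the PubContentText rule.
CONTENT_PATTERN = r'<div id="content">((.|\n)*?)</div>'

# Filter lines copied from the rule, one regex per line (assumed semantics).
FILTERS = [
    r"积极小说",
    r"www.hexaworld.net",
    r"m.hexaworld.net",
    r"<script.+?</script>|<div.+?>|</div>|<p>|</p>",
]

def extract_content(html: str, filters=FILTERS) -> str:
    """Return the chapter text with every filter match removed (hypothetical helper)."""
    match = re.search(CONTENT_PATTERN, html)
    if match is None:
        return ""  # no match would show up as an empty chapter
    text = match.group(1)
    for pattern in filters:
        text = re.sub(pattern, "", text)
    return text.strip()

# Made-up sample page; replace it with the HTML of a real chapter page you saved.
SAMPLE_HTML = (
    '<html><body><div id="content"><p>Chapter text.</p>'
    '<p>Read more at www.hexaworld.net</p>'
    '<script>ads()</script></div></body></html>'
)

if __name__ == "__main__":
    print(extract_content(SAMPLE_HTML))
```

In this sample the words "Read more at" survive the filters. Leftover text like that on a real page is a candidate for a new filter line.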

There are not many 易读 (Yidu) sites; I searched and found these:

www.next-bet.com
www.cosender.com
www.vivaluta.com
www.sandyall.com

www.vgango.com
www.dosrojos.com
www.aavpccv.com
www.infected-mushroom.net
www.peolple.com
www.athomechecking.com
www.888cqdl.cn
www.666cqdl.cn
www.518cqdl.com
www.178cqdl.cn

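All of these sites can reuse this rule; just change the filters and the domain. If you find other sites, leave me a comment and I will add them and share them with everyone.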
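
The domain swap can be done in one pass over the rule text. This is a minimal Python sketch with a hypothetical `retarget_rule` helper and a made-up sample; it only replaces the domain string, so the site name in `GetSiteName` and any filter lines that name the old site still need to be edited by hand.

```python
def retarget_rule(rule_xml: str, old_domain: str, new_domain: str) -> str:
    """Replace every occurrence of old_domain with new_domain (hypothetical helper)."""
    return rule_xml.replace(old_domain, new_domain)

# Shortened, made-up sample in the same shape as the rule above.
SAMPLE_RULE = (
    "<GetSiteUrl><Pattern>https://www.hexaworld.net</Pattern></GetSiteUrl>\n"
    "<FilterPattern>www.hexaworld.net\nm.hexaworld.net</FilterPattern>"
)

if __name__ == "__main__":
    print(retarget_rule(SAMPLE_RULE, "hexaworld.net", "vgango.com"))
```

Passing the bare domain without the `www.` prefix also covers the `m.` mobile host in the filter lines.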
