Collection rule #4: a rule for the novel site www.dosrojos.com, for use with the Yidu (易读) novel system

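Some readers said they don't know how to set up the replace and search filters, so I'll work through the sites one at a time. I don't have much time, so I'll post one rule a day.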

The filter settings are in <PubContentText>; you can use them as a reference.

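Tested and working with the collector built into the Yidu novel system. I don't know whether it works with the Guanguan (关关) collector.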

<?xml version="1.0" encoding="UTF-8"?>
<RuleConfigInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <NovelIntro>
  <RegexName>NovelIntro</RegexName>
  <Pattern>&lt;meta property="og:description" content="((.|\n)*?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelIntro>
 <PubContentText>
  <RegexName>PubContentText</RegexName>
  <Pattern>&lt;div id="content"&gt;((.|\n)*?)&lt;/div&gt;</Pattern>
  <Method/>
  <FilterPattern>桔桔小说网
手机站-m.dosrojos.com
www.dosrojos.com
m.dosrojos.com
&lt;script.+?&lt;/script&gt;|&lt;div.+?&gt;|&lt;/div&gt;|&lt;p&gt;|&lt;/p&gt;
【&lt;b&gt;(.|\n)*?&lt;/B&gt;】♂</FilterPattern>
  <Options/>
 </PubContentText>
 <NovelSearchUrl>
  <RegexName>NovelSearchUrl</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelSearchUrl>
 <NovelList_GetNovelKey>
  <RegexName>NovelList_GetNovelKey</RegexName>
  <Pattern>&lt;span class="s2"&gt;&lt;a href="/info/.+?/(.+?).html"&gt;.+?&lt;/a&gt;</Pattern>
  
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelList_GetNovelKey>
 <NovelListUrl>
  <RegexName>NovelListUrl</RegexName>
  <Pattern>https://www.dosrojos.com/list/1.html
https://www.dosrojos.com/list/2.html
https://www.dosrojos.com/list/3.html
https://www.dosrojos.com/list/4.html
https://www.dosrojos.com/list/5.html
https://www.dosrojos.com/list/6.html
https://www.dosrojos.com/list/7.html
https://www.dosrojos.com/list/8.html
https://www.dosrojos.com/list/9.html
https://www.dosrojos.com/list/10.html</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelListUrl>
 <PubChapterRegion>
  <RegexName>PubChapterRegion</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubChapterRegion>
 <NovelName>
  <RegexName>NovelName</RegexName>
  <Pattern>&lt;meta property="og:title" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelName>
 <NovelSearch_GetNovelName>
  <RegexName>NovelSearch_GetNovelName</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelSearch_GetNovelName>
 <NovelList_GetNovelKey2>
  <RegexName>NovelList_GetNovelKey2</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelList_GetNovelKey2>
 <LagerSort>
  <RegexName>LagerSort</RegexName>
  <Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </LagerSort>
 <SmallSort>
  <RegexName>SmallSort</RegexName>
  <Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </SmallSort>
 <GetSiteUrl>
  <RegexName>GetSiteUrl</RegexName>
  <Pattern>https://www.dosrojos.com</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </GetSiteUrl>
 <TestSearchNovelName>
  <RegexName>TestSearchNovelName</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </TestSearchNovelName>
 <NovelDegree>
  <RegexName>NovelDegree</RegexName>
  <Pattern>&lt;meta property="og:novel:status" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelDegree>
 <PubContentText_FT2JT>
  <RegexName>PubContentText_FT2JT</RegexName>
  <Pattern>false</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentText_FT2JT>
 <NovelAuthor>
  <RegexName>NovelAuthor</RegexName>
  <Pattern>&lt;meta property="og:novel:author" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelAuthor>
 <NovelInfo_GetNovelPubKey>
  <RegexName>NovelInfo_GetNovelPubKey</RegexName>
  <Pattern>&lt;meta property="og:novel:read_url" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelInfo_GetNovelPubKey>
 <PubContentText_ASCII>
  <RegexName>PubContentText_ASCII</RegexName>
  <Pattern>false</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentText_ASCII>
 <NovelCover>
  <RegexName>NovelCover</RegexName>
  <Pattern>&lt;meta property="og:image" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelCover>
 <RuleVersion>
  <RegexName>RuleVersion</RegexName>
  <Pattern>2</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </RuleVersion>
 <PubContentText_BJ2QJ>
  <RegexName>PubContentText_BJ2QJ</RegexName>
  <Pattern>false</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentText_BJ2QJ>
 <NovelInfoExtra>
  <RegexName>NovelInfoExtra</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelInfoExtra>
 <PubIndexUrl>
  <RegexName>PubIndexUrl</RegexName>
  <Pattern>{NovelPubKey}</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubIndexUrl>
 <NovelDefaultCoverUrl>
  <RegexName>NovelDefaultCoverUrl</RegexName>
  <Pattern>https://www.dosrojos.com/cover/nocover.jpg</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelDefaultCoverUrl>
 <PubContentUrl2>
  <RegexName>PubContentUrl2</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentUrl2>
 <PubContentUrl>
  <RegexName>PubContentUrl</RegexName>
  <Pattern>{ChapterKey}</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentUrl>
 <GetSiteName>
  <RegexName>GetSiteName</RegexName>
  <Pattern>dosrojos.com</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </GetSiteName>
 <PubChapterName>
  <RegexName>PubChapterName</RegexName>
  <Pattern>&lt;a href=".+?" title=".+?"&gt;(.+?)&lt;/a&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubChapterName>
 <GetSiteCharset>
  <RegexName>GetSiteCharset</RegexName>
  <Pattern>utf8</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </GetSiteCharset>
 <PubChapter_GetChapterKey>
  <RegexName>PubChapter_GetChapterKey</RegexName>
  <Pattern>&lt;a href="(.+?)" title=".+?"&gt;.+?&lt;/a&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubChapter_GetChapterKey>
 <NovelSearch_GetNovelKey>
  <RegexName>NovelSearch_GetNovelKey</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelSearch_GetNovelKey>
 <NovelKeyword>
  <RegexName>NovelKeyword</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelKeyword>
 <NovelUrl>
  <RegexName>NovelUrl</RegexName>
  <Pattern>https://www.dosrojos.com/info/10/{NovelKey}.html</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelUrl>
</RuleConfigInfo>

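I didn't look closely at the filter part. If you need this rule, check the site's chapter pages yourself to see what ad text it injects, such as the site name 桔桔小说网.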
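To check a filter before loading it into the collector, a minimal Python sketch like the one below can help. It copies the PubContentText pattern and filter lines from the rule above (with XML entities decoded) and assumes that the collector treats each line of FilterPattern as a separate regex whose matches are deleted. That behavior is an assumption, not something confirmed for the Yidu collector, and `extract_chapter_text` and `sample_html` are hypothetical names used only for illustration.

```python
import re

# Patterns copied from the PubContentText rule above, with XML entities decoded.
CONTENT_PATTERN = r'<div id="content">((.|\n)*?)</div>'

# Assumption: each line of FilterPattern is applied as its own regex.
FILTER_LINES = [
    "桔桔小说网",
    "手机站-m.dosrojos.com",
    "www.dosrojos.com",
    "m.dosrojos.com",
    r"<script.+?</script>|<div.+?>|</div>|<p>|</p>",
    r"【<b>(.|\n)*?</B>】♂",
]


def extract_chapter_text(html: str) -> str:
    """Extract the chapter body, then delete everything the filter lines match."""
    match = re.search(CONTENT_PATTERN, html)
    if match is None:
        return ""
    text = match.group(1)
    for pattern in FILTER_LINES:
        text = re.sub(pattern, "", text)
    return text.strip()


# Hypothetical page fragment, made up for this example.
sample_html = (
    '<div id="content"><p>Chapter text.</p>'
    '桔桔小说网 www.dosrojos.com<p>More text.</p></div>'
)
print(extract_chapter_text(sample_html))
```

If a site injects other ad strings, adding them as extra entries in `FILTER_LINES` and rerunning the sketch on a saved chapter page shows quickly whether they are stripped. Note that the dots in the domain lines are unescaped, so as regexes they match any character, not only a literal dot.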

There aren't many Yidu-based sites. I searched and found these:

www.vgango.com
www.dosrojos.com
www.aavpccv.com
www.infected-mushroom.net
www.peoLpLe.com
www.hexaworLd.net
www.athomechecking.com
www.888cqdL.cn
www.666cqdL.cn
www.518cqdL.com
www.178cqdL.cn
www.next-bet.com
www.cosender.com
www.vivaLuta.com
www.sandyall.com

All of these sites can reuse this rule; just change the filters and the domain.
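The domain swap can be scripted. The sketch below is one possible way to do it, assuming the rule is available as text and that a plain string replacement of the domain is enough. `retarget_rule` is a hypothetical helper, and `rule_fragment` is a trimmed-down piece of the rule above, not the full file.

```python
import xml.etree.ElementTree as ET


def retarget_rule(rule_xml: str, old_domain: str, new_domain: str) -> str:
    """Return a copy of the rule text with old_domain replaced by new_domain."""
    updated = rule_xml.replace(old_domain, new_domain)
    # Parse once so a broken result fails loudly (raises ParseError).
    ET.fromstring(updated)
    return updated


# Trimmed fragment of the rule above, kept short for illustration.
rule_fragment = """<RuleConfigInfo>
 <GetSiteUrl>
  <RegexName>GetSiteUrl</RegexName>
  <Pattern>https://www.dosrojos.com</Pattern>
 </GetSiteUrl>
 <NovelUrl>
  <RegexName>NovelUrl</RegexName>
  <Pattern>https://www.dosrojos.com/info/10/{NovelKey}.html</Pattern>
 </NovelUrl>
</RuleConfigInfo>"""

new_rule = retarget_rule(rule_fragment, "dosrojos.com", "vgango.com")
root = ET.fromstring(new_rule)
print(root.findtext("GetSiteUrl/Pattern"))
print(root.findtext("NovelUrl/Pattern"))
```

This only handles the domain. Site-specific filter strings such as the site name still need to be edited by hand, and whether another site uses the same URL paths has to be checked against that site's pages.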
