Example 1:
<config charset="utf-8">
<var-def name="start">
<html-to-xml>
<http url="http://www.tianya.cn/bbs/index.shtml" charset="utf-8" />
</html-to-xml>
</var-def>
<var-def name="ulList">
<xpath expression="//div[@class='bankuai_list']">
<var name="start" />
</xpath>
</var-def>
<file action="write" path="tianya/siteboards.xml" charset="utf-8">
<![CDATA[ <site> ]]>
<loop item="item" index="i">
<list><var name="ulList"/></list>
<body>
<xquery>
<xq-param name="item">
<var name="item"/>
</xq-param>
<xq-expression><![CDATA[
declare variable $item as node() external;
<board boardname="{normalize-space(data($item//h3/text()))}" boardurl="">
{
for $row in $item//li return
<board boardname="{normalize-space(data($row//a/text()))}" boardurl="{normalize-space(data($row/a/@href))}" />
}
</board>
]]></xq-expression>
</xquery>
</body>
</loop>
<![CDATA[ </site> ]]>
</file>
</config>
This configuration file is divided into three parts:
1. Define the crawler entry point:
<var-def name="start">
<html-to-xml>
<http url="http://www.tianya.cn/bbs/index.shtml" charset="utf-8" />
</html-to-xml>
</var-def>
Example 2:
<var-def name="requestURL">
http://www.informatik.uni-trier.de/~ley/db/conf/IEEEscc/scc2009.html
</var-def>
<var-def name = "confXML">
http://dblp.uni-trier.de/rec/bibtex/conf/IEEEscc/2009.xml
</var-def>
<var-def name = "article_link">
<xquery>
<xq-param name="doc">
<html-to-xml>
<http url = "${requestURL}"/>
</html-to-xml>
</xq-param>
<xq-param name="confXML" type = "string">
<var name = "confXML"/>
</xq-param>
<xq-expression><![CDATA[
declare variable $doc as node() external;
declare variable $confXML as xs:string external;
<asdfasd>
{ for $x in $doc//a
where $x/@href = $confXML and matches($x/@href,"http:.*\.xml")
return
$x/@href
}
</asdfasd>
]]></xq-expression>
</xquery>
</var-def>
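Notes:
1. Variables defined earlier in the configuration cannot be referenced directly inside the XQuery; each one must be redefined in an xq-param that takes its value from the context.
2. To use a variable inside xq-expression, declare it with: declare variable $name as xs:string external.
3. The declaration (declare variable $name as xs:string external) must include the xs:*** type; otherwise an error is raised.
4. When the expression is
<asdfasd>
{ for $x in $doc//a
where $x/@href = $confXML and matches($x/@href,"http:.*\.xml")
return
$x/@href
}
</asdfasd>
the result is the evaluated for expression:
<asdfasd href="http://dblp.uni-trier.de/rec/bibtex/conf/IEEEscc/2009.xml"/>
With the braces removed, the result is the query text itself, returned literally:
<asdfasd>
for $x in $doc//a
where $x/@href = $confXML and matches($x/@href,"http:.*\.xml")
return
$x/@href
</asdfasd>
In a word: weird.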
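The behavior in note 4 follows from how XQuery element constructors work: inside an element, only text wrapped in braces is evaluated, and anything else is copied as literal content. Because the for expression returns an attribute node ($x/@href), that node is attached to the enclosing element as an attribute instead of appearing as text. The fragment below is a minimal sketch of one way to get the URL as element text, by atomizing the attribute with data(). It reuses requestURL and confXML from example 2; the names linkText, links and link are hypothetical and chosen only for illustration.

```xml
<!-- Sketch only: "linkText", "links" and "link" are hypothetical names.
     requestURL and confXML are the var-defs from example 2. -->
<var-def name="linkText">
    <xquery>
        <!-- Node parameter: the fetched page converted to XML -->
        <xq-param name="doc">
            <html-to-xml>
                <http url="${requestURL}"/>
            </html-to-xml>
        </xq-param>
        <!-- String parameter: re-binds the context variable for use in XQuery (notes 1 to 3) -->
        <xq-param name="confXML" type="string">
            <var name="confXML"/>
        </xq-param>
        <xq-expression><![CDATA[
            declare variable $doc as node() external;
            declare variable $confXML as xs:string external;
            <links>
            {
                (: The braces make this an evaluated expression, not literal text.
                   data() turns the attribute node into its string value, so the URL
                   ends up as the text content of each link element. :)
                for $x in $doc//a
                where $x/@href = $confXML
                return <link>{ data($x/@href) }</link>
            }
            </links>
        ]]></xq-expression>
    </xquery>
</var-def>
```

Wrapping each match in its own element also sidesteps a problem with the original query: if more than one anchor matched, several href attribute nodes would be attached to the same enclosing element, which XQuery treats as an error.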