A Simple, Workable Way to Convert HTML to XML

 
I worked out this method after several days of hard work. It is very simple and efficient to develop with.
 
Steps:
 
1 Download the required web page
 
2 Tidy the web page into a standard XHTML file
      2.1 Translate entities, e.g. escape a bare & as &amp;
      2.2 Enforce tag pairs: every <span> <meta> <br> <link> <img> must be closed
      2.3 Add XML features: processing instruction (PI), encoding, ...
      2.4 Fix quote symbols (attribute values must be quoted)
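
Step 2 can be sketched in a few lines. The post does not name a tidy tool, so this example swaps in Python's standard html.parser as an assumption; the names XhtmlWriter, tidy_to_xhtml, and VOID_TAGS are hypothetical. It escapes text, closes void and unclosed tags, quotes attribute values, and prepends an XML declaration. Comments and the doctype are simply dropped here.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never take a closing tag in HTML (a subset, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class XhtmlWriter(HTMLParser):
    """Re-emit loosely written HTML as well-formed XML text (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True: entities arrive in handle_data as plain characters.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Step 2.4: quote every attribute value; a valueless attribute repeats its name.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text}/>")  # step 2.2: self-close void tags
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.open_tags:
            return  # already self-closed, or a stray end tag
        # Close anything left open inside this element, then the element itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # step 2.1: & < > become &amp; &lt; &gt;


def tidy_to_xhtml(html_text):
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    while writer.open_tags:  # close whatever the page left open
        writer.out.append(f"</{writer.open_tags.pop()}>")
    # Step 2.3: add the XML declaration with an explicit encoding.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + "".join(writer.out)
```

For example, tidy_to_xhtml('<p class=x>A & B<br></p>') yields the declaration followed by <p class="x">A &amp; B<br/></p>. A dedicated tidy tool would handle far more edge cases than this sketch.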
 
3 Retag the current XHTML with the following rules:
   method 1:  add "_d(num)" to the current tags,
                    where (num) is the node's depth from the document root.
   method 2:  add "_tl(num)" to the current tags,
                    where (num) is the table depth of the current node relative to the body node.
 
   Both rules apply to all nodes except processing instructions, comment nodes, and script nodes.
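
The two retagging rules can be sketched with xml.etree.ElementTree. This is only an illustration under stated assumptions: the root element counts as depth 0, a table counts toward its own table depth, and the function names are hypothetical. ElementTree's default parser drops comments and processing instructions, so they are never renamed; the isinstance check covers trees that were built to keep them.

```python
import xml.etree.ElementTree as ET

SKIPPED = {"script"}  # script nodes keep their original names


def retag_by_depth(elem, depth=0):
    """Method 1: add _d<depth>, counting the root element as depth 0 (assumption)."""
    # Comment and PI nodes have a non-string tag; leave them untouched.
    if isinstance(elem.tag, str) and elem.tag not in SKIPPED:
        elem.tag = f"{elem.tag}_d{depth}"
    for child in elem:
        retag_by_depth(child, depth + 1)


def retag_by_table_depth(elem, table_depth=0):
    """Method 2: add _tl<n>, where n counts the enclosing tables.

    Assumption: a <table> counts toward its own depth, so the outermost
    table becomes table_tl1 and anything outside a table gets _tl0.
    """
    original = elem.tag
    if original == "table":
        table_depth += 1
    if isinstance(original, str) and original not in SKIPPED:
        elem.tag = f"{original}_tl{table_depth}"
    for child in elem:
        retag_by_table_depth(child, table_depth)


root = ET.fromstring("<html><body><p>hi</p><script>x</script></body></html>")
retag_by_depth(root)
print(ET.tostring(root, encoding="unicode"))
# <html_d0><body_d1><p_d2>hi</p_d2><script>x</script></body_d1></html_d0>
```

Because the depth is now part of each tag name, a stylesheet can target, say, td_tl2 directly instead of spelling out a long path.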
 
 4  Write out the re-tagged XHTML as an XML file
    Remove the XHTML namespace at this point; otherwise XSLT will not work well.
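
The namespace matters because an XSLT pattern such as body only matches elements in no namespace, so templates silently match nothing while the XHTML namespace is still attached. A minimal sketch of removing it with xml.etree.ElementTree follows; the function name strip_namespaces is hypothetical, and only element names are handled.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Drop the '{namespace-uri}' prefix ElementTree puts on element names."""
    for elem in root.iter():
        # Comment and PI nodes have a non-string tag; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>'
root = strip_namespaces(ET.fromstring(xhtml))
plain_xml = ET.tostring(root, encoding="unicode")
print(plain_xml)  # <html><body><p>hi</p></body></html>
```

To write the file with an explicit declaration, ET.ElementTree(root).write("page.xml", encoding="utf-8", xml_declaration=True) can be used, where page.xml is a placeholder name.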
 
5   Write the corresponding XSLT file
       Note: be clear about your special templates and elements.
 
6  Write a complete schema file
 
7  Transform to get the final XML file.
                Make sure the character encoding is correct; otherwise, MSXML will fail.
Nice steps.
 
 My question is: how do I access an attribute value, for example the href in <a href=""> </a>?
 