Fixing several bugs in htmlcxx

1. Node::parseAttributes fails on tags that have no attributes at all, such as <script>.
   Cause:
   Node.cc (line 28)
   while (!isspace(*ptr))
   {
       ++ptr; // assumes there is always whitespace between the tag name and '>'
   }
   Fix:
   while (!isspace(*ptr))
   {
       if(*ptr == '>')
           return;
       ++ptr;
   }
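
As a standalone illustration of the same idea, the sketch below skips a tag name and also stops at the terminating NUL. skipTagName is a hypothetical helper, not an htmlcxx function, and the unsigned char cast is an extra precaution that the patch above does not include.

```cpp
#include <cctype>

// Hypothetical helper (not part of htmlcxx): advance past a tag name.
// Stops at whitespace, at '>', or at the terminating NUL, so an
// attribute-less tag such as "<script>" cannot run past the buffer.
inline const char* skipTagName(const char* ptr)
{
    while (*ptr != '\0' && *ptr != '>' &&
           !std::isspace(static_cast<unsigned char>(*ptr)))
    {
        ++ptr;
    }
    return ptr;
}
```

Called on "script>" it returns a pointer to the '>', and on "a href=x>" it returns a pointer to the space after the tag name.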

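2. Node::parseAttributes also fails on a tag like this:
   <script src=http://60.190.236.11:8000/stat.js?¨></script>
   Cause:
   The two bytes after "stat.js?" are a1 a7, which are outside the ASCII range (Chinese text triggers the same problem).
   Node.cc (line 89) contains: while (*end && !isspace(*end) && *end != '>') end++;
   isspace(*end) contains this assertion: _ASSERTE((unsigned)(c + 1) <= 256); // the argument must be in the range -1..255
   Where char is signed, a byte such as a1 is passed as a negative value, so the assertion fails.
   Fix: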
   //while (*end && !isspace(*end) && *end != '>') end++;
   while (*end &&((unsigned)*end > 255 || !isspace(*end) ) && *end != '>') end++;
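
A more conventional way to get the same effect is to convert the character to unsigned char before calling isspace, since the C standard only defines isspace for EOF and for values representable as unsigned char. The sketch below assumes that approach; isSpaceSafe and skipUnquotedValue are hypothetical names, not part of htmlcxx.

```cpp
#include <cctype>

// Hypothetical wrapper: converting to unsigned char first guarantees that
// isspace receives a value in 0..255, even where plain char is signed.
inline bool isSpaceSafe(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper mirroring the loop at Node.cc line 89: find the end
// of an unquoted attribute value (whitespace, '>' or NUL terminates it).
inline const char* skipUnquotedValue(const char* end)
{
    while (*end && !isSpaceSafe(*end) && *end != '>') ++end;
    return end;
}
```

With this wrapper the bytes a1 a7 are simply treated as non-whitespace, so scanning "stat.js?" followed by those two bytes stops at the '>' as intended.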

3. ParserSax.tcc has the same problem.
   Fix:
   template <typename _Iterator>
   _Iterator
   htmlcxx::HTML::ParserSax::skipHtmlComment(_Iterator c, _Iterator end)
   {
       while ( c != end ) {
           if (*c++ == '-' && c != end && *c == '-')
           {
               _Iterator d(c);
               while (++c != end &&((unsigned)*c > 255 || !isspace(*c) ) && *c != '>');
               if (c == end || *c++ == '>') break;
               c = d;
           }
       }

       return c;
   }
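
To experiment with this loop outside the library, a free-function copy can be run against a std::string. The sketch below is an illustration only: skipCommentSketch is a hypothetical name, the input is assumed to start just past the opening "<!--", and the range check is written as an unsigned char cast rather than the "> 255" comparison used above.

```cpp
#include <cctype>
#include <string>

// Hypothetical free-function copy of the patched loop, for experimentation.
// Assumes c points just past the opening "<!--". Returns an iterator just
// past the first '>' that follows a "--" with no whitespace in between,
// or end if no such terminator exists.
template <typename Iterator>
Iterator skipCommentSketch(Iterator c, Iterator end)
{
    while (c != end) {
        if (*c++ == '-' && c != end && *c == '-') {
            Iterator d(c);
            // The cast keeps bytes >= 0x80 inside isspace's valid range.
            while (++c != end &&
                   !std::isspace(static_cast<unsigned char>(*c)) &&
                   *c != '>')
                ;
            if (c == end || *c++ == '>') break;
            c = d;
        }
    }
    return c;
}
```

For a comment body containing the bytes a1 a7 followed by " -->rest", the returned iterator points at "rest".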