A function for getting all the links in a page

weixin_30666943

Original link: http://www.cnblogs.com/ysuhy/archive/2008/08/31/1280442.html

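I came across this function a while ago when working on a search engine.

x1 is the HTTP address of the current document, and x2 is a link address found in that page.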

Code
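// Resolves the link x2, found in the page at x1, to an absolute address.
// Only http:// addresses are recognized. Requires: using System;
public string GetUrl(string x1, string x2)
{
    const string Scheme = "http://";

    // Make sure the base address carries a scheme.
    if (!x1.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase))
    {
        x1 = Scheme + x1;
    }

    // The link is already absolute.
    if (x2.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase))
    {
        return x2;
    }

    // Split the base into the site root ("http://host") and the directory
    // of the current document ("http://host/dir").
    int pathStart = x1.IndexOf("/", Scheme.Length);
    string root = pathStart != -1 ? x1.Substring(0, pathStart) : x1;
    string dir = pathStart != -1 ? x1.Substring(0, x1.LastIndexOf("/")) : x1;

    // Root-relative link, e.g. "/a/b.html".
    if (x2.StartsWith("/", StringComparison.Ordinal))
    {
        return root + x2;
    }

    // Parent-relative link: each leading "../" climbs one directory,
    // but never above the site root.
    while (x2.StartsWith("../", StringComparison.Ordinal))
    {
        x2 = x2.Substring(3);
        if (dir.Length > root.Length)
        {
            dir = dir.Substring(0, dir.LastIndexOf("/"));
        }
    }

    // Plain relative link, e.g. "b.html": resolve against the directory.
    return dir + "/" + x2;
}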

"Base addresses such as http://www.cnblogs.com/ are not given any special checks; in practice that is rarely needed."

After these steps, the function returns the absolute address of the link.
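For comparison, the same resolution can be delegated to the .NET System.Uri class instead of manual string handling. This is a minimal sketch, not the author's method; the class name LinkResolver and the method name Resolve are made up for illustration.

```csharp
using System;

public static class LinkResolver
{
    // Hypothetical helper: resolves a link against the page it was found on.
    // pageUrl must be an absolute address including the scheme.
    public static string Resolve(string pageUrl, string link)
    {
        var baseUri = new Uri(pageUrl);
        // Uri(Uri, string) handles "/x", "../x", "x" and absolute links.
        return new Uri(baseUri, link).AbsoluteUri;
    }
}
```

For example, Resolve("http://www.cnblogs.com/ysuhy/archive/2008/08/31/1280442.html", "../index.html") yields "http://www.cnblogs.com/ysuhy/archive/2008/08/index.html". Unlike GetUrl, this sketch does not prepend "http://" to a base address that lacks it; new Uri throws a UriFormatException for such input.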

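To get all the links in a page, the following code can be used.

It picks up every href and src link, whether the value is in double quotes, single quotes, or not quoted at all. Tested with MTracer.exe.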
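A minimal sketch of such an extractor, assuming a single regular expression with one alternative per quoting style. It is not the author's original code, and the names LinkExtractor and GetLinks are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Three alternatives for the attribute value: "double quoted",
    // 'single quoted', or unquoted (runs until whitespace, a quote or '>').
    private static readonly Regex LinkPattern = new Regex(
        @"\b(?:href|src)\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>""']+))",
        RegexOptions.IgnoreCase);

    // Returns the raw href/src values in the order they appear in the HTML.
    public static List<string> GetLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            // Exactly one of the three groups takes part in each match.
            for (int i = 1; i <= 3; i++)
            {
                if (m.Groups[i].Success)
                {
                    links.Add(m.Groups[i].Value);
                    break;
                }
            }
        }
        return links;
    }
}
```

Each value returned by GetLinks is still relative to the page it came from, so it can be passed as x2 to GetUrl together with the page address as x1. A regular expression is only an approximation of HTML parsing; links inside comments or scripts are matched as well.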

Reposted from: https://www.cnblogs.com/ysuhy/archive/2008/08/31/1280442.html
