Getting the value of a given attribute of a given HTML tag without a third-party framework

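```
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects the value of attribute `attr` from every `element` tag in `source`.
// (Signature reconstructed from the call match(html, "a", "href") in the demo.)
public static List<String> match(String source, String element, String attr) {
    List<String> result = new ArrayList<String>();
    // Opening tag, any other attributes, then attr=value with optional quotes.
    String reg = "<" + element + "[^<>]*?\\s" + attr + "=['\"]?(.*?)['\"]?(\\s.*?)?>";
    Matcher m = Pattern.compile(reg).matcher(source);
    while (m.find()) {
        String r = m.group(1);
        result.add(r);
    }
    return result;
}
```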
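To see what this kind of pattern does without fetching a page, here is a small self-contained sketch that runs on an inline HTML string. It is not the author's pattern: `matchStrict` is a hypothetical, slightly stricter variant that treats double-quoted, single-quoted and unquoted values as separate alternatives, so a quoted value cannot spill over into the next attribute. The sample HTML is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttrMatchDemo {

    // Hypothetical stricter variant: the value is matched as "..." or '...'
    // or as an unquoted run of characters, each in its own capture group.
    static List<String> matchStrict(String source, String element, String attr) {
        String reg = "<" + Pattern.quote(element) + "\\b[^<>]*?\\s" + Pattern.quote(attr)
                + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'<>]+))";
        Matcher m = Pattern.compile(reg, Pattern.CASE_INSENSITIVE).matcher(source);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // Exactly one of the three groups is non-null for each match.
            if (m.group(1) != null) {
                result.add(m.group(1));
            } else if (m.group(2) != null) {
                result.add(m.group(2));
            } else {
                result.add(m.group(3));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample covering the three quoting styles plus a value
        // that itself contains single quotes.
        String html = "<a href=\"/i/1.html\">one</a>"
                + "<a class='x' href='/i/2.html'>two</a>"
                + "<a href=/i/3.html>three</a>"
                + "<a href=\"javascript:f('a','b')\"class=\"style11\">fav</a>";
        System.out.println(matchStrict(html, "a", "href"));
    }
}
```

A regular expression is still only an approximation of an HTML parser, so treat this as a quick extraction tool rather than something that handles every page.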

Example call:

```
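public static void main(String[] args) {
    String url = "https://www.dy2018.com/i/99671.html";
    String params = "";
    // httpSendGet is an HTTP GET helper defined elsewhere; the third argument is the page's charset.
    String html = httpSendGet(url, params, "gb2312");
    // Collect the href value of every <a> tag.
    List<String> links = match(html, "a", "href");
    System.out.println(links);
}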
```

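One thing to note: pay attention to the charsetName parameter of httpSendGet. It has to match the encoding of the page, otherwise the HTML text you get back will be garbled.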
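The body of httpSendGet is not shown, so the following is only a guess at the part that matters here: decoding the response bytes with the right charset. `readAll` and `fetch` are hypothetical names, and `fetch` is a stand-in rather than the author's helper. The demo uses UTF-8 and ISO-8859-1 because every JVM ships them; the same reasoning applies to "gb2312" for the page above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URI;

class CharsetFetchDemo {

    // Decode a byte stream into text using the given charset name.
    static String readAll(InputStream in, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charsetName)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Hypothetical stand-in for an HTTP GET helper: fetch a URL and decode
    // the body with the caller-supplied charset.
    static String fetch(String url, String charsetName) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, charsetName);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // A Chinese title written with Unicode escapes, encoded as UTF-8 bytes.
        String text = "<title>\u7535\u5f71\u5929\u5802</title>";
        byte[] raw = text.getBytes("UTF-8");
        String good = readAll(new ByteArrayInputStream(raw), "UTF-8");
        String bad = readAll(new ByteArrayInputStream(raw), "ISO-8859-1");
        System.out.println(good.equals(text)); // true: charset matches the bytes
        System.out.println(bad.equals(text));  // false: wrong charset garbles the text
    }
}
```

The network call is kept separate from the decoding so that the charset handling can be checked on in-memory bytes.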

Finally, here is the result (it is not clean yet and still needs filtering):

[/, /2/, /0/, /3/, /1/, /4/, /8/, /5/, /7/, /14/, /15/, /html/tv/hytv/index.html, /html/tv/oumeitv/index.html, /html/tv/rihantv/index.html, /html/zongyi2013/index.html, /html/dongman/index.html, /support/GuestBook.php, #, index.html, /, /html/tv/, /html/tv/hytv/, javascript:window.external.addFavorite('http://www.dy2018.com/','dy2018.com-电影天堂')"class="style11, /webPlay/play-id-99671-collection-37.html, /webPlay/play-id-99671-collection-36.html, /webPlay/play-id-99671-collection-35.html, /webPlay/play-id-99671-collection-34.html, /webPlay/play-id-99671-collection-33.html, /webPlay/play-id-99671-collection-32.html, /webPlay/play-id-99671-collection-31.html, /webPlay/play-id-99671-collection-30.html, /webPlay/play-id-99671-collection-29.html, /webPlay/play-id-99671-collection-28.html, /webPlay/play-id-99671-collection-27.html, /webPlay/play-id-99671-collection-26.html, /webPlay/play-id-99671-collection-25.html, /webPlay/play-id-99671-collection-24.html, /webPlay/play-id-99671-collection-23.html, /webPlay/play-id-99671-collection-22.html, /webPlay/play-id-99671-collection-21.html, /webPlay/play-id-99671-collection-20.html, /webPlay/play-id-99671-collection-19.html, /webPlay/play-id-99671-collection-18.html, /webPlay/play-id-99671-collection-17.html, /webPlay/play-id-99671-collection-16.html, /webPlay/play-id-99671-collection-15.html, /webPlay/play-id-99671-collection-14.html, /webPlay/play-id-99671-collection-13.html, /webPlay/play-id-99671-collection-12.html, /webPlay/play-id-99671-collection-11.html, /webPlay/play-id-99671-collection-10.html, /webPlay/play-id-99671-collection-9.html, /webPlay/play-id-99671-collection-8.html, /webPlay/play-id-99671-collection-7.html, /webPlay/play-id-99671-collection-6.html, /webPlay/play-id-99671-collection-5.html, /webPlay/play-id-99671-collection-4.html, /webPlay/play-id-99671-collection-3.html, /webPlay/play-id-99671-collection-2.html, /webPlay/play-id-99671-collection-1.html, /webPlay/play-id-99671-collection-0.html, 
ftp://g:g@tv.kaida365.com:2166/一千零一夜35.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜34.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜33.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜32.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜31.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜30.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜29.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜28.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜27.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜26.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜25.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜24.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜23.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜22.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜21.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜20.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜19.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜18.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜17.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜16.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜15.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜14.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜13.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜12.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜11.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜10.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜09.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜08.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜07.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜06.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜05.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜04.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜03.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜02.mp4, ftp://g:g@tv.kaida365.com:2166/一千零一夜01.mp4, /i/99743.html, /i/99734.html, /i/99733.html, /i/99725.html, /i/99720.html, /i/99719.html, /i/99716.html, /i/99708.html, /i/99704.html, /i/99695.html, /i/97129.html, /i/97575.html, /i/97041.html, /i/92091.html, /i/97637.html, /i/92020.html, /i/95187.html, /i/92000.html, /i/98343.html, /i/97363.html]
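The filtering step is not shown in the post, so here is one possible sketch: keep only the links that start with a given prefix (for example the `ftp://` download links) and drop duplicates while preserving order. `keepWithPrefix` is a hypothetical helper, and the sample list below is a shortened, made-up stand-in for the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LinkFilterDemo {

    // Keep links that start with `prefix`, removing duplicates but
    // preserving the order in which they first appeared.
    static List<String> keepWithPrefix(List<String> links, String prefix) {
        Set<String> kept = new LinkedHashSet<>();
        for (String link : links) {
            if (link != null && link.startsWith(prefix)) {
                kept.add(link);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        // Made-up sample mixing navigation links, anchors and download links.
        List<String> links = Arrays.asList(
                "/", "#", "index.html",
                "/webPlay/play-id-99671-collection-1.html",
                "ftp://example.com/ep02.mp4",
                "ftp://example.com/ep01.mp4",
                "ftp://example.com/ep02.mp4",
                "/i/99743.html");
        System.out.println(keepWithPrefix(links, "ftp://"));
    }
}
```

Passing `"/webPlay/"` or `"/i/"` as the prefix would select the player pages or the related-item pages instead.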

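If you would rather not write the regular expression yourself, you can use a third-party crawler framework. There are plenty of them online, so I won't cover them here.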
