In this tutorial, we will show you how to extract hyperlinks from an HTML page. For example, to get the link from the following content:
this is text1 <a href='mkyong.com' target='_blank'>hello</a> this is text2...
- First, get the "value" of the a tag. Result: a href='mkyong.com' target='_blank'
- Then, get the "link" from the value extracted above. Result: mkyong.com
1. Regular Expression Pattern
Regular expression pattern to extract the A tag
(?i)<a([^>]+)>(.+?)</a>
Regular expression pattern to extract the link from the A tag
\s*(?i)href\s*=\s*("([^"]*")|'[^']*'|([^'">\s]+))
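The two steps can be sketched in Java as follows. This is a minimal sketch, not a definitive implementation: the class name `HtmlLinkExtractor`, the method `extractLinks`, and the `stripQuotes` helper are hypothetical names introduced here for illustration. Stripping the surrounding quotes is also an assumption, made so that the result matches the `mkyong.com` shown above, because group 1 of the link pattern still contains the quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class HtmlLinkExtractor {

    // Step 1: group 1 captures the attribute text of an <a> tag, group 2 its body.
    private static final Pattern A_TAG =
            Pattern.compile("(?i)<a([^>]+)>(.+?)</a>");

    // Step 2: group 1 captures the href value, including any surrounding quotes.
    private static final Pattern HREF =
            Pattern.compile("\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))");

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher tag = A_TAG.matcher(html);
        while (tag.find()) {
            String attributes = tag.group(1); // e.g. " href='mkyong.com' target='_blank'"
            Matcher href = HREF.matcher(attributes);
            if (href.find()) {
                links.add(stripQuotes(href.group(1)));
            }
        }
        return links;
    }

    // Assumption: the caller wants the link without its surrounding quotes.
    private static String stripQuotes(String value) {
        if (value.length() >= 2) {
            char first = value.charAt(0);
            char last = value.charAt(value.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return value.substring(1, value.length() - 1);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        String html = "this is text1 <a href='mkyong.com' target='_blank'>hello</a> this is text2...";
        System.out.println(extractLinks(html)); // prints [mkyong.com]
    }
}
```

Note that the tag pattern only requires the text to start with "<a", so it can also match other tags whose names begin with "a". For anything beyond simple, well-formed input, an HTML parser is likely a safer choice than regular expressions.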
Description
(?i) # inline flag: all matching is case insensitive (not a capturing group)
<a # start with "<a"
( # start of group #1
[^>]+ # anything except ">", at least one character
) # end of group #1
> # followed by ">"
(.+?) # group #2: match anything, as few characters as possible
</a> # end with "</a>"
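To check how Java numbers the groups in the tag pattern, here is a small sketch. The class name `TagGroupDemo` and the method `groups` are hypothetical and used only for illustration. In `java.util.regex`, `(?i)` is an embedded flag and does not capture, so the attribute text is group 1 and the tag body is group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class TagGroupDemo {

    static final Pattern A_TAG = Pattern.compile("(?i)<a([^>]+)>(.+?)</a>");

    // Returns {group 1, group 2} for the first <a> tag, or null if none is found.
    static String[] groups(String html) {
        Matcher m = A_TAG.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // (?i) is a flag, so only the two parenthesized sub-patterns are counted.
        System.out.println(A_TAG.matcher("").groupCount()); // prints 2

        String[] g = groups("text1 <A HREF='mkyong.com'>hello</A> text2");
        System.out.println("[" + g[0] + "]"); // prints [ HREF='mkyong.com']
        System.out.println("[" + g[1] + "]"); // prints [hello]
    }
}
```

The uppercase `<A HREF=...>` input still matches because the flag applies to the whole pattern, and group 1 keeps the leading space that follows the tag name.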
\s* # can start with whitespace
(?i) # inline flag: matching from here on is case insensitive
href # followed by the word "href"
\s*=\s* # allows whitespace on either side of the equals sign
( # start of group #1
"([^"]*") # allow string with double quotes e