How to Extract Links from Any Webpage Using PowerShell


PowerShell 3 has a lot of new features, including some powerful new web-related features. They dramatically simplify automating the web, and today we are going to show you how to extract every single link off a webpage, and optionally download the resource behind it.


Scraping the Web with PowerShell

Two new cmdlets make automating the web easier: Invoke-WebRequest, which simplifies parsing human-readable content, and Invoke-RestMethod, which simplifies reading machine-readable content such as JSON or XML. Since links are part of a page's HTML, they fall on the human-readable side. All you have to do to get a webpage is call Invoke-WebRequest and give it a URL.


Invoke-WebRequest -Uri 'http://howtogeek.com'

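For contrast, Invoke-RestMethod deserializes machine-readable responses straight into objects. A minimal sketch, assuming a hypothetical JSON endpoint that is not from the original article:

# Hypothetical JSON endpoint; Invoke-RestMethod parses the response into objects
$posts = Invoke-RestMethod -Uri 'http://example.com/api/posts.json'
$posts | Select-Object -First 5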


If you scroll down the output of that Invoke-WebRequest call, you will see that the response has a Links property; we can use PowerShell 3's new member enumeration feature to pull these out.


(Invoke-WebRequest -Uri 'http://howtogeek.com').Links

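Each of those link objects carries properties such as innerText, href, and class. If you want to confirm exactly what is available, Get-Member will list them; this is an exploratory sketch, not part of the original walkthrough:

# List the properties exposed by the parsed link objects
(Invoke-WebRequest -Uri 'http://howtogeek.com').Links | Get-Member -MemberType Properties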


As you can see, you get a lot of links back. This is where you need to use your imagination to find something unique that filters out just the links you are looking for. Let's suppose we want a list of all the articles on the front page.


((Invoke-WebRequest -Uri 'http://howtogeek.com').Links | Where-Object {$_.href -like 'http*'} | Where class -eq 'title').Title

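If you want to keep the results rather than just display them, the same pipeline can feed a variable or a file. A sketch, where the output path is an assumption of ours:

# Capture the titles and write them to a text file (the C:\Temp path is hypothetical)
$titles = ((Invoke-WebRequest -Uri 'http://howtogeek.com').Links | Where-Object {$_.href -like 'http*'} | Where class -eq 'title').Title
$titles | Out-File C:\Temp\FrontPageTitles.txt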


Another great thing you can do with the new cmdlets is automate everyday downloads. Let's look at automatically scraping the image of the day off the Nat Geo website; to do this we will combine the new web cmdlets with Start-BitsTransfer.


$IOTD = ((Invoke-WebRequest -Uri 'http://photography.nationalgeographic.com/photography/photo-of-the-day/').Links | Where innerHTML -like '*Download Wallpaper*').href
Start-BitsTransfer -Source $IOTD -Destination C:\IOTD\

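One practical addition if you plan to run this unattended: make sure the destination folder exists first, since BITS will not create it for you. The guard below is an addition of ours, not part of the original one-liner:

# Create the destination folder up front if it is missing (this guard is an assumption)
if (-not (Test-Path C:\IOTD)) { New-Item -ItemType Directory -Path C:\IOTD | Out-Null }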

That’s all there is to it. Have any neat tricks of your own? Let us know in the comments.


Translated from: https://www.howtogeek.com/124736/stupid-geek-tricks-extract-links-off-any-webpage-using-powershell/
