How to Extract Links from Any Webpage Using PowerShell


PowerShell 3 has a lot of new features, including some powerful new web-related features. They dramatically simplify automating the web, and today we are going to show you how to extract every single link off a webpage, and optionally download the resource behind it.


Scraping the Web with PowerShell

Two new cmdlets make automating the web easier: Invoke-WebRequest, which simplifies parsing human-readable content, and Invoke-RestMethod, which simplifies reading machine-readable content such as JSON or XML. Since links are part of a page's HTML, they fall on the human-readable side. All you have to do to get a webpage is call Invoke-WebRequest and give it a URL.


Invoke-WebRequest -Uri 'http://howtogeek.com'

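For contrast, Invoke-RestMethod deserializes machine-readable responses straight into objects. A minimal sketch, assuming a hypothetical JSON endpoint that is not from the original article:

# Hypothetical JSON endpoint; Invoke-RestMethod parses the response into objects
$posts = Invoke-RestMethod -Uri 'http://example.com/api/posts.json'
$posts | Select-Object -First 5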


If you scroll down the output of that Invoke-WebRequest call, you will see that the response has a Links property; we can use PowerShell 3's new member enumeration feature to pull these out.


(Invoke-WebRequest -Uri 'http://howtogeek.com').Links

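Each of those link objects carries properties such as innerText, href, and class. If you want to confirm exactly what is available, Get-Member will list them; this is an exploratory sketch, not part of the original walkthrough:

# List the properties exposed by the parsed link objects
(Invoke-WebRequest -Uri 'http://howtogeek.com').Links | Get-Member -MemberType Properties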


As you can see, you get a lot of links back. This is where you need to use your imagination to find something unique that filters out just the links you are looking for. Let's suppose we want a list of all the articles on the front page.


((Invoke-WebRequest -Uri 'http://howtogeek.com').Links | Where-Object {$_.href -like 'http*'} | Where class -eq 'title').Title

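If you want to keep the results rather than just display them, the same pipeline can feed a variable or a file. A sketch, where the output path is an assumption of ours:

# Capture the titles and write them to a text file (the C:\Temp path is hypothetical)
$titles = ((Invoke-WebRequest -Uri 'http://howtogeek.com').Links | Where-Object {$_.href -like 'http*'} | Where class -eq 'title').Title
$titles | Out-File C:\Temp\FrontPageTitles.txt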


Another great thing you can do with the new cmdlets is automate everyday downloads. Let's look at automatically scraping the image of the day off the Nat Geo website; to do this we will combine the new web cmdlets with Start-BitsTransfer.


$IOTD = ((Invoke-WebRequest -Uri 'http://photography.nationalgeographic.com/photography/photo-of-the-day/').Links | Where innerHTML -like '*Download Wallpaper*').href
Start-BitsTransfer -Source $IOTD -Destination C:\IOTD\

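One practical addition if you plan to run this unattended: make sure the destination folder exists first, since BITS will not create it for you. The guard below is an addition of ours, not part of the original one-liner:

# Create the destination folder up front if it is missing (this guard is an assumption)
if (-not (Test-Path C:\IOTD)) { New-Item -ItemType Directory -Path C:\IOTD | Out-Null }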

That’s all there is to it. Have any neat tricks of your own? Let us know in the comments.


Translated from: https://www.howtogeek.com/124736/stupid-geek-tricks-extract-links-off-any-webpage-using-powershell/
