Background
While looking something up today, I happened upon a mirror, kept by a reader abroad, of The Old New Thing posts from 2003 to 2019 (https://bytepointer.com/resources/old_new_thing/index.htm): more than 5,000 of them (what a prolific author). Notably, many links on the official Microsoft blog have already gone dead. In case this mirror eventually disappears as well, I decided to download and save the articles right away. Saving them by hand is clearly impractical, since there are over 5,000 of them, so writing a script to automate the download was the sensible approach. A few years ago I would have had to write the script myself, but AI is now powerful enough that I no longer need to. Using Cursor (https://www.cursor.so/), I had the entire download script generated automatically, made a few small edits to it, and successfully downloaded all 5,000+ articles, packing them into archives grouped by year and month. I have to say: AI really is impressive.
Download links
All articles have been compressed into archives by year; anyone interested can download them below.
Baidu Netdisk: https://pan.baidu.com/s/1gGx7mvLsumbtzoQbatPsfw?pwd=dbqt (extraction code: dbqt)
Gitee: https://gitee.com/bianchengnan/my-blog/tree/master/download-the-old-new-thing-archive-blog-using-powershell-with-the-help-of-Cursor
Note: all articles are copyrighted by their original author. I am only keeping a copy in case the originals go offline, and I will take it down on request.
Downloading
Q:
download url and save to disk and then parse inner urls and then save these urls to disk
A:
To download a URL, save it to disk, parse inner URLs, and then save these URLs to disk in PowerShell, you can use the Invoke-WebRequest cmdlet to download the webpage, Out-File cmdlet to save it to disk, and then use a regular expression to parse the inner URLs. Here is a simple example:
# Define the URL
$url = "http://example.com"
# Define the output file for the webpage
$outfile = "output.html"
# Define the output file for the inner URLs
$urlfile = "urls.txt"
# Download the webpage and save to disk
Invoke-WebRequest -Uri $url -OutFile $outfile
# Read the saved file
$content = Get-Content -Path $outfile
# Parse the inner URLs
$innerUrls = $content | Where-Objec