GNU Wget - The non-interactive network downloader

A non-interactive network file downloader.

1. Description

GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

Wget is non-interactive, meaning that it can work in the background, while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most of the Web browsers require constant user’s presence, which can be a great hindrance when transferring a lot of data.

Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as “recursive downloading.” While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
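A recursive mirror of the kind described above can be requested with options like these (a sketch; example.com is a placeholder URL):

```shell
# Mirror part of a site for offline viewing: follow links two levels deep,
# rewrite them to point at the local copies, and also fetch the images and
# CSS each page needs. example.com is a placeholder; wget may be absent or
# the network unreachable, so failure is tolerated here.
wget --recursive --level=2 --convert-links --page-requisites \
     --no-parent https://example.com/docs/ || true
```

--no-parent keeps the recursion from climbing above the starting directory.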

Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
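On a flaky connection, the retry-and-resume behavior can be requested explicitly (a sketch; the URL is a placeholder):

```shell
# -t 0 retries indefinitely instead of the default 20 attempts, and -c
# resumes each retry from the bytes already on disk. Placeholder URL;
# failure is tolerated so the sketch also runs offline.
wget -t 0 -c https://example.com/big-file.iso || true
```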

Wget does not support Certificate Revocation Lists (CRLs), so the certificate presented by an HTTPS server may have been revoked by the site owner without Wget being able to detect it.

utility [juːˈtɪlɪtɪ]: n. usefulness, utility, public facility; adj. practical, general-purpose
protocol [ˈprəʊtəkɒl]: n. protocol, draft, etiquette; v. to draft
proxy [ˈprɒksi]: n. agent, proxy (document), substitute
hindrance [ˈhɪndrəns]: n. obstacle, impediment, hindrance
revoke [rɪˈvəʊk]: v. to withdraw, cancel, repeal; (cards) to fail to follow suit
startup [stɑːtʌp]: n. startup, launch; a newly founded company

2. Basic Startup Options

-V
--version
Display the version of Wget.

-h
--help
Print a help message describing all of Wget’s command-line options.

-b
--background
Go to background immediately after startup. If no output file is specified via the -o option, output is redirected to wget-log.
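For example, assuming a hypothetical log name download.log:

```shell
# Detach immediately; without -o the messages would go to wget-log.
# Placeholder URL; failure is tolerated for offline runs.
wget -b -o download.log https://example.com/big-file.iso || true
```

Progress can then be followed with tail -f download.log.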

3. Download Options

-t number
--tries=number
Set number of tries to number. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like “connection refused” or “not found” (404), which are not retried.
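For instance, to allow up to 50 attempts against an unreliable mirror (placeholder URL):

```shell
# Raise the retry budget from the default 20 to 50; a hard 404 or
# "connection refused" still aborts immediately. Failure is tolerated
# so the sketch also runs offline.
wget --tries=50 https://example.com/large-dataset.tar.gz || true
```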

retry [ˌriːˈtraɪ]: v. to try again, to retry; n. a retry

-c
--continue
Resume an interrupted download.
Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. For instance:

wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

If there is a file named ls-lR.Z in the current directory, Wget will assume that it is the first portion of the remote file, and will ask the server to continue the retrieval from an offset equal to the length of the local file.

Note that you don’t need to specify this option if you just want the current invocation of Wget to retry downloading a file should the connection be lost midway through. This is the default behavior. -c only affects resumption of downloads started prior to this invocation of Wget, and whose local files are still sitting around.

Without -c, the previous example would just download the remote file to ls-lR.Z.1, leaving the truncated ls-lR.Z file alone.

On the other side of the coin, while using -c, any file that’s bigger on the server than locally will be considered an incomplete download and only (length(remote) - length(local)) bytes will be downloaded and tacked onto the end of the local file. This behavior can be desirable in certain cases - for instance, you can use wget -c to download just the new portion that’s been appended to a data collection or log file.
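A sketch of that append-only use case (the log path and URL are hypothetical):

```shell
# Each run fetches only the bytes added to the remote log since the
# previous run, because -c asks for a Range starting at the local size.
# Placeholder URL; failure is tolerated offline.
wget -c https://example.com/logs/access.log || true
```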

However, if the file is bigger on the server because it’s been changed, as opposed to just appended to, you’ll end up with a garbled file. Wget has no way of verifying that the local file is really a valid prefix of the remote file. You need to be especially careful of this when using -c in conjunction with -r, since every file will be considered as an “incomplete download” candidate.

Another instance where you’ll get a garbled file if you try to use -c is if you have a lame HTTP proxy that inserts a “transfer interrupted” string into the local file. In the future a “rollback” option may be added to deal with this case.

lame [leɪm]: adj. crippled, weak, unconvincing, poor; v. to make lame
garble ['gɑːb(ə)l]: v. to distort or confuse (a message); n. a garbled account

Note that -c only works with FTP servers and with HTTP servers that support the “Range” header.
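One way to probe whether an HTTP server will honor resumption is to look for the Accept-Ranges header in a --spider request (placeholder URL):

```shell
# -S prints the server's headers; "Accept-Ranges: bytes" usually means
# Range requests (and therefore -c) will work. --spider skips the body.
# Failure is tolerated so the sketch also runs offline.
wget --spider -S https://example.com/big-file.iso 2>&1 | grep -i 'Accept-Ranges' || true
```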

-N
--timestamping
Turn on time-stamping.
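With time-stamping on, a file is fetched again only if the server's copy is newer (placeholder URL):

```shell
# -N compares the remote Last-Modified time (and size) against the local
# file; if nothing has changed, nothing is downloaded. Failure is
# tolerated so the sketch also runs offline.
wget -N https://example.com/data.csv || true
```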

-T seconds
--timeout=seconds
Set the network timeout to seconds seconds. This is equivalent to specifying --dns-timeout, --connect-timeout, and --read-timeout, all at the same time.

When interacting with the network, Wget can check for timeout and abort the operation if it takes too long. This prevents anomalies like hanging reads and infinite connects. The only timeout enabled by default is a 900-second read timeout. Setting a timeout to 0 disables it altogether. Unless you know what you are doing, it is best not to change the default timeout settings.

All timeout-related options accept decimal values, as well as subsecond values. For example, 0.1 seconds is a legal (though unwise) choice of timeout. Subsecond timeouts are useful for checking server response times or for testing network latency.
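The combined and the individual forms look like this (placeholder URLs):

```shell
# One knob for all three phases...
wget -T 10 https://example.com/file.txt || true
# ...or DNS lookup, TCP connect, and read timeouts tuned separately.
wget --dns-timeout=2 --connect-timeout=5 --read-timeout=30 \
     https://example.com/file.txt || true
```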

-S
--server-response
Print the headers sent by HTTP servers and responses sent by FTP servers.

(base) yongqiang@famu-sys:~$ wget -S www.baidu.com
--2019-11-19 09:31:58--  http://www.baidu.com/
Resolving www.baidu.com (www.baidu.com)... 61.135.169.125, 61.135.169.121
Connecting to www.baidu.com (www.baidu.com)|61.135.169.125|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Accept-Ranges: bytes
  Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
  Connection: Keep-Alive
  Content-Length: 2381
  Content-Type: text/html
  Date: Tue, 19 Nov 2019 01:31:58 GMT
  Etag: "588604c4-94d"
  Last-Modified: Mon, 23 Jan 2017 13:27:32 GMT
  Pragma: no-cache
  Server: bfe/1.0.8.18
  Set-Cookie: BDORZ=27315; max-age=86400; domain=.baidu.com; path=/
Length: 2381 (2.3K) [text/html]
Saving to: ‘index.html’

index.html                               100%[================================>]   2.33K  --.-KB/s    in 0s

2019-11-19 09:31:59 (125 MB/s) - ‘index.html’ saved [2381/2381]

(base) yongqiang@famu-sys:~$

--spider
When invoked with this option, Wget will behave as a Web spider, which means that it will not download the pages, just check that they are there.

For example, you can use Wget to check your bookmarks:

wget --spider --force-html -i bookmarks.html

This feature needs much more work for Wget to get close to the functionality of real web spiders.

(base) yongqiang@famu-sys:~$ wget --spider www.baidu.com
Spider mode enabled. Check if remote file exists.
--2019-11-19 09:35:14--  http://www.baidu.com/
Resolving www.baidu.com (www.baidu.com)... 61.135.169.125, 61.135.169.121
Connecting to www.baidu.com (www.baidu.com)|61.135.169.125|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 277 [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

(base) yongqiang@famu-sys:~$

-P prefix
--directory-prefix=prefix
Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree.
The default is . (the current directory).
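For example, with a hypothetical downloads/ directory:

```shell
# Save the file (and any subdirectories wget creates) under ./downloads/
# instead of the current directory. Placeholder URL; failure tolerated
# so the sketch also runs offline.
wget -P downloads https://example.com/file.zip || true
```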

4. Resuming Downloads and Network Timeouts

(base) yongqiang@famu-sys:~/opencv_workspace/420-opencv_contrib$ wget -t 1000 -T 2000 -c https://github.com/opencv/opencv_contrib/archive/4.2.0.zip
--2020-02-14 15:48:57--  https://github.com/opencv/opencv_contrib/archive/4.2.0.zip
Resolving github.com (github.com)... 13.250.177.223
Connecting to github.com (github.com)|13.250.177.223|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/opencv/opencv_contrib/zip/4.2.0 [following]
--2020-02-14 15:49:00--  https://codeload.github.com/opencv/opencv_contrib/zip/4.2.0
Resolving codeload.github.com (codeload.github.com)... 54.251.140.56
Connecting to codeload.github.com (codeload.github.com)|54.251.140.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘4.2.0.zip’

4.2.0.zip                             [                                                       <=>    ]  59.63M  92.1KB/s    in 28m 0s

2020-02-14 16:17:07 (36.3 KB/s) - ‘4.2.0.zip’ saved [62527980]

(base) yongqiang@famu-sys:~/opencv_workspace/420-opencv_contrib$

