Linux command: wget

小孩神游

Article link: https://blog.csdn.net/u010214003/article/details/50108529

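Introduction

   GNU Wget is a simple yet powerful free-software tool for downloading over the network, and is itself
 part of the GNU Project. Its name combines "World Wide Web" and "Get", which also hints at the program's
 main function. It currently supports downloading over HTTP, HTTPS, and FTP, the three most common
 TCP/IP-based protocols.
                                                                      --Wikipedia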

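Features

1. Supports recursive downloading
2. Converts the links in downloaded pages appropriately
3. Produces page mirrors that can be browsed locally
4. Supports proxy servers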

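Drawbacks

1. Few supported protocols, especially compared with cURL. The popular streaming protocols MMS and RTSP are not supported, and neither are the widely used P2P protocols.
2. Dated protocol support. Wget long spoke only HTTP/1.0 (releases from 1.13 onward send HTTP/1.1 requests), and files that a page pulls in through JavaScript cannot be downloaded; files referenced from CSS were out of reach as well until CSS parsing arrived in 1.12.
3. Limited flexibility and extensibility. Complex mirror sites can cause problems.
4. The command line is overly complex, with more than a hundred options.
5. Security issues.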

Detailed reference

Operating system: Ubuntu 14.04

wget --help
GNU Wget 1.15, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
That is: wget, then any options, then the URL(s) to fetch.
Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
                           Show the Wget version, then exit.
  -h,  --help              print this help.
                           Print this help text.
  -b,  --background        go to background after startup.
                           Run in the background after starting.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.
                           Run COMMAND as if it were a line in .wgetrc.

Logging and input file:
  -o,  --output-file=FILE    log messages to FILE.
                             Write log messages to FILE.
  -a,  --append-output=FILE  append messages to FILE.
                             Append log messages to the end of FILE.
  -d,  --debug               print lots of debugging information.
                             Print detailed debugging information.
  -q,  --quiet               quiet (no output).
                             Quiet mode (no output).
  -v,  --verbose             be verbose (this is the default).
                             Verbose mode (the default).
  -nv, --no-verbose          turn off verboseness, without being quiet.
                             Turn off verbose output without going fully quiet.
       --report-speed=TYPE   Output bandwidth as TYPE.  TYPE can be bits.
                             Report bandwidth in units of TYPE; TYPE can be bits.
  -i,  --input-file=FILE     download URLs found in local or external FILE.
                             Read the URLs to download from a local or external FILE.
  -F,  --force-html          treat input file as HTML.
                             Parse the input file as HTML.
  -B,  --base=URL            resolves HTML input-file links (-i -F)
                             relative to URL.
                             Resolve relative links in the -i -F input file against URL.
       --config=FILE         Specify config file to use.
                             Use FILE as the configuration file.
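
As a rough illustration of --config and -e, the sketch below writes a small configuration file and defines a wrapper that uses it. The file name my-wgetrc, the helper name wget_with_config, and the chosen values are made up for this example; the setting names (tries, timeout, limit_rate, continue) are ordinary .wgetrc commands.

```shell
# Write a throwaway config file; the name and values are illustrative only.
cat > my-wgetrc <<'EOF'
tries = 5
timeout = 30
limit_rate = 300k
continue = on
EOF

# Hypothetical wrapper: load the file with --config, then override one
# setting for this run with -e (same syntax as a line in the file).
wget_with_config() {
    wget --config=my-wgetrc -e tries=10 "$@"
}

# Usage (needs network access; the URL is a placeholder):
#   wget_with_config http://example.com/file.zip
```

Commands given with -e run after the configuration file has been read, so they take precedence over the values in it.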

Download:
  -t,  --tries=NUMBER            set number of retries to NUMBER (0 unlimits).
                                 Set the number of tries to NUMBER (0 means retry forever).
       --retry-connrefused       retry even if connection is refused.
                                 Keep retrying even when the connection is refused.
  -O,  --output-document=FILE    write documents to FILE.
                                 Write the downloaded content to FILE.
  -nc, --no-clobber              skip downloads that would download to
                                 existing files (overwriting them).
                                 Skip files that already exist locally.
  -c,  --continue                resume getting a partially-downloaded file.
                                 Resume a partially downloaded file.
       --progress=TYPE           select progress gauge type.
                                 Choose the style of the progress indicator.
  -N,  --timestamping            don't re-retrieve files unless newer than
                                 local.
                                 Download only when the remote file is newer than the local copy.
  --no-use-server-timestamps     don't set the local file's timestamp by
                                 the one on the server.
                                 Do not copy the server's timestamp onto the local file.
  -S,  --server-response         print server response.
                                 Print the server's response.
       --spider                  don't download anything.
                                 Download nothing; only check that the URL is there.
  -T,  --timeout=SECONDS         set all timeout values to SECONDS.
                                 Set every timeout (DNS, connect, read); all timeouts are in seconds.
       --dns-timeout=SECS        set the DNS lookup timeout to SECS.
                                 Set the DNS lookup timeout.
       --connect-timeout=SECS    set the connect timeout to SECS.
                                 Set the connection timeout.
       --read-timeout=SECS       set the read timeout to SECS.
                                 Set the read timeout.
  -w,  --wait=SECONDS            wait SECONDS between retrievals.
                                 Wait SECONDS between successive retrievals.
       --waitretry=SECONDS       wait 1..SECONDS between retries of a retrieval.
       --random-wait             wait from 0.5*WAIT...1.5*WAIT secs between retrievals.
       --no-proxy                explicitly turn off proxy.
                                 Turn off proxy use.
  -Q,  --quota=NUMBER            set retrieval quota to NUMBER.
                                 Cap the total amount of data downloaded at NUMBER.
       --bind-address=ADDRESS    bind to ADDRESS (hostname or IP) on local host.
       --limit-rate=RATE         limit download rate to RATE.
                                 Limit the download speed to RATE.
       --no-dns-cache            disable caching DNS lookups.
                                 Do not cache DNS lookups.
       --restrict-file-names=OS  restrict chars in file names to ones OS allows.

       --ignore-case             ignore case when matching files/directories.
                                 Ignore letter case when matching files or directories.
  -4,  --inet4-only              connect only to IPv4 addresses.
                                 Use IPv4 addresses only.
  -6,  --inet6-only              connect only to IPv6 addresses.
                                 Use IPv6 addresses only.
       --prefer-family=FAMILY    connect first to addresses of specified family,
                                 one of IPv6, IPv4, or none.
                                 Address family to try first (IPv6, IPv4, or none).
       --user=USER               set both ftp and http user to USER.
                                 Set the user name for both FTP and HTTP.
       --password=PASS           set both ftp and http password to PASS.
                                 Set the password for both FTP and HTTP.
       --ask-password            prompt for passwords.
                                 Prompt for the password interactively.
       --no-iri                  turn off IRI support.
                                 Turn off IRI support.
       --local-encoding=ENC      use ENC as the local encoding for IRIs.
                                 Use ENC as the local encoding for IRIs.
       --remote-encoding=ENC     use ENC as the default remote encoding.
                                 Use ENC as the default encoding of the remote server.
       --unlink                  remove file before clobber.
                                 Remove the existing file before overwriting it.
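
One practical use of --spider together with the timeout and retry options above is checking whether a URL is reachable before committing to a download. The helper name url_ok and the specific values are assumptions for this sketch; it relies only on wget returning exit status 0 on success and non-zero on failure.

```shell
# Hypothetical helper: succeed if the URL answers, without saving anything.
url_ok() {
    wget --spider --quiet --tries=2 --timeout=10 "$1"
}

# Usage (needs network access; the URL is a placeholder):
#   if url_ok http://example.com/file.zip; then
#       wget --continue http://example.com/file.zip
#   fi
```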

Directories:
  -nd, --no-directories           don't create directories.
                                  Do not create directories.
  -x,  --force-directories        force creation of directories.
                                  Always create directories.
  -nH, --no-host-directories      don't create host directories.
                                  Do not create a directory named after the host.
       --protocol-directories     use protocol name in directories.
                                  Use the protocol name as a directory component.
  -P,  --directory-prefix=PREFIX  save files to PREFIX/...
                                  Save files under the PREFIX directory.
       --cut-dirs=NUMBER          ignore NUMBER remote directory components.
                                  Drop the first NUMBER components of the remote directory path.
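
The interaction between -nH and --cut-dirs is easier to see with concrete paths. The sketch below is a small model of the mapping, not wget itself: local_dir is a hypothetical helper written for this illustration, and the host and path follow the ftp.xemacs.org example used in the Wget manual.

```shell
# Model of how -nH and --cut-dirs shape the local directory.
# Arguments: host, remote directory, cut-dirs count, "yes" if -nH is given.
local_dir() {
    host=$1; path=$2; cut=$3; no_host=$4
    # Strip leading and trailing slashes so components split cleanly.
    path=${path#/}; path=${path%/}
    # Drop the first $cut directory components (what --cut-dirs does).
    i=0
    while [ "$i" -lt "$cut" ] && [ -n "$path" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   path= ;;
        esac
        i=$((i + 1))
    done
    # -nH removes the leading host directory.
    if [ "$no_host" = yes ]; then
        result=$path
    else
        result=$host${path:+/$path}
    fi
    # An empty result means the current directory.
    printf '%s\n' "${result:-.}"
}

local_dir ftp.xemacs.org /pub/xemacs/ 0 no    # ftp.xemacs.org/pub/xemacs
local_dir ftp.xemacs.org /pub/xemacs/ 0 yes   # pub/xemacs
local_dir ftp.xemacs.org /pub/xemacs/ 1 yes   # xemacs
local_dir ftp.xemacs.org /pub/xemacs/ 2 yes   # .
local_dir ftp.xemacs.org /pub/xemacs/ 1 no    # ftp.xemacs.org/xemacs
```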

HTTP options:
       --http-user=USER        set http user to USER.
                               Set the HTTP user name.
       --http-password=PASS    set http password to PASS.
                               Set the HTTP password.
       --no-cache              disallow server-cached data.
                               Ask for a fresh copy rather than server-cached data.
       --default-page=NAME     Change the default page name (normally
                               this is `index.html'.).
                               Change the default page name (index.html by default).
  -E,  --adjust-extension      save HTML/CSS documents with proper extensions.
                               Save HTML/CSS documents with the proper file extension.
       --ignore-length         ignore `Content-Length' header field.
                               Ignore the Content-Length header.
       --header=STRING         insert STRING among the headers.
                               Add STRING to the HTTP request headers.
       --max-redirect          maximum redirections allowed per page.
                               Maximum number of redirections to follow per page.
       --proxy-user=USER       set USER as proxy username.
                               Set the proxy user name.
       --proxy-password=PASS   set PASS as proxy password.
                               Set the proxy password.
       --referer=URL           include `Referer: URL' header in HTTP request.
                               Add a 'Referer: URL' header to the HTTP request.
       --save-headers          save the HTTP headers to file.
                               Write the HTTP headers to the output file.
  -U,  --user-agent=AGENT      identify as AGENT instead of Wget/VERSION.
                               Send AGENT as the User-Agent string (this is not a proxy setting).
       --no-http-keep-alive    disable HTTP keep-alive (persistent connections).
                               Disable persistent HTTP connections.
       --no-cookies            don't use cookies.
                               Do not use cookies.
       --load-cookies=FILE     load cookies from FILE before session.
                               Load cookies from FILE before the session starts.
       --save-cookies=FILE     save cookies to FILE after session.
                               Save cookies to FILE when the session ends.
       --keep-session-cookies  load and save session (non-permanent) cookies.
                               Load and save session (non-permanent) cookies as well.
       --post-data=STRING      use the POST method; send STRING as the data.
                               Send STRING as the body of a POST request.
       --post-file=FILE        use the POST method; send contents of FILE.
                               Send the contents of FILE as the body of a POST request.
       --method=HTTPMethod     use method "HTTPMethod" in the header.
                               Use the given HTTP request method (GET, POST, ...).
       --body-data=STRING      Send STRING as data. --method MUST be set.
                               Send STRING as the request body; use together with --method.
       --body-file=FILE        Send contents of FILE. --method MUST be set.
                               Send the contents of FILE as the request body; use together with --method.
       --content-disposition   honor the Content-Disposition header when
                               choosing local file names (EXPERIMENTAL).
                               Honor the Content-Disposition header when choosing the local file name (experimental).
       --content-on-error      output the received content on server errors.
                               Output the body the server returned with an error response.
       --auth-no-challenge     send Basic HTTP authentication information
                               without first waiting for the server's
                               challenge.
                               Send Basic authentication credentials right away, without waiting for the server's challenge.
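
Several of the HTTP options above combine naturally when talking to a web API. The following is a minimal sketch, assuming a hypothetical endpoint that accepts JSON; the helper name post_json and the URL in the usage note are placeholders.

```shell
# Hypothetical helper: POST a JSON body and print the response on stdout.
post_json() {
    url=$1
    body=$2
    wget --quiet \
         --header='Content-Type: application/json' \
         --post-data="$body" \
         --output-document=- \
         "$url"
}

# Usage (needs network access; the URL is a placeholder):
#   post_json http://example.com/api '{"name":"wget"}'
```

Passing - to --output-document sends the response to standard output instead of a file, which makes the helper easy to use in a pipeline.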

HTTPS (SSL/TLS) options:
       --secure-protocol=PR     choose secure protocol, one of auto, SSLv2,
                                SSLv3, TLSv1 and PFS.
                                Choose the secure protocol.
       --https-only             only follow secure HTTPS links
                                Follow only HTTPS links.
       --no-check-certificate   don't validate the server's certificate.
                                Do not validate the server's certificate.
       --certificate=FILE       client certificate file.
                                Client certificate file.
       --certificate-type=TYPE  client certificate type, PEM or DER.
                                Client certificate type.
       --private-key=FILE       private key file.
                                Private key file.
       --private-key-type=TYPE  private key type, PEM or DER.
                                Private key type.
       --ca-certificate=FILE    file with the bundle of CA's.
                                File containing the bundle of CA certificates.
       --ca-directory=DIR       directory where hash list of CA's is stored.
                                Directory holding the CA certificates.
       --random-file=FILE       file with random data for seeding the SSL PRNG.
                                File of random data used to seed the SSL PRNG.
       --egd-file=FILE          file naming the EGD socket with random data.
                                File naming the EGD socket that supplies random data.

FTP options:
       --ftp-user=USER         set ftp user to USER.
                               Set the FTP user name.
       --ftp-password=PASS     set ftp password to PASS.
                               Set the FTP password.
       --no-remove-listing     don't remove `.listing' files.
                               Keep the .listing files.
       --no-glob               turn off FTP file name globbing.
                               Turn off wildcard matching of FTP file names.
       --no-passive-ftp        disable the "passive" transfer mode.
                               Disable passive transfer mode.
       --preserve-permissions  preserve remote file permissions.
                               Preserve the permissions of remote files.
       --retr-symlinks         when recursing, get linked-to files (not dir).
                               When recursing, download the files that symlinks point to (not directories).

Recursive download:
  -r,  --recursive          specify recursive download.
                            Turn on recursive downloading.
  -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
                            Maximum recursion depth (inf or 0 means unlimited).
       --delete-after       delete files locally after downloading them.
                            Delete each local file once it has been downloaded.
  -k,  --convert-links      make links in downloaded HTML or CSS point to
                            local files.
                            Rewrite links in downloaded HTML or CSS to point to the local files.
       --backups=N          before writing file X,rotate up to N backup files.
  -K,  --backup-converted   before converting file X, back up as X.orig.
                            Before converting file X, back it up as X.orig.
  -m,  --mirror             shortcut for -N -r -l inf --no-remove-listing.
                            Shorthand for -N -r -l inf --no-remove-listing.
  -p,  --page-requisites    get all images, etc. needed to display HTML page.
                            Download the images and other resources needed to display the HTML page.
       --strict-comments    turn on strict (SGML) handling of HTML comments.
                            Turn on strict (SGML) handling of HTML comments.

Recursive accept/reject:
  -A,  --accept=LIST               comma-separated list of accepted extensions.
                                   Comma-separated list of file extensions to accept.
  -R,  --reject=LIST               comma-separated list of rejected extensions.
                                   Comma-separated list of file extensions to reject.
       --accept-regex=REGEX        regex matching accepted URLs.
                                   Regular expression that accepted URLs must match.
       --reject-regex=REGEX        regex matching rejected URLs.
                                   Regular expression matching URLs to reject.
       --regex-type=TYPE           regex type (posix).
                                   Regular expression type.
  -D,  --domains=LIST              comma-separated list of accepted domains.
                                   Comma-separated list of domains to accept.
       --exclude-domains=LIST      comma-separated list of rejected domains.
                                   Comma-separated list of domains to reject.
       --follow-ftp                follow FTP links from HTML documents.
                                   Follow FTP links found in HTML documents.
       --follow-tags=LIST          comma-separated list of followed HTML tags.
                                   Comma-separated list of HTML tags to follow.
       --ignore-tags=LIST          comma-separated list of ignored HTML tags.
                                   Comma-separated list of HTML tags to ignore.
  -H,  --span-hosts                go to foreign hosts when recursive.
                                   Allow recursion to move on to other hosts.
  -L,  --relative                  follow relative links only.
                                   Follow relative links only.
  -I,  --include-directories=LIST  list of allowed directories.
                                   List of directories to allow.
  --trust-server-names             use the name specified by the redirection
                                   url last component.
                                   Name the file after the last component of the redirection URL.
  -X,  --exclude-directories=LIST  list of excluded directories.
                                   List of directories to exclude.
  -np, --no-parent                 don't ascend to the parent directory.
                                   Never ascend to the parent directory.
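
The recursive and accept/reject options are usually combined. As a sketch, the wrapper below collects only PDF files linked from a starting page; the helper name fetch_pdfs, the depth of 2, and the ./pdfs target directory are arbitrary choices for illustration.

```shell
# Hypothetical helper: fetch only PDF files reachable from the start URL.
fetch_pdfs() {
    wget --recursive --level=2 --no-parent --no-directories \
         --accept=pdf,PDF --directory-prefix=./pdfs "$1"
}

# Usage (needs network access; the URL is a placeholder):
#   fetch_pdfs http://example.com/docs/
```

--no-parent keeps the crawl below the starting directory, and --no-directories places every matching file directly in ./pdfs instead of recreating the remote tree.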

Mail bug reports and suggestions to <bug-wget@gnu.org>.

Common examples

Example 1: download a single file with wget

wget http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Example 2: download and save under a different file name with wget -O

wget -O wordpress.zip 'http://www.minjieren.com/download.aspx?id=1080'

Example 3: limit the download speed with wget --limit-rate

wget --limit-rate=300k http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Example 4: resume an interrupted download with wget -c

wget -c http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Example 5: download in the background with wget -b

wget -b http://www.minjieren.com/wordpress-3.1-zh_CN.zip

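Example 6: download with a spoofed User-Agent

wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" \
     http://www.minjieren.com/wordpress-3.1-zh_CN.zip
Some sites refuse a download request when the User-Agent string shows that the client is not a browser. The --user-agent option lets you present a browser's string instead.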

Example 7: increase the number of retries with wget --tries

wget --tries=40 URL
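
A higher retry count works best alongside the related timeout and wait options from the reference above. The wrapper below is one possible combination, not a recommended default; the name fetch_resilient and the numeric values are assumptions for this sketch.

```shell
# Hypothetical helper: retry patiently and resume partial downloads.
fetch_resilient() {
    wget --tries=40 \
         --retry-connrefused \
         --waitretry=10 \
         --timeout=30 \
         --continue \
         "$@"
}

# Usage (needs network access; the URL is a placeholder):
#   fetch_resilient http://example.com/big-file.iso
```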

Example 8: download multiple files with wget -i

First, save the download links in a file, one URL per line:
cat  filelist.txt
url1
url2
url3
url4
Then pass that file to wget with -i:
wget -i filelist.txt
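
The list file can also be written from a script. In the sketch below the example.com addresses are placeholders, and fetch_list is a hypothetical helper that adds a target directory and a log file to the plain -i invocation.

```shell
# Write the URL list, one URL per line (placeholder addresses).
cat > filelist.txt <<'EOF'
http://example.com/a.zip
http://example.com/b.zip
http://example.com/c.zip
EOF

# Hypothetical helper: download every URL in the list into ./downloads,
# resuming partial files and logging to wget.log instead of the terminal.
fetch_list() {
    wget --input-file="$1" --continue \
         --directory-prefix=./downloads --output-file=wget.log
}

# Usage (needs network access):
#   fetch_list filelist.txt
```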

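Example 9: mirror a website with wget --mirror

wget --mirror -p --convert-links -P ./LOCAL URL
Downloads the entire site to the local machine.
--mirror: turn on mirroring
-p: download every file the HTML pages need to display properly
--convert-links: after downloading, convert links to point to the local copies
-P ./LOCAL: save all files and directories under the given local directory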
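
When mirroring a site you do not control, it is considerate to slow the crawl down. The variant below adds pacing and a bandwidth cap to the command above; mirror_politely is a hypothetical helper name, and the one-second wait and 300k limit are arbitrary values chosen for illustration.

```shell
# Hypothetical helper: mirror a site into a local directory at a gentle pace.
# Arguments: start URL, local target directory.
mirror_politely() {
    wget --mirror --page-requisites --convert-links --adjust-extension \
         --no-parent --wait=1 --random-wait --limit-rate=300k \
         --directory-prefix="$2" "$1"
}

# Usage (needs network access; the URL is a placeholder):
#   mirror_politely http://example.com/docs/ ./LOCAL
```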
