@WillireamAngel 2018-08-07T14:43:22.000000Z

Wget

Linux


Introduction

GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

Wget’s features:
- Wget is non-interactive, meaning that it can work in the background while the user is not logged on. Some websites require login information or expect a web browser; in such cases, routing the request through a proxy set via http_proxy may help.
- Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
- File name wildcard matching and recursive mirroring of directories are available when retrieving via FTP. Wget can read the time-stamp information given by both HTTP and FTP servers, and store it locally.
- Wget supports proxy servers, which can lighten the network load, speed up retrieval and provide access behind firewalls.
- GNU Wget is free software. This means that everyone may use it, redistribute it and/or modify it under the terms of the GNU General Public License, as published by the Free Software Foundation.
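
The proxy support listed above is normally driven by the http_proxy and https_proxy environment variables. A minimal sketch, assuming a hypothetical proxy at proxy.example.com:3128 and a hypothetical helper named fetch_via_proxy (neither comes from the original notes):

```shell
# Hypothetical proxy address (an assumption); replace it with a real one.
PROXY_URL="http://proxy.example.com:3128/"

# Run wget with the proxy variables set for this one invocation only,
# so other commands in the session do not inherit them.
fetch_via_proxy() {
    http_proxy="$PROXY_URL" https_proxy="$PROXY_URL" wget "$@"
}

# Usage (not executed here):
#   fetch_via_proxy http://www.gnu.org/software/wget/manual/wget.html
```

Setting the variables on the command line rather than with export keeps the proxy scoped to the download itself.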

Syntax

wget [option]... [URL]...

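Options

  1. -a<log file>: append the messages produced during the run to the specified log file;
  2. -A<suffixes>: download only files with the given suffixes; separate multiple suffixes with commas;
  3. -b: run wget in the background;
  4. -B<URL>: set the base URL against which relative links are resolved;
  5. -c: continue a previously interrupted download;
  6. -C<flag>: allow (on) or disallow (off) server-cached data; the default is on (this form belongs to older Wget releases; current ones use --no-cache);
  7. -d: run in debug mode;
  8. -D<domain list>: set the list of domains to follow; separate domains with ",";
  9. -e<command>: execute the given command as if it were part of the ".wgetrc" file;
  10. -h: show help;
  11. -i<file>: read the URLs to download from the given file;
  12. -I<directory list>: set the list of directories to follow; separate directories with ",";
  13. -L: follow relative links only;
  14. -r: download recursively;
  15. -nc: do not overwrite a file that already exists;
  16. -nv: show only basic progress and error messages, not the detailed output;
  17. -q: quiet mode, show no output;
  18. -nh: do not look up host names (an option from older Wget releases);
  19. -v: show detailed output;
  20. -V: show version information;
  21. --passive-ftp: connect to FTP servers in passive (PASV) mode;
  22. --follow-ftp: follow FTP links found in HTML files.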
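
Several of these options are typically used together for unattended batch downloads. The sketch below is one possible combination, not the author's own recipe: the file names urls.txt and fetch.log and the helper name fetch_list are hypothetical, and the second URL is only a placeholder.

```shell
# Write a small URL list, one URL per line (placeholder URLs).
cat > urls.txt <<'EOF'
http://www.gnu.org/software/wget/manual/wget.html
http://example.com/big.file.iso
EOF

# Fetch every URL in list file $1, logging to file $2:
#   -i reads URLs from the list, -nc keeps files that already exist,
#   -b detaches into the background, -a appends messages to the log.
fetch_list() {
    wget -b -nc -a "$2" -i "$1"
}

# Usage (not executed here):
#   fetch_list urls.txt fetch.log
```

Because -b returns immediately, progress can be followed afterwards by reading the log file.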

See also
Linux:

    man wget

or
http://www.gnu.org/software/wget/manual/

Examples

  1. Download a single file from the web
    wget http://www.gnu.org/software/wget/manual/wget.html
  1. Download a file and save it locally under a different name
    wget --output-document=filename.html http://www.gnu.org/software/wget/manual/wget.html
  1. Download a file into a specific directory
    wget --directory-prefix=folder/subfolder http://www.gnu.org/software/wget/manual/wget.html
  1. Resume an interrupted download (the long and short forms are equivalent)
    wget --continue example.com/big.file.iso
    wget -c example.com/big.file.iso
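  1. Mirror a website:
    (1) Options
      1. -r: recursive download
      2. -p, --page-requisites: also fetch everything a page needs in order to display (images, stylesheets)
      3. -np, --no-parent: never ascend to the parent directory
      4. -k: convert the links in downloaded HTML pages to relative links, i.e. links to the local copies
      5. -m, --mirror: mirror mode (shorthand for -r -N -l inf --no-remove-listing)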
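
The mirroring options above can be wrapped in a small helper. This is a sketch under assumptions: the function name mirror_site and the URL and directory in the usage line are hypothetical, and --directory-prefix is added only to keep the mirror in its own folder.

```shell
# Mirror the site rooted at URL $1 into directory $2 for offline viewing.
#   --mirror           recursive download with time-stamping
#   --page-requisites  fetch images and stylesheets the pages need
#   --convert-links    rewrite links to point at the local copies
#   --no-parent        stay below the starting directory
mirror_site() {
    wget --mirror --page-requisites --convert-links --no-parent \
         --directory-prefix="$2" "$1"
}

# Usage (not executed here):
#   mirror_site http://example.com/docs/ ./mirror
```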

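    (2) Implementation
- Download a web page together with everything needed to display it offline, such as stylesheets and embedded images

    wget -m -k http://www.wohaoba.com/    # add -H to also follow links to other hosts
    wget --page-requisites --span-hosts --convert-links --adjust-extension http://example.com/dir/file
    wget -p -H -k -E http://example.com/dir/file

- Download recursively while ignoring robots.txt, staying below the starting directory, resuming partial files, and leaving existing files untouched (long and short forms)

    wget --execute robots=off --recursive --no-parent --continue --no-clobber http://example.com/
    wget -e robots=off -r -np -c -nc http://example.com/

- Send a custom Referer header and User-Agent string with the request

    wget --referer=/5.0 --user-agent="Firefox/4.0.1" http://nytimes.com

- Download a file protected by HTTP authentication

    wget --http-user=labnol --http-password=hello123 http://example.com/secret/file.zip

- Resume a download with time-stamping enabled, so the file is fetched again only when the remote copy is newer

    wget --continue --timestamping wordpress.org/latest.zip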
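
Wget already retries transient errors on its own (see --tries), but it gives up on some failures and exits with a non-zero status. As a rough sketch, resuming with -c can be wrapped in a retry loop; the function name fetch_with_resume and its default values are assumptions, not part of Wget.

```shell
# Retry "wget -c" until it succeeds or the attempt limit is reached.
# Arguments: URL [MAX_TRIES, default 5] [DELAY_SECONDS, default 10]
fetch_with_resume() {
    url=$1
    max_tries=${2:-5}
    delay=${3:-10}
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # -c continues a partial file where the previous attempt stopped.
        if wget -c "$url"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (not executed here):
#   fetch_with_resume example.com/big.file.iso 3 30
```

The function returns 0 as soon as one attempt succeeds and 1 if every attempt fails, so it can be used directly in an if statement or a larger script.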