@WillireamAngel 2018-08-07T14:43:22.000000Z

Wget

Linux

Introduction

GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

Wget’s features:
- Wget is non-interactive, meaning that it can work in the background while the user is not logged on. When a website requires a login, Wget can send the credentials itself (for example with --http-user and --http-password), and it can also work through a proxy configured in the http_proxy environment variable.
- Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
- File name wildcard matching and recursive mirroring of directories are available when retrieving via FTP. Wget can read the time-stamp information given by both HTTP and FTP servers, and store it locally.
- Wget supports proxy servers, which can lighten the network load, speed up retrieval and provide access behind firewalls.
- GNU Wget is free software. This means that everyone may use it, redistribute it and/or modify it under the terms of the GNU General Public License, as published by the Free Software Foundation.

Syntax

wget [option]... [URL]...

Options

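-a <log file>: append the messages of the run to the specified log file
-A <suffixes>: download only files with these suffixes; separate multiple suffixes with commas
-b: run wget in the background
-B <URL>: set the base URL used to resolve relative links when reading links from an HTML input file given with -i
-c: continue a previously interrupted download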
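-C <on|off>: turn server-side caching on or off (default on); this option exists only in old Wget releases, newer ones use --no-cache instead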
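-d: run in debug mode
-D <domain list>: set the domains to be followed; separate domains with commas
-e <command>: execute the command as if it were part of the ".wgetrc" file
-h: show help
-i <file>: read the URLs to download from the specified file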
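-I <directory list>: set the directories to follow during a download; separate directories with commas
-l <depth>: set the maximum recursion depth (default 5)
-L: follow relative links only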
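-r: download recursively
-nc: do not overwrite files that already exist locally
-nv: print only basic information and error messages instead of the detailed progress output
-q: quiet mode, print nothing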
-nh: skip DNS lookups of host names (found only in old Wget releases)
-v: verbose output (the default)
-V: show version information
--passive-ftp: use passive mode (PASV) when connecting to FTP servers
--follow-ftp: follow FTP links found in HTML documents
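The options above are often combined. A minimal sketch, assuming a hypothetical list file `urls.txt` holding placeholder example.com URLs; the function name `fetch_list` is likewise made up. It uses `-i` to read the URLs, `-nc` to skip files that already exist, `-b` to move to the background and `-a` to append the messages to `wget.log`:

```shell
# Hypothetical list file: one URL per line (placeholder example.com URLs).
cat > urls.txt <<'EOF'
http://example.com/files/a.iso
http://example.com/files/b.iso
EOF

# Download every URL in the given list file (the function name is illustrative):
#   -b   go to the background right after start-up
#   -a   append log messages to wget.log instead of printing them
#   -nc  skip files that already exist locally
#   -i   read the URLs to fetch from the file
fetch_list() {
  wget -b -a wget.log -nc -i "$1"
}
```

After `fetch_list urls.txt`, progress can be followed with `tail -f wget.log`.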

See also
On Linux:

man wget

or the online manual at
http://www.gnu.org/software/wget/manual/

Examples

Download a single file from the Web

wget http://www.gnu.org/software/wget/manual/wget.html

Download a file but save it locally under a different name

wget --output-document=filename.html http://www.gnu.org/software/wget/manual/wget.html

Save a downloaded file into a specific directory

wget --directory-prefix=folder/subfolder http://www.gnu.org/software/wget/manual/wget.html

Resume an interrupted download

wget --continue example.com/big.file.iso
wget -c example.com/big.file.iso
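Wget already retries interrupted transfers on its own (`--tries`, 20 attempts by default), but some failures, such as a refused connection, end the run immediately. For those cases a small wrapper can rerun `wget -c` so that each attempt resumes from the partial file. A sketch only; the function name, the limit of 5 attempts and the 5-second pause are arbitrary choices:

```shell
# Rerun `wget -c` until it succeeds or the attempt limit is reached.
# Arguments: URL, optional maximum number of attempts (default 5).
download_with_resume() {
  local url=$1
  local max_attempts=${2:-5}
  local attempt=1
  # -c makes each attempt continue from the partial file already on disk.
  until wget -c "$url"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 5   # short pause before the next attempt
  done
}
```

For example: `download_with_resume example.com/big.file.iso 10`.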

Mirroring a website:
(1) Options

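-r: download recursively
-p, --page-requisites: also download everything a page needs to display (images, stylesheets, etc.)
-np, --no-parent: never ascend to the parent directory
-k, --convert-links: convert the links in downloaded HTML pages into relative links that point at the local copies
-m, --mirror: mirror download; shorthand for -r -N -l inf --no-remove-listing
-H, --span-hosts: let recursion follow links to other hosts
-E, --adjust-extension: save HTML and CSS files with a .html or .css suffix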

(2) Commands
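- Mirror a site and convert its links for local browsing; add -H (--span-hosts) only if the mirror should also follow links to other hosts

wget -m -k http://www.wohaoba.com/

- Download a single web page together with everything needed to display it offline, such as stylesheets and embedded images

wget --page-requisites --span-hosts --convert-links --adjust-extension http://example.com/dir/file
wget -p -H -k -E http://example.com/dir/file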
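The mirroring options above can be collected in a small shell function so the same set is used every time. A sketch; the function name `mirror_site` is hypothetical, and the one-second `--wait` between requests is an assumption added to reduce load on the server, not part of the commands above:

```shell
# Build a browsable offline copy of a site section (the function name is illustrative).
#   --mirror            recursion + timestamping, unlimited depth
#   --convert-links     rewrite links to point at the local copies
#   --page-requisites   also fetch the images/CSS each page needs
#   --adjust-extension  save HTML/CSS with .html/.css suffixes
#   --no-parent         never climb above the starting directory
#   --wait=1            pause one second between requests
mirror_site() {
  wget --mirror --convert-links --page-requisites --adjust-extension \
       --no-parent --wait=1 "$1"
}
```

For example: `mirror_site http://example.com/dir/`.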

Download an entire website, including all the pages and files it links to

wget --execute robots=off --recursive --no-parent --continue --no-clobber http://example.com/
wget -e robots=off -r -np -c -nc http://example.com/
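The `-e robots=off` used above is a `.wgetrc` command given on the command line. To reuse such settings, they can go into a startup file instead; Wget loads the file named by the `WGETRC` environment variable (otherwise `~/.wgetrc`). A minimal sketch; the file name `project.wgetrc` and the chosen values are only examples:

```shell
# Per-project startup file; each line uses the same "name = value" commands
# that `wget -e` accepts on the command line (e.g. -e robots=off).
cat > project.wgetrc <<'EOF'
robots = off
tries = 3
timeout = 30
EOF

# Make wget started from this shell read the file above instead of ~/.wgetrc.
export WGETRC="$PWD/project.wgetrc"
```

With this in place, `wget -r -np -c -nc http://example.com/` behaves as if `-e robots=off` had been given.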

Download files from a website that checks the User-Agent and the HTTP Referer header (set both to values the site accepts)

wget --referer=http://example.com/ --user-agent="Firefox/4.0.1" http://nytimes.com

Download a file from a password-protected website

wget --http-user=labnol --http-password=hello123 http://example.com/secret/file.zip
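A password given with `--http-password` is visible to other local users in the process list and ends up in the shell history. One alternative is a private startup file using the `http_user` and `http_password` wgetrc commands. A sketch reusing the placeholder credentials from the command above; the file name `secret.wgetrc` is hypothetical:

```shell
# Store the credentials in a private startup file instead of on the command line.
# The subshell keeps the restrictive umask from leaking into the rest of the session.
(
  umask 077                 # new files get mode 600 (owner read/write only)
  rm -f secret.wgetrc       # recreate the file so the mode above really applies
  cat > secret.wgetrc <<'EOF'
http_user = labnol
http_password = hello123
EOF
)
```

Then run `WGETRC=./secret.wgetrc wget http://example.com/secret/file.zip`; a file named by `WGETRC` replaces `~/.wgetrc` for that run. Wget can also prompt for the password with `--ask-password`.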

Check whether all the links on your website work. With --spider, wget only checks the pages and does not save them locally.

wget --output-file=logfile.txt --recursive --spider http://example.com
wget -o logfile.txt -r --spider http://example.com
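Wget's exit status can turn this link check into a simple pass/fail test. According to the manual, status 8 means the server issued an error response (such as 404 Not Found), and when several kinds of errors occur, the lower-numbered codes take precedence (apart from 0 and 1). A minimal sketch; the function name and messages are made up:

```shell
# Crawl a site in spider mode (pages are checked, not saved) and summarise the
# outcome from wget's exit status. Arguments: start URL, log file.
check_links() {
  local url=$1 log=$2 status
  # The `&& ... ||` form records the status without tripping `set -e`.
  wget --spider -r -nv -o "$log" "$url" && status=0 || status=$?
  case $status in
    0) echo "all links OK" ;;
    8) echo "server returned errors (e.g. 404); see $log" ;;
    *) echo "wget exited with status $status; see $log" ;;
  esac
  return "$status"
}
```

`check_links http://example.com logfile.txt` prints a one-line summary and returns wget's exit status, so it can also be used in an `if` statement.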

Download a file only if the copy on the server is newer than the local one

wget --continue --timestamping wordpress.org/latest.zip
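With `--timestamping` (`-N`), Wget compares the server's timestamp and file size with the local copy and sets the local file's modification time to the server's. A script that needs to know whether anything was actually downloaded can therefore compare the modification time before and after the call. A sketch assuming GNU `stat` and that the file is saved under the last path segment of the URL; the function name is hypothetical:

```shell
# Download with -N (timestamping) and report whether the local copy changed.
# Assumption: wget saves the file under the last path segment of the URL.
fetch_if_newer() {
  local url=$1
  local file=${url##*/}
  local before after
  # Modification time in seconds (GNU coreutils stat); 0 if the file is missing.
  before=$(stat -c %Y "$file" 2>/dev/null || echo 0)
  wget -N "$url" || return
  after=$(stat -c %Y "$file" 2>/dev/null || echo 0)
  if [ "$before" != "$after" ]; then
    echo "updated: $file"
  else
    echo "unchanged: $file"
  fi
}
```

For example, `fetch_if_newer wordpress.org/latest.zip` prints `updated: latest.zip` only when a newer copy was fetched.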
