Using wget to recursively fetch a directory with arbitrary files in it

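I have a web directory where I store some config files. I'd like to use wget to pull those files down and keep their current structure. For example, the remote directory looks like: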

http://mysite.com/configs/.vim/

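.vim holds multiple files and directories. I want to replicate that on the client using wget, but I can't seem to find the right combination of wget flags to get this done. Any ideas?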

Current answer

For anyone else having similar issues: wget follows robots.txt, which might not allow you to grab the site. No worries, you can turn it off:

wget -e robots=off http://www.example.com/

http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html

2012-11-22 20:36:10

Other answers

wget -r http://mysite.com/configs/.vim/

Works for me.

Perhaps you have a .wgetrc that is interfering with it?
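
To check whether a startup file is changing wget's defaults, something like the following can help. It is only a sketch: the function names are made up for illustration, and the paths are the usual locations (the file named by $WGETRC, ~/.wgetrc, /etc/wgetrc, /usr/local/etc/wgetrc), which may differ on your system.

```shell
#!/bin/sh
# list_wgetrc_files: print every wget startup file that exists.
# Assumed locations: $WGETRC (if set), ~/.wgetrc, and two common
# system-wide paths. Your distribution may use a different one.
list_wgetrc_files() {
    for f in "${WGETRC:-}" "$HOME/.wgetrc" /etc/wgetrc /usr/local/etc/wgetrc; do
        # Skip the empty entry (WGETRC unset) and paths that do not exist.
        if [ -n "$f" ] && [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# show_wgetrc_settings: print the active (non-comment, non-blank)
# lines of each startup file found above.
show_wgetrc_settings() {
    list_wgetrc_files | while IFS= read -r f; do
        printf '== %s ==\n' "$f"
        grep -Ev '^[[:space:]]*(#|$)' "$f" || true
    done
}
```

Run show_wgetrc_settings and look for anything that touches recursion, such as a reject list or a recursion depth. If nothing is printed, a startup file is probably not the cause.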

2008-11-07 21:49:42

Recursive wget ignoring robots (for websites)

wget -e robots=off -r -np --page-requisites --convert-links 'http://example.com/folder/'

-e robots=off makes it ignore robots.txt for that domain

-r makes it recursive

-np = no parent, so it doesn't follow links up to the parent folder
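
If you use these flags often, they can be wrapped in a small shell function. This is a minimal sketch, not a polished tool: fetch_tree and the DRY_RUN variable are names I made up, and -P (the download directory prefix) is added on top of the flags explained above. I left out --page-requisites and --convert-links because they mainly matter when you want to browse the downloaded pages locally.

```shell
#!/bin/sh
# fetch_tree URL [DEST]: recursively fetch everything below URL into DEST
# (default: current directory). Hypothetical helper, not part of wget.
# Set DRY_RUN=1 to print the wget command instead of running it.
fetch_tree() {
    url=$1
    dest=${2:-.}
    # -e robots=off  ignore robots.txt
    # -r             recurse into linked pages and directories
    # -np            never ascend to the parent directory
    # -P DEST        save everything under DEST
    set -- wget -e robots=off -r -np -P "$dest" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Printed for inspection only; shell quoting is not reproduced.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 fetch_tree 'http://example.com/folder/' /tmp/out shows the command without touching the network; drop DRY_RUN=1 to actually download.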

2020-06-25 22:01:11

The following options seem to be the perfect combination when dealing with recursive downloads:

wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2

Relevant snippets from the man page, for convenience:

   -nd
   --no-directories
       Do not create a hierarchy of directories when retrieving recursively.  With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the
       filenames will get extensions .n).


   -np
   --no-parent
       Do not ever ascend to the parent directory when retrieving recursively.  This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.

2019-09-07 15:07:53

You just need to add a -r:

wget -r http://stackoverflow.com/

2008-11-07 21:50:44

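You have to pass the -np/--no-parent option to wget (in addition to -r/--recursive, of course), otherwise it will follow the link in the directory index on my site to the parent directory. So the command would look like this: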

wget --recursive --no-parent http://example.com/configs/.vim/

To avoid downloading the auto-generated index.html files, use the -R/--reject option:

wget -r -np -R "index.html*" http://example.com/configs/.vim/
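
With the command above, wget saves files under example.com/configs/.vim/ locally. If you would rather end up with just .vim/ in the current directory, wget's -nH (no host directory) and --cut-dirs=N (drop N leading path components) options can be combined with it. The sketch below works out N from the URL. The helper names and DRY_RUN are my own, and it assumes a plain directory URL with no query string.

```shell
#!/bin/sh
# cut_dirs_for URL: print how many path components sit above the last
# directory in URL (hypothetical helper, not part of wget).
cut_dirs_for() {
    path=${1#*://}                  # drop the scheme
    case $path in
        */*) path=${path#*/} ;;     # drop the host name
        *)   path= ;;               # URL has no path at all
    esac
    path=${path%/}                  # drop one trailing slash
    n=0
    while [ "${path#*/}" != "$path" ]; do
        path=${path#*/}             # strip one leading component
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

# mirror_subdir URL: fetch URL recursively so that its last directory
# becomes the top-level local directory, skipping index.html listings.
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_subdir() {
    url=$1
    cut=$(cut_dirs_for "$url")
    set -- wget -r -np -nH --cut-dirs="$cut" -R 'index.html*' "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Printed for inspection only; shell quoting is not reproduced.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For http://example.com/configs/.vim/ the helper computes --cut-dirs=1, so "configs" is dropped and the files should land in ./.vim/. Try it with DRY_RUN=1 first to see the exact command.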

2008-11-07 21:55:41
