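Using wget to recursively fetch a directory with arbitrary files in it

I have a web directory where I store some config files. I'd like to use wget to pull those files down and maintain their current structure. For instance, the remote directory looks like: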

http://mysite.com/configs/.vim/

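.vim holds multiple files and directories. I want to replicate that on the client using wget. I can't seem to find the right combination of wget flags to get this done. Any ideas?

Current answer

You should use the -m (--mirror) flag: it preserves timestamps and recurses with no depth limit (plain -r stops at five levels by default).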

wget -m http://example.com/configs/.vim/

Combined with the points others have made in this thread, the command becomes:

wget -m -e robots=off --no-parent http://example.com/configs/.vim/
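
For repeated use, the command above can be wrapped in a small shell function. This is only a sketch: the function name mirror_dir and the DRY_RUN variable are made up for illustration and are not part of wget. With DRY_RUN=1 it prints the command instead of running it, which is handy for checking the flags first.

```shell
# Hypothetical helper: mirror one remote directory without climbing
# to its parent and without honoring robots.txt.
mirror_dir() {
  # $1: URL of the remote directory
  set -- wget -m -e robots=off --no-parent "$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"    # show the command only
  else
    "$@"         # run wget
  fi
}

# Usage (commented out so that sourcing this file has no side effects):
# DRY_RUN=1 mirror_dir http://example.com/configs/.vim/
# mirror_dir http://example.com/configs/.vim/
```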

2014-02-24 09:21:09

Other answers

wget -r http://mysite.com/configs/.vim/

works for me.

Perhaps you have a .wgetrc that is interfering with it?
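
One way to check is to list the active (non-comment) settings in your wgetrc files. A minimal sketch, assuming the usual locations /etc/wgetrc and ~/.wgetrc (your system may use other paths); the function name show_wgetrc_settings is made up for illustration:

```shell
# Print every line that is neither blank nor a comment, per file.
show_wgetrc_settings() {
  for rc in "$@"; do
    [ -f "$rc" ] || continue          # skip files that do not exist
    echo "== $rc"
    grep -Ev '^[[:space:]]*(#|$)' "$rc" || true
  done
}

# Usage (commented out so that sourcing this file has no side effects):
# show_wgetrc_settings /etc/wgetrc "$HOME/.wgetrc"
```

A setting such as "recursive = off" or a reject list in that output would be a likely suspect.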

2008-11-07 21:49:42

You just need to add -r:

wget -r http://stackoverflow.com/

2008-11-07 21:50:44

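You have to pass the -np/--no-parent option to wget (in addition to -r/--recursive, of course), otherwise it will follow the link in the directory index on my site to the parent directory. So the command would look like this: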

wget --recursive --no-parent http://example.com/configs/.vim/

To avoid downloading the auto-generated index.html files, use the -R/--reject option:

wget -r -np -R "index.html*" http://example.com/configs/.vim/
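
The trailing * in the pattern matters because directory listings often link to variants of the index page (for example sort-order links saved as index.html?C=N;O=D). The sketch below uses a shell case pattern as a stand-in for wget's own matching, so treat it as an approximation; is_rejected is a made-up name.

```shell
# Approximate check: would this file name match the pattern "index.html*"?
is_rejected() {
  case $1 in
    index.html*) return 0 ;;   # index.html and anything that starts with it
    *)           return 1 ;;
  esac
}

# Usage (commented out so that sourcing this file has no side effects):
# is_rejected 'index.html?C=N;O=D' && echo "would be rejected"
# is_rejected vimrc || echo "would be kept"
```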

2008-11-07 21:55:41

Here's a complete wget command for downloading files from a server's directory (ignoring robots.txt):

wget -e robots=off --cut-dirs=3 --user-agent=Mozilla/5.0 --reject="index.html*" --no-parent --recursive --relative --level=1 --no-directories http://www.example.com/archive/example/5.3.0/
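
To see what --cut-dirs does to the local layout, here is a simplified model of the directory wget would save into, based on the manual's description of -nH and --cut-dirs. It is a sketch, not wget's real logic (it ignores ports, query strings and so on), and local_path is a made-up name. Note that the command above also passes --no-directories, which puts every file in the current directory, so --cut-dirs should have nothing left to trim there.

```shell
# local_path URL [CUT_DIRS] [NO_HOST]
# Prints the directory a file from URL would be saved under.
local_path() {
  url=$1; cut=${2:-0}; nohost=${3:-0}
  rest=${url#*://}                 # drop the scheme
  host=${rest%%/*}                 # host name
  case $rest in
    */*) path=${rest#*/} ;;        # everything after the host
    *)   path= ;;
  esac
  case $path in
    */*) dir=${path%/*} ;;         # directory part, without the file name
    *)   dir= ;;
  esac
  i=0
  while [ "$i" -lt "$cut" ] && [ -n "$dir" ]; do
    case $dir in
      */*) dir=${dir#*/} ;;        # drop one leading component
      *)   dir= ;;
    esac
    i=$((i + 1))
  done
  if [ "$nohost" = "1" ]; then
    out=$dir
  else
    out=$host${dir:+/$dir}
  fi
  echo "${out:-.}"
}

# local_path http://www.example.com/archive/example/5.3.0/ 0 0
#   -> www.example.com/archive/example/5.3.0
# local_path http://www.example.com/archive/example/5.3.0/ 3 1
#   -> .
```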

2013-02-15 12:26:50

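It sounds like you're trying to get a mirror of your files. While wget has some interesting FTP uses, a simple mirror should work. Here are a few considerations to make sure you're able to download the files properly.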

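Respect robots.txt

Make sure that if you have a /robots.txt file in your public_html, www, or configs directory, it does not prevent crawling. If it does, you need to instruct wget to ignore it using the following option in your wget command: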

wget -e robots=off 'http://your-site.com/configs/.vim/'

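Convert remote links to local files

Additionally, wget must be instructed to convert links so that they point at the downloaded files. If you've done everything above correctly, you should be fine here. The easiest way I've found to get all files, provided nothing is hidden behind a non-public directory, is the mirror option.

Try this: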

wget -mpEk 'http://your-site.com/configs/.vim/'

# If robots.txt is present:

wget -mpEk -e robots=off 'http://your-site.com/configs/.vim/'

# Good practice: only deal with the directory you specify (instead of downloading all of `mysite.com`, you're just mirroring `.vim`):

wget -mpEk -e robots=off --no-parent 'http://your-site.com/configs/.vim/'

Using -m instead of -r is preferred, as it has no maximum recursion depth. Mirror is pretty good at determining the full depth of a site, and by default wget does not follow links to other hosts while recursing. The other flags round out the copy: -p fetches the prerequisite files needed to display each page, -E saves HTML pages with an .html extension, and -k converts links to point at the local files. The output should be a preserved directory structure with everything each page needs. Since the directory is linked from your site, you should end up with your configs folder containing .vim.

Mirror mode also works with a directory structure that's set up as an ftp:// URL.

General rule of thumb:

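Depending on the size of the site you're mirroring, you'll be sending many requests to the server. To avoid being blacklisted or cut off, use the wait options to rate-limit your downloads. Note that --random-wait only varies the delay set with --wait, so pass both:

wget -mpEk --no-parent -e robots=off --wait=1 --random-wait 'http://your-site.com/configs/.vim/'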
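
If you want the delay and a bandwidth cap to be adjustable, something along these lines could work. It is a sketch under assumptions: polite_mirror, DRY_RUN and the defaults (1 second, 200k) are made up for illustration, while --wait, --random-wait and --limit-rate are regular wget options.

```shell
# Hypothetical helper: mirror a directory with a pause between requests
# and a download-rate cap.
polite_mirror() {
  # $1: URL; $2: base wait in seconds (default 1); $3: rate cap (default 200k)
  set -- wget -m -e robots=off --no-parent \
    --wait="${2:-1}" --random-wait --limit-rate="${3:-200k}" "$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"    # show the command only
  else
    "$@"         # run wget
  fi
}

# Usage (commented out so that sourcing this file has no side effects):
# DRY_RUN=1 polite_mirror http://your-site.com/configs/.vim/ 2
```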

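But if you're simply downloading the ../config/.vim/ files, you shouldn't have to worry about it, as you're ignoring parent directories and fetching only that one directory.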

2021-09-02 05:20:20
