使用Wget进行多个同时下载?

我使用wget下载网站内容，但是wget是一个一个下载文件的。

我怎么能让wget下载使用4个同时连接?

当前回答

如果你在做递归下载，你还不知道所有的url, wget是完美的。

如果您已经有了想要下载的每个URL的列表，那么可以跳过下面的cURL。

使用Wget递归地进行多个同时下载(未知的url列表)

# Multiple simultaneous donwloads

URL=ftp://ftp.example.com

for i in {1..10}; do
    wget --no-clobber --recursive "${URL}" &
done

上面的循环将启动10个wget，每个wget都递归地从同一个网站下载，但是它们不会重叠或下载同一文件两次。

使用——no-clobber可以防止10个wget进程中的每个进程两次下载同一个文件(包括完整的相对URL路径)。

& fork每个wget到后台，允许你运行多个同时下载从同一个网站使用wget。

从url列表中使用curl

如果你已经有了一个想要下载的url列表，curl -Z是并行的curl，默认一次运行50个下载。

然而，对于curl，列表必须是这样的格式:

url = https://example.com/1.html
-O
url = https://example.com/2.html
-O

因此，如果您已经有一个要下载的url列表，只需格式化该列表，然后运行cURL

cat url_list.txt
#https://example.com/1.html
#https://example.com/2.html

touch url_list_formatted.txt

while read -r URL; do
    echo "url = ${URL}" >> url_list_formatted.txt
    echo "-O" >> url_list_formatted.txt
done < url_list.txt

使用curl从url列表中并行下载:

curl -Z --parallel-max 100 -K url_list_formatted.txt

例如,

$ curl -Z --parallel-max 100 -K url_list_formatted.txt
DL% UL%  Dled  Uled  Xfers  Live   Qd Total     Current  Left    Speed
100 --   2512     0     2     0     0  0:00:01  0:00:01 --:--:--  1973

$ ls
1.html  2.html  url_list_formatted.txt  url_list.txt

2022-10-09 18:28:25

其他回答

尝试pcurl

http://sourceforge.net/projects/pcurl/

使用curl代替wget，并行下载10段。

2012-07-16 07:44:24

使用xargs使wget在多个文件中并行工作

#!/bin/bash

mywget()
{
    wget "$1"
}

export -f mywget

# run wget in parallel using 8 thread/connection
xargs -P 8 -n 1 -I {} bash -c "mywget '{}'" < list_urls.txt

Aria2选项，正确的工作方式与文件小于20mb

aria2c -k 2M -x 10 -s 10 [url]

-k 2M将文件分割成2mb的块

-k或——min-split-size的默认值是20mb，如果你不设置这个选项并且文件小于20mb，无论-x或-s的值是多少，它都只会在单个连接中运行

2018-07-28 02:57:13

考虑使用正则表达式或FTP Globbing。通过这种方法，您可以使用不同的文件名起始字符组多次启动wget，这取决于它们出现的频率。

这是我如何在两个NAS之间同步文件夹的例子:

wget --recursive --level 0 --no-host-directories --cut-dirs=2 --no-verbose --timestamping --backups=0 --bind-address=10.0.0.10 --user=<ftp_user> --password=<ftp_password> "ftp://10.0.0.100/foo/bar/[0-9a-hA-H]*" --directory-prefix=/volume1/foo &
wget --recursive --level 0 --no-host-directories --cut-dirs=2 --no-verbose --timestamping --backups=0 --bind-address=10.0.0.11 --user=<ftp_user> --password=<ftp_password> "ftp://10.0.0.100/foo/bar/[!0-9a-hA-H]*" --directory-prefix=/volume1/foo &

第一个wget同步所有以0,1,2…开头的文件/文件夹。F, G, H和第二个线程同步所有其他内容。

这是在带有一个10G以太网端口(10.0.0.100)的NAS和带有两个1G以太网端口(10.0.0.10和10.0.0.11)的NAS之间进行同步的最简单方法。我通过——bind-address将两个wget线程绑定到不同的以太网端口，并通过在每行末尾放置&将它们称为并行。通过这种方式，我能够复制2x 100mb /s = 200 MB/s的大文件。

2019-11-27 19:27:48

正如其他海报所提到的，我建议你看一看aria2。Ubuntu 1.16.1版本的手册页:

aria2 is a utility for downloading files. The supported protocols are HTTP(S), FTP, BitTorrent, and Metalink. aria2 can download a file from multiple sources/protocols and tries to utilize your maximum download bandwidth. It supports downloading a file from HTTP(S)/FTP and BitTorrent at the same time, while the data downloaded from HTTP(S)/FTP is uploaded to the BitTorrent swarm. Using Metalink's chunk checksums, aria2 automatically validates chunks of data while downloading a file like BitTorrent.

您可以使用-x标志来指定每个服务器的最大连接数(默认为1):

aria2c -x 16 [url]

如果同一文件可从多个位置下载，则可以选择从所有位置下载。使用-j标志指定每个静态URI的最大并行下载数量(默认为5)。

aria2c -j 5 [url] [url2]

更多信息请访问http://aria2.sourceforge.net/。对于使用信息，手册页是真正的描述性的，并在底部有一个小节提供了使用示例。在线版本可以在http://aria2.sourceforge.net/manual/en/html/README.html上找到。

2013-08-31 17:57:08