我使用wget下载网站内容,但是wget是一个一个下载文件的。

我怎么能让wget下载使用4个同时连接?


当前回答

我发现(可能) 一个解决方案

In the process of downloading a few thousand log files from one server to the next I suddenly had the need to do some serious multithreaded downloading in BSD, preferably with Wget as that was the simplest way I could think of handling this. A little looking around led me to this little nugget: wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] Just repeat the wget -r -np -N [url] for as many threads as you need... Now given this isn’t pretty and there are surely better ways to do this but if you want something quick and dirty it should do the trick...

注意:选项-N使wget只下载“更新的”文件,这意味着它不会覆盖或重新下载文件,除非它们在服务器上的时间戳发生了变化。

其他回答

正如其他海报所提到的,我建议你看一看aria2。Ubuntu 1.16.1版本的手册页:

aria2 is a utility for downloading files. The supported protocols are HTTP(S), FTP, BitTorrent, and Metalink. aria2 can download a file from multiple sources/protocols and tries to utilize your maximum download bandwidth. It supports downloading a file from HTTP(S)/FTP and BitTorrent at the same time, while the data downloaded from HTTP(S)/FTP is uploaded to the BitTorrent swarm. Using Metalink's chunk checksums, aria2 automatically validates chunks of data while downloading a file like BitTorrent.

您可以使用-x标志来指定每个服务器的最大连接数(默认为1):

aria2c -x 16 [url] 

如果同一文件可从多个位置下载,则可以选择从所有位置下载。使用-j标志指定每个静态URI的最大并行下载数量(默认为5)。

aria2c -j 5 [url] [url2]

更多信息请访问http://aria2.sourceforge.net/。对于使用信息,手册页是真正的描述性的,并在底部有一个小节提供了使用示例。在线版本可以在http://aria2.sourceforge.net/manual/en/html/README.html上找到。

为每个链接调用Wget并将其设置为在后台运行。

我尝试了这段Python代码

with open('links.txt', 'r')as f1:      # Opens links.txt file with read mode
  list_1 = f1.read().splitlines()      # Get every line in links.txt
for i in list_1:                       # Iteration over each link
  !wget "$i" -bq                       # Call wget with background mode

参数:

      b - Run in Background
      q - Quiet mode (No Output)

使用xargs使wget在多个文件中并行工作

#!/bin/bash

mywget()
{
    wget "$1"
}

export -f mywget

# run wget in parallel using 8 thread/connection
xargs -P 8 -n 1 -I {} bash -c "mywget '{}'" < list_urls.txt

Aria2选项,正确的工作方式与文件小于20mb

aria2c -k 2M -x 10 -s 10 [url]

-k 2M将文件分割成2mb的块

-k或——min-split-size的默认值是20mb,如果你不设置这个选项并且文件小于20mb,无论-x或-s的值是多少,它都只会在单个连接中运行

Wget不能在多个连接中下载,相反,您可以尝试使用其他程序,如aria2。

我发现(可能) 一个解决方案

In the process of downloading a few thousand log files from one server to the next I suddenly had the need to do some serious multithreaded downloading in BSD, preferably with Wget as that was the simplest way I could think of handling this. A little looking around led me to this little nugget: wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] Just repeat the wget -r -np -N [url] for as many threads as you need... Now given this isn’t pretty and there are surely better ways to do this but if you want something quick and dirty it should do the trick...

注意:选项-N使wget只下载“更新的”文件,这意味着它不会覆盖或重新下载文件,除非它们在服务器上的时间戳发生了变化。