I'm using wget to download content from a website, but wget downloads the files one by one.

How can I make wget download with 4 simultaneous connections?


Current answer

Since GNU Parallel hasn't been mentioned yet, let me offer another way:

cat url.list | parallel -j 8 wget -O {#}.html {}
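
For reference, {} expands to the input line (the URL) and {#} to GNU Parallel's job sequence number, so with a url.list like the hypothetical one below, the pages are saved as 1.html, 2.html, and so on:

https://www.example.com/a
https://www.example.com/b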

Other answers

Wget cannot download over multiple connections; instead, you can try another program such as aria2.
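
As a hedged illustration (the URL is a placeholder), aria2 can fetch a single file over several connections; -x sets the maximum number of connections per server and -s the number of segments to split the download into:

aria2c -x 4 -s 4 https://www.example.com/file.iso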

I use GNU Parallel:

cat listoflinks.txt | parallel --bar -j ${MAX_PARALLEL:-$(nproc)} wget -nv {}

- cat pipes the newline-delimited list of URLs to parallel
- the --bar flag shows a progress bar for the parallel jobs
- the MAX_PARALLEL environment variable caps the number of parallel downloads; use it with care, as it defaults here to the number of CPUs ($(nproc))

Tip: use --dry-run to see what would happen if the commands were executed:

cat listoflinks.txt | parallel --dry-run --bar -j ${MAX_PARALLEL} wget -nv {}

Use:

aria2c -x 10 -i websites.txt >/dev/null 2>/dev/null &

Put one URL per line in websites.txt; here -x 10 allows up to 10 connections per server and -i reads the URL list from the file. For example:

https://www.example.com/1.mp4
https://www.example.com/2.mp4
https://www.example.com/3.mp4
https://www.example.com/4.mp4
https://www.example.com/5.mp4
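
If you also want several of the listed files downloading at once, not just several connections per file, aria2's -j option sets the maximum number of concurrent downloads; a hedged variant of the command above:

aria2c -j 4 -x 10 -i websites.txt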

I found a (possible) solution

In the process of downloading a few thousand log files from one server to the next I suddenly had the need to do some serious multithreaded downloading in BSD, preferably with Wget as that was the simplest way I could think of handling this. A little looking around led me to this little nugget:

wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url]

Just repeat the wget -r -np -N [url] for as many threads as you need... Now given this isn't pretty and there are surely better ways to do this but if you want something quick and dirty it should do the trick...

Note: the -N option makes wget download only "newer" files, meaning it won't overwrite or re-download a file unless its timestamp on the server has changed.
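
The same trick can be written as a loop so the number of processes is easy to change. This is just a minimal sketch of the idea above, with a hypothetical URL standing in for the site to mirror:

url=https://www.example.com/   # hypothetical URL to mirror
# start 4 identical recursive wget processes in the background;
# -N stops each one from re-fetching files another has already saved
for i in 1 2 3 4; do
  wget -r -np -N "$url" &
done
wait   # block until all four finish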

Try pcurl

http://sourceforge.net/projects/pcurl/

It uses curl instead of wget and downloads in 10 parallel segments.
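
The idea behind pcurl can be sketched in plain shell; this is a rough illustration, not pcurl's actual implementation, and it assumes the server reports Content-Length and supports HTTP range requests (the URL is a placeholder):

url=https://www.example.com/big.file   # hypothetical URL
# read the file size from the Content-Length header
size=$(curl -sI "$url" | tr -d '\r' | awk 'tolower($1)=="content-length:"{print $2}')
parts=10
chunk=$(( (size + parts - 1) / parts ))
for i in $(seq 0 $((parts - 1))); do
  start=$(( i * chunk ))
  end=$(( start + chunk - 1 ))
  [ "$end" -ge "$size" ] && end=$(( size - 1 ))
  # fetch one byte range in the background
  curl -s -r "$start-$end" -o "part.$i" "$url" &
done
wait
cat part.* > big.file && rm part.*   # reassemble the segments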