我使用wget下载网站内容,但是wget是一个一个下载文件的。
我怎么能让wget下载使用4个同时连接?
我使用wget下载网站内容,但是wget是一个一个下载文件的。
我怎么能让wget下载使用4个同时连接?
当前回答
您可以使用xargs
-P是进程数,例如设置-P 4,将同时下载4个链接,如果设置-P 0, xargs将启动尽可能多的进程,并下载所有的链接。
cat links.txt | xargs -P 4 -I{} wget {}
其他回答
我发现(可能) 一个解决方案
In the process of downloading a few thousand log files from one server to the next I suddenly had the need to do some serious multithreaded downloading in BSD, preferably with Wget as that was the simplest way I could think of handling this. A little looking around led me to this little nugget: wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] Just repeat the wget -r -np -N [url] for as many threads as you need... Now given this isn’t pretty and there are surely better ways to do this but if you want something quick and dirty it should do the trick...
注意:选项-N使wget只下载“更新的”文件,这意味着它不会覆盖或重新下载文件,除非它们在服务器上的时间戳发生了变化。
为每个链接调用Wget并将其设置为在后台运行。
我尝试了这段Python代码
with open('links.txt', 'r')as f1: # Opens links.txt file with read mode
list_1 = f1.read().splitlines() # Get every line in links.txt
for i in list_1: # Iteration over each link
!wget "$i" -bq # Call wget with background mode
参数:
b - Run in Background
q - Quiet mode (No Output)
使用咏叹调2:
aria2c -x 16 [url]
# |
# |
# |
# ----> the number of connections
http://aria2.sourceforge.net
我使用gnu并行
cat listoflinks.txt | parallel --bar -j ${MAX_PARALLEL:-$(nproc)} wget -nv {}
cat会将行分隔的url列表管道到parallel ——bar标志将显示并行执行进度条 MAX_PARALLEL env var是并行下载的最大数量,请谨慎使用,这里默认是当前cpu的数量
提示:使用——dry-run来查看如果执行命令会发生什么。 cat listfllinks .txt | parallel——dry-run——bar -j ${MAX_PARALLEL} wget -nv {}
make可以很容易地并行化(例如,make -j 4)。例如,这是一个简单的Makefile,我正在使用wget并行下载文件:
BASE=http://www.somewhere.com/path/to
FILES=$(shell awk '{printf "%s.ext\n", $$1}' filelist.txt)
LOG=download.log
all: $(FILES)
echo $(FILES)
%.ext:
wget -N -a $(LOG) $(BASE)/$@
.PHONY: all
default: all