How do I download a file over HTTP?

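I have a small utility that I use to download an MP3 file from a website and then build/update a podcast XML file, which I have added to iTunes.

The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3 file. I would prefer to have the entire utility written in Python.

I struggled to find a way to actually download the file in Python, which is why I resorted to wget.

So, how do I download the file using Python?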

Current answer

Use the wget module:

import wget
wget.download('url')

2015-03-25 12:59:25

Other answers

I wrote the wget library in pure Python just for this purpose. As of version 2.0, it is urlretrieve pumped up with extra features.

2013-09-25 17:55:16

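If speed matters to you, I ran a small performance test on the urllib and wget modules. For wget I tried it once with the status bar and once without. I used three different 500MB files for the test (different files, to rule out any caching going on under the hood). Tested on a Debian machine with python2.

First, these are the results (they are similar across runs):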

$ python wget_test.py 
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49
==============

I ran the test using a "profile" decorator. This is the full code:

import wget
import urllib
import time
from functools import wraps

def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner

url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'

def do_nothing(*args):
    pass

@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print '=============='
time.sleep(1)

wget_no_bar_test(url2)
print '=============='
time.sleep(1)

wget_with_bar_test(url3)
print '=============='
time.sleep(1)

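urllib seems to be the fastest. Note that wget with the status bar took roughly seven times longer than wget without it (50.49 s versus 7.20 s).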
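For readers on Python 3, the timing decorator from the test could be sketched roughly as below. This is an adaptation, not the code that produced the numbers above: it uses print() as a function and swaps in time.perf_counter for time.time, and sleep_briefly is a hypothetical stand-in for the download functions.

```python
import time
from functools import wraps


def profile(func):
    """Print how long each call to `func` takes, then return its result."""
    @wraps(func)
    def inner(*args, **kwargs):
        print(func.__name__, ": starting")
        start = time.perf_counter()
        ret = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(func.__name__, ": {:.2f}".format(elapsed))
        return ret
    return inner


@profile
def sleep_briefly(seconds):
    # Hypothetical stand-in workload; the original test called
    # urllib.urlretrieve and wget.download at this point.
    time.sleep(seconds)
    return seconds
```

Calling sleep_briefly(0.5) prints a "starting" line followed by the elapsed time, in the same layout as the results shown earlier.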

2017-11-03 14:25:38

Use urllib.request.urlopen():

import urllib.request
with urllib.request.urlopen('http://www.example.com/') as f:
    html = f.read().decode('utf-8')

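This is the most basic way to use the library, without any error handling. You can also do more complex things, such as changing request headers.

In Python 2, the method is in urllib2: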

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()
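The points mentioned above (changing headers and handling errors) might look something like this in Python 3. It is only a sketch: the function name download, the User-Agent string, and the 30-second timeout are illustrative assumptions, not anything prescribed by the library.

```python
import shutil
import urllib.error
import urllib.request


def download(url, dest, user_agent="example-downloader/0.1"):
    """Save the resource at `url` to the local path `dest`.

    Returns True on success and False if the request failed.
    """
    # Attach a custom header; the User-Agent value is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response, \
                open(dest, "wb") as out_file:
            # Stream to disk instead of reading the whole body into memory.
            shutil.copyfileobj(response, out_file)
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so 4xx/5xx land here too.
        print("download failed:", exc)
        return False
    return True

# Example call (placeholder URL and file name):
# download("http://www.example.com/", "example.html")
```

Streaming with shutil.copyfileobj keeps memory use flat for large files such as MP3s, which f.read() on the whole response would not.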

2008-08-22 15:38:22

If you have wget installed, you can use parallel_sync.

pip install parallel_sync

from parallel_sync import wget
urls = ['http://something.png', 'http://somthing.tar.gz', 'http://somthing.zip']
wget.download('/tmp', urls)
# or a single file:
wget.download('/tmp', urls[0], filenames='x.zip', extract=True)

Doc: https://pythonhosted.org/parallel_sync/pages/examples.html

This is pretty powerful. It can download files in parallel, retry on failure, and even download files on a remote machine.

2015-11-19 23:48:06

This may be a little late, but I saw pabloG's code and couldn't help adding an os.system('cls') to make it look awesome! Check it out:

    import urllib2,os

    url = "http://download.thinkbroadband.com/10MB.zip"

    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)
    os.system('cls')
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break

        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8)*(len(status)+1)
        print status,

    f.close()

If you are running in an environment other than Windows, you will have to use something other than 'cls'. On Mac OS X and Linux it should be 'clear'.
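One way to sidestep the 'cls' versus 'clear' difference is to not clear the screen at all and instead overwrite a single status line with a carriage return. The following is a Python 3 sketch of that idea, not a port of the code above; download_with_progress is a made-up name, and it assumes the server may or may not send a Content-Length header.

```python
import urllib.request


def download_with_progress(url, file_name, block_size=8192):
    """Download `url` to `file_name` in chunks, printing progress in place.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as out:
        # Content-Length may be absent (for example with chunked responses).
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        downloaded = 0
        while True:
            chunk = response.read(block_size)
            if not chunk:
                break
            out.write(chunk)
            downloaded += len(chunk)
            if total:
                status = "%10d  [%6.2f%%]" % (downloaded, downloaded * 100.0 / total)
            else:
                status = "%10d bytes" % downloaded
            # "\r" moves back to the start of the line so the next
            # status overwrites this one on any platform.
            print("\r" + status, end="", flush=True)
    print()  # finish the status line
    return downloaded

# Example call, using the URL from the code above:
# download_with_progress("http://download.thinkbroadband.com/10MB.zip", "10MB.zip")
```

Because the with statement closes both the response and the file, nothing is left open if the download fails partway through.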

2013-10-14 02:54:01
