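I agree with Blairg23 that using urllib.request.urlretrieve is one of the easiest solutions.
One thing I want to point out: sometimes it won't download anything because the request was sent by a script (bot). If you want to parse images from Google Images or other search engines, you need to pass a user-agent request header first and then download the image; otherwise the request will be blocked and an error will be thrown.
Pass the user-agent and download the image: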
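import urllib.request

# Install a global opener that sends a browser-like User-Agent header
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
urllib.request.install_opener(opener)
# URL is the address of the image you want to download
urllib.request.urlretrieve(URL, 'image_name.jpg')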
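Note that install_opener changes the default opener for every later urllib.request call in the process. If you would rather attach the header to a single request, something along these lines should work. This is only a sketch: build_request, download_image and DEFAULT_USER_AGENT are names I made up for illustration, and the user-agent string is just an example value.

```python
import shutil
import urllib.request

# Assumption: any browser-like User-Agent string can go here; this one is only an example.
DEFAULT_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries its own User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_image(url, path, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Stream the response body for `url` into the file at `path` (hypothetical helper)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response, open(path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return path
```

Usage would be download_image(URL, 'image_name.jpg'). If the server still rejects the request, urlopen raises urllib.error.HTTPError, so you may want to wrap the call in try/except when downloading many images.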
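Code in the online IDE that scrapes and downloads images from Google Images using requests, bs4, and urllib.request.
Alternatively, if your goal is to scrape images from Google, Bing, Yahoo!, DuckDuckGo (and other search engines), you can use SerpApi. It's a paid API with a free plan.
The biggest difference is that there's no need to figure out how to bypass blocks from search engines or how to extract certain parts from the HTML or JavaScript, since that's already done for the end user.
Example code to integrate: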
import json
import os
import urllib.request

from serpapi import GoogleSearch

params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google",
    "q": "pexels cat",
    "tbm": "isch"
}

search = GoogleSearch(params)
results = search.get_dict()

print(json.dumps(results['images_results'], indent=2, ensure_ascii=False))

# make sure the output folder exists before saving into it
os.makedirs('SerpApi_Images', exist_ok=True)

# download images
for index, image in enumerate(results['images_results']):
    # print(f'Downloading {index} image...')
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
    urllib.request.install_opener(opener)

    # saves the original-resolution image to the SerpApi_Images folder and adds the index to the end of the file name
    urllib.request.urlretrieve(image['original'], f'SerpApi_Images/original_size_img_{index}.jpg')
-----------
'''
[
  # other images
  {
    "position": 100, # 100th image
    "thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQK62dIkDjNCvEgmGU6GGFZcpVWwX-p3FsYSg&usqp=CAU",
    "source": "homewardboundnj.org",
    "title": "pexels-helena-lopes-1931367 - Homeward Bound Pet Adoption Center",
    "link": "https://homewardboundnj.org/upcoming-event/black-cat-appreciation-day/pexels-helena-lopes-1931367/",
    "original": "https://homewardboundnj.org/wp-content/uploads/2020/07/pexels-helena-lopes-1931367.jpg",
    "is_product": false
  }
]
'''
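One caveat with the loop above: it saves every file as .jpg, while the "original" links can point to other formats. A possible way to keep the extension from the URL is sketched below. filename_for and KNOWN_EXTENSIONS are hypothetical names of mine, not part of SerpApi, and the list of accepted extensions is an assumption you may want to adjust.

```python
import os
from urllib.parse import urlparse

# Assumption: extensions we are willing to keep; anything else falls back to .jpg.
KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def filename_for(index, url, folder="SerpApi_Images", default_ext=".jpg"):
    """Build the output path for an image, reusing the extension found in its URL path."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    if ext not in KNOWN_EXTENSIONS:
        ext = default_ext
    return os.path.join(folder, f"original_size_img_{index}{ext}")
```

In the download loop you would then pass filename_for(index, image['original']) as the second argument to urlretrieve instead of the hard-coded f-string.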
Disclaimer: I work for SerpApi.