我试图在脚本中从谷歌驱动器下载一个文件,我这样做有点麻烦。我要下载的文件在这里。
我在网上搜了很多,终于下载了其中一个。我得到了文件的uid,较小的文件(1.6MB)下载正常,但较大的文件(3.7GB)总是重定向到一个页面,询问我是否想在不进行病毒扫描的情况下继续下载。谁能帮我跳过那个屏幕?
下面是我如何让第一个文件工作-
curl -L "https://docs.google.com/uc?export=download&id=0Bz-w5tutuZIYeDU0VDRFWG9IVUE" > phlat-1.0.tar.gz
当我对另一个文件进行同样操作时,
curl -L "https://docs.google.com/uc?export=download&id=0Bz-w5tutuZIYY3h5YlMzTjhnbGM" > index4phlat.tar.gz
我得到以下输出-
我注意到在链接的第三行到最后一行,有一个&confirm=JwkK,这是一个随机的4个字符的字符串,但建议有一种方法添加到我的URL确认。我访问的一个链接建议&confirm=no_antivirus,但这不起作用。
我希望这里有人能帮忙!
——更新
要下载该文件,请从这里获取python的youtube-dl:
YouTube-DL: https://rg3.github.io/youtube-dl/download.html
或者用pip安装:
sudo python2.7 -m pip install --upgrade youtube_dl
# or
# sudo python3.6 -m pip install --upgrade youtube_dl
更新:
我刚刚发现了这个:
右击要从drive.google.com下载的文件
点击获取共享链接
开启链路共享
点击共享设置
点击顶部下拉菜单的选项
点击更多
选择[x]打开-任何有链接的人
复制链接
https://drive.google.com/file/d/3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR/view?usp=sharing
(This is not a real file address)
将id复制到https://drive.google.com/file/d/:之后
3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
粘贴到命令行:
youtube-dl https://drive.google.com/open?id=
把id贴在后面?id =
youtube-dl https://drive.google.com/open?id=3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
[GoogleDrive] 3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR: Downloading webpage
[GoogleDrive] 3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR: Requesting source file
[download] Destination: your_requested_filename_here-3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
[download] 240.37MiB at 2321.53MiB/s (00:01)
希望能有所帮助
下面是我写的一个小bash脚本,它今天完成了这项工作。它适用于大文件,也可以恢复部分获取的文件。它有两个参数,第一个是file_id,第二个是输出文件的名称。与之前的答案相比,主要的改进是它可以在大文件上工作,只需要常用的工具:bash, curl, tr, grep, du, cut和mv。
#!/usr/bin/env bash
fileid="$1"
destination="$2"
# try to download the file
curl -c /tmp/cookie -L -o /tmp/probe.bin "https://drive.google.com/uc?export=download&id=${fileid}"
probeSize=`du -b /tmp/probe.bin | cut -f1`
# did we get a virus message?
# this will be the first line we get when trying to retrive a large file
bigFileSig='<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/>'
sigSize=${#bigFileSig}
if (( probeSize <= sigSize )); then
virusMessage=false
else
firstBytes=$(head -c $sigSize /tmp/probe.bin)
if [ "$firstBytes" = "$bigFileSig" ]; then
virusMessage=true
else
virusMessage=false
fi
fi
if [ "$virusMessage" = true ] ; then
confirm=$(tr ';' '\n' </tmp/probe.bin | grep confirm)
confirm=${confirm:8:4}
curl -C - -b /tmp/cookie -L -o "$destination" "https://drive.google.com/uc?export=download&id=${fileid}&confirm=${confirm}"
else
mv /tmp/probe.bin "$destination"
fi
我无法让Nanoix的perl脚本工作,或者我看到的其他curl示例,所以我开始自己用python研究api。这适用于小文件,但大文件阻塞了可用的ram,所以我找到了一些其他不错的分块代码,使用api的部分下载功能。要点:
https://gist.github.com/csik/c4c90987224150e4a0b2
注意从API接口下载client_secret json文件到本地目录的部分。
源
$ cat gdrive_dl.py
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
"""API calls to download a very large google drive file. The drive API only allows downloading to ram
(unlike, say, the Requests library's streaming option) so the files has to be partially downloaded
and chunked. Authentication requires a google api key, and a local download of client_secrets.json
Thanks to Radek for the key functions: http://stackoverflow.com/questions/27617258/memoryerror-how-to-download-large-file-via-google-drive-sdk-using-python
"""
def partial(total_byte_len, part_size_limit):
s = []
for p in range(0, total_byte_len, part_size_limit):
last = min(total_byte_len - 1, p + part_size_limit - 1)
s.append([p, last])
return s
def GD_download_file(service, file_id):
drive_file = service.files().get(fileId=file_id).execute()
download_url = drive_file.get('downloadUrl')
total_size = int(drive_file.get('fileSize'))
s = partial(total_size, 100000000) # I'm downloading BIG files, so 100M chunk size is fine for me
title = drive_file.get('title')
originalFilename = drive_file.get('originalFilename')
filename = './' + originalFilename
if download_url:
with open(filename, 'wb') as file:
print "Bytes downloaded: "
for bytes in s:
headers = {"Range" : 'bytes=%s-%s' % (bytes[0], bytes[1])}
resp, content = service._http.request(download_url, headers=headers)
if resp.status == 206 :
file.write(content)
file.flush()
else:
print 'An error occurred: %s' % resp
return None
print str(bytes[1])+"..."
return title, filename
else:
return None
gauth = GoogleAuth()
gauth.CommandLineAuth() #requires cut and paste from a browser
FILE_ID = 'SOMEID' #FileID is the simple file hash, like 0B1NzlxZ5RpdKS0NOS0x0Ym9kR0U
drive = GoogleDrive(gauth)
service = gauth.service
#file = drive.CreateFile({'id':FILE_ID}) # Use this to get file metadata
GD_download_file(service, FILE_ID)