Google Colab: how do I read data from my Google Drive?

The problem is simple: I have some data on gDrive, for example at /projects/my_project/my_data*.

I also have a simple notebook in gColab.

So, what I would like to do is:

for file in glob.glob("/projects/my_project/my_data*"):
    do_something(file)

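Unfortunately, all the examples (e.g. https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb) only suggest loading all the necessary data into the notebook.

But if I have a lot of data, that gets quite complicated. Is there any way to solve this?

Thanks for your help!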

Current answer

@wenkesj

For me, I found a solution that looks like this:

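import os

def copy_directory(source_id, local_target):
  # Assumes `drive` is an authenticated PyDrive GoogleDrive instance.
  os.makedirs(local_target, exist_ok=True)
  # List everything whose parent is the given Drive folder ID.
  file_list = drive.ListFile(
    {'q': "'{source_id}' in parents".format(source_id=source_id)}).GetList()
  for f in file_list:
    # Log each entry's title, ID and MIME type.
    print(", ".join([f[key] for key in ['title', 'id', 'mimeType']]))
    if f["title"].startswith("."):
      continue  # skip hidden files
    fname = os.path.join(local_target, f['title'])
    if f['mimeType'] == 'application/vnd.google-apps.folder':
      copy_directory(f['id'], fname)  # recurse into subfolders
    else:
      f_ = drive.CreateFile({'id': f['id']})
      f_.GetContentFile(fname)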

However, it looks like gDrive doesn't like copying too many files.
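If the failures on large copies are transient (for example rate limiting, which is only a guess, since the answer does not say what actually goes wrong), wrapping each download in a retry with exponential backoff might help. This is a minimal sketch; `with_retries` and its parameters are hypothetical names, not part of PyDrive.

```python
import time

def with_retries(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay * 2**n seconds, then retry.

    Hypothetical helper: re-raises the last error once `attempts` is used up.
    """
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the original error
            sleep(base_delay * 2 ** attempt)
```

Inside copy_directory the download line would then become something like `with_retries(lambda: f_.GetContentFile(fname))`.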

2018-01-25 07:20:19

Other answers

Most of the previous answers are a bit (very) complicated.

from google.colab import drive
drive.mount("/content/drive", force_remount=True)

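I found this to be the simplest and fastest way to mount Google Drive in Colab. You can change the mount location just by changing the argument to drive.mount. It gives you a link where you accept the permissions with your account; you then copy and paste the generated key, and the drive is mounted at the selected path.

force_remount is only needed when the drive has to be mounted regardless of whether it was already mounted. If you don't want to force a remount, you can omit this parameter.

Edit: see this notebook for more ways of doing IO operations in Colab: https://colab.research.google.com/notebooks/io.ipynb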
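Once the drive is mounted, the glob loop from the question should work on the mounted path like on any local folder. A minimal sketch, assuming the question's /projects/my_project folder sits at the top level of the drive and that the drive appears under "My Drive" below the mount point (the folder name may differ on your runtime); `find_data_files` is a hypothetical helper.

```python
import glob
import os

def find_data_files(root, pattern):
    """Return the sorted paths directly under `root` that match `pattern`."""
    return sorted(glob.glob(os.path.join(root, pattern)))

try:
    from google.colab import drive  # only importable inside Colab
except ImportError:
    drive = None  # outside Colab: skip the mount step

if drive is not None:
    drive.mount("/content/drive")
    # Assumed location of the question's data inside the mounted drive.
    project_dir = "/content/drive/My Drive/projects/my_project"
    for path in find_data_files(project_dir, "my_data*"):
        print(path)  # replace with do_something(path)
```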

2019-05-08 06:19:48

What I did was:

from google.colab import drive
drive.mount('/content/drive/')

Then

%cd /content/drive/My Drive/Colab Notebooks/

After that I was able to read the csv file:

df = pd.read_csv("data_example.csv")

If the file is in a different location, just add the correct path after "My Drive".
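If you would rather not change the working directory, the same file can be addressed by its full path. A small sketch, assuming the drive is mounted at /content/drive as above; `DRIVE_ROOT` and `drive_path` are hypothetical names introduced for illustration.

```python
import os

# Assumed mount location, taken from the drive.mount call in this answer.
DRIVE_ROOT = "/content/drive/My Drive"

def drive_path(*parts, root=DRIVE_ROOT):
    """Join folder and file names onto the mounted drive root."""
    return os.path.join(root, *parts)

csv_path = drive_path("Colab Notebooks", "data_example.csv")
if os.path.exists(csv_path):  # only true once the drive is mounted
    import pandas as pd
    df = pd.read_csv(csv_path)
```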

2020-08-19 09:46:48

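There are many ways to read files in your Colab notebook (*.ipynb); some of them are:

1. Mounting your Google Drive in the runtime's virtual machine.
2. Using google.colab.files.upload(). The easiest solution.
3. Using the native REST API.
4. Using an API wrapper such as PyDrive.

Methods 1 and 2 worked for me; I can't speak for the others. If someone can get them working, as others attempted in the posts above, please write an elegant answer. Thanks in advance!

First method:

I was not able to mount my Google Drive, so I installed these libraries: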

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass

!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Once the installation and authorization process is finished, first mount your drive.

!mkdir -p drive
!google-drive-ocamlfuse drive

After the installation I was able to mount Google Drive; everything in your Google Drive starts at /content/drive

!ls /content/drive/ML/../../../../path_to_your_folder/

Now you can simply read a file from the path_to_your_folder folder into pandas using the path above.

import pandas as pd
df = pd.read_json('drive/ML/../../../../path_to_your_folder/file.json')
df.head(5)

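It is assumed that you use the absolute path you received rather than /../..

Second method:

This is convenient if the file you want to read is in the current working directory.

If you need to upload any files from your local file system, you can use the code below; otherwise just skip it.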
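To see what a path full of /../ segments actually points to, it can be collapsed lexically before use. A minimal sketch using the placeholder path from this answer; `collapse` is a hypothetical helper, and it does not follow symlinks or check that the path exists.

```python
import posixpath

def collapse(path):
    """Collapse '..' and '.' segments purely lexically (no filesystem access)."""
    return posixpath.normpath(path)

relative_style = "/content/drive/ML/../../../../path_to_your_folder/"
print(collapse(relative_style))  # prints /path_to_your_folder
```

The four `..` segments climb above /content/drive all the way to the filesystem root, so the collapsed path makes it clear where the file really is.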

from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Suppose you have the following folder hierarchy in your Google Drive:

/content/drive/ML/../../../../path_to_your_folder/

Then you simply need the code below to load it into pandas.

import pandas as pd
import io
df = pd.read_json(io.StringIO(uploaded['file.json'].decode('utf-8')))
df

2018-12-09 21:28:38

I wrote a class that downloads all of the data to the '.' location on the Colab server.

The whole thing can be pulled from here: https://github.com/brianmanderson/Copy-Shared-Google-to-Colab

!pip install PyDrive


from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
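import os

# Authenticate and build the PyDrive client that the class below uses as `drive`.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)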

class download_data_from_folder(object):
    def __init__(self,path):
        path_id = path[path.find('id=')+3:]
        self.file_list = self.get_files_in_location(path_id)
        self.unwrap_data(self.file_list)
    def get_files_in_location(self,folder_id):
        file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
        return file_list
    def unwrap_data(self,file_list,directory='.'):
        for i, file in enumerate(file_list):
            print(str((i + 1) / len(file_list) * 100) + '% done copying')
            if file['mimeType'].find('folder') != -1:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    os.makedirs(os.path.join(directory, file['title']))
                print('Copying folder ' + os.path.join(directory, file['title']))
                self.unwrap_data(self.get_files_in_location(file['id']), os.path.join(directory, file['title']))
            else:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    downloaded = drive.CreateFile({'id': file['id']})
                    downloaded.GetContentFile(os.path.join(directory, file['title']))
        return None
data_path = 'shared_path_location'
download_data_from_folder(data_path)

2019-05-17 16:51:47
