我如何能看到什么是在S3桶与boto3?(例如,写一个“ls”)?

做以下事情:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some/path/')

返回:

s3.Bucket(name='some/path/')

我如何看到它的内容?


当前回答

#To print all filenames in a bucket
import boto3

s3 = boto3.client('s3')

def get_s3_keys(bucket):

    """Get a list of keys in an S3 bucket."""
    resp = s3.list_objects_v2(Bucket=bucket)
    for obj in resp['Contents']:
      files = obj['Key']
    return files

  
filename = get_s3_keys('your_bucket_name')

print(filename)

#To print all filenames in a certain directory in a bucket
import boto3

s3 = boto3.client('s3')

def get_s3_keys(bucket, prefix):

    """Get a list of keys in an S3 bucket."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in resp['Contents']:
      files = obj['Key']
      print(files)
    return files

  
filename = get_s3_keys('your_bucket_name', 'folder_name/sub_folder_name/')

print(filename)

更新: 最简单的方法是使用awswrangler

import awswrangler as wr
wr.s3.list_objects('s3://bucket_name')

其他回答

一种更节俭的方法,而不是通过一个for循环来迭代,你也可以只打印原始对象,其中包含S3桶中的所有文件:

session = Session(aws_access_key_id=aws_access_key_id,aws_secret_access_key=aws_secret_access_key)
s3 = session.resource('s3')
bucket = s3.Bucket('bucket_name')

files_in_s3 = bucket.objects.all() 
#you can print this iterable with print(list(files_in_s3))

所以你在boto3中要求等同于aws s3 ls。这将列出所有顶级文件夹和文件。这是我能得到的最接近的结果;它只列出所有顶级文件夹。这么简单的操作居然这么难。

import boto3

def s3_ls():
  s3 = boto3.resource('s3')
  bucket = s3.Bucket('example-bucket')
  result = bucket.meta.client.list_objects(Bucket=bucket.name,
                                           Delimiter='/')
  for o in result.get('CommonPrefixes'):
    print(o.get('Prefix'))

ObjectSummary:

有两个标识符附加到ObjectSummary:

bucket_name 关键

boto3 S3: ObjectSummary

有关AWS S3文档中的对象键的更多信息:

Object Keys: When you create an object, you specify the key name, which uniquely identifies the object in the bucket. For example, in the Amazon S3 console (see AWS Management Console), when you highlight a bucket, a list of objects in your bucket appears. These names are the object keys. The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long. The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does. The Amazon S3 console supports a concept of folders. Suppose that your bucket (admin-created) has four objects with the following object keys: Development/Projects1.xls Finance/statement1.pdf Private/taxdocument.pdf s3-dg.pdf Reference: AWS S3: Object Keys

下面是一些示例代码,演示如何获取桶名和对象键。

例子:

import boto3
from pprint import pprint

def main():

    def enumerate_s3():
        s3 = boto3.resource('s3')
        for bucket in s3.buckets.all():
             print("Name: {}".format(bucket.name))
             print("Creation Date: {}".format(bucket.creation_date))
             for object in bucket.objects.all():
                 print("Object: {}".format(object))
                 print("Object bucket_name: {}".format(object.bucket_name))
                 print("Object key: {}".format(object.key))

    enumerate_s3()


if __name__ == '__main__':
    main()
#To print all filenames in a bucket
import boto3

s3 = boto3.client('s3')

def get_s3_keys(bucket):

    """Get a list of keys in an S3 bucket."""
    resp = s3.list_objects_v2(Bucket=bucket)
    for obj in resp['Contents']:
      files = obj['Key']
    return files

  
filename = get_s3_keys('your_bucket_name')

print(filename)

#To print all filenames in a certain directory in a bucket
import boto3

s3 = boto3.client('s3')

def get_s3_keys(bucket, prefix):

    """Get a list of keys in an S3 bucket."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in resp['Contents']:
      files = obj['Key']
      print(files)
    return files

  
filename = get_s3_keys('your_bucket_name', 'folder_name/sub_folder_name/')

print(filename)

更新: 最简单的方法是使用awswrangler

import awswrangler as wr
wr.s3.list_objects('s3://bucket_name')

在上面的注释中对@Hephaeastus的代码进行了少许修改,编写了下面的方法来列出给定路径中的文件夹和对象(文件)。类似s3 ls命令。

from boto3 import session

def s3_ls(profile=None, bucket_name=None, folder_path=None):
    folders=[]
    files=[]
    result=dict()
    bucket_name = bucket_name
    prefix= folder_path
    session = boto3.Session(profile_name=profile)
    s3_conn   = session.client('s3')
    s3_result =  s3_conn.list_objects_v2(Bucket=bucket_name, Delimiter = "/", Prefix=prefix)
    if 'Contents' not in s3_result and 'CommonPrefixes' not in s3_result:
        return []

    if s3_result.get('CommonPrefixes'):
        for folder in s3_result['CommonPrefixes']:
            folders.append(folder.get('Prefix'))

    if s3_result.get('Contents'):
        for key in s3_result['Contents']:
            files.append(key['Key'])

    while s3_result['IsTruncated']:
        continuation_key = s3_result['NextContinuationToken']
        s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Delimiter="/", ContinuationToken=continuation_key, Prefix=prefix)
        if s3_result.get('CommonPrefixes'):
            for folder in s3_result['CommonPrefixes']:
                folders.append(folder.get('Prefix'))
        if s3_result.get('Contents'):
            for key in s3_result['Contents']:
                files.append(key['Key'])

    if folders:
        result['folders']=sorted(folders)
    if files:
        result['files']=sorted(files)
    return result

这将列出给定路径下的所有对象/文件夹。Folder_path可以默认为None, method将列出桶根目录的即时内容。