How can I see what is inside an S3 bucket with boto3 (i.e. do an "ls")?

Doing the following:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some/path/')

returns:

s3.Bucket(name='some/path/')

How do I see its contents?


Current answer

You can also do it like this:

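import boto3
s3 = boto3.client('s3')  # list_objects_v2 is a client method, not a resource method
# It accepts keyword arguments only; s3_bucket_name and s3_prefix are placeholders for your own values
csv_files = s3.list_objects_v2(Bucket=s3_bucket_name, Prefix=s3_prefix)
for obj in csv_files.get('Contents', []):  # 'Contents' is absent when nothing matches
    key = obj['Key']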
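
A single list_objects_v2 call returns at most 1,000 keys, so larger buckets need pagination. The following is a minimal sketch using the client's paginator; the helper name iter_keys and the bucket/prefix values in the usage comment are assumptions for illustration, not part of the original answer. The helper takes the client as an argument so it can be tried with any object that offers get_paginator.

```python
def iter_keys(client, bucket_name, prefix=""):
    """Yield every object key under `prefix`, following pagination.

    `client` is expected to be an S3 client, e.g. boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is missing from a page that holds no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (hypothetical bucket and prefix):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "my-bucket", "some/path/"):
#       print(key)
```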

Other answers

Here is a simple function that returns the file names of all files, or only of files with a specific type such as 'json' or 'jpg'.

def get_file_list_s3(bucket, prefix="", file_extension=None):
    """Return the list of all file paths (prefix + file name), optionally limited to one file type.

    Parameters
    ----------
    bucket: str
        The name of the bucket. For example, if your bucket is "s3://my_bucket" then it should be "my_bucket"
    prefix: str
        The full path to the 'folder' of the files (objects). For example, if your files are in
        s3://my_bucket/recipes/deserts then it should be "recipes/deserts". Default: ""
    file_extension: str
        The type of the files. If you want all, just leave it None. If you only want "json" files then it
        should be "json". Default: None

    Return
    ------
    file_names: list
        The list of file names including the prefix
    """
    import boto3
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(bucket)
    file_objs = my_bucket.objects.filter(Prefix=prefix)
    # Keep every key when no extension is requested; otherwise keep only keys whose extension matches
    file_names = [file_obj.key for file_obj in file_objs
                  if file_extension is None or file_obj.key.split(".")[-1] == file_extension]
    return file_names
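
One caveat with key.split(".")[-1]: for a key that contains no dot, it returns the whole key, so an object literally named "json" would be treated as a json file. A possible alternative, sketched below with the hypothetical helper name has_extension, compares the suffix of the last path component instead. The match is case-sensitive, which may or may not be what you want.

```python
from pathlib import PurePosixPath


def has_extension(key, file_extension=None):
    """Return True if `key` ends in `file_extension`, or if no extension is requested."""
    if file_extension is None:
        return True
    # PurePosixPath treats "/" as the separator, which matches S3 key conventions.
    # A leading dot in the argument is tolerated, so "json" and ".json" behave the same.
    return PurePosixPath(key).suffix == "." + file_extension.lstrip(".")


print(has_extension("recipes/deserts/cake.json", "json"))  # True
print(has_extension("json", "json"))                       # False
```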

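With a small modification to @Hephaeastus's code in the comments above, I wrote the method below to list the folders and objects (files) in a given path. It works similarly to the s3 ls command.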

import boto3

def s3_ls(profile=None, bucket_name=None, folder_path=None):
    folders=[]
    files=[]
    result=dict()
    # Prefix must be a string, so fall back to "" (the bucket root) when folder_path is None
    prefix = folder_path or ""
    session = boto3.Session(profile_name=profile)
    s3_conn   = session.client('s3')
    s3_result =  s3_conn.list_objects_v2(Bucket=bucket_name, Delimiter = "/", Prefix=prefix)
    if 'Contents' not in s3_result and 'CommonPrefixes' not in s3_result:
        return []

    if s3_result.get('CommonPrefixes'):
        for folder in s3_result['CommonPrefixes']:
            folders.append(folder.get('Prefix'))

    if s3_result.get('Contents'):
        for key in s3_result['Contents']:
            files.append(key['Key'])

    while s3_result['IsTruncated']:
        continuation_key = s3_result['NextContinuationToken']
        s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Delimiter="/", ContinuationToken=continuation_key, Prefix=prefix)
        if s3_result.get('CommonPrefixes'):
            for folder in s3_result['CommonPrefixes']:
                folders.append(folder.get('Prefix'))
        if s3_result.get('Contents'):
            for key in s3_result['Contents']:
                files.append(key['Key'])

    if folders:
        result['folders']=sorted(folders)
    if files:
        result['files']=sorted(files)
    return result

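This lists all objects and folders directly under the given path. folder_path can be left at its default of None, in which case the method lists the immediate contents of the bucket root.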

ObjectSummary:

There are two identifiers attached to an ObjectSummary:

bucket_name
key

boto3 S3: ObjectSummary

More on object keys from the AWS S3 documentation:

Object Keys: When you create an object, you specify the key name, which uniquely identifies the object in the bucket. For example, in the Amazon S3 console (see AWS Management Console), when you highlight a bucket, a list of objects in your bucket appears. These names are the object keys. The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.

The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does. The Amazon S3 console supports a concept of folders. Suppose that your bucket (admin-created) has four objects with the following object keys:

Development/Projects1.xls
Finance/statement1.pdf
Private/taxdocument.pdf
s3-dg.pdf

Reference: AWS S3: Object Keys
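
To make the "flat keys, inferred folders" idea concrete, here is a small pure-Python sketch of how a prefix and a delimiter turn a flat list of keys into one level of folders and files. It only models the grouping idea on an in-memory list and does not call S3; the function name list_level is an assumption for illustration.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat keys into (folders, files) directly under `prefix`."""
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows a delimiter, so the key sits inside a "folder".
            folders.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(folders), sorted(files)


keys = [
    "Development/Projects1.xls",
    "Finance/statement1.pdf",
    "Private/taxdocument.pdf",
    "s3-dg.pdf",
]
print(list_level(keys))
# (['Development/', 'Finance/', 'Private/'], ['s3-dg.pdf'])
print(list_level(keys, prefix="Finance/"))
# ([], ['Finance/statement1.pdf'])
```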

Here is some sample code that demonstrates how to get the bucket name and the object key.

Example:

import boto3

def main():

    def enumerate_s3():
        s3 = boto3.resource('s3')
        for bucket in s3.buckets.all():
            print("Name: {}".format(bucket.name))
            print("Creation Date: {}".format(bucket.creation_date))
            # Each item is an ObjectSummary; `obj` avoids shadowing the built-in `object`
            for obj in bucket.objects.all():
                print("Object: {}".format(obj))
                print("Object bucket_name: {}".format(obj.bucket_name))
                print("Object key: {}".format(obj.key))

    enumerate_s3()


if __name__ == '__main__':
    main()
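
Besides bucket_name and key, an ObjectSummary also exposes attributes such as size and last_modified, which is enough for a rough "ls -l" style listing. The sketch below is an assumption-laden illustration: format_listing is a hypothetical helper, and the demo uses stand-in objects rather than real S3 results so it can run without credentials.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def format_listing(summaries):
    """Return one 'timestamp size key' line per object summary."""
    lines = []
    for obj in summaries:
        stamp = obj.last_modified.strftime("%Y-%m-%d %H:%M:%S")
        lines.append(f"{stamp} {obj.size:>10} {obj.key}")
    return lines


# Stand-in for bucket.objects.all(); a real ObjectSummary has the same three attributes.
demo = [
    SimpleNamespace(
        key="s3-dg.pdf",
        size=1024,
        last_modified=datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    )
]
print("\n".join(format_listing(demo)))
```

With real data, the call would be format_listing(bucket.objects.all()) for a bucket obtained as in the example above.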

This is how I used to do it:

import boto3
s3 = boto3.resource('s3')
bucket=s3.Bucket("bucket_name")
contents = [_.key for _ in bucket.objects.all() if "subfolders/ifany/" in _.key]

I just did it like this, including the authentication method:

s3_client = boto3.client(
                's3',
                aws_access_key_id='access_key',
                aws_secret_access_key='access_key_secret',
                config=boto3.session.Config(signature_version='s3v4'),
                region_name='region'
            )

def prefix_exists(bucket_name, key):
    # Wrapped in a function (the name is illustrative) so that `return` is valid
    response = s3_client.list_objects(Bucket=bucket_name, Prefix=key)
    # 'Contents' is present only when at least one key starts with `key`
    return 'Contents' in response
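
Note that a Prefix match also succeeds when the given key is only the beginning of a longer key (for example, "foo" matches an object named "foobar"). If an exact match is needed, one option is to compare the returned key explicitly, as in this sketch. The helper name key_exists is hypothetical, and the MaxKeys=1 shortcut assumes keys come back in lexicographic order, as they do for general purpose buckets, so that an exact match would be the first result.

```python
def key_exists(client, bucket_name, key):
    """Return True only if an object named exactly `key` exists in the bucket.

    `client` is expected to be an S3 client, e.g. boto3.client("s3").
    """
    response = client.list_objects_v2(Bucket=bucket_name, Prefix=key, MaxKeys=1)
    # "Contents" is absent when no key starts with the prefix.
    return any(obj["Key"] == key for obj in response.get("Contents", []))


# Usage (hypothetical names):
#   import boto3
#   key_exists(boto3.client("s3"), "bucket_name", "some/path/file.csv")
```

Calling head_object on the client and handling the resulting error is another common way to test for a single key.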