Open an S3 object as a string with Boto3

I know that with Boto 2 it is possible to open an S3 object as a string with get_contents_as_string().

Is there an equivalent function in boto3?

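Current answer

The fastest approach

As stated in the documentation, download_fileobj uses parallelization:

This is a managed transfer which will perform a multipart download in multiple threads if necessary.

Quoting the AWS documentation:

You can retrieve a part of an object from S3 by specifying the part number in GetObjectRequest. TransferManager uses this logic to download all parts of an object asynchronously and writes them to individual temporary files. The temporary files are then merged into the destination file provided by the user.

This can be exploited by keeping the data in memory instead of writing it to a file.

The approach shown by @Gatsby Lee does exactly that, which is why it is the fastest of those listed. It can be improved further with the Config parameter: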

import io
import boto3
from boto3.s3.transfer import TransferConfig

client = boto3.client('s3')
buffer = io.BytesIO()

# This is just an example, parameters should be fine tuned according to:
# 1. The size of the object that is being read (bigger the file, bigger the chunks)
# 2. The number of threads available on the machine that runs this code

# Sizes are given in bytes, so 25MB is 1024 * 1024 * 25
config = TransferConfig(
    multipart_threshold=1024 * 1024 * 25,   # Concurrent read only if object size > 25MB
    max_concurrency=10,                     # Up to 10 concurrent readers
    multipart_chunksize=1024 * 1024 * 25,   # 25MB chunks per reader
    use_threads=True                        # Must be True to enable multiple readers
)

# This method writes the data into the buffer
client.download_fileobj( 
    Bucket=bucket_name, 
    Key=object_key, 
    Fileobj=buffer,
    Config=config
)

str_value = buffer.getvalue().decode()

For objects larger than 1 GB, this is already worth it in terms of speed.
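The comments in the snippet above say the parameters should be tuned to the object size and the number of threads. As a rough illustration only, the sketch below derives settings from an object size. The function `suggest_transfer_settings` and its even-split rule are hypothetical (boto3 provides nothing like it), and the 8 MB floor is an assumption chosen to match what I believe is boto3's default chunk size.

```python
MB = 1024 * 1024

def suggest_transfer_settings(object_size, max_concurrency=10, min_chunk=8 * MB):
    """Hypothetical heuristic: one roughly equal chunk per reader, never below min_chunk."""
    # Ceiling division, so max_concurrency chunks always cover the whole object.
    even_split = -(-object_size // max_concurrency)
    return {
        "multipart_threshold": min_chunk,   # multipart transfer kicks in above this size
        "multipart_chunksize": max(min_chunk, even_split),
        "max_concurrency": max_concurrency,
        "use_threads": max_concurrency > 1,
    }

settings = suggest_transfer_settings(1024 * MB)  # a 1 GB object

# The keys match TransferConfig's keyword arguments, so with boto3 installed:
#   from boto3.s3.transfer import TransferConfig
#   config = TransferConfig(**settings)
```

Treat the numbers as a starting point and measure on your own objects; since everything ends up in a BytesIO anyway, memory use is bounded by the object size rather than by the chunk size.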

2022-11-24 13:59:07

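Other answers

Python 3+, using the boto3 API.

Using the S3.Client.download_fileobj API and a Python file-like object, the S3 object content can be retrieved into memory.

Since the retrieved content is bytes, it needs to be decoded to convert it to str.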

import io
import boto3

client = boto3.client('s3')
bytes_buffer = io.BytesIO()
client.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=bytes_buffer)
byte_value = bytes_buffer.getvalue()
str_value = byte_value.decode() #python3, default decoding is utf-8
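If this is needed in more than one place, the steps above can be wrapped in a small helper. This is a minimal sketch: `read_s3_text` and `FakeS3Client` are names made up for illustration, and the fake client exists only so the helper can be tried without AWS credentials. In real code you would pass `boto3.client('s3')` instead.

```python
import io

def read_s3_text(client, bucket, key, encoding="utf-8"):
    """Download an object into memory via download_fileobj and decode it to str."""
    buffer = io.BytesIO()
    client.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    return buffer.getvalue().decode(encoding)

class FakeS3Client:
    """Stand-in for boto3.client('s3'), only for trying the helper offline."""
    def __init__(self, objects):
        self._objects = objects  # maps (bucket, key) -> bytes

    def download_fileobj(self, Bucket, Key, Fileobj):
        # Mimics the real call by writing the object's bytes into the file-like object.
        Fileobj.write(self._objects[(Bucket, Key)])

fake = FakeS3Client({("my-bucket", "hello.txt"): "héllo".encode("utf-8")})
text = read_s3_text(fake, "my-bucket", "hello.txt")
```

Taking the client as an argument also makes the helper easy to unit test.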

2019-06-29 03:43:01

Decode the whole object body into a single string:

import boto3
s3 = boto3.resource('s3')
obj = s3.Object(bucket, key).get()
big_str = obj['Body'].read().decode()

Decode the object body into strings line by line:

obj = s3.Object(bucket, key).get()
reader = csv.reader(line.decode() for line in obj['Body'].iter_lines())
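To see the line-by-line pattern end to end without touching S3, here is a small sketch. `iter_text_lines` and `FakeBody` are hypothetical names. `FakeBody` only imitates the one method used here, `iter_lines()`, which on the real streaming body yields each line as bytes without its line ending.

```python
import csv

def iter_text_lines(body, encoding="utf-8"):
    """Yield each line of a streaming body as str, decoding lazily."""
    for raw_line in body.iter_lines():
        yield raw_line.decode(encoding)

class FakeBody:
    """Stand-in for obj['Body']; only mimics iter_lines() yielding bytes lines."""
    def __init__(self, data):
        self._data = data

    def iter_lines(self):
        return iter(self._data.splitlines())

body = FakeBody(b"name,size\nreport.csv,42\n")
rows = list(csv.reader(iter_text_lines(body)))
```

One caveat: because line endings are stripped before csv.reader sees the data, quoted CSV fields that contain newlines may not survive intact with this approach.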

In Python 3, the default encoding of bytes' decode() is already 'utf-8'.
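A quick way to convince yourself, using only the standard library. The Latin-1 case is an assumption added for contrast: if an object was stored in another encoding, the default will not work and the encoding has to be named.

```python
raw = "naïve café".encode("utf-8")

# With no argument, bytes.decode() uses UTF-8 in Python 3.
assert raw.decode() == raw.decode("utf-8") == "naïve café"

# Bytes stored in another encoding must be decoded with that encoding.
latin1_raw = "naïve café".encode("latin-1")
text = latin1_raw.decode("latin-1")

# Decoding those same bytes with the UTF-8 default raises UnicodeDecodeError.
try:
    latin1_raw.decode()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```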

When decoding as JSON, there is no need to convert to a string first, because json.loads has also accepted bytes since Python 3.6:

obj = s3.Object(bucket, key).get()
json.loads(obj['Body'].read())
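The same behaviour can be checked locally. In this sketch `payload` is just a stand-in for what obj['Body'].read() would return, so no S3 access is needed. Note that the bytes must be UTF-8, UTF-16 or UTF-32 encoded for json.loads to accept them.

```python
import json

# Stand-in for the bytes returned by obj['Body'].read()
payload = '{"name": "report", "size": 42}'.encode("utf-8")

from_bytes = json.loads(payload)           # bytes accepted directly (Python 3.6+)
from_str = json.loads(payload.decode())    # equivalent, with an explicit decode
```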

2022-01-07 00:08:30

If the body contains an io.StringIO, you have to do it like this:

object.get()['Body'].getvalue()

2016-11-30 10:02:26

I had a problem reading/parsing an object from S3 with .get(), using Python 2.7 inside an AWS Lambda.

I added json to the example to show that it became parsable :)

import boto3
import json

s3 = boto3.client('s3')

obj = s3.get_object(Bucket=bucket, Key=key)
j = json.loads(obj['Body'].read())

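NOTE (for python 2.7): My objects are all ASCII, so I don't need .decode('utf-8')

NOTE (for python 3): We moved to python 3 and discovered that read() now returns bytes, so if you want to get a string out of it, you must use:

j = json.loads(obj['Body'].read().decode('utf-8'))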

2017-03-11 15:52:50
