How do I search an Amazon S3 bucket?

This isn't a technical answer, but I've built an application that allows wildcard searching: https://bucketsearch.net/

It indexes your bucket asynchronously and then lets you search the results.

It's free to use (donationware).

2020-06-11 13:04:54

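S3 has no native "search this bucket" operation, because S3 doesn't know what the objects actually contain. Also, since S3 is key/value based, there is no native way to access many nodes at once, the way more traditional data stores let you with something like SELECT * FROM ... in the SQL model.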

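What you need to do is call ListBucket to get a list of the objects in the bucket, and then iterate over every item, performing a custom operation that you implement. That operation is your search.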
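A minimal sketch of that list-then-iterate approach in Python. It assumes boto3 and swaps in ListObjectsV2 (the current form of the bucket-listing call) via a paginator; the helper names (`iter_keys`, `search_bucket`, `boto3_search`) are hypothetical. The "custom operation" here is a plain substring match on each object's body, which means every object gets downloaded, so time and transfer cost grow with the size of the bucket.

```python
from typing import Any, Callable, Iterable, Iterator


def iter_keys(pages: Iterable[dict]) -> Iterator[str]:
    """Yield every object key from ListObjectsV2-style response pages."""
    for page in pages:
        # A page with no objects has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def search_bucket(
    pages: Iterable[dict],
    read_object: Callable[[str], bytes],
    needle: str,
) -> list:
    """Return the keys whose object body contains `needle` (decoded as UTF-8)."""
    hits = []
    for key in iter_keys(pages):
        body = read_object(key)
        # errors="ignore" so a binary object doesn't abort the whole scan.
        if needle in body.decode("utf-8", errors="ignore"):
            hits.append(key)
    return hits


def boto3_search(bucket: str, needle: str, prefix: str = "") -> list:
    """Wire the helpers to boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3: Any = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)

    def read_object(key: str) -> bytes:
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    return search_bucket(pages, read_object, needle)
```

Keeping the listing and the object reader as parameters of `search_bucket` makes it easy to test with in-memory data, or to replace the substring match with any other per-object check.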

2011-02-12 16:52:19

Try this command:

aws s3api list-objects --bucket your-bucket --prefix sub-dir-path --output text --query 'Contents[].{Key: Key}'

You can then pipe the output into grep to pick out particular file types and do whatever you want with them.
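If grep isn't available, roughly the same filtering step can be done in Python. This is a sketch assuming the command above prints one key per line (as it does with `--output text` and that `--query`); `filter_keys` is a hypothetical helper, and like grep it uses an unanchored regex search.

```python
import re
from typing import Iterable, Iterator


def filter_keys(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield keys (one per input line) that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        key = line.rstrip("\n")
        # re.search mirrors grep: the pattern may match anywhere in the key.
        if key and regex.search(key):
            yield key
```

For example, a small script that calls `filter_keys(sys.stdin, r"\.pdf$")` and prints each result can sit at the end of the `aws s3api list-objects ... |` pipe in place of `grep '\.pdf$'`.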

2016-01-12 21:41:57

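Although it isn't a native AWS service, there is Mixpeek, which runs text extraction (using tools such as Tika, Tesseract, and ImageAI) on your S3 files and then puts the results in a Lucene index to make them searchable.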

See the documentation here.

The integration works as follows:

Download the module: https://github.com/mixpeek/mixpeek-python

Import the module and your API keys:

    from mixpeek import Mixpeek, S3
    from config import mixpeek_api_key, aws

Instantiate the S3 class (which uses boto3 and requests):

    s3 = S3(
        aws_access_key_id=aws['aws_access_key_id'],
        aws_secret_access_key=aws['aws_secret_access_key'],
        region_name='us-east-2',
        mixpeek_api_key=mixpeek_api_key
    )

Upload one or more existing S3 files:

    # upload all S3 files in bucket "demo"
    s3.upload_all(bucket_name="demo")

    # upload one single file called "prescription.pdf" in bucket "demo"
    s3.upload_one(s3_file_name="prescription.pdf", bucket_name="demo")

Now simply search using the Mixpeek module:

    # mixpeek api direct
    mix = Mixpeek(
        api_key=mixpeek_api_key
    )

    # search
    result = mix.search(query="Heartgard")
    print(result)

Where result can be:

    [
      {
        "_id": "REDACTED",
        "api_key": "REDACTED",
        "highlights": [
          {
            "path": "document_str",
            "score": 0.8759502172470093,
            "texts": [
              {
                "type": "text",
                "value": "Vetco Prescription\nVetcoClinics.com\n\nCustomer:\n\nAddress: Canine\n\nPhone: Australian Shepherd\n\nDate of Service: 2 Years 8 Months\n\nPrescription\nExpiration Date:\n\nWeight: 41.75\n\nSex: Female\n\n℞ "
              },
              {
                "type": "hit",
                "value": "Heartgard"
              },
              {
                "type": "text",
                "value": " Plus Green 26-50 lbs (Ivermectin 135 mcg/Pyrantel 114 mg)\n\nInstructions: Give one chewable tablet by mouth once monthly for protection against heartworms, and the treatment and\ncontrol of roundworms, and hookworms. "
              }
            ]
          }
        ],
        "metadata": {
          "date_inserted": "2021-10-07 03:19:23.632000",
          "filename": "prescription.pdf"
        },
        "score": 0.13313256204128265
      }
    ]

Then parse the result.
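One way to parse it is sketched below. It assumes every result has the same shape as the example result in this answer (a list of documents with `metadata.filename`, `score`, and `highlights[].texts[]` entries typed `"text"` or `"hit"`); that shape is inferred from the single example, not from Mixpeek's documentation, and `summarize_hits` is a hypothetical helper. It also accepts a JSON string in case the client returns raw JSON rather than Python objects.

```python
import json
from typing import Any


def summarize_hits(result: Any) -> list:
    """Return (filename, score, hit_terms) for each document in a search result."""
    if isinstance(result, str):
        # Assumption: the result may arrive as a JSON string instead of a list.
        result = json.loads(result)
    summary = []
    for doc in result:
        filename = doc.get("metadata", {}).get("filename", "")
        # Collect only the highlighted fragments that matched the query.
        hits = [
            text["value"]
            for highlight in doc.get("highlights", [])
            for text in highlight.get("texts", [])
            if text.get("type") == "hit"
        ]
        summary.append((filename, doc.get("score", 0.0), hits))
    return summary
```

For the example result above this would give `[("prescription.pdf", 0.13313256204128265, ["Heartgard"])]`.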

2022-02-25 14:41:46

There are (at least) two different use cases that could be described as "searching a bucket":

1. Search for something inside every object stored in the bucket. This assumes a common format for all the objects in that bucket (say, text files). For something like this, you're forced to do what Cody Caughlan just answered. The AWS S3 docs have example code showing how to do this with the AWS SDK for Java: Listing Keys Using the AWS SDK for Java (there you'll also find PHP and C# examples).

2. Search for something in the object keys contained in that bucket. S3 does have partial support for this, in the form of exact prefix matches plus collapsing of matches after a delimiter. This is explained in more detail in the AWS S3 Developer Guide. It allows you, for example, to implement "folders" by using object keys such as folder/subfolder/file.txt. If you follow this convention, most S3 GUIs (such as the AWS Console) will show you a folder view of your bucket.
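To illustrate the prefix + delimiter behaviour in the second case, here is a small in-memory model. It is a sketch of the semantics only, not the S3 implementation: keys under the prefix whose remainder contains no delimiter are listed directly, and the rest are collapsed into "common prefixes" (the "folders"). With boto3, the real call would be `list_objects_v2(Bucket=..., Prefix=..., Delimiter="/")`, which returns the two groups as `Contents` and `CommonPrefixes`; `list_with_delimiter` itself is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Model S3 listing: return (contents, common_prefixes) for prefix/delimiter."""
    contents: List[str] = []
    common_prefixes: List[str] = []
    seen = set()
    for key in sorted(keys):
        # Only exact prefix matches are considered, as in S3.
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            contents.append(key)  # no delimiter after the prefix: a "file"
        else:
            # Collapse everything up to and including the delimiter: a "folder".
            folder = prefix + rest[: idx + len(delimiter)]
            if folder not in seen:
                seen.add(folder)
                common_prefixes.append(folder)
    return contents, common_prefixes
```

Listing with prefix `folder/` and delimiter `/` is what gives a GUI its one-level-at-a-time folder view.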

2011-02-12 23:21:43

Use Amazon Athena to query the S3 bucket. Alternatively, load the data into Amazon Elasticsearch. Hope this helps.
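A rough sketch of running an Athena query from Python with boto3's Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`). It assumes a table has already been defined over the bucket's data; the database, table, and results location in the usage note below are hypothetical placeholders. For brevity it reads only the first page of results.

```python
import time
from typing import Any, List


def run_athena_query(
    client: Any, sql: str, database: str, output_location: str, poll_seconds: float = 1.0
) -> List[List[str]]:
    """Run `sql` on Athena and return rows as lists of strings (first row is the header)."""
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")

    # First page of results only; larger result sets need NextToken pagination.
    rows = client.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    # NULL cells come back without a "VarCharValue" key.
    return [[cell.get("VarCharValue", "") for cell in row["Data"]] for row in rows]
```

Usage might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM my_table WHERE ...", "my_database", "s3://my-results-bucket/athena/")`, where all three names are hypothetical. Passing the client in keeps the polling logic testable without AWS access.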

2020-05-01 15:36:43
