How do I search an Amazon S3 bucket?

There are several options, but none of them is a simple "one shot" full-text solution:

Key name pattern search: search for keys that start with a given string. If you design your key names carefully, this can be a rather quick solution.

Search metadata attached to keys: when you upload a file to AWS S3, you can process the content, extract some meta information, and attach it to the key in the form of custom headers. This lets you fetch key names and headers without fetching the complete content. The search has to be done sequentially; there is no "SQL-like" search option for this. With large files this can save a lot of network traffic and time.

Store metadata in SimpleDB: same as the previous point, but with the metadata stored in SimpleDB, where you have SQL-like select statements. With large data sets you may hit SimpleDB limits. These can be overcome by partitioning the metadata across multiple SimpleDB domains, but if you go really far you may need a different kind of database for the metadata.

Sequential full-text search of the content: process all the keys one by one. This is very slow if you have many keys to process.
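The first two options can be sketched in Python. This is only a minimal sketch, assuming a boto3 S3 client is passed in as `s3`; the function names are illustrative, and the bucket, prefix, and metadata values in the usage note are hypothetical.

```python
def keys_with_prefix(s3, bucket, prefix):
    """Yield key names that start with `prefix` (filtered by S3 while listing)."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matched.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def keys_with_metadata(s3, bucket, prefix, name, value):
    """Yield keys whose user-defined metadata entry `name` equals `value`.

    This issues one HEAD request per key, so it is sequential,
    but no object body is downloaded.
    """
    for key in keys_with_prefix(s3, bucket, prefix):
        head = s3.head_object(Bucket=bucket, Key=key)
        if head.get("Metadata", {}).get(name) == value:
            yield key
```

With boto3 installed, something like `s3 = boto3.client("s3")` followed by `list(keys_with_metadata(s3, "your-bucket", "reports/", "author", "alice"))` would collect the matching keys.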

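For several years we stored 1440 versions of a file per day (one per minute) in a versioned bucket, which was easy to set up. Getting to an older version took time, though, because you have to walk through the versions sequentially, one by one. Sometimes I keep a simple CSV index of records with the publication time and version id; with that I can jump to an older version quickly.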
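A CSV index of this kind could be queried roughly as follows. The column layout (`published,version_id`, with ISO 8601 UTC timestamps so that string order equals time order) is an assumption for illustration, not the layout the author actually used.

```python
import bisect
import csv
import io


def load_index(csv_text):
    """Parse rows of `published,version_id` into a list sorted by timestamp."""
    rows = [(r[0], r[1]) for r in csv.reader(io.StringIO(csv_text)) if r]
    return sorted(rows)


def version_at(index, when):
    """Return the version id of the latest entry published at or before `when`."""
    times = [t for t, _ in index]
    pos = bisect.bisect_right(times, when)  # binary search instead of a linear walk
    return index[pos - 1][1] if pos else None
```

The returned id can then be passed as the `VersionId` argument when fetching the object, which avoids stepping through every version in between.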

As you can see, AWS S3 on its own is not designed for full-text search; it is a simple storage service.

2013-02-09 09:19:05

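Not a technical answer, but I have built an application that allows wildcard searching: https://bucketsearch.net/

It indexes your bucket asynchronously and then lets you search the results.

It is free to use (donationware).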

2020-06-11 13:04:54

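This is a fairly old topic, but maybe this helps someone who is still searching. I am the one who searched for a year.

The solution might be "AWS Athena", which lets you query the data with SQL. Note that the example query and the link below are for Amazon S3 Select, which uses the `S3Object` table name; Athena instead queries a table that you define over the bucket.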

'SELECT user_name FROM S3Object WHERE cast(age as int) > 20'

https://aws.amazon.com/blogs/developer/introducing-support-for-amazon-s3-select-in-the-aws-sdk-for-javascript/
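The linked post uses the JavaScript SDK; a rough Python equivalent is sketched here. It assumes a boto3 S3 client passed in as `s3`, a CSV object whose header row names the columns, and that S3 Select is available on your account. The function name `select_rows` is illustrative.

```python
def select_rows(s3, bucket, key, expression):
    """Run an S3 Select SQL expression on one CSV object and return the result text."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is CSV and its first line holds the column names.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is a stream of events; only "Records" events carry result bytes.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join before decoding so a multi-byte character split across events survives.
    return b"".join(chunks).decode("utf-8")
```

A call such as `select_rows(s3, "your-bucket", "users.csv", "SELECT user_name FROM S3Object WHERE cast(age as int) > 20")` would run the query quoted above against a single object (bucket and key are placeholders).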

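Currently the price is $5 per 1 TB of data scanned. So, for example, if your queries scan one 1 TB file three times, your cost is $15. But if the data has been converted to a columnar format and you only need to read one column (assuming it accounts for roughly a third of the data), you pay 1/3 of the price, that is $1.67 per TB.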
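The arithmetic behind those two figures, as a small sketch. The $5 per TB rate is the one quoted in this answer and may no longer be current; the function name is illustrative.

```python
PRICE_PER_TB_USD = 5.0  # rate quoted in the answer; check current pricing before relying on it


def query_cost(tb_scanned_per_query, queries=1, price_per_tb=PRICE_PER_TB_USD):
    """Cost in USD when billing is proportional to the amount of data scanned."""
    return tb_scanned_per_query * queries * price_per_tb


full_scans = query_cost(1.0, queries=3)  # three queries, each scanning a 1 TB file
one_column = query_cost(1.0 / 3)         # one query that reads a third of the data

print(f"{full_scans:.2f} {one_column:.2f}")
```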

2019-07-04 13:57:50

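S3 has no native "search this bucket" feature because the actual content is unknown. Also, since S3 is key/value based, there is no native way to access many nodes at once, the way more traditional datastores do with a (SELECT * FROM …) in a SQL model.

What you need to do is perform ListBucket to get a listing of the objects in the bucket, then iterate over every item and perform a custom operation that you implement. That operation is your search.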
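The list-then-iterate approach might look like this in Python. It is a sketch under assumptions: `s3` is a boto3 S3 client, `list_objects_v2` is used as the listing call, and the wildcard predicate is just one example of a custom operation.

```python
from fnmatch import fnmatchcase


def search_bucket(s3, bucket, matches):
    """Yield every key in `bucket` for which the callable `matches(key)` is true."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if matches(obj["Key"]):
                yield obj["Key"]


def wildcard(pattern):
    """Build a predicate that tests a key against a shell-style pattern."""
    return lambda key: fnmatchcase(key, pattern)
```

For example, `list(search_bucket(s3, "your-bucket", wildcard("*2011*.csv")))` would list every matching key. The whole bucket is still listed, so the cost grows with the number of objects.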

2011-02-12 16:52:19

Here is a short and ugly way to search file names using the AWS CLI:

aws s3 ls s3://your-bucket --recursive | grep your-search | cut -c 32-

2016-05-14 18:33:38
