How do I search an Amazon S3 bucket?

I have a bucket filled with thousands of files. How can I search the bucket?

Current answer

There are (at least) two different use cases that could be described as "searching the bucket":

1. Search for something inside every object stored in the bucket. This assumes a common format for all the objects in that bucket (say, text files). For something like this, you're forced to do what Cody Caughlan just answered. The AWS S3 docs have example code showing how to do this with the AWS SDK for Java: Listing Keys Using the AWS SDK for Java (there you'll also find PHP and C# examples).

2. Search for something in the object keys contained in that bucket. S3 does have partial support for this, in the form of exact prefix matches plus collapsing matches after a delimiter. This is explained in more detail in the AWS S3 Developer Guide. This makes it possible, for example, to implement "folders" by using object keys like folder/subfolder/file.txt. If you follow this convention, most of the S3 GUIs (such as the AWS Console) will show you a folder view of your bucket.
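To make the second case concrete, here is a minimal Python sketch of prefix and delimiter listing. It assumes the boto3 library and an S3 client created with boto3.client("s3"); the function names list_keys and search_keys and the bucket name in the usage comment are made up for illustration. S3 only matches on key prefixes, so the substring match in search_keys runs on the client after listing.

```python
def list_keys(client, bucket, prefix="", delimiter=""):
    """Return (keys, folders) for objects whose key starts with `prefix`.

    `client` is assumed to be a boto3 S3 client. With a delimiter such as "/",
    keys below the next delimiter are collapsed into "folder" prefixes.
    """
    keys, folders = [], []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    if delimiter:
        kwargs["Delimiter"] = delimiter
    # The paginator follows continuation tokens for us; a single call
    # returns at most 1000 keys.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**kwargs):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        folders.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
    return keys, folders


def search_keys(client, bucket, needle, prefix=""):
    """Return keys under `prefix` that contain `needle` anywhere in the key.

    The substring test happens locally; S3 itself only filters by prefix.
    """
    keys, _ = list_keys(client, bucket, prefix)
    return [key for key in keys if needle in key]


# Usage (requires boto3 and AWS credentials; "my-bucket" is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   keys, folders = list_keys(s3, "my-bucket", "folder/", "/")
#   hits = search_keys(s3, "my-bucket", "report")
```

Narrowing the prefix as much as possible before the local filter keeps the number of list requests down on large buckets.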

2011-02-12 23:21:43

Other answers

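Since you are already on AWS, I think you'd want to use their CloudSearch tool. Put the data you want to search into their service and have it point to the S3 keys.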

http://aws.amazon.com/cloudsearch/

2012-06-18 03:17:03

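Another option is to mirror the S3 bucket on your web server and traverse it locally. The trick is that the local files are empty and serve only as a skeleton. Alternatively, the local files could hold useful metadata that you would normally need to get from S3 (e.g. file size, MIME type, author, timestamp, UUID). When you provide a URL to download a file, search locally but provide a link to the S3 address.

Local file traversal is easy, and this approach to S3 management is language agnostic. Local file traversal also avoids maintaining and querying a database of files, and avoids the delay of a series of remote API calls to authenticate and fetch the bucket contents.

You could allow users to upload files directly to your server via FTP or HTTP, and then transfer a batch of new and updated files to Amazon at off-peak times by recursing over the directories for files of any size. When a file transfer to Amazon completes, replace the web server file with an empty file of the same name. If a local file has any file size, serve it directly, because it is still awaiting batch transfer.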
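A rough Python sketch of the skeleton idea, using only the standard library. The function names search_mirror and resolve are invented here, and the URL form assumes the objects are publicly readable through the virtual-hosted address https://BUCKET.s3.amazonaws.com/KEY; for private objects you would hand out a presigned URL instead.

```python
import fnmatch
import os
from urllib.parse import quote


def search_mirror(root, pattern):
    """Yield local paths under `root` whose file name matches a glob pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)


def resolve(root, path, bucket):
    """Decide where a matched file should be served from.

    A non-empty local file is still waiting for the batch transfer, so it is
    served locally. An empty file is a placeholder, so we build the S3 URL
    from its path relative to the mirror root.
    """
    if os.path.getsize(path) > 0:
        return ("local", path)
    key = os.path.relpath(path, root).replace(os.sep, "/")
    return ("s3", f"https://{bucket}.s3.amazonaws.com/{quote(key)}")


# Usage ("/var/mirror" and "my-bucket" are placeholders):
#   for path in search_mirror("/var/mirror", "*.pdf"):
#       print(resolve("/var/mirror", path, "my-bucket"))
```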

2011-09-20 17:43:48

Use Amazon Athena to query the S3 bucket. Alternatively, load the data into Amazon Elasticsearch Service. Hope this helps.
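For the Athena route, something along these lines should work from Python. It is only a sketch: it assumes boto3, an Athena client from boto3.client("athena"), and a table that has already been defined over the bucket's data. The function name run_athena_query and the database, table and output location in the usage comment are placeholders.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` in Athena and return the first page of results as lists.

    `client` is assumed to be a boto3 Athena client. `output_location` is an
    s3:// URI where Athena writes its result files.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a final state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    # Only the first page is read here; for SELECT queries the first row
    # holds the column names.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   rows = run_athena_query(boto3.client("athena"),
#                           "SELECT user_name FROM users WHERE age > 20",
#                           "my_database", "s3://my-athena-results/")
```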

2020-05-01 15:36:43

This is a somewhat old topic, but it may help someone who is still searching. I am the one who had been searching for a year.

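The solution may be "AWS Athena", where you can search the data with SQL like this (the S3Object table name and the link below come from Amazon S3 Select, a related feature for querying a single object):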

'SELECT user_name FROM S3Object WHERE cast(age as int) > 20'

https://aws.amazon.com/blogs/developer/introducing-support-for-amazon-s3-select-in-the-aws-sdk-for-javascript/

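The price is currently $5 per 1 TB of data scanned. So, for example, if your query searches one 1 TB file 3 times, your cost is $15. But if the data has been converted to a columnar format and the one column you want to read makes up a third of it, you pay 1/3 of the price, that is, $1.67/TB.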
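The arithmetic above can be written down as a small Python helper. The $5 per TB figure is the one quoted in this answer and may no longer be current; the function name scan_cost and its parameters are made up for illustration.

```python
def scan_cost(tb_per_query, runs=1, price_per_tb=5.0, column_fraction=1.0):
    """Estimate query cost in dollars from the amount of data scanned.

    `column_fraction` is the share of the data actually read, e.g. 1/3 when a
    columnar format lets the query read one of three equally sized columns.
    """
    return tb_per_query * column_fraction * runs * price_per_tb


# Scanning a 1 TB file three times at $5 per TB.
full_scans = scan_cost(1, runs=3)

# Reading only a third of the same file once, thanks to a columnar format.
one_column = scan_cost(1, column_fraction=1 / 3)

print(f"${full_scans:.2f} for three full scans, ${one_column:.2f} for one column")
```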

2019-07-04 13:57:50

There are multiple options, none of them being a simple "one shot" full-text solution:

1. Key name pattern search: search for keys starting with some string. If you design key names carefully, you may have a rather quick solution.

2. Search metadata attached to keys: when posting a file to AWS S3, you can process the content, extract some meta information, and attach it to the key in the form of custom headers. This lets you fetch key names and headers without fetching the complete content. The search has to be done sequentially; there is no "SQL-like" search option for this. With large files this can save a lot of network traffic and time.

3. Store metadata on SimpleDB: as in the previous point, but storing the metadata on SimpleDB. Here you have SQL-like select statements. With large data sets you may hit SimpleDB limits, which can be overcome (by partitioning metadata across multiple SimpleDB domains), but if you go really far, you may need to use another type of metadata database.

4. Sequential full-text search of the content: processing all the keys one by one. Very slow if you have too many keys to process.
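As an illustration of the metadata search and the sequential full-text search, here is a Python sketch. It assumes boto3 and an S3 client from boto3.client("s3"); the function names find_by_metadata and grep_objects and the "author" metadata field in the usage comment are hypothetical. Both functions make one request per key, which is why they get slow on large buckets.

```python
def find_by_metadata(client, bucket, keys, name, value):
    """Yield keys whose user-defined metadata entry `name` equals `value`.

    Uses a HEAD request per key, so the object content is never downloaded.
    boto3 returns user metadata under "Metadata" with lowercase names.
    """
    for key in keys:
        head = client.head_object(Bucket=bucket, Key=key)
        if head.get("Metadata", {}).get(name) == value:
            yield key


def grep_objects(client, bucket, keys, needle, encoding="utf-8"):
    """Yield keys whose content contains the text `needle`.

    Downloads every object in full, one by one. This assumes the objects are
    text in a common encoding and small enough to hold in memory.
    """
    for key in keys:
        body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
        if needle in body.decode(encoding, errors="replace"):
            yield key


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   keys = ["logs/2013-02-08.txt", "logs/2013-02-09.txt"]
#   print(list(find_by_metadata(s3, "my-bucket", keys, "author", "bob")))
#   print(list(grep_objects(s3, "my-bucket", keys, "ERROR")))
```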

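For a few years we have been storing 1440 versions of a file per day (one per minute) in a versioned bucket, which is easy to do. But getting to some older version takes time, because one has to go through the versions sequentially, one by one. Sometimes I use a simple CSV index of records showing publication time and version ID; with that, I can jump to an older version rather quickly.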
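The CSV index can be as simple as the following Python sketch. The two-column layout (publication time, version ID), the ISO 8601 timestamp format and the function names load_index and version_at are assumptions made for this example, not the exact format used above.

```python
import bisect
import csv


def load_index(csv_file):
    """Read (publication time, version ID) rows and sort them by time.

    Timestamps are assumed to be ISO 8601 strings in one fixed format, so
    sorting them as plain strings also sorts them chronologically.
    """
    rows = sorted((row[0], row[1]) for row in csv.reader(csv_file) if row)
    times = [published for published, _ in rows]
    version_ids = [version_id for _, version_id in rows]
    return times, version_ids


def version_at(index, when):
    """Return the ID of the newest version published at or before `when`."""
    times, version_ids = index
    pos = bisect.bisect_right(times, when)
    return version_ids[pos - 1] if pos else None


# Usage ("versions.csv", "my-bucket" and "data.json" are placeholders):
#   with open("versions.csv", newline="") as f:
#       index = load_index(f)
#   vid = version_at(index, "2013-02-09T09:01:30Z")
#   # With boto3, the version can then be fetched directly:
#   #   s3.get_object(Bucket="my-bucket", Key="data.json", VersionId=vid)
```

The binary search replaces stepping through the version list, so the lookup stays fast even with years of per-minute versions.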

As you can see, AWS S3 is not designed for full-text search on its own; it is a simple storage service.

2013-02-09 09:19:05
