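I want to store a JSON payload in Redis. There are two ways I can do this:

1. Use a simple string key and value.
   key: user, value: payload (the entire JSON blob, which can be 100-200 KB)

    SET user:1 payload

2. Use a hash.

    HSET user:1 username "someone"
    HSET user:1 location "NY"
    HSET user:1 bio "STRING WITH OVER 100 lines"

Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.

Which is more memory efficient? Using string keys and values, or using a hash?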


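It depends on how you access the data:

Go for option 1:

- If you use most of the fields on most of your accesses.
- If there is variance in the possible keys.

Go for option 2:

- If you use just single fields on most of your accesses.
- If you always know which fields are available.

P.S.: As a rule of thumb, go for the option that requires fewer queries in most of your use cases.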


This article can provide a lot of insight here: http://redis.io/topics/memory-optimization

There are many ways to store an array of Objects in Redis (spoiler: I like option 1 for most use cases):

1. Store the entire object as a JSON-encoded string in a single key and keep track of all Objects using a set (or list, if more appropriate). For example:

    INCR id:users
    SET user:{id} '{"name":"Fred","age":25}'
    SADD users {id}

   Generally speaking, this is probably the best method in most cases. If there are a lot of fields in the Object, your Objects are not nested with other Objects, and you tend to only access a small subset of fields at a time, it might be better to go with option 2.

   Advantages: considered a "good practice." Each Object is a full-blown Redis key. JSON parsing is fast, especially when you need to access many fields for this Object at once.
   Disadvantages: slower when you only need to access a single field.

2. Store each Object's properties in a Redis hash.

    INCR id:users
    HMSET user:{id} name "Fred" age 25
    SADD users {id}

   Advantages: considered a "good practice." Each Object is a full-blown Redis key. No need to parse JSON strings.
   Disadvantages: possibly slower when you need to access all/most of the fields in an Object. Also, nested Objects (Objects within Objects) cannot be easily stored.

3. Store each Object as a JSON string in a Redis hash.

    INCR id:users
    HMSET users {id} '{"name":"Fred","age":25}'

   This allows you to consolidate a bit and only use two keys instead of lots of keys. The obvious disadvantage is that you can't set the TTL (and other stuff) on each user Object, since it is merely a field in the Redis hash and not a full-blown Redis key.

   Advantages: JSON parsing is fast, especially when you need to access many fields for this Object at once. Less "polluting" of the main key namespace.
   Disadvantages: about the same memory usage as #1 when you have a lot of Objects. Slower than #2 when you only need to access a single field. Probably not considered a "good practice."

4. Store each property of each Object in a dedicated key.

    INCR id:users
    SET user:{id}:name "Fred"
    SET user:{id}:age 25
    SADD users {id}

   According to the article above, this option is almost never preferred (unless the property of the Object needs to have a specific TTL or something).

   Advantages: Object properties are full-blown Redis keys, which might not be overkill for your app.
   Disadvantages: slow, uses more memory, and not considered "best practice." Lots of polluting of the main key namespace.
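To make the difference between options 1 and 2 concrete, here is a minimal Ruby sketch that only builds the command arguments for each layout, so it runs without a Redis server. The helper names `string_command` and `hash_command` are hypothetical, and JSON-encoding nested values inside hash fields is just one possible workaround for the nesting limitation mentioned above, not a Redis convention.

```ruby
require 'json'

# Hypothetical helper for option 1: the whole object as one JSON string.
def string_command(id, object)
  ['SET', "user:#{id}", JSON.generate(object)]
end

# Hypothetical helper for option 2: one hash field per top-level property.
# Hashes are flat, so nested values are JSON-encoded here (an assumption
# made for illustration; the reader then has to parse those fields again).
def hash_command(id, object)
  fields = object.flat_map do |name, value|
    nested = value.is_a?(Hash) || value.is_a?(Array)
    [name.to_s, nested ? JSON.generate(value) : value.to_s]
  end
  ['HMSET', "user:#{id}", *fields]
end

user = { 'name' => 'Fred', 'age' => 25, 'address' => { 'city' => 'NY' } }
p string_command(1, user)
p hash_command(1, user)
```

Note that option 2 loses type information (the age becomes the string "25"), which is part of why option 1 tends to be simpler for nested or typed data.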

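Overall Summary

Option 4 is generally not preferred. Options 1 and 2 are very similar, and they are both pretty common. I prefer option 1 (generally speaking) because it allows you to store more complicated Objects (with multiple layers of nesting, etc.). Option 3 is used when you really care about not polluting the main key namespace (i.e. you don't want there to be a lot of keys in your database and you don't care about things like TTL, key sharding, or whatever).

If I got something wrong here, please consider leaving a comment and allowing me to revise the answer before downvoting. Thanks! :)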


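Some additions to the given set of answers:

First of all, if you are going to use Redis hashes efficiently, you must know the maximum number of fields and the maximum size of the values. Otherwise, if they exceed hash-max-ziplist-value or hash-max-ziplist-entries, Redis will convert the hash to practically usual key/value pairs under the hood (see hash-max-ziplist-value, hash-max-ziplist-entries). Breaking out of the compact hash encoding IS REALLY BAD, because each usual key/value pair inside Redis uses +90 bytes per pair.

It means that if you start with option two and accidentally exceed hash-max-ziplist-value, you will get +90 bytes for EACH ATTRIBUTE you have inside the user model! (Actually not +90 but +70; see the console output below.)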

 # you need me-redis and awesome-print gems to run exact code
 redis = Redis.include(MeRedis).configure( hash_max_ziplist_value: 64, hash_max_ziplist_entries: 512 ).new 
  => #<Redis client v4.0.1 for redis://127.0.0.1:6379/0> 
 > redis.flushdb
  => "OK" 
 > ap redis.info(:memory)
    {
                "used_memory" => "529512",
          "used_memory_human" => "517.10K",
            ....
    }
  => nil 
 # me_set( 't:i' ... ) is the same as hset( 't:i/512', i % 512 ... ), with i/512 as integer division
 # txt is some English fiction book around 56K characters long,
 # so we just take a random 63-character string from it
 > redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), 63] ) } }; :done
 => :done 
 > ap redis.info(:memory)
  {
               "used_memory" => "1251944",
          "used_memory_human" => "1.19M", # ~ 72 bytes per key/value
            .....
  }
  > redis.flushdb
  => "OK" 
  # making only one value per hash of 512 values 1 byte longer has the same effect as making them all 1 byte longer
  > redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), i % 512 == 0 ? 65 : 63] ) } }; :done 
  > ap redis.info(:memory)
   {
               "used_memory" => "1876064",
         "used_memory_human" => "1.79M",   # ~ 134 bytes per pair  
          ....
   }
  > redis.pipelined{ 10000.times{ |i| redis.set( "t:#{i}", txt[rand(50000), 65] ) } };
  > ap redis.info(:memory)
    {
             "used_memory" => "2262312",
          "used_memory_human" => "2.16M", #~155 byte per pair i.e. +90 bytes    
           ....
    }
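
Before choosing option 2, it can help to check whether a given object would be expected to stay within the limits discussed above. The following is a rough sketch, not Redis code: `compact_encoding?` is a hypothetical helper, the thresholds are the ones configured in the console session above (which are also the usual defaults), and the rule that both field names and values are compared against hash-max-ziplist-value by byte length reflects my understanding of Redis behavior. In Redis 7 the same settings are called hash-max-listpack-entries and hash-max-listpack-value.

```ruby
# Assumed thresholds, matching the console session above.
HASH_MAX_ZIPLIST_ENTRIES = 512
HASH_MAX_ZIPLIST_VALUE = 64

# Hypothetical helper: true when a hash holding these fields would be
# expected to keep the compact encoding. Lengths are measured in bytes,
# so multi-byte UTF-8 characters count more than once.
def compact_encoding?(fields,
                      max_entries: HASH_MAX_ZIPLIST_ENTRIES,
                      max_value: HASH_MAX_ZIPLIST_VALUE)
  return false if fields.size > max_entries

  fields.all? do |name, value|
    name.to_s.bytesize <= max_value && value.to_s.bytesize <= max_value
  end
end

user = { 'username' => 'someone', 'location' => 'NY', 'bio' => 'x' * 5_000 }
puts compact_encoding?(user)                                  # the long bio breaks the limit
puts compact_encoding?(user.reject { |name, _| name == 'bio' })
```

A single oversized field is enough to convert the whole hash, which is the effect the second measurement in the console session demonstrates.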

Regarding TheHippo's answer, the comments on option one are misleading:

HGETALL/HMSET/HMGET come to the rescue if you need all fields or multiple get/set operations.

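Regarding BMiner's answer.

The third option is actually really fun: for a dataset with max(id) < hash-max-ziplist-entries this solution has O(N) complexity because, surprise, Redis stores small hashes as an array-like container of length/key/value objects!

"But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much"

But you should not worry: you'll exceed hash-max-ziplist-entries very fast, and there you go, you are now actually at solution number 1.

The second option will most likely end up as the fourth solution under the hood, because as the question states:

"Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above."

And as you already said: the fourth solution is the most expensive, at +70 bytes per attribute for sure.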

My suggestion on how to optimize such a dataset:

You've got two options:

1. If you cannot guarantee the max size of some user attributes, go for the first solution, and if memory is crucial, compress the user JSON before storing it in Redis.
2. If you can force a max size on all attributes, you can set hash-max-ziplist-entries/value and use hashes either as one hash per user representation OR as the hash memory optimization from this topic of the Redis guide: https://redis.io/topics/memory-optimization and store each user as a JSON string.

Either way you may also compress long user attributes.
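
The hash memory optimization mentioned above groups many small values into shared hashes so that each hash stays under hash-max-ziplist-entries. A minimal sketch of the key mapping, following the `me_set` comment in the console session (`t:i` becomes hash `t:i/512`, field `i % 512`); `bucket_for` is a hypothetical helper name and the bucket size of 512 is an assumption tied to the configured entry limit.

```ruby
BUCKET_SIZE = 512 # assumed to match hash-max-ziplist-entries

# Hypothetical helper: maps an integer id to [hash_key, field] so that
# at most BUCKET_SIZE ids share one Redis hash.
def bucket_for(prefix, id, bucket_size: BUCKET_SIZE)
  ["#{prefix}:#{id / bucket_size}", (id % bucket_size).to_s]
end

key, field = bucket_for('user', 1234)
puts "HSET #{key} #{field} <json string>"
```

With a Redis client, the user JSON would then be written with something like HSET on `key` and `field` instead of SET on `user:1234`. Each stored JSON string still has to stay under hash-max-ziplist-value for the saving to hold.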


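We had a similar issue in our production environment, and we came up with the idea of compressing the payload if it exceeds some threshold in KB.

I have a repo dedicated only to this Redis client lib here

The basic idea is to inspect the payload and, if its size is greater than some threshold, gzip it and base-64 it, then keep the compressed string as a normal string in Redis. On retrieval, detect whether the string is a valid base-64 string and, if so, decompress it.

The whole compressing and decompressing will be transparent, plus you save close to 50% of the network traffic.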
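
The compress-above-a-threshold idea can be sketched in a few lines of Ruby. This is not the code of the library mentioned above (its benchmark below is .NET); it is an illustration under stated assumptions: the 1 KB threshold and the helper names `encode_payload` and `decode_payload` are made up, payloads are assumed to be UTF-8 JSON, and on retrieval the sketch also checks for the gzip magic bytes after base-64 decoding, a small variation that avoids misreading short plain strings that happen to be valid base-64.

```ruby
require 'zlib'

THRESHOLD_BYTES = 1024        # assumed threshold; tune for your payloads
GZIP_MAGIC = "\x1f\x8b".b     # first two bytes of every gzip stream

# Gzip and base-64 encode payloads above the threshold; leave others as-is.
def encode_payload(payload, threshold: THRESHOLD_BYTES)
  return payload if payload.bytesize <= threshold

  [Zlib.gzip(payload)].pack('m0') # 'm0' is strict base-64 without newlines
end

# Reverse encode_payload: values that are not strict base-64, or that do not
# decode to a gzip stream, are returned unchanged.
def decode_payload(stored)
  raw = stored.unpack1('m0') # raises ArgumentError on invalid base-64
  return stored unless raw.start_with?(GZIP_MAGIC)

  Zlib.gunzip(raw).force_encoding(Encoding::UTF_8)
rescue ArgumentError, Zlib::Error
  stored
end

json = '{"bio":"' + ('lorem ipsum ' * 500) + '"}'
stored = encode_payload(json)
puts "original: #{json.bytesize} bytes, stored: #{stored.bytesize} bytes"
puts decode_payload(stored) == json
```

The value returned by `encode_payload` is what would be passed to SET, and `decode_payload` would wrap the result of GET. The actual saving depends heavily on how compressible the payload is.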

Compression Benchmark Results


BenchmarkDotNet=v0.12.1, OS=macOS 11.3 (20E232) [Darwin 20.4.0]
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.201
  [Host] : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT DEBUG


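|                      Method |       Mean |    Error |   StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |-----------:|---------:|---------:|------:|------:|------:|----------:|
|    WithCompressionBenchmark |   668.2 ms | 13.34 ms | 27.24 ms |     - |     - |     - |   4.88 MB |
| WithoutCompressionBenchmark | 1,387.1 ms | 26.92 ms | 37.74 ms |     - |     - |     - |   2.39 MB |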

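To store JSON in Redis you can use the Redis JSON module.

This gives you:

- Full support for the JSON standard
- A JSONPath syntax for selecting/updating elements inside documents
- Documents stored as binary data in a tree structure, allowing fast access to sub-elements
- Typed atomic operations for all JSON value types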

https://redis.io/docs/stack/json/

https://developer.redis.com/howtos/redisjson/getting-started/

https://redis.com/blog/redisjson-public-preview-performance-benchmarking/


You can use the JSON module: https://redis.io/docs/stack/json/ It is fully supported and lets you use JSON as a data structure in Redis. There are also Redis object mappers for some languages: https://redis.io/docs/stack/get-started/tutorials/