如果一张图片值1000个单词,那么在140个字符中你能容纳多少图片?

Note: That's it folks! Bounty deadline is here, and after some tough deliberation, I have decided that Boojum's entry just barely edged out Sam Hocevar's. I will post more detailed notes once I've had a chance to write them up. Of course, everyone should feel free to continue to submit solutions and improve solutions for people to vote on. Thank you to everyone who submitted and entry; I enjoyed all of them. This has been a lot of fun for me to run, and I hope it's been fun for both the entrants and the spectators.

我偶然看到了一篇有趣的文章,是关于如何将图片压缩到Twitter评论中,许多人在那个帖子(以及Reddit上的一个帖子)对不同的方法提出了建议。所以,我认为这将是一个很好的编码挑战;让人们把他们的钱放在他们的嘴巴上,并展示他们关于编码的想法如何在有限的空间内带来更多细节。

我向您提出一个通用系统,将图像编码为140个字符的Twitter消息,然后再将它们解码为图像。您可以使用Unicode字符,因此每个字符可以获得8位以上的字节。然而,即使允许使用Unicode字符,也需要将图像压缩到非常小的空间;这肯定是一种有损压缩,因此必须对每个结果看起来有多好进行主观判断。

以下是原作者Quasimondo从编码中得到的结果(图片基于创作共用署名-非商业许可协议):

你能做得更好吗?

规则

Your program must have two modes: encoding and decoding. When encoding: Your program must take as input a graphic in any reasonable raster graphic format of your choice. We'll say that any raster format supported by ImageMagick counts as reasonable. Your program must output a message which can be represented in 140 or fewer Unicode code points; 140 code points in the range U+0000–U+10FFFF, excluding non-characters (U+FFFE, U+FFFF, U+nFFFE, U+nFFFF where n is 1–10 hexadecimal, and the range U+FDD0–U+FDEF) and surrogate code points (U+D800–U+DFFF). It may be output in any reasonable encoding of your choice; any encoding supported by GNU iconv will be considered reasonable, and your platform native encoding or locale encoding would likely be a good choice. See Unicode notes below for more details. When decoding: Your program should take as input the output of your encoding mode. Your program must output an image in any reasonable format of your choice, as defined above, though for output vector formats are OK as well. The image output should be an approximation of the input image; the closer you can get to the input image, the better. The decoding process may have no access to any other output of the encoding process other than the output specified above; that is, you can't upload the image somewhere and output the URL for the decoding process to download, or anything silly like that. For the sake of consistency in user interface, your program must behave as follows: Your program must be a script that can be set to executable on a platform with the appropriate interpreter, or a program that can be compiled into an executable. Your program must take as its first argument either encode or decode to set the mode. Your program must take input in one or more of the following ways (if you implement the one that takes file names, you may also read and write from stdin and stdout if file names are missing): Take input from standard in and produce output on standard out. my-program encode <input.png >output.txt my-program decode <output.txt >output.png Take input from a file named in the second argument, and produce output in the file named in the third. my-program encode input.png output.txt my-program decode output.txt output.png For your solution, please post: Your code, in full, and/or a link to it hosted elsewhere (if it's very long, or requires many files to compile, or something). An explanation of how it works, if it's not immediately obvious from the code or if the code is long and people will be interested in a summary. An example image, with the original image, the text it compresses down to, and the decoded image. If you are building on an idea that someone else had, please attribute them. It's OK to try to do a refinement of someone else's idea, but you must attribute them.

的指导方针

以下是一些可能被打破的规则、建议或评分标准:

Aesthetics are important. I'll be judging, and suggest that other people judge, based on: How good the output image looks, and how much it looks like the original. How nice the text looks. Completely random gobbledigook is OK if you have a really clever compression scheme, but I also want to see answers that turn images into mutli-lingual poems, or something clever like that. Note that the author of the original solution decided to use only Chinese characters, since it looked nicer that way. Interesting code and clever algorithms are always good. I like short, to the point, and clear code, but really clever complicated algorithms are OK too as long as they produce good results. Speed is also important, though not as important as how good a job compressing the image you do. I'd rather have a program that can convert an image in a tenth of a second than something that will be running genetic algorithms for days on end. I will prefer shorter solutions to longer ones, as long as they are reasonably comparable in quality; conciseness is a virtue. Your program should be implemented in a language that has a freely-available implementation on Mac OS X, Linux, or Windows. I'd like to be able to run the programs, but if you have a great solution that only runs under MATLAB or something, that's fine. Your program should be as general as possible; it should work for as many different images as possible, though some may produce better results than others. In particular: Having a few images built into the program that it matches and writes a reference to, and then produces the matching image upon decoding, is fairly lame and will only cover a few images. A program that can take images of simple, flat, geometric shapes and decompose them into some vector primitive is pretty nifty, but if it fails on images beyond a certain complexity it is probably insufficiently general. A program that can only take images of a particular fixed aspect ratio but does a good job with them would also be OK, but not ideal. You may find that a black and white image can get more information into a smaller space than a color image. On the other hand, that may limit the types of image it's applicable to; faces come out fine in black and white, but abstract designs may not fare so well. It is perfectly fine if the output image is smaller than the input, while being roughly the same proportion. It's OK if you have to scale the image up to compare it to the original; what's important is how it looks. Your program should produce output that could actually go through Twitter and come out unscathed. This is only a guideline rather than a rule, since I couldn't find any documentation on the precise set of characters supported, but you should probably avoid control characters, funky invisible combining characters, private use characters, and the like.

评分标准

作为我如何在选择我接受的解决方案时对解决方案进行排名的一般指南,让我们假设我可能会在25分的范围内评估解决方案(这是非常粗略的,我不会直接打分,只是将其作为一个基本指导方针):

15 points for how well the encoding scheme reproduces a wide range of input images. This is a subjective, aesthetic judgement 0 means that it doesn't work at all, it gives the same image back every time, or something 5 means that it can encode a few images, though the decoded version looks ugly and it may not work at all on more complicated images 10 means that it works on a wide range of images, and produces pleasant looking images which may occasionally be distinguishable 15 means that it produces perfect replicas of some images, and even for larger and more complex images, gives something that is recognizable. Or, perhaps it does not make images that are quite recognizable, but produces beautiful images that are clearly derived from the original. 3 points for clever use of the Unicode character set 0 points for simply using the entire set of allowed characters 1 point for using a limited set of characters that are safe for transfer over Twitter or in a wider variety of situations 2 points for using a thematic subset of characters, such as only Han ideographs or only right-to-left characters 3 points for doing something really neat, like generating readable text or using characters that look like the image in question 3 points for clever algorithmic approaches and code style 0 points for something that is 1000 lines of code only to scale the image down, treat it as 1 bit per pixel, and base64 encode that 1 point for something that uses a standard encoding technique and is well written and brief 2 points for something that introduces a relatively novel encoding technique, or that is surprisingly short and clean 3 points for a one liner that actually produces good results, or something that breaks new ground in graphics encoding (if this seems like a low number of points for breaking new ground, remember that a result this good will likely have a high score for aesthetics as well) 2 points for speed. All else being equal, faster is better, but the above criteria are all more important than speed 1 point for running on free (open source) software, because I prefer free software (note that C# will still be eligible for this point as long as it runs on Mono, likewise MATLAB code would be eligible if it runs on GNU Octave) 1 point for actually following all of the rules. These rules have gotten a bit big and complicated, so I'll probably accept otherwise good answers that get one small detail wrong, but I will give an extra point to any solution that does actually follow all of the rules

参考图片

有些人要求一些参考图片。这里有一些参考图片,你可以尝试一下;这里嵌入了较小的版本,如果你需要,它们都链接到较大版本的图像:

我提供500代表赏金(加上50 StackOverflow踢),我最喜欢的解决方案,基于上述标准。当然,我也鼓励其他人在这里投票选出他们最喜欢的解决方案。

截止日期说明

This contest will run until the bounty runs out, about 6 PM on Saturday, May 30. I can't say the precise time it will end; it may be anywhere from 5 to 7 PM. I will guarantee that I'll look at all entries submitted by 2 PM, and I will do my best to look at all entries submitted by 4 PM; if solutions are submitted after that, I may not have a chance to give them a fair look before I have to make my decision. Also, the earlier you submit, the more chance you will have for voting to be able to help me pick the best solution, so try and submit earlier rather than right at the deadline.

Unicode的笔记

There has also been some confusion on exactly what Unicode characters are allowed. The range of possible Unicode code points is U+0000 to U+10FFFF. There are some code points which are never valid to use as Unicode characters in any open interchange of data; these are the noncharacters and the surrogate code points. Noncharacters are defined in the Unidode Standard 5.1.0 section 16.7 as the values U+FFFE, U+FFFF, U+nFFFE, U+nFFFF where n is 1–10 hexadecimal, and the range U+FDD0–U+FDEF. These values are intended to be used for application-specific internal usage, and conforming applications may strip these characters out of text processed by them. Surrogate code points, defined in the Unicode Standard 5.1.0 section 3.8 as U+D800–U+DFFF, are used for encoding characters beyond the Basic Multilingual Plane in UTF-16; thus, it is impossible to represent these code points directly in the UTF-16 encoding, and it is invalid to encode them in any other encoding. Thus, for the purpose of this contest, I will allow any program which encodes images into a sequence of no more than 140 Unicode code points from the range U+0000–U+10FFFF, excluding all noncharacters and surrogate pairs as defined above.

I will prefer solutions that use only assigned characters, and even better ones that use clever subsets of assigned characters or do something interesting with the character set they use. For a list of assigned characters, see the Unicode Character Database; note that some characters are listed directly, while some are listed only as the start and end of a range. Also note that surrogate code points are listed in the database, but forbidden as mentioned above. If you would like to take advantage of certain properties of characters for making the text you output more interesting, there are a variety of databases of character information available, such as a list of named code blocks and various character properties.

Since Twitter does not specify the exact character set they support, I will be lenient about solutions which do not actually work with Twitter because certain characters count extra or certain characters are stripped. It is preferred but not required that all encoded outputs should be able to be transferred unharmed via Twitter or another microblogging service such as identi.ca. I have seen some documentation stating that Twitter entity-encodes <, >, and &, and thus counts those as 4, 4, and 5 characters respectively, but I have not tested that out myself, and their JavaScript character counter doesn't seem to count them that way.

提示和链接

The definition of valid Unicode characters in the rules is a bit complicated. Choosing a single block of characters, such as CJK Unified Ideographs (U+4E00–U+9FCF) may be easier. You may use existing image libraries, like ImageMagick or Python Imaging Library, for your image manipulation. If you need some help understanding the Unicode character set and its various encodings, see this quick guide or this detailed FAQ on UTF-8 in Linux and Unix. The earlier you get your solution in, the more time I (and other people voting) will have to look at it. You can edit your solution if you improve it; I'll base my bounty on the most recent version when I take my last look through the solutions. If you want an easy image format to parse and write (and don't want to just use an existing format), I'd suggest using the PPM format. It's a text based format that's very easy to work with, and you can use ImageMagick to convert to and from it.


当前回答

我的完整解决方案可以在http://caca.zoy.org/wiki/img2twit上找到。它具有以下特点:

压缩时间合理(高质量1分钟左右) 快速解压(几分之一秒) 保持原始图像大小(不仅仅是纵横比) 体面的重建质量(IMHO) 消息长度和字符集(ASCII, CJK,符号)可以在运行时选择 消息长度和字符集在解压时自动检测 非常有效的信息包装

http://caca.zoy.org/raw-attachment/wiki/img2twit/so-logo.png http://caca.zoy.org/raw-attachment/wiki/img2twit/twitter4.png Lizard 秓 鋖 chopsticks 聝 are the refuge of the weak rein 偺 腶 Huo Bu 祩 xi 靊 fallen 獜 world magic may 厎 趆 often 搇 梄 踥 桻 Richard 戂 pu ð ª after  Hong in Gu 骿 苸 high cut city 簶 ladies 粭 浧 turtle catching 弫 chao yan 蚙 Yue arashi 玧 霫 鏓 蓕 play debt 鼶 襋 躻 bending do foot division method 旍 凼 soaring drive according to 籂 掔 qing poetry yan ð «  including 婻 tsubaki 糢 墤 渽 Ruan give more when Yu wu 婩 缣 while (璙 cup 翉 珸 齸 Tuo star throw Benjamin Zhan 舥 攩 寉 鈶 兓 court 璱 Xing 鰀 dry PI 耓 庁 喆 蝞 Lie 葌 Jiong lie 蛪 曟 暙 silage with 媏 Hu Su 慸 jar Yin rein 殾 ð

以下是对编码过程的粗略概述:

The number of available bits is computed from desired message length and usable charset The source image is segmented into as many square cells as the available bits permit A fixed number of points (currently 2) is affected to each cell, with initial coordinates and colour values The following is repeated until a quality condition is met: A point is chosen a random An operation is performed at random on this point (moving it inside its cell, changing its colour) If the resulting image (see the decoding process below) is closer to the source image, the operation is kept The image size and list of points is encoded in UTF-8

这是解码的过程:

图像大小和点从UTF-8流中读取 对于目标图像中的每个像素: 计算出自然邻居的列表 像素的最终颜色被设定为其自然邻居颜色的加权平均值

What I believe is the most original part of the program is the bitstream. Instead of packing bit-aligned values (stream <<= shift; stream |= value), I pack arbitrary values that are not in power-of-two ranges (stream *= range; stream += value). This requires bignum computations and is of course a lot slower, but it gives me 2009.18 bits instead of 1960 when using the 20902 main CJK characters (that's three more points I can put in the data). And when using ASCII, it gives me 917.64 bits instead of 840.

我决定放弃一种需要重型武器(角落检测、特征提取、颜色量化……)的初始图像计算方法,因为一开始我不确定它是否真的有用。现在我意识到收敛是缓慢的(1分钟是可以接受的,但它仍然很慢),我可能会尝试改善这一点。

The main fitting loop is loosely inspired from the Direct Binary Seach dithering algorithm (where pixels are randomly swapped or flipped until a better halftone is obtained). The energy computation is a simple root-mean-square distance, but I perform a 5x5 median filter on the original image first. A Gaussian blur would probably better represent the human eye behaviour, but I didn't want to lose sharp edges. I also decided against simulated annealing or other difficult to tune methods because I don't have months to calibrate the process. Thus the "quality" flag just represents the number of iterations that are performed on each point before the encoder ends.

http://caca.zoy.org/raw-attachment/wiki/img2twit/Mona_Lisa_scaled.jpg http://caca.zoy.org/raw-attachment/wiki/img2twit/twitter2.png Pyrene 憗 chuai 嶕 繠 剳 腏 basket wet Chai 霮 墧 蒆 棌 杚 蓳 䌸 camphor given food 飗 when 砃 Qiao ren Tiao Tong 釰 fog Pi 貜 stubborn 掝 喗 讄 Rao 砙 矺 敨 鷾 Ying heng liao conclusion Yun 簵 Lu 嫤 hinge 俇 excitation 躙 Wu powers 甮 Kang Bei Buddha fool pig Shen Zong 嫥 ð « „ § Jue often live in 堭 property 箽 ochre 飉 nerd Cheng clamp 窂 ð «  ‹ Biao Gan the rafter Qiao navigation 玴 felt Shu 頢 lamb kai 墎 嬔 Cuan Pian 瑥 Jian 呍 Qu 抲 cerulean 秓 Bi velvet ester 嵞 Luan wu dirt then 酼 Biao 菛 qi coffins

尽管不是所有的图像都压缩得很好,但我对结果感到惊讶,我真的很想知道还有什么其他方法可以将图像压缩到250字节。

我还拥有编码器状态从随机初始状态和从“良好”初始状态演变的小电影。

编辑:这里是如何压缩方法与JPEG比较。左边是james的536字节图片。在右边,蒙娜丽莎使用这里描述的方法压缩到534字节(这里提到的字节指的是数据字节,因此忽略了使用Unicode字符浪费的比特):

http://caca.zoy.org/raw-attachment/wiki/img2twit/minimona.jpg http://caca.zoy.org/raw-attachment/wiki/img2twit/minimona2.png

编辑:刚刚将CJK文本替换为最新版本的图像。

其他回答

我的解决方案概述如下:

I start with calculating the maximum amount of raw data that you can fit into 140 utf8 characters. (I am assuming utf8, which is what the original website claimed twitter stored it's messages in. This differs from the problem statement above, which asks for utf16.) Using this utf8 faq, I calculate that the maximum number of bits you can encode in a single utf8 character is 31 bits. In order to do this, I would use all characters that are in the U-04000000 – U-7FFFFFFF range. (1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, there are 31 x's, therefore I could encode up to 31 bits). 31 bits times 140 characters equals 4340 bits. Divide that by 8 to get 524.5, and round that down to 542 bytes. (If we restrict ourselves to utf16, then we could only store 2 bytes per character, which would equal 280 bytes). Compress the image down using standard jpg compression. Resize the image to be approximately 50x50px, then attempt to compress it at various compression levels until you have an image that is as close to 542 bytes as possible without going over. This is an example of the mona lisa compressed down to 536 bytes. Encode the raw bits of the compressed image into utf-8 characters. Replace each x in the following bytes: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, with the bits from the image. This part would probably be the part where the majority of the code would need to be written, because there isn't anything that currently exists that does this.

我知道你们要的是代码,但我不想花时间写代码。我想一个有效的设计至少可以激励其他人编写代码。

我认为我提出的解决方案的主要好处是,它重用了尽可能多的现有技术。尝试编写一个好的压缩算法可能很有趣,但肯定会有更好的算法,很可能是由拥有高等数学学位的人编写的。

另一个重要的注意事项是,如果决定utf16是首选编码,那么这个解决方案就会失败。当压缩到280字节时,jpeg就不能正常工作了。不过,对于这个特定的问题语句,可能有比jpg更好的压缩算法。

Roger Alsing写的这个遗传算法有一个很好的压缩比,代价是很长的压缩时间。得到的顶点向量可以使用有损或无损算法进一步压缩。

http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/

这将是一个很有趣的程序,但我还是算了。

这里的压缩效果很好。

http://www.intuac.com/userport/john/apt/

http://img86.imageshack.us/img86/4169/imagey.jpg http://img86.imageshack.us/img86/4169/imagey.jpg

我使用以下批处理文件:

capt mona-lisa-large.pnm out.cc 20
dapt out.cc image.pnm
Pause

结果文件大小为559字节。

好吧,这是我的:nanocrunch.cpp和CMakeLists.txt文件,使用CMake构建它。它依赖magick++ ImageMagick API进行大部分图像处理。它还需要用于字符串编码的大字节算法的GMP库。

我基于分形图像压缩的解决方案,有一些独特的扭曲。基本思路是将图像缩小到50%,然后在各个方向上寻找与原始图像中不重叠的块相似的部分。它对这个搜索采取了非常强力的方法,但这只是让我更容易介绍我的修改。

第一个修改是,我的程序不仅考虑90度的旋转和翻转,还考虑45度的方向。每块多一个比特,但它极大地提高了图像质量。

另一件事是,为每个块的每个颜色组件存储对比度/亮度调整太昂贵了。相反,我存储了一个重度量化的颜色(调色板只有4 * 4 * 4 = 64种颜色),它只是以某种比例混合在一起。从数学上讲,这相当于对每种颜色进行可变亮度和恒定对比度的调整。不幸的是,这也意味着没有负对比来翻转颜色。

一旦计算出每个块的位置、方向和颜色,它就会将其编码为UTF-8字符串。首先,它生成一个非常大的bigum来表示块表中的数据和图像大小。这个方法类似于Sam Hocevar的解——一个基数随位置变化的大数。

Then it converts that into a base of whatever the size of the character set available is. By default, it makes full use of the assigned unicode character set, minus the less than, greater than, ampersand, control, combining, and surrogate and private characters. It's not pretty but it works. You can also comment out the default table and select printable 7-bit ASCII (again excluding <, >, and & characters) or CJK Unified Ideographs instead. The table of which character codes are available is stored a run-length encoded with alternating runs of invalid and valid characters.

Anyway, here are some images and times (as measured on my old 3.0GHz P4), and compressed to 140 characters in the full assigned unicode set described above. Overall, I'm fairly pleased with how they all turned out. If I had more time to work on this, I'd probably try to reduce the blockiness of the decompressed images. Still, I think the results are pretty good for the extreme compression ratio. The decompressed images are bit impressionistic, but I find it relatively easy to see how bits correspond to the original.

Stack Overflow Logo (8.6s编码,7.9s解码,485字节): http://i44.tinypic.com/2w7lok1.png

Lena(32.8秒编码,13.0秒解码,477字节): http://i42.tinypic.com/2rr49wg.png http://i40.tinypic.com/2rhxxyu.png

《蒙娜丽莎》(43.2秒编码,14.5秒解码,490字节) http://i41.tinypic.com/ekgwp3.png http://i43.tinypic.com/ngsxep.png

编辑:CJK统一字符

Sam在评论中问关于在CJK中使用这个。以下是从CJK统一字符集压缩到139个字符的《蒙娜丽莎》版本:

http://i43.tinypic.com/2yxgdfk.png Xu Lin 驞 wailing amidine ð ª ‰  according to 蛥 㶉 kink Qu 朖 辿 Korea storage epilepsy squid slanting Yi 㻅 ð ¦ ˆ  urea 蕜 held last releasing frequency smartweed debt 鑡 Zi Duo 靊 obscure ð ª ¸ 嚵 Yue poly Tui 慛 絖 makes 馿 Xie 櫰 Jue Yun Wen we Ying 粂 敽 tooth 稉 qing 蔍 螎 Xiang Bu 覧 light ð «   refrigerator vigilant 冧 笻 Ji mixed Yun core 譶 Hui dropping out Gui Yi 偞 媄 child Yu 梀 ð « – ’ scythe 猳 閺 Sheng and shan beak 伆 杇 婣 sucking 鐤 諽 鷍 owl ð « ˜ ¡ rob 毤 two-string fiddle 誖 terpene wish 珵 仾 xenon 熜 Huo 繴 Hui jin mustache 杍 嚖 Tong schoenberg 縿 bread 珝 dad 擸 萿

我使用的程序顶部的调优参数是:19,19,4,4,3,10,11,1000,1000。我还注释掉了number_assigned和codes的第一个定义,并取消注释掉了它们的最后一个定义,以选择CJK统一字符集。

关于这个挑战的编码/解码部分。 base16b.org是我试图指定一种标准方法,用于安全有效地在较高的Unicode平面中编码二进制数据。

一些特点:

只使用Unicode的私有用户区域 编码高达17位每个字符;几乎是Base64的三倍 提供了encode/decode的参考Javascript实现 包括一些示例编码,包括Twitter和Wordpress

对不起,这个答案对于最初的比赛来说太晚了。我独立于这篇文章开始了这个项目,这是我在中途发现的。