If a picture's worth 1000 words, how much of a picture can you fit in 140 characters?

Note: That's it folks! Bounty deadline is here, and after some tough deliberation, I have decided that Boojum's entry just barely edged out Sam Hocevar's. I will post more detailed notes once I've had a chance to write them up. Of course, everyone should feel free to continue to submit solutions and improve solutions for people to vote on. Thank you to everyone who submitted an entry; I enjoyed all of them. This has been a lot of fun for me to run, and I hope it's been fun for both the entrants and the spectators.

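I came across an interesting thread about compressing images into a Twitter comment, and lots of people in that thread (and in a thread on Reddit) had suggestions about different ways to do it. So I figure it would make a good coding challenge: let people put their money where their mouth is, and show how their ideas about encoding can get more detail into the limited space available.

I challenge you to come up with a general-purpose system for encoding images into 140-character Twitter messages and decoding them into an image again. You can use Unicode characters, so you get more than 8 bits per character. Even allowing for Unicode characters, however, you will need to compress images into a very small amount of space; this will certainly be a lossy compression, so there will have to be subjective judgements about how good each result looks.

Here is the result that the original author, Quasimondo, got from his encoding (the image is licensed under a Creative Commons Attribution-Noncommercial license):

Can you do better?

Rules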

1. Your program must have two modes: encoding and decoding.
2. When encoding:
   1. Your program must take as input a graphic in any reasonable raster graphic format of your choice. We'll say that any raster format supported by ImageMagick counts as reasonable.
   2. Your program must output a message which can be represented in 140 or fewer Unicode code points; 140 code points in the range U+0000–U+10FFFF, excluding non-characters (U+FFFE, U+FFFF, U+nFFFE, U+nFFFF where n is 1–10 hexadecimal, and the range U+FDD0–U+FDEF) and surrogate code points (U+D800–U+DFFF). It may be output in any reasonable encoding of your choice; any encoding supported by GNU iconv will be considered reasonable, and your platform native encoding or locale encoding would likely be a good choice. See Unicode notes below for more details.
3. When decoding:
   1. Your program should take as input the output of your encoding mode.
   2. Your program must output an image in any reasonable format of your choice, as defined above, though for output vector formats are OK as well.
   3. The image output should be an approximation of the input image; the closer you can get to the input image, the better.
   4. The decoding process may have no access to any other output of the encoding process other than the output specified above; that is, you can't upload the image somewhere and output the URL for the decoding process to download, or anything silly like that.
4. For the sake of consistency in user interface, your program must behave as follows:
   1. Your program must be a script that can be set to executable on a platform with the appropriate interpreter, or a program that can be compiled into an executable.
   2. Your program must take as its first argument either encode or decode to set the mode.
   3. Your program must take input in one or more of the following ways (if you implement the one that takes file names, you may also read and write from stdin and stdout if file names are missing):
      - Take input from standard in and produce output on standard out.

            my-program encode <input.png >output.txt
            my-program decode <output.txt >output.png

      - Take input from a file named in the second argument, and produce output in the file named in the third.

            my-program encode input.png output.txt
            my-program decode output.txt output.png

5. For your solution, please post:
   1. Your code, in full, and/or a link to it hosted elsewhere (if it's very long, or requires many files to compile, or something).
   2. An explanation of how it works, if it's not immediately obvious from the code or if the code is long and people will be interested in a summary.
   3. An example image, with the original image, the text it compresses down to, and the decoded image.
6. If you are building on an idea that someone else had, please attribute them. It's OK to try to do a refinement of someone else's idea, but you must attribute them.
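The command-line contract above can be sketched as a small Python skeleton. This is only an illustration under stated assumptions: `encode` and `decode` are hypothetical placeholders for an entry's real codec, and UTF-8 is assumed as the text encoding (the rules allow any encoding supported by iconv).

```python
import sys


def encode(image_bytes: bytes) -> str:
    """Hypothetical placeholder: return a message of at most 140 code points."""
    raise NotImplementedError("plug your encoder in here")


def decode(message: str) -> bytes:
    """Hypothetical placeholder: return image file bytes for a message."""
    raise NotImplementedError("plug your decoder in here")


def main(argv, encoder=encode, decoder=decode):
    # First argument selects the mode, as the rules require.
    if len(argv) < 2 or argv[1] not in ("encode", "decode"):
        sys.stderr.write("usage: my-program encode|decode [input [output]]\n")
        return 2
    mode = argv[1]
    in_path = argv[2] if len(argv) > 2 else None
    out_path = argv[3] if len(argv) > 3 else None

    # Read raw bytes from the named file, or from stdin when no name is given.
    if in_path:
        with open(in_path, "rb") as f:
            data = f.read()
    else:
        data = sys.stdin.buffer.read()

    if mode == "encode":
        message = encoder(data)
        if len(message) > 140:  # len() of a Python 3 str counts code points
            sys.stderr.write("message exceeds 140 code points\n")
            return 1
        result = message.encode("utf-8")
    else:
        result = decoder(data.decode("utf-8"))

    # Write to the named file, or to stdout when no name is given.
    if out_path:
        with open(out_path, "wb") as f:
            f.write(result)
    else:
        sys.stdout.buffer.write(result)
    return 0

# In a real script, finish with: sys.exit(main(sys.argv))
```

Both invocation styles from the rules map onto this: `my-program encode <input.png >output.txt` leaves the file arguments empty, while `my-program encode input.png output.txt` fills them in.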

Guidelines

These are basically rules that may be broken, suggestions, or scoring criteria:

- Aesthetics are important. I'll be judging, and suggest that other people judge, based on:
  - How good the output image looks, and how much it looks like the original.
  - How nice the text looks. Completely random gobbledygook is OK if you have a really clever compression scheme, but I also want to see answers that turn images into multi-lingual poems, or something clever like that. Note that the author of the original solution decided to use only Chinese characters, since it looked nicer that way.
  - Interesting code and clever algorithms are always good. I like short, to the point, and clear code, but really clever complicated algorithms are OK too as long as they produce good results.
- Speed is also important, though not as important as how good a job compressing the image you do. I'd rather have a program that can convert an image in a tenth of a second than something that will be running genetic algorithms for days on end.
- I will prefer shorter solutions to longer ones, as long as they are reasonably comparable in quality; conciseness is a virtue.
- Your program should be implemented in a language that has a freely-available implementation on Mac OS X, Linux, or Windows. I'd like to be able to run the programs, but if you have a great solution that only runs under MATLAB or something, that's fine.
- Your program should be as general as possible; it should work for as many different images as possible, though some may produce better results than others. In particular:
  - Having a few images built into the program that it matches and writes a reference to, and then produces the matching image upon decoding, is fairly lame and will only cover a few images.
  - A program that can take images of simple, flat, geometric shapes and decompose them into some vector primitive is pretty nifty, but if it fails on images beyond a certain complexity it is probably insufficiently general.
  - A program that can only take images of a particular fixed aspect ratio but does a good job with them would also be OK, but not ideal.
  - You may find that a black and white image can get more information into a smaller space than a color image. On the other hand, that may limit the types of image it's applicable to; faces come out fine in black and white, but abstract designs may not fare so well.
  - It is perfectly fine if the output image is smaller than the input, while being roughly the same proportion. It's OK if you have to scale the image up to compare it to the original; what's important is how it looks.
- Your program should produce output that could actually go through Twitter and come out unscathed. This is only a guideline rather than a rule, since I couldn't find any documentation on the precise set of characters supported, but you should probably avoid control characters, funky invisible combining characters, private use characters, and the like.
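One rough way to check the last guideline is to screen an encoded message by Unicode general category. The sketch below uses Python's standard `unicodedata` module; the set of "risky" categories and the decision to reject whitespace are assumptions for illustration, since the exact set of characters Twitter passes through is not documented here.

```python
import unicodedata

# General categories the guideline suggests avoiding (an assumed mapping):
# Cc control, Cf format (mostly invisible), Co private use,
# Cs surrogate, Cn unassigned/noncharacter, Mn/Me combining marks.
RISKY_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Cn", "Mn", "Me"}


def looks_tweet_safe(ch: str) -> bool:
    """Heuristic only: True if ch is not in a risky category and not whitespace."""
    return unicodedata.category(ch) not in RISKY_CATEGORIES and not ch.isspace()


def unsafe_positions(message: str):
    """Return the indexes of characters that fail the heuristic."""
    return [i for i, ch in enumerate(message) if not looks_tweet_safe(ch)]
```

An encoder could run `unsafe_positions` over its output as a sanity check and remap any flagged code points before posting.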

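Scoring rubric

As a general guide to how I will rank solutions when choosing the one I accept, let's say that I'll probably be evaluating solutions on a 25 point scale (this is very rough, and I won't be scoring anything directly, just using this as a basic guideline):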

- 15 points for how well the encoding scheme reproduces a wide range of input images. This is a subjective, aesthetic judgement.
  - 0 means that it doesn't work at all, it gives the same image back every time, or something
  - 5 means that it can encode a few images, though the decoded version looks ugly and it may not work at all on more complicated images
  - 10 means that it works on a wide range of images, and produces pleasant looking images which may occasionally be distinguishable
  - 15 means that it produces perfect replicas of some images, and even for larger and more complex images, gives something that is recognizable. Or, perhaps it does not make images that are quite recognizable, but produces beautiful images that are clearly derived from the original.
- 3 points for clever use of the Unicode character set
  - 0 points for simply using the entire set of allowed characters
  - 1 point for using a limited set of characters that are safe for transfer over Twitter or in a wider variety of situations
  - 2 points for using a thematic subset of characters, such as only Han ideographs or only right-to-left characters
  - 3 points for doing something really neat, like generating readable text or using characters that look like the image in question
- 3 points for clever algorithmic approaches and code style
  - 0 points for something that is 1000 lines of code only to scale the image down, treat it as 1 bit per pixel, and base64 encode that
  - 1 point for something that uses a standard encoding technique and is well written and brief
  - 2 points for something that introduces a relatively novel encoding technique, or that is surprisingly short and clean
  - 3 points for a one liner that actually produces good results, or something that breaks new ground in graphics encoding (if this seems like a low number of points for breaking new ground, remember that a result this good will likely have a high score for aesthetics as well)
- 2 points for speed. All else being equal, faster is better, but the above criteria are all more important than speed
- 1 point for running on free (open source) software, because I prefer free software (note that C# will still be eligible for this point as long as it runs on Mono, likewise MATLAB code would be eligible if it runs on GNU Octave)
- 1 point for actually following all of the rules. These rules have gotten a bit big and complicated, so I'll probably accept otherwise good answers that get one small detail wrong, but I will give an extra point to any solution that does actually follow all of the rules

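Reference images

Some folks have asked for some reference images. Here are a few reference images that you can try; smaller versions are embedded here, and they all link to larger versions of the image if you need those:

I am offering a 500 rep bounty (plus the 50 that StackOverflow kicks in) for the solution that I like best, based on the above criteria. Of course, I encourage everyone else to vote for their favorite solutions here as well.

Note on deadline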

This contest will run until the bounty runs out, about 6 PM on Saturday, May 30. I can't say the precise time it will end; it may be anywhere from 5 to 7 PM. I will guarantee that I'll look at all entries submitted by 2 PM, and I will do my best to look at all entries submitted by 4 PM; if solutions are submitted after that, I may not have a chance to give them a fair look before I have to make my decision. Also, the earlier you submit, the more chance you will have for voting to be able to help me pick the best solution, so try and submit earlier rather than right at the deadline.

Notes on Unicode

There has also been some confusion on exactly what Unicode characters are allowed. The range of possible Unicode code points is U+0000 to U+10FFFF. There are some code points which are never valid to use as Unicode characters in any open interchange of data; these are the noncharacters and the surrogate code points. Noncharacters are defined in the Unicode Standard 5.1.0 section 16.7 as the values U+FFFE, U+FFFF, U+nFFFE, U+nFFFF where n is 1–10 hexadecimal, and the range U+FDD0–U+FDEF. These values are intended to be used for application-specific internal usage, and conforming applications may strip these characters out of text processed by them. Surrogate code points, defined in the Unicode Standard 5.1.0 section 3.8 as U+D800–U+DFFF, are used for encoding characters beyond the Basic Multilingual Plane in UTF-16; thus, it is impossible to represent these code points directly in the UTF-16 encoding, and it is invalid to encode them in any other encoding. Thus, for the purpose of this contest, I will allow any program which encodes images into a sequence of no more than 140 Unicode code points from the range U+0000–U+10FFFF, excluding all noncharacters and surrogate code points as defined above.
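To get a feel for how much room these rules leave, the allowed code points can simply be counted. The sketch below applies the exclusions listed above (surrogates, U+FDD0–U+FDEF, and the last two code points of each of the 17 planes) and converts the count into an upper bound on payload size; it ignores the preference for assigned or Twitter-safe characters, so practical schemes will have less room than this.

```python
import math


def is_allowed(cp: int) -> bool:
    """True if cp is a code point permitted by the contest rules."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xD800 <= cp <= 0xDFFF:             # surrogate code points
        return False
    if 0xFDD0 <= cp <= 0xFDEF:             # noncharacter block
        return False
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):  # last two code points of each plane
        return False
    return True


allowed = sum(1 for cp in range(0x110000) if is_allowed(cp))
bits_per_char = math.log2(allowed)
total_bits = 140 * bits_per_char

print(allowed)                  # 1111998
print(round(bits_per_char, 3))  # 20.085
print(int(total_bits) // 8)     # 351 whole bytes in 140 code points
```

So the theoretical ceiling is a little over 20 bits per character, or roughly 350 bytes per message.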

I will prefer solutions that use only assigned characters, and even better ones that use clever subsets of assigned characters or do something interesting with the character set they use. For a list of assigned characters, see the Unicode Character Database; note that some characters are listed directly, while some are listed only as the start and end of a range. Also note that surrogate code points are listed in the database, but forbidden as mentioned above. If you would like to take advantage of certain properties of characters for making the text you output more interesting, there are a variety of databases of character information available, such as a list of named code blocks and various character properties.

Since Twitter does not specify the exact character set they support, I will be lenient about solutions which do not actually work with Twitter because certain characters count extra or certain characters are stripped. It is preferred but not required that all encoded outputs should be able to be transferred unharmed via Twitter or another microblogging service such as identi.ca. I have seen some documentation stating that Twitter entity-encodes <, >, and &, and thus counts those as 4, 4, and 5 characters respectively, but I have not tested that out myself, and their JavaScript character counter doesn't seem to count them that way.

Hints and links

- The definition of valid Unicode characters in the rules is a bit complicated. Choosing a single block of characters, such as CJK Unified Ideographs (U+4E00–U+9FCF) may be easier.
- You may use existing image libraries, like ImageMagick or Python Imaging Library, for your image manipulation.
- If you need some help understanding the Unicode character set and its various encodings, see this quick guide or this detailed FAQ on UTF-8 in Linux and Unix.
- The earlier you get your solution in, the more time I (and other people voting) will have to look at it. You can edit your solution if you improve it; I'll base my bounty on the most recent version when I take my last look through the solutions.
- If you want an easy image format to parse and write (and don't want to just use an existing format), I'd suggest using the PPM format. It's a text based format that's very easy to work with, and you can use ImageMagick to convert to and from it.
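As an illustration of the single-block hint, here is a minimal sketch that stores 14 bits per character using only code points U+4E00 to U+8DFF, which lie inside the CJK Unified Ideographs block. The 14-bit width and the function names are choices made for this example, not something the challenge prescribes. At 14 bits per character, 140 characters carry exactly 1960 bits, or 245 bytes.

```python
BASE = 0x4E00        # first CJK Unified Ideograph
BITS = 14            # 2**14 = 16384 values, U+4E00..U+8DFF, inside the block


def bytes_to_cjk(data: bytes) -> str:
    """Pack bytes into characters, BITS bits per character, zero-padded."""
    value = int.from_bytes(data, "big")
    nbits = len(data) * 8
    nchars = -(-nbits // BITS)              # ceiling division
    value <<= nchars * BITS - nbits         # pad on the right with zero bits
    chars = []
    for i in range(nchars):
        shift = (nchars - 1 - i) * BITS
        chars.append(chr(BASE + ((value >> shift) & ((1 << BITS) - 1))))
    return "".join(chars)


def cjk_to_bytes(text: str, nbytes: int) -> bytes:
    """Inverse of bytes_to_cjk; nbytes is the original payload length."""
    value = 0
    for ch in text:
        value = (value << BITS) | (ord(ch) - BASE)
    value >>= len(text) * BITS - nbytes * 8  # drop the padding bits
    return value.to_bytes(nbytes, "big")
```

The decoder needs to know the payload length; with a fixed 245-byte payload that can be a constant, otherwise it would have to be stored in the message itself.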


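Current answer

The following isn't a formal submission, since my software hasn't been tailored in any way for the indicated task. DLI can be described as an optimizing general-purpose lossy image codec. It's the PSNR and MS-SSIM record holder for image compression, and I thought it would be interesting to see how it performs for this particular task. I used the reference Mona Lisa image provided and scaled it down to 100x150, then used DLI to compress it to 344 bytes.

Mona Lisa DLI http://i40.tinypic.com/2md5q4m.png

For comparison with the JPEG and IMG2TWIT compressed samples, I used DLI to compress the image to 534 bytes as well. The JPEG is 536 bytes and IMG2TWIT is 534 bytes. The images have been scaled up to approximately the same size for easy comparison. JPEG is the left image, IMG2TWIT is the center, and DLI is the right image.

Comparison http://i42.tinypic.com/302yjdg.png

The DLI image manages to preserve some of the facial features, most notably the famous smile :).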

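Other answers

Posting a monochrome or greyscale image should increase the size of the image that can be encoded into that space, since you don't care about colour.

Possibly augment the challenge to upload three images which, when recombined, give you a full-colour image while still maintaining a monochrome version in each separate image.

Add some compression to the above and it could start looking viable...

Nice!!! Now you guys have piqued my interest. No work will be done for the rest of the day.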

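Image files and Python source (versions 1 and 2)

Version 1: Here is my first attempt. I will update as I go.

I have got the SO logo down to 300 characters, almost lossless. My technique uses conversion to SVG vector art, so it works best on line art. It is actually an SVG compressor; it still requires the original art to go through a vectorisation stage.

For my first attempt I used an online service for the PNG trace, but there are many free and non-free tools that can handle this part, including potrace (open source).

Here are the results:

Original SO Logo http://www.warriorhut.org/graphics/svg_to_unicode/so-logo.png (original)
Decoded SO Logo http://www.warriorhut.org/graphics/svg_to_unicode/so-logo-decoded.png (after encoding and decoding)

Characters: 300

Time: Not measured, but practically instant (not including the vectorisation/rasterisation steps)

The next stage will be to embed 4 symbols (SVG path points and commands) per Unicode character. At the moment my Python build does not have wide character (UCS4) support, which limits my resolution per character. I've also limited the maximum range to the lower end of the Unicode reserved range, 0xD800; however, once I build a list of allowed characters and a filter to avoid the disallowed ones, I can theoretically push the required number of characters as low as 70-100 for the logo above.

A limitation of this method at present is that the output size is not fixed. It depends on the number of vector nodes/points after vectorisation. Automating this limit will require either pixelating the image (which removes the main benefit of vectors) or repeatedly running the paths through a simplification stage until the desired node count is reached (which I'm currently doing manually in Inkscape).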
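The "simplify until the node count fits" loop described above could be automated along the following lines. This is a sketch only: it swaps in the Ramer-Douglas-Peucker algorithm on a plain polyline as a stand-in for Inkscape's Simplify command (which works differently and on Bézier paths), and the function names, starting tolerance and growth factor are all assumptions.

```python
import math


def _point_line_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def rdp(points, tolerance):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves around it.
    left = rdp(points[: index + 1], tolerance)
    right = rdp(points[index:], tolerance)
    return left[:-1] + right


def simplify_to_budget(points, max_nodes, start=0.1, growth=1.5):
    """Raise the tolerance until the path has at most max_nodes points."""
    tolerance = start
    result = list(points)
    while len(result) > max_nodes and len(result) > 2:
        result = rdp(points, tolerance)
        tolerance *= growth
    return result
```

Because the node budget translates directly into a character budget, a loop like this would let the encoder target 140 characters without manual grooming, at the cost of detail on busy images.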

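Version 2

UPDATE: v2 is now qualified to compete. Changes:

- Command line control of input/output and debugging
- Uses an XML parser (lxml) to handle SVG instead of regex
- Packs 2 path segments per Unicode symbol
- Documentation and cleanup
- Supports style="fill:color" and fill="color"
- Document width/height packed into a single character
- Path color packed into a single character
- Color compression is achieved by throwing away 4 bits of color data per color, then packing it into a character via hex conversion.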
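The colour and size packing in that list can be illustrated with a short sketch. The actual bit layout and character offset used by the entry are not given in the post, so `CHAR_BASE`, the 7-bit width/height fields and the function names below are assumptions chosen for the example.

```python
CHAR_BASE = 0x4E00  # assumed offset; the entry's real offset is not stated


def pack_color(r: int, g: int, b: int) -> str:
    """Keep the top 4 bits of each 8-bit channel; store 12 bits in one char."""
    value = ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)
    return chr(CHAR_BASE + value)


def unpack_color(ch: str):
    """Expand each 4-bit channel back to 8 bits (0x0 -> 0x00, 0xF -> 0xFF)."""
    value = ord(ch) - CHAR_BASE
    return tuple(((value >> shift) & 0xF) * 17 for shift in (8, 4, 0))


def pack_size(width: int, height: int) -> str:
    """Pack two values in 0..127 into one 14-bit character (assumed layout)."""
    if not (0 <= width <= 127 and 0 <= height <= 127):
        raise ValueError("width and height must fit in 7 bits each")
    return chr(CHAR_BASE + ((width << 7) | height))


def unpack_size(ch: str):
    value = ord(ch) - CHAR_BASE
    return value >> 7, value & 0x7F
```

Dropping the low 4 bits of each channel reduces 24-bit colour to 4096 possible values, which is what makes a single character per path colour feasible.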

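Characters: 133

Time: A few seconds

v2 decoded http://www.warriorhut.org/graphics/svg_to_unicode/so-logo-decoded-v2.png (after encoding and decoding, version 2)

As you can see, there are some artifacts this time. It isn't a limitation of the method but a mistake somewhere in my conversions. The artifacts happen when the points go outside the range 0.0 - 127.0, and my attempts to constrain them have had mixed success. The solution is simply to scale the image down; however, I had trouble scaling the actual points rather than the artboard or group matrix, and I'm too tired now to care. In short, if your points are in the supported range, it generally works.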
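Scaling the actual points, rather than the artboard or a group transform, might look like the sketch below. It assumes the path has already been flattened to a list of absolute (x, y) coordinates; real SVG paths with relative commands or nested transforms would need to be resolved first, and the function names are hypothetical.

```python
def fit_points(points, limit=127.0):
    """Translate and uniformly scale (x, y) points into the range [0, limit]."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Use the larger extent so the aspect ratio is preserved.
    extent = max(max(xs) - min_x, max(ys) - min_y)
    scale = limit / extent if extent > 0 else 1.0
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]


def quantize(points):
    """Round fitted points to integers, clamped so each fits in 7 bits."""
    return [
        (min(127, max(0, round(x))), min(127, max(0, round(y))))
        for x, y in points
    ]
```

For example, `quantize(fit_points(path_points))` yields coordinates that are guaranteed to stay inside the 0 to 127 range before they are packed into characters.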

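I believe the kink in the middle is due to a handle moving to the other side of a handle it's linked to. Basically, the points are too close together in the first place. Running a simplify filter over the source image before compressing it should fix this and shave off some unnecessary characters.

UPDATE: This method is fine for simple objects, so I needed a way to simplify complex paths and reduce noise. I used Inkscape for this task. I've had some luck grooming out unnecessary paths using Inkscape, but I've not had time to try automating it. I've made some sample SVGs using the Inkscape "Simplify" function to reduce the number of paths.

Simplify works OK, but it can be slow with this many paths.

autotrace example http://www.warriorhut.org/graphics/svg_to_unicode/autotrace_16_color_manual_reduction.png
cornell box http://www.warriorhut.com/graphics/svg_to_unicode/cornell_box_simplified.png
lena http://www.warriorhut.com/graphics/svg_to_unicode/lena_std_washed_autotrace.png

thumbnails traced http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_autotrace.png

Here are some ultra low-res shots. These would be closer to the 140 character limit, though some clever path compression may be needed as well.

groomed http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_groomed.png Simplified and despeckled.

triangulated http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_triangulated.png Simplified, despeckled and triangulated.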

autotrace --output-format svg --output-file cornell_box.svg --despeckle-level 20 --color-count 64 cornell_box.png

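ABOVE: Simplified paths using autotrace.

Unfortunately my parser doesn't handle the autotrace output, so I don't know how many points are in use or how far to simplify; sadly there's little time to write it before the deadline. It's much easier to parse than the Inkscape output, though.

The idea of storing a bunch of reference images is interesting. Would it be so wrong to store, say, 25Mb of sample images, and have the encoder try to compose an image using bits of those? With such a minuscule pipe, the machinery at either end is by necessity going to be much greater than the volume of data passing through, so what's the difference between 25Mb of code, and 1Mb of code plus 24Mb of image data?

(Note that the original guidelines ruled out restricting the input to images already in the library; I'm not suggesting that.)

In the original challenge, the size limit was defined as whatever Twitter still allows you to send if you paste your text into their textbox and press "update". As some people correctly noticed, this is different from what you could send as an SMS text message from your mobile.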

What is not explicitly mentioned (but what my personal rule was) is that you should be able to select the tweeted message in your browser, copy it to the clipboard and paste it into a text input field of your decoder so it can display it. Of course you are also free to save the message as a text file and read it back in or write a tool which accesses the Twitter API and filters out any message that looks like an image code (special markers anyone? wink wink). But the rule is that the message has to have gone through Twitter before you are allowed to decode it.

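Good luck with those 350 bytes - I doubt that you will be able to make use of them.

What follows is my approach to the problem. I must admit that this was quite an interesting project to work on; it is definitely outside my normal realm of work and has given me something new to learn about.

The basic idea behind mine is as follows:

1. Down-sample the image to gray-scale so that there are a total of 16 different shades
2. Perform RLE on the image
3. Pack the results into UTF-16 characters
4. Perform RLE on the packed results to remove any duplication of characters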
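The first three steps above can be sketched in a few lines of Python. This is not the answer's actual code: the bit layout (4 bits of shade plus 8 bits of run length per character), the `BASE` offset and the function names are assumptions, and the input is taken to be a flat list of 8-bit gray values rather than a bitmap object. The second RLE pass over the packed string is left out.

```python
BASE = 0x4E00      # assumed offset into CJK Unified Ideographs
MAX_RUN = 256      # run length stored in 8 bits as (run - 1)


def quantize16(gray_pixels):
    """Map 8-bit gray values (0..255) to 16 shades (0..15)."""
    return [p >> 4 for p in gray_pixels]


def rle(shades):
    """Run-length encode into (shade, run) pairs with run <= MAX_RUN."""
    runs = []
    for s in shades:
        if runs and runs[-1][0] == s and runs[-1][1] < MAX_RUN:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


def pack(runs):
    """One character per run: 4 bits of shade, 8 bits of (run - 1)."""
    return "".join(chr(BASE + ((s << 8) | (n - 1))) for s, n in runs)


def unpack(text):
    """Expand packed characters back into the list of shades."""
    shades = []
    for ch in text:
        value = ord(ch) - BASE
        shades.extend([value >> 8] * ((value & 0xFF) + 1))
    return shades
```

With this layout every character falls in U+4E00 to U+5DFF. The decoder would additionally need the image width to rebuild rows from the flat list of shades.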

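It turns out that this does work, but only to a limited extent, as you can see from the sample images below. In terms of output, what follows is a sample tweet, specifically for the Lena image shown in the samples.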

乤 乤 arcandor 唂 伂 倂 倁 qi nong 2 companies 倁 3 companies 倁 2 companies 伂 8 companies 伂 3 companies 伂 5 companies 倂 倃 伂 倁 3 qi tuan qi 2 伂 倃 5 companies 倁 3 companies 倃 4 companies 倂 enterprise 倁 enterprise 伂 2 companies 伂 5 companies 倁 enterprise 伂 쥹 皗 鞹 Bei 륶 䦽 阹 럆 䧜 tsubaki 籫 릹 靭 욶 옷 뎷 step 㰷 qian 䴗 Cuan 㞳 鞷 㬼 mongoose 鏙 돗 鍴 祳 㭾 뤶 plunge 焻 � 乹 Ꮛ Dai 䍼

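As you can see, I did try to constrain the character set a bit; however, I ran into issues doing this when storing the image color data. Also, this encoding scheme tends to waste a bunch of bits that could be used for additional image information.

In terms of run times, the code is extremely fast for small images, about 55ms for the sample images provided, but the time does increase with larger images. For the 512x512 Lena reference image the running time was 1182ms. I should note that the code itself isn't very well optimized for performance (e.g. everything is worked with as a Bitmap), so the times could go down a bit after some refactoring.

Please feel free to offer any suggestions on what I could have done better or what might be wrong with the code. The full listing of run times and sample output can be found at the following location: http://code-zen.info/twitterimage/

Update One

I've updated the RLE code used when compressing the tweet string to do a basic look-back and, where it applies, use that for the output. This only works for the number/value pairs, but it does save a couple of characters of data. The running time and the image quality are more or less the same, but the tweets tend to be a bit smaller now. I will update the chart on the website as I complete the testing. What follows is one of the example tweet strings, again for the small version of Lena: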

乤乤万乐唂伂倂倁企侬2企倁3企倁禹伂8企伂丁伂5企倂倃伂倁具儁企2伂倃加倁分队倃4企倂企倁企伂配制伂近视眼倁企伂쥹皗鞹鐾륶䦽阹럆䧜椿籫릹第욶옷뎷步㰷歉䴗镩㞳鞷㬼獴鏙돗鍴祳㭾뤶殒焻�乹Ꮛ靆䍼

Update Two

Another small update, but I modified the code to pack the color shades into groups of three as opposed to four, this uses some more space, but unless I'm missing something it should mean that "odd" characters no longer appear where the color data is. Also, I updated the compression a bit more so it can now act upon the entire string as opposed to just the color count block. I'm still testing the run times, but they appear to be nominally improved; however, the image quality is still the same. What follows is the newest version of the Lena tweet:
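A possible reading of the groups-of-three change, sketched under assumptions: four 4-bit shades fill all 16 bits of a UTF-16 code unit, so the packed values can land anywhere in the Basic Multilingual Plane, including control and surrogate ranges, whereas three shades need only 12 bits and can be offset into a single block. The `BASE` offset and function names below are illustrative and not taken from the answer's code.

```python
BASE = 0x4E00  # assumed offset; the answer does not state the one it uses


def pack_shades(shades, per_char=3):
    """Pack 4-bit shades, per_char at a time, into one character each."""
    padded = list(shades) + [0] * (-len(shades) % per_char)
    out = []
    for i in range(0, len(padded), per_char):
        value = 0
        for s in padded[i:i + per_char]:
            value = (value << 4) | s
        out.append(chr(BASE + value))
    return "".join(out)


def unpack_shades(text, count, per_char=3):
    """Inverse of pack_shades; count drops the zero padding at the end."""
    shades = []
    for ch in text:
        value = ord(ch) - BASE
        for k in reversed(range(per_char)):
            shades.append((value >> (4 * k)) & 0xF)
    return shades[:count]
```

With three shades per character the output stays within 4096 consecutive code points (U+4E00 to U+5DFF here), at the cost of needing a third more characters for the same pixel data than four per character would.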

2 乤 arcandor 唂 伂 倂 倁 qi nong 2 companies 倁 3 companies 倁 ウ 伂 8 companies 伂 エ 伂 5 companies 倂 倃 伂 倁 グ jung enterprise 2 伂 倃 ガ 倁 ジ 倃 4 companies 倂 enterprise 倁 enterprise 伂 ツ 伂 ス 倁 enterprise 伂 坹 live keeps smashing strand 刾 啩 RongLi blow 婩 媷 advised 圿 Guo live 妛 putting casket choke 婣 cold nowadays, camp with polish serve Ji collapse are compared, the female 媗 definitely xing zong Yao 夽 xing 唹 until cold 圶 埫 奫 ð ª after he drank kratos appears

StackOverflow Logo http://code-zen.info/twitterimage/images/stackoverflow-logo.bmp
Cornell Box http://code-zen.info/twitterimage/images/cornell-box.bmp
Lena http://code-zen.info/twitterimage/images/lena.bmp
Mona Lisa http://code-zen.info/twitterimage/images/mona-lisa.bmp