zlib中使用的压缩算法本质上与gzip和zip中使用的压缩算法相同。gzip和zip是什么?它们有什么不同之处,又有什么相同之处?


当前回答

ZIP是一种文件格式,用于存储任意数量的文件和文件夹以及无损压缩。它对所使用的压缩方法没有严格的假设,但最常用于DEFLATE。

Gzip既是一种基于DEFLATE的压缩算法,但较少受到潜在专利等的阻碍,也是一种用于存储单个压缩文件的文件格式。当与tar结合使用时,它支持压缩任意数量的文件和文件夹。生成的文件扩展名为.tgz或.tar.gz,通常称为tarball。

zlib是一个函数库,它将DEFLATE封装在最常见的LZ77版本中。

其他回答

ZIP是一种文件格式,用于存储任意数量的文件和文件夹以及无损压缩。它对所使用的压缩方法没有严格的假设,但最常用于DEFLATE。

Gzip既是一种基于DEFLATE的压缩算法,但较少受到潜在专利等的阻碍,也是一种用于存储单个压缩文件的文件格式。当与tar结合使用时,它支持压缩任意数量的文件和文件夹。生成的文件扩展名为.tgz或.tar.gz,通常称为tarball。

zlib是一个函数库,它将DEFLATE封装在最常见的LZ77版本中。

最重要的区别是,gzip只能压缩单个文件,而zip可以逐个压缩多个文件,然后将它们归档为单个文件。 因此,gzip在大多数情况下与tar一起出现(不过也有其他可能性)。这也带来了一些好处。

如果您有一个很大的存档,并且只需要其中的一个文件,那么您必须解压缩整个gzip文件才能获得该文件。如果您有一个zip文件,这是不需要的。

另一方面,如果压缩10个相似甚至完全相同的文件,zip归档将会大得多,因为每个文件都是单独压缩的,而在gzip中结合tar压缩单个文件,如果文件相似(相等)则更有效。

简式:

.zip是一种存档格式,通常使用Deflate压缩方法。.gz gzip格式适用于单个文件,也使用Deflate压缩方法。通常,gzip与tar结合使用来生成压缩归档格式。tar.gz。zlib库提供了Deflate压缩和解压代码,供zip、gzip、png(在Deflate数据上使用zlib包装器)和许多其他应用程序使用。

长形式:

ZIP格式是由Phil Katz开发的,是一种具有开放规范的开放格式,他的实现PKZIP是共享软件。它是一种存档格式,存储文件及其目录结构,其中每个文件都单独压缩。文件类型为“。zip”。这些文件以及目录结构都可以被加密。

ZIP格式支持以下几种压缩方法:

    0 - The file is stored (no compression)
    1 - The file is Shrunk
    2 - The file is Reduced with compression factor 1
    3 - The file is Reduced with compression factor 2
    4 - The file is Reduced with compression factor 3
    5 - The file is Reduced with compression factor 4
    6 - The file is Imploded
    7 - Reserved for Tokenizing compression algorithm
    8 - The file is Deflated
    9 - Enhanced Deflating using Deflate64(tm)
   10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
   11 - Reserved by PKWARE
   12 - File is compressed using BZIP2 algorithm
   13 - Reserved by PKWARE
   14 - LZMA
   15 - Reserved by PKWARE
   16 - IBM z/OS CMPSC Compression
   17 - Reserved by PKWARE
   18 - File is compressed using IBM TERSE (new)
   19 - IBM LZ77 z Architecture 
   20 - deprecated (use method 93 for zstd)
   93 - Zstandard (zstd) Compression 
   94 - MP3 Compression 
   95 - XZ Compression 
   96 - JPEG variant
   97 - WavPack compressed data
   98 - PPMd version I, Rev 1
   99 - AE-x encryption marker (see APPENDIX E)

Methods 1 to 7 are historical and are not in use. Methods 9 through 98 are relatively recent additions and are in varying, small amounts of use. The only method in truly widespread use in the ZIP format is method 8, Deflate, and to some smaller extent method 0, which is no compression at all. Virtually every .zip file that you will come across in the wild will use exclusively methods 8 and 0, likely just method 8. (Method 8 also has a means to effectively store the data with no compression and relatively little expansion, and Method 0 cannot be streamed whereas Method 8 can be.)

ISO/IEC 21320-1:2015文件容器标准是一种受限的zip格式,例如用于Java归档文件(.jar), Office Open XML文件(Microsoft Office .docx, .xlsx, .pptx), Office文档格式文件(.odt, .ods, .odp)和EPUB文件(. EPUB)。该标准将压缩方法限制为0和8,以及不加密或签名等其他约束。

大约在1990年,Info-ZIP小组编写了可移植的、免费的、开源的zip和unzip实用程序实现,支持使用Deflate格式进行压缩,并支持该格式和早期格式的解压缩。这极大地扩展了.zip格式的使用。

In the early '90s, the gzip format was developed as a replacement for the Unix compress utility, derived from the Deflate code in the Info-ZIP utilities. Unix compress was designed to compress a single file or stream, appending a .Z to the file name. compress uses the LZW compression algorithm, which at the time was under patent and its free use was in dispute by the patent holders. Though some specific implementations of Deflate were patented by Phil Katz, the format was not, and so it was possible to write a Deflate implementation that did not infringe on any patents. That implementation has not been so challenged in the last 20+ years. The Unix gzip utility was intended as a drop-in replacement for compress, and in fact is able to decompress compress-compressed data (assuming that you were able to parse that sentence). gzip appends a .gz to the file name. gzip uses the Deflate compressed data format, which compresses quite a bit better than Unix compress, has very fast decompression, and adds a CRC-32 as an integrity check for the data. The header format also permits the storage of more information than the compress format allowed, such as the original file name and the file modification time.

Though compress only compresses a single file, it was common to use the tar utility to create an archive of files, their attributes, and their directory structure into a single .tar file, and then compress it with compress to make a .tar.Z file. In fact, the tar utility had and still has the option to do the compression at the same time, instead of having to pipe the output of tar to compress. This all carried forward to the gzip format, and tar has an option to compress directly to the .tar.gz format. The tar.gz format compresses better than the .zip approach, since the compression of a .tar can take advantage of redundancy across files, especially many small files. .tar.gz is the most common archive format in use on Unix due to its very high portability, but there are more effective compression methods in use as well, so you will often see .tar.bz2 and .tar.xz archives.

与.tar不同,.zip在末尾有一个中心目录,它提供了一个内容列表。这和单独的压缩提供了对.zip文件中各个条目的随机访问。为了构建一个目录,.tar文件必须从头到尾进行解压缩和扫描,这就是.tar文件列出的方式。

Shortly after the introduction of gzip, around the mid-1990s, the same patent dispute called into question the free use of the .gif image format, very widely used on bulletin boards and the World Wide Web (a new thing at the time). So a small group created the PNG losslessly compressed image format, with file type .png, to replace .gif. That format also uses the Deflate format for compression, which is applied after filters on the image data expose more of the redundancy. In order to promote widespread usage of the PNG format, two free code libraries were created. libpng and zlib. libpng handled all of the features of the PNG format, and zlib provided the compression and decompression code for use by libpng, as well as for other applications. zlib was adapted from the gzip code.

上述专利现已全部失效。

The zlib library supports Deflate compression and decompression, and three kinds of wrapping around the deflate streams. Those are no wrapping at all ("raw" deflate), zlib wrapping, which is used in the PNG format data blocks, and gzip wrapping, to provide gzip routines for the programmer. The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .zip format, which is another format that wraps around deflate compressed data.

Zlib目前广泛应用于数据传输和存储。例如,服务器和浏览器的大多数HTTP事务使用zlib压缩和解压缩数据,特别是HTTP报头Content-Encoding: deflate表示封装在zlib数据格式中的deflate压缩方法。

Different implementations of deflate can result in different compressed output for the same input data, as evidenced by the existence of selectable compression levels that allow trading off compression effectiveness for CPU time. zlib and PKZIP are not the only implementations of deflate compression and decompression. Both the 7-Zip archiving utility and Google's zopfli library have the ability to use much more CPU time than zlib in order to squeeze out the last few bits possible when using the deflate format, reducing compressed sizes by a few percent as compared to zlib's highest compression level. The pigz utility, a parallel implementation of gzip, includes the option to use zlib (compression levels 1-9) or zopfli (compression level 11), and somewhat mitigates the time impact of using zopfli by splitting the compression of large files over multiple processors and cores.