Zip vs Tar
TIL

Written by Triet Pham on 24/02/2018

Notes taken from a comment on this Go issue:
https://github.com/golang/go/issues/24057#issuecomment-367964910

Having dealt extensively with both the TAR and ZIP formats, I have concluded that both are terrible formats and consistent support for them is awful. However, in terms of better world-wide support, I vote for TAR.

Here is my assessment of the advantages and disadvantages of each:

  • ZIP has its heritage in Windows and better supports Windowisms.
  • TAR has its heritage in Unix and better supports Unixisms.
  • ZIP was designed to be written in a random-access manner, but can be written in a streaming manner.
  • ZIP must be read in a random-access manner, but some readers incorrectly assume you can read in a streaming manner.
  • TAR is written in a streaming manner.
  • TAR is read in a streaming manner.
  • ZIP allows random-access reading between files.
  • ZIP does not allow random-access reading within a file (if compression is used).
  • TAR does not allow random-access reading between files.
  • TAR does not allow random-access reading within a file.
  • ZIP has one primary format which is well-specified, but attempts to be extension friendly with its "extra" fields, which has ironically led to a huge number of variants (too many to mention). Many variants conflict with each other, but nothing prevents you from placing multiple conflicting "extra" fields together. The specifications for these extensions are not always easy to find.
  • TAR has 3 main competing formats (USTAR, PAX, and GNU). USTAR is entirely a subset of PAX, so there are really two competing formats. The two most common tools, GNU tar and BSD tar, have strong support for both formats. The PAX format is standardized, and the GNU format is well-documented.
  • ZIP has issues with character encoding, making exact representation of filenames difficult (especially when it comes to foreign languages). Support for the UTF-8 flag is fairly poor.
  • TAR has better support for character encodings. The USTAR format is always ASCII, PAX format is always UTF-8, but unfortunately GNU format is specified as "local variant of 8-bit ASCII".
  • ZIP supports symlinks via certain "extra" header extensions, but I strongly discourage relying on them, since they are not widely compatible.
  • TAR supports symlinks.
  • TAR and ZIP can both support file sizes up to 16 EiB (about 18.4 EB).
  • ZIP has max path names of 64KiB.
  • TAR supports unlimited path names (via GNU or PAX formats).
  • ZIP has DEFLATE compression built-in, but wide support for other compression algorithms is poor.
  • TAR has no compression. However, it is very common to compress an archive with the GZIP, BZIP2, XZ, or (upcoming) ZSTD formats. GZIP and ZSTD are well-specified. BZIP2 and XZ are "specified" according to the reference implementation.
  • ZIP compresses on a per-file basis, while usually the entire TAR archive is compressed. Thus, TAR tends to have smaller archives. Since these Go source-code archives usually contain many small files, compressed TAR can gain a decent size reduction over compressed ZIP.
  • ZIP has poor support for Unix permissions (via the various competing Unix "extra" fields).
  • TAR has good support for Unix permissions.
  • ZIP has built-in CRC protection for the data.
  • TAR has no CRC protection for the data.
  • ZIP has poor support for accurate timestamps (the original format stored the local date at 2s resolution without storing the timezone). Various "extra" fields store the timestamps as seconds since Unix epoch.
  • TAR has good support for accurate timestamps.
  • ZIP has no support for sparse files.
  • TAR has some support for sparse files.
  • The main advantage of ZIP is the ability to random-access between files, though I'm not sure that feature is a deal breaker. There are ways to pass through a TAR archive once and build an index that provides random access between files and within a file.
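
Two of the points above are easy to try out: whole-archive compression versus per-file compression, and building an index over a TAR archive in a single pass. The following is a minimal Python sketch using only the standard library, not the approach discussed in the Go issue. The sample file names and contents (`src/file_000.go` and so on) and all helper function names are made up for illustration, and the index only works on an uncompressed TAR, since offsets into a gzip stream cannot be used for direct seeking.

```python
import gzip
import io
import tarfile
import zipfile


def make_files(n=200):
    # Hypothetical sample data: many small, similar source-like files.
    body = "func f() {}\n" * 20
    return {
        f"src/file_{i:03d}.go": f"package demo\n\n// file {i}\n{body}".encode()
        for i in range(n)
    }


def build_zip(files):
    # ZIP with DEFLATE: every member is compressed on its own.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_tar(files):
    # Uncompressed TAR in PAX format, written sequentially.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare_sizes(files):
    # Returns (zip size, tar.gz size); gzip sees the whole TAR as one stream.
    return len(build_zip(files)), len(gzip.compress(build_tar(files)))


def index_tar(raw):
    # One sequential pass: record where each regular file's data starts.
    index = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tf:
        for member in tf:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(raw, index, name):
    # Random access between files: jump straight to the recorded offset.
    offset, size = index[name]
    return raw[offset:offset + size]


if __name__ == "__main__":
    files = make_files()
    zip_size, tgz_size = compare_sizes(files)
    print("zip bytes:", zip_size, "tar.gz bytes:", tgz_size)
    tar_raw = build_tar(files)
    index = index_tar(tar_raw)
    print(read_member(tar_raw, index, "src/file_007.go")[:12])
```

In a real tool the index would be stored next to the archive so that the sequential pass happens only once. Random access inside a compressed TAR needs extra work (for example, compressing in independent blocks), which this sketch does not attempt.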

