The tar File Format: What Is the Advantage of Using the Tar File Format Today?

The tar archiving format is, in computing years, a veritable Methuselah yet it is still in heavy use today. What makes the tar format so useful long after its inception?

Today’s Question & Answer session comes to us courtesy of SuperUser—a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.

The Question

SuperUser reader MarcusJ is curious about the tar format and why we’re still using it after all these years:

I know that tar was made for tape archives back in the day, but today we have archive file formats that both aggregate files and perform compression within the same logical file format.

Questions:

  • Is there a performance penalty during the aggregation/compression/decompression stages for using tar encapsulated in gzip or bzip2, when compared to using a file format that does aggregation and compression in the same data structure? Assume the runtime of the compressor being compared is identical (e.g. gzip and Deflate are similar).

  • Are there features of the tar file format that other file formats, such as .7z and .zip do not have?

  • Since tar is such an old file format, and newer file formats exist today, why is tar (whether encapsulated in gzip, bzip2 or even the new xz) still so widely used today on GNU/Linux, Android, BSD, and other such UNIX operating systems, for file transfers, program source and binary downloads, and sometimes even as a package manager format?

That’s a perfectly reasonable question; so much has changed in the computing world in the last thirty years but we’re still using the tar format. What’s the story?

The Answer

SuperUser contributor Allquixotic offers some insight into the longevity and functionality of the tar format:

Part 1: Performance

Here is a comparison of two separate workflows and what they do.

You have a file on disk blah.tar.gz which is, say, 1 GB of gzip-compressed data which, when uncompressed, occupies 2 GB (so a compression ratio of 50%).

The way that you would create this, if you were to do archiving and compression separately, would be:

tar cf blah.tar files ...

This would result in blah.tar which is a mere aggregation of the files ... in uncompressed form.

Then you would do

gzip blah.tar

This would read the contents of blah.tar from disk, compress them through the gzip compression algorithm, write the contents to blah.tar.gz, then unlink (delete) the file blah.tar.
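
As a rough illustration of the two steps above, and of how they can be collapsed into one pipeline, here is a minimal sketch. The scratch directory and the sample files (files/a.txt, files/b.txt) are invented for the example and are not part of the original answer.

```shell
# Sketch only: the scratch directory and sample files are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir files
printf 'hello\n' > files/a.txt
printf 'world\n' > files/b.txt

# Two-step: archive first, then compress.
# gzip replaces blah.tar with blah.tar.gz when it finishes.
tar cf blah.tar files
gzip blah.tar

# One-step equivalent: tar writes the archive to stdout and gzip
# compresses the stream, so no uncompressed .tar ever lands on disk.
tar cf - files | gzip > piped.tar.gz

# List the members of both archives for comparison.
gzip -dc blah.tar.gz | tar tf - > two_step.list
gzip -dc piped.tar.gz | tar tf - > one_step.list
```

Comparing two_step.list and one_step.list (for example with cmp) should show the same member names in both archives.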

Now, let’s decompress!

Way 1

You have blah.tar.gz, one way or another.

You decide to run:

gunzip blah.tar.gz

This will

  • READ the 1GB compressed data contents of blah.tar.gz.

  • PROCESS the compressed data through the gzip decompressor in memory.

  • As the memory buffer fills up with “a block” worth of data, WRITE the uncompressed data into the file blah.tar on disk and repeat until all the compressed data is read.

  • Unlink (delete) the file blah.tar.gz.

Now you have blah.tar on disk, which is uncompressed but contains one or more files, with very low data structure overhead. The file is only slightly larger than the sum of all the file data: each member adds a 512-byte header and its data is padded to a multiple of 512 bytes.

You run:

tar xvf blah.tar

This will

  • READ the 2GB of uncompressed data contents of blah.tar and the tar file format’s data structures, including information about file permissions, file names, directories, etc.

  • WRITE to disk the 2GB of data plus the metadata. This involves: translating the data structure / metadata information into creating new files and directories on disk as appropriate, or rewriting existing files and directories with new data contents.

The total data we READ from disk in this process was 1GB (for gunzip) + 2GB (for tar) = 3GB.

The total data we WROTE to disk in this process was 2GB (for gunzip) + 2GB (for tar) + a few bytes for metadata = about 4GB.
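
To make the extra round trip through the disk visible, Way 1 can be reproduced on a tiny sample. This is only a sketch: the scratch directory and src/file.txt are made-up names, and the sizes are nowhere near the 1 GB / 2 GB of the example.

```shell
# Sketch only: build a small hypothetical blah.tar.gz to experiment with.
work=$(mktemp -d)
cd "$work"
mkdir src
printf 'sample data\n' > src/file.txt
tar cf blah.tar src
gzip blah.tar
rm -r src

# Way 1, step 1: gunzip writes the whole uncompressed archive back to
# disk as blah.tar and removes blah.tar.gz.
gunzip blah.tar.gz
ls -l blah.tar   # the intermediate archive now occupies disk space

# Way 1, step 2: tar reads that archive from disk again and writes
# the individual files.
tar xf blah.tar
```

The intermediate blah.tar is the 2GB that gets written once by gunzip and then read again by tar in the accounting above.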

Way 2

You have blah.tar.gz, one way or another.

You decide to run:

tar xvzf blah.tar.gz

This will

  • READ the 1GB compressed data contents of blah.tar.gz, a block at a time, into memory.

  • PROCESS the compressed data through the gzip decompressor in memory.

  • As the memory buffer fills up, it will pipe that data, in memory, through to the tar file format parser, which will read the information about metadata, etc. and the uncompressed file data.

  • As the memory buffer fills up in the tar file parser, it will WRITE the uncompressed data to disk, by creating files and directories and filling them up with the uncompressed contents.

The total data we READ from disk in this process was 1GB of compressed data, period.

The total data we WROTE to disk in this process was 2GB of uncompressed data + a few bytes for metadata = about 2GB.
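
Way 2 can be approximated with an explicit pipeline, which makes the in-memory hand-off between the decompressor and the tar parser easier to see. The sketch below uses an invented scratch directory and sample file; whether a given tar implementation decompresses in-process or spawns a separate gzip is an implementation detail not covered here.

```shell
# Sketch only: build a small hypothetical blah.tar.gz to experiment with.
work=$(mktemp -d)
cd "$work"
mkdir src
printf 'sample data\n' > src/file.txt
tar cf - src | gzip > blah.tar.gz
rm -r src

# Roughly what `tar xzf blah.tar.gz` does: gzip decompresses a block at
# a time and the output flows through a pipe straight into the tar
# parser. No intermediate blah.tar is written to disk.
gzip -dc blah.tar.gz | tar xf -
```

After this runs, only blah.tar.gz and the extracted files exist; the compressed archive was read once and the file contents were written once.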

If you notice, the amount of disk I/O in Way 2 is identical to the disk I/O performed by, say, the Zip or 7-Zip programs, adjusting for any differences in compression ratio.

And if compression ratio is your concern, use the Xz compressor to encapsulate tar, and you have an LZMA2-compressed tar archive, which is just as efficient as the most advanced algorithm available to 7-Zip :-)
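
Swapping the compressor is just a matter of changing the second half of the pipeline. The following sketch assumes the xz command-line tool is installed (it checks first and skips otherwise); the scratch directory and sample file are hypothetical.

```shell
# Sketch only: scratch directory and sample file are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir src
printf 'sample data\n' > src/file.txt

# Assumption: xz may not be present on every system, so check first.
if command -v xz >/dev/null 2>&1; then
    # Same tar stream as before, compressed with xz instead of gzip.
    tar cf - src | xz > blah.tar.xz
    # Decompress to stdout and list the archive members.
    xz -dc blah.tar.xz | tar tf -
else
    echo "xz not installed; skipping" >&2
fi
```

The tar layer is unchanged; only the compression wrapped around it differs.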

Part 2: Features

tar stores UNIX permissions within its file metadata, and is very well known and tested for successfully packing up a directory with all kinds of different permissions, symbolic links, etc. There are more than a few instances where one might need to bundle a bunch of files into a single file or stream, but not necessarily compress it (although compression is useful and often used).
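
A small sketch of both points, metadata preservation and use as an uncompressed stream, might look like the following. The directory names (src, dest), the script run.sh, and the symlink latest are invented for the example.

```shell
# Sketch only: all file and directory names here are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir -p src dest
printf '#!/bin/sh\necho hi\n' > src/run.sh
chmod 750 src/run.sh
ln -s run.sh src/latest

# Copy the tree through an uncompressed tar stream. The p flag asks
# the extracting tar to restore the stored permission bits.
tar cf - src | (cd dest && tar xpf -)

# The copy keeps the executable bit and the symbolic link.
ls -l dest/src
```

The same stream could just as well be sent over a network connection instead of a local pipe, which is one reason aggregation without compression remains useful.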

Part 3: Compatibility

Many tools are distributed in source or binary form as .tar.gz or .tar.bz2 because it is a “lowest common denominator” file format: much like most Windows users have access to .zip or .rar decompressors, most Linux installations, even the most basic, will have access to at least tar and gunzip, no matter how old or pared down. Even Android firmwares have access to these tools.

New projects targeting audiences running modern distributions may very well distribute in a more modern format, such as .tar.xz (using the Xz (LZMA) compression format, which compresses better than gzip or bzip2), or .7z, which is similar to the Zip or Rar file formats in that it both compresses and specifies a layout for encapsulating multiple files into a single file.

You don’t see .7z used more often for the same reason that online download stores don’t sell music in brand-new formats like Opus, or video in WebM: compatibility with people running ancient or very basic systems.



Have something to add to the explanation? Sound off in the comments. Want to read more answers from other tech-savvy Stack Exchange users? Check out the full discussion thread here.

Translated from: https://www.howtogeek.com/142023/what-is-the-advantage-of-using-the-tar-file-format-today/
