[kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz, instead of tar -cJf tarball.tar.xz a b c ?
B.S.
bs27975.2 at gmail.com
Fri Oct 28 19:00:20 EDT 2016
On 10/28/2016 02:16 AM, Chris Frey wrote:
> On Fri, Oct 28, 2016 at 01:54:25AM -0400, Chris Frey wrote:
>> In my tests, a 1M split size seemed too big for tar to recover from
>> by itself, but fixtar was able to do it:
>>
>> https://github.com/BestSolution-at/fixtar
>
> Putting this together with gzrecover (gzrt) may be enough to not
> have to worry about this question at all. If it is possible to
> recover from both corrupt gzip blocks and corrupt tar blocks to
> get at the data on the other side of the file, then going to the
> trouble of compressing first may not gain much.
You lost me there, at "then going to the trouble of compressing first
may not gain much." - that seems to say compressing isn't worth the
trouble.
Which seems inconsistent with where you've been coming from / what
you've been saying - so I doubt that's what you mean. (Not that you've
been beating anything in particular, particularly.)
It does seem arguable to not compress at all, though, given compressing
/ deduping filesystems.
Let's bear in mind that none of this conversation has been about mere
file transfer. I.e., I can see value in compressing for transmission
over slow links: uncompress on the other side, and if there are errors,
retransmit. As links get faster, though, e.g. within premises, I expect
there must come some point where time to compress + time to decompress >
time to transmit uncompressed. Let alone with today's fast processors,
or with net traffic that's already compressed in the first place.
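For what it's worth, a rough way to feel out that crossover - a sketch
only, where 'host' and 'bigdir' are placeholders, and the assumption is
gzip's typical single-threaded throughput (a few tens of MB/s) against
a gigabit link (~125 MB/s):

# raw: wall time is roughly size / link speed
time ( tar -cf - bigdir | ssh host 'cat > /dev/null' )

# compressed: gzip in the pipe can itself become the bottleneck
time ( tar -czf - bigdir | ssh host 'gunzip > /dev/null' )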
But in this problem case - long-term storage (on a compressing /
deduping filesystem), bit-rotted along the way - I've not yet
encountered anything contesting the intuitive idea of not compressing.
And let's also not lose sight that what's compelling about compressing
is more the integrity failure detection than the file size. And that
de/compression often inherently keeps just one file - i.e. no sidecar
files to additionally keep track of. For those purposes of rot
detection, things like md5sums serve just as well. So,
--to-command='md5sum -' at creation time, re-run periodically with the
outputs diff'ed, seems to take us to an equivalent place. (Regardless
of compressing on the fly to file.tar.gz, or gzip file.tar.)
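A minimal sketch of that, assuming GNU tar (which exports TAR_FILENAME
to the --to-command child for each member); backup.tar, the member
names, and the helper-script name are all illustrative:

# helper: hash each member's contents, labelled with its archive name
cat > md5member.sh <<'EOF'
#!/bin/sh
# tar feeds each member's data on stdin; GNU tar sets TAR_FILENAME
printf '%s  %s\n' "$(md5sum | cut -d' ' -f1)" "$TAR_FILENAME"
EOF
chmod +x md5member.sh

# at creation time: build the archive, then record per-member hashes
tar -cf backup.tar a b c
tar -xf backup.tar --to-command=./md5member.sh | sort > backup.md5

# periodically: regenerate and diff - a differing line is a rotted member
tar -xf backup.tar --to-command=./md5member.sh | sort | diff backup.md5 -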
What isn't compelling is 'gzip file.tar' going bad: with the gzip
stream spanning the whole archive, one rotted block renders the entire
tar broken. Versus individually gzipped files tar'red together - broken
gzip members being easier to skip over, by having tar just skip to the
next file header.
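Concretely, the two layouts from the subject line (file names
illustrative):

# compress first, then tar: each member is an independent xz stream,
# so one rotted member needn't take out the rest
xz -k a b c
tar -cf tarball.tar a.xz b.xz c.xz

# tar first, then compress: one xz stream over the whole archive,
# so rot mid-stream can break everything after it
tar -cJf tarball.tar.xz a b c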
And (if on a compressing / deduping filesystem) gzip rot is off the
table entirely, while file rot detection is maintained via md5sums.
P.S. I did just note from the tar manual that tar keeps an internal
checksum (in each member's header). And tar --list will report a
checksum failure - so at least you'll know you have a broken tar. You
won't know where, though - which is where md5 or sha1 comes into play.
(Or compressing.)
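So a periodic health check might look like this (a sketch; backup.md5
and md5member.sh are from the --to-command example above):

# header-checksum pass: tells you the tar is broken, not where
tar -tf backup.tar > /dev/null || echo "backup.tar: damaged"

# per-member hashes locate the damage
tar -xf backup.tar --to-command=./md5member.sh | sort | diff backup.md5 -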