[kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz, instead of tar -cJf tarball.tar.xz a b c ?
B.S.
bs27975.2 at gmail.com
Wed Oct 26 15:49:45 EDT 2016
On 10/26/2016 11:24 AM, bob+kwlug at softscape.ca wrote:
>> Yep. Came to a similar conclusion myself, using --to-command.
>>
> Cool. I hadn't noticed this option before. Will have to grok that for
> a bit and see how it could be useful.
--to-command='md5sum -' ... run it and save the output at creation time,
run it again any other time, diff the two ... fini.
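A minimal sketch of that scheme, assuming GNU tar (which exports
TAR_FILENAME to the --to-command child); all file names here are made up
for illustration:

```shell
# Make a toy archive to verify.
printf 'hello\n' > a.txt
printf 'world\n' > b.txt
tar -cf demo.tar a.txt b.txt

# Helper invoked once per archive member: checksum stdin, label the line
# with the member name GNU tar passes in the environment.
cat > ck.sh <<'EOF'
#!/bin/sh
printf '%s  %s\n' "$(md5sum | cut -d' ' -f1)" "$TAR_FILENAME"
EOF
chmod +x ck.sh

tar -xf demo.tar --to-command=./ck.sh > manifest.orig  # saved at creation time
tar -xf demo.tar --to-command=./ck.sh > manifest.now   # run any other time
diff manifest.orig manifest.now && echo "contents verified ... fini"
```

Nothing is written to disk on the verification passes; tar streams each
member's contents straight into the helper.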
> Yeah, this always bothered me about tar | gzip but not enough to do
> anything about it. I guess I've never been burned by it yet to have
> spent cycles thinking about it.
And should / when burned ... TOO LATE! <sigh>
>> D'OH! My eyes have passed over that I don't know how many times.
>> I've been saying there oughta be a --in-command or --from-command
>> to match the --to-command. Double D'OH!
>
> Did a quick experiment today as follows:
>
> tar -cvf /tmp/1.tar --use-compress-program /tmp/PROG.sh /some/dir
>
> where PROG.sh was just 'tee /tmp/2.out'
>
> What was cool was that 1.tar and 2.out were IDENTICAL!
Check me on this, please:
Eventually I got around to reading the blurb on this (man tar) - note:
pay particular attention to BUGS; I skipped over it too often. In
particular, use https://www.gnu.org/software/tar/manual/ rather than
man - although it's not much better. (Lines in the man page were
curiously cut off ... then I read the BUGS bit.)
I have not yet had a chance to conduct my own similar experiments, but
when I read the man tar blurb, I came away with the impression that this
isn't going to do what we're talking about.
My expectation (does your experience bear this out?) is that this gets
called once per invocation - to un/zip the whole archive - NOT once per
file within the archive.
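A quick check of that expectation, assuming GNU tar: a pass-through
"compressor" that logs each invocation and keeps a copy of the stream
(reproducing the tee experiment quoted above). File names are made up:

```shell
printf 'x\n' > f1
printf 'y\n' > f2
printf 'z\n' > f3

# Fake compressor: log that we were called, then pass the archive
# stream through unchanged while keeping a copy.
cat > prog.sh <<'EOF'
#!/bin/sh
echo invoked >> invocations.log
exec tee copy.out
EOF
chmod +x prog.sh
: > invocations.log

tar -cf out.tar -I ./prog.sh f1 f2 f3

cmp out.tar copy.out && echo "identical, as in the quoted experiment"
wc -l < invocations.log   # one invocation for the whole archive, not three
```

If the log shows one line for a three-file archive, the program is
indeed wrapped around the whole stream, once per invocation.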
Moreover, IIReadC, the facility is too limited (it receives stdin and
writes stdout) to do what I described, such as adding sidecar files
automatically. (Although one could tar the input and sidecar files
together into one stdout stream.) And, in the process, it would probably
lose the metadata. Only testing will show - my guess is that tar must
re-wrap the filename around things, which would rename this 'sub-'tar to
the original filename; and how would the untarrer know to treat it
specially when processing? It's also unclear whether tar re-wraps the
metadata around it, e.g. if links are just pointers (strings) within the
tar stream to elsewhere ...
>
> I don't think it would take much to add some computation of the
> stream and data into the stream inside of PROG.sh to checksum the
> individual files. Just need to know enough about the data structures
> of a tar file to do this as it flies by.
Probably not (no need to know the internal structure) - details are at
that link; search for TAR_FILENAME within it and you'll find the
environment variables, which are probably the only thing of use.
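To see exactly what the child process gets, dump the TAR_* environment
for one member. (An empty file is used here so nothing is written to the
pipe; GNU tar assumed, and the variable names come from that manual
section.)

```shell
: > empty.txt
tar -cf t.tar empty.txt

# The --to-command child's stdout becomes tar's stdout, so we can just
# run env and filter for the variables tar sets per member.
tar -xf t.tar --to-command='env' | grep '^TAR_' | sort
```

Expect entries such as TAR_FILENAME, TAR_SIZE, TAR_MODE, TAR_UID and
TAR_GID - enough to rebuild most of the metadata outside the stream.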
> Hear, hear! Tar, cpio and rsync. Essentials.
Some day I'll dig into cpio; it took me long enough to dig into tar, and
I sense there's much more than I can grok at the moment - see backups,
and dump. (Which may incorporate the --from-command functionality?)
Perhaps instead of, or in addition to, cpio: afio. IIRC, mondoarchive
uses it over cpio; I came across the reasoning for that somewhere in the
forum at some point. (Bruno, I trust - way too much goodness and obvious
thinking within the mondo scripts not to have 100% confidence in him.)
However, when I glanced over man afio and man cpio a couple of days
back, I didn't quickly see the --remove-files functionality of tar,
which I think is a deal breaker.
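For reference, the deal-maker in question (GNU tar): each source file is
deleted once it has been written to the archive.

```shell
printf 'x\n' > gone.txt
tar --remove-files -cf keep.tar gone.txt   # archive, then delete source
test ! -e gone.txt && echo "source removed"
tar -tf keep.tar                           # the content survives in the archive
```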
> Although, I think I'd prefer to have the check summing in-line rather
> than in a sidecar file. I'll need to think about that for a bit.
So would I, but it's absolutely not possible / not going to happen.
As commented earlier in the thread, tar is WAY too widespread for such
radical changes. Any change proposal would have to suit everything from
DOS to Win to HP to Sun to Unix to ...
Good luck with that.
They can't even keep stdout / stderr straight - and there's no way I
could possibly be the first to bring that up, yet it is the way it is.
e.g. tar --checkpoint-action=dot --verbose -cf mytar.tar files* | tee
mytar.filelist ... sends what are supposed to be stderr dots into
mytar.filelist. (Append another --verbose for an ls -l form of file
list. FWIW.)
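One workaround sketch, assuming GNU tar: don't mix the two streams at
all. Create the archive with dots only, then take the file list from a
second pass over the finished archive.

```shell
printf 'x\n' > f1
printf 'y\n' > f2

# Creation pass: progress dots only (one per record with --checkpoint=1).
tar --checkpoint=1 --checkpoint-action=dot -cf mytar.tar f1 f2
echo                                  # terminate the dot line

# Listing pass: a clean file list, no stray dots mixed in.
tar -tf mytar.tar > mytar.filelist
```

The extra read of the archive costs a little time, but the listing is
guaranteed clean regardless of where tar sends its checkpoint output.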
> Although, I wonder if any of these:
>
> -H, --format FORMAT    create archive of the given format. FORMAT is
> one of the following:
...
> --format=v7 old V7 tar format
I looked at that, thought it was =sysv7 that was noted as having
checksums; then when I tried something like tar -H=sysv7 -acf
mytar.tar.xz files, it fell over, saying it couldn't compress such a
format, I think.
Perhaps I need to revisit, if compressing is off the table (what with
btrfs checksumming, compressing, and dedup'ing).
A quick look through the link shows no 'sysv7' search result, so sysv7
wasn't it. 'compress' is very present as a search term, and its
problematic nature is discussed.
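For what it's worth, 'sysv7' isn't a format name GNU tar knows (the
manual lists v7, oldgnu, gnu, ustar, and posix/pax), which may be all
the earlier failure was. And note that every tar format's header
checksum covers only the 512-byte header, not the file data. A sketch
suggesting the format itself isn't the compression blocker:

```shell
printf 'x\n' > f1
tar --format=v7 -cf old.tar f1        # plain old V7 format
tar --format=v7 -czf old.tar.gz f1    # compression wraps the whole stream,
                                      # regardless of the member format
tar -tzf old.tar.gz                   # lists fine
```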
Note: Just saw 'Secondly, multi-volume archives cannot be compressed.'
Let alone: 'Compressed archives are easily corrupted, because compressed
files have little redundancy. The adaptive nature of the compression
scheme means that the compression tables are implicitly spread all over
the archive. If you lose a few blocks, the dynamic construction of the
compression tables becomes unsynchronized, and there is little chance
that you could recover later in the archive.'
Read: lose a block, lose the (tar) file - the compression tables being
spread throughout means a horrible death at the bad block, with likely
no getting past it. Now go back to your 'recover a damaged gzip'
article.
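A small demonstration of that paragraph, assuming GNU tar and gzip:
zero out a few bytes in the middle of a compressed archive and it no
longer verifies at all, whereas the same damage to an uncompressed tar
would hit only one member's data.

```shell
# Incompressible payload so the compressed stream is a realistic size.
head -c 5000 /dev/urandom > big1
head -c 5000 /dev/urandom > big2
tar -czf t.tar.gz big1 big2

# Corrupt 8 bytes in the middle of the compressed stream.
dd if=/dev/zero of=t.tar.gz bs=1 seek=1000 count=8 conv=notrunc 2>/dev/null

gzip -t t.tar.gz 2>/dev/null || echo "compressed archive fails verification"
```

There is no resynchronizing past the damage: everything after the bad
bytes, in every member, is gone.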
> The '-I PROG.sh' approach might be a valuable plug-in unto itself
> such that it could make a tar archive with compression potentially
> mostly salvageable and bakes in integrity checks.
Only if called per file, not once per archive. (I expect.)