[kwlug-disc] How GIT stores stuff.
Bob B
bob at softscape.ca
Tue May 3 10:36:23 EDT 2022
Folks,
I know I said I'd report on this next month, but I figured this was as good a place as any to discuss what I found.
A quick google found this at Stack Overflow: https://stackoverflow.com/questions/8198105/how-does-git-store-files
It has some references to deeper information that look enticing, but in summary I think it confirms what I said: GIT stores complete files, not deltas. At least not deltas in the sense of 'diffs' of text files.
From a quick read of that post, GIT stores a snapshot of the tree for each commit, but where a file has not changed, it stores only a reference to the copy it already has (kinda like a hard link to a file). So in a sense, the snapshots _are_ deltas: they contain complete copies of the changed files but only references to the unchanged ones, so no data is duplicated. (File-level dedupe, in a sense.)
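To make the "reference instead of a copy" idea concrete, here is a rough Python sketch of a content-addressed store. The blob ID calculation (SHA-1 over a "blob <size>\0" header plus the content) matches what GIT does for file contents, but everything else is my own simplification: the names store and snapshot are made up for illustration, and real GIT also compresses objects and stores trees and commits as objects of their own.

```python
import hashlib

def blob_id(content: bytes) -> str:
    # GIT names a file's content by the SHA-1 of a short header plus the bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical in-memory object store: object id -> complete file content.
store = {}

def snapshot(files):
    """Record a snapshot: map each file name to the id of its content."""
    tree = {}
    for name, content in files.items():
        oid = blob_id(content)
        # Identical content hashes to the same id, so it is stored only once.
        store.setdefault(oid, content)
        tree[name] = oid
    return tree

commit1 = snapshot({"README": b"hello\n",
                    "main.c": b"int main(void) { return 0; }\n"})
commit2 = snapshot({"README": b"hello\n",
                    "main.c": b"int main(void) { return 1; }\n"})

# README did not change, so both snapshots point at the same stored object.
print(commit1["README"] == commit2["README"])
# main.c changed, so the second snapshot holds a complete new copy of it.
print(commit1["main.c"] == commit2["main.c"])
print(len(store), "objects stored for 4 file entries")
```

Two snapshots with two files each end up as three stored objects: one shared README and two complete versions of main.c.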
After the call last night, I remembered some context for why this was the case. All of the older version control systems that stored diffs of changes to files would get slow as the changes grew. For example, RCS stores reverse deltas of files. If you want the latest revision, it is immediately available because it is stored complete in the RCS file. But if you want to go back in revisions, RCS starts with the latest file and applies patches from its history, one after another, until the file is back at the revision you asked for. That is great (and an improvement over its predecessor SCCS) if you want recent revisions, but it can get slow if you want to go further back in history.
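Here is a small Python sketch of the reverse-delta idea described above. It is not RCS's actual file format or algorithm; ReverseDeltaFile, make_delta and apply_delta are names I made up, and the deltas are built with difflib.SequenceMatcher from the standard library purely for illustration. The point is only that the latest revision is stored whole and each older one costs one more patch application.

```python
from difflib import SequenceMatcher

def make_delta(src, dst):
    """Describe how to rebuild dst from src: copy src ranges or insert lines."""
    delta = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=src, b=dst).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # replace or insert: the delta must carry the dst lines itself
            delta.append(("insert", dst[j1:j2]))
        # delete: nothing to emit, the src lines are simply skipped
    return delta

def apply_delta(src, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(src[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

class ReverseDeltaFile:
    """Hypothetical store: latest revision kept whole, older ones as deltas."""

    def __init__(self, lines):
        self.latest = list(lines)
        self.deltas = []  # deltas[0] turns latest into the revision before it

    def commit(self, lines):
        lines = list(lines)
        # Keep a delta that leads from the new revision back to the old one.
        self.deltas.insert(0, make_delta(lines, self.latest))
        self.latest = lines

    def checkout(self, steps_back):
        text = self.latest
        # One patch application per step back in history.
        for delta in self.deltas[:steps_back]:
            text = apply_delta(text, delta)
        return text

f = ReverseDeltaFile(["a", "b"])
f.commit(["a", "b", "c"])
f.commit(["a", "x", "c"])
print(f.checkout(0))  # latest revision, no patching needed
print(f.checkout(2))  # oldest revision, two deltas applied in sequence
```

Fetching a revision n steps back walks n deltas, which is where the slowdown on deep history comes from.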
This was GIT's solution to the speed issue... store the files in their complete forms so you don't have to apply patches to get to a specific revision.
Bob.
RCS wiki page: https://en.wikipedia.org/wiki/Revision_Control_System