[NCLUG] crazy git question

Bob Proulx bob at proulx.com
Wed Dec 5 19:45:44 MST 2018


Gabriel L. Somlo wrote:
> Is there some automated way to locate the "closest" upstream commit to
> the state of my (probably hacked) tarball ?
> 
> Right now, I went back in the commit log to around the day, month, and
> year associated with the files in the tarball, and I'm checking for
> the presence of individual commits' changes in my tarball, trying to
> find the first upstream commit that's *not* reflected in the tarball :)
> 
> That's slow and awkward, and I'm hoping there's a better way...
> 
> Tried articulating this in a google search, but that didn't get me very
> far :)
> 
> Any ideas much appreciated!

I will brainstorm along with you.  This isn't something I have ever
needed to do before, so I don't have a known working recipe for it.
But this is the way I think and would attack the problem.  Some of
these will probably seem pretty silly after other people suggest even
better ideas and point out why this won't work! :-)

Several different thoughts occur to me...

One idea is that you could generate the git blob hashes right from the
command line.  A blob hash is just the SHA-1 of the string "blob", a
space, the length of the file in bytes, and a null byte, followed by
the file contents.  That can be a little tricky to get right, but
there is "git hash-object" that can compute it for you.

  git hash-object file1
  31aba918e21c03b4754d4b69fb891c55c5fd48f6
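
To make the blob format concrete, here is a rough sketch of doing the
same computation by hand.  It assumes a repository using the default
SHA-1 object format and that sha1sum from coreutils is installed; the
function name blob_hash is made up for illustration.

```shell
# Hypothetical helper: compute a git blob hash without git.
# Assumes SHA-1 object format and coreutils sha1sum.
blob_hash() {
  # File size in bytes; tr strips the padding some wc versions add.
  size=$(wc -c < "$1" | tr -d ' ')
  # Header is "blob <size>" plus a NUL byte, then the raw contents.
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Running blob_hash file1 should print the same value as
git hash-object file1, which is a handy sanity check.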

Then look for that git hash, the first few characters of it, in the
upstream git logs.  I would use git-whatchanged for that but
unfortunately that does not work in the presence of branches.  So some
scrambling around there will be needed.

If the hash does not appear there then it is a file you have modified
locally.  If it does appear, the commit where upstream later changed
the file away from that hash gives a latest limit on where your copy
diverged, and the earliest appearance of the hash marks the earliest
your copy could have diverged.
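
One possible way to do that lookup, as a sketch: newer versions of git
(2.16 and later, as far as I know) have a --find-object option to
"git log" that lists commits which add or remove a given object.  The
function name find_blob_commits is made up here, and it is meant to be
run from inside the upstream clone with the path to a file from the
tarball.

```shell
# Hypothetical helper: list upstream commits that introduced or
# removed the exact contents of one tarball file.
# Assumes git >= 2.16 for --find-object; run inside the upstream clone.
find_blob_commits() {
  hash=$(git hash-object "$1")
  # --all searches every branch; output is "<commit> <date>", newest first.
  git log --all --format='%H %cd' --date=short --find-object="$hash"
}
```

No output would mean those exact contents never existed upstream, so
that file is one that was modified locally.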

Another idea would be to clone the upstream.  Get a list of all of the
commit hashes with git rev-list.  Check out each of them in turn and
diff them against your copy.  Since there won't be a zero diff match
use diffstat to get a quantitative account of the differences.  Figure
out which version has the smallest number of differences.

Something like:

  for r in $(git rev-list master); do
    git checkout -q "$r"
    # Label each summary line with its commit, and skip the .git directory.
    printf '%s ' "$r"
    diff -r -x .git . ../localcopy | diffstat | tail -n 1
  done |
  tee fileofstuff

And then sort through there looking for the smallest number of
differences.
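
The sorting step could be sketched like this.  It assumes fileofstuff
holds one diffstat summary line per commit, ideally with the commit
hash somewhere on the same line so the winner can be traced back (that
layout is an assumption).  The function name rank_diffstat is made up
for illustration.

```shell
# Hypothetical helper: rank diffstat summary lines by total changed lines.
# Prints "<insertions+deletions><TAB><original line>", smallest first.
rank_diffstat() {
  awk '{
    ins = 0; del = 0
    # The count is the field just before "insertion(s)" or "deletion(s)".
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^insertion/) ins = $(i-1)
      if ($i ~ /^deletion/)  del = $(i-1)
    }
    printf "%d\t%s\n", ins + del, $0
  }' "$1" | sort -n | head -n 5
}
```

Calling rank_diffstat fileofstuff then shows the five closest
candidates.  Fewest changed lines is only a heuristic, so the top few
are worth inspecting by hand.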

Those are my initial ideas!

Bob

