from Hacker News

Rename files to match hash of contents

by SimplGy on 11/27/16, 12:39 AM with 22 comments

  • by sgentle on 11/27/16, 2:56 AM

    Fun! This is similar to how git stores files internally. You can do some neat tricks like this:

      $ ls
      01.jpg      03.jpg      03_copy.jpg 04.jpg      05.jpg
    
      $ git init
      Initialized empty Git repository in /tmp/test/.git/
    
      $ git hash-object -w *
      82f7d50fc89d2fd47150aff539ea4acf45ec1589
      0080672bc4f248c400d569cce1a2a3d743eb1331
      0080672bc4f248c400d569cce1a2a3d743eb1331
      58db57b10c219b9b71f0223e58a6dc0d51cfe207
      05dcde743807bddaf55ad1231572c1365d4db4af
    
      $ find .git/objects -type f
      .git/objects/00/80672bc4f248c400d569cce1a2a3d743eb1331
      .git/objects/05/dcde743807bddaf55ad1231572c1365d4db4af
      .git/objects/58/db57b10c219b9b71f0223e58a6dc0d51cfe207
      .git/objects/82/f7d50fc89d2fd47150aff539ea4acf45ec1589
    
    If you're curious, you can read more about how it works here: https://git-scm.com/book/en/v1/Git-Internals-Git-Objects
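
    A minimal Python sketch of how git names a blob. It assumes the default SHA-1 object format: git hashes the header "blob <size>" plus a NUL byte, followed by the raw file bytes. Identical contents therefore get identical IDs, which is why 03.jpg and 03_copy.jpg above both map to 0080672... and only four objects appear. The helper names are illustrative and not part of git. On disk, loose objects are also zlib-compressed, but the ID is computed over the uncompressed bytes.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the object ID `git hash-object` prints for these file bytes.

    Assumes a repository using the default SHA-1 object format.
    """
    header = b"blob %d\x00" % len(data)  # "blob <size in bytes>\0"
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(oid: str) -> str:
    """Loose objects are fanned out by the first two hex characters of the ID."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


if __name__ == "__main__":
    oid = git_blob_sha1(b"test content\n")
    print(oid)
    print(loose_object_path(oid))
```

    Feeding it the bytes of each .jpg should reproduce the IDs printed by `git hash-object -w *`.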
  • by stirner on 11/27/16, 5:32 AM

  • by sliken on 11/27/16, 3:46 AM

    Be warned that this (by default) only looks at part of the file. Seems like a poor default.
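
    A small Python illustration of why a partial hash is a risky default. Two files that share a prefix but differ later look identical if only the prefix is hashed. The comment doesn't say which part of the file the tool reads or how much of it. The 1 KiB prefix and the `hash_prefix` helper are arbitrary assumptions for this sketch, not the tool's actual behaviour.

```python
import hashlib
import os
import tempfile
from typing import Tuple

CHUNK = 1 << 16  # stream 64 KiB at a time so large files need not fit in memory


def hash_full(path: str) -> str:
    """MD5 of the entire file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_prefix(path: str, limit: int) -> str:
    """MD5 of only the first `limit` bytes (a hypothetical 'fast' mode)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(limit)).hexdigest()


def demo() -> Tuple[bool, bool]:
    """Compare two files with the same first 1 KiB but different tails."""
    with tempfile.TemporaryDirectory() as d:
        path_a = os.path.join(d, "a.bin")
        path_b = os.path.join(d, "b.bin")
        with open(path_a, "wb") as f:
            f.write(b"x" * 1024 + b"tail A")
        with open(path_b, "wb") as f:
            f.write(b"x" * 1024 + b"tail B")
        prefix_equal = hash_prefix(path_a, 1024) == hash_prefix(path_b, 1024)
        full_equal = hash_full(path_a) == hash_full(path_b)
        return prefix_equal, full_equal


if __name__ == "__main__":
    print(demo())
```

    `demo()` reports the prefix hashes as equal and the full hashes as different. A rename-by-hash tool that uses the prefix would treat these two files as duplicates.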
  • by askvictor on 11/27/16, 4:27 AM

    Don't modern filesystems allow you to store metadata like this separately from the filename or file data?
  • by m0atz on 11/27/16, 7:21 AM

    NirSoft's HashMyFiles already has this functionality built in, as its duplicate search mode. Works extremely well. http://www.nirsoft.net/utils/hash_my_files.html
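
    A rough Python sketch of what a duplicate search by content hash can look like. This is not HashMyFiles' implementation. The common trick shown here is an assumption: group files by size first, because files of different sizes can't be identical, and hash only files whose size matches another file's. All function names are hypothetical.

```python
import hashlib
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def file_md5(path: str, chunk_size: int = 1 << 16) -> str:
    """MD5 of the whole file, streamed in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> List[List[str]]:
    """Return groups of regular files under `root` whose full-content MD5s match."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups: List[List[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: Dict[str, List[str]] = defaultdict(list)
        for path in paths:
            by_hash[file_md5(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


def demo() -> List[List[str]]:
    """Build a throwaway directory and report duplicate groups by basename."""
    files = [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"diff"), ("d.txt", b"longer")]
    with tempfile.TemporaryDirectory() as d:
        for name, data in files:
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        return [[os.path.basename(p) for p in g] for g in find_duplicates(d)]


if __name__ == "__main__":
    print(demo())
```

    Here c.txt has the same size as a.txt and b.txt, so it gets hashed, but it ends up in no group. d.txt is never hashed at all.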
  • by zokier on 11/27/16, 11:06 AM

    Content-addressable storage is always neat. Does anyone know if using truncated MD5 like this is somehow more robust than using some non-crypto hash like SipHash, which already produces 64-bit hashes?
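
    This doesn't settle the question. It is a Python sketch of the two quantities involved: a 64-bit value taken from the first 8 bytes of an MD5 digest, and the birthday-bound estimate of an accidental collision among n files. The estimate assumes the hash behaves like a uniformly random function. Under that assumption, the odds depend on the output width rather than on which well-distributed hash is used. Python's standard library doesn't expose SipHash directly, so only the truncated-MD5 side is shown. Deliberately crafted collisions are a separate matter, since MD5 is known to be broken against them.

```python
import hashlib
import math


def md5_64(data: bytes) -> int:
    """First 8 bytes (64 bits) of the MD5 digest, as an unsigned integer."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def collision_probability(n_items: int, bits: int = 64) -> float:
    """Birthday approximation: P(any two of n_items share a hash) = 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes hash outputs are uniform and independent over 2**bits values.
    """
    space = 2.0 ** bits
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


if __name__ == "__main__":
    print(hex(md5_64(b"")))
    for n in (10**6, 10**9, 2**32):
        print(n, collision_probability(n))
```

    With 64 bits, a million files give an accidental-collision chance of about 3e-8. The chance reaches roughly 39% only around 2**32 files.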
  • by zbuf on 11/27/16, 9:34 AM

    duff (Duplicate File Finder) is another useful tool for this, with flags for acting on duplicates once they are found:

    http://duff.dreda.org/

  • by tucaz on 11/27/16, 2:29 AM

    It would be nice to turn this into a program that stores the previous name so they can be renamed back after deduplicating.

    Very cool!
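
    A minimal Python sketch of that idea. It renames each file to its MD5 plus the original extension, records new name to old name in a JSON manifest, and can undo the renames later. The manifest name `renames.json` and both functions are hypothetical. A file whose hash name is already taken is treated as a duplicate and left in place rather than overwritten. Running it a second time overwrites the previous manifest.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict, List, Tuple

MANIFEST = "renames.json"  # hypothetical manifest file name


def file_md5(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def rename_to_hash(directory: str) -> Dict[str, str]:
    """Rename regular files to <md5><ext>; write and return a {new: old} manifest."""
    mapping: Dict[str, str] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name == MANIFEST or not os.path.isfile(path):
            continue
        new_name = file_md5(path) + os.path.splitext(name)[1]
        if new_name == name:
            continue  # already named by its hash
        new_path = os.path.join(directory, new_name)
        if os.path.exists(new_path):
            continue  # duplicate content: leave it under its old name
        os.rename(path, new_path)
        mapping[new_name] = name
    with open(os.path.join(directory, MANIFEST), "w") as f:
        json.dump(mapping, f, indent=2)  # replaces any earlier manifest
    return mapping


def restore_names(directory: str) -> None:
    """Undo rename_to_hash using the manifest, then delete the manifest."""
    manifest_path = os.path.join(directory, MANIFEST)
    with open(manifest_path) as f:
        mapping = json.load(f)
    for new_name, old_name in mapping.items():
        os.rename(os.path.join(directory, new_name), os.path.join(directory, old_name))
    os.remove(manifest_path)


def demo() -> Tuple[Dict[str, str], List[str], List[str]]:
    files = [("01.jpg", b"one"), ("03.jpg", b"three"), ("03_copy.jpg", b"three")]
    with tempfile.TemporaryDirectory() as d:
        for name, data in files:
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        renamed = rename_to_hash(d)
        after = sorted(os.listdir(d))
        restore_names(d)
        return renamed, after, sorted(os.listdir(d))
```

    In the demo, 03_copy.jpg keeps its name because its content was already claimed by 03.jpg. Files that stay under their old names are therefore the duplicates to review.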

  • by tscs37 on 11/27/16, 7:00 AM

    I wrote something similar once, but only for GIFs, and it also fixes file extensions for a few MIME types.
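
    The comment doesn't say how the types are detected. One common approach is to check a file's leading "magic" bytes, sketched here in Python. Python's own `mimetypes` module guesses from the filename, not the contents. The signature table below covers only GIF, PNG and JPEG, and the function names are hypothetical.

```python
import os
import tempfile
from typing import Optional, Tuple

# Leading bytes that identify a few common image formats.
SIGNATURES = [
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
]
# Extensions considered correct for each detected type.
ACCEPTED = {".gif": {".gif"}, ".png": {".png"}, ".jpg": {".jpg", ".jpeg"}}


def sniff_extension(path: str) -> Optional[str]:
    """Guess an extension from the file's first bytes, or None if unrecognised."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    return None


def fix_extension(path: str) -> str:
    """Rename `path` so its extension matches the sniffed type; return the final path."""
    ext = sniff_extension(path)
    root, current = os.path.splitext(path)
    if ext is None or current.lower() in ACCEPTED[ext]:
        return path
    new_path = root + ext
    if os.path.exists(new_path):
        return path  # don't clobber an existing file
    os.rename(path, new_path)
    return new_path


def demo() -> Tuple[str, str]:
    with tempfile.TemporaryDirectory() as d:
        gif = os.path.join(d, "anim.jpg")  # a GIF saved with the wrong extension
        with open(gif, "wb") as f:
            f.write(b"GIF89a" + b"\x00" * 10)
        jpeg = os.path.join(d, "photo.jpeg")  # correct already; .jpeg is accepted
        with open(jpeg, "wb") as f:
            f.write(b"\xff\xd8\xff\xe0" + b"\x00" * 10)
        return os.path.basename(fix_extension(gif)), os.path.basename(fix_extension(jpeg))
```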
  • by dschiptsov on 11/27/16, 8:24 AM

    Wouldn't symbolic links be more appropriate?
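
    A Python sketch of the symlink approach, assuming a POSIX system where unprivileged users can create symlinks. Within one directory, the first file in sorted name order becomes the canonical copy, and each later file with the same MD5 is replaced by a relative symlink to it. Before deleting anything, `filecmp.cmp(..., shallow=False)` confirms that the bytes really match, so a hash collision can't destroy data. The function names are hypothetical.

```python
import filecmp
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple


def file_md5(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def symlink_duplicates(directory: str) -> List[str]:
    """Replace duplicate regular files with symlinks to the first same-content file."""
    first_seen: Dict[str, str] = {}
    replaced: List[str] = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        digest = file_md5(path)
        if digest not in first_seen:
            first_seen[digest] = name
            continue
        canonical = os.path.join(directory, first_seen[digest])
        if not filecmp.cmp(canonical, path, shallow=False):
            continue  # same hash but different bytes: leave both alone
        os.remove(path)
        os.symlink(first_seen[digest], path)  # relative target in the same directory
        replaced.append(name)
    return replaced


def demo() -> Tuple[List[str], str, bytes]:
    files = [("03.jpg", b"three"), ("03_copy.jpg", b"three"), ("04.jpg", b"four")]
    with tempfile.TemporaryDirectory() as d:
        for name, data in files:
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        replaced = symlink_duplicates(d)
        link = os.path.join(d, "03_copy.jpg")
        with open(link, "rb") as f:
            content = f.read()  # reads through the symlink
        return replaced, os.readlink(link), content
```

    One design consequence: if the canonical file is later deleted or renamed, the symlinks dangle. Hard links (`os.link`) avoid that on a single filesystem, at the cost of making the shared copy less visible.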