hashl - Create database with partial file hashes, check if other files are in it
hashl [-fnx] [-d dbfile] [-s read-size] action [args]
This manual documents hashl version 1.01
Actions:
Copy all files in the current directory which are not in any database to newdir.
List all files which are already in any database. Scans either the current directory or directory.
List all files which are not in any database. Scans either the current directory or directory.
Add all files in directory (or the current directory) as "ignored" to the database. This means that hashl will save the file's hash and skip matching files for copy or find-new.
Show information on file (or the database, if file is not specified).
List all files and their hashes. The list format is hash size file.
If regex (a Perl regular expression) is specified, only matching files will be listed.
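Because every line of this listing starts with the hash, ordinary text tools can post-process it. The following is a minimal sketch, assuming whitespace-separated fields with the hash in the first column as described above. The helper name dup_hashes is hypothetical and not part of hashl. It prints each hash that occurs more than once, i.e. files whose hashed parts are identical:

```shell
# Hypothetical helper, not part of hashl: read lines in the
# "hash size file" format on stdin and print every hash that
# occurs more than once.
dup_hashes() {
    awk '{ print $1 }' | sort | uniq -d
}
```

Pipe the listing into dup_hashes to use it. An empty result means no two listed files share a hash.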
List all filenames, one file per line.
List ignored hashes.
List all files using an ls-style output format.
If regex (a Perl regular expression) is specified, only matching files will be listed.
Update or create hash database. Iterates over all files below the current directory.
Use dbfile instead of .hashl.db
Use dbfile in addition to .hashl.db or the database given with -d. May be specified several times.
Database files specified with this option will be opened read-only and ignored by writing actions (such as update or ignore).
For use with hashl add: If there are ignored files in the directory, unignore and add them.
Do not show progress information. Most useful with hashl find-new.
Change the size of the part of each file which is hashed. By default, hashl hashes the first 4 MiB. Note that this option only makes sense when using hashl update to create a new database.
A size of 0 (zero) makes hashl read whole files, which effectively turns it into sha1sum with a database.
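The effect of the read size can be illustrated with standard tools. This is a minimal sketch, not hashl's actual implementation: it assumes SHA-1 (suggested by the sha1sum comparison above) and relies on head and sha1sum from coreutils. The function name partial_hash is hypothetical.

```shell
# Hypothetical illustration, not hashl's code.
# $1 = file, $2 = number of bytes to hash (0 = whole file)
partial_hash() {
    if [ "$2" -eq 0 ]; then
        # Size 0: hash the complete file, like plain sha1sum
        sha1sum < "$1" | cut -d' ' -f1
    else
        # Otherwise hash only the first $2 bytes
        head -c "$2" -- "$1" | sha1sum | cut -d' ' -f1
    fi
}
```

Two files that differ only after the first read-size bytes get the same partial hash, so a smaller read size is faster but makes false matches more likely.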
Print version information.
Do not cross filesystem boundaries when processing files. At the time of this writing, this may not prevent hashl from recursing into other filesystems, but they will never be hashed, copied or otherwise processed.
Unless an error occurred, hashl always returns zero.
None, so far
Unknown.
First, create a database of your local files:
cd /media/videos; hashl update
Now, assume you have a (possibly slow) external share mounted at /tmp/mnt/ext. You do not want to copy all files to your disk and then use fdupes or similar to weed out the duplicates. Since you just used hashl to create a database with the hashes of the first 4 MiB of all your files, you can now use it to check whether you (very probably) already have any remote file. For that, you only need to fetch the first 4 MiB of every file on the share, not the whole file. For example:
cd /tmp/mnt/ext; hashl copy /media/videos/incoming
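The check behind this workflow can be sketched as follows. This is an illustration under stated assumptions, not hashl's code: a plain text file with one SHA-1 prefix hash per line stands in for the hashl database, and the names prefix_hash and is_known are hypothetical.

```shell
# Hypothetical sketch, not hashl's code.
# Hash the first $2 bytes of file $1.
prefix_hash() {
    head -c "$2" -- "$1" | sha1sum | cut -d' ' -f1
}

# Succeed (exit status 0) if the prefix hash of file $1 is listed in
# the hash list $2. $3 is the read size in bytes and defaults to
# 4 MiB (4194304 bytes).
is_known() {
    grep -qxF -- "$(prefix_hash "$1" "${3:-4194304}")" "$2"
}
```

Only the first part of each remote file is read, which is why the check stays cheap on a slow share. A match means the file is very probably already present, not that it is guaranteed to be identical.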
Personally, I have all my videos on an external hard disk, which I usually do not carry with me. So, when I get new videos, I put them into ~/lib/videos on my netbook, and then later copy them to the external disk. Of course, it can always happen that I get a movie I already have, or forget to move something from ~/lib/videos to the external disk, especially since I also always have some stuff from the disk in ~/lib/videos.
However, I can use hashl to conveniently solve this issue. Run periodically:
cd /media/argon; hashl -d ~/lib/video/.argon update
Now, I always have a list of files on the external disk with me. When I get a new file:
hashl -d ~/lib/video/.argon new-file $file
And to find out which files are not on the external disk:
cd ~/lib/video; print -l **/*(.) | hashl -d .argon new-file
Copyright (C) 2010-2017 by Daniel Friesel <derf@finalrewind.org>
0. You just DO WHAT THE FUCK YOU WANT TO.