NAME

hashl - Create database with partial file hashes, check if other files are in it

SYNOPSIS

hashl [-fnx] [-d dbfile] [-e dbfile ...] [-s read-size] action [args]

VERSION

This manual documents hashl version 1.00

DESCRIPTION

Actions:

copy newdir

Copy all files in the current directory which are not in any database to newdir.
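Here is a minimal Python sketch of the copy step. It assumes the list of new files, as relative paths, is already known. It also assumes the directory layout below the current directory is recreated under newdir, which this manual does not state. The function name copy_new is hypothetical, and this is not hashl's actual Perl code:

```python
import os
import shutil
from typing import Iterable


def copy_new(src_root: str, new_dir: str, new_files: Iterable[str]) -> None:
    """Copy each relative path in new_files from src_root into new_dir.

    Assumption: sub-directories are recreated below new_dir.
    shutil.copy2 also preserves timestamps.
    """
    for rel in new_files:
        target = os.path.join(new_dir, rel)
        # Create new_dir and any intermediate directories on demand.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel), target)
```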

find-known [directory]

List all files which are already in any database. Scans directory, or the current directory if none is given.

find-new [directory]

List all files which are not in any database. Scans directory, or the current directory if none is given.

ignore [directory]

Add all files in directory (or the current directory) to the database as "ignored". hashl saves each file's hash and skips files with a matching hash in copy and find-new.
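The relationship between find-known, find-new and ignored hashes can be sketched as a partition over hashes. This Python sketch makes two assumptions: every file has already been reduced to one hash, and ignored files appear in neither list. The manual only says that ignored files are skipped for copy and find-new. The names split_known_new, known and ignored are illustrative:

```python
from typing import Dict, List, Set, Tuple


def split_known_new(
    file_hashes: Dict[str, str],  # path -> hash of each scanned file
    known: Set[str],              # hashes stored in the database(s)
    ignored: Set[str],            # hashes added via "ignore"
) -> Tuple[List[str], List[str]]:
    """Return (known_files, new_files), each sorted by path.

    Assumption: files whose hash is ignored appear in neither list.
    """
    known_files: List[str] = []
    new_files: List[str] = []
    for path, digest in sorted(file_hashes.items()):
        if digest in ignored:
            continue  # ignored hashes are skipped entirely in this sketch
        if digest in known:
            known_files.append(path)
        else:
            new_files.append(path)
    return known_files, new_files
```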

info [file]

Show information on file (or the database, if file is not specified).

list [regex]

List all files and their hashes. The list format is hash size file.

If regex (a Perl regular expression) is specified, only matching files are listed.
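Here is a Python sketch of the list output. It assumes the regex is matched against the file name. Python's re module is not identical to Perl regular expressions, so this only approximates hashl's filter. list_entries is a hypothetical name:

```python
import re
from typing import Iterable, Iterator, Optional, Tuple


def list_entries(
    entries: Iterable[Tuple[str, int, str]],  # (hash, size, file)
    pattern: Optional[str] = None,
) -> Iterator[str]:
    """Yield "hash size file" lines, optionally filtered by a regex on the file name."""
    regex = re.compile(pattern) if pattern is not None else None
    for digest, size, name in entries:
        # re.search matches anywhere in the name, like an unanchored Perl m//.
        if regex is None or regex.search(name):
            yield f"{digest} {size} {name}"
```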

list-files

List all filenames, one file per line.

list-ignored

List ignored hashes.

update

Update or create hash database. Iterates over all files below the current directory.
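Roughly, update walks the tree and records one hash and size per file. The Python sketch below makes three assumptions. First, it uses SHA-1 over the first read_size bytes, which the Digest::SHA dependency and the sha1sum comparison under -s suggest. Second, it stores entries in an in-memory dict keyed by relative path. Third, hashl's real on-disk database format is not modeled. update_db is a hypothetical name:

```python
import hashlib
import os
from typing import Dict, Tuple

READ_SIZE = 4 * 1024 * 1024  # 4 MiB, the documented default


def update_db(
    root: str,
    db: Dict[str, Tuple[str, int]],
    read_size: int = READ_SIZE,
) -> Dict[str, Tuple[str, int]]:
    """Store (sha1 of the first read_size bytes, file size) for each file below root.

    Keys are paths relative to root; existing entries are overwritten.
    A read_size of 0 hashes whole files.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip FIFOs, sockets and dangling symlinks
            with open(path, "rb") as fh:
                # read(-1) reads to end of file, used here for read_size 0.
                data = fh.read(read_size if read_size > 0 else -1)
            digest = hashlib.sha1(data).hexdigest()
            db[os.path.relpath(path, root)] = (digest, os.path.getsize(path))
    return db
```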

OPTIONS

-d|--database dbfile

Use dbfile instead of .hashl.db.

-e|--extra-db dbfile

Use dbfile in addition to .hashl.db (or the database given with -d). May be specified several times.

Database files specified with this option will be opened read-only and ignored by writing actions (such as update or ignore).
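The split between searched and written databases resembles Python's collections.ChainMap, where lookups search every mapping but writes go only to the first. This is a Python analogy, not how hashl stores its databases. open_databases and the hash-to-file mappings are hypothetical:

```python
from collections import ChainMap
from typing import Dict, Mapping


def open_databases(primary: Dict[str, str], *extras: Mapping[str, str]) -> ChainMap:
    """Combine the writable database (-d) with read-only extras (-e).

    Lookups search primary first, then each extra in order; assignments
    and deletions on the ChainMap only ever touch primary.
    """
    return ChainMap(primary, *extras)
```

With this arrangement, a writing action such as update can assign to the combined mapping freely, and the extra databases stay unchanged.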

-f|--force

For use with hashl add: If there are ignored files in the directory, unignore and add them.

-n|--no-progress

Do not show progress information. Most useful with hashl find-new.

-s|--read-size kibibytes

Set the size, in KiB, of the leading part of each file that is hashed. By default, hashl hashes the first 4 MiB (4096 KiB). Note that this option only makes sense when using hashl update to create a new database.

A size of 0 (zero) makes hashl read whole files, turning it into sha1sum with a database.
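Here is a Python sketch of the read-size rule. It hashes the first read_size_kib KiB of a file, or the whole file when read_size_kib is 0. It assumes SHA-1 and reads in 64 KiB chunks, so a size of 0 does not load a large file into memory at once. partial_sha1 is a hypothetical name:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # read in 64 KiB pieces


def partial_sha1(path: str, read_size_kib: int = 4096) -> str:
    """SHA-1 of the first read_size_kib KiB of path (0 = whole file)."""
    limit = read_size_kib * 1024  # -s is given in kibibytes
    digest = hashlib.sha1()
    remaining = limit
    with open(path, "rb") as fh:
        while limit == 0 or remaining > 0:
            want = CHUNK_SIZE if limit == 0 else min(CHUNK_SIZE, remaining)
            chunk = fh.read(want)
            if not chunk:  # end of file reached before the limit
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

Two files that differ only after the first read_size bytes get the same hash. This is the trade-off that makes partial hashing fast.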

-V|--version

Print version information.

-x|--one-file-system

Do not cross filesystem boundaries when processing files. At the time of this writing, this may not prevent hashl from recursing into other filesystems, but files on them will never be hashed, copied or otherwise processed.
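The documented behavior, which keeps walking but never processes files on another filesystem, can be sketched in Python. The sketch compares each file's st_dev with that of the starting directory. files_on_same_fs is a hypothetical helper, not hashl's code:

```python
import os
from typing import Iterator


def files_on_same_fs(root: str) -> Iterator[str]:
    """Yield files below root that live on root's filesystem.

    Like the documented -x behavior, this still descends into other
    filesystems but never yields files from them.
    """
    root_dev = os.stat(root).st_dev
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished file or dangling symlink
            if st.st_dev == root_dev:
                yield path
```

To also stop the recursion, you can remove foreign-device entries from the dirnames list that os.walk yields.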

EXIT STATUS

Unless an error occurred, hashl always returns zero.

CONFIGURATION

None, so far

DEPENDENCIES

* Digest::SHA
* List::MoreUtils
* Time::Progress

BUGS AND LIMITATIONS

Unknown.

EXAMPLES

LEECHING

First, create a database of your local files:

cd /media/videos; hashl update

Now, assume you have a (possibly slow) external share mounted at /tmp/mnt/ext. You do not want to copy all files to your disk and then use fdupes or similar to weed out the duplicates. Since the database you just created holds the hashes of the first 4 MiB of all your files, you can use it to check whether you (very probably) already have a given remote file. For that, hashl only needs to read the first 4 MiB of every file on the share, not the whole file. For example:

cd /tmp/mnt/ext; hashl copy /media/videos/incoming
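The saving is easy to estimate. hashl only needs min(size, 4 MiB) bytes of each remote file, although files it then copies are of course transferred in full. The small Python helper below makes the arithmetic concrete. bytes_to_read is a hypothetical name:

```python
from typing import Iterable

MIB = 1024 * 1024


def bytes_to_read(sizes: Iterable[int], read_size: int = 4 * MIB) -> int:
    """Bytes needed to hash files of the given sizes (read_size 0 = whole files)."""
    if read_size == 0:
        return sum(sizes)
    return sum(min(size, read_size) for size in sizes)
```

For a 700 MiB video and a 1 MiB file, hashl reads 5 MiB instead of 701 MiB.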

EXTERNAL HARD DISK

Personally, I keep all my videos on an external hard disk, which I usually do not carry with me. So when I get new videos, I put them into ~/lib/video on my netbook and later copy them to the external disk. Of course, I may get a movie I already have, or forget to move something from ~/lib/video to the external disk, especially since I also always have some files from the disk in ~/lib/video.

However, hashl solves this issue conveniently. Periodically run:

cd /media/argon; hashl -d ~/lib/video/.argon update

Now, I always have a list of files on the external disk with me. When I get a new file:

hashl -d ~/lib/video/.argon new-file $file

And to find out which files are not on the external disk:

cd ~/lib/video; print -l **/*(.) | hashl -d .argon new-file

AUTHOR

Copyright (C) 2010 by Daniel Friesel <derf@finalrewind.org>

LICENSE

  0. You just DO WHAT THE FUCK YOU WANT TO.