dupemerge
Zygo Blaxell [Fri, 8 Jan 2010 15:07:32 +0000 (10:07 -0500)]
dupemerge: clean up stats_blob output if --progress

Zygo Blaxell [Fri, 8 Jan 2010 14:16:02 +0000 (09:16 -0500)]
dupemerge: don't stat during the file collection loop

Remove the lstat from the find output reading loop.  It's a redundant
copy of the same code in merge_files.

Adjust merge_files to filter out possible non-files that will now leak
through from the find output.
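
A minimal sketch of the filtering step, written in Python rather than the
script's own Perl. The name regular_files is hypothetical; it only
illustrates that, once the collection loop stops calling lstat, the merge
step has to discard anything that is not a regular file or that has
vanished since it was listed.

```python
import os
import stat


def regular_files(paths):
    """Yield (path, lstat result) for entries that are regular files.

    Hypothetical helper: symlinks, directories, and paths that have
    disappeared since they were listed are skipped silently.
    """
    for path in paths:
        try:
            st = os.lstat(path)  # lstat does not follow symlinks
        except OSError:
            continue  # vanished or unreadable entry
        if stat.S_ISREG(st.st_mode):
            yield path, st
```

Returning the lstat result alongside the path means the caller does not
need to stat the file a second time.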

Zygo Blaxell [Fri, 8 Jan 2010 14:06:00 +0000 (09:06 -0500)]
dupemerge: only report '.' for non-trivial merges

Zygo Blaxell [Fri, 8 Jan 2010 14:00:57 +0000 (09:00 -0500)]
dupemerge: document --progress option, and expand tabs in rest of usage

Zygo Blaxell [Fri, 8 Jan 2010 14:00:40 +0000 (09:00 -0500)]
dupemerge: add --progress option

Zygo Blaxell [Wed, 6 Jan 2010 18:58:11 +0000 (13:58 -0500)]
dupemerge: merge with waya-zblaxell, fix warning message

Merges up to c92b00c812898fcba6e5ebde14ef83ae597756ee, including
these commits:

6038ff7 dupemerge: have find tell us the device too
fd1d958 dupemerge: maybe improve seek performance by sorting perl hashes
9bf1c63 dupemerge: merge with state-of-the-art on Serenity
a27ea1f dupemerge: update copyright year to 2010
593804d dupemerge: sort incumbent inodes too
1df8571 dupemerge: make inode sort order strictly numeric
24fecbb dupemerge: make sure 'sort -znr' still considers dev/inode numeric
c92b00c dupemerge: inodes are now non-numeric

Zygo Blaxell [Wed, 6 Jan 2010 18:22:38 +0000 (13:22 -0500)]
dupemerge: inodes are now non-numeric

Zygo Blaxell [Wed, 6 Jan 2010 17:23:28 +0000 (12:23 -0500)]
dupemerge: make sure 'sort -znr' still considers dev/inode numeric

Zygo Blaxell [Wed, 6 Jan 2010 16:59:49 +0000 (11:59 -0500)]
dupemerge: make inode sort order strictly numeric

Zygo Blaxell [Wed, 6 Jan 2010 16:57:42 +0000 (11:57 -0500)]
dupemerge: sort incumbent inodes too

Zygo Blaxell [Wed, 6 Jan 2010 16:56:31 +0000 (11:56 -0500)]
dupemerge: update copyright year to 2010

Zygo Blaxell [Sat, 9 Jan 2010 01:51:45 +0000 (20:51 -0500)]
Merge branch 'performance'

Conflicts:
faster-dupemerge

Zygo Blaxell [Sat, 9 Jan 2010 01:08:45 +0000 (20:08 -0500)]
Update copyright year and email address

It helps my spam filter if I can keep track of which web page the
spammers have scraped my email address from.

Zygo Blaxell [Sat, 9 Jan 2010 01:08:45 +0000 (20:08 -0500)]
Work around new findutils output

findutils now appends a redundant ".000000000" to the %T@ output.
I've apparently missed the window to get findutils to fix this, so
I've worked around it.
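
The log does not show the workaround itself. One possible approach, sketched
here in Python rather than the script's Perl, is to keep only the
whole-second part of the %T@ field so that both the old and the new output
parse to the same value. The function name mtime_seconds is hypothetical.

```python
def mtime_seconds(field):
    """Return the whole-second part of a find '%T@' field as an int.

    Accepts both the older integer form ("1262999325") and the newer
    form with a fractional suffix ("1262999325.000000000").
    """
    whole, _, _fraction = field.partition(".")
    return int(whole)
```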

Zygo Blaxell [Sat, 9 Jan 2010 02:21:44 +0000 (21:21 -0500)]
Update copyright year

root [Sun, 26 Nov 2006 22:05:51 +0000 (22:05 +0000)]
Properly handle cases where multiple files have the same hash
(e.g. because --skip-hash is used).  This version now generates all N^2
combinations of comparisons.

git-svn-id: svn+ssh://svn.furryterror.org/r/trunk/mokona/zblaxell@6218 a5e33b96-951a-0410-ae88-c0fe16d076bb

git-svn-id: file:///root/SVN@4 f049ffa3-53c0-42dd-8896-c8778eaba0c5

git-svn-id: file:///root/SVN@10 f049ffa3-53c0-42dd-8896-c8778eaba0c5
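
As an illustration of the pairwise comparison described in this commit, here
is a small Python sketch (not the script's Perl). The name pairs_to_compare
and the input shape are assumptions. It enumerates unordered pairs, which is
N*(N-1)/2 comparisons for N files under one key: still quadratic, though the
commit may well enumerate ordered pairs instead.

```python
from itertools import combinations


def pairs_to_compare(files_by_hash):
    """Yield every unordered pair of files that share a hash key.

    files_by_hash maps a hash (or a placeholder key when hashing is
    skipped) to the list of files filed under it.
    """
    for files in files_by_hash.values():
        # combinations() gives N*(N-1)/2 pairs for N files
        yield from combinations(files, 2)
```

A key with a single file yields no pairs, so unique files cost nothing.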

Zygo Blaxell [Wed, 6 Jan 2010 16:10:04 +0000 (11:10 -0500)]
dupemerge: maybe improve seek performance by sorting perl hashes

Thank Johannes Niess <Linux@johannes-niess.de> for this idea.

To improve seek performance, choose inodes for linking in a fixed order.
This will mean that two directories with multiple identical files will
end up with links to the copies with lower inode numbers.  This is an
improvement over the previous result, which was that both directories
would end up with randomly chosen files from both directories.

The sort order isn't strictly numeric; however, it's hopefully close
enough.

As a crude heuristic, we assume that inode numbers approximate file
position on disk, and file names approximate typical usage patterns.
Previously we used the perl hash semantics, which are mostly random
and might change depending on the numbers of files considered.
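
A rough Python sketch of the fixed-order idea (the script itself is Perl).
The name plan_links and the input shape are hypothetical, and the sketch
sorts inodes strictly numerically, which this commit says it does not quite
do yet.

```python
def plan_links(inode_to_paths):
    """Return (target_path, [paths to relink]) for one set of identical files.

    inode_to_paths maps an inode number to the paths currently linked to it.
    """
    inodes = sorted(inode_to_paths)  # numeric order, not hash order
    keep = inodes[0]  # the lowest inode becomes the link target
    target = sorted(inode_to_paths[keep])[0]
    # Paths already on the kept inode need no work; relink the rest in order.
    relink = [p for ino in inodes[1:] for p in sorted(inode_to_paths[ino])]
    return target, relink
```

Because the order depends only on the inode numbers and names, repeated runs
over the same tree pick the same target.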

Zygo Blaxell [Wed, 6 Jan 2010 16:07:24 +0000 (11:07 -0500)]
dupemerge: have find tell us the device too

faster-dupemerge cannot be used to link files on multiple filesystems
because the hardlinks will fail; however, if this is attempted anyway
then files with identical weak keys (size+timestamp+permissions) and
identical inode numbers might be considered as identical for hashing
and comparing purposes when they are not.  That would be bad.
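
To illustrate why the device number matters, here is a Python sketch (the
script is Perl and gets these values from find, not from a stat call). The
helper name file_identity is hypothetical. An inode number is unique only
within one filesystem, so the pair (device, inode) is the safe key.

```python
import os


def file_identity(path):
    """Return (device, inode), which identifies a file across filesystems."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)
```

Two paths with the same identity are already hardlinks to one file; two
paths with equal inode numbers but different devices are unrelated.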

Zygo Blaxell [Sat, 9 Jan 2010 01:59:05 +0000 (20:59 -0500)]
Remove ad-hoc copyright notice, add formal copyright statement and GPL

git-svn-id: svn+ssh://svn.furryterror.org/r/trunk/mokona/zblaxell@3269 a5e33b96-951a-0410-ae88-c0fe16d076bb

Zygo Blaxell [Sat, 9 Jan 2010 01:58:48 +0000 (20:58 -0500)]
tick_quote: properly quote the string '\''

cvs [Sat, 7 Jan 2006 08:44:02 +0000 (08:44 +0000)]
Implement --dry-run and --humane options

git-svn-id: svn+ssh://svn.furryterror.org/r/trunk/mokona/zblaxell@4518 a5e33b96-951a-0410-ae88-c0fe16d076bb

cvs [Mon, 5 May 2003 04:20:14 +0000 (04:20 +0000)]
digest: Fix incorrect statistics when hashes fail

An order-of-operations bug can lead to files being counted as hashed
when they are not (e.g. due to I/O error or the file disappearing).
Calculate the digest, then increment the statistics.

git-svn-id: svn+ssh://svn.furryterror.org/r/trunk/mokona/zblaxell@3332 a5e33b96-951a-0410-ae88-c0fe16d076bb
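
The ordering fix can be sketched as follows, in Python rather than the
script's Perl. The counter name files_hashed and the choice of SHA-256 are
assumptions made for the example; the log does not say which digest the
script uses.

```python
import hashlib

stats = {"files_hashed": 0}  # hypothetical counter


def digest(path):
    """Return the hex SHA-256 of a file, or None if it cannot be read."""
    h = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    except OSError:
        return None  # a failed hash must not touch the counters
    # Count the file only after the digest has actually been computed.
    stats["files_hashed"] += 1
    return h.hexdigest()
```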

Zygo Blaxell [Sat, 9 Jan 2010 01:04:54 +0000 (20:04 -0500)]
Replace --trust with --skip-compare and add --skip-hash and copyright statement

git-svn-id: svn+ssh://svn.furryterror.org/r/trunk/mokona/zblaxell@3225 a5e33b96-951a-0410-ae88-c0fe16d076bb

Conflicts:

faster-dupemerge

root [Tue, 23 Dec 2008 19:52:04 +0000 (14:52 -0500)]
Initial commit