From: Zygo Blaxell
Date: Sat, 9 Jan 2010 04:01:23 +0000 (-0500)
Subject: Add a threshold to skip-hash
X-Git-Tag: dm6-0.20100514~17
X-Git-Url: http://git.hungrycats.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=a433cb10cd4e633289a448649f9509e1b24d9c7e;hp=a433cb10cd4e633289a448649f9509e1b24d9c7e;p=dupemerge

Add a threshold to skip-hash

Hashing usually performs well, except in special cases where many large
files have the same size but very different contents near the beginning
of the file.  In these cases, it is usually faster to perform the O(N^2)
pairwise comparisons between all the files, thereby avoiding reading
most of their data.

In cases where there are many small files, the opposite setting of the
skip-hash option is usually better, so skip-hash is now a threshold
parameter with a default of 1MB.

Unfortunately, this condition cannot easily be detected without doing
enough work to negate the benefit; hence, we require the operator to
specify a preference.

---
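The trade-off described above can be illustrated with a small sketch. This is not dupemerge's code. It assumes that files whose size is at or above the threshold skip hashing and are compared pairwise with an early exit, while smaller files are grouped by a content hash. The names `duplicate_groups`, `same_contents`, `file_digest`, `SKIP_HASH_THRESHOLD` and the 64 KiB chunk size are hypothetical, chosen only for illustration.

```python
import hashlib
import os
from collections import defaultdict

# Hypothetical default, mirroring the 1MB default mentioned in the commit message.
SKIP_HASH_THRESHOLD = 1024 * 1024
CHUNK = 64 * 1024  # assumed read size


def same_contents(path_a, path_b, chunk=CHUNK):
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if a != b:
                return False  # early exit: the rest of both files is never read
            if not a:
                return True  # both files ended at the same point


def file_digest(path, chunk=CHUNK):
    """Hash the whole file; this always reads every byte."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.digest()


def duplicate_groups(paths, threshold=SKIP_HASH_THRESHOLD):
    """Return sorted lists of paths whose contents appear identical."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        if size >= threshold:
            # Skip hashing: compare each file against one representative per
            # cluster.  Worst case O(N^2) comparisons, but files that differ
            # near the beginning are rejected after reading very little data.
            clusters = []
            for p in same_size:
                for cluster in clusters:
                    if same_contents(cluster[0], p):
                        cluster.append(p)
                        break
                else:
                    clusters.append([p])
        else:
            # Hash each file once, then group by digest.  A real tool would
            # normally confirm a digest match byte for byte before merging.
            by_hash = defaultdict(list)
            for p in same_size:
                by_hash[file_digest(p)].append(p)
            clusters = list(by_hash.values())
        groups.extend(sorted(c) for c in clusters if len(c) > 1)
    return groups
```

Passing `threshold=1` forces the pairwise path for every non-empty file, and a very large value forces hashing for everything, which loosely corresponds to the two settings the old boolean option chose between.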