Re: xxHash: xxh3? 64 or 128 bit?
Posted: Wed Aug 12, 2020 8:21 am
Hm, what I don't really understand is how the "speed of the algorithm" could be of interest here when it is actually the speed of the file reading that is the bottleneck (at least up to approx. 2-3 Gbit/s). I guess the problem is more that most developers don't care about the art of reading large files.
Of course a faster algorithm will use less CPU given that "slow" file reading, but as long as the file reading is not optimized, the final tool is not guaranteed to be faster than a tool with a slower algorithm but optimized file reading.
Here is an MD5 tool I wrote quickly in Python that should be a little faster than the xxh stuff, at least at read speeds below ~3 Gbit/s:
https://github.com/emcodem/fast_md5/blo ... st_md5.exe
The difference compared to most other checksum tools known to me is that the file reading and the checksumming are done in separate threads, and the file is read in 8 MB chunks instead of the 65 kB chunks the xxhsum tool uses. So unlike the xxhsum tool, this MD5 tool should run at exactly the same speed as a file copy done with Windows Explorer.
Note that when you benchmark and check the same file twice in a row, the second time around there will be some caches helping the reading, so you typically cannot compare the very first try with the second one (with the same input file from the same location). Also, the xxh tool will recognize that the file was already checked and will not read it a second time at all.
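To illustrate the idea (this is a minimal sketch, not the actual code from the linked repo): one thread reads the file in 8 MB chunks and hands them over a queue, while the main thread feeds them into the hash. The function and queue size here are my own choices for the example.

```python
import hashlib
import queue
import threading

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB chunks, as described above


def threaded_md5(path):
    """Compute an MD5 with file reading and hashing in separate threads."""
    chunks = queue.Queue(maxsize=4)  # small buffer between reader and hasher
    digest = hashlib.md5()

    def reader():
        # Reader thread: pull 8 MB chunks off disk and queue them.
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                chunks.put(chunk)
        chunks.put(None)  # sentinel: end of file

    t = threading.Thread(target=reader)
    t.start()
    # Hasher (main thread): consume chunks while the reader keeps the disk busy.
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        digest.update(chunk)
    t.join()
    return digest.hexdigest()
```

This overlaps I/O and hashing even under the GIL, because CPython's hashlib releases the GIL while hashing large buffers, so the reader thread can fetch the next chunk while the previous one is being digested.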