In article <4815f164$0$21072$(E-Mail Removed)>,
Noob <root@localhost> wrote:
>Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
>wireless channel, and suppose the link layer does not compute any CRC.
>
>Then, I imagine that there is a very high probability that TCP's
>checksum will not detect every instance of data corruption, and the
>receiver's copy of the file will differ from the original file.
What is a "very high probability"?
It should depend on your application.
>Even when the link layer does compute a CRC, it has been shown (*)
>that corrupted packets do reach the receiver. Therefore, I imagine it
>is possible for silent data corruption to occur?
Silent data corruption is always possible, even if the CRC is twice
or even 100 times as long as the data itself. It is all a matter of
what you consider a "very high probability".
>(*) http://citeseer.ist.psu.edu/stone00when.html
>
>Have there been other studies of silent data corruption despite CRCs
>and TCP's checksum?
I think that's the best published study.
>I suppose I need to use a (cryptographic?) hash function if I want to
>be certain, beyond any reasonable doubt, that the receiver's copy is
>the same as the original file?
You need to quantify "reasonable doubt" and decide what kind of errors
you are worried about. Are the errors you care about isolated single-bit
changes, drop-outs (a block of N bits all changed to 0 or 1), bursts
of static (a block of M bits changed randomly), or something else? How
many errors occur in a packet? Are the errors uniformly distributed?
Do you only want to detect errors and rely on TCP to recover by
retransmitting, or are the errors frequent enough that the costs of
forward error correction are worthwhile?
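To make the question about error types concrete, here is a sketch of the 16-bit one's-complement checksum that TCP uses (the algorithm described in RFC 1071), written for clarity rather than speed. It omits the TCP pseudo-header, and the sample bytes are arbitrary. It shows two kinds of change the checksum cannot see: reordering aligned 16-bit words, and a pair of opposite single-bit flips at the same bit position in two different words. A lone single-bit flip, by contrast, always changes the sum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


original = bytes.fromhex("1234 5679 abcd")
swapped = bytes.fromhex("5679 1234 abcd")  # first two words exchanged
flipped = bytes.fromhex("1235 5678 abcd")  # bit 0 set in word 1, cleared in word 2
one_bit = bytes.fromhex("1235 5679 abcd")  # a single bit flipped

print(internet_checksum(original) == internet_checksum(swapped))  # True
print(internet_checksum(original) == internet_checksum(flipped))  # True
print(internet_checksum(original) == internet_checksum(one_bit))  # False
```

Both undetected changes follow from the sum being commutative and from the two flips adding and subtracting the same amount.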
>SHA-512 produces a 512-bit hash.
>One chance in 2^512 seems small enough :-)
That fundamental misunderstanding of cryptographic hash functions is
one of my pet peeves. Cryptographic hash functions are not necessarily
better at detecting changes than other hash functions, CRCs, FCSs, etc.
Cryptographic hash functions are mostly designed to be very
hard to analyze so that adversaries cannot reverse them; considerations
of how many and what kinds of changes they detect are secondary.
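One way to see why CRCs can be analyzed and cryptographic hashes cannot: a CRC is linear over GF(2), so for three messages of equal length, Python's `zlib.crc32` satisfies crc(a) XOR crc(b) XOR crc(c) == crc(a XOR b XOR c), because the initial value and final XOR cancel. Whether a change is detected therefore depends only on the error pattern, not on the data. SHA-512 (via `hashlib.sha512`) is built to have no such structure. The three 16-byte sample messages below are arbitrary.

```python
import hashlib
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def sha512_int(data: bytes) -> int:
    """SHA-512 digest as an integer, so digests can be XORed."""
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


a = b"0123456789abcdef"
b = b"fedcba9876543210"
c = b"the quick brown!"

crc_lhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
crc_rhs = zlib.crc32(xor_bytes(a, b, c))
print("CRC-32 relation holds:", crc_lhs == crc_rhs)  # True

sha_lhs = sha512_int(a) ^ sha512_int(b) ^ sha512_int(c)
sha_rhs = sha512_int(xor_bytes(a, b, c))
print("SHA-512 relation holds:", sha_lhs == sha_rhs)  # False
```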
You can say things about error detection functions like "CRC-X detects
any single burst of errors of N or fewer bits in a block of Y bits,"
but you cannot say anything similar about cryptographic hash functions
(except for trivial cases of N and Y). You cannot even say, for example,
that "the detection failure rate of SHA-512 is one in 2^512 changes"
(of course with suitable definitions for "changes" including type, size,
and distribution).
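A guarantee of the "CRC-X detects any burst of N or fewer bits" kind can be checked exhaustively at toy scale. The sketch below makes hypothetical choices for illustration: an 8-bit CRC with generator x^8 + x^2 + x + 1 (polynomial 0x07, zero initial value, no final XOR, as in CRC-8/SMBus) over a 64-bit sample message, not any real link-layer FCS. It tries every burst of 1 to 8 bits at every position and counts the misses, then shows a 9-bit burst shaped exactly like the generator slipping through.

```python
# Toy CRC-8: generator g(x) = x^8 + x^2 + x + 1, written with its x^8 term.
POLY = 0x107


def crc8(msg: int, nbits: int) -> int:
    """Return msg(x) * x^8 mod g(x) over GF(2); msg holds nbits bits."""
    reg = msg << 8
    for i in range(nbits + 7, 7, -1):  # clear bits from the top down to bit 8
        if (reg >> i) & 1:
            reg ^= POLY << (i - 8)
    return reg  # remainder, always < 256


def bursts(max_len: int, nbits: int):
    """Yield every error pattern whose flipped bits span 1..max_len positions."""
    for length in range(1, max_len + 1):
        if length == 1:
            shapes = [1]
        else:
            # First and last bits of the burst are flipped; inner bits are arbitrary.
            shapes = [(1 << (length - 1)) | (mid << 1) | 1
                      for mid in range(1 << (length - 2))]
        for shape in shapes:
            for pos in range(nbits - length + 1):
                yield shape << pos


NBITS = 64
msg = 0x0123456789ABCDEF  # arbitrary sample message
good = crc8(msg, NBITS)

missed = sum(1 for e in bursts(8, NBITS) if crc8(msg ^ e, NBITS) == good)
print("bursts of 1..8 bits missed:", missed)  # 0

# A 9-bit burst equal to g(x) shifted is a multiple of g(x), so the CRC is unchanged.
print("9-bit burst caught:", crc8(msg ^ (POLY << 20), NBITS) != good)  # False
```

The zero is not luck: a burst of at most 8 bits is x^i times a nonzero polynomial of degree below 8, and because g(x) has a nonzero constant term it cannot divide that product. No argument of this shape is available for SHA-512.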
It is almost (but not quite) true that if you could say that
"Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
on the grounds that you know it doesn't detect all N+1 bit errors,
and so some of those undetected N+1 bit changes could be used for evil.
Never mind that most people who use "broken" in that context are wrong,
as they blather authoritative-sounding nonsense about MD5 being
"broken." MD5 and some other cryptographic hashes are "broken" only
for some uses and not others. The big problem is that there are only
vague hopes that SHA-512, or any other hash function you might name, is
not just as "breakable." That "hard to analyze" requirement on every
crypto-hash function is, at least so far and perhaps forever, a
fundamental weakness.
Vernon Schryver