Networking Forums


Silent data corruption despite TCP

 
 
Noob
Guest
Posts: n/a

 
      04-28-2008, 03:46 PM
Hello everyone,

Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
wireless channel, and suppose the link layer does not compute any CRC.

Then, I imagine that there is a very high probability that TCP's
checksum will not detect every instance of data corruption, and the
receiver's copy of the file will differ from the original file.

Even when the link layer does compute a CRC, it has been shown (*)
that corrupted packets do reach the receiver. Therefore, I imagine it
is possible for silent data corruption to occur?

(*) http://citeseer.ist.psu.edu/stone00when.html

Have there been other studies of silent data corruption despite CRCs
and TCP's checksum?

I suppose I need to use a (cryptographic?) hash function if I want to
be certain, beyond any reasonable doubt, that the receiver's copy is
the same as the original file?

SHA-512 produces a 512-bit hash.
One chance in 2^512 seems small enough :-)

Regards.
 
 
 
 
 
Dan McDonald
Guest
Posts: n/a

 
      04-28-2008, 04:07 PM
In article <4815f164$0$21072$(E-Mail Removed)>,
Noob <root@localhost> wrote:
>Hello everyone,
>
>Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
>wireless channel, and suppose the link layer does not compute any CRC.


That's gotta be some link layer... :-P

>Even when the link layer does compute a CRC, it has been shown (*)
>that corrupted packets do reach the receiver. Therefore, I imagine it
>is possible for silent data corruption to occur?
>
>(*) http://citeseer.ist.psu.edu/stone00when.html


I've seen it happen due to pre-release NFS bugs internally. It's not
pretty.

>I suppose I need to use a (cryptographic?) hash function if I want to
>be certain, beyond any reasonable doubt, that the receiver's copy is
>the same as the original file?
>
>SHA-512 produces a 512-bit hash.
>One chance in 2^512 seems small enough :-)


We detected said NFS bug only because a few of our NFS clients were using
IPsec to protect the packets. IPsec's data-integrity/packet-authentication
(i.e. its use of HMAC-{MD5,SHA1,SHA2}) helps immensely here. Combine that
with a TCP that retransmits, and the use of IPsec can make up for your very
flaky link-layer.

You could also hash the file after transmission. This is a cat that you can
skin any number of ways.
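
As a rough illustration of the "hash the file after transmission" option, here is a minimal Python sketch. The function name `sha512_of_file` and the 1 MiB chunk size are assumptions made for the example, not anything from this thread. It reads the file in chunks so that a 20-50 GB file never has to fit in memory.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it on both ends and compare the two hex strings; if they differ, retransmit. The digest itself has to travel over some channel too, so it should be compared over a path you trust at least as much as the data path.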


--
Daniel L. McDonald - Solaris Security & Networking Engineering
Mail: (E-Mail Removed) | * MY OPINIONS ARE NOT NECESSARILY SUN'S! *
35 Network Drive Burlington, MA |"rising falling at force ten
http://blogs.sun.com/danmcd/ | we twist the world and ride the wind" - Rush
 
 
Vernon Schryver
Guest
Posts: n/a

 
      04-28-2008, 04:34 PM
In article <4815f164$0$21072$(E-Mail Removed)>,
Noob <root@localhost> wrote:

>Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
>wireless channel, and suppose the link layer does not compute any CRC.
>
>Then, I imagine that there is a very high probability that TCP's
>checksum will not detect every instance of data corruption, and the
>receiver's copy of the file will differ from the original file.


What is a "very high probability"?
It should depend on your application.

>Even when the link layer does compute a CRC, it has been shown (*)
>that corrupted packets do reach the receiver. Therefore, I imagine it
>is possible for silent data corruption to occur?


Silent data corruption is always possible, even if the CRC is twice
or even 100 times as long as the data itself. It is all a matter of
what you consider a "very high probability".


>(*) http://citeseer.ist.psu.edu/stone00when.html
>
>Have there been other studies of silent data corruption despite CRCs
>and TCP's checksum?


I think that's the best published study.


>I suppose I need to use a (cryptographic?) hash function if I want to
>be certain, beyond any reasonable doubt, that the receiver's copy is
>the same as the original file?


You need to quantify "reasonable doubt" and decide what kind of errors
you are worried about. Are the errors you care about isolated single-bit
changes, drop-outs (a block of N bits all changed to 0 or 1), bursts
of static (a block of M bits changed randomly), or something else? How
many errors occur in a packet? Are the errors uniformly distributed?
Do you only want to detect errors and rely on TCP to recover by
retransmitting, or are the errors frequent enough that the costs of
forward error correction are worthwhile?


>SHA-512 produces a 512-bit hash.
>One chance in 2^512 seems small enough :-)


That fundamental misunderstanding of cryptographic hash functions is
one of my pet peeves. Cryptographic hash functions are not necessarily
better at detecting changes than other hash functions, CRCs, FCSs, etc.
Cryptographic hash functions are mostly designed to be very
hard to analyze so that adversaries cannot reverse them; considerations
of how many and what kinds of changes they detect are secondary.
You can say things about error detection functions like "CRC-X detects
any single burst of errors of N or fewer bits in a block of Y bits,"
but you cannot say anything similar about cryptographic hash functions
(except for trivial cases of N and Y). You cannot even say, for example,
that "the detection failure rate of SHA-512 is one in 2^512 changes"
(of course with suitable definitions for "changes" including type, size,
and distribution).
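
To make the "CRC-X detects any single burst of N or fewer bits" kind of statement concrete, here is a small Python sketch that brute-forces it for CRC-32 as implemented by `zlib.crc32`. The function name `undetected_bursts` and the choice of an 8-bit burst window are assumptions for the example. CRC-32 is guaranteed to catch any single burst of up to 32 bits; 8 bits just keeps the exhaustive search short.

```python
import zlib

def undetected_bursts(message: bytes, max_burst_bits: int = 8) -> int:
    """Count single burst errors (confined to `max_burst_bits` bits) that CRC-32 misses."""
    reference = zlib.crc32(message)
    total_bits = len(message) * 8
    as_int = int.from_bytes(message, "big")
    missed = 0
    # Every nonzero pattern that fits in the window, at every bit offset.
    for pattern in range(1, 1 << max_burst_bits):
        for shift in range(total_bits - max_burst_bits + 1):
            corrupted = as_int ^ (pattern << shift)
            if zlib.crc32(corrupted.to_bytes(len(message), "big")) == reference:
                missed += 1
    return missed

print(undetected_bursts(b"silent corruption"))
```

The count is zero for any message, and that follows from the algebra of the CRC polynomial rather than from luck. No comparable guarantee is known for a cryptographic hash, which is the distinction being drawn above.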

It is almost (but not quite) true that if you could say that
"Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
on the grounds that you know it doesn't detect all N+1 bit errors,
and so some of those undetected N+1 bit changes could be used for evil.

Never mind that most people who use "broken" in that context are wrong,
as they blather authoritative-sounding nonsense about MD5 being
"broken." MD5 and some other cryptographic hashes are "broken" only
for some uses and not others. The big problem is that there are only vague
hopes that SHA-512 or any other hash function you might name is not
just as "breakable." That "hard to analyze" requirement on every
crypto-hash function is, at least so far and perhaps forever, a fundamental
weakness.


Vernon Schryver (E-Mail Removed)
 
 
David Schwartz
Guest
Posts: n/a

 
      04-28-2008, 04:43 PM
On Apr 28, 8:46 am, Noob <root@localhost> wrote:

> Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
> wireless channel, and suppose the link layer does not compute any CRC.


I think that's a completely unrealistic hypothetical. Typical TCP-over-
wireless implementations have a 32-bit CRC at the wireless layer and a
16-bit checksum at the TCP layer. No sane person would implement a "noisy
wireless channel" with a link layer that "does not compute any CRC".
If you did, file transfer over TCP would be only one of your many
problems.
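
For a sense of how weak the 16-bit check at the TCP layer is on its own, here is a minimal Python sketch of a ones'-complement sum in the style of RFC 1071. The function name `internet_checksum` and the sample bytes are made up for the example, and it covers only payload bytes, not the TCP header or pseudo-header. Because the check is a plain sum, reordering 16-bit words or making offsetting changes in two words leaves it unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = b"\x12\x34\x56\x78\x9a\xbc"
swapped = b"\x56\x78\x12\x34\x9a\xbc"       # first two 16-bit words exchanged
offset = b"\x12\x35\x56\x77\x9a\xbc"        # +1 in one word, -1 in another

# All three payloads differ, yet they share one checksum value.
print(internet_checksum(original) == internet_checksum(swapped))
print(internet_checksum(original) == internet_checksum(offset))
```

Both comparisons print True, which is why the link-layer CRC is doing most of the real work.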

DS
 
 
Rick Jones
Guest
Posts: n/a

 
      04-28-2008, 05:34 PM
In comp.protocols.tcp-ip Noob <root@localhost> wrote:
> I suppose I need to use a (cryptographic?) hash function if I want
> to be certain, beyond any reasonable doubt, that the receiver's copy
> is the same as the original file?


It depends entirely on your definition of a reasonable doubt. It
would/could certainly help considerably.

> SHA-512 produces a 512-bit hash.
> One chance in 2^512 seems small enough :-)


I'm not sure the math works _exactly_ that way but it would be better
than just relying on TCP's checksum alone. Might be belts, suspenders
and duct-tape, but some data calls for that.

IIRC the emerging SCTP uses a rather stronger 32 bit checksum of some
sort.

rick jones
--
The computing industry isn't as much a game of "Follow The Leader" as
it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
- Rick Jones
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
 
David Schwartz
Guest
Posts: n/a

 
      04-28-2008, 05:37 PM
On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:

> That fundamental misunderstanding of cryptographic hash functions is
> one of my pet peeves. Cryptographic hash functions are not necessarily
> better at detecting changes than other hash functions, CRCs, FCSs, etc.
> Cryptographic hash functions are mostly designed to be very
> hard to analyze so that adversaries cannot reverse them; considerations
> of how many and what kinds of changes they detect are secondary.
> You can say things about error detection functions like "CRC-X detects
> any single burst of errors of N or fewer bits in a block of Y bits,"
> but you cannot say anything similar about cryptographic hash functions
> (except for trivial cases of N and Y). You cannot even say, for example,
> that "the detection failure rate of SHA-512 is one in 2^512 changes"
> (of course with suitable definitions for "changes" including type, size,
> and distribution).


If you have a block of data with a 512-bit cryptographic hash, the
probability that random changes to the data and/or the hash will leave
things such that the hash is still the correct hash of the data is
fairly close to 1 in 2^512 for practical purposes. This is one of the
design criteria for cryptographic hashes and is definitely true of
commonly-used hashes such as SHA-512.

This can be true of a cryptographic hash, and if it's not, then the
hash is at least somewhat broken. Commonly-used cryptographic hashes
are not broken.

Again, this is specifically one of the design criteria for
cryptographic hashes. The hashes are supposed to be randomly
distributed over the available hash space and any change in the input
is supposed to avalanche over the output.
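
The avalanche behaviour is easy to observe empirically, though observing it is not the same as proving it. Here is a small Python sketch; the helper names `differing_bits` and `avalanche` are assumptions for the example. It flips one input bit and counts how many of the 512 output bits of SHA-512 change.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions at which two equal-length byte strings differ."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")

def avalanche(data: bytes, bit_index: int) -> int:
    """Flip one bit of `data` and return how many SHA-512 output bits change."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return differing_bits(hashlib.sha512(data).digest(),
                          hashlib.sha512(bytes(flipped)).digest())

for index in (0, 1, 4095, 8191):
    print(index, avalanche(b"x" * 1024, index))
```

If the output behaves like a random 512-bit string, each count should land near 256, i.e. about half the bits.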

DS
 
 
Unruh
Guest
Posts: n/a

 
      04-28-2008, 05:57 PM
Noob <root@localhost> writes:

>Hello everyone,


>Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
>wireless channel, and suppose the link layer does not compute any CRC.


>Then, I imagine that there is a very high probability that TCP's
>checksum will not detect every instance of data corruption, and the
>receiver's copy of the file will differ from the original file.



>Even when the link layer does compute a CRC, it has been shown (*)
>that corrupted packets do reach the receiver. Therefore, I imagine it
>is possible for silent data corruption to occur?


>(*) http://citeseer.ist.psu.edu/stone00when.html


>Have there been other studies of silent data corruption despite CRCs
>and TCP's checksum?


>I suppose I need to use a (cryptographic?) hash function if I want to
>be certain, beyond any reasonable doubt, that the receiver's copy is
>the same as the original file?


>SHA-512 produces a 512-bit hash.
>One chance in 2^512 seems small enough :-)


I would say 1 in 2^128 is good enough.
You do not need a cryptographic checksum, just one that is sufficiently
mixing and that depends equally on each bit of the text. Nature is not
malicious; it is not trying to mess things up. I.e., the chance that nature
will happen to hit on a noise structure that raises the undetected-error
rate far above 1/2^128 is even smaller than 1/2^128.
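
A back-of-envelope version of that argument, as a Python sketch. Everything here is an assumption for illustration: the function name `expected_undetected`, the 1460-byte segment size, the 1% figure for corrupted segments reaching the check, and above all the model that a corrupted segment slips past a k-bit check with probability 2^-k (i.e. a well-mixing check and non-malicious noise).

```python
def expected_undetected(file_bytes, segment_bytes, corrupt_fraction, check_bits):
    """Expected number of corrupted segments that pass a `check_bits`-bit check."""
    segments = file_bytes / segment_bytes
    return segments * corrupt_fraction / 2 ** check_bits

# Hypothetical transfer: 50 GB in 1460-byte segments, 1% of them corrupted.
for bits in (16, 32, 128):
    print(bits, expected_undetected(50e9, 1460, 0.01, bits))
```

Under those assumptions a 16-bit check lets roughly five bad segments through over the whole transfer, while a 128-bit check brings the expectation down to something on the order of 1e-33.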



>Regards.

 
 
David Schwartz
Guest
Posts: n/a

 
      04-28-2008, 06:26 PM
On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:

> It is almost (but not quite) true that if you could say that
> "Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
> on the grounds that you know it doesn't detect all N+1 bit errors,
> and so some of those undetected N+1 bit changes could be used for evil.


If that were true, the crypto hash would be broken. The whole point of
a crypto hash is that even if you know such changes exist, they cannot
be used for evil because they cannot be *found*. The possible
advantage of a crypto hash over another hash would be that collisions
cannot be found for a proper crypto hash. (Although in this case, it's
not clear why that would matter. If you want to maliciously corrupt
the data, you can just put in the correct hash anyway.)

> Never mind that most people who use "broken" in that context are wrong,
> as they blather authoritative-sounding nonsense about MD5 being
> "broken." MD5 and some other cryptographic hashes are "broken" only
> for some uses and not others.


Right, but this is dangerously close to one of those uses. All you'd
have to do is sign the hash, and you'd have a use case for which MD5
is broken.

> The big problem is that there are only vague
> hopes that SHA-512 or any other hash function you might name is not
> just as "breakable." That "hard to analyze" requirement on every
> crypto-hash function is, at least so far and perhaps forever, a fundamental
> weakness.


Ideally, you adjust your use of a hash so that even if it is "broken"
in the ways it's most likely to be broken in the future, that has no
effect on your use. That requires a deep understanding of the
strengths and weaknesses of cryptographic hashes.

For example, it's quite likely that someone will find two chunks of
data that hash to the same value long before they can find data of the
same length that hashes to the same value as a given chunk.

DS
 
 
Rick Jones
Guest
Posts: n/a

 
      04-28-2008, 07:20 PM
In comp.protocols.tcp-ip David Schwartz <(E-Mail Removed)> wrote:
> This can be true of a cryptographic hash, and if it's not, then the
> hash is at least somewhat broken. Commonly-used cryptographic hashes
> are not broken.


Are not known to be broken.

rick jones
--
web2.0 n, the dot.com reunion tour...
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
 
Vernon Schryver
Guest
Posts: n/a

 
      04-28-2008, 07:48 PM
In article <fv5820$b7u$(E-Mail Removed)>,
Rick Jones <(E-Mail Removed)> wrote:

>> This can be true of a cryptographic hash, and if it's not, then the
>> hash is at least somewhat broken. Commonly-used cryptographic hashes
>> are not broken.

>
>Are not known to be broken.


Even that is gross optimism. All cryptographic hashes are merely
hoped to not be secretly broken by too many adversaries. The nature
of all current cryptographic hashes is that no one has proven anything
useful about how well they work for simple error detection. I'd believe
SHA-512 detects all single bit errors in all blocks of 512 bits, but
I'd like to see a proof for all double bit errors in 512 bits, all single
bit errors in 1024 (or even 513) bits, not to mention blocks not so tiny that
you would do better by transmitting second copies of the 64 bytes run
through a bijection ("scrambler"). Anyone who doesn't "know" a bunch
of stuff that is false would choose CRC-512 instead of SHA-512 to detect
natural errors. Unless you are battling adversaries who would use the
obvious ways to outwit CRC-512 as a signature, you are better off with
something other than a cryptographic hash.


Vernon Schryver (E-Mail Removed)
 
 
 
 