Hello.
My boss and I have been discussing the following question, which
arises because of a security issue.
QUESTION:
When I have a TCP/IP socket, can the write() function return an error
in some critical circumstance (for example, power failing just at that
moment) even though the packet was actually sent OK?
POSSIBILITIES:
returns OK, packet was sent OK: My boss and I agree.
returns OK, packet accidentally did not reach the destination: My boss
and I agree. (Example: the data is placed on the TCP/IP stack, write()
returns OK and later, while the packet is still in the outgoing queue,
the destination has a power failure.)
returns FAIL, packet failed: My boss and I agree.
returns FAIL, packet accidentally reached the destination: My boss
says it is possible, I say this cannot happen.
POSITIONS:
I argue that it is NOT possible. It is clear that in 99% of cases,
when write() says OK the packet is sent, but there are cases (for
example a power failure at the other end of the network cable) in
which the data is correctly placed in the TCP/IP stack's queue and the
packet never gets sent. I assume that you can trap that situation
later by catching SIGPIPE. So a return of OK has two possible
realities: OK or fail (ambiguous). So, necessarily, the FAIL result
must not be ambiguous, because if it were, the result would make
absolutely no sense and the function might as well be defined as
returning void. I.e.: if you had a return value which, if true, means
true or false and, if false, means true or false, then the return
value is useless. By reductio ad absurdum, I conclude that at least
one of the results must be reliable.
Another argument: suppose a write() could return an error for data
that was in fact sent. Take a program that sends "A", "B", "C", "D"
with four write() calls, one byte per call; if it fails sending B, it
will retry. If "data may be sent while an error is returned" were
true, then the receiving host could receive ABBCD (two B's). TCP/IP
guarantees that this does not happen, so a write() that returns FAIL
should never have sent any data.
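
To make the A/B/C/D scenario concrete, here is a minimal sketch of such
a sender. The names write_all() and send_abcd() are hypothetical, and a
local socketpair() stands in for a real TCP connection so that the
example is self-contained. The loop retries only on EINTR, which POSIX
documents as "no data was transferred"; what any other error implies
about the data is exactly the open question of this post.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, or return -1 with errno set.
 * Retries only on EINTR (interrupted before any data was transferred);
 * any other error is handed back to the caller unchanged. */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* nothing was accepted; retry */
            return -1;           /* give up and let the caller decide */
        }
        p += n;                  /* partial write: advance and go on */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

/* Hypothetical demo: send "A", "B", "C", "D" as four one-byte writes
 * over a local socket pair (an assumption, not a TCP socket) and read
 * them back into out. Returns the number of bytes read, or -1. */
static int send_abcd(char *out, size_t outlen)
{
    const char *letters = "ABCD";
    int sv[2];
    size_t got = 0;

    assert(outlen >= 5);         /* room for four letters plus '\0' */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    for (size_t i = 0; i < 4; i++) {
        if (write_all(sv[0], &letters[i], 1) < 0) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }
    close(sv[0]);                /* end of stream for the reader */

    while (got < 4) {
        ssize_t n = read(sv[1], out + got, 4 - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    out[got] = '\0';
    close(sv[1]);
    return (int)got;
}
```

Calling send_abcd() with a buffer of at least five bytes should leave
"ABCD" in it when no error occurs. The sketch does not decide what a
retry after a non-EINTR error would do to the byte stream.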
On the other hand, my boss says that these arguments are not enough,
and that I must demonstrate to him that these are not suppositions,
that this is true 100% (not 99.999999%). There is a security issue
involved here and he is responsible for the system not failing.
ENVIRONMENT:
Simplifying, we work at a worldwide lottery company. We are working
on electronic terminals. The terminal allows a player to bet. He fills
in a grid and submits it. When the "submit" button is pressed, the
bet data is sent to a game server, which validates the bet, assigns a
unique ID to it and returns the ID to the terminal. When the terminal
receives this data (i.e. transactionally, the bet has been completely
submitted to the server), it prints a ticket on a thermal printer so
that the player has something to take with him as proof of purchase.
If there is some problem sending the ID back to the terminal, the
bet is autocancelled (it never existed), for example on a power
failure of the terminal between sending the bet and receiving the
result.
When the data has been sent to the terminal, the ticket is marked as
"printed" and cannot be cancelled. If the ticket is really printed,
all is OK. If the ticket was not completely printed (power failure
during printing), then the player will get angry, go to the agency
staff and complain. As the ticket has NOT been cancelled, it is saved
on the central system and the agency staff may print a copy or
something like that.
But the opposite case CANNOT exist: it is completely forbidden that a
ticket has been printed and is "in the street" BUT the bet has been
autocancelled.
THE FEAR:
My boss says: if write(), in some weird circumstance, may return
"fail" when the data was really (accidentally) sent, then the terminal
could print the ticket (data received) and the system could cancel the
bet (write() returned fail).
He asks me to demonstrate that it is IMPOSSIBLE (0%, not 0.0000001%)
for this situation to happen.
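
The four combinations listed under POSSIBILITIES, together with the
rule that a failed write() cancels the bet, can be written down as a
small truth table. This is only a sketch of the reasoning; all names
(bet_state, server_decides, ticket_printed, is_forbidden) are
hypothetical and do not come from our real system.

```c
#include <assert.h>
#include <stdbool.h>

enum bet_state { BET_KEPT, BET_CANCELLED };

/* Server side, as described: the bet is kept only if write() said OK. */
static enum bet_state server_decides(bool write_ok)
{
    return write_ok ? BET_KEPT : BET_CANCELLED;
}

/* Terminal side: a ticket is printed only if the ID actually arrived. */
static bool ticket_printed(bool id_arrived)
{
    return id_arrived;
}

/* The one combination that must never occur:
 * a ticket "in the street" whose bet has been autocancelled. */
static bool is_forbidden(bool write_ok, bool id_arrived)
{
    return ticket_printed(id_arrived) &&
           server_decides(write_ok) == BET_CANCELLED;
}

/* Walk all four (write result, delivery) combinations and count the
 * forbidden ones. */
static int count_forbidden_cases(void)
{
    int count = 0;

    for (int w = 0; w <= 1; w++)
        for (int a = 0; a <= 1; a++)
            if (is_forbidden(w != 0, a != 0))
                count++;

    /* Sanity check: the case everybody agrees on is never flagged. */
    assert(!is_forbidden(true, true));
    return count;
}
```

Only one of the four combinations is flagged: write() reports FAIL and
the ID reaches the terminal anyway. That is the case my boss and I
disagree about.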
THE RESULTS:
The man page of write() documents the errors as "local" errors, for
example that the fd does not exist and so on (in this case, of course,
no data is sent), but there is one error (EINTR) that is returned if a
signal was received before ANY data was written. I.e.: if even one
byte was sent, this error is not reported. But my boss points out that
the man page explicitly says that "other errors may be returned
depending on the implementation". He argues that this opens up the gap
between 0% and 0.0000001%, and that this is not tolerable.
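
One way to see which error a given implementation really reports is to
provoke a failure and look at errno. The sketch below is an experiment,
not a proof: it assumes a Linux-like system, uses a local socketpair()
instead of a TCP connection, and the helper name
errno_after_peer_close() is hypothetical. SIGPIPE is ignored so that
write() returns an error instead of killing the process.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one byte to a socket whose peer has already been closed.
 * Returns the errno reported by write(), 0 if write() succeeded,
 * or -1 if the socket pair could not be created. */
static int errno_after_peer_close(void)
{
    int sv[2];
    int saved;
    ssize_t n;

    signal(SIGPIPE, SIG_IGN);    /* get an error code, not a signal */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    close(sv[1]);                /* the "other side" goes away */

    n = write(sv[0], "x", 1);
    assert(n == 1 || n == -1);   /* a one-byte write is all or nothing */
    saved = (n < 0) ? errno : 0;

    close(sv[0]);
    return saved;
}
```

On the Linux systems I would expect to run this on, the result is
EPIPE. A real TCP connection can behave differently in timing: the
first write() after the peer has gone may still succeed and only a
later one fails, so this kind of test shows what one implementation
does, not what every implementation must do.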
Furthermore, I've tried to read a bit of the kernel source code. As
far as I got, I found that write() ends up calling a "sub-write"
function which is stored inside the file descriptor structure. Inside
it there is a table of "file operations" such as "open", "seek",
"write", "close" and so on. These are pointers to functions, so that,
depending on the filesystem, the same write() can call several
distinct "underlying" writes. I cannot work out who fills in that
structure and which pointer is placed there, so I cannot follow the
code to the real TCP/IP write call. Even if I could, my boss says
that this would be a contingent solution: this implementation could
behave like this, but the next one might not.
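
The dispatch mechanism described above can be imitated in user space.
The following is only an analogue of the idea, not kernel code: every
name in it (demo_file, demo_file_ops, demo_open_disk, demo_open_sock,
demo_write) is made up for illustration. It assumes that whichever
code creates the descriptor is also the code that fills in the table
of function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct demo_file;

/* Table of operations: one function pointer per operation. */
struct demo_file_ops {
    ssize_t (*write)(struct demo_file *f, const void *buf, size_t len);
};

struct demo_file {
    const struct demo_file_ops *ops;  /* set when the file is "opened" */
    char   data[64];                  /* stands in for disk or queue */
    size_t used;
};

/* Shared helper: append as many bytes as still fit. */
static ssize_t demo_store(struct demo_file *f, const void *buf, size_t len)
{
    size_t room = sizeof f->data - f->used;

    if (len > room)
        len = room;
    memcpy(f->data + f->used, buf, len);
    f->used += len;
    return (ssize_t)len;
}

/* "Disk" backend: accepts everything that fits. */
static ssize_t demo_disk_write(struct demo_file *f, const void *buf, size_t len)
{
    return demo_store(f, buf, len);
}

/* "Socket" backend: accepts at most 4 bytes per call (a short write). */
static ssize_t demo_sock_write(struct demo_file *f, const void *buf, size_t len)
{
    if (len > 4)
        len = 4;
    return demo_store(f, buf, len);
}

static const struct demo_file_ops demo_disk_ops = { .write = demo_disk_write };
static const struct demo_file_ops demo_sock_ops = { .write = demo_sock_write };

/* The "open" functions are the ones that decide which table is used. */
static struct demo_file demo_open_disk(void)
{
    struct demo_file f = { .ops = &demo_disk_ops };
    return f;
}

static struct demo_file demo_open_sock(void)
{
    struct demo_file f = { .ops = &demo_sock_ops };
    return f;
}

/* Generic entry point: the same call reaches different backends. */
static ssize_t demo_write(struct demo_file *f, const void *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL && f->ops->write != NULL);
    return f->ops->write(f, buf, len);
}
```

With this sketch, demo_write() on a file from demo_open_disk() stores
all 8 bytes of "ABCDEFGH", while the same call on a file from
demo_open_sock() stores only the first 4. To find the real TCP/IP
write, the equivalent step would be to look for the code that creates
the socket descriptor and see which table it installs.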
REFORMULATED QUESTION:
So... how can I find documentation of how the write() specific to
TCP/IP behaves? I mean... where can I find documentation about the
errors that the man page leaves open? In other words... how can I
demonstrate to my boss that if write() says ERROR it is because there
was an error in 100% of the cases and not in 99.99999999999% of the
cases?
Sorry for the post being so long, but I was not able to simplify :-)
Thank you very very very much.
See you!
Xavi.