Networking Forums

distributed measurement problem

 
 
Shashank
 
      11-03-2008, 10:44 PM
Hi,

I am working on a distributed measurement project with a centralized
data collection node (server) and 28 clients, each with between one
and four interfaces.

I've written C code that captures packets on all the interfaces of the
node it runs on, computes statistics (pps, Mbps, etc. for different
subsets of traffic), and sends them to the server every second. The
server creates a file for each interface on each client and writes
these statistics into the respective files.

I've used Python to automate and synchronize the runs; it starts the
C program in the background for each of the interfaces.

The problem is:
If I start the client program to run for, say, 200 seconds, the
clients run for the entire period, sending statistics to the server
every second. However, the files corresponding to some interfaces do
not contain the full 200 seconds of data, even though the client
finishes execution and the server closes the file after the client
has finished execution.

I don't think this is an issue with the server being flooded with data
(it's multithreaded, and the example below was run one node at a time)
or with packets being dropped (that wouldn't explain this problem;
besides, ifconfig doesn't show dropped packets and I am using TCP
sockets). I am not sure whether there is a bug in my code, since it's
essentially the same client code on all systems.

Here is the wc -l output for three nodes, each run one at a time for
200 seconds:

> wc -l *.log

    44 core1.10.1.11.2.log
   200 core1.10.1.3.2.log
    49 core1.10.1.32.3.log
   200 core1.10.1.9.2.log
    49 core2.10.1.13.2.log
    49 core2.10.1.15.2.log
   200 core2.10.1.3.3.log
   200 core2.10.1.5.2.log
    49 core3.10.1.17.2.log
   200 core3.10.1.18.2.log
   200 core3.10.1.30.3.log
   200 core3.10.1.5.3.log
  1640 total

Each node has 4 interfaces, and although the experiment ran for 200
seconds, some files contain only 44 or 49 lines. ifconfig on the
server shows no dropped packets.
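
For anyone who wants to repeat this check without reading wc output by
eye, here is a rough Python sketch that flags the short files. The
function name short_logs and the idea of passing the expected line
count as a parameter are only illustrative assumptions; they are not
part of the actual measurement code.

```python
from pathlib import Path


def short_logs(log_dir, expected):
    """Return {file name: line count} for *.log files with too few lines.

    `expected` is the number of one-second reports the run should have
    produced (200 in the experiment described above).
    """
    short = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        # Count lines in binary mode so odd bytes cannot break decoding.
        with path.open("rb") as f:
            lines = sum(1 for _ in f)
        if lines < expected:
            short[path.name] = lines
    return short
```

Calling short_logs(".", 200) in the server's log directory would list
only the interfaces whose files stopped early.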

Does anyone have pointers on this?
Sorry for the long post,

Thanks,
Shashank
 
 
David Schwartz
 
      11-04-2008, 01:21 AM
On Nov 3, 3:44 pm, Shashank <shashank.shanb...@gmail.com> wrote:

> The problem is:
> If I initiate the client program to run for, say 200 seconds, the
> clients run for the entire period sending statistics per second to the
> server. However, files corresponding to some interfaces do not show
> the entire 200 seconds even though the client finishes execution and
> the server closes the file after the client has finished execution.


This doesn't fit the pattern for any "typical mistake" that I'm
familiar with. I'd suggest trying to localize the problem bit by bit.

For example, first modify the client software to checkpoint how many
reports it has sent to the server. Have a client log file, and have it
write a 'checkpoint' after every ten messages. Open the log file in
append mode, assemble the checkpoint message in a buffer, and send it
with a single call to 'write'. If the checkpoints don't show the 200
messages, then you know the client is the issue.
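
A minimal sketch of that client-side checkpointing, written in Python
for brevity (the same pattern maps onto open/write in C). The names
write_checkpoint and send_reports, the record format, and the send_one
callback are all made up for illustration; the real client will look
different.

```python
import os


def write_checkpoint(path, sent_count):
    """Append one checkpoint record using a single write() call."""
    # Assemble the whole record in a buffer first.
    record = f"checkpoint sent={sent_count}\n".encode("ascii")
    # O_APPEND makes each write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, record)
    finally:
        os.close(fd)


def send_reports(total, send_one, log_path, every=10):
    """Send `total` reports, checkpointing after every `every` messages.

    `send_one(seq)` stands in for whatever actually sends a report to
    the server (hypothetical callback).
    """
    sent = 0
    for seq in range(total):
        send_one(seq)
        sent += 1
        if sent % every == 0:
            write_checkpoint(log_path, sent)
    return sent
```

With total=200 and every=10, a healthy client should leave 20
checkpoint lines, the last one reporting 200 sent messages.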

Then add similar checkpointing in the software that talks to the
client. Make sure the server software sees 200 messages. If not, then
you know something is screwy in that piece of software. (Perhaps the
client isn't really sending the messages? Perhaps the server is
dropping some of them?)
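
On the server side, one thing to keep in mind is that TCP is a byte
stream, so one recv() does not necessarily correspond to one report.
Here is a rough sketch of a counter that only counts complete records.
It assumes each report is a single newline-terminated line, which may
not match your wire format, and the class name RecordCounter is
invented for the example.

```python
class RecordCounter:
    """Count complete newline-terminated records in a byte stream."""

    def __init__(self):
        self.count = 0
        self._partial = b""  # bytes of a record not yet terminated

    def feed(self, chunk):
        """Consume one chunk (e.g. from recv) and return complete records."""
        data = self._partial + chunk
        # Everything before the last newline is complete; the tail is
        # carried over until the rest of that record arrives.
        *complete, self._partial = data.split(b"\n")
        self.count += len(complete)
        return complete
```

Feed it every chunk the connection handler receives and compare
counter.count with the client's checkpoint total when the connection
closes.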

Keep going until you localize the problem.

DS
 
 
Joe Beanfish
 
      11-04-2008, 05:21 PM
David Schwartz wrote:
> Keep going until you localize the problem.
>
> DS


Also timestamp your messages and look to see which ones are missing.
That may give you a clue of where to look for the problem.
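
As a sketch of what that check could look like, assuming each log line
carries an integer timestamp in seconds and the run's start time is
known (both are assumptions, and the function names are invented):

```python
def missing_seconds(timestamps, start, duration):
    """Return the seconds in [start, start + duration) with no record."""
    seen = set(timestamps)
    return [t for t in range(start, start + duration) if t not in seen]


def group_runs(values):
    """Collapse sorted integers into (first, last) runs of consecutive values."""
    runs = []
    for v in values:
        if runs and v == runs[-1][1] + 1:
            runs[-1][1] = v  # extend the current run
        else:
            runs.append([v, v])  # start a new run
    return [tuple(r) for r in runs]
```

One long run at the end would suggest the writer stopped early, while
many scattered gaps would point at individual reports going missing.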
 
 
Shashank
 
      11-11-2008, 07:44 AM
On Nov 4, 1:21 pm, Joe Beanfish <j...@nospam.duh> wrote:
>
> Also timestamp your messages and look to see which ones are missing.
> That may give you a clue of where to look for the problem.


Hello,

Thanks to both of you for the suggestions.
The problem was actually in one of the anomaly detection algorithms I
was using.
I have sorted the problem out.
Thanks,
Shashank
 