Which is fastest program for copying files over network e.g rcp, ftp, scp, rsync

 
 
David Travers
Guest
Posts: n/a

 
      12-02-2004, 12:01 AM
I want to mirror 1 server to another and want to do it as fast as possible.

It is approx 8.1GB of data of approx 350 files, ranging from 32KB in size to
2GB.

I have used rcp and think this is the fastest method with approx 8.1MB/s on
a Fast Ethernet host to Gigabit Ethernet host. This is an internal network
and security is not really an issue on these two hosts, so the data need not
be encrypted in any way. I have tried rsync and nfs, but thought these
were slower.

Is this the fastest way to copy the files?

Overall it takes around 20 mins to copy the data.


 
 
 
 
 
Peter T. Breuer
Guest
Posts: n/a

 
      12-02-2004, 10:00 AM
David Travers <(E-Mail Removed)> wrote:
> I want to mirror 1 server to another and want to do it as fast as possible.


You can't. And what do you mean by "mirror"? realtime mirror? Or do
you want a snapshot?

> It is approx 8.1GB of data of approx 350 files, ranging from 32KB in size to
> 2GB.
>
> I have used rcp and think this is the fastest method with approx 8.1MB/s on


That's a snapshot, and that's about the slowest way you can do it!
There's not even any compression involved the way you do it! Use rsync.

> Is this the fastest way to copy the files.


No, it's about the slowest.

> Overall it takes around 20 mins to copy the data.


Unlikely. 8GB in 20m would be about 6.7MB/s. That's nearly flat out for
rcp over tcp over 100Mb/s ethernet, uncompressed. Unless you have a
dedicated cable it's hard to see that being kept up.
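
The arithmetic here is easy to check. A small sketch in Python, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); the function name is made up for illustration:

```python
def rate_mb_per_s(total_bytes: float, seconds: float) -> float:
    """Average transfer rate in decimal megabytes per second."""
    return total_bytes / seconds / 1e6

GB = 1e9  # decimal gigabyte (assumption; binary units give slightly higher figures)

# 8.1 GB moved in 20 minutes, as reported in the original question.
rate_20min = rate_mb_per_s(8.1 * GB, 20 * 60)

# Raw signalling rate of a 100 Mb/s link in MB/s, before any protocol overhead.
link_ceiling = 100e6 / 8 / 1e6

# How long 8.1 GB would take at the reported 8.1 MB/s, in minutes.
minutes_at_reported_rate = 8.1 * GB / 8.1e6 / 60

print(f"{rate_20min:.2f} MB/s average")
print(f"{link_ceiling:.1f} MB/s raw link ceiling")
print(f"{minutes_at_reported_rate:.1f} minutes at 8.1 MB/s")
```

On these assumptions 8.1GB in 20 minutes averages 6.75 MB/s against a raw 12.5 MB/s for the link, and 8.1GB at the reported 8.1MB/s would take closer to 17 minutes than 20.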

If you just want to go for speed, use ftp and compress the
data, streaming. In an ftp client from the other end, something like:

get foo.tar.gz "|tar xzvf -"

should do the trick. Make sure you are positioned correctly. The ftp
server will tar up the directory foo and compress it, and send it to
you, who receive it in a pipe to a tar that uncompresses and unarchives
it (to the directory you are in, down one).
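
The same tar-and-compress pipe can be sketched without ftp. The Python below only illustrates the streaming idea, not the ftp server's conversion feature: `pack_stream` and `unpack_stream` are hypothetical names, and an in-memory buffer stands in for the network connection.

```python
import io
import os
import tarfile
import tempfile

def pack_stream(src_dir, out_stream):
    """Sender side: write src_dir to out_stream as a gzip-compressed tar stream."""
    name = os.path.basename(os.path.normpath(src_dir))
    # "w|gz" is tarfile's non-seekable streaming mode, usable on pipes and sockets.
    with tarfile.open(fileobj=out_stream, mode="w|gz") as tar:
        tar.add(src_dir, arcname=name)

def unpack_stream(in_stream, dest_dir):
    """Receiver side: decompress and unarchive the stream under dest_dir."""
    with tarfile.open(fileobj=in_stream, mode="r|gz") as tar:
        tar.extractall(dest_dir)

# Demo: build a small directory "foo", stream it, and unpack it elsewhere.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "foo")
    os.mkdir(src)
    with open(os.path.join(src, "data.bin"), "wb") as f:
        f.write(b"x" * 100_000)
    dest = os.path.join(tmp, "dest")
    os.mkdir(dest)

    pipe = io.BytesIO()          # stand-in for the connection
    pack_stream(src, pipe)
    compressed_size = len(pipe.getvalue())
    pipe.seek(0)
    unpack_stream(pipe, dest)
    restored_size = os.path.getsize(os.path.join(dest, "foo", "data.bin"))
```

As with the ftp version, the archive unpacks one level down, into a `foo` directory under wherever the receiver is positioned.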

But you really would be better off doing a differential copy with
something like rsync or unison. The first time it will be slower, but
thereafter it would be 100 times faster, if I guess right.
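
For a feel of what a differential copy has to compute, here is a sketch of the weak rolling checksum from the published rsync algorithm description. It is an illustration only: real rsync pairs this with a strong hash per block and handles matching and transmission, none of which is shown, and the function names and block size here are arbitrary choices.

```python
M = 1 << 16  # both running sums are kept modulo 2**16

def weak_checksum(block):
    """Compute the two sums (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Byte i (0-based) is weighted by its distance from the end of the block.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def rolling_sums(data, block_len):
    """Yield (a, b) for every window of block_len bytes using O(1) updates."""
    a, b = weak_checksum(data[:block_len])
    yield a, b
    for k in range(len(data) - block_len):
        a, b = roll(a, b, data[k], data[k + block_len], block_len)
        yield a, b
```

The point of the rolling update is that each byte offset costs constant work, so the whole file is scanned in roughly the time it takes to read it.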

Peter
 
 
Steve Wampler
Guest
Posts: n/a

 
      12-02-2004, 04:19 PM
On Thu, 02 Dec 2004 12:00:00 +0100, Peter T. Breuer wrote:

> David Travers <(E-Mail Removed)> wrote:
>> I want to mirror 1 server to another and want to do it as fast as possible.

....
>> It is approx 8.1GB of data of approx 350 files, ranging from 32KB in size to
>> 2GB.
>>
>> I have used rcp and think this is the fastest method with approx 8.1MB/s on

>

....
> There's not even any compression involved the way you do it! Use rsync.


Oddly enough, on *really* large files, it may be faster to use rcp.
I did some timings with 80MB files and discovered that, if it turned out
that the file was identical (i.e. rsync wouldn't have to copy anything)
it was faster just to copy it with rcp - it took rsync longer to figure
out that nothing needed to be copied! Granted, on a slower network
this result could easily change (this was with gigabit ethernet). It
wouldn't surprise me if the time to compress a really large file might
exceed the cost of just copying it over a sufficiently fast network, but
"sufficiently" may not be realizable.

So the 'fastest' way depends upon a lot of factors. Why not try a few
different ways and compare? If this transfer is going to be done often
it would pay to spend some time measuring...

 
 
Peter T. Breuer
Guest
Posts: n/a

 
      12-02-2004, 04:51 PM
Steve Wampler <(E-Mail Removed)> wrote:
> > There's not even any compression involved the way you do it! Use rsync.

>
> Oddly enough, on *really* large files, it may be faster to use rcp.


No ...

> I did some timings with 80MB files and discovered that, if it turned out
> that the file was identical (i.e. rsync wouldn't have to copy anything)
> it was faster just to copy it with rcp - it took rsync longer to figure
> out that nothing needed to be copied! Granted, on a slower network


It cannot easily be the case - it has to calculate the rolling md4 sums
both sides, which takes as long as it takes to read the file in the
first place at one side. So with rsync the cost is


2 * time to read file + 1 * time to transmit DIFFS

while with rcp the cost is


1 * time to transmit FILE

(assuming that the read time for the file is amortized into the send,
which will be the case when the net is slower than the disk)

If you assume the net is 10 times as slow as the disk, and there is a
proportion X of diffs in a file of size S (that takes time T to read
from disk, and so 10 * T to transmit), then this is

(2 + 10 * X) * T

versus

10 * T

so with rcp you clearly will never win. The only chance you have is if the
proportion of diffs is over 80%.
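
The comparison above can be written out as a small cost model. This is a sketch of the formulas as given, taking T as the time to read the file from disk (which is what the two expressions imply) and r as how many times slower the net is than the disk; the function names are made up for illustration.

```python
def rsync_cost(x, r, t_read=1.0):
    """Two full reads of the file plus transmitting the changed fraction x."""
    return (2 + r * x) * t_read

def rcp_cost(r, t_read=1.0):
    """Transmitting the whole file, with the disk read hidden behind the send."""
    return r * t_read

def break_even_fraction(r):
    """Changed fraction above which a plain copy is cheaper: 2 + r*x > r."""
    return 1 - 2 / r

# Net 10 times slower than disk, 5% of the file changed.
print(rsync_cost(0.05, 10), "vs", rcp_cost(10))
print("break-even changed fraction:", break_even_fraction(10))
```

With r = 10 the break-even is 0.8, matching the 80% figure. A value at or below zero, which is what r <= 2 gives, means the plain copy is never slower under this model.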


> this result could easily change (this was with gigabit ethernet). It


Oh, well in that case the net might be exactly as fast as the disk,
which makes it

(2 + X) * T

against

1 * T

so yes, you win by sending without looking. But of course that result
only holds over a short-range connection via direct wire. And as soon
as you get faster disks the balance will shift again.


> wouldn't surprise me if the time to compress a really large file might
> exceed the cost of just copying it over a sufficiently fast network, but
> "sufficiently" may not be realizable.


Well, there is nothing to be surprised at. If the net is as fast as the
disk, then you gain nothing by compressing before sending. Is that not
obvious?


> So the 'fastest' way depends upon a lot of factors. Why not try a few
> different ways and compare? If this transfer is going to be done often
> it would pay to spend some time measuring...


Amen. I did say up front that there is no "fastest", did I not? At
least I recall putting something that I intended to mean that.

Peter
 
 
David Travers
Guest
Posts: n/a

 
      12-02-2004, 06:28 PM
It's an internal Gigabit switched network. One host has a Fast Ethernet
connection (HP9000 HP-UX server), the other Gigabit Ethernet (Redhat Linux AS
v3). Hard disks are all Ultra320 SCSI. There are no WAN links involved.

The mirror snapshot from 1 server to another will be a complete snapshot, as
all the files will need to be copied in their entirety. The files being copied
are very large database files.

As I said, it copied the data in 15-20 mins for an average transfer rate of
around 8MB/s (which is the fastest transfer rate I have seen; it beat ftp in my
tests).

Two of the files being copied are 1.8GB in size. Compressing these can take
just as long as it takes to copy the data over the network.

Any more suggestions as to what would be the best protocol to use?

"Peter T. Breuer" <(E-Mail Removed)> wrote in message
news:g69382-(E-Mail Removed)...
> David Travers <(E-Mail Removed)> wrote:
>> I want to mirror 1 server to another and want to do it as fast as
>> possible.

>
> You can't. And what do you mean by "mirror"? realtime mirror? Or do
> you want a snapshot?
>
>> It is approx 8.1GB of data of approx 350 files, ranging from 32KB in size
>> to
>> 2GB.
>>
>> I have used rcp and think this is the fastest method with approx 8.1MB/s
>> on

>
> That's a snapshot, and that's about the slowest way you can do it!
> There's not even any compression involved the way you do it! Use rsync.
>
>> Is this the fastest way to copy the files.

>
> No, it's about the slowest.
>
>> Overall it takes around 20 mins to copy the data.

>
> Unlikely. 8GB in 20m would be about 6.5MB/s. That's nearly flat out for
> rpc over tcp over 100Mb/s ethernet, uncompressed. Unless you have a
> dedicated cable it's hard to see that being kept up.
>
> If you just want to go for speed, use ftp (over udp) and compress the
> data, streaming. In an ftp client from the other end, something like:
>
> get foo.tar.gz "|tar xzvf -"
>
> should do the trick. Make sure you are positioned correctly. The ftp
> server will tar up the directory foo and compress it, and send it to
> you, who receive it in a pipe to a tar that uncompresses and unarchives
> it (to the directory you are in, down one).
>
> But you really would be better off doing a differential copy with
> something like rsync or unison. The first time it will be slower, but
> thereafter it would be 100 times faster, if I guess right.
>
> Peter



 
 
Simon Waters
Guest
Posts: n/a

 
      12-02-2004, 07:13 PM

Peter T. Breuer wrote:
|
| Amen. I did say up front that there is no "fastest", did I not? At
| least I recall putting something that I intended to mean that.

At 8MB/s on a 100Mbps network he is getting into marginal returns
unless he can reduce the work with something like rsync as you
suggested.

I once managed to push NFS to over 70Mbps of data over fully switched,
full-duplex 100Mbps on HP workstations, which surprised everyone
(including me). ftp was still slightly faster, but ftp didn't fit the
purpose. I haven't done the maths, but I assume
the 70Mbps is pretty much at or around the theoretical limit for
NFS, given the overheads in the various protocols on top of the
ethernet. At the time FTP was the fastest of the readily available
tools I tested.
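
Part of that maths can be sketched. The Python below estimates only the ceiling for bulk TCP payload over standard Ethernet framing, assuming a 1500-byte MTU and IPv4 and TCP headers without options. It does not model NFS or RPC overhead, UDP transport, or request/response turnarounds, so it is an upper bound rather than the NFS limit itself.

```python
# Per-frame Ethernet overhead in bytes:
# preamble + start delimiter (8), MAC header (14), frame check sequence (4),
# and the minimum inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12

MTU = 1500      # assumed: standard Ethernet MTU, no jumbo frames
IP_HDR = 20     # assumed: IPv4 header without options
TCP_HDR = 20    # assumed: TCP header without options

def tcp_payload_ceiling_mbps(link_mbps):
    """Best-case application payload rate for full-size TCP segments."""
    payload = MTU - IP_HDR - TCP_HDR    # 1460 bytes of data per frame
    on_wire = MTU + ETH_OVERHEAD        # 1538 byte-times per frame
    return link_mbps * payload / on_wire

ceiling = tcp_payload_ceiling_mbps(100)
share_at_70 = 70 / ceiling

print(f"ceiling {ceiling:.1f} Mb/s, 70 Mb/s is {share_at_70:.0%} of it")
```

Under these assumptions the ceiling is about 94.9 Mb/s (roughly 11.9 MB/s), so 70Mbps is around three quarters of what framing alone would allow; the rest would have to be accounted for by the layers this sketch leaves out.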

However, these days it is probably not cost effective to push too
much harder (I spent several days understanding the innards of NFS
writes to get from lousy performance to the best I could get
without tuning the IP stack). Depending on how far apart the boxes are,
a gigabit ethernet card in the end without one will make far more
difference.

Stupid question time...

Does any step in the process do run length encoding if I do these
sorts of copies with the regular tools over ethernet?

Does anything attempt to use full-duplex bandwidth to speed the
transfers in these situations? Presumably the receiving end could
suggest or signal patterns it knows over the return channel,
allowing the sending end to say "data data data - your pattern 10267
- - data data" or some such. Or is this a futile exercise in speeding
transfer? It seems it might make sense with asymmetric connections.


 
 
Alexander Clouter
Guest
Posts: n/a

 
      12-02-2004, 08:13 PM
On 2004-12-02, David Travers <(E-Mail Removed)> wrote:
> I want to mirror 1 server to another and want to do it as fast as possible.
>
> It is approx 8.1GB of data of approx 350 files, ranging from 32KB in size to
> 2GB.
>
> I have used rcp and think this is the fastest method with approx 8.1MB/s on
> a Fast Ethernet host to Gigabit Ethernet host. This is an internal network
> and security is not really an issue on these two hosts so the data need not
> be encrypted in anyway. I have tried rsync, nfs and rsync but thought these
> were slower.
>
> Is this the fastest way to copy the files.
>
> Overall it takes around 20 mins to copy the data.
>

"poof use 'netcat'" Who needs application layer headers? Raw TCP is what
it's about; if you feel 'lucky' go for UDP.

Seriously though, we used it in our flat. If you are going to saturate a
gigabit ethernet card (ignoring whether your motherboard, cpu, memory, harddisk,
etc. can actually do this), then netcat is your key.

Of course it's not too convenient; however, if it's for large single file
transfers (can you say uncompressed tarball?) you cannot get any better, even
if you use an ssh session to start things off on the remote end as a cron
job.

My flatmate did this on my suggestion, but as he's a C coder he threw in a
special md5/sha1sum hook for safety.
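
A rough Python sketch of the same idea: push raw bytes down a stream socket with no application-layer framing, and hash on both ends so the two sides can compare digests afterwards. This is not the flatmate's C code; `send_stream` and `recv_stream` are hypothetical names, and the demo uses a local `socket.socketpair()` in place of two hosts.

```python
import hashlib
import io
import socket
import threading

CHUNK = 64 * 1024  # arbitrary read/send size

def send_stream(sock, src):
    """Send everything from src over sock; return the SHA-1 of what was sent."""
    digest = hashlib.sha1()
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        digest.update(chunk)
        sock.sendall(chunk)
    sock.shutdown(socket.SHUT_WR)  # end of data is the only "protocol" there is
    return digest.hexdigest()

def recv_stream(sock, dst):
    """Write everything received on sock to dst; return the SHA-1 of what arrived."""
    digest = hashlib.sha1()
    while True:
        chunk = sock.recv(CHUNK)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()

# Demo over a connected local socket pair.
payload = b"uncompressed tarball bytes " * 50_000
left, right = socket.socketpair()
received = io.BytesIO()
result = {}

def reader():
    result["sha1"] = recv_stream(right, received)

thread = threading.Thread(target=reader)
thread.start()
sent_sha1 = send_stream(left, io.BytesIO(payload))
thread.join()
left.close()
right.close()

transfer_ok = sent_sha1 == result["sha1"]
```

Between real hosts the receiver would accept a connection from a listening TCP socket and the sender would connect to it; the two functions stay the same.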

Cheers

Alex
 
 
Peter T. Breuer
Guest
Posts: n/a

 
      12-02-2004, 11:58 PM
David Travers <(E-Mail Removed)> wrote:

(please do NOT top post! Thank you).

> It's an internal Gigabit switched network.


Fine. In that case you might do best to send before looking. It depends
on the speed of your disks. If your disks are faster than your net you
might wish to scan them for things you can leave out rather than sending.


> The mirror snapshot from 1 server to another will be a complete snapshot as
> all the files will need to copied in it's entirety . The files being copied


Why?

> are very large database files.


So? Why does that imply they need to be copied in their entirety? To me
it implies they definitely don't!

> Two of the files being copied are 1.8GB in size. Compressing these can take
> just as long as it takes to copy the data over the network.


No it can't, because you will only be doing a bit of extra compression work
for each KB sent. You will do: compress 4KB; send 2KB; compress 4KB; send 2KB; ...

(and if you are running two processes you will be pipelining that to
advantage)

So provided that your disk is twice as fast as your net, you will win by
compressing first (assuming a compression ratio of 2:1).

When the sending is done in a separate thread, then of course you win
completely, because you do:

compress 4KB; send 2KB;
compress 4KB; send 2KB;
compress 4KB; send 2KB;
...

and thus you run at exactly twice the network rate (if the disk is
twice as fast as the net) for a one-time capital cost of only 4KB extra.

So it's dependent on the implementation.

Of course, even without a separate sending thread, you would win
provided you divided the set of files into two and sent each half via
different processes, each process doing compression.

Then the net would reach saturation from the two processes, each of
which would read and compress from disk while the other was sending over
the net.
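
The separate-sending-thread arrangement can be sketched as a two-stage pipeline. In the Python below one thread reads and compresses 4KB blocks while a second thread sends whatever is ready. zlib stands in for whichever compressor is used, `send` is any callable supplied by the caller, and the queue depth is arbitrary. There is no error handling: if `send` raises, the compressing side can block.

```python
import io
import queue
import threading
import zlib

BLOCK = 4096  # read size, matching the 4KB blocks in the description

def compress_and_send(read_block, send, depth=4):
    """Compress in the calling thread while a second thread sends finished chunks."""
    ready = queue.Queue(maxsize=depth)   # bounded, so compression cannot run far ahead

    def sender():
        while True:
            chunk = ready.get()
            if chunk is None:            # sentinel: no more data
                return
            send(chunk)

    thread = threading.Thread(target=sender)
    thread.start()

    compressor = zlib.compressobj()
    while True:
        block = read_block(BLOCK)
        if not block:
            break
        out = compressor.compress(block)
        if out:                          # the compressor may buffer and return nothing yet
            ready.put(out)
    tail = compressor.flush()
    if tail:
        ready.put(tail)
    ready.put(None)
    thread.join()

# Demo: "send" just collects the chunks that would go on the wire.
data = b"database page " * 100_000
wire_chunks = []
compress_and_send(io.BytesIO(data).read, wire_chunks.append)
wire = b"".join(wire_chunks)
```

Whether this beats an uncompressed copy still depends on the compressor keeping ahead of the network, which is the condition stated above.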


> Any more suggestions as to what would be the best protocol to use.


Use your head. I'm fed up doing your thinking for you.

Peter
 
 
 
 