Is my understanding of rsync correct?

 
 
linq936@hotmail.com
      10-19-2006, 06:14 PM
Hi,
I am going to run rsync between two machines over a slow network
connection. On the source machine there is a large set of directories
I need to copy over to machine 2. Here is my plan for doing this with
rsync:

1. Run zip to pack all the directories into one zip file
2. Run rsync to copy the zip file over
3. Unzip the file on machine 2
4. Create a cron job to do this nightly

The zip file should be around 30M - 40M, but after the first copy it
should be OK, because rsync only copies the changed parts of the zip
file.

Do you think this will work?

 
Grant
      10-19-2006, 06:33 PM

It fails.

Grant.

--
http://bugsplatter.mine.nu/
 
Frank W. Steiner
      10-19-2006, 06:54 PM

The nice thing about rsync is that it lets you copy directories, and
keep the copies in sync with the originals, without having to package
them (with zip, tar or whatever) first. I do not think that the scheme
you are proposing would work; but, even if it did, it would not be a
good use of rsync's capabilities.

I think that you should read the rsync documentation more carefully.



 
linq936@hotmail.com
      10-19-2006, 07:15 PM


Thanks for your reply.

The problem is that the set of directories is very large, and they
are not under one or a few root directories; they are actually spread
out a lot.

If I run rsync on root directories, I will have to run it many, many
times, once per root directory.

You say my plan does not work; could you elaborate on that?

My understanding is this: say I have one file whose size is 1M. I
know rsync divides the file into pieces and runs a checksum to decide
whether a piece needs to be updated. Say the piece size is 1k; then
there are about 1000 pieces for this file.

If only 5 pieces have checksums that differ between the source and
destination files, then only those 5 pieces are copied over.

Is this understanding not correct?
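
The piece-by-piece comparison described above can be imitated locally as a rough sketch. This is not rsync's actual algorithm (rsync uses a rolling checksum so that it can also match blocks that have moved to a different offset); the sketch only compares fixed 1k blocks at the same position. The helper name `count_changed_blocks` and the file names are made up for the illustration, and `split`, `cmp`, `head` and `dd` from coreutils are assumed to be available.

```shell
#!/bin/sh
# Sketch: count how many fixed-size blocks differ between two files.
# NOT rsync's real algorithm; all names here are hypothetical.
set -eu

count_changed_blocks() {
    # $1 = old file, $2 = new file, $3 = block size in bytes
    work=$(mktemp -d)
    mkdir "$work/old" "$work/new"
    split -b "$3" -a 4 "$1" "$work/old/blk."
    split -b "$3" -a 4 "$2" "$work/new/blk."
    changed=0
    for blk in "$work"/new/blk.*; do
        name=${blk##*/}
        # A block counts as changed if it differs or has no old counterpart.
        if ! cmp -s "$blk" "$work/old/$name"; then
            changed=$((changed + 1))
        fi
    done
    rm -rf "$work"
    echo "$changed"
}

demo=$(mktemp -d)
# A file of exactly 1000 blocks of 1024 bytes, filled with the letter 'a'.
head -c 1024000 /dev/zero | tr '\0' 'a' > "$demo/old.bin"
cp "$demo/old.bin" "$demo/new.bin"

# Overwrite one byte in each of 5 different blocks. Nothing is inserted,
# so every other block keeps its position and its content.
for block in 3 150 400 777 999; do
    printf 'X' | dd of="$demo/new.bin" bs=1 seek=$((block * 1024)) \
        conv=notrunc 2>/dev/null
done

changed_blocks=$(count_changed_blocks "$demo/old.bin" "$demo/new.bin" 1024)
echo "blocks that would need to be sent: $changed_blocks of 1000"
rm -rf "$demo"
```

With in-place changes like these, only the touched blocks differ. The open question for the zip plan is whether a small change to the input stays that local once the data has been compressed.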

 
Jack Snodgrass
      10-20-2006, 01:15 AM

It depends on how large a 'large' file is on your slow network, but
you may find that sending one large file has issues. If the transfer
stops at, say, 95% and crashes, you lose all of the data sent and have
to send it again. I use a program I found called 'splitpea'. It splits
the large file into 'chunks'. I rsync the small chunks; if the link
dies, I only have to resend the chunk that failed. Once all the data
is over there, I use splitpea to reassemble the large file.
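
The chunking idea can be sketched with standard tools. How splitpea itself works is not shown in this thread, so the stand-in below uses coreutils `split` and `cat` instead; the file names and the 1 MiB chunk size are arbitrary choices for the demo, and the rsync step appears only as a comment because the host name is a placeholder. (rsync also has a `--partial` option, which keeps a partially transferred file so that a later run can build on it.)

```shell
#!/bin/sh
# Sketch: split a big file into chunks, "transfer" them, and reassemble.
# Stand-in for splitpea using coreutils; all names are hypothetical.
set -eu

work=$(mktemp -d)
mkdir "$work/chunks"

# Demo file of about 4.6 MiB standing in for the big archive.
seq 1 700000 > "$work/big.dat"

# 1. Split into 1 MiB chunks: chunk.aa, chunk.ab, ...
split -b 1048576 "$work/big.dat" "$work/chunks/chunk."
chunk_count=$(ls "$work/chunks" | wc -l | tr -d ' ')

# 2. Transfer the chunks. Over a real link this might look like
#      rsync -av chunks/ user@server:incoming/chunks/
#    (placeholder host). An interrupted run only has to redo the
#    chunks that did not arrive.

# 3. Reassemble on the far side. The glob expands in sorted order,
#    which matches the order in which split wrote the chunks.
cat "$work"/chunks/chunk.* > "$work/rebuilt.dat"

if cmp -s "$work/big.dat" "$work/rebuilt.dat"; then
    reassembled=yes
else
    reassembled=no
fi
echo "chunks: $chunk_count, identical after reassembly: $reassembled"
rm -rf "$work"
```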

jack

--
D.A.M. - Mothers Against Dyslexia

see http://www.jacksnodgrass.com for my contact info.

jack - Grapevine/Richardson
 
David Schwartz
      10-20-2006, 01:33 AM

(E-Mail Removed) wrote:

> The problem is that the set of directories is very large, and they
> are not under one or a few root directories; they are actually spread
> out a lot.
>
> If I run rsync on root directories, I will have to run it many, many
> times, once per root directory.

Then create a special image of the directories just to rsync.

> You say my plan does not work; could you elaborate on that?
>
> My understanding is this: say I have one file whose size is 1M. I
> know rsync divides the file into pieces and runs a checksum to decide
> whether a piece needs to be updated. Say the piece size is 1k; then
> there are about 1000 pieces for this file.
>
> If only 5 pieces have checksums that differ between the source and
> destination files, then only those 5 pieces are copied over.
>
> Is this understanding not correct?


That is correct. However, how will that help you? The 'zip' function
mixes all the file pieces together when it compresses them. Even if a
file is unchanged, it will not compress to the same thing in a
different context.

For example, suppose you have ten files all of which contain the
letters 'ab'. The first one may not compress, but the other nine may
compress to the idea of 'same as the first file'. Now, what if someone
changes that first file? All of a sudden, the encoding of all ten files
has changed.

This is the norm, not the exception. That is, in a typical zip
application, changing one file will affect the encoding of every
compressible file after it. (Through a domino effect, basically.)
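
How far a small change spreads can be checked with a quick experiment. The sketch below compresses a single stream with gzip (the situation of a .tar.gz) before and after a one-byte change and counts the byte positions that differ. The file names are made up, and the numbers depend on the data and the compressor. The zip format compresses each archive member separately, so the spread across files in a zip archive may be smaller than in this single-stream demo. Some gzip versions also provide a `--rsyncable` option aimed at this problem.

```shell
#!/bin/sh
# Sketch: how far does a one-byte change spread once data is compressed?
# File names are hypothetical; results vary with data and compressor.
set -eu

work=$(mktemp -d)
# Compressible demo data: the numbers 1..200000, one per line.
seq 1 200000 > "$work/old.txt"
cp "$work/old.txt" "$work/new.txt"
# Overwrite the very first byte ("1") with "X"; the length stays the same.
printf 'X' | dd of="$work/new.txt" bs=1 seek=0 conv=notrunc 2>/dev/null

# -n keeps the file name and timestamp out of the gzip header, so any
# difference between the two outputs comes from the data itself.
gzip -n -c "$work/old.txt" > "$work/old.gz"
gzip -n -c "$work/new.txt" > "$work/new.gz"

# cmp -l prints one line per differing byte position (it exits non-zero
# when the files differ, hence the "|| true").
plain_diff=$({ cmp -l "$work/old.txt" "$work/new.txt" 2>/dev/null || true; } \
    | wc -l | tr -d ' ')
gz_diff=$({ cmp -l "$work/old.gz" "$work/new.gz" 2>/dev/null || true; } \
    | wc -l | tr -d ' ')
gz_size=$(wc -c < "$work/old.gz" | tr -d ' ')

echo "uncompressed: $plain_diff byte position(s) differ"
echo "compressed:   $gz_diff of $gz_size byte positions differ"
rm -rf "$work"
```

If the compressed count comes out as a large share of the file, rsync's block matching has little unchanged data to reuse, which is the concern raised above.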

DS

 
linq936@hotmail.com
      10-20-2006, 02:29 AM


Thanks, this really makes sense.

Then do you know if tar can work here? Does tar use some algorithm to
combine data, or does it simply concatenate?

Or do you have any other suggestion for my situation?

 
Unruh
      10-20-2006, 04:17 AM

Really terrible idea. Any compression looks through the file to figure
out what a good compression scheme is, i.e. even small changes in the
file can totally change the zipped file.

Just transfer all the files using rsync. As you say, after the first
time only the changes will be transferred.


 
Mihai Osian
      10-20-2006, 07:05 AM
(E-Mail Removed) wrote:
> Thanks, this really makes sense.
>
> Then do you know if tar can work here? Does tar use some algorithm to
> combine data, or does it simply concatenate?
>
> Or do you have any other suggestion for my situation?
>



Two things:

1. Why are you afraid of running rsync multiple times, once for each
directory?
2. If you add ten small files somewhere in the middle of the tar
archive, poor rsync would have to work hard to identify where in the
tar archive the modified parts start and where they end. This is not
really efficient.

I suggest you write a bash script that looks like this:

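    for dir in dir1 dir2 dir3 ...    # space-separated list, no commas
    do
        # the trailing slashes copy the contents of $dir into backup/$dir
        rsync -azv -e "ssh -C -l remote_user" --delete "$dir"/ \
            "user@server:backup/$dir"/
    done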

Note that if you use ssh you can achieve 2 goals at once: encrypted
connection and compression (via the -C flag).


If you really wish to run rsync only once (for religious reasons):

    IMAGE=/tmp/image/
    mkdir -p "$IMAGE"
    for i in dir1 dir2 ...
    do
        # Hard-link copy: this only works if $i and $IMAGE are on the
        # same partition. Alternatively you can make symbolic links
        # and tell rsync to follow them (e.g. with -L).
        cp -al "$i" "$IMAGE"
    done
    rsync -azv "$IMAGE" user@server:backup

Mihai

PS: I haven't checked the above scripts, perhaps I got some syntax
wrong. Take them only as guidelines.

 
Philipp Pagel
      10-20-2006, 08:30 AM
(E-Mail Removed) wrote:
> The problem is that the set of directories is very large, and they
> are not under one or a few root directories; they are actually spread
> out a lot.
>
> If I run rsync on root directories, I will have to run it many, many
> times, once per root directory.


So what? In your scenario you will have to specify all the directories
to zip instead. I don't see how this will make things
easier/faster/whatever.

As others have pointed out already: Just run rsync on all directories.
Maybe you could tell us why you feel this is not a good option.
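
For what it is worth, rsync accepts several source directories in one invocation, and its `-R` (`--relative`) option keeps each full source path below the destination, so scattered directories do not necessarily mean many separate runs. A minimal sketch, assuming a hand-maintained list of directories: the paths, the list format and the `user@server:backup/` destination are all placeholders, and the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch: collect scattered directories from a list and hand them all
# to ONE rsync invocation. Paths and host are placeholders.
set -eu

dest='user@server:backup/'   # placeholder destination

# Hypothetical list of directories, one per line; '#' starts a comment.
dir_list=$(mktemp)
cat > "$dir_list" <<'EOF'
# config and data scattered over the filesystem
/etc/myapp
/var/lib/myapp/data
/home/alice/projects
EOF

# Build the argument list: options first, then every source directory.
# -a archive mode, -z compress in transit, -R keep the full source path
# below the destination (backup/etc/myapp, backup/var/lib/myapp/data, ...).
set -- -azR
while IFS= read -r dir; do
    case $dir in
        ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    set -- "$@" "$dir"
done < "$dir_list"
set -- "$@" "$dest"
rm -f "$dir_list"

arg_count=$#
rsync_cmd="rsync $*"
# Print the command instead of running it, since the host is made up.
echo "$rsync_cmd"
```

rsync also has a `--files-from=FILE` option that reads the source list from a file directly; its interaction with recursion is worth checking in the man page before relying on it.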

You may also want to have a look at unison:

http://www.cis.upenn.edu/~bcpierce/unison/

cu
Philipp

--
Dr. Philipp Pagel Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics Fax. +49-8161-71 2186
Technical University of Munich
http://mips.gsf.de/staff/pagel
 