(E-Mail Removed) wrote:
> David Schwartz wrote:
>> (E-Mail Removed) wrote:
>>
>>> The problem is the directory set is very large and they are not under
>>> one or several root directories, actually they spread a lot.
>>> If I run rsync with root directories, I will have to run rsync many
>>> many times, each time with a root directory.
>> Then create a special image of the directories just to rsync.
>>
>>> You say my plan does not work, could you elaborate on that?
>>> My understanding is this: say I have one file of size 1M. I know
>>> rsync divides the file into pieces and runs a checksum on each to
>>> decide whether a piece needs to be updated. If the piece size is 1k,
>>> then there are about 1000 pieces for this file.
>>> If only 5 pieces have checksums that differ between the source
>>> and destination files, then only those 5 pieces are copied over.
>>> Is this understanding not correct?
>> That is correct. However, how will that help you? The 'zip' function
>> mixes all the file pieces together when it compresses them. Even if a
>> file is unchanged, it will not compress to the same thing in a
>> different context.
>>
>> For example, suppose you have ten files all of which contain the
>> letters 'ab'. The first one may not compress, but the other nine may
>> compress to the idea of 'same as the first file'. Now, what if someone
>> changes that first file? All of a sudden, the encoding of all ten files
>> has changed.
>>
>> This is the norm, not the exception. That is, in a typical zip
>> application, changing one file will affect the encoding of every
>> compressible file after it. (Through a domino effect, basically.)
>>
>> DS
>
> Thanks, this really makes sense.
>
> Then do you know if tar can work here? Does tar use some algorithm to
> combine data or just simply concatenate?
>
> Or do you have any suggestions for my situation?
>
Two points:
1. Why are you afraid of running rsync multiple times, once for each
directory?
2. If you add ten small files somewhere in the middle of the tar
archive, everything after them shifts, and poor rsync has to work hard
to identify where in the tar archive the modified parts start and where
they end. This is not very efficient.
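To illustrate point 2, here is a small sketch. The file names and the marker string are made up for the example, and it assumes GNU tar and GNU grep. It builds the same archive twice, the second time with one extra file in the middle, and prints where the last file's data lands in each archive.

```shell
#!/bin/sh
# Illustration only: file names and the marker string are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'first file\n' > a.txt
printf 'MARKER_LAST_FILE\n' > c.txt

# Archive 1: a.txt followed by c.txt
tar -cf before.tar a.txt c.txt

# Archive 2: the same files, plus a new small file in the middle
printf 'new file in the middle\n' > b.txt
tar -cf after.tar a.txt b.txt c.txt

# Byte offset of c.txt's data in each archive
# (grep -a: treat binary as text, -b: print byte offset, -o: match only)
off_before=$(grep -abo 'MARKER_LAST_FILE' before.tar | head -n 1 | cut -d: -f1)
off_after=$(grep -abo 'MARKER_LAST_FILE' after.tar | head -n 1 | cut -d: -f1)

echo "offset of c.txt data before: $off_before"
echo "offset of c.txt data after:  $off_after"
```

c.txt itself did not change, yet its data sits at a later offset in the second archive, so rsync cannot simply compare the two archives block by block at fixed positions.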
I suggest you write a bash script that looks like this:
for dir in dir1 dir2 dir3 ...
do
    # trailing slashes: copy the contents of $dir into backup/$dir
    rsync -azv -e "ssh -C -l remote_user" --delete "$dir/" \
        "user@server:backup/$dir/"
done
Note that if you use ssh you can achieve 2 goals at once: encrypted
connection and compression (via the -C flag).
If you really wish to run rsync only once (for religious reasons):
IMAGE=/tmp/image/
mkdir -p "$IMAGE"
for i in dir1 dir2 ...
do
    # hard links only work if $i and $IMAGE are on the same filesystem;
    # alternatively you can make symbolic links
    # and tell rsync to follow them
    cp -al "$i" "$IMAGE"
done
rsync -azv "$IMAGE" user@server:backup
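The symbolic-link alternative mentioned in the comments could look roughly like the sketch below. The directory names, IMAGE and REMOTE are placeholders invented for the example, and build_image is a hypothetical helper, not part of rsync. The only rsync-specific piece is --copy-links (-L), which makes rsync transfer what a link points to rather than the link itself.

```shell
#!/bin/sh
# Sketch of the symlink variant. Directory names, IMAGE and REMOTE are
# placeholders invented for this example.
set -eu

# Create one symlink per source directory inside the image directory.
build_image() {
    image=$1
    shift
    mkdir -p "$image"
    for src in "$@"; do
        # use an absolute target so the link resolves from inside $image
        abs=$(cd "$src" && pwd)
        ln -sfn "$abs" "$image/$(basename "$abs")"
    done
}

# Demo with throwaway directories so the sketch runs anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/projects/dir1" "$demo/other/dir2"
echo "hello" > "$demo/projects/dir1/file.txt"
IMAGE="$demo/image"
build_image "$IMAGE" "$demo/projects/dir1" "$demo/other/dir2"
ls -l "$IMAGE"

# --copy-links (-L) makes rsync copy the linked directories, not the links.
# Only runs when REMOTE is set, e.g. REMOTE=user@server:backup
if [ -n "${REMOTE:-}" ]; then
    rsync -azv --copy-links "$IMAGE/" "$REMOTE"
fi
```

Unlike cp -al, this works across filesystems. One caveat of this particular sketch: the links are named after each directory's basename, so two source directories with the same final name would collide and need distinct link names.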
Mihai
PS: I haven't checked the above scripts, perhaps I got some syntax
wrong. Take them only as guidelines.