Shared files - advice

 
 
James

 
      07-21-2004, 09:08 PM
Hi, I have two Linux servers running Samba in different offices, providing
file and print services to around 80 Windows users (about 40 in each
office). The two offices are connected via a rather slow broadband
connection (512/256 kbit/s) and are linked to each other via OpenVPN.

The question is: what is the best method of sharing files between the two
servers? Samba, NFS...?

The files to be shared are Word, Excel and PDF docs.

Any ideas or comments would be appreciated.


Thanks


James
 
Juhan Leemet

 
      07-21-2004, 11:19 PM
On Wed, 21 Jul 2004 16:08:40 -0500, James wrote:
> Hi, I have two Linux servers running Samba in different offices, providing
> around 80 Windows users (+-40 in each office) file and print services. The
> two offices are connected via a rather slow broadband connection
> 512/256 and are linked to each other via OpenVPN.
>
> The question is what is the best method of sharing files between the two
> servers. Samba, NFS...?
>
> The type of files that are going to be shared are Word, Excel and PDF docs.


If it is a slow link, you won't be particularly happy with any protocol.
All of those files/documents get to be HUGE! Many MBs! So, we're talking
in the order of 3MB/32KBps (back of an envelope calculation) = 100 seconds
or 2 minutes transfer each, IF there's nothing else going on with the link.
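
The estimate above can be checked with a few lines of Python. This is only a rough sketch: it assumes "512/256" means 512 kbit/s in one direction and 256 kbit/s in the other, takes the 3 MB document size from the post, and ignores VPN and protocol overhead as well as any other traffic on the link. The function name is made up for illustration.

```python
def transfer_seconds(size_mb: float, link_kbit_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and other traffic."""
    size_kbytes = size_mb * 1024              # MB -> KB
    rate_kbytes_per_s = link_kbit_per_s / 8   # kbit/s -> KB/s
    return size_kbytes / rate_kbytes_per_s


if __name__ == "__main__":
    # Assumed link speeds: 256 kbit/s one way, 512 kbit/s the other.
    for label, kbit in (("256 kbit/s", 256), ("512 kbit/s", 512)):
        print(f"3 MB over {label}: {transfer_seconds(3, kbit):.0f} s")
```

At 256 kbit/s (32 KB/s) this gives 96 seconds, in line with the "100 seconds" figure; the faster direction halves it.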

Here's a suggestion, though...
What about accessing these documents through a caching proxy web server
(apache will do this) on each side? Then at least you won't be fetching
the same documents several times (within some time window). I believe the
caching proxy checks the timestamp and/or size of the file to see if it
has changed since last time, and if not, it serves the file out of its own
cache. Linux normally caches network files/data anyway, and so does
Windoze (but I'm not sure how well?), but there is no way to "tune" that.
With a caching proxy apache server, you can tell it how much disk to use
for cache, and I think there are other parameters you can "tune". If
you're clever with your workload, you could even "prefetch" or rsync some
stuff overnight/weekend, in preparation for the next day/week. You can
also build in search engines, indexing web pages, etc. That's the best
idea I can think of. There will likely be other suggestions.
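
To make the "check timestamp/size, else serve from cache" idea concrete, here is a toy model in Python. It is not how Apache or any real proxy is implemented: the origin server is just a dict, and the class and field names (`Doc`, `CachingProxy`, `full_fetches`) are invented for this sketch. It only shows why repeated requests for an unchanged document avoid the slow link.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    mtime: float   # last-modified timestamp on the owning server
    body: bytes


class CachingProxy:
    """Toy model: 'origin' is a dict standing in for the remote server."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.full_fetches = 0   # counts expensive transfers over the slow link

    def get(self, path: str) -> bytes:
        remote = self.origin[path]
        cached = self.cache.get(path)
        # Cheap check: compare timestamp and size, not the content itself.
        if (cached is not None
                and cached.mtime == remote.mtime
                and len(cached.body) == len(remote.body)):
            return cached.body
        # Changed or never seen: pay for the full transfer and remember it.
        self.full_fetches += 1
        self.cache[path] = Doc(remote.mtime, remote.body)
        return remote.body
```

In this model, two requests for the same unchanged path cost one full fetch; a third request after the origin copy changes costs a second one.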

--
Juhan Leemet
Logicognosis, Inc.


 
Raqueeb Hassan

 
      07-22-2004, 03:55 PM
Well, rsync might do the job, because it provides fast incremental file
transfer between hosts, even over a slow link. It does this by sending
just the differences in the files across the link, without requiring
that both sets of files be present at one end of the link beforehand.

Go here ... http://samba.anu.edu.au/rsync/

--
raqueeb hassan
congo (drc)
 
James

 
      07-22-2004, 04:13 PM
Juhan Leemet wrote:

> On Wed, 21 Jul 2004 16:08:40 -0500, James wrote:
>> Hi, I have two Linux servers running Samba in different offices,
>> ...
>> ...
>> The question is what is the best method of sharing files between the two
>> servers. Samba, NFS...?
>>

>
> If it is a slow link, you won't be particularly happy with any
> protocol. All of those files/documents get to be HUGE! Many MBs! So,
> we're talking in the order of 3MB/32KBps (back of an envelope
> calculation) = 100 seconds or 2 minutes transfer each, IF there's
> nothing else going on with the link.
>
> Here's a suggestion, though...
> What about accessing these documents through a caching proxy web
> server (apache will do this) on each side? Then at least you won't be
> fetching the documents several times (within some time window). The
> caching proxy I believe checks the timestamp?/size? on the file to
> see if it changed since last time, and if not, it'll serve the file
> out of its own cache. Linux normally does caching of network
> files/data anyway, and so does Windoze (but I'm not sure how well?),
> but there is no way to "tune" that. With a caching proxy apache
> server, you can tell it how much disk to use for cache, and I think
> there are other parameters you can "tune". If you're clever with your
> workload, you could even "prefetch" or rsync some stuff
> overnight/weekend, in preparation for the next day/week. You can also
> build in search engines, and indexing web pages, etc. That's the best
> idea I can think of. There will likely be other suggestions.
>


I like the idea of using a proxy to assist, and also the search and
indexing. I am currently looking at WebDAV, and these would be nice
enhancements.

The rsync approach raises problems with version control. For example,
what if I save a doc to a share which is synced to the remote server and
then want to make changes, but someone at the remote site has already
started to work on the doc? A doc being in two places at once could
create problems.


thanks for
 
Juhan Leemet

 
      07-22-2004, 09:59 PM
On Thu, 22 Jul 2004 11:13:34 -0500, James wrote:
> Juhan Leemet wrote:
>> On Wed, 21 Jul 2004 16:08:40 -0500, James wrote:
>>> Hi, I have two Linux servers running Samba in different offices,
>>> ...
>>> The question is what is the best method of sharing files between the two
>>> servers. Samba, NFS...?

>>
>> Here's a suggestion, though...
>> What about accessing these documents through a caching proxy web
>> server (apache will do this) on each side? ...
>> ...parameters you can "tune". If you're clever with your workload, you
>> could even "prefetch" or rsync some stuff overnight/weekend, in
>> preparation for the next day/week. You can also build in search
>> engines, and indexing web pages, etc. That's the best idea I can think
>> of. There will likely be other suggestions.
>>

> I like the idea of using a proxy to assist - also the search and
> indexing - I am currently looking at webdav, and these would be nice
> enhancements.


Yes, I've been eyeing WebDAV, but I don't use it yet. My document
preparation does not require that kind of functionality yet. It looks
interesting from a dynamic web site maintenance perspective. There is also
wiki, but I think that is more of a "free for all". Where I would have
concerns with something like WebDAV is (as you imply below) with
configuration management and version control. If you had web site
contents that were good last Friday, and something is wrong on Tuesday,
can you find out exactly what changed (and in what order) between the two
to have screwed up your content? I guess this applies more to web index
structures, etc., rather than distinct documents (M$ Word, Excel, etc.).

I'm a proponent (see disclaimer down below?) of applying configuration
management to basically any valuable IP, including specs, docs, code, etc.
(probably not DBMS contents, but certainly their structures!).

> The rsync issue raises problems with version control - i.e. what if I
> save a doc on to a share which is synced to the remote server and then want
> to make changes - but someone (remote) has already started to work on
> the doc. The idea of a doc being in two places at once could create
> problems


Depending on which options you select (there are lots), I believe rsync
can at least use timestamps to decide which copy to move to the other
side if/when one of them has changed. If both have changed? Last one in
wins! That's the classic software configuration management problem.
Unfortunately, rsync doesn't know how to "merge" changes (thank you!).
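
The "both have changed" case can at least be detected before anything is overwritten. The sketch below is a hypothetical pre-check that a wrapper script might run, not something rsync does itself: it assumes you record the time of the last successful sync somewhere, and the function name and return strings are invented for illustration.

```python
def resolve(mtime_a: float, mtime_b: float, last_sync: float) -> str:
    """Decide what to do with one file that exists at both sites.

    mtime_a / mtime_b are the file's modification times at site A and B;
    last_sync is the (assumed, separately recorded) time of the last sync.
    """
    changed_a = mtime_a > last_sync
    changed_b = mtime_b > last_sync
    if changed_a and changed_b:
        # Edited on both sides since the last sync: copying either way
        # would silently discard someone's work, so flag it for a human.
        return "conflict"
    if changed_a:
        return "copy A -> B"
    if changed_b:
        return "copy B -> A"
    return "in sync"
```

A plain "newest timestamp wins" rule would collapse the "conflict" case into one of the two copy directions, which is exactly the lost-update problem described above.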

If you are planning to do "concurrent development" in a distributed
manner, then you should consider using some form of semi-formal
configuration management or at least version control tools. I don't know
if there are any integrated into web servers... er, well, the latest
Subversion runs as an apache2 service, but I haven't used it yet, so I
feel uncomfortable making any recommendations. I don't know if WebDAV has
any hooks in/out of Subversion. I would hope so. You'll have to research.

You can work around these kinds of things with project management,
scheduling, or work flow kinds of solutions. I don't mean tools
necessarily but a "way of thinking". You would still have that problem if
people were indiscriminately editing "random" files on a local disk. I've
seen this myself. M$ Office tries to put down "lock" files to prevent
someone else editing an already open document, but that does not always
work and/or people can be clever and devious to circumvent any mechanism.

You have to think through your "publishing" application:
* how is work assigned
* who should edit documents
* how to prevent people accessing same document(s)
* is there an editing/approval function
* how are edited documents "released" for consumption
....etc...etc...

As you can imagine, this can be as big or as small as you want to make it.

Personally, I would look into Subversion (for a quick overview of
capabilities) and check if you are running apache2 or apache (1).

p.s. My main consulting "gig" for the last few years has been
configuration management, and its use/implications in various scenarios:
mainframe thru workstations to networked PCs (not much standalone anymore).

--
Juhan Leemet
Logicognosis, Inc.

 
James

 
      07-23-2004, 07:25 AM
Juhan Leemet wrote:

> On Thu, 22 Jul 2004 11:13:34 -0500, James wrote:
>> Juhan Leemet wrote:
>>> On Wed, 21 Jul 2004 16:08:40 -0500, James wrote:
>>>> Hi, I have two Linux servers running Samba in different offices,
>>>> ...
>>>> The question is what is the best method of sharing files between the
>>>> two servers. Samba, NFS...?
>>>
>>> Here's a suggestion, though...
>>> What about accessing these documents through a caching proxy web

> ...
> ...
> ...

...
> ...
> ...
> As you can imagine, this can be as big or as small as you want to make
> it.
>
> Personally, I would look into subversion (for a quick overview of
> capabilities) and check if you are running apache2 or apache (1).
>
> p.s. My main consulting "gig" for the last few years has been mainly
> configuration management, and its use/implications in various
> scenarios: mainframe thru workstations to networked PCs (not much
> standalone anymore).
>



Hmm, so it all comes down to work flow within the company, backed by a
solid set of tools, and then relying on staff discipline to work with
them. I like the idea of using apache as the underlying technology along
with squid. I must do some reading on this to see what modules are
available to enhance the doc sharing.

Any idea whether rsync or rsync technology can be integrated into squid,
or should I be looking for another caching proxy?

Thanks for all the ideas


James


 
Juhan Leemet

 
      07-23-2004, 05:28 PM
On Fri, 23 Jul 2004 02:25:06 -0500, James wrote:
> Hmm - so it all comes down to work flow within the company but with a solid
> set of tools - and then relying on staff discipline to work with them - I
> like the idea of using apache as the underlying technology along with
> squid, I must do some reading on this to see what modules are available to
> enhance the doc sharing.


Well, you can use squid, but I didn't. Apache can work as a proxy server
itself, i.e. serve out its own pages, and also work as a proxy for other
web servers. Just enable the "proxy module" in the apache httpd.conf file.
Worked OK for me at a time when I was going out from my LAN to the
internet through a gateway machine (my own) and a dialup to my ISP. Squid
has been designed to be a proxy server. Never used it. Considered it, but
haven't done enough research to figure out which is better:
- apache with its own proxy module
- apache & squid

If you do the research, would you let us know which is better, and why?
I suspect that squid might (should?) be better at managing a big cache?

> Any idea whether rsync or rsync technology can be integrated into squid or
> should I be looking for another caching proxy ?


Well, rsync is something different, not related to proxy. The rsync
protocol is particularly efficient at transferring contents, because it
checks for changes (even within files, I believe) by some kind of block
checksum scheme. It does take a bit of CPU. Slow machines don't like it.
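
A heavily simplified sketch of the block checksum idea, in Python. This is not the real rsync algorithm: rsync uses a rolling checksum so it can find matching blocks at any byte offset, whereas this version only compares blocks at fixed positions. The function names, the tiny block size and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def block_sums(data: bytes, block: int = BLOCK) -> list:
    """Checksums of the receiver's existing copy, one per fixed-size block."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def delta(new: bytes, old_sums: list, block: int = BLOCK) -> list:
    """Build instructions for the receiver.

    ('keep', n) reuses block n of the receiver's old copy;
    ('data', b) carries bytes that actually have to cross the link.
    """
    ops = []
    for n, i in enumerate(range(0, len(new), block)):
        chunk = new[i:i + block]
        if n < len(old_sums) and hashlib.sha256(chunk).hexdigest() == old_sums[n]:
            ops.append(("keep", n))
        else:
            ops.append(("data", chunk))
    return ops


def patch(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Rebuild the new file on the receiver from its old copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "keep":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)
```

The receiver sends `block_sums(old)` to the sender, the sender replies with `delta(new, sums)`, and only the `'data'` entries cost real bandwidth. Hashing every block on both sides is also where the CPU cost mentioned above comes from.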

If you have the two web servers at the 2 sites, call them www.aaa.site.com
and www.bbb.site.com, then when you use a proxy, you always refer to
documents by the site that "owns" the web content, but you set up your
browser to direct the request to the local proxy server. So, at site aaa,
you set up your browsers to use www.aaa.site.com as their proxy, and refer
to http://www.bbb.site.com/path/document to get a document from the other
side. Well, you know how a proxy works. If you use rsync to transfer
copies of the files, then you are NOT filling up the proxy cache, but are
creating duplicates on your site server. Then you would refer to them as
http://www.aaa.site.com/path/document (the copy). The proxy cache has its
own special structure to make lookups more efficient. If you duplicate
documents with rsync then you have to manage the 2 copies and deal with
merging them if you have edited both of them, etc. Why duplicate?

You might decide to leave the documents "owned" by the 2 sites at the web
servers on the 2 sites, and avoid duplication (except in the proxy
caches). To "preload" your cache for the next day's workload, you could
run a script that simply browses those web pages overnight. Just
referencing them through the proxy server will force the server to get the
copy from the other side and put it into cache. If your cache is big
enough the documents will stay there all day (week? month?). Any random
reference to a document will of course take the full/slow download.

One idea might be to have a huge cache, and use (off the top of my head)
some tool like "wget -r" (recursive), "throwing away" the retrieved
contents, just to "populate the cache". As documents are edited on either
side, cache contents will start to go stale. Might need periodic refresh.
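
The same "fetch and throw away" warm-up could be scripted in Python instead of wget. A minimal sketch, assuming the list of document URLs is already known (it does no recursive crawling) and that the local proxy accepts plain HTTP proxy requests. The `warm_cache` name and the optional `opener` parameter are invented for this example; whether the proxy actually keeps a copy depends on its own cache settings and the response headers.

```python
import urllib.request


def warm_cache(urls, proxy_url, opener=None):
    """Request each URL through the proxy and discard the body.

    Returns a dict mapping each URL to "ok" or a "failed: ..." message.
    """
    if opener is None:
        # Send all http:// requests via the local caching proxy.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy_url}))
    results = {}
    for url in urls:
        try:
            with opener.open(url, timeout=600) as resp:
                while resp.read(64 * 1024):   # read and throw away
                    pass
            results[url] = "ok"
        except OSError as exc:                # URLError is a subclass of OSError
            results[url] = f"failed: {exc}"
    return results
```

Run overnight from cron at site aaa, a call such as `warm_cache(["http://www.bbb.site.com/path/document"], "http://www.aaa.site.com:8080")` would pull that document across the slow link once; the port number here is a placeholder, not something from the thread.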

Without knowing more about your application it's hard to recommend whether
to use (or not use) rsync and/or duplication of content. It might be that
a proxy web server could be enough for your purposes. Think through your
access patterns. Are they predictable? How many "random" requests? Can you
wait for them to get fetched? Of course, once you have fetched a "random"
request, it will be local, living in proxy cache. You can figure it out.

Regarding WebDAV, I think it should do a "write through" to put any
modified content back on the "owning" web server (not the local proxy).
This will take time and use up bandwidth (and interfere with other
communications) but it is probably the "right thing to do" to manage the
originals.

--
Juhan Leemet
Logicognosis, Inc.

 
James

 
      07-24-2004, 06:38 PM
Juhan Leemet wrote:

> On Fri, 23 Jul 2004 02:25:06 -0500, James wrote:
>> Hmm - so it all comes down to work flow within the company but with a
>> solid set of tools - and then relying on staff discipline to work with
>> them - I like the idea of using apache as the underlying technology
>> along with squid, I must do some reading on this to see what modules
>> are available to enhance the doc sharing.
>
> Well, you can use squid, but I didn't. Apache can work as a proxy
> server itself, i.e. serve out its own pages, and also work as a proxy
> for other web servers. Just enable the "proxy module" in the apache
> httpd.conf file. Worked OK for me at a time when I was going out from
> ...
> You might decide to leave the documents "owned" by the 2 sites at the
> web servers on the 2 sites, and avoid duplication (except in the proxy
> ...
> other communications) but it is probably the "right thing to do" to
> manage the originals.
>
>


Okay - lots of work ahead :-) I'm going to look at apache's proxy. I've
only ever used squid as a caching proxy to manage regular internet
access, so I'm quite keen to see how apache can assist.

Many thanks for the information and pointers.

I'll let you know how it all turns out

rgds


James
 