7
Guest
Posts: n/a
|
Johannes Petersson wrote:
> Hello
>
> I'm interested in how to set up a bunch of Linux computers as a
> backup/storage entity in my company's network.
>
> So, let me start by explaining the situation. I'm working at a company
> with about 200 employees and at least once every year there is a bunch
> of people who get new workstations and their old ones are just left in
> the basement for the mice to play with. I thought I could make
> something useful of these old workstations by turning them into a
> Linux storage cluster.
>
> For the moment I have about 20 computers ranging from Pentium 2 with
> 20 GB hard drives and 128 MB RAM to Pentium 4 with 120 GB hard drives
> and 512 MB RAM. (Pretty heterogenous right?)
>
> My primary goal with these computers is to turn them into what would
> seem like one large hard drive accessible through our LAN. Anyone got
> any suggestions on how to do this?
>
> The secondary goals would then be to make sure that this large "hard
> drive" is also RAIDed throughout these 20 different computers. And
> also since these computers are old (and might fail) I want to be able
> to just plug another old/new computer into the storage network to
> either replace a broken computer or to extend the storage space
> available. I suppose that if my primary goal is possible also these
> secondary goals should be possible? But the big question is how?
>
> The reason I want to do this is so that we won't need to buy yet
> another tape-robot for backup purposes.
>
> I've looked into different possible solutions but I'm not really sure
> which way I have to go.

It all sounds tempting, but practical points might change your direction a little bit. There are many different creative ways, so here are some suggestions.

1. Using old drives and power supplies is a no-no for corporate data.

2. It's cheap enough now to buy a headless PC for about $100 and stick a 200 GB hard disk in it for another $100.

3. Alternatively, use the old PCs, but replace their old power supplies and hard disks. Either take them home or arrange for management to sell them off to the employees.

4. You can run up a relatively lightweight Linux like Knoppix LiveCD or Mepis LiveCD and install it onto the hard disk. After creating some user accounts, you can allow users to log into the PC from Windows using WinSCP, or from Linux using ssh or the Konqueror URL method fish://username@ipaddress. It requires hardly any setting up other than installing from a LiveCD and entering the username. All the ssh stuff is built into Knoppix and Mepis.

5. If you get vncserver working on the Linux PC, you can operate it headless (i.e. without monitor, keyboard or mouse). From Windows you can use GPL software such as TightVNC to log into the Linux box. From other Linux boxes you just run up RDesktop in KDE, or just a plain old console and a remote ssh connection without the need for vncserver. With all that you can do backups and so on without having to be at the machine itself.

6. You can go on to enable Samba for Windows shares, but invariably that is prone to security problems and also hideous networking problems and configuration issues.

Installing Linux and setting up ssh can take two hours, setting up 200 users might drain a day's effort, and installing WinSCP might take another 1-2 days. If you insist on Samba, you should allow 1 to 2 weeks to get it fully working.
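As a rough illustration of the ssh-based access described in point 4, here is a minimal sketch that only builds an `scp` command line for pushing a directory to one of the backup boxes. The user name, address and paths are hypothetical placeholders, and the function name `scp_push_cmd` is made up for this example; nothing is executed.

```python
import shlex


def scp_push_cmd(src_dir, user, host, dest_dir):
    """Build (but do not run) an scp command that recursively copies
    src_dir to dest_dir on a remote box over ssh."""
    # -r copies directories recursively; the target uses scp's user@host:path form.
    return ["scp", "-r", src_dir, f"{user}@{host}:{dest_dir}"]


# Hypothetical values, for illustration only.
cmd = scp_push_cmd("/home/alice/docs", "alice", "192.0.2.10", "/backup/alice")
print(shlex.join(cmd))
```

To actually run it one would hand `cmd` to `subprocess.run`, ideally with key-based ssh authentication set up so that a nightly job does not need a password prompt.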
James Knott
Guest
Posts: n/a
|
Johannes Petersson wrote:
> My primary goal with these computers is to turn them into what would
> seem like one large hard drive accessible through our LAN. Anyone got
> any suggestions on how to do this?
>
> The secondary goals would then be to make sure that this large "hard
> drive" is also RAIDed throughout these 20 different computers. And
> also since these computers are old (and might fail) I want to be able
> to just plug another old/new computer into the storage network to
> either replace a broken computer or to extend the storage space
> available. I suppose that if my primary goal is possible also these
> secondary goals should be possible? But the big question is how?
>

There's an article in the latest Linux Journal about "union". According to what I read, it might help. You'd also want to look into running logical volumes over a network.
Peter T. Breuer
Guest
Posts: n/a
|
Johannes Petersson <(E-Mail Removed)> wrote:
> I'm interested in how to set up a bunch of Linux computers as a
> backup/storage entity in my company's network.

Fine.

> So, let me start by explaining the situation. I'm working at a company
> with about 200 employees and at least once every year there is a bunch
> of people who get new workstations and their old ones are just left in
> the basement for the mice to play with. I thought I could make
> something useful of these old workstations by turning them into a
> Linux storage cluster.

Fine. But why a cluster? Can't you just use them individually as backup? They don't seem to need to be available in real time, so why bother clustering? No failover needed.

> For the moment I have about 20 computers ranging from Pentium 2 with
> 20 GB hard drives and 128 MB RAM to Pentium 4 with 120 GB hard drives
> and 512 MB RAM. (Pretty heterogenous right?)

Huge. There's no good way to make use of such very heterogenous sizes efficiently and safely. The best thing to do would be to take the drives of similar size, say 5 of 120GB, and put them into one machine as a RAID5 device. Do the same with those of size 60GB, and the same with those of size 40GB, and the same with those of size 20GB.

At that point you can start thinking about how to arrange them into pairs of failover devices in a cluster, if you really wanted to, but I don't see the point.

> My primary goal with these computers is to turn them into what would
> seem like one large hard drive accessible through our LAN. Anyone got
> any suggestions on how to do this?

Well, the canonical way. But what's the difficulty and WHY? It's crazy.

> The secondary goals would then be to make sure that this large "hard
> drive" is also RAIDed throughout these 20 different computers. And

Crazy.

> also since these computers are old (and might fail) I want to be able
> to just plug another old/new computer into the storage network to
> either replace a broken computer or to extend the storage space
> available. I suppose that if my primary goal is possible also these
> secondary goals should be possible? But the big question is how?

You can't - it's a crazy idea. While a RAID5 of 20 disks is safer against disaster than a linear aggregate of the 20, its chances of going down per day are 19p^2/2, as opposed to 20p, where p is the probability of one disk dying per day. So it's about 1/p times less likely to die on any particular day. HOWEVER, that forgets that failures are not independent - usually heat or spike related. The real probabilities are higher. And you also have a much higher absolute probability of failure per se, simply because you are using 20 disks instead of one. And then there are the network brownouts ...

All in all, you will be pushing water uphill. It doesn't bear thinking about.

If you really want to raid5 together 20 different networked block devices, you'd be the first! Even in data processing clusters, the local raid topology is usually a quad or triple (1 local device and 2 or 3 remote devices forming a raid5), and the whole topology of the nodes is a torus or something similar. Thus each node would serve out one raid5 device and import two or three raid5 devices from neighbours, each of the latter comprising at least one of its own local disks.

> The reason I want to do this is so that we won't need to buy yet
> another tape-robot for backup purposes.

You don't need to anyway. Using a few spare IDE disks as backup medium is a fine and normal thing to do.

> I've looked into different possible solutions but I'm not really sure

Like?

> which way I have to go.
>
> - I've looked into setting up all the computers as an openMosix cluster
> with oMFS (openMosix File System) but I don't know if this actually

This is not what you want - that sort of thing is for computation, not storage.

> does what I want? From what I've read it seems as if I could get my
> primary goal fulfilled with this but probably not my secondary?

Goals?

> - I've also looked at iSCSI but that only seems to work on one
> computer with all discs local in that machine?

Well, it's a transport - nothing much to do with anything else. Yes, it's used in fibre, which sometimes is to disks.

> - Then there is the possibility of using some sort of Logical Volume
> Management System, like EVMS or LVM, maybe EVMS combined with a
> cluster is needed?

That would be over the TOP of the raid device you make. Forget it for now.

> As you can see I've tried to find different solutions but I'm not

I don't see ANY real solutions listed yet.

> really sure which way to go, so I'm hoping that someone out there has
> experience from this and can help by pointing me in the right
> direction (so I don't need to try them all).

Everyone has experience of it. It's normal. What's the problem? If you really don't know what you are looking for, and it looks like you don't, go to the Linux high availability pages and look it all up. I can't recommend solutions myself, as I am the author of several.

Peter
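The per-day failure comparison in the post above can be checked numerically. The following is a minimal sketch under the very independence assumption the post itself warns against, and it only counts two disks dying on the same day (it ignores the longer window while a degraded array is being rebuilt). The value of `p` is a hypothetical number picked for illustration. Under these assumptions the leading term for the RAID5 case works out to C(20,2)·p² = 190p²; the exact constant matters less than the gap of order 1/p against the linear aggregate's 20p.

```python
def p_linear_lost(n, p):
    """A linear aggregate of n disks is lost if at least one disk fails."""
    return 1 - (1 - p) ** n


def p_raid5_lost(n, p):
    """A RAID5 of n disks is lost if at least two disks fail."""
    none_fail = (1 - p) ** n
    exactly_one_fails = n * p * (1 - p) ** (n - 1)
    return 1 - none_fail - exactly_one_fails


n = 20
p = 0.001  # hypothetical per-disk daily failure probability, illustration only

linear = p_linear_lost(n, p)
raid5 = p_raid5_lost(n, p)

print(f"linear aggregate lost per day: {linear:.6f}")
print(f"RAID5 lost per day:            {raid5:.6f}")
print(f"ratio linear / RAID5:          {linear / raid5:.0f}")
```

Correlated failures (heat, power spikes, a network brownout taking several members offline at once) make the real RAID5 figure worse than this model suggests, which is the point of the caveat in the post.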
Johannes Petersson
Guest
Posts: n/a
|
7 <(E-Mail Removed)> wrote in message news:<6RHod.59000$(E-Mail Removed) k>...
> Johannes Petersson wrote:
> > I'm interested in how to set up a bunch of Linux computers as a
> > backup/storage entity in my company's network.
> >
> > So, let me start by explaining the situation. I'm working at a company
> > with about 200 employees and at least once every year there is a bunch
> > of people who get new workstations and their old ones are just left in
> > the basement for the mice to play with. I thought I could make
> > something useful of these old workstations by turning them into a
> > Linux storage cluster.
> >
> > For the moment I have about 20 computers ranging from Pentium 2 with
> > 20 GB hard drives and 128 MB RAM to Pentium 4 with 120 GB hard drives
> > and 512 MB RAM. (Pretty heterogenous right?)
> >
>
> It all sounds tempting, but practical points might change your
> directions a little bit. There are many different creative ways
> so here are some suggestions.
>
> 1. Using old drives and power supplies is a no no for corporate data.
>

Yes, I've been thinking about this as well, which is why I stated in my original post that I wanted to raid these computers over the network. This way the data would be safe even if one hard drive or power supply failed.

> 2. Its cheap enough now to buy a headless PC for about $100 and stick
> a 200 Gb hard disk for another $100.
>

Again true, but this would also be considered a hobby project for myself, and I really think it would be nice if the old computers could be put to some use.

> 4. You can run up something a relative light weight Linux like Knoppix
> LiveCD or Mepis LiveCD and install it onto hard disk. After creating some
> user accounts, you can allow users to log into the PC from windows
> using winscp or from linux using ssh or use konqueror url method
> fish://username@ipaddress. It requires hardly any setting up other than
> install from a LiveCD and entering the username. All the ssh stuff is built
> in into knoppix and mepis.
>
> 5. You can get vncserver working on the linux PC, you can operate
> it headless (i.e. without monitor or keyboard or mouse). From windoes
> you can use GPL software such as TightVNC to log into the linux box.
> From other linux boxes you just run up RDesktop in KDE or just plain old
> console and a remote ssh connection without the need for vncserver.
> With all that you can do backups and so on without having to be at the
> machine itself.
>
> 6. You can go on to enable samba for windows shares - but invariably
> that is prone to security problems and also hideos networking problems
> and configuration issues.
>

Points 4, 5 and 6 don't really have anything to do with my original question. Since I am interested in setting up a storage cluster and not a generic user-accessible server, I don't mind if it takes me 5 weeks to configure the computers or the user accounts.

Anyway, thanks for your reply.

Sincerely
Johannes Petersson
Johannes Petersson
Guest
Posts: n/a
|
(E-Mail Removed) (Peter T. Breuer) wrote in message news:<745c72-(E-Mail Removed)>...
> Johannes Petersson <(E-Mail Removed)> wrote:
> > I'm interested in how to set up a bunch of Linux computers as a
> > backup/storage entity in my company's network.
>
> Fine.
>
> > So, let me start by explaining the situation. I'm working at a company
> > with about 200 employees and at least once every year there is a bunch
> > of people who get new workstations and their old ones are just left in
> > the basement for the mice to play with. I thought I could make
> > something useful of these old workstations by turning them into a
> > Linux storage cluster.
>
> Fine. But why a cluster? Can't you just use them individually as
> backup? They don't seem to need to be available in real time, so why
> bother clustering? No failover needed.
>

Yes, failover is needed, since I want to be able to replace one failing computer without taking the whole backup solution down and manually rescuing the data in the failing computer.

> > For the moment I have about 20 computers ranging from Pentium 2 with
> > 20 GB hard drives and 128 MB RAM to Pentium 4 with 120 GB hard drives
> > and 512 MB RAM. (Pretty heterogenous right?)
>
> Huge. There's no good way to make use of such very heterogenous sizes
> efficiently and safely. The best thing to do would be to take the drives
> of similar size, say 5 of 120GB, and put them into one machine as a
> RAID5 device. Do the same with those of size 60GB, and the same with
> those of size 40GB, and the same with those of size 20GB.
>

Okay, this suggestion is probably a good idea, and I will try to make this happen.

> At that point you can start thinking about how to arrange them into
> pairs of failover devices in a cluster, if you really wanted to, but I
> don't see the point.
>

This is also very interesting. Is this where openMosix comes in again? There might not be much point to this once the computers already have internally raided hard drives. But the original point was not to have raided hard drives in each computer, but to set up all the computers to raid between each other in the network / cluster.

> > My primary goal with these computers is to turn them into what would
> > seem like one large hard drive accessible through our LAN. Anyone got
> > any suggestions on how to do this?
>
> Well, the canonical way. but what's the difficulty and WHY? It's crazy.
>

Why is it crazy? It would give us several terabytes of storage at basically zero cost (the only cost would be my monthly salary, which is basically zero!). What do you mean by the canonical way? Having each of the computers map the subsequent one with, for example, NFS?

> > The secondary goals would then be to make sure that this large "hard
> > drive" is also RAIDed throughout these 20 different computers. And
>
> Crazy.
>

Well, I've never been known to be very sane.

> > also since these computers are old (and might fail) I want to be able
> > to just plug another old/new computer into the storage network to
> > either replace a broken computer or to extend the storage space
> > available. I suppose that if my primary goal is possible also these
> > secondary goals should be possible? But the big question is how?
>
> You can't - it's a crazy idea. While a RAID5 of 20 disks is safer
> against disaster than a linear aggregate of the 20, it's chances of
> going down per day are 19p^2/2, as opposed to 20p, where p is the
> probability of one disk dying per day. So it's about 1/p times less
> likely to die on any particular day. HOWEVER, that forgets that
> failures are not independent - usually heat or spike related. The real
> probabilities are higher. And you also have a much higher absolute
> probability of failure per se, simply because you are using 20 disks
> instead of one. And then there are the network brownouts ...
>

I'm not saying that every computer should hold a copy of every other. I would want them to be raided maybe three ways within the network nodes, so that if one computer fails it would be possible to detach it and replace it on the fly.

> If you really want to raid5 together 20 different networked block
> devices, you'd be the first! Even in data processing clusters, the
> local raid topology is usally a quad or triple (1 local device and 2
> or 3 remote devices forming a raid5), and the whole toplogy of the nodes
> is a torus or something similar. Thus each node would serve out one
> raid5 device and import two or three raid5 devices from neighbours,
> each of the latter comprising at least one of its own local disks.
>

This is basically what I'm talking about, and the way to solve this would be? Using NFS or Samba or something else to make the nodes import / export to each other and then manually running rsync? There has to be a better way than this, one that is always up to date and always keeps the computers in sync with each other.

> > I've looked into different possible solutions but I'm not really sure
>
> Like?
>

Like the ones that I line up below!

> > which way I have to go.
> >
> > - I've looked into setting up all the computers as an openMosix cluster
> > with oMFS (openMosix File System) but I don't know if this actually
>
> This is not what you want - that sort of thing is for computation, not
> storage.
>

Okay, so what is it that I want? I would call it a storage cluster, but how do I set that up and what software / kernel patches do I need?

> > does what I want? From what I've read it seems as if I could get my
> > primary goal fulfilled with this but probably not my secondary?
>
> Goals?
>

If you have read my message in full you have probably seen the beginning of it, where I list one primary goal and two secondary goals. Those are the goals I'm referring to.

> > - Then there is the possibility of using some sort of Logical Volume
> > Management System, like EVMS or LVM, maybe EVMS combined with a
> > cluster is needed?
>
> That would be over the TOP of the raid device you make. Forget it for
> now.
>

Well, I'm not really sure I should forget it, since I want to make my network or cluster into the raid device, and then I should probably put EVMS or LVM on top of that.

> > As you can see I've tried to find different solutions but I'm not
>
> I don't see ANY real solutions listed yet
>

That's why I said that I've _tried_ to find different solutions. I have tried but I haven't found any good solution. That is why I asked the questions here.

> Peter

Well, thank you for your time and your replies.

Sincerely
Johannes Petersson
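The drive-grouping suggestion quoted above (collect drives of similar size into one machine and build one RAID5 per group) comes with a simple capacity cost: a RAID5 of n equal drives yields n-1 drives' worth of usable space. Here is a minimal sketch of that arithmetic. The inventory is entirely hypothetical, since the thread only gives the range of drive sizes, not how many of each there are, and the function name `raid5_plan` is made up for this example.

```python
from collections import Counter


def raid5_plan(drive_sizes_gb, min_members=3):
    """Group drives by size and return {size: usable_gb} for each group
    large enough to form a RAID5 (one drive's worth goes to parity)."""
    plan = {}
    for size, count in sorted(Counter(drive_sizes_gb).items(), reverse=True):
        if count >= min_members:
            plan[size] = (count - 1) * size
    return plan


# Hypothetical inventory: five drives each of 120, 60, 40 and 20 GB.
inventory = [120] * 5 + [60] * 5 + [40] * 5 + [20] * 5

plan = raid5_plan(inventory)
for size, usable in plan.items():
    print(f"RAID5 of {size} GB drives: {usable} GB usable")
print(f"total usable: {sum(plan.values())} GB of {sum(inventory)} GB raw")
```

Groups with fewer than three drives of the same size are left out of the plan here; in practice they could be mirrored in pairs or kept as spares.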
Peter T. Breuer
Guest
Posts: n/a
|
Johannes Petersson <(E-Mail Removed)> wrote:
> (E-Mail Removed) (Peter T. Breuer) wrote in message news:<745c72-(E-Mail Removed)>...
> > Fine. But why a cluster? Can't you just use them individually as
> > backup? They don't seem to need to be available in real time, so why
> > bother clustering? No failover needed.
> >
> Yes, failover is needed since I want to be able to replace one failing
> computer without taking the whole backup solution down and manually
> rescuing the data in the failing computer.

But why FAILOVER of the data? Why not just two copies of the data? You can move the IP address automatically from one node to the other if you like, but I see no need for realtime update of the data between them. Just two copies, done daily. You can copy one from the other at daily intervals too, and hence have two days of backups online, or keep the even days on one machine and the odd days on another machine, and thus have twice the extension of backups available, only at most missing out one day if one machine goes down.

I really don't see any need for realtime failover of the data.

> > Huge. There's no good way to make use of such very heterogenous sizes
> > efficiently and safely. The best thing to do would be to take the drives
> > of similar size, say 5 of 120GB, and put them into one machine as a
> > RAID5 device. Do the same with those of size 60GB, and the same with
> > those of size 40GB, and the same with those of size 20GB.
> >
> Okay, this suggestion is probably a good idea, and I will try to make
> this happen.
>
> > At that point you can start thinking about how to arrange them into
> > pairs of failover devices in a cluster, if you really wanted to, but I
> > don't see the point.
> >
> This is also very interesting. This is where openMosix once again

What on earth has openMosix got to do with anything? I don't know where you get that word from. You are not doing computation, which is what Mosix is for.

> comes in or? The point of this when the computers already have hard
> drives internaly raided might not be that big. But the original point
> was not to have raided hard drives in each computer but to set up all
> the computers to raid between each other in the network / cluster.

There is no sense to that, and if you were to do it, it would not follow that design!

> > Well, the canonical way. but what's the difficulty and WHY? It's crazy.
> >
> Why is it crazy? It would give us several terrabytes of storage to
> basically zero cost (the only cost would be my monthly salary, which

Not the way you are doing it, it wouldn't. Please take notice instead of skipping the explanation.

> is basically zero! . What do you mean with the cannonical way, have

The normal way! Form raid arrays on the nodes, and do the backup to alternate backup nodes on alternate days. Failover the floating IP for recoveries from one to the other if you must, but I really don't see why. It's no big deal to lose a day of backups if you also have the previous day still available, and the next day. If you were that paranoid you would be spooling the backups to tape in the background!

If you really wanted to form networked raid arrays, you would do it in pairs, forming from two computers with two partitions two raided devices - say a mirror - one on each node, each with one local component and one remote component. The nodes would failover the IP and do slightly complicated things to present the appropriate local device for replacement export in case the other node fails.

> each of the computers to map the subsequent one with for example NFS?

?? No, although you can export the raided devices via nfs if you wanted to.

> > You can't - it's a crazy idea. While a RAID5 of 20 disks is safer
> > against disaster than a linear aggregate of the 20, it's chances of
> > going down per day are 19p^2/2, as opposed to 20p, where p is the
> > probability of one disk dying per day. So it's about 1/p times less
> > likely to die on any particular day. HOWEVER, that forgets that
> > failures are not independent - usually heat or spike related. The real
> > probabilities are higher. And you also have a much higher absolute
> > probability of failure per se, simply because you are using 20 disks
> > instead of one. And then there are the network brownouts ...
> >
> I'm not talking about that every computer should hold a copy of every
> other, I would want them to be raided maybe three ways within the

Three ways? You mean three mirror copies, or raid5 with three components?

> network nodes. So if one computer fails it would be possible to detach
> it and replace it on the fly.

Oh - well, that's a little more sensible and is a classic configuration. In that configuration the three nodes each have three partitions. They raid together two remote partitions (one each from the other two nodes) and one local partition to form a single raid5 partition which they export (via nfs, for example). Failover is complicated, however. I don't think I want to describe it! I don't think I even want to think about it. It's insane because of its dependence on the net to even begin to function normally.

You'd get more mileage out of raiding (raid5) two local partitions and one remote partition on each node, but then you need four local partitions per node, if I count correctly. Each node has to supply one extra partition to each of the two other remote nodes, as well as bind two of its own partitions into its own raid5. Four in total, locally. And when I say partition, you can take that as "disk", if you prefer.

That configuration would be sensible. The other is nonsensical because losing the net ruins all the raids. This configuration survives a network dropout without any problem.

> > local raid topology is usally a quad or triple (1 local device and 2
> > or 3 remote devices forming a raid5), and the whole toplogy of the nodes
> > is a torus or something similar. Thus each node would serve out one
> > raid5 device and import two or three raid5 devices from neighbours,
> > each of the latter comprising at least one of its own local disks.
> >
> This is basically what I'm talking about, and the way to solve this
> would be?

See above.

> Using NFS or samba or what to make the nodes import / export
> to eachother and then manually run rsync? There has to be a better way

Eh? Oh - don't you know about network block devices? There is nothing to "solve".

> than this that is always up to date and always keeps the computers in
> sync with each other??

You don't care about them being in sync - it's not a realtime application.

> > > - I've looked into setting up all the computers as a openMosix cluster
> > > with oMFS (openMosix File System) but I don't know if this actually
> >
> > This is not what you want - that sort of thing is for computation, not
> > storage.
> >
> Okay, so what is it that I want? I would call it a storage cluster but
> how do I set that up and what software / kernel patches do I need?

Just the normal stuff. Look at the high availability pages.

> > > - Then there is the possibillity of using some sort of Logical Volume
> > > Management System, like EVMS or LVM, maybe EVMS combined with a
> > > cluster is needed?
> >
> > That would be over the TOP of the raid device you make. Forget it for
> > now.
> >
> Well, I'm not really sure I should forget it since I want to make my
> network or cluster in to the raid device and then I should probably
> put EVMS or LVM on top of that.

Sure. Forget it for now. It'll go on top of the raid devices.

Peter
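The difference between the two three-node layouts discussed above comes down to how many RAID5 components sit on the far side of the network. A minimal sketch of that reasoning follows, assuming only the standard property that a RAID5 array keeps working with exactly one component missing and fails with two or more. The function names are made up for this illustration and do not model any real failover software.

```python
def raid5_state(missing):
    """State of a RAID5 array given how many of its components are missing."""
    if missing == 0:
        return "clean"
    if missing == 1:
        return "degraded"  # still readable and writable, no redundancy left
    return "failed"


def state_after_network_loss(local, remote):
    """If the network drops, every remote component goes missing at once;
    the local components are unaffected."""
    return raid5_state(missing=remote)


# (local components, remote components) for each node's array.
layouts = {
    "1 local + 2 remote": (1, 2),
    "2 local + 1 remote": (2, 1),
}

results = {name: state_after_network_loss(*parts) for name, parts in layouts.items()}
for name, state in results.items():
    print(f"{name}: {state} after a network dropout")
```

The second layout only degrades when the network goes away, which matches the post's claim that it survives a dropout, while the first loses two of three components at once and takes every array down with it.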
Mike
Guest
Posts: n/a
|
On Tue, 23 Nov 2004 17:29:27 +0100, Peter T. Breuer wrote:
> Johannes Petersson <(E-Mail Removed)> wrote:
>> I'm interested in how to set up a bunch of Linux computers as a
>> backup/storage entity in my company's network.
>

Have you read http://dcwww.camp.dtu.dk/cluster-howto.html? It seems pretty comprehensive and clear.