File Server
Written by Administrator
14.03.2006

The file server is a standard Linux PC with a lot of hard drives. Starting from a regular CS-901 tower, I made some modifications inside to fit the drives I needed. They are tied together as a software raid 5 array for data security reasons. Apart from this raid array, the server has an additional hard drive that holds the Linux installation and a DVD-ROM drive for installing software. Currently, the server runs SuSE 8.2, which supports software raid very well.

The file server is the centerpiece of the media network. When setting up a file server whose primary role is hosting your private media content, there are some objectives to meet:

  • Capacity should be in the terabyte range. As Linux kernel 2.4 supports devices of up to 2 TB, this is the limit to get as close to as possible.
  • Keep it as cheap as possible. In detail, this means:
    • mature and stable motherboard, RAM and CPU
    • cheap IDE drives, no SCSI, no SATA
    • no pricey hardware raid cards; go with Linux software raid instead, based on additional commodity PCI IDE controller cards
    • hard disks that give the most "GB per Euro". In June 2004, the 160 GB PATA disks gave the most "bang for the buck".
    • no high-end server tower; go with a modified commodity big tower instead
    • no power supply redundancy; in the rare case of a power supply failure the server will be down for a while - so what?

By following these restrictions (no investment in professional throughput and availability) you can build a terabyte monster without being Rockefeller. But keep in mind that this file server has little in common with the high-performance file servers found in enterprise environments.

1. How much punch is needed?

To power the mainboard, the processor and all the drives, I use a 350 W Enermax power supply (EG365AX) that can deliver up to 26 A on the 12 V rail. This gives a good reserve even when all drives spin up simultaneously. The heart of the file server is an MSI 815E PRO motherboard that provides 6 PCI slots. The server is powered by an 800 MHz Pentium III processor with 384 MB of SDRAM. Compared to today's systems this might look a bit outdated, but it performs well if you don't need high performance in file serving. This low-performance setup also brings low power consumption and fewer heat problems, both of which are welcome to me.

2. Setting up the raid

I assume that you are familiar with raid levels and understand why raid 5 is the right choice when you want to bundle a lot of single drives into one big one (raid 6 would be better, but current Linux distributions don't offer it).

2.1 How many drives to take?

If you go for raid 5 and want to get close to 2 TB by stacking 160 GB disks, you have to do some math to work out how many disks you need. Raid 5 uses one disk's worth of capacity for parity, so 14 disks of 160 GB each yield about 13 * 160 GB = 2080 GB. So it looks like 14 disks might be right. But a more detailed calculation is needed to check that they really stay below the 2 TB threshold, which is the ultimate limit of what the Linux 2.4 kernel can handle.

I decided to use Samsung SP1604N hard drives (reasonably fast, cool and quiet, 3 years of Samsung warranty). These deliver 312,581,808 blocks of 512 bytes each, which is 160,041,885,696 bytes. When you pack 14 of these babies into a raid 5 array you get 13 * 160,041,885,696 bytes = 2,080,544,514,048 bytes of usable capacity. A look at the 2 TB limit, which equals 2 ** 41 - 1 = 2,199,023,255,551 bytes, shows that we stay roughly 118 GB below it. 15 drives of this type would blow the limit.
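The arithmetic above can be rechecked with a few lines of shell. This is only a sketch: the function names are made up for illustration, and it assumes a shell with 64-bit integer arithmetic, which should hold on any current Linux system.

```shell
#!/bin/sh
# Capacity check for a raid 5 array against the 2 TB limit.
# Assumption: the shell does 64-bit integer arithmetic.

BLOCKS_PER_DISK=312581808   # Samsung SP1604N, value from the text
BLOCK_SIZE=512              # bytes per block
LIMIT=2199023255551         # 2 ** 41 - 1 bytes

# raid5_usable_bytes N: usable bytes of a raid 5 array built from N disks.
# One disk's worth of space is used for parity, so N - 1 disks hold data.
raid5_usable_bytes() {
    echo $(( ($1 - 1) * $BLOCKS_PER_DISK * $BLOCK_SIZE ))
}

# fits_limit N: succeeds if an N-disk array stays at or below the limit.
fits_limit() {
    [ "$(raid5_usable_bytes "$1")" -le "$LIMIT" ]
}

for n in 13 14 15; do
    if fits_limit "$n"; then
        verdict="fits"
    else
        verdict="exceeds the limit"
    fi
    echo "$n disks: $(raid5_usable_bytes "$n") bytes, $verdict"
done
```

With the figures from the text, 14 disks come out at 2,080,544,514,048 bytes and fit, while 15 disks exceed the limit.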

The total number of drives in the system is 14 for the raid array plus one system hard disk plus one DVD-ROM, which sums up to 16 IDE drives. The motherboard provides 2 IDE channels that can handle 4 drives in total. Therefore, we need 3 additional budget PCI IDE controllers, each of which provides two more IDE channels. Having 8 IDE channels communicating over a 33 MHz 32-bit PCI bus definitely maxes it out. As a result, some of the DMA requests that run over the PCI bus time out when the server is doing tough work on the raid (during a sync, for example). Seeing these messages in /var/log/warn alarmed me in the beginning:

Jul 3 19:58:07 thunder kernel: hde: timeout waiting for DMA
Jul 3 21:07:41 thunder kernel: hdj: timeout waiting for DMA
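To see which drives are affected and how often, the messages can be tallied per drive. The following is a minimal sketch, not something the server actually runs: the function name is hypothetical, and it assumes the syslog layout shown above, where the drive name is the sixth whitespace-separated field.

```shell
#!/bin/sh
# count_dma_timeouts FILE: print "drive count" for every drive that logged
# a DMA timeout. Assumes the drive name (e.g. "hde:") is field 6 of the line.
count_dma_timeouts() {
    grep 'timeout waiting for DMA' "$1" |
        awk '{ sub(/:$/, "", $6); n[$6]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# /var/log/warn is the log file named in the text; skip it if unreadable.
if [ -r /var/log/warn ]; then
    count_dma_timeouts /var/log/warn
fi
```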

After keeping an eye on them, it turned out that these errors can be considered harmless, as the DMA protocol simply retries the cancelled operation and that's it. If you don't read the logs, you won't even know about it. To get rid of the DMA timeouts completely, you could go with a motherboard that provides more capable PCI buses and IDE controller cards that support these buses, both very expensive. Another way would be to reduce the number of active IDE channels to about 6, which means using only 2 additional PCI IDE controller cards. Then you either lose capacity or have to go with bigger disks, which on the downside have a less competitive "GB per Euro" ratio than the 160 GB ones. I decided to keep my money and live with the harmless DMA timeout messages in the logs. The /etc/raidtab which is used to initialize the array looks like this:

raiddev /dev/md0
raid-level 5
nr-raid-disks 14
nr-spare-disks 0
persistent-superblock 1
chunk-size 256
parity-algorithm left-symmetric
device /dev/hdc1
raid-disk 0
device /dev/hdd1
raid-disk 1
device /dev/hde1
raid-disk 2
device /dev/hdf1
raid-disk 3
device /dev/hdg1
raid-disk 4
device /dev/hdh1
raid-disk 5
device /dev/hdi1
raid-disk 6
device /dev/hdj1
raid-disk 7
device /dev/hdk1
raid-disk 8
device /dev/hdl1
raid-disk 9
device /dev/hdm1
raid-disk 10
device /dev/hdn1
raid-disk 11
device /dev/hdo1
raid-disk 12
device /dev/hdp1
raid-disk 13
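Typing 14 device/raid-disk pairs by hand invites typos, so the repetitive part of the file above could also be generated. A small sketch, assuming the same partition names /dev/hdc1 to /dev/hdp1 (the function name is made up):

```shell
#!/bin/sh
# print_raid_disks: emit one "device"/"raid-disk" pair per array member,
# numbering the partitions /dev/hdc1 ... /dev/hdp1 from 0 upwards.
print_raid_disks() {
    i=0
    for letter in c d e f g h i j k l m n o p; do
        echo "device /dev/hd${letter}1"
        echo "raid-disk $i"
        i=$((i + 1))
    done
}

print_raid_disks
```

The output can be appended below the header lines (raiddev to parity-algorithm) of the raidtab.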

As you can see, I don't use a spare disk (no free IDE channel). So if one of the disks fails and the raid 5 goes into degraded mode, I have to hurry to get the bad one replaced. With a spare disk, the system would automatically integrate it into the array to get out of degraded mode as fast as possible. The reason is that in degraded mode you lose the complete content of the array when one of the remaining disks fails. As the probability of two drives failing within a short time of each other is not too big, I decided to live with the risk. Instead of having a spare disk, I implemented a raid check utility that sends me an e-mail notification when the raid array has fallen into degraded mode.

This is what cat /proc/mdstat tells me about the raid array:

Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 hdc1[0] hdd1[1] hde1[2] hdf1[3] hdg1[4] hdh1[5] hdi1[6] hdj1[7] hdk1[8] hdl1[9] hdm1[10] hdn1[11] hdo1[12] hdp1[13]
2031747328 blocks level 5, 256k chunk, algorithm 2 [14/13] [UUUUUUUUUUUUUU]

The script that detects a degraded raid 5 simply checks this output for one of the Us being replaced by an underscore, which indicates that the corresponding disk has failed and was removed from the array.

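#!/bin/sh
#
# Script that checks for raid errors.
# In the status line of /proc/mdstat every working disk shows up as "U";
# an underscore in that line marks a failed disk.
if grep "blocks" /proc/mdstat | grep -q "_"
then
    echo "RAID ERROR"
    echo "----------"
    cat /proc/mdstat
fi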

My crontab calls this script once an hour. If the raid is OK, no output is generated. If a disk has failed, the /proc/mdstat information is mailed to user root.
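Waiting for a real disk failure is a poor way to test such a check. One option is to point the same grep logic at a file instead of /proc/mdstat and feed it a made-up status line. The sketch below does that; the function name and the sample line (modelled on the mdstat output shown earlier, with the third disk marked as failed) are assumptions for illustration.

```shell
#!/bin/sh
# raid_degraded FILE: succeed if the mdstat-style FILE reports a failed disk,
# i.e. if a "blocks" status line contains an underscore.
raid_degraded() {
    grep "blocks" "$1" | grep -q "_"
}

# Made-up sample: same layout as the real status line, third disk failed.
sample=$(mktemp)
echo "2031747328 blocks level 5, 256k chunk, algorithm 2 [14/13] [UU_UUUUUUUUUUU]" > "$sample"

if raid_degraded "$sample"; then
    echo "degraded array detected"
fi
rm -f "$sample"
```

For the hourly run, a crontab line such as 0 * * * * /usr/local/sbin/raidcheck would do (the path is a placeholder). Cron mails whatever the job prints to the owner of the crontab, which is why the script stays silent while the array is healthy.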

Having the raid running doesn't mean you already have the disk space ready to use. When putting a filesystem on the /dev/md0 raid device, I suggest going with ext3 (ext2 with journaling: mke2fs option -j). Journaling makes perfect sense, because a filesystem check can take plenty of time on a 2 TB device. Using ext3 avoids filesystem checks in most cases, such as power failures and system crashes. To further reduce the occasions when a complete filesystem check has to be done, I set the maximum mount count to zero (tune2fs option -c 0), which disables the check that is otherwise forced on every n-th mount.

Normally mke2fs reserves about 5% of the disk's capacity for root processes. This does not make sense for a drive of such huge capacity that isn't a root drive: 5% of 2 TB is roughly 100 GB, which is far too much for any system logs etc. you can think of. Therefore, I set the reserve to 1%, which is plenty (tune2fs option -m 1). A 2 TB drive will typically hold a lot of large files, not myriads of small ones. To take this into account, I reduced the number of inodes reserved for the filesystem to 5,000,000, which is enough for about 5 million files but far below what mke2fs calculates automatically. Having done all this, running df will show you 2 TB of storage on the mounted raid array.
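Put together, the settings described above could translate into the two commands printed by the sketch below. It is deliberately a dry run that only echoes the commands, because mke2fs destroys whatever is on the target device. The option -N for the inode count is my assumption about how the limit was set; the text does not name the option. The variable and function names are made up.

```shell
#!/bin/sh
# Dry run: print the commands instead of executing them, because mke2fs
# wipes the target device.
DEVICE=/dev/md0      # raid device from the text
RESERVED_PCT=1       # reserve 1% for root instead of the default 5%
INODES=5000000       # room for about 5 million files

fs_commands() {
    # -j: create the filesystem with a journal (ext3)
    # -N: fixed number of inodes (assumed option, see lead-in)
    echo "mke2fs -j -N $INODES $DEVICE"
    # -c 0: never force a check based on the mount count
    # -m:   percentage of blocks reserved for root
    echo "tune2fs -c 0 -m $RESERVED_PCT $DEVICE"
}

fs_commands
```

Review the printed commands and the device name carefully before running them by hand as root.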

Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md0 2031179688 159317244 1851544976 8% /share/media

If the big numbers confuse you, simply switch to SI units by calling df --si. This time it tells me that I have about 2.1 TB to play with.

Filesystem Size Used Avail Use% Mounted on
/dev/md0 2.1T 164G 1.9T 8% /share/media
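The two df listings describe the same numbers in different units. A quick sketch of the conversion, assuming 64-bit shell arithmetic and assuming that df rounds up rather than to the nearest digit (which matches the figures above); the function name is made up:

```shell
#!/bin/sh
# si_tenths_tb KIB: convert a count of 1 KiB blocks to tenths of an SI
# terabyte (10^12 bytes), rounding up. Assumes 64-bit shell arithmetic.
si_tenths_tb() {
    bytes=$(( $1 * 1024 ))
    echo $(( (bytes + 99999999999) / 100000000000 ))
}

# Total size of /dev/md0 in 1K blocks, taken from the first df listing.
tenths=$(si_tenths_tb 2031179688)
echo "$(( tenths / 10 )).$(( tenths % 10 )) TB"
```

For the total of 2,031,179,688 blocks this prints 2.1 TB. The array actually holds about 2.08 * 10^12 bytes, so the 2.1T is the rounded-up figure.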

2.2 Sixteen IDE drives in a conventional big tower

A Chieftec CS-901 provides plenty of room, but out of the box it isn't prepared to give 16 drives a home. So it takes a little craftsmanship to mount the drives inside. The picture gives you an impression of how the drives are arranged.

Image

Disks hda/hdc/hdd are mounted as a module that flips into the top area of the 5.25" drive bay. Directly below is another group of disks, namely hde/hdf/hdg/hdh, that forms a second module (note that disk hdh is missing in the picture below). The two drive cages in the lower part of the case take 3 drives each (hdi/hdj/hdk, hdl/hdm/hdn). The drive cage at the bottom is extended to hold another two drives, namely hdo/hdp.

Summing up, there is room for 15 drives plus the DVD-ROM, which is located at the very top of the 5.25" drive bay. As you can see, it is getting a bit crowded inside, but you can still take out the drive modules and cages quite easily. The material used for the drive cage extension and for building the drive modules is 1 mm aluminium sheet metal, which is a charm to drill and saw. The 3.5" hard drives in the modules are fitted with the usual spacer parts that allow mounting them in a 5.25" bay. The flip-in rails come with the CS-901 case. So all you need is some aluminium sheet metal and some "3.5 to 5.25" spacers to fit all these drives into a standard CS-901 big tower.

Image  Image

 

To be honest, you need some more parts to get it to work. I don't know of any power supply that has connectors for 16 drives, so you need some Y-cables that split a single power connector into two. Obviously you need 8 IDE cables as well. Needless to say, you should go with the round ones; otherwise the IDE cabling will be a complete mess. I tried flat cables the first time, but had no fun in terms of system stability and airflow, not to mention how awful that looked.

Don't forget to give the drives and the system appropriate cooling. The upper part of the case is cooled by two 80mm fans in the front panel.

Image

The lower part has one 80mm fan per drive cage and an additional 80mm fan for cooling the two drives at the very bottom. This produces some serious airflow, which adds to the noise of 15 IDE hard disks. With the server located in the utility room this isn't a problem, but the server noticeably helps to warm up the room, which is nice in winter but less welcome during a warm summer. In any case, you shouldn't underestimate the amount of heat a server running 24/7 adds to a room.


Last updated ( 14.03.2006 )
 
© 2008 Pasternak Homepage