GlusterFS and Virtualization

A new friend tipped me off to GlusterFS, which is a distributed file system for Linux. With the market quickly shifting to hyper-converged solutions I find myself revisiting software-defined storage as a possible solution. Each day I am confronted with new problems caused by the lack of agility in storage systems. Monolithic arrays seem to rule my whole world. They cause so many problems with disaster recovery… a vendor friend once told me it’s nothing 300k in software licences cannot solve. This is the number one problem with storage vendors… they are not flexible and have not made any major advances in the last twenty years. Don’t get me wrong, there are new protocols (iSCSI, FCoE) and new technologies (dedupe, VAAI etc.) but at the end of the day the only thing that has really changed is capacity and cache sizes. We have seen SSD improve performance, pushing the bottleneck to the controllers, but it’s really the same game. It’s an expensive rich man’s club where disaster recovery costs millions of dollars.

Virtualization has changed that market a little… a number of companies are using products like Zerto to replicate over long distances for disaster recovery. There are a number of software-based replication solutions for virtualization (vSphere Replication every 15 minutes, Veeam etc.) and they solve one market. What I really want is what Google has: distributed and replicated file systems. My perfect world would look something like this:

  • Two servers at two different datacenters
  • Each having live access to the same data set
  • Read and write possible from each location
  • Data is stored at each location so no part of the server requires the other site
  • Self healing when a server or site is unavailable

Is this possible? Yes, and lots of companies are doing this using their own methods. GlusterFS was bought by RedHat last year and turned into RedHat Storage Server. In fact RedHat has bought at least three companies in the last year that provide this type of distributed replicated storage system. This has been a move to create a standardized and supported backend for swift (openstack storage bricks). Thanks to RedHat we can expect more from glusterfs in the future. Since I play around with VMware a lot I wanted to try using glusterfs as the backend for a VMware datastore via NFS. This would allow me to have a virtual machine live replicated to another site, using glusterfs to replicate. Nearly no data loss when a site goes down. In a DR situation there are VMFS locks and resignatures that have to take place, but for now I was just interested in performance.

Install GLUSTERFS

Setting up glusterfs is really easy.   We are going to setup a three node replicated volume.

Enable EPEL to get the package:

   wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo

Install glusterfs on each server

   yum install glusterfs-server -y

Start glusterfs

   service glusterd start

Check Status

   service glusterd status

Enable at boot time

   chkconfig glusterd on

Configure SELinux and iptables

SELinux can be a pain with the free gluster… you can figure out the rules with the troubleshooter and work it out, or run in disabled or permissive mode. iptables should allow 100% communication between cluster members; any network firewall should have similar rules.

Create the trusted pool 

server 1 - gluster peer probe server2
server 2 - gluster peer probe server1
server 1 - gluster peer probe server3

Create a Volume

Mount some storage and make sure it’s not owned by root – storage should be the same size on each node.

  mkdir /glusterfs
  mount /dev/sd1 /glusterfs

On a single node create the volume called vol1 and start it:

  gluster volume create vol1 replica 3 server1:/glusterfs server2:/glusterfs server3:/glusterfs
  gluster volume start vol1
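
With more than a handful of nodes, typing the probe and create commands by hand gets tedious. Here is a minimal sketch of a dry-run helper that only prints the commands for a replicated volume. The function name gluster_plan is my own invention (it is not part of gluster), and the host names, volume name and brick path are the placeholders used above.

```shell
# Dry-run helper: prints the gluster commands for a replicated volume.
# usage: gluster_plan VOLUME BRICK_PATH HOST...
gluster_plan() {
    vol=$1
    brick=$2
    shift 2
    first=$1      # the node the commands are run on
    count=$#      # one replica per host
    bricks=""
    for host in "$@"; do
        # Probe every peer except the node we are running on.
        if [ "$host" != "$first" ]; then
            echo "gluster peer probe $host"
        fi
        bricks="$bricks $host:$brick"
    done
    echo "gluster volume create $vol replica $count$bricks"
    echo "gluster volume start $vol"
}

gluster_plan vol1 /glusterfs server1 server2 server3
```

Review the printed lines, then paste them on the first node (or pipe them to sh once you trust them).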

Check Status

  gluster volume info

Mounting

Gluster uses NFS natively as part of its process, so you can use the showmount command to see gluster exports

  showmount -e localhost

You can also mount from any node using NFS (or mount then share out). If you are going to write locally, I recommend mounting it locally using the glusterfs client:

mount -t glusterfs server1:/vol1 /mnt

Mounting in VMware
You can mount from any node and gain the glusterfs replication. But if this node goes away then you will not have access to storage. In order to create a highly available solution you need to implement Linux heartbeat with VIPs or use a load balancer for NFS traffic (I may go into that in another article). For my tests a single point of failure was just fine.

 

Tests

I wanted to give it a try so I set up some basic test cases. In all these cases, except for the remote node in #1, the same storage system was used:

  1. Set up a three node glusterfs file system on Linux virtual machines which then serve the file system to ESXi.   Two nodes are local and one node is remote (140 miles away across a 10Gb internet link).
  2. Set up a virtual machine to provide native NFS to ESXi as a datastore, all local
  3. Native VMFS from Fiber Channel SAN
  4. Native physical server access to Fiber channel SAN

In every case I used a virtual/physical machine running RedHat Linux 6.5 x64 with 1GB of RAM and 1CPU.

 

Results

I used two test cases to test write and read.  I know they are not perfect but I was going for a rough idea.  In each case I took an average of 10 tests done at the same time.

Case 1

Use dd to write random data to the file system and ensure it is synced back to the storage system.  (As written, the command below writes 100 blocks of 1MB, so 100MB per run rather than 1GB.)  Sync here is critical: it avoids skew from memory caching of writes.  The following command was used:

dd if=/dev/urandom of=speedtest bs=1M count=100 conv=fdatasync

Here are the numbers:

  1. 5.7 MB/s
  2. 5.6 MB/s
  3. 5.4 MB/s
  4. 8.2 MB/s

In this case only direct SAN showed a major improvement over all other virtual test cases.  gluster performed pretty well..
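
If you want to repeat the write test, the averaging can be scripted. This is a rough sketch, not the exact procedure I used: it runs the dd command sequentially, times each run with whole-second resolution only, and the names mb_per_sec, average, run_write_test, RUNS and TESTFILE are all made up for the example.

```shell
# Convert a byte count and elapsed seconds to MB/s (1 MB = 1048576 bytes here).
mb_per_sec() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1048576 / s }'
}

# Average the numbers given as arguments.
average() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

# Run the dd write test RUNS times (default 10) and print the mean MB/s.
run_write_test() {
    runs=${RUNS:-10}
    file=${TESTFILE:-speedtest}
    results=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        dd if=/dev/urandom of="$file" bs=1M count=100 conv=fdatasync 2>/dev/null
        end=$(date +%s)
        elapsed=$((end - start))
        # Guard against division by zero on very fast storage.
        if [ "$elapsed" -eq 0 ]; then elapsed=1; fi
        results="$results $(mb_per_sec 104857600 "$elapsed")"
        i=$((i + 1))
    done
    rm -f "$file"
    # Unquoted on purpose so each result becomes its own argument.
    average $results
}
```

Change into the mounted file system and call run_write_test there; the function is only defined above, so nothing is written until you run it.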

 

Case 2

Timed Cache reads using the hdparm command in linux.   This has a number of issues but it’s the easiest way to diagnose reads:

  1. 6101
  2. 5198
  3. 8664
  4. 14614

 

End result… oddly reads are a lot faster when using native VMFS and direct SAN.

Summary of results

My non-exhaustive testing shows that it’s possible to use glusterfs as a backend for VMware datastores, taking advantage of gluster replication to live replicate between sites.   There are a lot more tests I would want to perform before I ever consider this a possible production solution, but it’s possible.  (Bandwidth usage for this test was low… 100mb or less of the whole pipe.)  I am concerned about what happens to glusterfs when you have many virtual machines running on a volume; it may kill the virtual machines.   I would love to repeat the test with some physical servers as the gluster nodes and really push the limit.   I would also like to see gluster include some features like SSD caching on each server.  I could throw 1.6 TB of SSD in each server and really make this solution fly.    There are other methods for gluster replication, like geo-replication, which were not tested.  Let me know what you think… or if you happen to have a bunch of servers you want to donate to my testing  :).    Thanks for reading my ramblings.

Create a ISO datastore with CentOS

Morning,

This came up in a discussion in the VMware forums and I figured I would put it all down.  The user wanted to have his ISOs shared between VMware and Windows and wanted to know how to do it from VMware.  Well, it’s not possible from VMware because it cannot act as an NFS server to share out VMFS.  But VMware does support NFS storage, so with CentOS (RedHat / OracleLinux will work the same) you can create a shared NFS mount that can also be mounted via CIFS on Windows.

So I am going to assume you know how to install Linux; if not, download and click next…next…next. Once installed, log in as root and make sure you have networking.

Install and lock down NFS

yum install nfs-utils -y

Secure the install of NFS:

Add the following to /etc/hosts.deny (this blocks everyone from the NFS services):

portmap: ALL
lockd: ALL
statd: ALL
mountd: ALL
rquotad: ALL

Add the hosts that are allowed to connect to NFS to /etc/hosts.allow, separating the IPs with commas:

portmap: 10.10.101.10, 10.10.101.11
lockd: 10.10.101.10, 10.10.101.11
statd: 10.10.101.10, 10.10.101.11
mountd: 10.10.101.10, 10.10.101.11
rquotad: 10.10.101.10, 10.10.101.11

The exported file system is the file system you want to share out. We will use /nfs/ISO in this example, but it can be anything.  I would make it a different partition and potentially LVM, but that’s out of scope.  Edit /etc/exports and add the servers you want to be able to mount /nfs/ISO. Notice I made 10.10.101.11 read only (ro) and 10.10.101.10 read write (rw):

# Sample
/nfs/ISO 10.10.101.10(rw,no_root_squash)
/nfs/ISO 10.10.101.11(ro,no_root_squash)

Now we need to lock down NFS to specific ports to make it more firewall friendly.   Edit /etc/sysconfig/nfs and add the following lines (make sure to comment out these lines if already in use)

STATD_PORT=4000
LOCKD_TCPPORT=4001
LOCKD_UDPPORT=4001
MOUNTD_PORT=4002
RQUOTAD_PORT=4003

Add the following to /etc/services and comment out original entries:

rquotad         4003/tcp                        # rquota
rquotad         4003/udp                        # rquota

Start NFS service and enable at boot time:

/etc/init.d/portmap start 
/etc/init.d/nfs start 
/etc/init.d/nfslock start 

chkconfig portmap on 
chkconfig nfs on 
chkconfig nfslock on

Now if you’re running a host-based firewall you will want to open it up. Remember, we are controlling access via hosts.allow:

-A RH-Firewall-1-INPUT -p tcp -m multiport --dports 4000:4003 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m multiport --dports 4000:4003 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp --dport 111 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 111 -j ACCEPT

Now you need to install and set up Samba to share out the same file system via CIFS.

Let’s start with the firewall rules and assume that our Windows servers are all on 192.168.10.0/24:

-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m tcp -p tcp --dport 445 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m udp -p udp --dport 445 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m udp -p udp --dport 137 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m udp -p udp --dport 138 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m tcp -p tcp --dport 139 -j ACCEPT

Install Samba and required components:

yum install samba samba-client samba-common

Turn it on at boot time

chkconfig smb on
chkconfig nmb on

Edit the file /etc/samba/smb.conf and add your info to the config file, including the workgroup… yes, it’s possible to add Samba to a domain, but I will not cover it here.

#======================= Global Settings =====================================
[global]
 workgroup = WORKGROUP
 security = share
 map to guest = bad user
#============================ Share Definitions ==============================
[ISO]
 path = /nfs/ISO
 browsable = yes
 writable = yes
 guest ok = yes
 read only = no

Restart samba services to reload the changed config

sudo service smb restart
sudo service nmb restart

Browse to the machine from something in 192.168.10.0/24 and you should see the share and be able to write to it.

Please let me know if you have any questions and enjoy your ISO share (yes it can be a virtual machine)

Disable crontab email to user

So your daily jobs send you email and you want it to stop? Just add this to the end of each of your cron jobs:

 

>/dev/null 2>&1

 

This means send standard output and standard error to /dev/null which throws it away.
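
To see the redirect at work without touching your crontab, here is a small sketch. The noisy_job function is a stand-in I made up for a real cron job; in a crontab the same redirect simply goes at the end of the command on the job line.

```shell
# A job that writes to both stdout and stderr, standing in for a real cron job.
noisy_job() {
    echo "normal output"
    echo "error output" >&2
}

# Without the redirect cron would capture both lines and mail them.
# With it there is nothing left to mail:
noisy_job >/dev/null 2>&1

# Order matters. "2>&1 >/dev/null" still lets stderr through, because
# stderr is pointed at the old stdout before stdout is thrown away.
noisy_job 2>&1 >/dev/null
```

Only the second call prints anything (the error line), which is why the redirect has to be written in the order shown in this post.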

 

Secure / Harden PHP

PHP is great and I love it, but some simple modifications to php.ini can really improve its security.  Locate your php.ini (find / -name php.ini) and then modify the following items:

 

; Block functions that run shell commands

disable_functions = exec,system,shell_exec,passthru

; Injection protection (this setting was removed in PHP 5.4)
register_globals = Off

; Turns off display of PHP version
expose_php = Off

; Escape incoming quotes to avoid injection (this setting was removed in PHP 5.4)
magic_quotes_gpc = On

 

 

These take huge steps toward protecting your system.

Change permissions or owner on all files Linux

So you have a directory and you want to change permissions on all files in the directory and its subdirectories, but not on the directories themselves. It’s easy in Linux.

Assume the directory I want to start on is /local and I want everything under this directory to be chmod 644 then I would run

 

find /local -type f -exec chmod 644 {} \;

 

Or if I wanted to print out results first to check them

 

find /local -type f -print

 

Or if I wanted to change the owner to bob:bob

 

find /local -type f -exec chown bob:bob {} \;

 

How about you want to change directories only to 755?

 

find /local -type d -exec chmod 755 {} \;
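
If you would rather try these on a scratch tree before pointing them at real data, here is a minimal sketch. The temporary directory and file names are made up for the example, and it uses the "+" form of -exec, which hands many files to a single chmod call instead of running chmod once per file.

```shell
# Build a throwaway tree so the commands can be tried safely.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/a.txt" "$demo/sub/b.txt"
chmod 600 "$demo/a.txt" "$demo/sub/b.txt"
chmod 700 "$demo/sub"

# Files to 644 and directories to 755, one chmod call per batch.
find "$demo" -type f -exec chmod 644 {} +
find "$demo" -type d -exec chmod 755 {} +

# Show the resulting mode of everything under the tree.
find "$demo" -exec ls -ld {} +
```

Remove the scratch tree with rm -r "$demo" when you are done.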

 

 

Boot From SAN, multipath.conf and initrd

At work almost everything I run boots from SAN.  This allows flexibility when hardware problems arise.  It does cause one issue: when moving storage systems, how do you move your boot from SAN LUNs?  Well, there is the old dd trick to do a block level copy between two similar sized LUNs.  This alone will not work, because the WWID of your boot from SAN LUN will change.  When this happens the following all become issues:

1. You need to change your BIOS settings for the boot from SAN WWID (this is a trick provided by modern HBAs that fools the BIOS into thinking a single LUN is a local disk so it can be booted from; it is normally done in the HBA BIOS, at least for QLogic)

2. You need to identify the LUN to your multipathing software so it’s redundant (I will use Linux device mapper, which is handled by a kernel driver and the multipath.conf file)

 

Both of those are straightforward and easy, but when you change them it just does not work. Why not?  Well, it’s because of how Linux boots and handles multipath devices.

Your boot loader has no kernel and as such cannot load data off a multipathed device (it needs a kernel driver to do it), so how does it solve this issue? Well, both the driver and multipath.conf are built into the initrd.  Yep, you heard me right: the multipath.conf which you have been changing for years without running mkinitrd has a copy inside the initrd.   How does it work?

1. Grub stage 1 hands off to the hard drive partition (single path of your /boot partition)

2. Grub stage 1.5 loads the initrd which contains the multipath driver and multipath.conf for at least the boot from SAN LUN.

3. Grub stage 2 loads the normal kernel which then reloads the multipath.conf from disk, enabling all other disks listed in there

 

So to solve this issue, when you change the WWID of your boot from SAN LUN (or optionally any LUN) you can remake your initrd with the following command:

 

mkinitrd /boot/initrd-new-boot.img your-kernel-version

 

For some reason on RHEL, when I issue this command with the name of my current initrd and -f to force it, multipath does not get added like it should, so use a different name and then it works.  Then just change your /boot/grub/menu.lst to boot the new initrd and you’re all set.

 

How do I tell what’s in my initrd:

cd /tmp

mkdir initrd

cd initrd

zcat /boot/initrd-version.img | cpio -id

and you will have an open initrd that you can browse… try looking for etc/multipath.conf.

You can repack it with

cd /tmp/initrd
find . | cpio -o -H newc | gzip -9 > ../initrd.gz

 

 

Enjoy

 

Linux Boot Process

This will document the x86 boot process from a Linux perspective. This document will attempt to provide a technical overview; if you are not comfortable with hexadecimal, octal or binary you might want to brush up on them first.

Order of boot

  1. The BIOS completes its check (memory, cpu, video)
  2. The BIOS executes the master boot code in the MBR
  3. The master boot code then has two functions: identify any active partitions and any extended partitions.
  4. If the master boot code identifies an extended partition it follows the link to the extended partition and so on until it finds no additional partitions.
  5. The master boot loader moves to the active partition and turns over booting to that partition.
  6. The boot loader enters stage 1
  7. The boot loader enters stage 1.5 and gains access to the file system
  8. The boot loader enters stage 2, displays the menu and waits for user input or the default selection timeout
  9. The boot loader loads the kernel and the initrd from /boot
  10. The boot loader turns over booting to the kernel
  11. The kernel initializes the hardware and loads drivers and modules out of the initrd
  12. /sbin/init executes the rest of the system.
  13. init starts the run level scripts

The boot process starts with a 512 byte piece of code called the master boot record. The MBR is stored on the first 512 bytes of a drive. The BIOS accesses this section and it contains code that points to the rest of the boot process. The master boot record contains the partition table, bootloader and a section called the magic number. The bootloader takes the first 446 bytes. The partition table takes the next 64 bytes. The magic number takes the last 2 bytes.

The magic number is a fixed boot signature rather than a real checksum of your MBR; it should always contain 0xAA55 (stored on disk as the bytes 55 AA). You can dump the MBR on your system using:

dd if=/dev/hda of=/mbr.dump bs=512 count=1

This will dump the first 512 bytes of your hda drive to the file /mbr.dump. You can also write this MBR back to the disk using:

dd if=/mbr.dump of=/dev/hda bs=512 count=1

You can use strings to view the current boot loader:


linuxmoney:~ # strings /mbr.dump
ZRrK
D|f1
GRUB
Geom
Hard Disk
Read
 Error
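
The magic number mentioned earlier can be checked from the shell as well. This is a small sketch that builds a fake 512-byte sector in a temporary file so it is safe to run anywhere; the function name mbr_signature is my own, and on a real system you would pass it /mbr.dump instead.

```shell
# Print the last two bytes of a 512-byte MBR dump as hex, e.g. "55aa".
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Fake sector for demonstration: 510 zero bytes followed by 0x55 0xAA.
sample=$(mktemp)
dd if=/dev/zero bs=1 count=510 2>/dev/null > "$sample"
printf '\125\252' >> "$sample"    # octal escapes for 0x55 0xAA

if [ "$(mbr_signature "$sample")" = "55aa" ]; then
    echo "valid boot signature"
else
    echo "boot signature missing"
fi
```

The bytes come out as 55 aa because that is their order on disk; read as one little-endian 16-bit word they are 0xAA55.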

You can view the partitions on the disk by using:

file /mbr.dump

This will produce an output listing the partitions with their start sectors and sizes:

x86 boot sector;
partition 1: ID=0x83, starthead 1, startsector 63, 417627 sectors;
partition 2: ID=0x82, starthead 0, startsector 417690, 2104515 sectors;
partition 3: ID=0x83, starthead 0, startsector 2522205, 4209030 sectors;
partition 4: ID=0xf, active, starthead 0, startsector 6731235, 149565150 sectors, code offset 0x48

You can see that partition 4 is active. The ID displays the type of partition. Lists of partition ID codes are easy to find online. Since each sector has 512 bytes we can find the size of each partition. For example partition 1 is 417627 sectors. You can find the size in megabytes using:

echo $(((417627/2)/1024))

You can compare this information to a df -k output:

Filesystem 1K-blocks  Used      Available Use%  Mounted on
/dev/hda5 20641788    5464224   14128924  28%   /
/dev/hda6 52964408    4147160   46126764   9%   /home
/dev/hdc1 244076732   100537572 143539160 42%   /data
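
The same sector arithmetic wrapped in a function, as a quick sketch. It assumes 512-byte sectors (two sectors per KB) and rounds down to whole megabytes; the name sectors_to_mib is made up for the example.

```shell
# Convert a sector count to whole MiB, assuming 512-byte sectors.
sectors_to_mib() {
    echo $(( $1 / 2 / 1024 ))
}

sectors_to_mib 417627     # partition 1
sectors_to_mib 4209030    # partition 3
```

This prints 203 and 2055 for the two partitions.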

You can dump hex of the mbr using: od -Ad -tx1 /mbr.dump

You can also dump it using hexdump

Key
Offset (hex) Description
000-1BD Boot Loader
1BE-1CD 1st Partition table entry
1CE-1DD 2nd Partition table entry
1DE-1ED 3rd Partition table entry
1EE-1FD 4th Partition table entry
1FE-1FF Magic Number

You can also do a hex dump using xxd /mbr.dump


linuxmoney:~ # xxd mbr.dump
0000000: eb48 90d0 66bc 007c 0000 8ec0 8ed8 89e6  .H..f..|........
0000010: 66bf 0006 0000 66b9 0001 0000 f3a5 ea23  f.....f........#
0000020: 0600 0080 fa80 7c05 80fa 877e 02b2 8088  ......|....~....
0000030: 1649 0766 bfbe 0700 0031 f666 b904 0302  .I.f.....1.f....
0000040: ff00 0020 0100 0000 0002 fa90 90f6 c280  ... ............
0000050: 7502 b280 ea59 7c00 0031 c08e d88e d0bc  u....Y|..1......
0000060: 0020 fba0 407c 3cff 7402 88c2 52be 817d  . ..@|<.t...R..}
0000070: e836 01f6 c280 7456 b441 bbaa 55cd 135a  .6....tV.A..U..Z
0000080: 5272 4b81 fb55 aa75 45a0 417c 84c0 783e  RrK..U.uE.A|..x>
0000090: 7505 83e1 0174 3766 8b4c 10be 057c c644  u....t7f.L...|.D
00000a0: ff01 668b 1e44 7cc7 0410 00c7 4402 0100  ..f..D|.....D...
00000b0: 6689 5c08 c744 0600 7066 31c0 8944 0466  f.\..D..pf1..D.f
00000c0: 8944 0cb4 42cd 1372 05bb 0070 eb7d b408  .D..B..r...p.}..
00000d0: cd13 730a f6c2 800f 84e8 00e9 8d00 be05  ..s.............
00000e0: 7cc6 44ff 0066 31c0 88f0 4066 8944 0431  |.D..f1...@f.D.1
00000f0: d288 cac1 e202 88e8 88f4 4089 4408 31c0  ..........@.D.1.
0000100: 88d0 c0e8 0266 8904 66a1 447c 6631 d266  .....f..f.D|f1.f
0000110: f734 8854 0a66 31d2 66f7 7404 8854 0b89  .4.T.f1.f.t..T..
0000120: 440c 3b44 087d 3c8a 540d c0e2 068a 4c0a  D.;D.}<.T.....L.
0000130: fec1 08d1 8a6c 0c5a 8a74 0bbb 0070 8ec3  .....l.Z.t...p..
0000140: 31db b801 02cd 1372 2a8c c38e 0648 7c60  1......r*....H|`
0000150: 1eb9 0001 8edb 31f6 31ff fcf3 a51f 61ff  ......1.1.....a.
0000160: 2642 7cbe 877d e840 00eb 0ebe 8c7d e838  &B|..}.@.....}.8
0000170: 00eb 06be 967d e830 00be 9b7d e82a 00eb  .....}.0...}.*..
0000180: fe47 5255 4220 0047 656f 6d00 4861 7264  .GRUB .Geom.Hard
0000190: 2044 6973 6b00 5265 6164 0020 4572 726f   Disk.Read. Erro
00001a0: 7200 bb01 00b4 0ecd 10ac 3c00 75f4 c300  r.........<.u...
00001b0: 0000 0000 0000 0000 5147 0a00 0000 0001  ........QG......
00001c0: 0100 83fe 3f19 3f00 0000 5b5f 0600 0000  ....?.?...[_....
00001d0: 011a 82fe 3f9c 9a5f 0600 c31c 2000 0000  ....?.._.... ...
00001e0: 019d 83fe 7fa2 5d7c 2600 8639 4000 8000  ......]|&..9@...
00001f0: 41a3 0ffe ffff e3b5 6600 de2e ea08 55aa  A.......f.....U.

linuxmoney:~ # hexdump mbr.dump
0000000 48eb d090 bc66 7c00 0000 c08e d88e e689
0000010 bf66 0600 0000 b966 0100 0000 a5f3 23ea
0000020 0006 8000 80fa 057c fa80 7e87 b202 8880
0000030 4916 6607 bebf 0007 3100 66f6 04b9 0203
0000040 00ff 2000 0001 0000 0200 90fa f690 80c2
0000050 0275 80b2 59ea 007c 3100 8ec0 8ed8 bcd0
0000060 2000 a0fb 7c40 ff3c 0274 c288 be52 7d81
0000070 36e8 f601 80c2 5674 41b4 aabb cd55 5a13
0000080 7252 814b 55fb 75aa a045 7c41 c084 3e78
0000090 0575 e183 7401 6637 4c8b be10 7c05 44c6
00000a0 01ff 8b66 441e c77c 1004 c700 0244 0001
00000b0 8966 085c 44c7 0006 6670 c031 4489 6604
00000c0 4489 b40c cd42 7213 bb05 7000 7deb 08b4
00000d0 13cd 0a73 c2f6 0f80 e884 e900 008d 05be
00000e0 c67c ff44 6600 c031 f088 6640 4489 3104
00000f0 88d2 c1ca 02e2 e888 f488 8940 0844 c031
0000100 d088 e8c0 6602 0489 a166 7c44 3166 66d2
0000110 34f7 5488 660a d231 f766 0474 5488 890b
0000120 0c44 443b 7d08 8a3c 0d54 e2c0 8a06 0a4c
0000130 c1fe d108 6c8a 5a0c 748a bb0b 7000 c38e
0000140 db31 01b8 cd02 7213 8c2a 8ec3 4806 607c
0000150 b91e 0100 db8e f631 ff31 f3fc 1fa5 ff61
0000160 4226 be7c 7d87 40e8 eb00 be0e 7d8c 38e8
0000170 eb00 be06 7d96 30e8 be00 7d9b 2ae8 eb00
0000180 47fe 5552 2042 4700 6f65 006d 6148 6472
0000190 4420 7369 006b 6552 6461 2000 7245 6f72
00001a0 0072 01bb b400 cd0e ac10 003c f475 00c3
00001b0 0000 0000 0000 0000 4751 000a 0000 0100
00001c0 0001 fe83 193f 003f 0000 5f5b 0006 0000
00001d0 1a01 fe82 9c3f 5f9a 0006 1cc3 0020 0000
00001e0 9d01 fe83 a27f 7c5d 0026 3986 0040 0080
00001f0 a341 fe0f ffff b5e3 0066 2ede 08ea aa55
0000200

You can manually decode the partition table using the following
information. Remember to flip the bytes to get the correct order e.g
0080 becomes 80 00.

Offset Size Description
0x00 1 byte Active flag 0x80 active otherwise 0x00
0x01 3 bytes Cylinder-head-sector address of the first sector in the partition
0x04 1 byte Partition type
0x05 3 bytes Cylinder-head-sector address of the last sector in the partition
0x08 4 bytes Logical block address of the first sector in the partition
0x0C 4 bytes Length of Partition in sectors

+--- Active partition flag 80H for active partition
|
|      +--- Cylinder-head-sector address of the first sector in the partition
|      |
|      |    +--- Partition Type List here.
|      |    |
|      |    |     +--- Cylinder-head-sector address of the last sector.
|      |    |     |
|      |    |     |        +--- Logical block address of the first sector.
|      |    |     |        |
|      |    |     |        |        +--- Size of Partition in sectors.
|      |    |     |        |        |
-- -------- -- -------- -------- --------
DL DH CL CH TB DH CL CH LBA      SIZE
00 01 01 00 83 fe 3f 19 3f000000 5b5f0600	1st Partition
00 00 01 1a 82 fe 3f 9c 9a5f0600 c31c2000	2nd Partition
00 00 01 9d 83 fe 7f a2 5d7c2600 86394000	3rd Partition
80 00 41 a3 0f fe ff ff e3b56600 de2eea08	4th Partition
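
The LBA and SIZE columns are little-endian, so the bytes have to be reversed before converting to decimal. Here is a small sketch that does the flip for the four entries above; the function name le32_to_dec is my own.

```shell
# Convert 4 little-endian bytes written as 8 hex digits (e.g. 5b5f0600) to decimal.
le32_to_dec() {
    b0=$(echo "$1" | cut -c1-2)
    b1=$(echo "$1" | cut -c3-4)
    b2=$(echo "$1" | cut -c5-6)
    b3=$(echo "$1" | cut -c7-8)
    # Most significant byte is stored last on disk.
    echo $(( 0x$b3$b2$b1$b0 ))
}

# LBA start and size of each entry in the table above.
for entry in "3f000000 5b5f0600" "9a5f0600 c31c2000" "5d7c2600 86394000" "e3b56600 de2eea08"; do
    set -- $entry
    echo "start=$(le32_to_dec "$1") sectors=$(le32_to_dec "$2")"
done
```

The first line comes out as start=63 sectors=417627, which matches what the file command reported for partition 1.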

Decoding CHS

The CHS is used to decode the location of the first sector of the partition if that location exists within the first 1024 cylinders of the hard drive. When the location goes beyond that range the CHS value is normally set to the maximum of cylinder 1023, head 254, sector 63, stored as FE FF FF. Decoding the values can be a challenge without switching to the binary value. They are stored in the order of head, sector, and cylinder. The cylinder value requires more than 8 bits (1 byte) and the sector value uses less than 8 bits, so you have to convert the values to binary to decode them: the low 6 bits of the second byte are the sector, its top 2 bits are the two high bits of the 10-bit cylinder number, and the third byte holds the low 8 bits of the cylinder.
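
Here is a minimal sketch of that bit juggling in the shell. The function name decode_chs is made up for the example; it takes the three bytes as hex in on-disk order (head, sector byte, cylinder byte).

```shell
# Decode a 3-byte CHS address given as hex bytes in on-disk order.
decode_chs() {
    head=$(( 0x$1 ))
    sector=$(( 0x$2 & 0x3f ))                      # low 6 bits of byte 2
    cylinder=$(( ((0x$2 & 0xc0) << 2) | 0x$3 ))    # top 2 bits of byte 2 + byte 3
    echo "C=$cylinder H=$head S=$sector"
}

decode_chs fe 3f 19    # end of the 1st partition in the table above
decode_chs fe ff ff    # the marker used when the location is beyond CHS range
```

The first call prints C=25 H=254 S=63 and the second prints C=1023 H=254 S=63.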


If the ending value for cylinder is 1023 or above then you have to figure out the ending location by adding the size to the starting location.

Remember that we can only have four partition entries per disk in the MBR. This is why extended partitions were created; an extended partition uses a link table to create unlimited partitions. The partition entries in the table are read top down. In the example above the first entry in the MBR partition table is also the first partition on the physical disk (it starts at sector 63).



Extended Partitions 

Extended partitions are a way of getting around the four partition limit of the MBR. Extended partitions cannot be marked as active or used as a boot device. The extended partition section in the MBR can describe at least 23 additional partitions (old DOS); under Linux the number of partitions possible is much higher. Extended partitions have the partition type 05h or 0Fh depending on the size of the disk. Extended partition boot records duplicate the layout of the MBR. Normally the first 446 bytes of the extended section are empty (LILO and GRUB both use them for internal code). The partition table is then full of partition entries followed by the aa55 code. In extended partitions the total size (LBA) is the size of all extended partitions.

GRUB
Grub stands for GRand Unified Bootloader. It is the most common boot loader for linux today. The boot process with GRUB is as follows:

  1. Starts executing bootloader code (GRUB stage 1) (boot.img).
  2. Bootloader jumps to the sector number of next stage. The stage 1.5 located in the “DOS compat space” immediately after the MBR.
  3. Stage 1.5 loads the file system and makes the full drive size available for loading. (diskboot.img+kernel.img+pc.mod+ext2.mod)
  4. Stage 2 takes over and loads the boot menu. (normal.mod+_chain.mod)
  5. After your selection the operating system is loaded.
Grub files are located in /boot/grub. Here you can find the stage1 and stage2 files and the menu.lst or grub.conf file. The configuration is done in the menu.lst or grub.conf file.
linuxmoney:/ # ls -al /boot/grub/
total 228
drwxr-xr-x 2 root root   4096 Sep 27 18:12 .
drwxr-xr-x 3 root root   4096 Jun 30 17:43 ..
-rw------- 1 root root     30 Jun 30 17:43 device.map
-rw------- 1 root root     30 Jun 30 17:37 device.map.old
-rw-r--r-- 1 root root   7552 Nov 25  2006 e2fs_stage1_5
-rw-r--r-- 1 root root   7424 Nov 25  2006 fat_stage1_5
-rw-r--r-- 1 root root   6688 Nov 25  2006 ffs_stage1_5
-rw-r--r-- 1 root root   6688 Nov 25  2006 iso9660_stage1_5
-rw-r--r-- 1 root root   8160 Nov 25  2006 jfs_stage1_5
-rw------- 1 root root   1385 Jun 30 17:43 menu.lst
-rw------- 1 root root   1188 Jun 30 17:36 menu.lst.joe
-rw------- 1 root root   1385 Jun 30 17:37 menu.lst.old
-rw-r--r-- 1 root root   6848 Nov 25  2006 minix_stage1_5
-rw-r--r-- 1 root root   9216 Nov 25  2006 reiserfs_stage1_5
-rw-r--r-- 1 root root    512 Nov 25  2006 stage1
-rw-r--r-- 1 root root 104042 May 19 11:13 stage2
-rw-r--r-- 1 root root   7040 Nov 25  2006 ufs2_stage1_5
-rw-r--r-- 1 root root   6240 Nov 25  2006 vstafs_stage1_5
-rw-r--r-- 1 root root   8904 Nov 25  2006 xfs_stage1_5

To reinstall grub in your MBR type:

grub-install /dev/hda

Configuration for grub is done inside menu.lst (normally in /boot/grub/menu.lst). This file has the following settings:

# Comments inside menu.lst are done with a hash mark (#)

# default defines the default choice to boot without user interaction
default 0
# Time out sets how long the boot menu will display before it loads default
timeout 30
# fallback provides a another choice in case default fails.
fallback 1
# hiddenmenu allows you to choose not to display the boot menu instead boot the default
# hiddenmenu
# OS definitions begin with a title title is what is displayed on the screen to the user
title openSUSE 10.2 – 2.6.18.8-0.3
# After the title description everything that follows is part of the same boot loader until the title tag appears again.
# Common entries in linux are root, kernel, and initrd
# root defines the root partition and tries to get the size of the partition. (hd0,4) is the first disk, fifth partition, because grub counts from 0
root (hd0,4)
# kernel attempts to load the kernel image off the root device
kernel /boot/vmlinuz-2.6.18.8-0.3-bigsmp root=/dev/hda5 vga=0x31a resume=/dev/hda2 splash=silent showopts
# initrd loads an initial ramdisk (allows you to modify the kernel without a recompile)
initrd /boot/initrd-2.6.18.8-0.3-bigsmp

Command Line Options
While at the boot menu you can also pass grub command line variables like what runlevel to boot into or additional options. To choose the run level to boot the kernel into:

  • On the graphical menu highlight the kernel you wish to boot
  • Press the e button to edit the kernel selection
  • At the prompt type the number of the run level you wish to boot into (1 to 5), single or emergency
  • Once returned to the grub menu press b to boot the kernel and runlevel selection

You can read more about grub options in the GNU GRUB manual.

LILO
LILO (LInux LOader) is a generic boot loader for Linux. LILO is an older boot loader that follows the same process as GRUB. Unfortunately, it does not contain a command line interface like GRUB, so MBR changes are required each time you want to change boot parameters. Also, changes to LILO can cause the system to fail to boot. It is for this reason alone that GRUB has become the standard boot loader of Linux. LILO keeps some files in /boot but its configuration is done in /etc/lilo.conf. To reinstall LILO as the boot loader:

/sbin/lilo

Command Line Options
While at the boot menu you can choose what runlevel you want to boot by pressing:

  • Ctrl-X to get boot:
  • Type linux runlevel

Kernel

Once the boot loader has reached the second stage it reads its configuration and displays a menu of available kernels to boot. Once the user or boot loader determines what kernel to load, stage two loads the kernel file and the initrd image off the /boot partition and turns the booting process over to the kernel. The kernel’s first step is to initialize the hardware. It then reads the initrd image; this file contains drivers required by the kernel to load SCSI devices and ext3 file systems. The kernel creates a read-only root device and mounts it. At this point the kernel is loaded, but since no user space files are loaded you cannot interact with it. This is where /sbin/init takes over.

/sbin/init

init is what processes the rest of the boot and provides the user environment. init becomes the parent or grandparent process for all processes on a system; it always has a pid of 1. It first runs the /etc/rc.d/rc.sysinit script that starts swap, sets the system clock, checks file systems and starts many other processes. It then reads /etc/inittab, which sets up the run levels.

Runlevels

A runlevel is a collection of scripts used to start applications and services used by a system.  Linux supports multiple runlevels.  You can change between runlevels very quickly on a Linux system dismounting file systems as you go.  The configuration for the runlevels is done inside the /etc/inittab file.  You can find the default runlevel inside inittab:

id:5:initdefault:

The default runlevel on this system is 5, which is multiuser with the graphical X Window interface.  inittab may also define:

  • The first script to be executed before the runlevels, /etc/init.d/boot
  • The RC scripts to be executed with each runlevel
  • Special keyboard commands
  • The getty programs for each runlevel
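
To read the default runlevel out of inittab from a script, the initdefault line can be parsed by splitting on colons. This is only a minimal sketch: the function name default_runlevel and the sample file contents are made up for illustration, and it assumes the usual id:runlevels:action:process layout of inittab entries.

```shell
#!/bin/bash
# default_runlevel [FILE]: print the runlevel named on the initdefault line.
# (hypothetical helper; FILE defaults to /etc/inittab)
default_runlevel() {
    local inittab="${1:-/etc/inittab}"
    # Ignore comment lines, then print field 2 of the first entry
    # whose action (field 3) is initdefault
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' "$inittab"
}

# Try it against a throwaway sample file instead of the live /etc/inittab
sample="$(mktemp)"
printf '%s\n' '# sample inittab' 'id:5:initdefault:' > "$sample"
default_runlevel "$sample"
rm -f "$sample"
```

On a real system you would simply call default_runlevel with no argument.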

The /etc/init.d/boot script handles the following:

  • Sets the terminal size and dimensions
  • Starts the initial boot messages and coloring
  • Sets up /proc, /sys, /dev and /sys/kernel/debug
  • Starts the user-defined script boot.local

The default runlevels for Linux are:

Runlevel State
0 Shutdown
1 Single User Mode
2 Multiuser without network
3 Multiuser text based
4 Unused
5 Multiuser with Graphical X
6 Reboot

You can quickly change the runlevels using:

init runlevel
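
Before switching it helps to know which runlevel you are in. The sketch below maps a runlevel number to the states from the table above; describe_runlevel is a name I made up for this example. It then asks the runlevel command (part of the SysV init tools, which prints the previous and current runlevel) for the current one, and skips that step if the command is missing or reports nothing useful.

```shell
#!/bin/bash
# describe_runlevel N: print the state for runlevel N, using the table above.
# (hypothetical helper name)
describe_runlevel() {
    case "$1" in
        0) echo "Shutdown" ;;
        1) echo "Single User Mode" ;;
        2) echo "Multiuser without network" ;;
        3) echo "Multiuser text based" ;;
        4) echo "Unused" ;;
        5) echo "Multiuser with Graphical X" ;;
        6) echo "Reboot" ;;
        *) echo "Unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# runlevel prints something like "N 5"; the last word is the current runlevel
if current="$(runlevel 2>/dev/null)"; then
    describe_runlevel "${current##* }" || true
fi
```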

Each runlevel executes the scripts contained inside /etc/init.d/rc_runlevel.  The entries in there are normally symbolic links to scripts inside /etc/init.d/, and these scripts should accept at least two arguments, stop and start.  The links inside /etc/init.d/rc_runlevel are of two types: kill (K) scripts and start (S) scripts.  The type is followed by a two-digit number that sets the order in which the script is executed within this runlevel.  For example:

# ls -al
total 8
drwxr-xr-x  2 root root 4096 Sep  2 21:52 .
drwxr-xr-x 11 root root 4096 Nov 16 20:25 ..
lrwxrwxrwx  1 root root    9 Sep  2 21:52 K02single -> ../single
lrwxrwxrwx  1 root root   12 Sep  2 21:52 K13microcode -> ../microcode
lrwxrwxrwx  1 root root    9 Sep  2 21:52 K13splash -> ../splash
lrwxrwxrwx  1 root root    8 Sep  2 21:52 K21fbset -> ../fbset
lrwxrwxrwx  1 root root   15 Sep  2 21:52 K21irq_balancer -> ../irq_balancer
lrwxrwxrwx  1 root root    8 May 19 10:46 S01fbset -> ../fbset
lrwxrwxrwx  1 root root   15 May 19 10:45 S01irq_balancer -> ../irq_balancer
lrwxrwxrwx  1 root root    6 May 19 10:47 S09kbd -> ../kbd
lrwxrwxrwx  1 root root   12 May 19 10:51 S09microcode -> ../microcode
lrwxrwxrwx  1 root root    9 May 19 10:47 S09splash -> ../splash
lrwxrwxrwx  1 root root    9 May 19 10:47 S20single -> ../single

You can see that several scripts start as part of runlevel 1; for example, S09splash starts before S20single.  It is very easy to add an item to a runlevel automatically using chkconfig.  For example, to see in which runlevels a script in /etc/init.d is started, use the following command:

# chkconfig -l apache2
apache2  0:off  1:off  2:off  3:on   4:off  5:on   6:off

chkconfig can also be used to turn on specific run levels using

#chkconfig service_name runlevel/runlevels

For example:

#chkconfig apache2 235

This starts the apache2 script in /etc/init.d in runlevels 2, 3 and 5.  You can also add the links manually using ln.  Running chkconfig alone displays all scripts and their status at the current runlevel, while chkconfig -l displays all runlevels.
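
If you want to add the links by hand, the ln step looks roughly like the sketch below. The helper name add_rc_link is made up for this example, and the directory name rc3.d is an assumption based on the common rcN.d layout. It works in a scratch directory so nothing under the real /etc/init.d is touched.

```shell
#!/bin/bash
# add_rc_link RCDIR TYPE PRIORITY SERVICE   (hypothetical helper)
# Creates RCDIR/<TYPE><PRIORITY><SERVICE> -> ../SERVICE,
# e.g. rc3.d/S20apache2 -> ../apache2. PRIORITY is the two-digit order.
add_rc_link() {
    local rcdir="$1" type="$2" prio="$3" service="$4"
    ln -s "../$service" "$rcdir/${type}${prio}${service}"
}

# Demonstration in a scratch tree standing in for /etc/init.d
initd="$(mktemp -d)"
mkdir "$initd/rc3.d"        # assumed runlevel 3 directory name
touch "$initd/apache2"      # stand-in for the real init script
add_rc_link "$initd/rc3.d" S 20 apache2   # start link
add_rc_link "$initd/rc3.d" K 01 apache2   # kill link
ls -l "$initd/rc3.d"
rm -rf "$initd"
```

The links are relative (../apache2), matching the ls output shown earlier.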

Format Conversion Scripts

WMA to MP3

I created this script to go through all *.wma files in a directory, convert them to mp3 files and then delete the wma files. It works great for my mp3 player, which does not support wma. It takes the wma files and converts them to wav, then converts them to mp3. It uses mplayer and lame to do the work.

#!/bin/bash
# Rip with mplayer / encode with LAME
for i in *.wma; do
    # Decode to audiodump.wav, encode it to mp3, and remove the wma only if both steps succeed
    mplayer -ao pcm -vc dummy "$i" &&
        lame --preset 128 audiodump.wav -o "$(basename "$i" .wma).mp3" &&
        rm -f "$i"
done
# Delete the intermediate audiodump.wav
rm -f audiodump.wav

ffmpeg

Transcoding from one format to another is something I have struggled to get working correctly. The fastest and best program I have used for it is ffmpeg. I'll document some of my steps here for getting a correct ffmpeg setup.

To display all supported formats (newer ffmpeg versions list the codecs separately with ffmpeg -codecs) type

ffmpeg -formats

To convert from a file to avi type this:

ffmpeg -i inputfilename.mpg output.avi
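
To convert a whole directory instead of a single file, the same command can be wrapped in a loop. This is a rough sketch: the function names avi_name and convert_all_mpg are mine, and it relies on ffmpeg picking the output format from the .avi extension, as in the command above.

```shell
#!/bin/bash
# avi_name FILE: derive the output name by swapping the extension for .avi
# (hypothetical helper)
avi_name() {
    printf '%s.avi\n' "${1%.*}"
}

# convert_all_mpg: convert every .mpg in the current directory to .avi
# (hypothetical helper)
convert_all_mpg() {
    local f
    for f in *.mpg; do
        [ -e "$f" ] || continue     # nothing matched the pattern
        ffmpeg -i "$f" "$(avi_name "$f")"
    done
}

# Show the name mapping without running ffmpeg
avi_name "inputfilename.mpg"
```

Run convert_all_mpg from inside the directory that holds the .mpg files.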