Logical Volume Management

Logical Volume Management provides a great deal of flexibility, allowing you to dynamically re-allocate blocks of space to different file systems.  Traditional volume management relies upon strict partition boundaries separating and containerizing each mount point.  Logical Volume Management in Linux takes control of the whole drive, carving it into equal chunks called physical extents (PE).  Each PE is addressed by its logical extent (LE) address.  Disks or partitions prepared this way are physical volumes (PV), and one or more PVs are grouped into a volume group (VG), which acts as a pool of extents for management purposes.  Logical volumes (LV) are then allocated from that pool and are what you lay file systems on and mount.

Creating a Physical Volume

Use the pvcreate command to create a PV.  To create a PV on an empty disk (/dev/sda)  use this command:

pvcreate /dev/sda

You can also set up PVs on existing empty partitions.  Use fdisk to change the system ID of the partition to hex 8e (Linux LVM), then use the pvcreate command on the partition.  Any data on that partition will be lost.
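For example, here is a rough sketch of the interactive fdisk dialog on a hypothetical second disk (/dev/sdb) with one partition, followed by the pvcreate:

fdisk /dev/sdb
  t    # change a partition's system ID
  8e   # Linux LVM
  w    # write the table and exit
pvcreate /dev/sdb1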

Create a Volume Group

Once you have one or more PVs you can create a VG.  VGs can span multiple partitions or disks.  To create the initial VG called bob use the following command:

vgcreate bob /dev/sda1 /dev/sda3

If you would like to add space to bob, use the following command (the new device should first be initialized with pvcreate):

vgextend bob /dev/hda1

If you want to reduce space you can use the following command:

vgreduce bob /dev/hda1
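
At any point you can verify your work with the LVM status commands:

pvs    # list physical volumes
vgs    # list volume groups and their free space
lvs    # list logical volumes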


Creating the Logical Volume

Now that you have one or more PVs grouped together in a VG, you're ready for a logical volume and file system.  In order to create a mountable file system of the correct size, it is best to know the size of each PE: the LV size is simply the number of extents times the PE size, so with a 4 MiB PE size a 1 GiB volume needs 256 extents.  You can display this and a lot of other information about a VG with the following command:

vgdisplay bob

Then you can create an LV of the correct size with the lvcreate command:

lvcreate -l num_of_PEs bob -n name_of_LV

This will create an LV at /dev/bob/name_of_LV.  Then you can use standard disk tools on the logical volume to lay down a file system.  If you want to extend the size of the LV you use the lvextend command.  Note that -L150M would set the size to an absolute 150 MB; to add 150 MB use a leading +:

lvextend -L+150M /dev/bob/name_of_LV
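
Keep in mind that lvextend only grows the LV itself; the file system inside it has to be resized separately.  A minimal sketch of the whole flow, assuming ext4 and hypothetical LV and mount names:

lvcreate -l 250 bob -n data      # 250 extents; with 4 MiB PEs that is 1000 MiB
mkfs.ext4 /dev/bob/data          # lay down a file system
mount /dev/bob/data /mnt/data
lvextend -L+150M /dev/bob/data   # grow the LV by 150 MB
resize2fs /dev/bob/data          # grow ext4 to fill the LV (works while mounted)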

Exploring /etc/fstab

I have been working with Linux for a long time, but from time to time I come across a very simple concept that I should have already known.  I have been reading my old books and I ran across a discussion of /etc/fstab, which sparked my interest in better understanding each of its fields.

Each line in /etc/fstab describes one file system using six fields, in order: label (the device), mount point, format, mount options, dump value, and filesystem check order.


Label

On the surface this seems to be a simple concept.  The most basic example is /dev/hda1 or /dev/sda1, which would be IDE drive a partition 1 (/dev/hda1) or SCSI drive a partition 1 (/dev/sda1), but many things can be placed in this field to identify hard drives.  Modern Linux gives you a multitude of options: drive labels, device names, or UUIDs.  Labels and UUIDs provide the advantage that they won't change across a reboot the way device names can.
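
For example, the same partition could be identified in any of these three ways (the UUID here is made up):

/dev/sda1
LABEL=root
UUID=0a3407de-014b-458b-b5c1-848e92a327a3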

Mount Point

The location on the file system where you want to mount the partition.

Format

The filesystem type on the partition (ext3, ext4, swap, and so on).  It can be checked with blkid if needed.

Mount Options

There are lots of mount options for different needs.  Some common ones are defaults, ro/rw, auto/noauto, user/nouser, exec/noexec, nosuid, and noatime.

Dump Value

This field is used to denote whether the backup program dump is used to back up the file system.  A value of 0 means it's ignored, while any larger number denotes how often in days the file system should be dumped.  This value is mostly ignored now since very few people use dump.  Root should be a 1 while other file systems should be a 2, except swap, which should be a 0.

Filesystem Check Order

This is the order in which file systems are checked during boot by fsck -p.  A setting of 0 means the file system is skipped, while root should be 1 and everything else a 2 or higher.
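
Putting the six fields together, a couple of hypothetical entries might look like this:

UUID=0a3407de-014b-458b-b5c1-848e92a327a3  /      ext4  defaults          1  1
LABEL=home                                 /home  ext4  defaults,noatime  2  2
/dev/sda3                                  swap   swap  sw                0  0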

Oracle 11gR2 RAC GNS Systems Requirements


Oracle Clusterware 11gR2 has introduced a whole bunch of new features.  One of the key features is the ability to add nodes without massive manual work from DBAs.  This feature is based upon standard RAC concepts:

Each node has a virtual IP, or VIP.  A VIP is assigned to each node; if a node becomes unavailable its VIP can be reassigned to another node.

You have an additional set of virtual IPs called SCANs, which are owned by the cluster.  The SCAN addresses all resolve to the same DNS name, very similar to DNS round robin (without the round robin); Oracle recommends three SCAN addresses regardless of cluster size.  Clients connect to the cluster via the SCAN IP addresses, and if a node becomes unavailable the other nodes answer the SCAN requests.

The cluster has a VIP for GNS (Grid Naming Service), which is a DNS server for the subdomain where the cluster lives.  GNS runs on one node at a time, and if the current active node becomes unavailable the GNS VIP is moved to another node.

[Figure: GNS_DNS]

This process is dependent upon DNS, DHCP, and lots of IP addresses.  The Oracle documentation is pretty weak on the systems details of this process and assumes that you will set up a dedicated DHCP server and network for your RAC setup.  Personally I like to have full control over my DHCP setup, which has posed some unique problems.

DHCP Setup:

DHCP is a very basic protocol.  A client sends out a layer 2 broadcast with some potential options and waits for a DHCP server to respond.  Normally a DHCP client sends its MAC address as one of the options.  The DHCP server then responds with a layer 2 packet to the client identifying the client's IP address and additional options (such as router, domain name, etc.).  Since the SCANs and VIPs need to be mobile, it's impossible to provide a fixed MAC address without breaking the protocol, so the client provides a generic MAC (output from /var/log/messages):

DHCPDISCOVER from 00:00:00:00:00:00 via eth0

As you can see, the client (a VIP in this case) is identifying itself as 00:00:00:00:00:00, which is an impossible MAC address, so handing out addresses locked to specific MAC addresses is not an option.  Our friends at Oracle did provide another option: the DHCP request contains a dhcp-client-identifier which is unique to the DNS name of each requested address (node1-vip.cluster-name.domain.name, etc.).  If forgotten, this name can be recovered by opening the dhcpd.leases file on the DHCP server; the dhcp-client-identifier is listed as the uid inside the correct entry.
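
An illustrative lease entry (the address and dates are made up); the uid line is the part you are after:

lease 10.10.101.10 {
  starts 3 2012/01/18 14:00:00;
  ends 3 2012/01/18 20:00:00;
  binding state active;
  hardware ethernet 00:00:00:00:00:00;
  uid "\000node1vip";
}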

Once you have located the client identifier, you're ready to lock down your DHCP server using a class (used to define groups of clients based upon expressions).

If my client identifiers were "\000node1vip" and "\000node2vip", then my class in DHCP would look like this (defined inside the subnet area):

class "oracle-vip-class" {
    match if option dhcp-client-identifier = "\000node1vip" or
             option dhcp-client-identifier = "\000node2vip";
}

We would then limit our range and reserve it for the VIPs using the following pool statement:

pool {
    range 10.10.101.10 10.10.101.12;
    allow members of "oracle-vip-class";
}

This does spit in the face of Oracle's original intent: it now requires a DHCP administrator to add a match statement before a node can acquire a VIP or SCAN.

DNS Setup:

As shown above, you need to have your enterprise DNS servers delegate a subdomain to the GNS VIP.  This is a standard subdomain delegation without many modifications.  I found that I had to remove my forwarders line before it would work, forcing my DNS server to query the root servers instead of upstream DNS servers.
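
A minimal sketch of the delegation, assuming a hypothetical subdomain grid.example.com and a GNS VIP of 10.10.101.5, added to the parent example.com zone file:

grid.example.com.      IN NS  gns.grid.example.com.
gns.grid.example.com.  IN A   10.10.101.5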

Rescan SCSI on Linux

I constantly need to add disks to fibre channel based Linux systems.  Here are a few methods that work for me:

Locate your HBAs:


ls -al /sys/class/scsi_host/

Rescan the HBAs:

echo "- - -" > /sys/class/scsi_host/host0/scan

echo "- - -" > /sys/class/scsi_host/host1/scan

You can view /var/log/messages to see if any new LUNs (/dev/sd# devices) were added, or do an ls -altr /dev/sd*.


On VMware you can dynamically add LUNs, but you don't have any physical HBAs.  I have found the following script does a great job of rescanning the hard drives:

rescan-scsi-vmware
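
I won't reproduce the script here, but a minimal sketch of the same idea, assuming it simply asks every SCSI host the kernel knows about to rescan:

#!/bin/bash
# Rescan every SCSI host for newly added devices
# (a sketch, not the rescan-scsi-vmware script itself)
for host in /sys/class/scsi_host/host*; do
    echo "- - -" > "$host/scan"
done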