Scripting out vSwitches in VMware

Virtual switches are a fun topic in ESX. They are configured separately on each ESX node and are not shared across the cluster. ESX 4.0 addressed this problem with distributed virtual switches (DVS), which let you create a switch once in vCenter and push it to all nodes. Unfortunately DVS is available only with the Enterprise Plus license, which costs about $1000 more per processor. Those of us without DVS are forced to script out vSwitches. The process is pretty simple but has to be done in the right order from the service console:

  1. Create the vSwitch
  2. Create port groups
  3. Assign VLAN tags to port groups if required
  4. Apply security policy
  5. Link a nic to the switch
  6. Create a service console interface if required
  7. Assign IP addresses if required
  8. Enable vMotion if required
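The numbered steps above can be scripted so every node gets an identical vSwitch. The sketch below is a hedged example, not a tested production script: it assumes the ESX 4 service console commands esxcfg-vswitch and esxcfg-vmknic, uses hypothetical names (vSwitch1, VLAN455, VMotion, vmnic1, 10.0.0.21), and assumes the host-services CLI used to enable vMotion is called vmware-vim-cmd. Steps 4 and 6 are left out.

```shell
#!/bin/sh
# Sketch of the vSwitch steps above for the ESX 4 service console.
# All names (vSwitch1, VLAN455, VMotion, vmnic1, the IP) are hypothetical.
# With DRY_RUN=1 the commands are only printed, not executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_vswitch() {
    vswitch=vSwitch1
    uplink=vmnic1
    vm_pg=VLAN455
    vmk_pg=VMotion
    # Assumption: host-services CLI name (vmware-vim-cmd on ESX classic,
    # vim-cmd on ESXi). Override with VIMCMD if yours differs.
    vimcmd=${VIMCMD:-vmware-vim-cmd}

    run esxcfg-vswitch -a "$vswitch"                   # 1. create the vSwitch
    run esxcfg-vswitch -A "$vm_pg" "$vswitch"          # 2. port group for VMs
    run esxcfg-vswitch -A "$vmk_pg" "$vswitch"         # 2. port group for vMotion
    run esxcfg-vswitch -v 455 -p "$vm_pg" "$vswitch"   # 3. VLAN tag on the VM port group
    # 4. Security policy is left to the vSphere Client in this sketch.
    run esxcfg-vswitch -L "$uplink" "$vswitch"         # 5. link a physical NIC
    # 6. No extra service console interface in this example.
    run esxcfg-vmknic -a -i 10.0.0.21 -n 255.255.255.0 "$vmk_pg"   # 7. VMkernel IP
    # 8. Enable vMotion, assuming the new VMkernel NIC came up as vmk0.
    run "$vimcmd" hostsvc/vmotion/vnic_set vmk0
}
```

Run DRY_RUN=1 build_vswitch once to review the commands, then build_vswitch as root on each node. Keeping every name in one place is what makes the same script repeatable across the cluster.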


The only thing I missed was setting a default NIC order when a vSwitch has multiple NICs. For example, my vSwitch1 has two port groups and two physical NICs, and I can force each port group to use a specific NIC.

VLAN Tagging in Linux

Recently I have been reworking the networking at work. One of the new requirements is that every network connection be on a tagged VLAN. This is a pretty simple process in Red Hat Linux, and there are multiple ways to do it. Text files are my favorite way to make these changes, so let's assume that I want VLAN 455 on the NIC eth0.

  • Navigate to your network scripts: cd /etc/sysconfig/network-scripts
  • Copy your current eth0 configuration: cp ifcfg-eth0 ifcfg-eth0.455
  • Open the new file ifcfg-eth0.455 in a text editor
  • Change the device name to read DEVICE=eth0.455
  • Add the line VLAN=yes to the end of the file
  • Save and exit
  • Shut down the old interface (make sure you're on the console, since this drops your network connection): ifdown eth0
  • Bring up the new VLAN interface: ifup eth0.455
  • Delete the old interface configuration: rm ifcfg-eth0
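The copy-and-edit steps can also be done non-interactively. This is a minimal sketch assuming a Red Hat style ifcfg file with a DEVICE= line; the function name make_vlan_ifcfg is hypothetical, and the directory is an argument so you can try it on a scratch copy before touching /etc/sysconfig/network-scripts.

```shell
#!/bin/sh
# Sketch: create ifcfg-<nic>.<vlan> from an existing ifcfg-<nic>.
# Hypothetical helper; the original file is left in place.
make_vlan_ifcfg() {
    dir=$1 nic=$2 vlan=$3
    src="$dir/ifcfg-$nic"
    dst="$dir/ifcfg-$nic.$vlan"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    # Point DEVICE= at the VLAN device and drop any existing VLAN= line...
    sed -e "s/^DEVICE=.*/DEVICE=$nic.$vlan/" -e '/^VLAN=/d' "$src" > "$dst"
    # ...then append VLAN=yes at the end of the file.
    echo 'VLAN=yes' >> "$dst"
}
```

For the example above, run make_vlan_ifcfg /etc/sysconfig/network-scripts eth0 455, then ifdown eth0 and ifup eth0.455 from the console.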

That's all you have to do: your operating system will now tag all outbound traffic on eth0.455 with VLAN 455 and only accept inbound traffic tagged 455.

Mounting via labels in fstab

An issue I have run into on Solaris is the dreaded device rename during a reboot, which can wreak havoc on a system and even leave it unable to boot. Linux addresses this problem in multiple ways, and one of the most common is labels. A label lets you write a string name to the disk and mount the disk by that name. I find this particularly useful when using hot-swappable drives in my PC. To see the current label on a disk:

[root@linuxmonkey ~]# e2label /dev/sda1
/boot

So the label on disk /dev/sda1 is /boot.

To set a label, type:
[root@linuxmonkey ~]# e2label /dev/sda1 label_name

To mount by label instead of the traditional /dev/sda1, put the label in the first field of /etc/fstab:

LABEL=/boot

And you have a mount that is not affected by device renames at reboot.
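Switching an existing fstab over to labels is a one-field change per line. A minimal sketch, assuming the label has already been written with e2label; the helper name fstab_use_label is hypothetical, and it prints the result instead of editing /etc/fstab in place.

```shell
#!/bin/sh
# Sketch: print an fstab with one device replaced by LABEL=<label>.
# The input file is not modified. awk rebuilds a changed line with single
# spaces between fields; untouched lines (and comments) are printed as-is.
fstab_use_label() {
    dev=$1 label=$2 file=$3
    awk -v dev="$dev" -v lbl="$label" \
        '$1 == dev { $1 = "LABEL=" lbl } { print }' "$file"
}
```

For example, fstab_use_label /dev/sda1 /boot /etc/fstab > /tmp/fstab.new, then review the diff before copying it over /etc/fstab.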

Logical Volume Management

Logical Volume Management provides a great deal of flexibility, allowing you to dynamically reallocate blocks of space to different file systems. Traditional volume management relies on strict partition boundaries that separate and contain each mount point. Logical Volume Management in Linux takes control of whole drives or partitions, called physical volumes (PV), and pools them into volume groups (VG) for management purposes. Each volume group is carved into equal-sized chunks called physical extents (PE). Logical volumes (LV) are built from logical extents (LE), each of which maps to a physical extent, and the LVs are what you format and mount as file systems.

Creating a Physical Volume

Use the pvcreate command to create a PV. To create a PV on an empty disk (/dev/sda), use this command:

pvcreate /dev/sda

You can also set up PVs on existing empty partitions. Use fdisk to change the system ID of the partition to hex 8e (Linux LVM), then run pvcreate on the partition. Any data on that partition will be lost.

Create a Volume Group

Once you have one or more PVs you can create a VG. A VG can span multiple partitions or disks. To create an initial VG called bob, use the following command:

vgcreate bob /dev/sda1 /dev/sda3

To add space to bob, initialize the new device with pvcreate and then use the following command:

vgextend bob /dev/hda1

To reduce space, remove a PV from the VG. The PV must not hold any allocated extents (pvmove can migrate them to other PVs first):

vgreduce bob /dev/hda1

Creating the logical volume

Now that you have one or more PVs grouped together in a VG, you're ready for logical volumes and file systems. To create a mountable file system it helps to know the size of each PE, so you can define the correct size for each mount point. You can display a lot of information about a VG, including its PE size, with the following command:

vgdisplay bob
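Turning a desired size into a PE count for lvcreate -l is a rounded-up division. A small sketch under stated assumptions: both sizes are in MiB, the PE size is read by hand from the "PE Size" line of vgdisplay (4 MiB is the usual default), and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: number of PEs needed for an LV of at least want_mib MiB,
# given a PE size of pe_mib MiB. Rounds up so the LV is never too small.
pes_for_size() {
    want_mib=$1
    pe_mib=$2
    echo $(( (want_mib + pe_mib - 1) / pe_mib ))
}
```

With 4 MiB extents, pes_for_size 1000 4 gives 250, so lvcreate -l 250 bob -n name_of_LV yields a 1000 MiB volume. If you would rather skip the arithmetic, lvcreate also accepts a size with -L (for example -L 1000M) and rounds up to whole extents itself.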

Then you can create an LV of the correct size with the lvcreate command:

lvcreate -l num_of_PEs bob -n name_of_LV

This creates an LV at /dev/bob/name_of_LV. Then you can use standard disk tools on the logical volume to lay down a file system. To extend the size of the LV, use the lvextend command. For example, to add 150 MB (note the plus sign; without it, -L sets the total size instead of adding to it):

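lvextend -L+150M /dev/bob/name_of_LV

The file system on the LV does not grow by itself; grow it afterwards, for example with resize2fs for ext2/ext3.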
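Growing the LV and then its file system can be wrapped in one step. A minimal sketch assuming an ext2/ext3 file system and the standard lvextend and resize2fs tools; grow_lv and the DRY_RUN switch are hypothetical conveniences for previewing the commands.

```shell
#!/bin/sh
# Sketch: grow an LV by a relative amount, then grow the ext2/ext3 file
# system on it to fill the LV. With DRY_RUN=1 the commands are only printed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

grow_lv() {
    lv=$1        # e.g. /dev/bob/name_of_LV
    amount=$2    # e.g. 150M; the leading + makes lvextend add, not set
    run lvextend -L "+$amount" "$lv" || return 1
    run resize2fs "$lv"      # no size given: grow to fill the device
}
```

If your kernel supports online resizing the file system can stay mounted; otherwise unmount it before running resize2fs.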

Exploring /etc/fstab

I have been working with Linux for a long time, but from time to time I come across a very simple concept that I should have already known. I have been reading my old books and ran across a discussion of /etc/fstab, which sparked my interest in better understanding each of its fields.

Column | Field | Description
1 | Label | The device to mount: a device name such as /dev/sda1, a LABEL= or UUID= identifier, or a remote file system
2 | Mount Point | Location where the device in column 1 is mounted
3 | Format | File system type for the device in column 1
4 | Mount Options | Commonly defaults (rw, suid, dev, exec, auto, nouser, async); see the options below
5 | Dump Value | Whether the dump backup program should back up the file system (0 = no)
6 | File system check order | Order in which fsck checks file systems at boot: root should be 1, other local file systems 2, and swap or remote file systems 0
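A quick way to see the six fields on a real system is to read fstab one entry at a time. A minimal sketch; show_fstab is a hypothetical helper, it skips comment and blank lines, and it fills a missing dump or fsck field with 0, which is what fstab treats an absent value as.

```shell
#!/bin/sh
# Sketch: print each fstab entry with its six fields named.
# Defaults to /etc/fstab; pass another file to experiment safely.
show_fstab() {
    file=${1:-/etc/fstab}
    while read -r dev mnt fstype opts dump pass; do
        case $dev in ''|'#'*) continue ;; esac   # skip blanks and comments
        printf 'device=%s mount=%s type=%s options=%s dump=%s fsck=%s\n' \
            "$dev" "$mnt" "$fstype" "$opts" "${dump:-0}" "${pass:-0}"
    done < "$file"
}
```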


Label

On the surface this seems to be a simple concept. The most basic examples are /dev/hda1 (the first IDE hard drive, partition 1) and /dev/sda1 (the first SCSI drive, partition 1), but many things can be placed in this field to identify a drive. Modern Linux accepts device names, drive labels (LABEL=) or UUIDs (UUID=). Labels and UUIDs have the advantage of not changing across reboots.

Mount Point

The location on the file system where you want to mount the partition.

Format

The file system type on the partition, such as ext3, swap or nfs. If you are unsure, fdisk -l shows the partition type and blkid reports the file system type.

Mount Options

There are lots of mount options for different needs.

Option | Description
sync / async | All I/O to this file system is done synchronously or asynchronously
auto | Mounted at boot or when mount -a is used
noauto | Not mounted automatically at boot or with mount -a
dev / nodev | Interpret / do not interpret character or block special devices on the file system
exec / noexec | Allow or deny execution of binaries from the file system
suid / nosuid | Allow or deny the use of setuid and setgid bits
ro / rw | Mount read-only (ro) or read-write (rw)
user | Allow an ordinary user to mount the file system (implies noexec, nosuid, nodev)
nouser | Allow only root to mount the file system
defaults | Equivalent to rw, suid, dev, exec, auto, nouser, async
_netdev | The file system lives on a network device (such as NFS or iSCSI); mount it only after the network is up
atime | Record the last time a file was accessed
noatime | Do not record access times
relatime | Update the access time only if it is older than the modification or change time

Dump Value

This field tells the backup program dump whether to back up the file system. A value of 0 means the file system is skipped, while a nonzero value (normally 1) marks it for dumping. This value is mostly ignored since very few people use dump now.

Filesystem Check Order

This is the order in which fsck checks file systems during boot. A value of 0 means the file system is not checked; root should be 1 and every other checked file system 2 or higher.

Oracle 11gR2 RAC GNS System Requirements


Oracle 11gR2 Clusterware has introduced a whole bunch of new features. One of the key features is the ability for DBAs to add nodes without massive manual reconfiguration. This feature is based upon standard RAC concepts:

Each node has a virtual IP address (VIP). If a node becomes unavailable, its VIP can be reassigned to another node.

There is an additional set of virtual IPs called SCAN (Single Client Access Name) addresses, which are owned by the cluster rather than by a single node. All the SCAN addresses sit behind one DNS name, very similar to DNS round robin. Oracle uses three SCAN addresses regardless of the number of nodes. Clients connect to the cluster through the SCAN addresses, and if a node becomes unavailable the other nodes answer the SCAN requests.

The cluster also has a VIP for GNS (Grid Naming Service), a DNS server for the subdomain where the cluster lives. GNS can run on any node in the cluster; if the node currently hosting it becomes unavailable, the GNS VIP moves to another node.


This process depends on DNS, DHCP and lots of IP addresses. The Oracle documentation is pretty weak on the system details of this process and assumes that you will set up a dedicated DHCP server and network for your RAC setup. Personally I like to have full control over my DHCP setup, which has posed some unique problems.

DHCP Setup:

DHCP is a very basic protocol. A client sends out a layer 2 broadcast with some optional parameters and waits for a DHCP server to respond. Normally a DHCP client sends its MAC address as part of the request. The DHCP server then responds to the client with its IP address and additional options (such as router, domain name, etc.). Since the SCANs and VIPs need to be mobile, they cannot be tied to a single MAC address, so their requests carry a generic MAC address, which shows up in /var/log/messages on the DHCP server.

The client (a VIP in this case) identifies itself as 00:00:00:00:00:00, which is not a real MAC address, so handing out addresses locked to specific MAC addresses is not an option. Our friends at Oracle did provide another option: each DHCP request contains a dhcp-client-identifier that is unique to the DNS name of the requested address (node1-vip.cluster-name.domain.name, etc.). If you don't know the identifier, you can find it on the DHCP server in the dhcpd.leases file, where it appears as the uid inside the matching lease entry.

Once you have located the client identifiers, you're ready to lock down your DHCP server using a class (which defines a group of clients based on an expression).

If my client identifiers were "\000node1vip" and "\000node2vip", my DHCP class (defined inside the subnet area) would match on those two values.

We would then limit our range and assign it only to the VIPs.

This does spit in the face of Oracle's original intent: now a DHCP administrator has to add a match statement before a node can acquire a VIP or SCAN address.
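A sketch of the class and restricted range described above, generated by a small shell function so new identifiers are easy to add. It assumes ISC dhcpd syntax (class, match if, pool, allow members of) and that dhcpd decodes the \000 escape in a quoted string the same way it prints it in dhcpd.leases; the function name, class name, identifiers and range are all hypothetical. The class can sit at the top level of dhcpd.conf, while the pool block belongs inside the subnet declaration for the VIP network.

```shell
#!/bin/sh
# Sketch: print an ISC dhcpd class matching Oracle client identifiers,
# plus a pool that only serves members of that class.
# Usage: make_vip_class CLASS RANGE_START RANGE_END ID...
make_vip_class() {
    class=$1 start=$2 end=$3
    shift 3
    prefix='\000'   # literal text \000, as it appears in the uid values
    printf 'class "%s" {\n' "$class"
    first=1
    for id in "$@"; do
        if [ "$first" = 1 ]; then
            printf '  match if option dhcp-client-identifier = "%s%s"' "$prefix" "$id"
            first=0
        else
            printf '\n     or option dhcp-client-identifier = "%s%s"' "$prefix" "$id"
        fi
    done
    printf ';\n}\n'
    printf 'pool {\n  allow members of "%s";\n  range %s %s;\n}\n' \
        "$class" "$start" "$end"
}
```

For example, make_vip_class rac-vips 10.0.0.200 10.0.0.209 node1vip node2vip prints the class and pool; each new node then means one more identifier on the command line.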

DNS Setup:

As described above, your enterprise DNS servers need to delegate a subdomain to the GNS VIP. This is a standard subdomain delegation without many modifications. I found that I had to remove my forwarders line before it would work, forcing my DNS server to query the root servers instead of upstream DNS servers.
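The delegation itself is two records in the parent zone: an NS record for the cluster subdomain and an A (glue) record for the GNS VIP. A minimal sketch that prints them in BIND zone-file form; the function name, zone names and address are hypothetical placeholders. The GNS host name should live in the parent domain, not inside the delegated subdomain, so the glue record resolves.

```shell
#!/bin/sh
# Sketch: print BIND records delegating a subdomain to the GNS VIP.
# Add the output to the parent zone (e.g. example.com) and bump its serial.
gns_delegation() {
    subdomain=$1 gns_host=$2 gns_ip=$3
    printf '%s. IN NS %s.\n' "$subdomain" "$gns_host"
    printf '%s. IN A %s\n' "$gns_host" "$gns_ip"
}
```

For example, gns_delegation cluster.example.com gns.example.com 192.0.2.10.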

Rescan SCSI on Linux

I constantly need to add disks to Fibre Channel based Linux systems. Here are a few methods that work for me:

Find your HBAs:


ls -al /sys/class/scsi_host/

Rescan the HBAs by writing to each host's scan file:

echo "- - -" > /sys/class/scsi_host/host0/scan

echo "- - -" > /sys/class/scsi_host/host1/scan

Check /var/log/messages to see whether any new LUNs (/dev/sd* devices) were added, or run ls -altr /dev/sd*.
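With more than a couple of HBAs, a loop over every host saves typing. A minimal sketch; the "- - -" wildcard (all channels, targets and LUNs) is the same as above, while the function name and the SYSFS override (used only so it can be tried against a scratch directory) are hypothetical. Run it as root.

```shell
#!/bin/sh
# Sketch: ask every SCSI host adapter to rescan all channels, targets and
# LUNs. SYSFS defaults to /sys; override it only for testing.
rescan_all_hosts() {
    sysfs=${SYSFS:-/sys}
    for scan in "$sysfs"/class/scsi_host/host*/scan; do
        [ -e "$scan" ] || continue   # glob matched nothing
        echo "- - -" > "$scan"
        echo "rescanned ${scan%/scan}"
    done
}
```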


On VMware guests you can add LUNs dynamically, but there are no physical HBAs. I have found that the following script does a great job of rescanning the hard drives:

rescan-scsi-vmware