Partition Alignment and Block Size in VMware 5

This post spawned from a question in the official VMware forums that got me thinking about this problem.  For years I have manually adjusted the partition alignment on my Linux machines to squeeze a little extra performance out of the disk.  So the question was: with so many layers (physical array, array RAID, VMFS, guest OS), what is the best practice around alignment and block size?

VMFS 5 facts

First, let's start with some facts about VMFS version 5 (this assumes they are new LUNs created as VMFS 5, not upgraded LUNs; upgraded LUNs retain their original block size):

  • Unified 1MB file block size (only present on newly created LUNs; upgraded LUNs retain their 4.x block size)
    • Supports very small files (<1KB) by storing them in the metadata rather than in the file blocks.
    • Uses sub-blocks of 8K rather than 64K, which reduces the space used by small files.
  • The partition table has been changed from MBR to GPT, which allows volumes larger than 2TB (remember that the maximum size for a single disk presented to a guest is locked at 2TB minus 512 bytes until 5.5, which allows 62TB disks to be presented to guests)
    • An upgrade does change the partition table, but only once the LUN grows beyond 2TB
  • Increased file count – VMFS-5 allows for more than 100,000 files

You can check the current settings of a LUN (block size, etc.) via the following command:

vmkfstools -Pv 10 /vmfs/volumes/newly-created-vmfs5/

 

What is the deal with alignment?

Sectors on the disk

I will skip the legacy history lesson, but the short answer is that hard drives are divided into sectors using a method called Logical Block Addressing (LBA).  LBA assigns each sector a number, which is used to address a specific location on the disk.  The original sector size was 512 bytes, but there has been a big movement toward 4096-byte sectors.  The larger sector size provides some nice efficiency gains, but for backward compatibility most storage still accepts 512-byte reads/writes in emulation, which are actually serviced by 4096-byte sectors underneath.
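
If you want to see what a disk reports, a quick check on a Linux box (assuming the device is /dev/sda; the paths are standard sysfs entries) looks like this:

cat /sys/block/sda/queue/logical_block_size   # sector size the drive advertises (often 512 for compatibility)
cat /sys/block/sda/queue/physical_block_size  # sector size actually used internally (4096 on Advanced Format drives)
fdisk -l /dev/sda | grep "Sector size"        # fdisk prints both values in its header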

RAID on the disks

So now we have a disk cut into 4096-byte sectors and we want to give it some redundancy, so we stripe / mirror data across multiple disks.   On top of our 4096-byte sectors we put a data format carved into stripes of 4KB to 256KB, depending on the array.  Since all of those options are increments of 4KB, we will never need half a sector to do a read.  See the diagram below for the correct alignment:

[Diagram: RAID stripes lining up on 4KB sector boundaries]

Now, if your RAID set used a 13KB stripe size, it would cause all kinds of problems.  Reads and writes would cross sector boundaries and carve them up into all kinds of messes.

[Diagram: a 13KB stripe size crossing 4KB sector boundaries]

The good news is your storage vendors know this and make sure their stripe size is a multiple of 4KB to avoid the problem.  What size does the array you use pick?  That is completely up to your storage vendor.  You need to ask them.

VMFS Again

So how does VMFS handle alignment?  It's simple: it uses a 1MB alignment size.  So now we are up from 4KB on the disk, to 4KB to 256KB for the RAID stripe, to 1MB for VMFS.   There is another part to this story: partition alignment inside the guest.  I will cover that in the Guest OS section; for now just know that VMFS itself takes care of alignment.

Guest OS

Now we are really having fun.  The fine people who created your operating system needed some space at the front of every drive to hold the partition information.  The legacy MBR layout left the first 63 sectors (the MBR plus a gap) in front of the first partition, so the first partition traditionally started at sector 63, which is not on a 4KB boundary and thus creates a misalignment.  That misalignment causes the cross-boundary reads/writes we saw in the RAID section, so you need to offset your partition to start on a 4KB boundary; modern partitioning tools default to sector 2048, a 1MB offset that satisfies every layer in the stack.  Failure to do this costs you performance.  In addition, what block size should you use inside the OS?  If given the choice you do not want to go smaller than 1MB, since that is the size VMFS uses, and you want to make sure it is divisible by 4KB.  So what should you use?  That is really up to you and your operating system.  On Linux I normally use 4MB; Windows handles this on its own and aligns correctly.
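
As a rough sketch of how to check and fix this on Linux (assuming a blank disk at /dev/sdb and the stock fdisk and parted tools; adjust the device name for your system):

# A partition that starts at sector 63 is the classic misaligned layout
fdisk -lu /dev/sdb

# Create a GPT label and one partition starting at 1MiB (sector 2048),
# which lines up with 4KB sectors, common RAID stripe sizes and VMFS's 1MB blocks
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 100%

# Ask parted to confirm the alignment of partition 1
parted /dev/sdb align-check optimal 1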

What is the big deal?  How much do I really gain from all this math?

Not a whole lot on any single operation, but it is worth building your templates with these boundaries in mind.  Remember that all of this applies to physical machines as well, minus the 1MB VMFS layer.  It is a best practice, and with virtual machines it is hundreds of misaligned reads/writes contending for the same sectors, which can really add up.

Create an ISO datastore with CentOS

Morning,

This came up in a discussion in the VMware forums and I figured I would write it all down.  The user wanted to have his ISOs for VMware and Windows shared and wanted to know how to do it from VMware.  Well, it is not possible from VMware, because ESXi cannot act as an NFS server to share out VMFS.  But VMware does support NFS storage, so with CentOS (RedHat / Oracle Linux will work the same) you can create an NFS export that can also be mounted via CIFS from Windows.

I am going to assume you know how to install Linux; if not, download and click next…next…next.  Once installed, log in as root and make sure you have networking.

Install and lock down NFS

Code:

yum install nfs-utils -y

Secure the install of NFS:

Add the following to /etc/hosts.deny (this blocks all access to the NFS services):

portmap: ALL
lockd: ALL
statd: ALL
mountd: ALL
rquotad: ALL

Add the hosts that are allowed to connect to NFS to /etc/hosts.allow, separating each IP with a comma:

portmap: 10.10.101.10, 10.10.101.11
lockd: 10.10.101.10, 10.10.101.11
statd: 10.10.101.10, 10.10.101.11
mountd: 10.10.101.10, 10.10.101.11
rquotad: 10.10.101.10, 10.10.101.11

The exported file system is the file system you want to share out; we will use /nfs/ISO in this example, but it can be anything.  I would make it a separate partition, and potentially LVM, but that is out of scope.  Edit /etc/exports and add the servers you want to be able to mount /nfs/ISO.  Notice I made 10.10.101.11 read only (ro) and 10.10.101.10 read write (rw):

# Sample
/nfs/ISO 10.10.101.10(rw,no_root_squash)
/nfs/ISO 10.10.101.11(ro,no_root_squash)
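
If NFS is already running when you edit /etc/exports, you can re-read the export table without a full restart (standard nfs-utils commands):

exportfs -ra   # re-export everything in /etc/exports
exportfs -v    # show what is currently exported and with which options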

Now we need to lock NFS down to specific ports to make it more firewall friendly.   Edit /etc/sysconfig/nfs and add the following lines (comment out any existing versions of these lines first):

STATD_PORT=4000
LOCKD_TCPPORT=4001
LOCKD_UDPPORT=4001
MOUNTD_PORT=4002
RQUOTAD_PORT=4003

Add the following to /etc/services and comment out the original rquotad entries:

rquotad         4003/tcp                        # rquota
rquotad         4003/udp                        # rquota

Start NFS service and enable at boot time:

/etc/init.d/portmap start 
/etc/init.d/nfs start 
/etc/init.d/nfslock start 

chkconfig portmap on 
chkconfig nfs on 
chkconfig nfslock on
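
To confirm the daemons actually picked up the static ports from /etc/sysconfig/nfs, rpcinfo will list what is registered with the portmapper:

rpcinfo -p localhost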

Now, if you are running a host-based firewall, you will want to open it up; remember that we are controlling access via hosts.allow:

-A RH-Firewall-1-INPUT -p tcp -m multiport --dports 4000:4003 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m multiport --dports 4000:4003 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp --dport 111 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 111 -j ACCEPT

Now you need to install and set up Samba to share out the same file system via CIFS.

Let's start with the firewall rules and assume that our Windows servers are all on 192.168.10.0/24:

-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m tcp -p tcp --dport 445 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m udp -p udp --dport 445 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m udp -p udp --dport 137 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m udp -p udp --dport 138 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.10.0/24 -m state --state NEW -m tcp -p tcp --dport 139 -j ACCEPT
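
These rules (and the NFS rules above) live in /etc/sysconfig/iptables; assuming the stock CentOS iptables service, reload the firewall so they take effect:

service iptables restart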

Install Samba and the required components:

yum install samba samba-client samba-common

Turn it on at boot time

chkconfig smb on
chkconfig nmb on

Edit the file /etc/samba/smb.conf and add your information to the config file, including the workgroup.  (Yes, it is possible to join Samba to a domain; I will not cover that here.)

#======================= Global Settings =====================================
[global]
 workgroup = WORKGROUP
 security = share
 map to guest = bad user
#============================ Share Definitions ==============================
[ISO]
 path = /nfs/ISO
 browsable = yes
 writable = yes
 guest ok = yes
 read only = no
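
Before restarting anything, it is worth letting Samba check the syntax of the file; testparm ships with the samba packages:

testparm /etc/samba/smb.conf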

Restart the Samba services to reload the changed config:

sudo service smb restart
sudo service nmb restart

Browse to the machine from something in 192.168.10.0/24 and you should see the share and be able to write to it.
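
On the VMware side you can add the NFS export as a datastore from the vSphere client or from the ESXi shell.  A minimal sketch with esxcli (the server address 10.10.101.5 is a placeholder for your CentOS box, and ISO is just the datastore name I picked):

esxcli storage nfs add --host=10.10.101.5 --share=/nfs/ISO --volume-name=ISO
esxcli storage nfs list   # verify the mount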

Please let me know if you have any questions, and enjoy your ISO share (yes, it can run as a virtual machine).

NFS for virtual machines: why?

A lot of shops are using NFS for their VMware datastores.  A lot of purpose-built VMware storage arrays, like Tintri, use it too.  Why?  Well, it brings some advantages to the table.  In a lot of cases the storage vendor has been able to tweak the protocol to work faster with VMs.  I won't get into line-speed questions, but that is one to consider with 10Gb Ethernet for NFS vs. 8Gb or 16Gb Fibre Channel.

How is NFS different?

  • The file system itself is not managed by the ESXi host; ESXi just uses the protocol.
  • The burden of scaling is not placed on the storage stack but on the networking stack.  If networking fails, you lose everything.
  • Since NFS runs over Ethernet, multipathing is a complex situation that normally requires static link aggregation or dynamic LACP (802.3ad) with multi-switch aggregation.  You can also use multiple VMkernel networks on separate vSwitches and subnets, as sketched below.
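
For the multiple-VMkernel approach, a rough sketch from the ESXi 5.x shell (the vSwitch name, portgroup name, uplink and addresses are all made up for the example):

# Dedicated vSwitch and portgroup for NFS traffic
esxcli network vswitch standard add --vswitch-name=vSwitch2
esxcli network vswitch standard uplink add --uplink-name=vmnic2 --vswitch-name=vSwitch2
esxcli network vswitch standard portgroup add --portgroup-name=NFS-2 --vswitch-name=vSwitch2

# New VMkernel interface on its own subnet
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=NFS-2
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=10.10.102.10 --netmask=255.255.255.0 --type=static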

Files that make up a virtual machine in ESXi

For the longest time I wondered what exactly all those files inside a virtual machine's directory do, so here is a handy guide.

Configuration File -> VM_name.vmx
Swap File -> VM_name.vswp or vmx-VM_NAME.vswp
BIOS File -> VM_name.nvram
Log files -> vmware.log
Disk descriptor file -> VM_name.vmdk
Disk data file -> VM_name-flat.vmdk
Suspended state file -> VM_name.vmss
Snapshot data file -> VM_name.vmsd
Snapshot state file -> VM_name.vmsn
Template file -> VM_name.vmtx
Snapshot disk file -> VM_name-delta.vmdk
Raw Device map file -> VM_name-rdm.vmdk

  • .vmx – Contains all the configuration information and hardware settings for the virtual machine; it is stored in plain text.
  • .vswp – A file that is always created for a virtual machine at power on.  It is equal to the size of the allocated RAM minus any memory reservation at boot time.  This swap file is used when the physical host exhausts its memory and has to swap guest memory to disk.
  • .nvram – A binary-formatted file that contains BIOS information, much like a BIOS chip.  If deleted, it is automatically recreated when the virtual machine is powered back on.
  • .log – Log files are created when the machine is power cycled; the current log is always called vmware.log.

VMware virtual disk types

VMware supports three different types of disks at this point (5.1):

  • Eager-zeroed thick
  • Lazy-zeroed thick
  • Thin

Eager-zeroed thick:

Disk space is allocated and zeroed out at creation time.   It takes the longest time to create but provides the best possible performance at first use. Mostly used for MSCS and FT virtual machines.

Lazy-zeroed thick

Disk space is allocated but not zeroed at creation time.  The first time the operating system writes to a new block, that block is zeroed out first.   Performance is a little lower than eager-zeroed on that first write, then equal on each additional write to the same block.

Thin

Disk space is allocated and zeroed on demand, as the guest writes to new blocks.

Which one do you choose?

Well, that depends on your needs.  If performance is your critical issue, then eager-zeroed thick is the only choice.  If you need to save disk space, or doubt that your customer will really use the 24TB of disk space they requested, then thin provisioned is the choice.  Lazy zeroed sits between the two.  At this point VMware recommends lazy zeroed.

How do I switch?

As of ESXi 5 you have two choices: Storage vMotion and inflate.  When initiating a Storage vMotion you have the option to choose any of the three formats above and convert the disk.  You can also turn a thin disk into thick by finding the flat file in the datastore browser and selecting Inflate.
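
From the ESXi shell, vmkfstools can do the same conversions; a quick sketch (the datastore path and size are just examples):

# Create a new 10GB eager-zeroed thick disk
vmkfstools -c 10G -d eagerzeroedthick /vmfs/volumes/datastore1/example/example.vmdk

# Inflate an existing thin disk to its full thick size (power the VM off first)
vmkfstools --inflatedisk /vmfs/volumes/datastore1/example/example.vmdk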

SCSI controller type (chosen when the first disk on a controller is added):

Much like disk type there are many choices:

  • BusLogic Parallel
  • LSI Logic Parallel
  • LSI Logic SAS – Requires Hardware 7 or later
  • VMware Paravirtual – Requires Hardware 7 or later

Paravirtual is a virtualization-aware adapter that requires the VMware Tools driver in order to work.  Paravirtual adapters provide the best performance but can only be used with newer operating systems, and they cannot be used on boot devices.   Normally your OS selection in the new-VM wizard picks the best SCSI type for you.

SCSI Bus Sharing:

When you add a new SCSI bus you choose the SCSI type, but you also get the following sharing options (these can only be changed when the bus is added or while the VM is powered down):

  • None – Virtual disks cannot be shared
  • Virtual – Virtual disks can be shared between virtual machines on the same server
  • Physical – Virtual disks can be shared between virtual machines on any server

Of course you still need a cluster file system on top, but if you plan on sharing disks across hosts this way, select Physical.

SCSI bus location:

Each virtual machine can have up to 4 SCSI buses, each with its own controller.  Lots of people have questioned the advantage of multiple buses in VMware.  In traditional hardware you have multiple buses to provide redundancy in case of a bus failure; that does not apply to virtual hardware.  But it still gives the guest operating system multiple channels to handle I/O, which is always a good thing.

Mode:

  • Independent (not affected by snapshots)
  • Dependent (the default; included in snapshots)

Independent Mode:

  • Persistent (changes are written to disk) – great for databases and other data where a snapshot does not make sense.
  • Nonpersistent (changes to this disk are discarded when you power off or revert to a snapshot) – used on lab computers, kiosks, etc.