Quiescing Backup causing BSOD in Windows OS’s on Current VMware Tools

Evening,

I got notified of this problem earlier today thanks to my awesome BCS engineer.  You can read VMware’s week-old KB here.  Essentially, certain versions of VMware Tools can cause a BSOD when a quiescing operation is performed.  This is a big problem for API-based backups, since they use quiesced snapshots whenever possible.  There are three solutions provided by VMware at this time:

  • Disable quiescing
  • Do not select Quiescing guest file system when taking a snapshot
  • Downgrade VMware Tools to a previous version that is not affected

Good news… the latest version of VMware Tools in 6 and 5.5 is affected.  I was notified that a few specific versions were affected, but the KB does not specify them.  You can map version numbers to releases via this table: http://packages.vmware.com/tools/versions

I have been told the following are affected:

  • 8399 – 8.6.15
  • 9231 – 9.0.15
  • 9355 – 9.4.11
  • 9216 – 9.10.0

 

Here is a PowerShell snippet to locate the machines that are affected:

 # Tools version numbers reported as affected (see the list above)
 $affected = 8399, 9231, 9355, 9216
 # Filter on power state before Get-View; the view object keeps it under Runtime.PowerState instead
 $VMS = Get-VM | Where-Object { $_.PowerState -ne "PoweredOff" } | Get-View | Where-Object { $affected -contains $_.Config.Tools.ToolsVersion }
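If you also want the release name next to each flagged VM, a small lookup helps. This is only a minimal sketch: the function name Get-AffectedToolsRow is made up for illustration, the lookup table simply repeats the four numbers listed above, and the sample input is hand-built so it runs without a vCenter connection.

```powershell
# Lookup of the Tools version numbers listed above to their release names.
$affectedTools = @{
    8399 = '8.6.15'
    9231 = '9.0.15'
    9355 = '9.4.11'
    9216 = '9.10.0'
}

# Hypothetical helper: emits one report row per object whose ToolsVersion is in the lookup.
function Get-AffectedToolsRow {
    param(
        [Parameter(ValueFromPipeline = $true)] $InputObject,
        [hashtable] $Lookup = $affectedTools
    )
    process {
        $version = [int]$InputObject.ToolsVersion
        if ($Lookup.ContainsKey($version)) {
            [PSCustomObject]@{
                Name         = $InputObject.Name
                ToolsVersion = $version
                Release      = $Lookup[$version]
            }
        }
    }
}

# Hand-built sample data standing in for real inventory.
$sample = @(
    [PSCustomObject]@{ Name = 'web01'; ToolsVersion = 9355 }
    [PSCustomObject]@{ Name = 'db01';  ToolsVersion = 9999 }
)
$rows = @($sample | Get-AffectedToolsRow)
$rows | Format-Table -AutoSize
```

Against real data you could feed it something like `$VMS | Select-Object Name, @{N='ToolsVersion';E={$_.Config.Tools.ToolsVersion}} | Get-AffectedToolsRow`, assuming $VMS holds the view objects from the snippet above.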

 

I am not sure if there is a scripted way to downgrade the tools.  Here is VMware’s method to download older Tools versions.  As with all my articles, you should open a VMware ticket to get specific production assistance.  Let me know if you know a scripted way to downgrade Tools.

 

PowerCLI locate VM’s with multiwriter

Another snippet to locate VMs with multi-writer enabled:

# Create the array
$array = @()
$vms = Get-Cluster "ClusterName" | Get-VM
foreach ($vm in $vms)
{
    # Multi-writer shows up as an advanced setting (for example scsi0:1.sharing) whose value contains "multi-writer"
    $disks = Get-AdvancedSetting -Entity $vm | Where-Object { $_.Value -like "*multi-writer*" }
    foreach ($disk in $disks) {
        $REPORT = New-Object -TypeName PSObject
        $REPORT | Add-Member -Type NoteProperty -Name Name -Value $vm.Name
        # VMHost replaces the deprecated Host property
        $REPORT | Add-Member -Type NoteProperty -Name VMHost -Value $vm.VMHost
        $REPORT | Add-Member -Type NoteProperty -Name Mode -Value $disk.Name
        $REPORT | Add-Member -Type NoteProperty -Name Type -Value "MultiWriter"
        $array += $REPORT
    }
}
$array | Out-GridView

PowerCLI locate all the SCSI Bus Sharing VM’s

SCSI bus sharing is another thing that stops vMotion.  Here is a snippet to locate all the VMs in a cluster that use it:

# Create the array
$array = @()
$vms = Get-Cluster "ClusterName" | Get-VM
# Loop for BusSharingMode
foreach ($vm in $vms)
{
    # Keep only controllers that share the bus (anything other than NoSharing)
    $controllers = $vm | Get-ScsiController | Where-Object { $_.BusSharingMode -eq 'Physical' -or $_.BusSharingMode -eq 'Virtual' }
    foreach ($controller in $controllers) {
        $REPORT = New-Object -TypeName PSObject
        $REPORT | Add-Member -Type NoteProperty -Name Name -Value $vm.Name
        # VMHost replaces the deprecated Host property
        $REPORT | Add-Member -Type NoteProperty -Name VMHost -Value $vm.VMHost
        $REPORT | Add-Member -Type NoteProperty -Name Mode -Value $controller.BusSharingMode
        $REPORT | Add-Member -Type NoteProperty -Name Type -Value "BusSharing"
        $array += $REPORT
    }
}
$array | Out-GridView

PowerCLI How to locate all RDM’s in a cluster

I love RDMs; they are a royal pain for manageability until vSphere 6 (you can vMotion RDMs in 6).  Here is a snippet that will allow you to locate all RDMs in a cluster:

# Create the array
$array = @()
$vms = Get-Cluster "ClusterName" | Get-VM
foreach ($vm in $vms)
{
    # Both physical and virtual compatibility mode RDMs
    $disks = $vm | Get-HardDisk -DiskType "RawPhysical","RawVirtual"
    foreach ($disk in $disks) {
        $REPORT = New-Object -TypeName PSObject
        $REPORT | Add-Member -Type NoteProperty -Name Name -Value $vm.Name
        # VMHost replaces the deprecated Host property
        $REPORT | Add-Member -Type NoteProperty -Name VMHost -Value $vm.VMHost
        $REPORT | Add-Member -Type NoteProperty -Name Mode -Value $disk.DiskType
        $REPORT | Add-Member -Type NoteProperty -Name Type -Value "RDM"
        $array += $REPORT
    }
}
$array | Out-GridView
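
The multi-writer, bus sharing and RDM reports all build the same four-column row, so the row construction could be pulled into one helper. This is a sketch under that assumption, not how the snippets above were written: New-VMotionBlockerRow is a hypothetical name, and the sample VM object is hand-built so the code runs without a vCenter connection.

```powershell
# Hypothetical helper: builds one report row in the shape used by the three snippets above.
function New-VMotionBlockerRow {
    param(
        [Parameter(Mandatory = $true)] $VM,
        [Parameter(Mandatory = $true)] [string] $Mode,
        [Parameter(Mandatory = $true)] [string] $Type
    )
    [PSCustomObject]@{
        Name   = $VM.Name
        VMHost = $VM.VMHost
        Mode   = $Mode
        Type   = $Type
    }
}

# Hand-built stand-in for a VM object returned by Get-VM.
$fakeVm = [PSCustomObject]@{ Name = 'sql01'; VMHost = 'esx01.example.local' }

$report = @()
$report += New-VMotionBlockerRow -VM $fakeVm -Mode 'Physical'    -Type 'BusSharing'
$report += New-VMotionBlockerRow -VM $fakeVm -Mode 'RawPhysical' -Type 'RDM'
$report | Format-Table -AutoSize
```

Inside the existing loops the call would look like `New-VMotionBlockerRow -VM $vm -Mode $disk.DiskType -Type 'RDM'`, and one combined $report could then go to Out-GridView instead of three separate grids.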

What is a Server Architect?

When I started my career I wanted to work with computers.  I knew that being a programmer was not for me; I liked to play with hardware and the big picture.  So I dabbled in PC support and quickly learned that I did not like being reactive.  Some jobs are mostly reactive, for example a firefighter.  They train and prep, but most of their job is waiting for an emergency so they can react.  It’s impossible to be 100% proactive as a firefighter.  They do fire prevention and work to limit the loss of life, but in the end they are still waiting for a fire.  PC support was the same model.  You wait for someone to break something, then you fix it.  I have seen some really great PC support teams that are very proactive with training and locking down the PC, but at the end of the day you are still waiting to react.  I wanted to be more proactive, resolving problems before they require a reaction.  I went into systems administration convinced that computers will do what I tell them and that I could enforce better outcomes.  I spent a number of years focused on Linux-based servers, working to create a very well-managed solution that would allow us to not be reactive.  I felt very successful in this journey, to the point that I became bored and started looking for new challenges.  With many years left in my career, I needed the next step.  That next step seemed to be Systems Architect.

What is an Architect?

In order to define an architect we should look to people who hold the title outside computers.  Building architects are the easiest example.  A home architect takes many factors into account and produces a physical design for the builder to follow.  Some of the factors a building architect has to consider are the following:

  • Building code
  • City regulations
  • Lot size
  • Available funds
  • Customer requests and needs
  • Best practices

 

Each of these things can really be categorized into two columns:

  • Requirements – Things that must happen
  • Constraints – Things that limit or control what must happen

 

For example:

  • Building code – Constraints
  • City regulations – Constraints
  • Lot size – Constraints
  • Available funds – Constraints
  • Customer requests and needs – Requirements
  • Best practices – Things to keep in mind

 

Notice how best practices are neither requirements nor constraints.  It may be best practice to have a bathroom on the main floor, but it’s not a requirement.  In IT this is true as well.  Systems architects take information from the customer, their personal knowledge, and the constraints and form a solution.  Each solution should represent the requirements and constraints of the project.  An architect should understand building practices but does not have to be a practiced builder.  They need to understand the inner workings and requirements of each design choice.  For example, if I put down laminate flooring I need an underlay to reduce noise.  An architect should be the master of proactive administration, looking to reduce risk in a design while meeting customer needs.  Each systems architect needs a methodology to ensure they don’t miss critical steps in the process.  In systems I like to use the conceptual, logical and physical design model.  An architect does not form the perfect solution.  They form the solution that meets the customer’s needs with an eye to the elements of design.

Elements of Design

Early in my career I struggled to understand the elements of design.  What critical thinking should I use to make sure my architecture will work well?  VMware introduced me to the elements of design, which mirrored my own really well.  I use the term RAMPS to remember them:

  • Recoverability – How do you recover the design from a failure, and what are the recovery requirements?
  • Availability – How do you ensure availability of the solution, and what options do you have?
  • Manageability – Is the design too complex and costly to manage, and how do you manage it?
  • Performance – Does the design meet performance needs and take into account growth
  • Security – Does the design meet security needs and requirements

 

I would like to illustrate the elements of design with a simple scenario.  The customer wants to deploy a web server running drupal for some new brand site.   The following questions might help you figure out the requirements and constraints while ensuring the solution meets RAMPS.

Recoverability:

  • What is the expected RTO (Recovery time objective, how long to get it back into service after a full failure)
  • What is the expected RPO (Recovery point objective, how much data is ok to lose in a failure scenario)
  • How do you expect to backup the application, database and user-generated data?

Availability:

  • Are there off hours for the application?
  • How much planned downtime is acceptable for the application per month?
  • What is the cost per minute of unplanned downtime?
  • Do you have a SLA (Service level agreement or objective) with your customers?
  • Who are your customers and where will they be accessing the application from?

Manageability

  • How do you expect to make changes to the application?
  • What are the roles involved in this project (Form a RACI)
  • How often do you expect the content to change?
  • Are there any unique requirements around the application that we need to know?

Performance

  • How many concurrent users do you expect?
  • How large is the application?  Do you have any test metrics or data to show usage patterns or expectations?
  • What is reasonable response time from the application?
  • Any unique performance requirements?
  • How much network bandwidth do you expect the solution to use?

Security

  • Does your application require a login?  Where are the credentials kept?
  • What type of data is stored in your application?  Is it sensitive?
  • What is the cost of a data breach on this application?
  • Are there any security policies from the organization that should be taken into account?

 

A lot of these questions will yield no answer or unknowns.  The performance metrics are a particularly sticky question.  This is where our friends, assumptions, come to town.  When you don’t know, write down an assumption so people understand what you designed to in the absence of information.  For example, the customer may tell you they have no idea how many concurrent users will use the application.  You should make an educated guess about the number, get the customer to sign off, and move forward.

 

So what is an Architect?

So what really is an architect?  It’s someone who attempts to meld best practices with customer requirements to form a usable solution.  A systems architect has to take into account all types of things like:

  • Interconnections between logical and physical elements
  • Building space
  • Capacity
  • Logical architecture of the solution
  • Cost
  • Power
  • Best practices
  • Current practices that are constraints
  • etc…

It’s a fun job that changes each day.  If you do it correctly you should be reducing the reactive nature of your systems architecture.  You need to plan, document, study, then plan again.  It’s a detail-oriented job that requires lots of thought, but mostly lots of reworking and negotiation.

Negotiation?

Yep, in order to architect you need a customer.  When building a house everyone wants a huge house with gold walls.  You need to manage expectations to successfully complete the solution.  You have to negotiate.  The first rule of negotiation is simple: every answer is a “yes, however”.  The customer can have anything they want; as an architect you have to help them understand the impact.  Every choice has an impact, just like every action has a reaction.  If you want gold walls, the cost will be impacted.  If you want no bathroom on the first floor, expect to be an expert stair climber.  Being an architect is as much about people skills as technical skills.

Systems Architecture Conceptual, Logical and Physical

There have been a number of articles of late on systems architecture, all with the tag VCDX.  While I appreciate that using that tag on my posts gets me 40% more hits, I would like to focus on the larger concepts of systems architecture.  I am VCDX certified, but really all that does is validate my ability to design systems architecture.  I have also had the pleasure of mentoring other VCDX candidates.  This has allowed me to meet some very talented people and learn from them as much as they may learn from me.  One common challenge faced by almost all systems engineers switching to architecture is the concept of conceptual, logical and physical diagrams and design.  I also struggled with these concepts.  Every systems engineer lives in the physical world.  It’s 100% details in order to get the work complete.  Product names, commands and hardware configuration are the realm of most systems engineers.

What is the definition of each?

The best definition of each layer is the article by Zachman here.  This is referenced in every Datacenter design test by VMware and is by far the best description.  I personally like one-line answers, so let me give you mine:

  • Conceptual – Explanation of what the solution should do in terms that your non-technical significant other can understand – In my case my wife had to understand what the solution should do from this design
  • Logical – Shows the interconnections between elements without specific products – defines how they should do it – if you have product names on the logical design you might not be doing it correctly
  • Physical – Detailed description of solution with products and interactions

Why do I need conceptual?

Conceptual takes business requirements and translates them into something consumable by both the business and technical folks.  It should show what the solution should do.  No products or technical details, just business.  This can seem like a waste of time to technical folks who say “I know what it does…”  That is a failure to understand that the conceptual design is your get-out-of-jail-free card.  When combined with requirements and constraints, the conceptual design allows you to design as you please with the blessing of the business unit.  It should show all their requirements and get their sign-off before you pass to logical.

Why do I need logical?

Logical is created to start showing the interactions between components.  It is critical because it gets you thinking about everything required to complete the solution.  Do I need a database?  Do I need redundancy?  How do I create redundancy?  etc…  Get the logical on paper and it will get you thinking about the things you missed.  Why does the logical not have product names?  Because you should not choose products until you get to physical design.  You should design to meet the customer’s requirements, not the product’s limitations.  It is possible that you do have product limits due to constraints, but they should not limit the logical design too much.  Logical is about connections, communication and figuring out the scope of the problem while explaining how the solution works.

Why do I need Physical?

Because without physical it will not work.  Why can’t I turn the physical over to the implementers?  … You can but they may not benefit from the exercise you did going through conceptual and logical.

An Example to learn this process

Last weekend I attended a school event with my daughter.  They have a new STEM program that teaches critical thinking.  One of the showcase activities was a design challenge.  It was interesting to see how my daughters (8 and 5) approached design.  The design challenge was as follows:

Using the following ingredients create a vehicle that can move by the power of air:

  • 4 Life savers
  • 2 Straws
  • Paper
  • 2 Paper Clips
  • Scotch Tape

So from a design perspective we have the following:

Requirements

  • Must be a vehicle
  • Move by the power of air

Constraints

  • We have limited ingredients

Assumptions

  • The power of air means your breath
  • There is no need to carry any load on this vehicle

 

Approach on design

I was determined to not get involved in guiding.  I wanted to watch them learn from the experience so no parent interaction.

Five year old – Her approach was simple: grab the straws and start cutting the first side of her car.  She taped a lot and stuck things together.  She went straight to physical design with the raw materials.

Eight year old – Her approach was a little more thoughtful.  She gathered her ingredients and looked at them.  She held the straws in different positions and sketched them out on the paper.  She created a logical design showing how she planned on getting the sail into the design.

Five year old – She completed the car portion very quickly and was impressed with her rolling car… but had missed the requirement to have it move by air… her attempts to blow on her car proved that it would not move.  She was able to push the car and was pleased by it.  I reminded her that a requirement was that it move by air.  She had cut most of her straws and now had a true design challenge.

Eight year old – Completed her build including a sail and tested it to prove it was working.

Five year old – Cut an odd sail out of paper and taped it on the car, only to find it would not move it forward very well.

 

This little exercise proved something to me that I have seen many times in my career.  Systems architecture should be about 90% planning and 10% execution, or you spend at least double the amount of time trying to execute and re-engineer.  I know planning and documenting is boring work, but it does cut down on frustration and improve the overall product.  It will remove the simple mistakes.  I highly suggest you try a conceptual, logical and physical design model for your next project.

 

 

Disabling the new ATS heartbeat with PowerCLI

The issue with ESXi 5.5 U2 and 6 seems to be more widespread than just IBM storage.  One of the main problems with the current solution in the KB is that it requires you to log in to each host individually.  A very smart co-worker provided this script with help from VMware’s BCS team.  He has given me permission to post it to assist others.  No promise that it works perfectly, but it should help you on your path.

 

A few notes:

  • It is set to test for the presence of the setting and do nothing if already set
  • It is set to test if it’s 5.5 or higher
  • It is set to test if it’s build 2068190 or higher

 

Param(
    $VIServer = (Read-Host "Type the vCenter server's name and hit [ENTER]")
)

# Check for the VMware snap-in and load it if it is not found
if ((Get-PSSnapin -Name VMware.VimAutomation.Core -ErrorAction SilentlyContinue) -eq $null)
{ Add-PSSnapin VMware.VimAutomation.Core }

# Connect only if we are not already connected to this vCenter
if (!($DefaultVIServer.Name -eq $VIServer -and $DefaultVIServer.IsConnected -eq "True")) {
    Connect-VIServer $VIServer -Credential (Get-Credential) -ErrorAction Stop | Out-Null
}

$vmhosts = Get-Cluster | Sort-Object Name | Get-VMHost | Sort-Object Name
foreach ($vmhost in $vmhosts) {
    # Cast before comparing so versions and builds are compared as numbers, not as text
    If ([version]$vmhost.ApiVersion -ge [version]"5.5") {
        If ([int]$vmhost.Build -ge 2068190) {
            $esxcli = Get-EsxCli -VMHost $vmhost
            # A value of 0 means the ATS heartbeat is already disabled on this host
            If (($esxcli.system.settings.advanced.list($false, "/VMFS3/UseATSForHBOnVMFS5")).IntValue -eq "0") {
                Write-Host $vmhost.Name "already set to revert heartbeat" -ForegroundColor White
            }
            Else {
                Write-Host "Starting on" $vmhost.Name -ForegroundColor Yellow
                # Set the value to 0 to disable the ATS heartbeat, matching the check above
                $esxcli.system.settings.advanced.set($false, 0, "/VMFS3/UseATSForHBOnVMFS5")
                Write-Host "Finished with" $vmhost.Name -ForegroundColor Green
            }
        }
        Else {
            Write-Host $vmhost.Name "build is less than U2 (2068190), so skipping host" -ForegroundColor Red
        }
    }
    Else {
        Write-Host $vmhost.Name "ESXi version is less than 5.5, so skipping host" -ForegroundColor Red
    }
}
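
One thing to watch in version and build checks like these: PowerShell compares two strings as text, so the values are safer to compare after a cast. Here is a minimal sketch of the gating logic as a standalone function. Test-AtsHeartbeatCandidate is a hypothetical name, the thresholds are the 5.5 and 2068190 values from the notes above, and the sample inputs are made up.

```powershell
# Hypothetical helper: decides whether a host is in scope for the ATS heartbeat change.
function Test-AtsHeartbeatCandidate {
    param(
        [Parameter(Mandatory = $true)] [string] $ApiVersion,
        [Parameter(Mandatory = $true)] [string] $Build,
        [version] $MinVersion = '5.5',
        [int] $MinBuild = 2068190
    )
    # Cast to [version] and [int] so the comparison is numeric rather than textual.
    ([version]$ApiVersion -ge $MinVersion) -and ([int]$Build -ge $MinBuild)
}

# The same two values compared as text and as typed versions.
$asText  = '10.0' -ge '5.5'
$asTyped = [version]'10.0' -ge [version]'5.5'
"Text comparison: $asText  Typed comparison: $asTyped"

# Made-up sample hosts: one at the threshold, one below the minimum version.
Test-AtsHeartbeatCandidate -ApiVersion '5.5' -Build '2068190'
Test-AtsHeartbeatCandidate -ApiVersion '5.1' -Build '2323236'
```

The text comparison goes character by character, which is why a two-digit major version or a longer build number can sort the wrong way unless it is cast first.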

 

I hope it helps you. Please don’t use it unless directed by VMware support.