Death of the sysadmin and birth of….

Sysadmins

I started my career as a sysadmin. I didn’t want to spend all day sitting in a chair writing applications. I wanted to touch the hardware. I am a firm believer that every sysadmin is a control freak to some degree. They love how the machine obeys them. They enjoy telling users “no, you cannot” and figuring out ways to limit access. The essence of every good sysadmin is the innate need for improvement. In the early days of my career I was exposed to systems administrators who had hundreds of shell scripts; they had everything automated. As the years passed, these older sysadmins seemed to be replaced with younger admins who had been raised in an easy world. They were used to clicking Next to install applications and to things that just work (hence the appeal of the iPhone). I am all for simple and easy gadgets and for bringing computers into every old person’s life. But the relative ease of these solutions has made life a little too easy for us.

Cloud

Then this crazy thing happened… the cloud. Amazon brought the easy button to server deployments. Some embraced the ease of the solution, others liked the agility. Whatever your motivation for using AWS, they have changed IT again. Everywhere I go business units want to know why it takes so long to deploy a server. They want to know how to create their own AWS cloud. People everywhere have been deploying operating systems and getting IT done without systems admins.

The Auto Industry

When the auto industry first started, Henry Ford and his engineers would assemble a car from scratch. Everyone working on the car understood each component and how it worked. They understood the flow of assembly to make the car. Each of the assembly guys could build a car from scratch or design a car. As time passed, demand for the product increased and Ford had to increase his agility to create cars. He hired workers to build cars and assigned them specific roles with rote tasks. These workers would do the same task over and over again. This provided a few advantages: they got good at the task, and they did not need to know how to build a whole car. It also introduced some challenges: if they missed their task due to human error, failures were introduced. Eventually human workers were replaced with robotics. This reduced the errors and increased the cost. It also allowed Ford to build a lot more cars. Not all the jobs went away; they just changed. Workers were replaced with robotics and automation engineers. The people working on the cars had no idea how to build cars; they just kept the robotics working. Every other car manufacturer followed suit to compete. Auto manufacturing plants became huge, and downtime cost millions of dollars. Massive amounts of money are spent to ensure the plants keep running.

What lessons can we learn from the auto industry?

  • Having highly skilled humans build the cars worked great
  • Having architects design cars then hand off work instructions to workers introduced a lot of errors
  • Having automation reduces errors and requires workers with a new skill set
  • It is not required that the people keeping the automation running have an understanding of the product, they just need to understand the automation
  • The cost of automation will force a centralization of building cars
  • As manufacturing became centralized downtime became a critical issue

What does this have to do with sysadmins?

Thanks for bearing with me this far. If you’re still reading and wondering why I wrote this article, let me explain. I want to suggest that the world is changing for systems administrators, and as control freaks they don’t really like it. AWS has created a gold standard: we can deploy a system in minutes, why can’t you? If you have not faced this question, you will soon. Every shop wants to have AWS.

Every shop wants to have AWS but do they need it?

AWS has a very specific business model: deploy base templates for customers, then step away and collect cash. It’s a great model. Functionality of the virtual product beyond being powered on is 100% your problem. AWS ensures uptime of power and networking. You still have to do a lot of work to make that server deployed in minutes usable. AWS has saved you the time of procuring hardware and working with siloed teams to get a server in place, but you don’t have a money-making machine until you install your product. What is it about AWS that you really need? I suggest it is not agility; it’s less hassle. AWS provides you freedom from people who seem to create never-ending road blocks while doing their job. Yes, I am looking at you, security team. Yes, I am looking at you, server deployment team. Yes, I am looking at you…

Why is IT so hard?

IT is hard because it’s never the same. In my career I have rarely seen the same request twice. If your business is Netflix and you have three types of servers, then automation makes sense. Most IT shops are not Netflix; every single business unit wants to drive IT choices, and so we get a spaghetti mess of IT. IT is hard because the business unit wants to drive technical choices instead of business requirements. Many years ago we had a business unit demand that their new workflow be built in SharePoint, never mind the fact that we were a Linux shop with no SharePoint. So the lesson is:

  • Business units: stop messing with IT. Bring your well-defined needs to IT and let us implement them. Trust us to do our job; it’s why we cost so much.

Why do menus exist?

Restaurants have menus for the following reasons:

  • To limit customers’ options – they could not possibly have all ingredients
  • To help customers make choices – if left on their own, customers would become confused by the options and leave
  • Create standard workflows and realize cost savings
  • Give customers the illusion of choice

Why doesn’t IT have a menu? Here it comes: the ITIL service catalog. Most service catalogs are too technical and don’t represent what the customer really needs. What the customer needs is a service, which is normally a lot more complex than a single server. They have a project.

Project

Yep, that word again: project. It’s so important we have a certification and a role that manages it. Business units rarely want one more Netflix streaming server; they expect IT to handle that if needed. They want to create a whole new business, and that requires a project. Our menu really needs to be a project menu, not a server menu. We need to stop offering the business unit separate components of our offering or they will keep getting into our business. We need to provide the business unit with the correct choices that keep them away from dictating technology.

Death of a sysadmin… birth of a process engineer

So now that I have ranted for too long, what is the future of systems administration? I think we need to become process engineers. Very few people are going to understand the whole product. More will administer from an automation console rather than logging into a server. How do we re-tool for this change? I have a few suggestions:

  • Learn to examine process. Do something manually first. Document the process in extreme detail and use a process diagram. Look critically at your process diagram. Do you see how many manual processes you have? How can you automate them?
  • Help customers standardize. Learn their language and stop jumping to technical solutions with your customers. Focus on their needs and requirements and allow the technology to be a black box.
  • Develop standard methods for documenting and ingesting new projects… create a documented process and follow it.
  • Automate everything you can, and develop solutions with the automation mindset. How would I do this if I had to deploy 100 servers instead of two?
  • Ask yourself: does this process, technology or choice scale up? If I had to increase the number of these by 1,000, would this process still work?

Well thanks for reading my rant.  Let me know where I am wrong.

Do IT certifications really matter?

Twice in the last week people in IT have asked this question of me. My answer has been: it depends. When I first started my career I hated certifications. This is mostly because in college I attended a Microsoft certification course. This course was a “memorize the content, don’t worry if you don’t understand” type of test/course. It seemed pointless to me… I passed the test and still had never worked with half the stuff I was tested on. The memorized information was soon lost and nothing other than a piece of paper was gained. This tainted my view toward certifications. For many years I did not see the point and avoided them. A few years ago an employer encouraged me to get a VMware certification. They also offered to pay. So I took them up on the offer and got the VCP certification. The required course for the certification was good because it allowed a lot of time for question and answer sessions. The instructor knew the material very well. It was a good course. With a little additional study I passed the test and had another IT certification.

What did I learn?

Knowing I was going to have to take the VCP test made my course learning more meaningful. I was able to learn with intent. I realized that the certification itself might not have value, but the knowledge did. Since that time I have used certifications to motivate myself to learn.

Wait… certifications should translate into more money right?

While it is true my jobs continue to pay more as time goes along I do not believe this is because of my certifications.  I think it’s because of what I learned while doing the certifications.   Will certifications ensure more money?  Not always.   But more knowledge and skills will translate to more ability to do.

So you convinced me … what certs should I do?

Well, here is the tough one. I can tell you what certifications I see on a lot of resumes and job postings:

  • ITIL – This one is on every resume. Buy a book off Amazon and take the test… it’s not hard and people want it a lot.
  • VMware certification – Virtualization is hot, but only a few places have virtualization-only admins. VCP is normally enough. VCAP and above are not seen much on job postings. (Don’t get me wrong, I am all about geeking out with VMware certs, as shown by my VCDX, but in translation to jobs VCAP will not help you more than VCP. VCDX will, but it’s a long journey.) The most fun test on that journey is the VCAP-DCA (it’s a live test that makes you actually do things, and it’s so much fun).
  • Red Hat certification (normally RHCE) – Red Hat is still the leader in enterprise Linux and their cert is a practical test that requires that you do things, not just know them.
  • Windows certification – They are a lot better than they used to be and look great for Windows jobs.
  • PMP – If you want to get into technical project management this is the cert.
  • CCNA – If you are interested in networking start here… even if you don’t have Cisco in your shop.

 

Live Tests

My final note is a shout out to all testing systems that require you to work with a real environment like the VCAP-DCA, CCNA or RHCE.  These tests require you know how to do things and are awesome.  No pointless memorization required.  We need more IT tests like this…

2014 Top Virtualization Blogs

[Image: lime-cat]

So it’s that time of year again: time to vote for your favorite VMware blogs. This year I selfishly added myself, but there was a name snafu so it ended up as voting for my name 🙂 You can read the results on the official site http://vsphere-land.com/. I just wanted to thank the 9 people who voted for me, and the one person who voted for me as number 1 (it was not me; I voted for yellow-bricks as always). Thanks again, and I promise to add more content this year. I have a bit of a secret project, and when it’s done in about a month I will be back with lots and lots of posts around design.

Enjoy my favorite cat.

Jumbo Frames per port group

A new friend recently pointed out that jumbo frames can be enabled on a per-port-group basis, just like VLAN tagging. I never noticed that… it makes a huge difference in my designs. Of course it has to be enabled upstream on the switches. Take a look at VMware’s article on the matter here.

Lots of new posts

Yes, I have added a ton of posts in the last few minutes. This is because I just moved everything from my other blog back here. I am making a single home on the internet. Sorry to bomb your readers.

Thanks,

Joseph

Files that make up a VM in ESXi

For the longest time I wondered what exactly all those files inside a VM’s directory do, so here is a handy guide:

Configuration File -> VM_name.vmx

Swap File -> VM_name.vswp or vmx-VM_NAME.vswp

BIOS File -> VM_name.nvram

Log files -> vmware.log

Disk descriptor file -> VM_name.vmdk

Disk data file -> VM_name-flat.vmdk

Suspended state file -> VM_name.vmss

Snapshot data file -> VM_name.vmsd

Snapshot state file -> VM_name.vmsn

Template file -> VM_name.vmtx

Snapshot disk file -> VM_name-delta.vmdk

Raw Device map file -> VM_name-rdm.vmdk

.vmx – Contains all the configuration information and hardware settings for the virtual machine. It is stored in text format.

.vswp – A file that is always created for a virtual machine during power on. Its size is equal to the allocated RAM minus any memory reservation at boot time. This swap file is used when the physical host runs out of memory and other reclamation techniques (such as ballooning, which pushes the guest to use its own swap) are not enough.

.nvram – A binary formatted file that contains BIOS information, much like a BIOS chip. If deleted it is automatically recreated when the virtual machine is powered back on.

.log – Log files are created when the machine is power cycled. The current log is always called vmware.log.
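
To make a directory listing easier to read, the guide above can be turned into a small lookup. This is only a sketch: the function name `describe_vm_file` and the sample file names are made up for illustration, and it only knows the suffixes listed in this post.

```python
# Suffix-to-role table taken from the list in this post.
# More specific suffixes come first so "-flat.vmdk" wins over ".vmdk".
VM_FILE_ROLES = [
    ("-flat.vmdk", "disk data file"),
    ("-delta.vmdk", "snapshot disk file"),
    ("-rdm.vmdk", "raw device map file"),
    (".vmdk", "disk descriptor file"),
    (".vmx", "configuration file"),
    (".vswp", "swap file"),
    (".nvram", "BIOS file"),
    (".log", "log file"),
    (".vmss", "suspended state file"),
    (".vmsd", "snapshot data file"),
    (".vmsn", "snapshot state file"),
    (".vmtx", "template file"),
]


def describe_vm_file(filename):
    """Return the role of a file in a VM directory, or "unknown"."""
    lowered = filename.lower()
    for suffix, role in VM_FILE_ROLES:
        if lowered.endswith(suffix):
            return role
    return "unknown"


if __name__ == "__main__":
    # Hypothetical file names for a VM called web01.
    for name in ["web01.vmx", "web01.vmdk", "web01-flat.vmdk", "vmware.log"]:
        print(name, "->", describe_vm_file(name))
```

The order of the table matters: every disk file ends in .vmdk, so the more specific -flat, -delta and -rdm suffixes have to be checked first.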

ESXi Memory Management

ESXi uses some interesting techniques to save memory. These techniques allow for memory overcommitment and better utilization.

Transparent Page Sharing

This is a process of identifying common items in memory. For example, each operating system has a number of files it loads into memory to operate. These files are never changed but allow the system to run. Windows, for instance, has a lot of DLL files that are loaded and never changed unless Microsoft patches them. If you have two guest virtual machines with the same operating system, these files can be shared in memory, allowing for less memory allocation. For this reason it’s good to run as many of the same type of operating system / application together as possible. Creating a standard build and standard operating system allows for a huge gain with transparent page sharing. TPS is based on pages, not specific files, so the pages have to be exactly the same.
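
The page-level idea can be shown with a toy model. This is a rough sketch and not how ESXi itself is written: the 4 KB page size, the use of SHA-1 and the function names are all assumptions for illustration. It hashes every page, then compares contents byte for byte so only exactly identical pages count as shared.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; an assumed page size for this sketch


def split_pages(memory):
    """Split a bytes object into fixed-size pages."""
    return [memory[i:i + PAGE_SIZE] for i in range(0, len(memory), PAGE_SIZE)]


def count_shared_pages(guest_memories):
    """Return (total_pages, unique_pages) across all guests.

    Pages are grouped by hash first, then compared byte for byte, so two
    pages only count as shared when their contents are exactly the same.
    """
    unique = {}  # hash -> list of distinct page contents with that hash
    total = 0
    for memory in guest_memories:
        for page in split_pages(memory):
            total += 1
            digest = hashlib.sha1(page).digest()
            bucket = unique.setdefault(digest, [])
            if page not in bucket:
                bucket.append(page)
    unique_pages = sum(len(bucket) for bucket in unique.values())
    return total, unique_pages


if __name__ == "__main__":
    # Two made-up guests that share their "operating system" pages.
    os_pages = b"A" * PAGE_SIZE + b"B" * PAGE_SIZE
    guest1 = os_pages + b"C" * PAGE_SIZE
    guest2 = os_pages + b"D" * PAGE_SIZE
    total, unique = count_shared_pages([guest1, guest2])
    print("pages allocated:", total, "pages actually needed:", unique)
```

The more guests run the same build, the more pages collapse into a single copy, which is why a standard operating system image pays off.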

Memory Ballooning

Lots of people have discussed memory ballooning and many of them do a better job explaining it than I do.   In order to explain ballooning it is important to have a few common terms:

  • Host = ESXi server
  • Guest = Virtualized server running an operating system (in this example we will use Linux)
  • Reserved memory = Memory that is guaranteed by ESXi to the guest; it can never be swapped

In addition there are three types of memory:

  • Active memory – Memory actively in use
  • Idle memory – Allocated memory not currently in use
  • Free memory – memory available

Guest Swapping

Each guest has memory and swap or paging capacity. When a process requests more memory than is free, the operating system swaps out the oldest idle memory to disk. This is a very costly operation because of the speed of the disk. The guest operating system is in the best possible position to choose what should be swapped due to its knowledge of active processes. This is normal system swapping for all operating systems.

Host Swapping

VMware hosts also have a .vswp file that is created when a guest is powered on. This file is stored with the virtual machine and its size is equal to allocated memory minus reserved memory. This file is available for when memory becomes so overused that ESXi has to swap. This is the worst type of swapping: ESXi has no knowledge of which pages on the guest are active, idle or free, so it just guesses. This can really affect performance.
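
The size rule for the host swap file is simple arithmetic. A minimal sketch, with a made-up function name and made-up sample sizes:

```python
def vswp_size_mb(allocated_mb, reserved_mb=0):
    """Size of the host swap file: allocated memory minus reserved memory."""
    if reserved_mb < 0 or reserved_mb > allocated_mb:
        raise ValueError("reservation must be between 0 and the allocated memory")
    return allocated_mb - reserved_mb


if __name__ == "__main__":
    # Hypothetical guest with 8 GB allocated and a 2 GB reservation.
    print(vswp_size_mb(8192, 2048), "MB of host swap")
```

By this formula a guest whose memory is fully reserved needs no host swap space at all, which is one reason reservations matter when you size datastores.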

Remember that only TPS is in operation unless there is contention for memory.

Memory Ballooning

In order to avoid host swapping, VMware implemented the balloon driver (known as memctl in esxtop). It is included with VMware Tools. The driver works within the guest operating system and requests memory pages. Since it is a driver it has high priority and does not have to return the memory. This then forces the guest operating system to swap to its own page files. Since a guest swap is better than a host swap, this is the preferred operation. Ballooning can claim up to 65% of the guest’s allocated memory. In effect we are tricking the operating system into using less memory than it’s been allocated by having the balloon driver steal some RAM. The memory gained is then passed on to other guests, allowing for overcommitment of RAM. The problem with ballooning is that it will eat into active processes if the need for RAM is too high, thus forcing a host swap. Ballooning can only be active if you’re running VMware Tools and if your host has been up for a little while.

OK, so what steps does ESXi take, and in what order?

This only happens when there is memory contention:

  1. (State: High) If there is no contention then it does nothing.
  2. (State: Soft) ESXi starts using the balloon driver to reclaim memory from the guest OS, up to 65% of allocated memory.
  3. (State: Hard) ESXi tries to compress the memory page; if it can get 50% compression then the page goes into a special memory location known as the compression cache. (Can also be SSD.)
  4. (State: Low) If the compression cache is not possible then the memory is sent to the host swap .vswp file.
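
The four states can be written down as a lookup, which is handy if you want to annotate monitoring output. This is only a restatement of the list in this post, not ESXi code; the names `RECLAIM_ACTIONS`, `reclaim_action` and `max_balloon_mb` are made up.

```python
BALLOON_LIMIT = 0.65  # ballooning can take up to 65% of a guest's allocated memory

# What ESXi does in each state, as described in this post.
RECLAIM_ACTIONS = {
    "high": "nothing (no contention)",
    "soft": "balloon driver reclaims guest memory",
    "hard": "compress pages into the compression cache",
    "low": "swap to the host .vswp file",
}


def reclaim_action(state):
    """Return the action taken for a memory state name (case-insensitive)."""
    return RECLAIM_ACTIONS[state.lower()]


def max_balloon_mb(allocated_mb):
    """Upper bound on how much of a guest's memory the balloon can claim."""
    return allocated_mb * BALLOON_LIMIT


if __name__ == "__main__":
    print(reclaim_action("Soft"))
    print(max_balloon_mb(4096), "MB can be ballooned from a 4 GB guest")
```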

You can tell what state your host is in by using esxtop.

  • Log into esxtop
  • Press m for memory
  • Find the state at top:

In my case it’s in the high state, so no memory management is going on. The state is determined by a sliding scale configured by ESXi and stored in the advanced variable Mem.MinFreePct. Once this limit has been reached the state will change until our host is under the limit again.
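
To get a feel for the sliding scale, here is a rough calculation. The percentages below are an assumption on my part: they are the values commonly reported for ESXi 5.x (6% of the first 4 GB, 4% of the next 8 GB, 2% of the next 16 GB, 1% of the rest) and may differ in your version, so treat this as a sketch and check the documentation for your release. The function name is made up.

```python
# Assumed sliding scale (as reported for ESXi 5.x): share of memory kept
# free for each slice of host RAM. Verify against your own version.
SLIDING_SCALE = [
    (4 * 1024, 0.06),   # first 4 GB: 6%
    (8 * 1024, 0.04),   # next 8 GB (4 to 12 GB): 4%
    (16 * 1024, 0.02),  # next 16 GB (12 to 28 GB): 2%
    (None, 0.01),       # everything above 28 GB: 1%
]


def min_free_mb(host_memory_mb):
    """Memory ESXi tries to keep free, in MB, under the assumed scale."""
    remaining = host_memory_mb
    min_free = 0.0
    for slice_mb, pct in SLIDING_SCALE:
        if remaining <= 0:
            break
        used = remaining if slice_mb is None else min(remaining, slice_mb)
        min_free += used * pct
        remaining -= used
    return min_free


if __name__ == "__main__":
    # Hypothetical host with 96 GB of RAM.
    print(round(min_free_mb(96 * 1024), 2), "MB kept free")
```

The point of the scale is that a large host does not waste a flat percentage of its RAM: the more memory you have, the smaller the share held back.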

We can also use esxtop to see how much memory is ballooning, on the same page as before:

  • Log into esxtop
  • Press m for memory
  • Press V for virtual machines only
  • Find the MEMCTL/MB value to tell us how much ballooning we are doing.

You can see I am ballooning 633 MB.

How do I create an artificial situation to simulate ballooning and swapping?

Just put your VMs in a resource pool with a memory limit and choke them.

Memory Usage:

What is the difference between Consumed host memory and active guest memory?

  • Consumed host memory – The amount of host memory allocated to the guest.
  • Active guest memory – The amount of memory actively in use by the guest and its applications.

vCOps appliance unable to connect to vCloud Director

I ran into this one last week. I was trying to tie my vCOps instance into vCloud Director with the vCloud adapter. I tried using the hostname and the IP without any success. Some review of the logs for the adapter showed:

2013-08-08 14:57:50,272 ERROR [Collector worker thread 21] (171) com.integrien.adapter3.vcloud.VCloudAdapter.login - Exception occurred in login:
com.vmware.vcloud.sdk.VCloudRuntimeException: org.apache.http.NoHttpResponseException: The target server failed to respond

I just love Java errors… I will spare you all the crap that came after. The problem was simple but I had not expected it: DNS resolution was not working. I did not believe it. Why would I need DNS when I was using the IP address of my systems? Well, it’s a little complex, but our vCloud is three cells front-ended by an F5 load balancer. I was using the IP address of the load balancer, but the public URL in VCD was set to a DNS name. When you visit the IP address it redirects you (and vCOps) to https://fqdn/vcloud, and this redirection using DNS was causing the failure. So we just have to get the Linux appliances to see DNS.

  1. Log in as root on both the UI and analytics machines.
  2. Edit the file /etc/sysconfig/network/config
  3. Add the following line:
     NETCONFIG_DNS_STATIC_SERVERS="IP_ADDRESS_FOR_DNS_WITH_SPACES_FOR_MULTIPLES"
  4. Run the following command to sync to /etc/resolv.conf:
     netconfig update -f

Then it should work.  Enjoy.
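
If you have to make this change on more than one appliance, the edit to the config file can be scripted. A minimal sketch that only rewrites the text of the file: the function name and the IP addresses are made up, and you would still read and write the real file and run netconfig update -f yourself afterwards.

```python
def set_static_dns(config_text, dns_servers):
    """Return config_text with NETCONFIG_DNS_STATIC_SERVERS set.

    dns_servers is a list of IP address strings; multiple servers are
    joined with spaces. An existing setting is replaced, otherwise the
    line is appended.
    """
    key = "NETCONFIG_DNS_STATIC_SERVERS"
    new_line = '{}="{}"'.format(key, " ".join(dns_servers))
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(key + "="):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical existing config content and DNS servers.
    sample = 'NETCONFIG_DNS_POLICY="auto"\nNETCONFIG_DNS_STATIC_SERVERS=""\n'
    print(set_static_dns(sample, ["10.0.0.10", "10.0.0.11"]))
```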

What is .dvsData

So you have noticed this folder on your datastore called .dvsData. What is it?

This is a folder that stores information on a virtual machine’s connected port when it is connected to a virtual distributed switch. Inside this folder is a subdirectory named with the UUID of the distributed switch. Inside that folder are files that each represent a port binding on the vDS (which is a specially configured hidden vSwitch).

You can find the vDS’s UUID via the following command:

esxcli network vswitch dvs vmware list

This will also give you a port ID which ties to the file name:

Name: vDS-01
 VDS ID: 14 5e 0d 50 af f5 6d 3a-30 ad 00 b0 f5 d7 cc ce
 Class: etherswitch
 Num Ports: 512
 Used Ports: 4
 Configured Ports: 512
 MTU: 1500
 CDP Status: listen
 Beacon Timeout: -1
 Uplinks: vmnic1
 VMware Branded: true
 DVPort:
   Client: vmnic1
   DVPortgroup ID: dvportgroup-53
   In Use: true
   Port ID: 2

   Client:
   DVPortgroup ID: dvportgroup-53
   In Use: false
   Port ID: 3

   Client:
   DVPortgroup ID: dvportgroup-55
   In Use: false
   Port ID: 132

   Client: Win01-A.eth0
   DVPortgroup ID: dvportgroup-55
   In Use: true
   Port ID: 133

   Client:
   DVPortgroup ID: dvportgroup-55
   In Use: false
   Port ID: 135

   Client:
   DVPortgroup ID: dvportgroup-55
   In Use: false
   Port ID: 134
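
If you want to match port files to clients without reading the output by eye, the listing can be parsed. This is a sketch under a few assumptions: it handles a single vDS in the output, it assumes the subdirectory name matches the VDS ID exactly as printed, and the function names and the datastore path are made up.

```python
def parse_dvs_ports(output):
    """Parse 'esxcli network vswitch dvs vmware list' style text.

    Returns (vds_id, ports) where ports maps port ID -> client name
    (an empty string when nothing is connected).
    """
    vds_id = None
    ports = {}
    client = ""
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("VDS ID:"):
            vds_id = line.split(":", 1)[1].strip()
        elif line.startswith("Client:"):
            client = line.split(":", 1)[1].strip()
        elif line.startswith("Port ID:"):
            # Port ID closes a port entry, so store it with the last client seen.
            ports[int(line.split(":", 1)[1])] = client
            client = ""
    return vds_id, ports


def dvs_data_path(datastore_root, vds_id, port_id):
    """Build the expected path of a port file under .dvsData."""
    return "{}/.dvsData/{}/{}".format(datastore_root, vds_id, port_id)


if __name__ == "__main__":
    sample = """Name: vDS-01
 VDS ID: 14 5e 0d 50 af f5 6d 3a-30 ad 00 b0 f5 d7 cc ce
 DVPort:
 Client: Win01-A.eth0
 DVPortgroup ID: dvportgroup-55
 In Use: true
 Port ID: 133
"""
    vds_id, ports = parse_dvs_ports(sample)
    for port_id, client in ports.items():
        # "/vmfs/volumes/datastore1" is a hypothetical datastore path.
        print(client, "->", dvs_data_path("/vmfs/volumes/datastore1", vds_id, port_id))
```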

VMware command line start / stop hosts or maintenance mode

OK, it’s rare that I need to do this from the command line, but it’s possible from the vMA:

Shutdown host:

vicfg-hostops --server server_name --username user_name --password pass_word --operation shutdown

Reboot host:

vicfg-hostops --server server_name --username user_name --password pass_word --operation reboot

Enter Maintenance mode:

vicfg-hostops --server server_name --username user_name --password pass_word --operation enter

Exit Maintenance mode:

vicfg-hostops --server server_name --username user_name --password pass_word --operation exit

Info on Maintenance mode:

vicfg-hostops --server server_name --username user_name --password pass_word --operation info
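
Since all five commands differ only in the operation, a small wrapper saves retyping. A sketch only: the function names, host name and credentials are made up, it only covers the operations shown in this post, and the run step works only where vicfg-hostops is installed (for example on the vMA).

```python
import subprocess

# Operations used in this post.
HOSTOPS_OPERATIONS = ("shutdown", "reboot", "enter", "exit", "info")


def build_hostops_command(server, username, password, operation):
    """Return the vicfg-hostops argument list for one operation."""
    if operation not in HOSTOPS_OPERATIONS:
        raise ValueError("unsupported operation: {}".format(operation))
    return [
        "vicfg-hostops",
        "--server", server,
        "--username", username,
        "--password", password,
        "--operation", operation,
    ]


def run_hostops(server, username, password, operation):
    """Run the command and return its exit code."""
    return subprocess.call(build_hostops_command(server, username, password, operation))


if __name__ == "__main__":
    # Hypothetical host and credentials; this only prints the command.
    print(" ".join(build_hostops_command("esx01", "root", "pass_word", "info")))
```

Keep in mind that a password passed on the command line is visible in the process list, so this is for lab use unless you swap in a safer way to authenticate.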