Virtualisation Technology – Pie in the sky?

Sep 27th, 2011

There is much talk of Clouds and Virtualisation: how much is real, how much is hype of things to come, and how much is pie in the sky? Surprisingly, the value of these technologies is actually far greater than the buzzwords might suggest, but conversely the technology available is far less mature than we might like or expect.

What is “virtualisation”?

Essentially it’s a method of partitioning a physical server into compartments, each of which can be allocated resources and run independently of the other compartments. It’s a technology that’s been around for some time; VMWARE, for example, have presented it for many years as a mechanism for running Windows on top of Linux. You partition off some disk space and some memory, install Windows on that disk space, and then boot up a “real” copy of Windows running “on top” of a copy of Linux, in its own dedicated compartment. What we’re seeing now as virtualisation is simply an extension of that: we add a number of compartments to a server, then run a number of virtual machines next to each other, all on top of a host operating system.
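
To make the idea of compartments a little more concrete, here is a minimal sketch, assuming a KVM host managed through libvirt (the stack I come back to later), which simply lists the virtual machines running on a host and the resources allocated to each;

    # A minimal sketch: list the running "compartments" (virtual machines) on a
    # KVM host via the libvirt Python bindings, together with the resources
    # allocated to each. Assumes libvirt is managing the local hypervisor.
    import libvirt

    conn = libvirt.open("qemu:///system")        # connect to the local hypervisor
    for dom_id in conn.listDomainsID():          # IDs of the running compartments
        dom = conn.lookupByID(dom_id)
        state, max_kib, cur_kib, vcpus, cpu_ns = dom.info()
        print(f"{dom.name()}: {vcpus} vCPU(s), {cur_kib // 1024} MB allocated")
    conn.close()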

And “the Cloud”?

Is simply a fancy way of packaging virtualisation technology for sale to developers and end users. Some services provided by Google and Amazon are aimed at allowing you to run distributed / robust applications on top of a pre-existing network of interlinked servers, but when someone tells you that their server is in the cloud, it just means it’s running as a virtual server somewhere, typically as a compartment on somebody else’s physical hardware.

What are the benefits of Virtualisation?

The superficial response to this question is often “cost”, but more in-depth consideration reveals that although cost ‘can’ be a benefit, it is by no means the driving force behind the desire to switch.
Consider an organisation running six Windows 2003 servers. That’s six servers, six lots of power, six sets of hard drives, six sets of backups to manage, and six unique lumps of hardware that could each develop their own fun-to-find hardware issues. Then consider that, since they’ve had to purchase six machines, they’ve only purchased what they need in terms of performance, so these boxes, although fairly expensive servers, are probably only entry-level machines and as such will give entry-level performance.

If you stop to analyse machines already in production, you find that the average CPU utilisation of an office server is in the sub-10% bracket with only occasional 100% spikes. Indeed, disk IO is typically in the same category, with memory being the only resource that’s generally subject to consistent demand. So, this tells us that, given sufficient memory, sharing one physical server between a number of virtual Windows 2003 machines should be quite feasible.
So, if we take two thirds of our budget from the above six-server configuration and aim for a single-server solution, we’re probably looking at a dual-CPU server rather than a single, a hardware RAID controller with multiple disks instead of a single HDD, and generally higher CPU and memory bus clock speeds than the entry-level servers. Just to put this into context, you might be looking at maybe 2x the peak CPU performance and around 10x the peak disk performance, with a single HDD delivering only around 120MB/sec whereas even a modest RAID array can deliver 1GB/sec.
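
To put some rough numbers on this, here is a back-of-the-envelope calculation using only the figures quoted above (sub-10% average CPU utilisation, 120MB/sec for a single HDD, 1GB/sec for a modest RAID array);

    # Back-of-the-envelope consolidation maths using the figures quoted above.
    servers = 6
    avg_cpu_utilisation = 0.10            # sub-10% average per entry-level server
    aggregate_cpu_demand = servers * avg_cpu_utilisation   # all six combined

    host_cpu_capacity = 2.0               # dual-CPU host, roughly 2x an entry-level box
    cpu_headroom = host_cpu_capacity / aggregate_cpu_demand

    single_hdd_mb_s = 120                 # single HDD throughput
    raid_mb_s = 1000                      # modest hardware RAID array
    io_gain = raid_mb_s / single_hdd_mb_s

    print(f"Aggregate CPU demand: {aggregate_cpu_demand:.1f}x one entry-level server")
    print(f"CPU headroom on the consolidated host: {cpu_headroom:.1f}x")
    print(f"Disk throughput gain: roughly {io_gain:.0f}x")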

So we end up with one more powerful server rather than six more modest servers, and whereas there is a tendency to think that virtualised servers will perform worse than physical servers when load is applied, in this scenario the opposite is actually true. If you load up a virtual server, you’re likely to have 10x the IO of an entry-level server to play with, and probably twice the CPU power. Taking into account that servers rarely peak at the same time, and that the majority of business applications tend to be IO bound rather than CPU bound, on average applications will appear to run much more quickly when virtualised than when on a dedicated server (typically by a factor of between 2 and 10).
Although there are some notable exceptions (gaming servers for example), this is generally a fairly safe assumption with regard to virtualisation, and something that can easily be verified in advance simply by monitoring existing systems for a while.
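
As a rough sketch of what that monitoring might look like (the third-party psutil library is my own choice here, not anything prescribed above), the snippet below samples CPU and disk throughput once a minute to show how far below 100% a typical office server actually sits;

    # Sample CPU utilisation and disk throughput once a minute for an hour.
    import psutil

    samples = []
    prev = psutil.disk_io_counters()
    for _ in range(60):
        cpu = psutil.cpu_percent(interval=60)        # average CPU % over the minute
        disk = psutil.disk_io_counters()
        mb_per_s = (disk.read_bytes + disk.write_bytes
                    - prev.read_bytes - prev.write_bytes) / 60 / 1e6
        prev = disk
        samples.append(cpu)
        print(f"cpu={cpu:5.1f}%  disk={mb_per_s:6.1f} MB/s")

    print(f"Average CPU over the run: {sum(samples) / len(samples):.1f}%")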

Moving on from pure performance, having all your servers running in containers makes them very easy to manage, easy to back up, and indeed very easy to move to other physical servers in the event of some sort of hardware failure.
So in summary, virtual servers are;

  • Cheaper to implement
  • Easier to manage
  • Better performing
  • More resilient
  • Easier to back up
  • Easier to recover in the event of hardware problems

As you can see, from a business perspective there is far more to this than simply saving a few pennies.

So what about the Technology?

Well fundamentally there are a number of technologies, all of which are pretty solid. The four main options are;
  • VMWARE
  • XEN
  • KVM
  • Hyper-V

My take on these is relatively simplistic: as an Operating System, Linux will ultimately win out because it’s “good enough” and because it’s “free”. Ultimately KVM will win because it will be carried along on the Linux wave, as the only option that’s built into the Linux kernel as a standard feature. So, although the following issues affect all of the options, albeit to differing extents, I’m going to deal with them within the context of KVM rather than trying to cover all bases.

Memory

It would be nice to just have a pool of memory and let different virtual machines take and release memory as required; unfortunately this doesn’t really fit with traditional Operating System design. So although it is possible to dynamically change the amount of memory a virtual machine is using while it’s running, it’s something you might do manually on occasion rather than something which is automatic and manages itself. And of course, although Linux supports this ability, Windows does not (!) The upshot is that host machines need to have lots of memory, i.e. the total amount that’s going to be allocated to the different compartments, plus a little more for the host machine itself. In reality a host machine will use as much as it can for buffers and caching, so realistically you can never get enough memory.
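
By way of illustration, here is a minimal sketch of that manual adjustment using the libvirt balloon interface; the guest name “web01” is purely hypothetical, and it only works where the guest kernel co-operates (i.e. Linux rather than Windows);

    # Shrink a running guest's memory allocation via the libvirt balloon driver.
    # The domain name "web01" is hypothetical; the new size must stay within the
    # maximum memory the compartment was defined with.
    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("web01")

    new_size_kib = 2 * 1024 * 1024                   # balloon down to 2 GB
    dom.setMemoryFlags(new_size_kib, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    state, max_kib, cur_kib, vcpus, _ = dom.info()
    print(f"{dom.name()}: {cur_kib // 1024} MB of a {max_kib // 1024} MB maximum")
    conn.close()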

KVM does have an extension (KSM, Kernel Samepage Merging) which merges identical memory pages from different instances, and this can save 20-25% of your memory on hosts running similar clients, but even so memory is still a key issue.
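
You can see how much memory KSM is actually saving on a given host by reading the counters the kernel exposes under /sys/kernel/mm/ksm/ (the sketch below assumes a kernel with KSM enabled);

    # Report roughly how much memory KSM is saving, from the kernel's own counters.
    def ksm_counter(name):
        with open(f"/sys/kernel/mm/ksm/{name}") as f:
            return int(f.read())

    page_size = 4096                              # bytes per page on x86
    pages_sharing = ksm_counter("pages_sharing")  # duplicate pages merged away
    pages_shared = ksm_counter("pages_shared")    # unique pages they now point at

    saved_mb = pages_sharing * page_size / (1024 * 1024)
    print(f"KSM is saving roughly {saved_mb:.0f} MB across {pages_shared} shared pages")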

Persistent Storage (Hard Drives)

This is the only really weak point or technology hole when it comes to virtualisation hardware. In order to make hosts resilient, storage space must (in some way) be shared between hard drives on multiple diverse systems, otherwise you risk losing the contents of a compartment in the event of a chronic hardware failure (say multiple hard drives failing in a RAID, or a server going offline and being physically unreachable by a qualified engineer for an extended period, i.e. a few minutes).
Different cloud providers use different (unpublished) mechanisms for doing this and often quote stupidly large (99.99999999%) uptime figures; however, recent experience indicates that the big boys can’t actually keep to these numbers, so take whatever you read re: up-times with at least a teaspoon of salt, if not more.

Read here for an example;
http://www.computerworld.com/s/article/9216064/Amazon_gets_black_eye_from_cloud_outage

Then there is the choice of ‘where’ to put the storage: do you make it available via the local network over something like NFS, NBD etc., or do you put storage on each individual host? Having centralised storage is easy to work with, giving a single point for replication and backups, and a single point for hard drive failures, but then you have to access that storage from each node, which means either relatively slow access via LAN links or very expensive access via Fibre Channel type controllers.
Again you get the same points with regard to the ‘quality’ of equipment. If you’re going to put storage in each node, then you’re going to use cheaper / slower hard drives and controllers, but if you’re putting all your storage into a couple of back-end storage nodes, you can probably afford to spend a little more on your equipment.

The Hybrid Solution

What you (really) want is the convenience of centralised back-end storage, but also the speed you’d get from a locally accessible RAID array. If you spend a lot of money on a couple of servers it’s quite feasible to put an expensive RAID in each box, but this doesn’t scale well in terms of either cost or manageability.

Enter SSD caching.

One solution is to have centralised back-end storage, but to cache storage space locally on a per-front-end-node basis. This has always sounded like a good idea; it’s just a question of having a working (and reliable) implementation.

Enter Facebook!

From the most unlikely of corners, Facebook have released an open-source project called “FlashCache”, which is available here; https://github.com/facebook/flashcache/.
Essentially this allows you to install an SSD into your local node, then partition it up and allocate slices to individual KVM instances. All read and write operations can then be pushed via the SSD slice, taking your read / write latency from 8ms (on slower HDDs) or 3ms (on expensive HDDs) to under 0.1ms, which is a massive performance boost, with only cache-miss read operations having to refer to back-end storage.
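
The arithmetic behind that claim is straightforward; with the bulk of operations served from the local SSD slice, only cache misses pay the back-end price. The 95% hit rate below is an illustrative assumption, not a FlashCache figure;

    # Effective latency once most operations hit the local SSD cache.
    ssd_latency_ms = 0.1        # local SSD cache hit
    hdd_latency_ms = 8.0        # slower back-end HDD (about 3ms for expensive drives)
    hit_rate = 0.95             # assumed cache hit rate, purely illustrative

    effective_ms = hit_rate * ssd_latency_ms + (1 - hit_rate) * hdd_latency_ms
    print(f"Effective latency: {effective_ms:.2f}ms versus {hdd_latency_ms}ms uncached")
    # roughly 0.5ms against 8ms - most of the spinning-disk latency disappears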

Evolution / the next generation …

The next evolutionary stage in virtualisation technology is actually likely to be an evolution in storage technology. Money and models aside, even 10G Ethernet used for back-end storage still can’t match a relatively cheap hardware RAID controller and half a dozen reasonable disks, so what’s really needed is a cheap controller that’s capable of shifting 1GByte+/sec between servers without overloading either your CPU or your pocket. Maybe something like a 100G network card that runs a non-IP protocol capable of shifting information in 256k segments.
And the winner is ???

And by the way …

Virtualisation software, whether it be KVM or VMWARE, is now relatively mature, and the only real operational differences are visible in the management user interface. Currently VMWARE is leading the way on this front; however, keep an eye on the oVirt project here; http://www.ovirt.org/. This is sponsored and supported by many companies (almost everyone apart from VMWARE) as a means to provide KVM with the same level of user interface as is available for VMWARE products. Names such as Cisco, IBM, Intel and Red Hat make it quite a credible challenge!
