Initial Thoughts on Cloud-Init

I have a confession: I only recently discovered the awesome that is cloud-init.  I mean, I’d heard of it, but hadn’t ever really dug in to get to know it.  And I most definitely fall into the early-adopter category when it comes to technology.  As a freshman in college, I helped build a network kickstart installer, years before tools like Cobbler came on the scene.  After college, I had the good fortune to work somewhere that understood the possibilities of VMware and was running a nearly 100% virtualized infrastructure in the ESX 2.0 days.  I first discovered Puppet back in the 0.23 days.  I was using MCollective in production almost two years ago.  I wowed my developers with Vagrant in the pre-1.0 days.

But cloud-init.

Okay, I’ll give myself a little bit of a pass because I’m actually pretty new to OpenStack and AWS, but I digress.

The Problem

Now, the general problem that cloud-init solves is not a new one.  I’ve solved it many times over the years, and there are many variations of the solution floating around.  To be clear, the general problem I’m talking about is that of taking a system from the point at which it is bootable (point A) to the point at which it has been handed off to a configuration management system (point B) to realize its true purpose in life.

There are a lot of steps between point A and point B, and they vary depending on the final desired state of the system.  Generally, though, you’ll want to set the hostname, give the system new SSH keys, ensure networking is happy, maybe ensure a few users and groups exist, install a couple of useful packages, maybe update the system … you get the picture.

The Old Days

A traditional, brute-force way of approaching this problem is to simply run post-installation scripts at the end of a kickstart.  The idea is that you build a minimal system with just enough knowledge to contact the configuration management server, which it then does to finish bootstrapping itself and become useful.  Automation Nirvana, right?  This method works fine for even a fairly large, static, slow-changing population of servers that host long-running (aka “traditional” or “legacy”) workloads.

With the advent of virtualization, one suddenly had the power to spin up a new system as fast as one could copy an image and boot it.  With that new power came new problems.  For one, Kickstart kinda seemed redundant.  Rather than repeating the same steps over and over to get to a bootable system, why not just do it once and copy the image?  But if one went that route, one needed to keep the golden master as generic as possible so it would be useful in as many situations as possible.  Solutions like Puppet really only solved problems further down the pipe.  Once the system could talk to a Puppet server (or whatever configuration management server you prefer), there was more magic to behold, but there was still that nether region between a bootable image and a managed host.

The early days of virtualization were really about doing the same things more efficiently.  More to the point, at first it was just doing the same things on different “hardware”.  The golden master idea was a good solution to the system image creation problem.  For the really creative (or lazy) types, one could install a shell script to help prep a system for talking to the configuration management server.  As long as one was still in the “traditional” environment, this would work fine.  With a fairly static, slow-changing population of servers, it was not a big burden to manually plug each new VM into the configuration management system.  One would quickly run into scalability issues if one attempted to replicate that workflow on more than a couple dozen machines, though.

A Better Solution

Enter cloud-init.  Cloud-init solves this same problem: taking a completely generic golden master image, booting it, setting up things like the hostname, SSH host keys, users, groups, and Puppet keys (or Chef keys), and getting it talking to the configuration management server to finish setting itself up.  It’s a tool to take a system across that nether region from bootable to useful.
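To make that a little more concrete, here is a rough sketch of generating the kind of user-data document cloud-init consumes.  Treat it as an illustration under assumptions, not a recipe: the function name, hostnames, package list, user, and group are all made up for the example, and I’m leaning on the fact that JSON generally parses as YAML so the standard library can produce the body.  The keys (hostname, packages, users, groups, puppet) are ones I believe cloud-init's cloud-config format supports, but check the cloud-init documentation for your version before relying on them.

```python
import json


def build_user_data(hostname, puppet_server, admin_user, ssh_public_key):
    """Return a cloud-config user-data document as a string.

    All argument values are supplied by the caller; nothing here talks
    to a cloud provider. The body is emitted as JSON, on the assumption
    that the YAML parser behind cloud-init accepts it.
    """
    config = {
        # Identity for the freshly booted instance.
        "hostname": hostname,
        # Refresh the package index, then install a few basics
        # (hypothetical package choices).
        "package_update": True,
        "packages": ["git", "vim"],
        # Make sure a group and an admin user exist (hypothetical names).
        "groups": ["ops"],
        "users": [
            {
                "name": admin_user,
                "groups": "ops",
                "ssh_authorized_keys": [ssh_public_key],
            }
        ],
        # Point the Puppet agent at its server: the handoff to
        # configuration management.
        "puppet": {
            "conf": {
                "agent": {"server": puppet_server, "certname": hostname}
            }
        },
    }
    # cloud-init recognizes this format by the "#cloud-config" first line.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(
        build_user_data(
            "web01.example.com",
            "puppet.example.com",
            "deploy",
            "ssh-ed25519 AAAA-placeholder-key deploy@example.com",
        )
    )
```

The resulting string is what you would hand to your cloud as user-data when creating the instance.  In real life you would more likely just write the YAML by hand; generating it only starts to pay off when the hostname or Puppet server differs per instance.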

Cloud-init is not amazing unto itself, though.  It works in conjunction with a metadata service to make the magic happen.  This metadata service exposes a RESTful API which holds useful information about the system in question (think CPU core count, MAC address, disk size, RAM, instance id, instance type, kernel version, etc).  This metadata can be used by cloud-init to do a minimal amount of decision making to configure a system for handoff to configuration management.  Further, a cloud-init configuration is just a YAML file describing the steps to take.  No shell scripts required.  It’s kind of a proto-configuration management tool.  You can feed the booting image an arbitrary YAML file as user-data at creation time.  With this we have the power to take a generic, barebones base image and get it talking to configuration management, identified as the desired host, in an automated fashion.  Repeatably.
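If you’re curious what talking to the metadata service looks like, here is a minimal sketch using only the Python standard library.  I’m assuming the EC2-style endpoint at 169.254.169.254, which AWS provides and which OpenStack can expose in a compatible form; the helper names are my own, and the exact set of keys available depends on your cloud.  It only works from inside a running instance.

```python
import urllib.request

# Link-local address of the EC2-style metadata service (assumption:
# your cloud exposes this compatible endpoint).
METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def metadata_url(key):
    """Build the URL for a single metadata key, e.g. 'instance-id'."""
    return METADATA_BASE + key.lstrip("/")


def fetch_metadata(key, opener=urllib.request.urlopen, timeout=2):
    """Fetch one metadata value as text.

    `opener` is injectable so the function can be exercised without a
    real metadata service; by default it performs an HTTP GET.
    """
    with opener(metadata_url(key), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

On an instance, something like fetch_metadata("instance-id") or fetch_metadata("instance-type") should return a short string.  Cloud-init does this sort of lookup for you, so this is only to show that there’s no magic behind the curtain.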

You might be wondering how this is materially different from, say, Vagrant.  Superficially, there’s not a huge difference.  Yes, of course, the details of how each tool gets its job done are drastically different.  Moreover, I tend to use Vagrant for rapid prototyping and as a means to get a working system into the hands of a developer quickly.  At the end of the day, though, this is simply another tool in my toolbox.  But now I have a better way to solve an age-old problem in a lot of scenarios.  That’s powerful.

The OpenStack End User Guide has very good documentation on how to use this powerful combo within OpenStack.  The cloud-init website has a lot of great examples, too.  And finally, if you’re looking for another plain-language explanation of cloud-init, I’d highly recommend this blog post at Scale Horizontally.
