on chef cookbook re-use

Whenever someone asks me about using community cookbooks, especially when they're just starting out with Chef, I find myself increasingly giving the same two words of advice.

Be careful.

As good, humble software developers, we distrust our own expertise. When someone else has built a tool to do something we need to do, we know we should first try to use it. We've developed all these mechanisms, from object orientation to social coding on GitHub, to encourage this behaviour. So it's good that the first thing you think to do when you want to set up apache2 with Chef is to look for an existing apache2 cookbook.

Thus starts the road to madness, at least when you're starting out. That apache2 cookbook depends on an iptables cookbook. Oh, and a logrotate cookbook. And a pacman cookbook. Wait. What's pacman? Taking a look at the GitHub repo, I see it's for supporting Arch Linux. The README specifically states that it's not relevant for anyone else. I'm trying to do this on Ubuntu. But it's a hard dependency, so I need to pull it in too.

If you're not using something like Berkshelf to pull in these cookbook dependencies, you're getting a bit frustrated at this point. Maybe you look into using Berkshelf and it's great, but it's another unique philosophy you have to wrap your head around when all you wanted to do was install apache2 with Chef.

But so be it. It's not like these other cookbooks are doing anything extra. They're just pulled in in case you want to use some functionality in this apache2 recipe that needs them (say, for example, if you are running it on Arch Linux). Right?

Oh, but I've hit a few recipes that do like to do extra things. I remember going down one of these cookbook rabbit holes to install a Ruby on Rails infrastructure. One of the cookbooks it pulled in created its own slightly different sudoer group, removing the ubuntu user that's typically in that group if you fire up a VM on OpenStack. Because the root user isn't usually given SSH access, and I couldn't figure out how to get in with any of the new users it set up, I ended up having to throw away the VM. Fine. As the new philosophy goes, treat your servers like cattle, not pets.

But let's get back to this apache2 installation problem. Say you wanted to reinvent the wheel and do it yourself. You open up your empty recipe, with no dependencies, and you type in:

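# Install Apache from the distribution's package repository.
package "apache2"

# Declare the service so the template below can notify it.
service "apache2"

# Render the vhost file and restart Apache whenever it changes.
template "/etc/apache2/sites-available/your-site.com.conf" do
  source "apache.vhost.erb"
  variables(
    :site_domain => "your-site.com"
  )
  notifies :restart, "service[apache2]"
end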

Then you make the template for apache.vhost.erb:

<VirtualHost *:80>
  ServerName <%= @site_domain %>
  ServerAlias www.<%= @site_domain %>
  DocumentRoot /var/www/<%= @site_domain %>
</VirtualHost>

That's what? 14 lines of code? Granted, the vhost file is probably far simpler than what you'll want in the end. And you'll probably have to enable a few modules. And yep, this makes a lot of assumptions about the system you're installing it on.
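If you want to see what a template like this renders to before you ever run Chef, plain Ruby's erb library is enough. This is only a sketch: TemplateContext and render_vhost are names I made up for illustration, and the only thing it assumes about Chef is that template variables show up as instance variables, which is how the template above refers to @site_domain.

```ruby
require "erb"

# A vhost template like the one above, inlined so the sketch is self-contained.
VHOST_TEMPLATE = <<~ERB_SOURCE
  <VirtualHost *:80>
    ServerName <%= @site_domain %>
    ServerAlias www.<%= @site_domain %>
    DocumentRoot /var/www/<%= @site_domain %>
  </VirtualHost>
ERB_SOURCE

# Hypothetical helper (not part of Chef): exposes each variable as an
# instance variable so the template can read it as @name.
class TemplateContext
  def initialize(variables)
    variables.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # Evaluate the ERB source with this object's instance variables in scope.
  def render(source)
    ERB.new(source).result(binding)
  end
end

# Hypothetical convenience wrapper for this one template.
def render_vhost(site_domain)
  TemplateContext.new(site_domain: site_domain).render(VHOST_TEMPLATE)
end

puts render_vhost("your-site.com")
```

Eyeballing the output this way catches template typos long before they turn into an Apache that refuses to start.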

But now you get to do exactly what you want in the vhost file, install exactly the modules you want, and you don't have to install four other cookbooks to do it. You don't have to accept the configuration settings that the original cookbook authors thought were good but don't work for what you want to do. You don't have to find a way to work around them if you absolutely can't fit your requirements to their system.
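Enabling a module and the site itself can stay just as small. Here's a rough sketch that would sit in the same recipe, after the service declaration. It assumes a Debian or Ubuntu style Apache layout with the a2enmod and a2ensite helpers, and it picks the rewrite module purely as an example; adjust the paths if your system differs.

```ruby
# Enable mod_rewrite unless it is already enabled (example module only).
execute "a2enmod rewrite" do
  not_if { ::File.exist?("/etc/apache2/mods-enabled/rewrite.load") }
  notifies :restart, "service[apache2]"
end

# Enable the vhost unless it is already linked into sites-enabled.
execute "a2ensite your-site.com" do
  not_if { ::File.exist?("/etc/apache2/sites-enabled/your-site.com.conf") }
  notifies :restart, "service[apache2]"
end
```

The not_if guards keep the recipe from re-running the commands, and restarting Apache, on every Chef run.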

The thing is, Chef actually does a lot to make automating your infrastructure really, really easy. At the first ChefConf, my mind was blown as Adam Jacob argued amongst a series of f-bombs for "primitives over ontologies". Consulting for all those companies pre-Chef, they had learned the hard way not to try to make a grand one-size-fits-all system. That's why they made Chef the way they did!

So, community cookbooks are terrible and you shouldn't ever use them?

NO! That's not what I'm saying. Don't hit the send button on that nasty email rant just yet, please.

I am saying: be careful. Because coding your infrastructure is such a great thing to do. I don't want you to be turned off of that because you burned out trying to make a community cookbook work for you. When someone tells me they tried Chef and it didn't work for them, it often comes down to a community cookbook not working for them.

So don't let a bad cookbook experience stop you from automating your infrastructure. Whether you use Chef or Puppet or Ansible or whatever to do it, it's still better than a bash script. And bash scripts are better than nothing. Please do something other than doing everything by hand and creating reams of documentation that'll become crusty and old and scary within a couple of months.

By all means, try the community cookbook for whatever it is you want to install. But don't try to force it. If you find yourself shaking your fist at the terminal every 5 minutes, stop, take a step back, and ask yourself if you really need the community cookbook for this particular task. Ask whether it's really so evil if you just learn from it (this I highly recommend) and take what you need (properly attributed, of course). If it's between doing that and saying, "Screw this infrastructure automation stuff. I'm doing it by hand," choose the lesser evil.

You may find that later on, you understand a lot more of what that community cookbook was trying to do and decide you want to move towards using it. Is there anything wrong with that? Maybe you'll even be in a better position to make that decision. In the meantime, you'll have increased your knowledge of Chef and infrastructure automation in general, you'll have more hair still on your head, and you'll have a working, automated infrastructure.

DRY is a great principle, and it's led to a lot of great design. Chef itself is an example of that. But you can't be religious about it. Imagine if you took it to its logical extreme: "Gee... I seem to be writing a lot of for loops. Maybe I should look for a looping library." This sounds ridiculous, but fledgling Rails developers will bring in gems without a second thought to do something they could do in a line of Ruby code. Yes, there's an expense associated with reinventing the wheel. You won't get the improvements those gem authors make if you write that line yourself. But there's also a cost to managing the dependency: when they do something in the next version that makes total sense to them and yet breaks an assumption you made about your problem, you have to scramble to figure out how to patch things up. Write the line yourself and you won't.

Try those community cookbooks, but set yourself some limits to the amount of effort you're willing to expend to make one of them work. Don't be afraid to break away from them if you need to.

Okay, glad to have that out of the way. Now that you've got some healthy pragmatism flowing through your veins, you should really be keeping close track of new developments in Chef and other automation frameworks that make it easier to reuse code so you don't always have to reinvent the wheel.

WHAT?!?!?

I know. I know. I'm sorry. But I don't want to suggest that you should take writing your own cookbooks to the extreme either...

The truth is that people in the community are learning all of these lessons too. They're constantly getting better at designing cookbooks that can be used by others. And Chef is constantly getting better at supporting re-use of cookbooks.

For example, in Chef 10, it was really hard to use composed attributes, where you would define something like default[:a] = node[:b] + node[:c] and then be able to define b and c at any point in the attribute chain. I ended up writing a huge hack in a recipe chain just to sort of make this work. But in Chef 11, that sort of composition just works. And that makes it a lot easier to (a) write cookbooks that are easier to re-use and (b) wrap a community cookbook so it integrates more easily into your infrastructure.
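To make that concrete, a composed attribute in an attributes file might look something like this. The mysite cookbook and its attribute names are invented for illustration; the point is just that the last value is built from the two before it, so whatever ends up setting domain or web_root also determines docroot.

```ruby
# attributes/default.rb in a hypothetical "mysite" cookbook
default[:mysite][:domain]   = "your-site.com"
default[:mysite][:web_root] = "/var/www"

# Composed attribute: derived from the two values above.
default[:mysite][:docroot] = "#{node[:mysite][:web_root]}/#{node[:mysite][:domain]}"
```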

A lot of these ideas are really new. We've had a long time to figure out a bunch of great ways to share regular code while allowing for variation. We're just not quite at that point with infrastructure as code. But it's getting better and better every day, so it pays to be a bit skeptical of the longevity of any current best practices, allowing them to change as the tools get better.