How to prevent database contention in continuous integration

We’ve used a few different continuous integration stacks for Rails over the last year at work—first CruiseControl.rb, which we found a little too complex to administer, then a custom bash script (which worked well, but took a lot of tweaking to get just right). When we eventually switched to git last year, we took the opportunity to try Integrity, a cute lil’ Sinatra app.

Integrity mostly “just works,” and it’s been a happy switch. One thing we lost in the move, though, was code that protected against resource contention when two builds are running at once. This is definitely a problem with Rails, since a typical database.yml tells Active Record to use the same database for all test runs. So you’ve got multiple builds hitting the database at once, dropping tables, creating records, and so on. Yikes.

Our old bash script used a filesystem lock and a queue to run only one build at a time, in order. In theory, this is the soundest approach, but hey, our build server has 8 cores and 16 GB of RAM: plenty of room for parallelism. During a pair-programming session this week with Jared Grippe, we decided that the best approach is to solve the contention issues and allow multiple simultaneous builds. We figured that’d keep feedback fast when the commits are flying in. Here’s what we settled on:

  1. Stop putting code in that little “build script” box in Integrity’s configuration page. Instead, drop it in a rake task, so it’s versioned and kept safe.
  2. In your build script, set a unique database name for the current build, and use ERB in the server’s database.yml to interpolate it in.
  3. Have the build script run rake db:create before the tests and rake db:drop afterward, so that your databases are created and cleaned up automatically.
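
To make step 2 a little more concrete, here is a minimal sketch of the interpolation on its own. The template is a hypothetical stand-in for config/database.yml (the adapter and username are made up), and database_config is an illustrative helper, not something Rails provides. It renders the ERB first and then parses the YAML, which is the same order Rails uses when it reads the file.

```ruby
require "erb"
require "yaml"

# Hypothetical stand-in for config/database.yml; only the test entry is shown,
# and the adapter/username values are assumptions for illustration.
DATABASE_YML = <<~YAML
  test:
    adapter: mysql
    database: <%= ENV["DB_NAME"] %>
    username: root
YAML

# Illustrative helper: render the ERB, parse the resulting YAML, and return
# the settings for one environment.
def database_config(template, env = "test")
  YAML.load(ERB.new(template).result)[env]
end

ENV["DB_NAME"] = "myapp_test_#{Process.pid}"
puts database_config(DATABASE_YML)["database"]
```

Because the name comes from the environment, two builds running side by side each see their own database, as long as each sets DB_NAME before Rails loads.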

Here’s an example script:

desc "Run continuous integration suite"
task :build do
  ENV["RAILS_ENV"] = RAILS_ENV = "test"

  # use ERB in config/database.yml to make this the database name:
  # database: <%= ENV["DB_NAME"] %>
  now = Time.now.utc
  identifier = "#{Process.pid}#{now.to_i}#{now.usec}"
  ENV["DB_NAME"] = "myapp_test_#{identifier}"

  begin
    Rake::Task["db:create"].invoke
    Rake::Task["db:test:load"].invoke
    Rake::Task["default"].invoke
  ensure
    Rake::Task["db:drop"].invoke
  end
end

Appending the process ID and the time in microseconds to the database name is about as unique as you can get without generating a UUID or something. Note how we wrap the build in a begin block and run db:drop in the ensure section: that way, the database is removed even if the build fails (which would normally abort your rake task).
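
If you would rather lean on random data than on the PID and clock, something like the following could work instead. This is only a sketch: unique_database_name is a hypothetical helper, and it uses hex digits rather than a full UUID on the assumption that a short name without hyphens is friendlier to most databases.

```ruby
require "securerandom"

# Hypothetical helper: builds a database name with a random 16-character
# hex suffix instead of the PID/timestamp identifier.
def unique_database_name(prefix = "myapp_test")
  "#{prefix}_#{SecureRandom.hex(8)}"
end

ENV["DB_NAME"] = unique_database_name
```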

Keep in mind that the database might not be the only shared resource used by your build—watch out for filesystem use, in particular. You can probably use a similar strategy to solve that problem.
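
For the filesystem, one possible version of that strategy is a scratch directory per build. The sketch below uses Ruby's Dir.mktmpdir, which removes the directory when its block finishes. The helper name, the prefix, and the BUILD_TMP_DIR variable are all assumptions; your app or tests would need to be taught to read that variable.

```ruby
require "tmpdir"

# Hypothetical helper: gives each build its own scratch directory and removes
# it afterwards, mirroring the db:create / db:drop pairing in the rake task.
def with_build_directory(prefix = "myapp_build_")
  Dir.mktmpdir(prefix) do |dir|
    ENV["BUILD_TMP_DIR"] = dir # assumed variable name, not a Rails convention
    yield dir
  end
ensure
  # Clear the variable whether the build passed or failed.
  ENV.delete("BUILD_TMP_DIR")
end

with_build_directory do |dir|
  File.write(File.join(dir, "upload.txt"), "fixture data")
end
```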

Testing your dependencies with RSpec

I’m finding that managing my projects’ code dependencies is smelling worse and worse as time goes on. Code bases get bigger and acquire libraries as they grow; a part of your project sits untouched for a few months and its particulars leave your medium-term memory, and so on.

In Rails, we can freeze lots of stuff to our vendor directories. I do that as much as possible—gems that I only use for Rails apps get frozen to vendor/gems and then uninstalled system-wide; I use the gemsonrails plugin for this. If the little gem bits aren’t necessary, you can just pistonize a repository. Old news.

That’s not going to fly for platform-compiled gems, or even compiled libraries that aren’t gems at all (since you’re possibly running several different platforms between development and production). So I’ve been cooking up ways to keep myself sane:

The Simplest Thing That Could Possibly Work, I think, is just a quick test failure when a dependency is missing. If you’re already autotesting locally, and automatically running your test suite on each production machine as part of your deployment recipe, a quick, obvious exception could save you a little misery.

What I mean by “obvious”

This all came about because I went through two development platform switches recently: first, a clean install of Leopard, and just last week, a move to Intel from my old PowerBook. Both of those hosed my gems, and although I got test failures for each “broken” part of the app, certain libraries’ lazy/quiet-loading techniques don’t raise exceptions in a way that’s obvious.

For example, Rick Olson’s fantastic attachment_fu plugin is meant to work just fine for non-image files, so if you don’t have a compatible image processing library installed, it’ll just skip the thumbnailing for images and move right along. So my image-uploading tests failed because they didn’t create the right number of files and records. It took me way too long to figure out what was going on, so I think it’d be better if I were checking for known dependencies directly.

First try

describe User do

  it "depends on one of three image processing libraries" do
    processors = %w(image_science RMagick mini_magick)
    lambda {
      begin
        require processors.shift
      rescue LoadError, MissingSourceFile => e
        retry if processors.any? or raise e, "Make sure an image processing library is available"
      end
    }.should_not raise_error
  end

end

Pretty good, although so much space between it and end makes me sad. Also, attachment_fu’s requirements are kind of an edge case; I want to be able to spec a requirement for only one library, or several all at once.

Less sadness with matchers

Read up: if you aren’t using matchers, you aren’t using RSpec.

describe User do
  it "depends on an image processing library for attachment_fu" do
    one_of(:image_science, :RMagick, :mini_magick).should be_loadable
  end

  it "depends on SHA libraries for password hashing" do
    both_of('digest/sha1', 'digest/sha2').should be_loadable
  end
end

describe Event do
  it "depends on chronic for date/time string processing" do
    :chronic.should be_loadable
  end
end

describe Post do
  it "depends on a text processing library for Markdown support" do
    either_of(:maruku, :RedCloth).should be_loadable
  end

  it "depends on some XML libraries" do
    all_of(:hpricot, :builder, :haml).should be_loadable
  end
end

The matcher I wrote to do this is a little beefy, around 60 lines. To check it out, you can grab it from svn (or in the <3 warehouse), or from pastie.
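
For a rough idea of the shape, here is a much smaller sketch. To be clear, this is not the matcher linked above: LibrarySet, BeLoadable, and the helper definitions are my own hypothetical reconstruction. It follows RSpec's matcher protocol (matches? plus a failure message) as a plain Ruby class, so it runs without RSpec installed.

```ruby
# Hypothetical sketch, not the original 60-line matcher.
LibrarySet = Struct.new(:mode, :names)

# one_of / either_of: at least one library must load.
def one_of(*names)
  LibrarySet.new(:any, names)
end
alias either_of one_of

# all_of / both_of: every library must load.
def all_of(*names)
  LibrarySet.new(:all, names)
end
alias both_of all_of

class BeLoadable
  def matches?(target)
    # A bare symbol or string is treated as a set containing one library.
    @set = target.is_a?(LibrarySet) ? target : LibrarySet.new(:all, [target])
    names = @set.names.map(&:to_s)
    if @set.mode == :any
      @missing = names # only reported when none of them load
      names.any? { |name| loadable?(name) }
    else
      @missing = names.reject { |name| loadable?(name) }
      @missing.empty?
    end
  end

  def failure_message
    "expected #{@set.mode == :any ? 'one of' : 'all of'} " \
      "#{@set.names.join(', ')} to be loadable, but could not load " \
      "#{@missing.join(', ')}"
  end
  alias failure_message_for_should failure_message

  private

  # Try to require the library and report whether that worked.
  def loadable?(name)
    require name
    true
  rescue LoadError
    false
  end
end

def be_loadable
  BeLoadable.new
end
```

Note that the check really does require each library, so a passing spec leaves it loaded in the test process.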

I’m now using this all over the place, and it’s saved me at least a couple headaches. It’s really helpful for making sure your CI and deployment environments are up to spec, as well.

I’m sure there’s more to do—like checking gem versions. How are you checking your dependencies from platform to platform?
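
As one possible starting point for the version question, RubyGems already knows how to compare versions against requirement strings. The two helpers below are hypothetical names of mine. The second one only sees gems that have already been activated in the current process (via require or the gem method), which is an assumption that fits a spec run after the app has loaded.

```ruby
require "rubygems"

# Hypothetical helper: does a version string satisfy a requirement string?
def version_satisfies?(version, requirement)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# Hypothetical helper: checks a gem that is already activated in this process.
# Returns false when the gem is not loaded at all.
def gem_version_ok?(name, requirement)
  spec = Gem.loaded_specs[name]
  !spec.nil? && version_satisfies?(spec.version.to_s, requirement)
end

puts version_satisfies?("1.2.3", "~> 1.2")
```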

Update on Nov. 30, 2008: well, this was a useful experiment in writing matchers, but these days I’m just using Rails’ built-in config.gem tool. Highly recommended.