Giant Killing with Beanstalkd


If you have ever dabbled in Service Oriented Architecture (SOA), or even read some interesting articles about it, you have probably come across the term “Message Queue”. The really terse explanation of a Message Queue, or MQ, is that it allows services within your architecture to adopt a “fire and forget” approach to interacting with other services. By placing a queue in the system, non-time-sensitive operations may be carried out at the leisure of the services that care about them, regardless of technology or programming language. As an example, let’s take a “send to a friend” feature within a Job Board application. Once the user has completed the form and clicked “Send”, do we really want the nitty-gritty of sending an email to a friend to live in our Job Board application?

Background Jobs

A common approach to this problem is to use background workers like Resque or Sidekiq. For the problem at hand, these are fine and arguably more suitable. The problems I have with that approach are:
  1. The logic of sending email lives in our application that does not necessarily care about email.
  2. I will probably duplicate the process of communicating to my SMTP server through a few applications within the architecture.
  3. Background workers know a little too much about their origin, i.e. which models they came from and what they can access (my whole app stack).
If your architecture is growing, it may be worth considering moving some background workers to an MQ. For me, MQs just work. You drop in some data, and a daemon or application that cares about that message picks it up some time later and acts on it. Meanwhile, the originator has carried on focusing on its core business. As the architecture grows and you add more services, some of these services may need to send email as well. At that point, you have established a clear, trusted method of sending email. You simply drop some data in the email queue and it will get sent.

Beanstalkd

Hopefully now you are getting the gist of why MQs are awesome. There are a few open source MQs available; the most notable are RabbitMQ (there is a nice article on RubySource with details) and my personal favourite, and what we will be using today, Beanstalkd. Getting started with Beanstalkd really couldn’t be simpler. On OS X you can use Homebrew (brew install beanstalkd), and on a Debian-flavoured Linux you can use sudo apt-get install beanstalkd. It seems pretty well supported by most package managers across platforms; you can see the details in the Beanstalkd download docs. Once installed, you can open the terminal and execute beanstalkd. This will start up a Beanstalkd instance on localhost in the foreground, using its default port 11300. It is not always ideal to run it in the foreground, so my typical command looks something like:
beanstalkd -b ~/beanstore &
This simply persists the queue data in a binlog under the directory ~/beanstore instead of keeping it only in memory, and runs the process in the background (the ampersand). For development, these settings are fine. When it comes to production, I would suggest you have a read of the docs pertaining to the admin tool that ships with Beanstalkd.

Beanstalkd Lingo

Beanstalkd has some nice vocabulary for describing the main players and operations. Let’s walk through them.

Tubes

A tube is a namespace for your messages. A Beanstalkd instance can have multiple tubes. On a vanilla boot, Beanstalkd will have a single tube named default. The idea is that a given process listens for messages arriving on a specific tube. As mentioned, tubes just act as namespaces for the consumers of the queue.
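To make the namespace idea concrete, here is a small sketch using the Beaneater Ruby client we will meet shortly; the tube names are made up purely for illustration:

require 'beaneater'

beanstalk = Beaneater::Pool.new(['localhost:11300'])

# Producers drop jobs on whichever tube (namespace) is relevant to them.
beanstalk.tubes['emails'].put 'send the welcome email'
beanstalk.tubes['thumbnails'].put 'resize image 42'

# A consumer that only cares about email watches just that tube.
beanstalk.tubes.watch!('emails')
job = beanstalk.tubes.reserve
puts job.body # => "send the welcome email"
job.delete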

Jobs

The Jobs are what we are placing in a tube. It’s common for me to place JSON in a tube and marshal it back into an object at the other end. Beanstalkd doesn’t really care about the content of the job, so things like YAML, plain text or Thrift would be just fine. In a normal, happy path operation, jobs have two states:
  1. Ready – Waiting to be processed.
  2. Reserved – Being processed
If all goes well, the job is deleted. If there is a problem with the job, say our SMTP server is down, the job is put into the “Buried” state. It will remain “Buried” until the tube is “kicked”, which simply places the job back into the “Ready” state. So, with the SMTP server back up, we kick the tube and the world keeps spinning. One other state we haven’t covered is “Delayed”, which simply means the job does not enter the “Ready” state until some pre-determined interval has elapsed. I personally have not used this state much, so won’t cover it beyond mentioning that it exists.
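For reference, burying, kicking and delaying look roughly like this through the Beaneater client used later in this article; the peek and kick calls and the delay option reflect my own use of the gem, so double-check them against the version you install:

require 'beaneater'

beanstalk = Beaneater::Pool.new(['localhost:11300'])
tube = beanstalk.tubes['my-tube']

# A delayed job: it will not become "Ready" for another 60 seconds.
tube.put 'send reminder email', delay: 60

# Have a look at the oldest buried job without reserving it.
buried = tube.peek(:buried)
puts buried.body if buried

# Kick up to 10 buried jobs back into the "Ready" state,
# e.g. once the SMTP server is back up.
tube.kick(10)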

OM NOM NOM

Now that we have Beanstalkd running on our development boxes, we want to get some jobs into the queue. To achieve that, my usual weapon of choice is the Beaneater gem. Getting a job into a tube is as simple as:
require 'beaneater'
require 'json'

beanstalk = Beaneater::Pool.new(['localhost:11300'])
tube = beanstalk.tubes['my-tube']
job = {some: 'key', value: 'object'}.to_json

tube.put job
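That is all a producer needs. When you want finer control over how a job behaves, put also accepts per-job options; the keyword names below are how I have used Beaneater rather than something quoted from its docs, so verify them against the gem’s README:

# pri: lower numbers are reserved first; delay: seconds before the job
# becomes "Ready"; ttr: seconds a worker may hold the reservation before
# Beanstalkd releases the job back to the queue.
tube.put job, pri: 100, delay: 0, ttr: 120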
Now we get to the interesting bit: consuming the tube and all the jobs that live there. I am a big fan of a daemon process handling that. If the tubes start getting too full, we can spin up more daemons to help clear the backlog of jobs. Of course, we can also kill them off as required. So far I have used the Dante gem for wrapping scripts into daemons. It seemed a bit lighter than Daemon Kit, and I like to keep my daemons from getting bloated. The benefit of using Dante over something like ruby script/my_mailer_script.rb is, for me, nothing more than Dante giving you Process ID (PID) file generation out of the box. With that, I can keep the daemons in check with monit. Beaneater provides a really nice API for consuming jobs in two ways. The first is manually stepping through the process of reserving a job, working on it, then deleting it if it completes correctly or burying it if an exception is raised. It looks something like this:
beanstalk.tubes.watch!('my-tube')
loop do
  job = beanstalk.tubes.reserve
  begin
    # ... process the job
    job.delete
  rescue Exception => e
    job.bury
  end
end
A couple of things here are worth mentioning. Yes, I’m using an infinite loop, and the reserve method on the tube will actually sit and wait for a job to be “Ready”, reserve it, and continue. Beaneater provides a better interface for long-running tasks, and the above can simply be condensed into:
beanstalk.jobs.register('my-tube') do |job|
  # ... process the job
end

beanstalk.jobs.process!
This method wraps the behaviour of the previous example (albeit in a much better way), reserving and processing each job, then deleting or burying it based on the outcome.
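Pulling it all together for the “send to a friend” feature from the start of the article, a worker script might look something like the sketch below. The JSON field names and the Mailer class are hypothetical placeholders for whatever delivery code you already have:

require 'beaneater'
require 'json'

beanstalk = Beaneater::Pool.new(['localhost:11300'])

# Each job body is the JSON payload the Job Board application put on the tube.
# The payload keys ('to', 'job_url', 'message') are hypothetical.
beanstalk.jobs.register('send-to-a-friend') do |job|
  payload = JSON.parse(job.body)
  Mailer.deliver_recommendation(
    to:      payload['to'],
    job_url: payload['job_url'],
    message: payload['message']
  )
end

# Blocks forever, reserving each job and deleting or burying it for us.
beanstalk.jobs.process!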

No Magic Beans

The beauty of Beanstalkd is its absolute simplicity. There is really not much more I would be willing to dive into as an introduction. In terms of getting things running quickly, it is no more complicated than any of the background worker solutions discussed earlier. It does make sense to be pragmatic in your adoption of MQs. Resque, Sidekiq etc. all have their place and work very well, but Beanstalkd addresses a few more problems, namely interfacing between services which may or may not be written in Ruby (.NET clients for Beanstalkd are available). In fact, the entire thing is completely language agnostic. The neckbeard way of communicating with Beanstalkd is via its own protocol over TCP. The Beaneater gem, as you will probably know, abstracts all that protocol stuff into a well packaged API for us. It is safe to say I’ll be leaning on the Beaneater gem when using Beanstalkd for some time to come.

If I had any advice on designing/composing tube consumers, it would be to stick to the Single Responsibility Principle (SRP) as much as possible. There will come a time when you will have to kick a buried job. If that job writes to a database AND sends an email, what happens when the sending of the email blows up? Replaying said message will result in a duplicate database entry. The more you split the processing of a job into the smallest reasonable responsibilities, the less you have to worry about performing duplicate actions.

I really urge you to look to Beanstalkd as your application architecture grows. In my personal experience, I have found it simple to get running, straightforward to manage and maintain, and the Ruby client via Beaneater is one of the better interfaces I have used.

Frequently Asked Questions (FAQs) about Beanstalkd

What is the main difference between Beanstalkd and other job queue systems like RabbitMQ?

Beanstalkd is a simple, fast work queue service that is designed to improve the distribution of jobs among multiple workers. Unlike RabbitMQ, which is a more complex message-broker system, Beanstalkd focuses on providing a minimalistic approach to job queuing. It doesn’t support advanced features like routing, persistence, or replication, but it excels in its simplicity and speed. It’s easy to set up and use, and it’s perfect for scenarios where you need a straightforward job queue without the need for complex configurations.

How can I install and use Beanstalkd?

Beanstalkd is easy to install and use. You can install it using package managers like apt-get for Ubuntu or brew for macOS. Once installed, you can start the Beanstalkd service and begin using it. You can interact with Beanstalkd using various client libraries available in different programming languages like PHP, Ruby, Python, and more. These libraries provide an interface to create jobs, assign them to the queue, and process them.

Can Beanstalkd handle large-scale applications?

Yes, Beanstalkd is designed to handle large-scale applications. It’s a lightweight and efficient job queue system that can manage thousands of jobs without any significant performance degradation. It’s used by many large-scale web applications to distribute jobs among multiple workers efficiently.

How does Beanstalkd ensure job reliability?

Beanstalkd ensures job reliability through its job lifecycle management. When a job is created, it’s placed in the “ready” state. A worker can then reserve the job for processing. If the job is processed successfully, the worker can delete the job from the queue. If the job fails, the worker can release it back to the queue or bury it for later inspection. This lifecycle management ensures that no job is lost in the process.
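As a rough illustration of that lifecycle using the Beaneater client from earlier in the article (the release call and its delay option are from my own usage, so treat them as an assumption), a worker chooses between deleting, releasing and burying:

require 'beaneater'

beanstalk = Beaneater::Pool.new(['localhost:11300'])
beanstalk.tubes.watch!('my-tube')

job = beanstalk.tubes.reserve
begin
  handle(job.body)          # handle is a placeholder for your own processing
  job.delete                # success: remove the job from the queue entirely
rescue Timeout::Error
  job.release(delay: 30)    # transient failure: put it back, retry in 30s
rescue StandardError
  job.bury                  # something unexpected: park it for inspection
end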

What are tubes in Beanstalkd?

Tubes in Beanstalkd are essentially named job queues. When you create a job, you can specify the tube it should go into. Workers can then watch specific tubes for jobs. This allows you to categorize and prioritize jobs based on their tubes.

How can I monitor Beanstalkd?

You can monitor Beanstalkd using the built-in admin interface, which provides information about the current state of the server, including the number of jobs in each state and statistics about each tube. There are also third-party tools available that provide more advanced monitoring capabilities.
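If you would rather poke at those numbers from Ruby than through the admin interface, the Beaneater client exposes the underlying stats commands; the field names below mirror beanstalkd’s stats output, so treat the exact accessors as an assumption:

require 'beaneater'

beanstalk = Beaneater::Pool.new(['localhost:11300'])

# Server-wide statistics.
stats = beanstalk.stats
puts "total jobs processed: #{stats.total_jobs}"

# Per-tube statistics.
tube_stats = beanstalk.tubes['my-tube'].stats
puts "ready:    #{tube_stats.current_jobs_ready}"
puts "reserved: #{tube_stats.current_jobs_reserved}"
puts "buried:   #{tube_stats.current_jobs_buried}"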

Can I use Beanstalkd with Docker?

Yes, you can use Beanstalkd with Docker. There are Docker images available for Beanstalkd that you can use to run it in a Docker container. This can simplify the deployment and scaling of Beanstalkd in a containerized environment.

How can I troubleshoot issues with Beanstalkd?

Beanstalkd logs errors and important events to the syslog, which you can check for any issues. If a job fails, it can be buried for later inspection. You can then kick the job back to the queue once the issue is resolved.

Is Beanstalkd actively maintained?

Yes, Beanstalkd is actively maintained. It’s an open-source project with a community of contributors who regularly contribute to its development and maintenance.

Can I contribute to Beanstalkd?

Yes, as an open-source project, Beanstalkd welcomes contributions from the community. You can contribute by reporting issues, submitting pull requests, improving documentation, and more.

Dave Kennedy

Dave is a web application developer residing in sunny Glasgow, Scotland. He works daily with Ruby but has been known to wear PHP and C++ hats. In his spare time he snowboards on plastic slopes, only reads geek books and listens to music that is certainly not suitable for his age.
