Podcast duration: 1:06:29 (91.3MB)
Panel
- Avdi Grimm (twitter github blog book)
- Charles Max Wood (twitter github Teach Me To Code)
- David Brady (blog twitter github ADDcasts)
- James Edward Gray (blog twitter github)
- Josh Susser (twitter github blog)
Discussed in this podcast:
Definition of Queuing and Background Processes:
- Queuing is about messaging, typically first in, first out (FIFO)
- Background Processes are processes that pull messages off the queue and do things later.
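The two definitions above can be sketched with Ruby's built-in thread-safe `Queue`. This is only a minimal in-process illustration, not one of the queue systems discussed in the episode; `run_worker`, `STOP`, and the job hashes are made-up names.

```ruby
# Hypothetical sentinel that tells the worker to shut down.
STOP = :stop

# Background process: pulls messages off the queue in FIFO order and
# handles them later, outside the code path that enqueued them.
def run_worker(queue, results)
  Thread.new do
    loop do
      message = queue.pop # blocks until a message is available
      break if message == STOP
      results << "processed #{message[:task]}"
    end
  end
end

queue   = Queue.new
results = []
worker  = run_worker(queue, results)

# The foreground only enqueues messages and moves on.
queue << { task: "send_welcome_email" }
queue << { task: "generate_pdf" }
queue << STOP

worker.join
puts results
```

Because `Queue#pop` hands messages back in insertion order, the email message is processed before the PDF message.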
Systems that people have used
- Poor man’s queue or database backed queue
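As a rough sketch of what a "poor man's queue" might look like, the class below treats a list of rows with a status column as the queue. The in-memory array stands in for a database table, and `PoorMansQueue` and its method names are assumptions made for illustration.

```ruby
# In-memory stand-in for a "jobs" table; in a real app each Job would be a row.
class PoorMansQueue
  Job = Struct.new(:id, :payload, :status)

  def initialize
    @rows = []
    @next_id = 1
  end

  # Insert a new pending row and return its id.
  def enqueue(payload)
    job = Job.new(@next_id, payload, :pending)
    @next_id += 1
    @rows << job
    job.id
  end

  # Claim the oldest pending row (FIFO). Against a real database this
  # find-and-mark step would need to be atomic so two workers cannot
  # claim the same row.
  def claim_next
    job = @rows.find { |row| row.status == :pending }
    job.status = :running if job
    job
  end

  def complete(job)
    job.status = :done
  end

  def pending_count
    @rows.count { |row| row.status == :pending }
  end
end

jobs = PoorMansQueue.new
jobs.enqueue("geocode address 42")
jobs.enqueue("email user 7")

first = jobs.claim_next
jobs.complete(first)
```

One side effect of this design is that the status column doubles as basic introspection: counting rows by status shows what is pending, running, or done.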
What do you look for in your queue technology?
- How it handles failure
- Introspection
- It depends on what you’re doing
- Regular tasks vs. Immediate tasks
- One worker vs several workers on several servers
Dave’s beanstalk utilities
Beanstalk RailsCast
Hybrid approaches
- Google’s Geocoding API – Rate-limiting with cron
- Amazon’s FPS – polling Amazon to get updates on payments
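The cron-based rate limiting mentioned above could look something like the sketch below: each scheduled run works through at most a fixed number of pending items and leaves the rest for the next run. The limit, the method name, and the fake geocoder are all hypothetical; the limit is not Google's actual quota.

```ruby
# Hypothetical per-run budget; not a real API quota.
MAX_LOOKUPS_PER_RUN = 3

# Meant to be called once per cron run. Processes at most `limit` pending
# addresses and returns [results, still_pending].
def run_geocoding_batch(pending, limit: MAX_LOOKUPS_PER_RUN)
  batch     = pending.first(limit)
  remaining = pending.drop(limit)
  results   = batch.map { |address| yield(address) }
  [results, remaining]
end

# Stand-in for the real external API call.
fake_geocoder = ->(address) { "coords for #{address}" }

pending = ["1 Main St", "2 Oak Ave", "3 Elm Rd", "4 Pine Ln"]
done, pending = run_geocoding_batch(pending) { |address| fake_geocoder.call(address) }
```

After one run, three addresses have been looked up and one is still waiting for the next scheduled run.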
Best Practices
- Structure jobs so they are simple input/output
- Isolate your jobs as much as possible from the database
- Decouple your application from your queue – Message Passing
- Distributed == Not Dependent – If a job dies and your application breaks, the design is wrong
- Idempotence on the job
- Messaging Queue == State Machine
- Logging/Emailing when something is unprocessed for too long
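To illustrate two of the practices above, simple input/output and idempotence, here is a minimal sketch of a job that can safely be delivered more than once. `ChargeOrderJob` and the hash used as a ledger are hypothetical; a real job would record completed work in whatever durable store the application uses.

```ruby
# Hypothetical job: records a charge at most once per order,
# however many times the message is delivered.
class ChargeOrderJob
  def initialize(ledger)
    @ledger = ledger # stand-in for wherever completed work is recorded
  end

  # Simple input (an order id and an amount), simple output (a status symbol).
  def perform(order_id, amount_cents)
    return :already_done if @ledger.key?(order_id)

    @ledger[order_id] = amount_cents
    :charged
  end
end

ledger = {}
job    = ChargeOrderJob.new(ledger)

first_run  = job.perform(1001, 2500)
second_run = job.perform(1001, 2500) # duplicate delivery of the same message
```

Running the job a second time changes nothing, so a retry after a crash or a duplicate message does not double-charge the order.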
What do you send to the background?
- Payment Processing
- Emails
- PDF Generation
- Billing reconciliation
- Geocoding
- External Web Services
- Collate events into an aggregate event
- Farming out multiple jobs to multiple queues
- Separate unusual resources onto other servers
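Farming jobs out to multiple queues, and keeping resource-heavy work separate, can be sketched as a small routing step in front of the queues. The queue names, job types, and `route` method below are assumptions for illustration only.

```ruby
# Hypothetical mapping of job types to queue names.
ROUTES = { pdf: "heavy", geocode: "external", email: "default" }.freeze

# Group jobs by destination queue; unknown job types go to the fallback queue.
def route(jobs, routes: ROUTES, fallback: "default")
  queues = Hash.new { |hash, name| hash[name] = [] }
  jobs.each { |job| queues[routes.fetch(job[:type], fallback)] << job }
  queues
end

jobs = [
  { type: :email,   id: 1 },
  { type: :pdf,     id: 2 },
  { type: :geocode, id: 3 },
  { type: :cleanup, id: 4 } # not in ROUTES, so it lands on the fallback queue
]

queues = route(jobs)
```

Workers on a machine with the unusual resources would then read only from the `"heavy"` queue, while ordinary workers handle `"default"`.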
Picks
- Service-Oriented Design with Ruby and Rails (Addison-Wesley Professional Ruby Series) (Dave)
- Enchantment: The Art of Changing Hearts, Minds, and Actions (Dave)
- If you gaze into nil, nil gazes also into you (Avdi)
- Hacker Monthly (Avdi)
- rubythere.com (Josh)
- Code for America (Josh)
- Philosophy Gym (James)
- Philosophy Bites (James)
- Ask a Ninja (Chuck)
- Ruby 1.9 (Chuck)
- RVM (Chuck)

Very good discussion with lots of practical suggestions, thanks. One technique not mentioned was if you are on an EventMachine backed server like Thin, you can use EM's concurrency mechanisms to background tasks, straight from your web app. No other external dependencies, no external processes to kick off, etc. I haven't tried it in production, but to me this seems like a good basic alternative to a "poor-man's queue" for simple use-cases, plus it doesn't tie your queue to your app and probably scales pretty well. I wondered what success people have had with doing this kind of thing in production, in particular how you handled introspection, requeueing etc.
Great podcast! It's really broadening my Ruby world. I had a question about something that seemed to be such common knowledge that nobody really went into it. Charles M.W. even said many times something along the lines of "You don't want your queue to be authoritative" and the other panelists said things like "It should be distributable, and not fault tolerant", and database backed queues were spoken of as being not really the 'right way' of doing it. So I'm wondering, where DO you keep the authoritative list of things that need to happen? If the queue does blow up, how do you find out what needs to be run again? Do you have some fine-grained state in your objects which lets you instantly deduce what kind of background task (if any) needs to happen in order for it to move to a finalized state? Or do you just have another database to back up the database backed queue? Wondering if anyone could shed some light on that!
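The state-based option raised in the comment above might be sketched like this: the application's own records stay authoritative, and the queue can be rebuilt from any record that is not yet in a final state. This is just one possible reading, not something the panel described; `Order`, `NEXT_JOB`, and the state names are all hypothetical.

```ruby
Order = Struct.new(:id, :state)

# Hypothetical mapping from a non-final state to the job that advances it.
# States missing from this table are treated as final.
NEXT_JOB = { paid: :send_receipt, receipt_sent: :generate_invoice }.freeze

# Rebuild the list of jobs to run purely from the records' current state.
def rebuild_queue(records)
  records.map { |record|
    job = NEXT_JOB[record.state]
    { job: job, order_id: record.id } if job
  }.compact
end

orders = [
  Order.new(1, :paid),
  Order.new(2, :invoiced),     # final state, nothing to enqueue
  Order.new(3, :receipt_sent)
]

requeued = rebuild_queue(orders)
```

If the queue is lost, running `rebuild_queue` over the records recovers the outstanding work, which is why idempotent jobs matter: some of the rebuilt messages may duplicate ones that were already in flight.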
I was a little surprised that someone didn't mention using queues as an interface/abstraction between systems. I think it's a somewhat prevalent practice (especially in the enterprise world).
+1. That is my primary motivation these days. The queue is a great way to create a message bus between applications. Perhaps Ruby Rogues should do a podcast on distributed systems.
I'm surprised you guys didn't mention TorqueBox, which features a lot of easy async processing abstractions (schedulers, queues, topics, daemons, backgroundables, etc), but more importantly, lets you optionally tie those to the lifecycle of your web app, which makes [re]deployment/packaging/installation/admin a lot easier on your "ops guys". Check it out at http://torquebox.org Disclaimer: I'm a TorqueBox developer. ;)
For philosophy podcasts, I'd also recommend "The Partially Examined Life". It can be quite digressive, but often funny and thought-provoking, and the hosts are great. Good episodes to start with are (imo) those on Hobbes, pragmatism, and Danto.