Poyomi’s architecture: .NET meets NoSQL and AMQP
September 21, 2010
Welcome to our first behind-the-scenes post, the start of an upcoming series of technical articles about Poyomi, our new photobook creation and printing service.
Handling lots of incoming photos, processing them and rendering live previews as fast as possible shouldn’t be a problem with today’s powerful server configurations, but doing so in a way that scales well certainly requires a special architecture. Here’s our approach.
Our company’s primary development platform is .NET 3.5 with C# 3, which might have had a reputation for being a bit Microsoft-centric when it comes to interfacing with database engines, but that is no longer the case. For Poyomi, we use two “next generation” backend products: the CouchDB document database and the RabbitMQ message queuing server.
The website’s backend is written in ASP.NET MVC and is meant to use as few CPU resources as possible – almost everything is offloaded to the Queue Processor (a homegrown package) by sending a message to one of the message queues on the RabbitMQ server. AMQP is bi-directional: the broker pushes each message to a connected consumer as soon as it arrives, so messages are sent and received with millisecond latency – no polling for messages every few seconds! Even serialized messages in the megabyte range, such as a newly uploaded photo from a user, are handled nicely.
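To make the hand-off concrete, here is a minimal sketch of the pattern in Python. It is not our actual code: an in-process `queue.Queue` stands in for a RabbitMQ queue, and the names `handle_upload`, `queue_processor` and the message fields are made up for illustration. The point is only that the web side returns right after enqueuing, and the consumer blocks until a message is delivered instead of polling.

```python
import json
import queue
import threading

# In-process stand-in for a RabbitMQ queue (assumption: the real system
# talks AMQP to a broker; this only illustrates the hand-off pattern).
resize_queue = queue.Queue()
results = []


def handle_upload(photo_id):
    """Web-side handler: enqueue the work and return immediately."""
    message = json.dumps({"task": "resize", "photo_id": photo_id})
    resize_queue.put(message)
    return "accepted"


def queue_processor():
    """Consumer: blocks until a message is delivered, so there is no polling loop."""
    while True:
        message = resize_queue.get()  # wakes as soon as a message arrives
        if message is None:           # sentinel used to stop the worker
            break
        task = json.loads(message)
        results.append((task["task"], task["photo_id"]))
        resize_queue.task_done()


worker = threading.Thread(target=queue_processor)
worker.start()
status = handle_upload("photo-123")
resize_queue.join()      # wait until the consumer has processed the message
resize_queue.put(None)   # stop the worker
worker.join()
```

With a real broker the `put` and `get` calls become a publish and a consumer callback, but the division of labour stays the same.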
Consuming the real-time message queue is a job for our custom .NET / C# Queue Processor (we plan to open-source it some day, so keep bugging us). It handles message serialization, resource limiting (each CPU gets a single job of this type), task status reports (via CouchDB), logging, task chaining and much more. Some of the tasks that we handle asynchronously by offloading them to the QP instead of doing them in MVC (gah!):
- Resizing images to thumbnail sizes
- Rotating images
- Rendering a live preview of a spread
- Validating uploaded photos
- Downloading photos from external sites like Flickr
- Arranging photos on a single page with a special, CPU-intensive algorithm
- Calculating a photo’s color clusters
- Rendering the final PDF: one task per two pages of the PDF, so adding more QP consumers decreases the time needed for the final assembly almost linearly!
- Little tasks like sending notification emails, contact emails, …
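The PDF item above is the easiest one to illustrate. The following Python sketch is a simplification under stated assumptions, not the Queue Processor itself: `split_into_render_tasks` and `rounds_needed` are hypothetical helpers, and the model pretends every task takes the same time. It shows how a book breaks into two-page tasks and why more consumers shorten the assembly roughly in proportion.

```python
import math


def split_into_render_tasks(page_count, pages_per_task=2):
    """Return (first_page, last_page) ranges, one per render task."""
    tasks = []
    for first in range(1, page_count + 1, pages_per_task):
        last = min(first + pages_per_task - 1, page_count)
        tasks.append((first, last))
    return tasks


def rounds_needed(task_count, consumers):
    """Rounds of work if every consumer renders one task per round."""
    return math.ceil(task_count / consumers)


tasks = split_into_render_tasks(40)      # a 40-page book gives 20 tasks
one_consumer = rounds_needed(len(tasks), 1)
four_consumers = rounds_needed(len(tasks), 4)
eight_consumers = rounds_needed(len(tasks), 8)
```

In this model a 40-page book needs 20 rounds on one consumer, 5 on four and 3 on eight. The speed-up is "almost" linear because the last round may be only partly filled, and the final assembly step is not modelled here at all.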
Photo storage duties are handled by CouchDB, a NoSQL document database with an HTTP REST interface. Everything is stored as a JSON document, with one very useful feature that we exploit fully: binary attachments. Each photo is stored as a single document with multiple image attachments – the original photo and various thumbnails. No more scattered files all across the filesystem, everything is kept together!
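As a rough illustration of what such a document can look like, the Python sketch below builds a photo document using CouchDB’s inline attachment format, where each entry under `_attachments` carries a `content_type` and base64-encoded `data`. The document ID, the `type` field and the attachment names are assumptions made up for this example, not our real schema.

```python
import base64
import json


def build_photo_document(photo_id, images):
    """Build a CouchDB document; images maps name -> (content_type, raw bytes)."""
    attachments = {}
    for name, (content_type, data) in images.items():
        attachments[name] = {
            "content_type": content_type,
            # Inline attachments are sent as base64 text inside the JSON body.
            "data": base64.b64encode(data).decode("ascii"),
        }
    return {"_id": photo_id, "type": "photo", "_attachments": attachments}


doc = build_photo_document(
    "photo-123",  # hypothetical document ID
    {
        "original.jpg": ("image/jpeg", b"\xff\xd8 original bytes"),
        "thumb-200.jpg": ("image/jpeg", b"\xff\xd8 thumbnail bytes"),
    },
)
body = json.dumps(doc)  # JSON body for an HTTP PUT to the document's URL
```

The resulting JSON would be sent with an HTTP PUT to the document’s URL in the database. Large originals are probably better uploaded as standalone attachments than inlined as base64, but the document shape is the same either way.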
Overall scalability is ensured by adding more message queue consumers – Queue Processors. Bringing one up is really easy, since the only two required connections are to RabbitMQ and CouchDB. A pair of configuration variables for each of those, host and port, that’s it. No more network shares or NFS mounts for storage, everything is handled via HTTP and AMQP. We love this.
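A sketch of how little configuration that is, again in Python and again hypothetical: the `Endpoint` type, the `load_endpoint` helper and the variable names such as `RABBITMQ_HOST` are invented for illustration (our real settings live in .NET configuration). The fallback ports are the usual defaults, 5672 for AMQP and 5984 for CouchDB.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def load_endpoint(prefix, default_port, env=os.environ):
    """Read <PREFIX>_HOST and <PREFIX>_PORT, falling back to defaults."""
    host = env.get(prefix + "_HOST", "localhost")
    port = int(env.get(prefix + "_PORT", default_port))
    return Endpoint(host, port)


# Explicit dicts are passed here so the example does not depend on the
# machine's real environment; a worker would normally use os.environ.
rabbitmq = load_endpoint("RABBITMQ", 5672, env={"RABBITMQ_HOST": "mq.internal"})
couchdb = load_endpoint("COUCHDB", 5984, env={})
couchdb_url = f"http://{couchdb.host}:{couchdb.port}/"
```

A new worker needs nothing beyond these four values, which is what makes bringing one up so quick.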
Our MVC and QP servers are Windows machines, while RabbitMQ and CouchDB sit on a Linux server alongside nginx for web proxying and caching duties.
We’re proudly using many open source libraries, thanks guys! Here’s a little list: Divan, a CouchDB client; RabbitMQ’s AMQP client for sending and receiving queued messages; Json.NET for serialization duties; PDFsharp for PDF authoring; Autofac for dependency injection; FlickrNet for consuming Flickr’s API; and Yahoo’s YUI Compressor for .NET.
Please let us know what you’re most interested in for the next in the series of technical articles – take a look around, you’re sure to find something technically interesting! And the photobooks aren’t that shabby looking either.