Poyomi’s architecture: .NET meets NoSQL and AMQP September 21, 2010

Some of our sample books

Welcome to our first behind-the-scenes post, the start of an upcoming series of technical articles about Poyomi, our new photobook creation and printing service.

Handling lots of incoming photos, processing them and rendering live previews as fast as possible shouldn’t be a problem with today’s powerful server configurations, but doing so with good scalability options certainly requires a special architecture. Here’s our approach.

Our company’s primary development platform is .NET 3.5 with C# 3, which has had a reputation for being a bit Microsoft-centric when it comes to interfacing with database engines, but that is no longer the case. For Poyomi, we use two “next generation” backend products: the CouchDB document-oriented key/value database and the RabbitMQ message queuing server.

The website’s backend is written in ASP.NET MVC and is meant to use as few CPU resources as possible – almost everything is offloaded to the Queue Processor (a homegrown package) by sending a message to one of the message queues on the RabbitMQ server. The underlying AMQP protocol is bi-directional, so messages are pushed to consumers as they arrive, with millisecond latency – no polling for messages every few seconds! Even serialized messages in the megabyte range, such as a photo a user has just uploaded, are handled nicely.

Consuming the real-time message queue is a job for our custom .NET / C# Queue Processor (the plan is to open source it some day, keep bugging us). It handles message serialization, resource limiting (each CPU gets a single job of this type), task status reports (via CouchDB), logging, task chaining and much more. Some of the tasks that we handle asynchronously by offloading them to the QP instead of doing them in MVC (gah!):

  • Resizing images to thumbnail sizes
  • Rotating images
  • Rendering a live preview of a spread
  • Validating uploaded photos
  • Downloading photos from external sites like Flickr
  • Arranging photos on a single page with a special, CPU-intensive algorithm
  • Calculating a photo’s color clusters
  • Rendering the final PDF: One task per two pages in a PDF, so adding more QP consumers almost linearly decreases the time needed for the final assembly!
  • Little tasks like sending notification emails, contact emails, …
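
To illustrate the “one task per two pages” idea from the list above, here is a minimal sketch of how a book could be cut into independent render tasks. The RenderTask and PdfTaskSplitter names are hypothetical and are not taken from the Queue Processor; the real task format is not described in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical task description: an inclusive, 1-based page range of the final PDF.
public sealed class RenderTask
{
    public int FirstPage { get; }
    public int LastPage { get; }

    public RenderTask(int firstPage, int lastPage)
    {
        FirstPage = firstPage;
        LastPage = lastPage;
    }
}

public static class PdfTaskSplitter
{
    // Cuts a book into page ranges of at most pagesPerTask pages each.
    // Every range could then be queued as a separate message, so independent
    // consumers can render different ranges in parallel.
    public static List<RenderTask> Split(int pageCount, int pagesPerTask = 2)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (pagesPerTask < 1) throw new ArgumentOutOfRangeException(nameof(pagesPerTask));

        var tasks = new List<RenderTask>();
        for (int first = 1; first <= pageCount; first += pagesPerTask)
        {
            // The last range may be shorter when the page count is odd.
            int last = Math.Min(first + pagesPerTask - 1, pageCount);
            tasks.Add(new RenderTask(first, last));
        }
        return tasks;
    }
}
```

Because the ranges do not overlap, the number of tasks grows with the page count and each one can run on whichever consumer is free, which is what makes the near-linear speedup plausible.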



All of the previews are rendered in real time.

Photo storage duties are handled by CouchDB, a NoSQL key-value database with an HTTP REST interface. Everything is stored as a JSON document, with one very useful feature that we exploit fully: binary attachments. Each photo is stored as a single document with multiple image attachments – the original photo and various thumbnails. No more scattered files all across the filesystem, everything is kept together!

Querying for all of a particular user’s photos is done with CouchDB’s views – a map-reduce system for database queries. The map and reduce functions are written in JavaScript and kept in a special JSON document (a design document). But CouchDB’s main strength is fetching documents by their ID (the primary key, in SQL speak). Non-blocking, really fast and reliable, it works like a charm for things like task status updates and photo meta information.
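
Both kinds of lookup are plain HTTP GET requests, so a rough sketch of the URLs involved may help. The database, design document and view names used below (photos, by_user) and the CouchUrls helper are made up for illustration; only the URL layout (/db/docid and /db/_design/ddoc/_view/view?key=...) is CouchDB’s own.

```csharp
using System;

public static class CouchUrls
{
    // Fetching a single document by its ID: GET /{database}/{documentId}
    public static string DocumentById(string serverUrl, string database, string documentId)
    {
        return serverUrl.TrimEnd('/') + "/"
            + Uri.EscapeDataString(database) + "/"
            + Uri.EscapeDataString(documentId);
    }

    // Querying a view for one key: GET /{database}/_design/{designDoc}/_view/{view}?key=...
    // The view is assumed to be backed by a JavaScript map function along the lines of
    //   function (doc) { emit(doc.userId, null); }
    // (a hypothetical example, not Poyomi's actual view).
    public static string ViewByStringKey(string serverUrl, string database,
                                         string designDoc, string view, string key)
    {
        // View keys are JSON values, so a string key must be sent as a quoted JSON string.
        string jsonKey = "\"" + key.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";

        return serverUrl.TrimEnd('/') + "/"
            + Uri.EscapeDataString(database)
            + "/_design/" + Uri.EscapeDataString(designDoc)
            + "/_view/" + Uri.EscapeDataString(view)
            + "?key=" + Uri.EscapeDataString(jsonKey);
    }
}
```

For example, CouchUrls.ViewByStringKey("http://localhost:5984", "photos", "photos", "by_user", "user-42") yields a URL ending in /photos/_design/photos/_view/by_user?key=%22user-42%22.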

Overall scalability is ensured by adding more message queue consumers – Queue Processors. Bringing one up is really easy, since the only two required connections are to RabbitMQ and CouchDB. A pair of configuration variables for each of those, host and port, that’s it. No more network shares or NFS mounts for storage, everything is handled via HTTP and AMQP. We love this.
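
As a sketch of how small that configuration surface is, the following shows a consumer’s settings reduced to two host/port pairs. The class names and setting keys (rabbitmq.host, couchdb.port and so on) are hypothetical; the fallback ports are the usual defaults for AMQP (5672) and CouchDB (5984).

```csharp
using System;
using System.Globalization;

public sealed class Endpoint
{
    public string Host { get; }
    public int Port { get; }

    public Endpoint(string host, int port)
    {
        if (string.IsNullOrEmpty(host)) throw new ArgumentException("Host is required.", nameof(host));
        if (port < 1 || port > 65535) throw new ArgumentOutOfRangeException(nameof(port));
        Host = host;
        Port = port;
    }
}

// Hypothetical configuration holder for one Queue Processor instance.
public sealed class QueueProcessorConfig
{
    public const int DefaultAmqpPort = 5672;
    public const int DefaultCouchDbPort = 5984;

    public Endpoint RabbitMq { get; }
    public Endpoint CouchDb { get; }

    public QueueProcessorConfig(Endpoint rabbitMq, Endpoint couchDb)
    {
        RabbitMq = rabbitMq;
        CouchDb = couchDb;
    }

    // Reads the four values through a lookup function, so the source can be
    // an app.config section, environment variables, or a test dictionary.
    public static QueueProcessorConfig FromSettings(Func<string, string> get)
    {
        return new QueueProcessorConfig(
            new Endpoint(get("rabbitmq.host"), ParsePort(get("rabbitmq.port"), DefaultAmqpPort)),
            new Endpoint(get("couchdb.host"), ParsePort(get("couchdb.port"), DefaultCouchDbPort)));
    }

    private static int ParsePort(string raw, int fallback)
    {
        return string.IsNullOrEmpty(raw) ? fallback : int.Parse(raw, CultureInfo.InvariantCulture);
    }

    public string CouchDbBaseUrl => "http://" + CouchDb.Host + ":" + CouchDb.Port + "/";
    public string AmqpUrl => "amqp://" + RabbitMq.Host + ":" + RabbitMq.Port;
}
```

Nothing else about the machine matters to the consumer, which is why adding one is mostly a matter of copying the package and pointing it at the same two servers.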

Our MVC and QP servers are Windows machines, while RabbitMQ and CouchDB sit on a Linux server alongside nginx for web proxying and caching duties.

We’re proudly using many open source libraries, thanks guys! Here’s a little list: Divan, a CouchDB client; RabbitMQ’s AMQP client for sending and receiving queued messages; Json.NET for serialization duties; PDFSharp for PDF authoring; Autofac for Dependency Injection; FlickrNet for consuming Flickr’s API; and Yahoo’s YUI Compressor for .NET.

Please let us know what you’re most interested in for the next in the series of technical articles – take a look around, you’re sure to find something technically interesting! And the photobooks aren’t that shabby looking either.

9 Comments
Rolle September 22nd, 2010

Very nice article, thanks!
You write that you have a “reputation as being a bit of Microsoft-centric when it comes to interfacing with database engines”.
I guess you used some MS products before, then? Which ones? Why did you make the change? Have you made any before/after benchmarks of the service?
I would love to see more info about how you use/interact with CouchDB.

rudi September 22nd, 2010

@Rolle: Linq-to-SQL is basically bound to MSSQL. It’s a great and wonderful API (with its flaws, and discontinued AFAIK). I’ve benchmarked it extensively; check out the slides: ASP.NET MVC Performance.

I know that some folks are trying to build an open source LINQ implementation that connects to MySQL/…, but that’s not really supported by Microsoft.

Otherwise, no, Poyomi was built from the ground up with this architecture in mind, there were no switches. Hopefully the service grows and then I can give a realistic evaluation of how it holds up. Can’t wait for that. :)

Derek Stainer September 23rd, 2010

I was curious as to the performance you are getting using CouchDB essentially as image storage. What are the typical sizes of the images being stored? How many thumbnails per image? In your opinion, are there other use cases where this same approach, binary file storage, might come in handy?

rudi September 23rd, 2010

@Derek: I’d say that most of the photos are in the 2–4 MB range, with 4 additional thumbnail sizes (square, thumbnail, medium, large). We don’t store these photo objects forever; we keep them only until we create a finished photobook out of them.

Performance: latency is good, while raw transfer throughput was somewhere around 70% of what a good HTTP server (nginx) delivers over a 1 Gbit network. That’s not a problem, because both of our consumers of binary data cache aggressively: for user-facing content, nginx does the trick, while our .NET code has a custom LRU cache engine as well.
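
The custom LRU cache engine itself is not shown in this thread. As a rough idea of what such a cache does, here is a generic, minimal least-recently-used cache built from a dictionary plus a linked list; the LruCache name and its API are assumptions and not Poyomi’s code.

```csharp
using System;
using System.Collections.Generic;

// A small thread-safe LRU cache: the most recently used entry sits at the
// front of the list, and the entry at the back is evicted when the cache is full.
public sealed class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly object _lock = new object();

    public LruCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_lock)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (_map.TryGetValue(key, out node))
            {
                // A hit makes this entry the most recently used one.
                _order.Remove(node);
                _order.AddFirst(node);
                value = node.Value.Value;
                return true;
            }
            value = default(TValue);
            return false;
        }
    }

    public void Put(TKey key, TValue value)
    {
        lock (_lock)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> existing;
            if (_map.TryGetValue(key, out existing))
            {
                // Replacing a value: drop the old node, re-add below.
                _order.Remove(existing);
                _map.Remove(key);
            }
            else if (_map.Count >= _capacity)
            {
                // Full: evict the least recently used entry (back of the list).
                var oldest = _order.Last;
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
            }

            var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
                new KeyValuePair<TKey, TValue>(key, value));
            _order.AddFirst(node);
            _map[key] = node;
        }
    }
}
```

A cache for attachments would more likely be bounded by total bytes than by entry count; counting entries just keeps the sketch short.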

We also store completed photobooks using Couch, that is, the assembled PDF and any preview images that we generate. You can see that on the sample books page.

Mike Hamrah September 28th, 2010

I’m curious if you needed to make any changes to Divan in your interaction with CouchDb. How did you find the serialization/deserialization approach? Was it easy to integrate CouchDb views with the linq-to-sql integration?

rudi September 28th, 2010

@Mike: Yep, I forked the Divan library and added support for binary attachments – previously, the only way to store and retrieve them was by passing around a string. Which is, of course, useless for binary data.

Serialization is handled by a custom implementation of Divan’s ICouchDocument interface. For instance, the JSON serializer (Newtonsoft’s) adds the .NET type of the object being serialized – this way, abstract classes and their implementations survive the round trip. An example would be: abstract Photo, implementations: PhotoUploaded, PhotoFlickr, … You get the idea.

Other custom settings: timestamps are written in a round-trip format, and enums are serialized as strings rather than as their numeric values:

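using System.Globalization;
using Newtonsoft.Json;
using Newtonsoft.Json.Converters;

var s = new JsonSerializer();
// Write enums as their names instead of their numeric values.
s.Converters.Add(new StringEnumConverter());
// Write timestamps as ISO 8601 and keep their DateTimeKind across the round trip.
s.Converters.Add(new IsoDateTimeConverter
{
    Culture = CultureInfo.InvariantCulture,
    DateTimeStyles = DateTimeStyles.RoundtripKind,
    DateTimeFormat = "yyyy'-'MM'-'dd'T'HH':'mm':'ss.FFFFFFFK"
});
// Embed the .NET type name so abstract types deserialize to the right subclass.
s.TypeNameHandling = TypeNameHandling.Objects;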

Another one: little helper methods that preserve the information about attachments and also support adding attachments inline, by base64 encoding them. That’s also the only way to upload multiple attachments at once, which really helps performance when adding many small attachments – for instance, small 5 KB thumbnails of a finished photo book.
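
The helper methods themselves are not shown here, but the document shape they have to produce is CouchDB’s standard inline attachment format: an _attachments object whose entries carry a content_type and base64 data. The sketch below builds such a document; the InlineAttachments class is hypothetical, and it uses System.Text.Json only to stay self-contained, whereas the code described above uses Json.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class InlineAttachments
{
    // Builds the JSON body for a single PUT that stores a document together
    // with several attachments, each one base64 encoded inline.
    public static string BuildDocument(string id, IDictionary<string, byte[]> files, string contentType)
    {
        var attachments = new Dictionary<string, object>();
        foreach (var file in files)
        {
            attachments[file.Key] = new Dictionary<string, object>
            {
                { "content_type", contentType },
                { "data", Convert.ToBase64String(file.Value) }
            };
        }

        var document = new Dictionary<string, object>
        {
            { "_id", id },
            { "_attachments", attachments }
        };

        return JsonSerializer.Serialize(document);
    }
}
```

The trade-off is size: base64 inflates the payload by roughly a third, which is a fair price for many tiny thumbnails but less attractive for multi-megabyte originals.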

Something that cropped up in production this past week: Divan has a fixed HTTP timeout for all requests that is just absurd: an hour! It might be OK for long-running views or Couch’s _changes API, but for retrieving a document… We’ve experienced some odd timeouts when doing the most basic operations at high load, like retrieving a single document by its ID. The current suspect is HTTP’s Keep-Alive feature, which Divan didn’t disable. Since disabling it explicitly, we haven’t seen the problem again. We haven’t had much high load since, though. Hopefully soon. :)
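
For reference, this is roughly what the two adjustments look like on a plain .NET HttpWebRequest. It is a sketch under the assumption that the client library builds its requests this way; it is not the actual patch to Divan, and the CouchRequests helper is hypothetical. On recent .NET versions WebRequest.Create still works but is marked obsolete.

```csharp
using System;
using System.Net;

public static class CouchRequests
{
    // Creates a GET request with an explicit timeout and without Keep-Alive,
    // so every request uses its own connection instead of a pooled one.
    public static HttpWebRequest CreateGet(string url, TimeSpan timeout)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Accept = "application/json";

        // Timeout covers getting the response; ReadWriteTimeout covers stream reads and writes.
        int milliseconds = (int)timeout.TotalMilliseconds;
        request.Timeout = milliseconds;
        request.ReadWriteTimeout = milliseconds;

        // Sends "Connection: close" rather than reusing a persistent connection.
        request.KeepAlive = false;
        return request;
    }
}
```

Separate timeouts for document fetches and for long-running calls such as views or _changes would address the one-hour problem without penalizing the slow operations.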

Overall, yes, there are a lot of patches to Divan. I think that some work has been done in the official repository for binary attachments, but unfortunately I haven’t found the time to merge my patches. I’m also thinking about a different paradigm for accessing CouchDB documents, which would require either a brand new client lib or a wrapper around Divan.

Re: linq-to-sql, we don’t use it at all in this project. Are you thinking more about the adjustment needed when developing and using map-reduce functions instead of writing SQL/LINQ queries?


Dale Ragan October 29th, 2010

Rudi, I would like to get your opinion on the direction of Ottoman’s API, which you can find here:

http://github.com/sinesignal/ottoman

You can see an example of the available functionality here:

http://gist.github.com/440871

You can also view the tests.

Let me know what you think.

