You've been served: processed images "on the fly"

Lots of web applications are moving processing of their user-generated images to the foreground, on the fly.

Background

Here at Devpost, we handle a ton of image uploads for our users' various projects, submissions and challenges. For a while, we've processed each of these images optimistically, in the background. As participation has increased, so has the load on our primary utility servers.

Recently, a better approach has emerged: image processing as a service. Open-source projects like Firesize, image-resizer, Dragonfly, and Refile are gaining attention, and there's already a bunch of commercial competition, including http://cloudinary.com, http://www.imgix.com, http://www.blitline.com, and https://6px.io. Even Mozilla has ponied up. The approach allows web apps to process images "lazily", that is, not until they're needed. It is also straightforward to front the service with a CDN, so once an image is processed, we get the benefits of caching across the entire network.

We ended up extracting Firefly, our own image processing web service, based on Refile. Our service allows us to process images users have uploaded to our site and also embed remote images in markdown. This also means we'll be able to scale our image processing independently from the rest of the platform. Win-win!

How it works

When you upload an image to Devpost, like a profile pic or a screenshot for one of your projects, we then push that file up to a bucket on Amazon S3. Because we want to show that image in various height x width dimensions on a number of different pages, we need to process the image: scale it down, scale it up, augment the image quality, etc. This means creating additional copies of the pics in different sizes and formats.

Our "old school" approach is to kick off a job that processes your newly uploaded image separately from the rest of the application. Once the processing is complete, the new copies also get uploaded to S3 and the application is then notified that your image is ready to be displayed. Occasionally, you might have already moved on to check out your portfolio before all that processing is complete; because it's resource-intensive, this asynchronous service can fall behind at scale. When this happens, you may see a message like "Your photo is still processing...". Not ideal.

With Firefly, you never have to sit around reloading your page waiting for your photo to process. When you upload your photo, we upload it to S3 as before, but we don't trigger any background processing to create the alternate versions. We still have to show your photo at different dimensions... so how do we get away with it?

When you visit your portfolio page and we want to show your project thumbnail, we insert an image tag with a link that points to our CDN provider. A CDN is really good at caching content, like images, and serving that content quickly. When a request comes through for something the CDN hasn't cached yet, it forwards that request on to our Firefly application. That request contains information about the processing that must occur, like height and width dimensions. Only at that point do we retrieve your image from S3 and convert it to the new dimensions... hence "on-the-fly".
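To make the idea of a request that "contains information about the processing" concrete, here is a minimal sketch in Python. The hostname, the path layout, and the `thumbnail_url` helper are all assumptions made for illustration; they are not Firefly's or Refile's actual URL scheme.

```python
from urllib.parse import quote

# Hypothetical CDN hostname, used only for this illustration.
CDN_HOST = "https://cdn.example.com"


def thumbnail_url(file_id: str, width: int, height: int, processor: str = "fill") -> str:
    """Build an image URL whose path carries the processing instructions.

    Each distinct (processor, width, height, file_id) combination yields a
    distinct URL, so a CDN can cache every variant separately.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    safe_id = quote(file_id, safe="")  # keep the id to a single path segment
    return f"{CDN_HOST}/{processor}/{width}/{height}/{safe_id}"


print(thumbnail_url("abc123", 300, 200))
```

Because the dimensions live in the URL itself, the page only needs to emit a different link to get a different size; nothing has to be generated ahead of time.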

Once the image has finished processing, it is streamed back to the CDN and, at this point, is finally forwarded to your browser. The CDN can now cache the image for future requests for the same thumbnail so we don't have to process the image again. All of this incurs some latency the first time we attempt to render that image, but this is usually a small tradeoff; the big improvement for our users is that they never get stuck with an inconvenient message in place of the image they expect to see.
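The cache-miss flow described above can be modeled in a few lines. This is a toy simulation, not how a real CDN is configured: the `EdgeCache` class and `fake_origin` function are hypothetical stand-ins for the CDN and for the "fetch from S3 and resize" step.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy stand-in for a CDN: serve from cache, or ask the origin on a miss."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self._origin = origin
        self._store: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many times the origin had to do real work

    def get(self, path: str) -> bytes:
        if path not in self._store:
            # Cache miss: forward to the origin, then remember the result.
            self.origin_hits += 1
            self._store[path] = self._origin(path)
        return self._store[path]


def fake_origin(path: str) -> bytes:
    # Placeholder for fetching the original and resizing it.
    return f"processed:{path}".encode("utf-8")


cdn = EdgeCache(fake_origin)
cdn.get("/fill/300/200/abc123")  # first request: processed by the origin
cdn.get("/fill/300/200/abc123")  # repeat request: served from cache
cdn.get("/fill/100/100/abc123")  # new dimensions: processed again
print(cdn.origin_hits)
```

Only the first request for each distinct URL reaches the origin, which is why the extra latency is paid once per variant rather than once per page view.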

Challenges I ran into

On-the-fly image processing comes with security concerns: the ability to generate thumbnails dynamically through an API should only be allowed when authorized. We don't want just any random schmoe fiddling with our Firefly endpoint, since every new combination of parameters results in useless processing and unnecessary caching.

At the time we began implementation, the Refile Sinatra app, on which our web service is based, did not handle security concerns. So what did we do? We went ahead and implemented it ourselves and submitted a pull request of our work back to the Refile project, which was recently merged. The approach forces clients to generate a token based on the request parameters and a shared secret key. Now our application can generate secure URLs for our processed images.
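As a rough illustration of a token derived from the request parameters and a shared secret, here is a sketch using an HMAC in Python. The choice of HMAC-SHA256, the secret value, and the `sign`, `signed_path`, and `verify` helpers are assumptions for this example; the scheme actually merged into Refile may differ in its details.

```python
import hashlib
import hmac

# Hypothetical shared secret, known to both the web app and the image service.
SECRET_KEY = b"change-me"


def sign(path: str, secret: bytes = SECRET_KEY) -> str:
    """Return a hex token derived from the request path and the shared secret."""
    return hmac.new(secret, path.encode("utf-8"), hashlib.sha256).hexdigest()


def signed_path(path: str, secret: bytes = SECRET_KEY) -> str:
    """Prefix the path with its token, e.g. /<token>/fill/300/200/abc123."""
    return f"/{sign(path, secret)}{path}"


def verify(token: str, path: str, secret: bytes = SECRET_KEY) -> bool:
    """Recompute the token server-side and compare in constant time."""
    expected = sign(path, secret)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


path = "/fill/300/200/abc123"
token = sign(path)
print(verify(token, path))                    # token matches the parameters
print(verify(token, "/fill/9999/9999/abc123"))  # tampered dimensions are rejected
```

Since the token covers the full path, changing the width or height invalidates it, so only a party holding the secret can request a new variant.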
