How to optimize images for faster load times with Kelly Sutton

Interviewed by Christophe Limpalair on 11/30/2015

Optimizing images can often yield some of the largest byte savings and performance improvements for your websites. This episode answers the question "How can you make images load faster?"

Kelly is the CPO at imgix, where they serve 1 billion images per day. Don't worry, we definitely go into the details of how they optimize images on the fly.

My sponsor, Linux Academy, is offering viewers unlimited access to high-quality training courses and a 24% discount.

Interview Snippets

Actions to take

Perform an audit on your current image setup:
  • Serve the correct image sizes (i.e., serving an 800x400 image in a 300x150 container is wasteful)
  • Use a CDN
  • Set cache-control headers
  • Serve the correct image formats (i.e., WebP for Chrome)

Continue to read to learn how and why.

You're the Chief Product Officer at imgix, what do you do on a day to day basis?

02:36

Kelly manages the product and everything that it touches. For a service like imgix, data is very important to them. Keeping a close eye on this data and staying informed on how current (and future) users use the product is important. It helps them make decisions on how to improve the product, and how to schedule new features.

I've also seen him contribute to their GitHub repos, which host imgix libraries for users. Of course, that usually results in feature requests and unexpected bug fixes. Kelly says this is a very important part of what they do. In fact, they just hired someone full time to manage their open source libraries.

Speaking of hiring, Kelly is employee number 10, and they are now at 20 employees.

What are some use cases for people who use imgix?

4:28

imgix is a real-time image processing solution.

What does that mean?

Technically speaking, imgix is an image proxy. It can resize, process, add filters, and change formats of images just by changing URL parameters.

They like to call it "the graphics card for the internet."

Their founder and CEO, Chris Zacharias, set out to solve an issue he discovered while working at YouTube. They were having a hard time managing so many thumbnail images, and he figured a lot of other companies had the same problem.

When it comes to images, a lot of people just serve the same image across different browsers and devices. That isn't economical at all, but building a system to fix it can be even more painful.

When it comes to ecommerce, longer page load times directly affects sales. Every extra second of load time decreases conversion rates by about 7%. [Source]

If conversion rates aren't enough to convince you, serving lighter images should reduce your CDN bills because you are using less bandwidth.

We'll explain how to optimize images in just a moment. First, let's take a look at how imgix works.

What kind of scale are you running at?

8:10

They serve about 1 billion images per day.

Another important number they watch is revenue. Kelly says they want to be doing this for a long time, so a healthy business is a must.

When was imgix founded?

2010

What does your architecture look like?

9:34

They have a clean split between their web architecture layer (which serves the website and user dashboard) and their image proxy layer. The former is very small compared to the latter.

The image proxy layer is designed to sit between the origin and CDN, and process images in real-time.

They use Mac Pros to handle graphics processing. Yes, they do run on their own hardware. The data center is down in Santa Clara.


(Architecture diagram: see the original post.)

They also use a number of open source tools (the original post lists them).

There is also a decent amount of custom built code. All of the graphics processing is built in C, Objective-C, and Lua.

The reason is the sheer volume of images they process every day: the pipeline needs to be lightning fast and as smart as possible, and these low-level languages give them the access they need.

What CDN do you use?

12:18

They can work with any CDN (what are CDNs, and which one should I use?), but by default they work with Fastly.

What's the origin you mentioned?

11:55

The origin is where the images live. This can be whatever you want, like S3 for example.

Since imgix is a proxy, they don't need to host the original images.

They also have layers of cache to make sure they only go back to the origin when they have to.

They do this by respecting cache-control headers, for example. That way, you can lower your S3 costs.

What do you mean by different layers of cache?

13:06

When you make a request for an image on imgix, it hits a CDN edge node (close to where you are located). If the image isn't found in that edge node, then the request will travel to their data center, where it checks whether it has the image or not. If it doesn't have the image, it needs to process it.

When imgix processes an image, it first needs to download it. Once downloaded, the image is cached in imgix's data center, processed, and then sent to the edge node.
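Conceptually, the lookup chain works something like the sketch below. This is not imgix's actual code: the caches are plain in-memory Maps, and fetchFromOrigin/render are hypothetical helpers standing in for the real origin fetch and renderer.

```js
// A conceptual sketch of the cache hierarchy described above.
const cdnEdgeCache = new Map();
const dataCenterCache = new Map();

async function serveImage(url) {
  if (cdnEdgeCache.has(url)) return cdnEdgeCache.get(url); // 1. edge hit: fastest path
  if (dataCenterCache.has(url)) {                          // 2. data center hit
    const image = dataCenterCache.get(url);
    cdnEdgeCache.set(url, image);                          //    warm the edge on the way out
    return image;
  }
  const master = await fetchFromOrigin(url);               // 3. pull the master image from origin
  const image = await render(master, url);                 // 4. process it per the URL parameters
  dataCenterCache.set(url, image);                         //    cache so this only happens once
  cdnEdgeCache.set(url, image);
  return image;
}
```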

What does this look like in HTML? Do you simply change the img src?

14:56

All image transformations are specified by the URL parameters. For example, if you want to resize something, you'd add ?w=800&h=600 at the end of your image URL.

The neat thing here is that you only have to worry about the master image. You can upload one large image, plug your storage into imgix, and let it worry about creating the derivatives. Those derivatives never go back to your storage.
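For example (the domain and image name are placeholders; w, h, and fit are imgix URL parameters):

```html
<!-- Placeholder domain and image; w/h resize the image, fit=crop controls cropping -->
<img src="https://example.imgix.net/photo.jpg?w=800&h=600&fit=crop"
     alt="An 800x600 derivative generated on the fly from the master image">
```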

You're able to process images on the fly, and that's still more cost effective than loading a larger image?

16:23

Yes. Here's why:

Even if the image isn't cached and hasn't been processed yet, the whole process takes ~700ms. And it only has to happen once: after an image is processed and cached, it never needs to go through this again.

So the first request is the slowest one

17:12

Yes, but Kelly clarifies that the processing itself is not slow at all. They're always looking to make this faster, but traditionally the slowest link is the connection between their data center and the origin.

How did you get it down to 700ms?

17:45

The fact that they own their hardware makes a pretty big difference. They can optimize a lot: no noisy-neighbor issues, no request scheduling issues. They're in control.

If they were to run this on AWS, not only would it be more expensive, but it would also be slower.

What kinds of image processing do you perform on your hardware?

18:50

They have 83 or 84 different URL parameters that people can use. All of the operations run in constant time, O(1).

The most common ones are pretty simple, like changing the image height and width, or cropping the image.

They do have a predefined order of operations: they will resize an image before changing its format, for example. One example of a format change is WebP, an image format that currently only works in Chrome. WebP images are quite a bit smaller than PNGs and JPEGs, so it's really nice to serve that format when possible.

Having this kind of processing in place makes it possible.
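So a single URL can combine a resize with a format change. If I remember imgix's URL API correctly, fm selects the output format (placeholder domain and image):

```
https://example.imgix.net/photo.jpg?w=600&fm=webp
```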

Why Apple servers? Is it that much better hardware/software for this kind of work?

20:22

The hardware and software combination does make a difference.

They've been able to use Core Image, but another main advantage is that it is the most affordable way to process images at their scale.

Per dollar, they can do more image processing with Macs than if they were using Linux based systems.

If you're doing this kind of image processing on Linux, you're probably using something like ImageMagick. ImageMagick can get the job done, Kelly says, but most companies will outgrow it.

If people don't have access to image experts, what optimizations would you recommend?

21:41

1) The best thing you can do to get page weight down is to resize your images.

Just the simple act of serving an image the size of its container (i.e., if the container is 600px wide, serve a 600px-wide image) is a big optimization.
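If you're not using an image service, the srcset and sizes attributes (which browsers were starting to support at the time) let the browser pick a container-appropriate size on its own. A minimal sketch with placeholder filenames:

```html
<!-- The browser picks the smallest candidate that still fills the slot -->
<img src="photo-600.jpg"
     srcset="photo-300.jpg 300w, photo-600.jpg 600w, photo-1200.jpg 1200w"
     sizes="(max-width: 600px) 100vw, 600px"
     alt="A responsive image sized to its container">
```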

Even if you don't load an image sized to a container, you should still optimize (resize & change quality) the original image before uploading it. This is easily done with software.

You could even do it in Photoshop via File -> Save for Web and choosing a lower quality setting, though that's fairly manual. For a faster method, you can use gulp packages.

These packages compress and strip out unnecessary meta data, slimming down the image weight.
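A minimal gulpfile sketch using the gulp-imagemin plugin (the paths are placeholders):

```js
// gulpfile.js -- compress images and strip metadata with gulp-imagemin
var gulp = require('gulp');
var imagemin = require('gulp-imagemin');

gulp.task('images', function () {
  return gulp.src('src/images/*')
    .pipe(imagemin())               // lossless compression + metadata stripping by default
    .pipe(gulp.dest('dist/images'));
});
```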

Other alternatives: TinyPNG and TinyJPEG

2) Use a CDN
Just like we talked about in the previous episode of ScaleYourCode with Max Schnur: Ilya Grigorik wrote a book called High Performance Browser Networking, which covers the challenges that latency introduces.

Bandwidth isn't necessarily the problem anymore, latency is. How fast can you get the first byte? Reducing that time is key to loading images and pages faster. CDNs are the solution.

3) Set cache-control headers
The number of websites that don't do this is shocking. Big optimization here.

One of my posts explains how to set cache headers in S3.
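For example, with the AWS SDK for JavaScript you can set the header when you upload the object. A sketch, with placeholder bucket and key names:

```js
// Upload an image to S3 with a long-lived Cache-Control header
var AWS = require('aws-sdk');
var fs = require('fs');

var s3 = new AWS.S3();
s3.putObject({
  Bucket: 'my-image-bucket',               // placeholder bucket
  Key: 'photos/hero.jpg',                  // placeholder key
  Body: fs.readFileSync('hero.jpg'),
  ContentType: 'image/jpeg',
  CacheControl: 'public, max-age=31536000' // let CDNs and browsers cache it for a year
}, function (err) {
  if (err) console.error('upload failed:', err);
});
```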

Linux Academy is offering viewers a discount for their high-quality training courses


With close to 2,000 videos and real servers to complete hands-on labs, Linux Academy offers courses on AWS, DevOps, Linux, and OpenStack.

These courses are great for both individuals and teams. They serve companies like Rackspace and Linode, and schools like Clemson University and The University of Arizona.

Linux Academy is a trusted partner of The Linux Foundation, OpenStack training, AWS, CompTIA, and more.

They're the real deal. I know because I'm learning from their courses.

How are you going to reduce latency between your data center and image origins?

25:49

Just to reiterate, the origin is where a customer stores their images.

Kelly says that there are technical and human solutions to this.

The technical solution involves improving the origin cache (which pulls the image from origin and stores it in imgix's data center), basically making sure the origin is hit as little as possible.

The human components on that are mainly around education and documentation. For example, customers need to understand that having their origin closer to imgix's Santa Clara data center will reduce that latency for initial fetch times.

The other tip for S3 users: S3 buckets default to the Virginia region. This means imgix has to send and receive packets from across the country. On top of that, default-region buckets don't have read-after-write consistency, so you can't give S3 a file and reliably ask for it right away: S3 will sometimes tell you that it doesn't have the file yet.

If one of your users uploads a file and you want to display it immediately, imgix will ask S3 for the file and S3 may say it doesn't have it.

Other S3 regions don't have this issue. Knowing these kinds of tips can make a difference.

What other technical issues do you have right now?

28:15

They have just recently made a big push on the technical front by adding plenty of capacity, so they don't have many technical issues at the moment. They're in a cleaning up phase to prepare for the next push.

The extra capacity is crucial because they must be prepared for customers running into heavy traffic spikes.

Now, that's not to say they can't optimize. Since they're able to monitor every part of the system, they can see which parts should be faster and they can figure out how to optimize those parts.

How do you find bottlenecks? What do you use to monitor?

29:55

There are different tools for different parts of the chain. One of the tools is Prometheus.

They also use Graphite and Riemann to display charts in the office. That way, they can keep a close eye on things.

If you are interested in learning more about Riemann and monitoring, check out the episode with James Turnbull.

Whenever Kelly is doing Rails work, he likes to use Skylight.

Instead of specifying parameters manually, you have an Auto Responsive Images JavaScript Library

32:07

imgix.js is a client-side library that you can simply include in your page(s). It makes sure all of your images are served at the right size for their containers, even when the window resizes.
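Usage is roughly this; I'm sketching from memory of the imgix.js docs at the time, so treat the class name, option name, and script path as assumptions and check the docs:

```html
<!-- Images get a data-src and the imgix-fluid class instead of a src -->
<img class="imgix-fluid" data-src="https://example.imgix.net/photo.jpg">

<script src="imgix.min.js"></script>
<script>
  // Request derivatives sized to each container,
  // and re-request when the window resizes.
  imgix.fluid({ updateOnResize: true });
</script>
```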

They like to think of it as the level of control over images that they wish browsers offered natively.

They actually just saw something launch last month called Client Hints, which is a step in that direction.

What are Client Hints?

33:28

Chrome, if you tell it to, will include extra information in the request headers for images: specifically, the device pixel ratio of the client, the width of the container (if it can be computed), and the width of the viewport.

imgix can take that information and compare it to what a customer is requesting with URL parameters. Depending on configuration, imgix can give Client Hints precedence.

You don't need imgix.js to take advantage of Client Hints, you just need to change the URLs a little bit.
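Opting in looks roughly like this (DPR, Width, and Viewport-Width are the hint names from the spec at the time; the domain is a placeholder):

```html
<!-- Tell Chrome to send client hints with subsequent requests -->
<meta http-equiv="Accept-CH" content="DPR, Width, Viewport-Width">

<!-- With a sizes attribute, the browser can compute and send the Width hint -->
<img src="https://example.imgix.net/photo.jpg" sizes="50vw">
```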

What is Automatic Content Negotiation?

35:16

This is similar to what we just talked about, but for file formats.

If you want, you can serve the most optimized file format depending on the browser. We briefly talked about this with Chrome.

Most images on the Internet are JPEGs and PNGs, which are both old formats. There have been many advances in compression since then. Just by switching to WebP, you could see significant reductions in size with no noticeable difference in quality. At the moment, Chrome is the only browser that supports WebP.

Other browsers have their own file format they're rolling with.
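If I remember imgix's URL API correctly, automatic content negotiation is exposed through the auto parameter, so a single URL can serve WebP to Chrome and JPEG elsewhere (placeholder domain and image):

```
https://example.imgix.net/photo.jpg?w=600&auto=format
```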

What if we implemented this with JavaScript?

36:54

The problem with this approach is that you have to wait for JavaScript to execute before you can begin image requests.

Plus, browsers speculatively fetch images as soon as they see the markup, so the wrong image may already be downloading by the time JavaScript swaps in the right src attributes.

When images take a while to load, how can we make the page load feel fast?

39:30

It really depends on the scenario.

If you're loading a very high resolution image in a lightbox (like a popup window), there's probably going to be a navigation step immediately before that. So instead of showing the high res image in the thumbnail, show a much smaller and more optimized version.

If you don't, you'll have that really slow load from top to bottom.

What about loading the dominant colors of an image and using that as a placeholder while images load? (Pinterest and Medium do this)

41:28

You can crank up the pixelation of an image to the point where it doesn't look anything like the original, but it keeps the predominant colors and works as a placeholder. Once the main image is ready, swap it in. This is easier on the eyes and makes the page feel fast.

(The original post includes example images demonstrating this.)

You could also pull out the dominant color of an image and fill the image container with it (Pinterest does this), then swap in the real image once it's done loading.
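A minimal sketch of the swap (the hex value stands in for a dominant color you'd extract ahead of time, and the URL and class name are placeholders):

```js
// Fill the container with the image's dominant color, then swap in
// the real image once it has finished loading.
var container = document.querySelector('.image-container');
container.style.backgroundColor = '#6b4f3a'; // precomputed dominant color

var img = new Image();
img.onload = function () {
  container.appendChild(img);                // swap once the real image is ready
};
img.src = 'https://example.imgix.net/photo.jpg?w=600';
```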

Why did you design your own racks for Mac minis and Mac Pros?

46:00

Mac minis and Pros are rarely seen in a server environment, so they partnered with a company called Racklive to design custom racks for them.


(Photos of the custom racks are in the original post.)

What's the development process like at imgix?

47:28

They embrace GitHub flow.

Every feature gets its own branch, and is merged to master after completion. Master can be deployed at any time. A deploy should be frictionless, painless, and safe. Those are the goals they strive for.

They're also big on continuous integration, and they use code coverage (how much of the code is exercised by tests) as a yardstick for how protected they are. Tests won't cover you 100% of the time, but they do give you the ability to ship code with more confidence.

What do you, and what does imgix as a company, look for when hiring?

49:50

When they are hiring 'general' developers (as in, not image-specific experts), Open Source contributions are pretty important. Why? Because it's an "easy" way to tell how the applicant works.

How does the developer behave online? Are they respectful, helpful, fast to solve issues?

After that, it's a pretty standard hiring process. Do we get along with this person? Are they smart? Can they succinctly describe what they do? When they give references, what do those references think of them?

"I would say it's a pretty simple hiring process."

How to get in touch with Kelly?

kelly at imgix dot com
imgix.com
imgix blog
@kellysutton


How did this interview help you?

If you learned anything from this interview, please thank our guest for their time.

Oh, and help your followers learn by clicking the Tweet button below :)