Large scale image processing on the fly in 25ms with Google's first Network Engineer
Interviewed by Christophe Limpalair on 02/02/2016
We've talked about image optimization before, but this episode is different. First of all, Jack Levin was Google's very first Network Engineer and hearing his stories on that alone is worth it. Second, he's built an impressive cloud engine which he says can process images in 25ms. This speed, obviously, has a lot of benefits, but how is it possible? That's the main topic of this episode.
As you'll see, this engine actually isn't even a service! It's an AMI (Amazon Machine Image) you can launch on AWS EC2 machines. So on top of the technical details, we can also learn a valuable business lesson. Overall, a super interesting interview.
Can you tell us about your background? You worked at Google as the first network engineer?
Yes, I moved into the area in 1997 after graduating from the University of Missouri in Columbia. Then, I worked with a company for a year that quickly closed because they ran out of funding.
I met Sergey (Google) at a job fair at Stanford. He told me they were building a great search engine. I tried it out and it worked really well and fast. I was interviewed by Larry and Sergey, and I joined the Google team as employee number 21.
On my first day of work, Larry told me to take the pile of "stuff" in the corner to the data center, install it, and make sure it worked well. As he walked out he told me to not forget to install the 2,000 servers that were being delivered the next day. "The rest is history." I really enjoyed working there.
Did you have a lot of experience before working there?
I had a lot of IT experience in networking...VPNs, NFS, large storage and sharing units...switching and routing...
"It was a wild ride." As I installed all those servers at Google, I applied all my previous knowledge and interests, but had to learn a lot as I went. For the next six years, I built data centers and infrastructure. When I left Google, there were eight or nine thousand people working there, so the company had changed a lot.
Did you found ImageShack after leaving Google?
Yes, my brother said he wanted to do this image hosting thing. I agreed to try it and the success was almost immediate. We literally launched the first server and started making money right away. "We were so spoiled by the early success."
The service was free. The money was from Google ads. Back then, the ad market was really immature so they didn't know what the cost per click should be. We were getting hundreds of clicks at 80 cents to a $1 per click. Within a few months, we were running hundreds of servers. It was really popular.
Two years passed (2007) and I thought we should turn it into a "real" company. I went to Sequoia Capital (an American venture capital firm) and showed them my user figures (25 million users a month). We were making a few million dollars a year, which was a pretty good salary for the two of us... a pretty easy salary. Two hours after meeting with them, Sequoia had the term sheets, which was amazing. We signed the deal.
Looking back, we always thought this would be a technology company, so we didn't invest in sales. It was very similar to Google in how quickly it became the thing people wanted to use.
Within the next seven years, MySpace and Facebook came up, and it became clear that people wanted to store and share their images on social networks more than on something standalone like ImageShack.
We also had lots of competitors like Photobucket, Imgur, and Flickr, and the market was very fragmented. In 2012 or 2013 it became apparent that ImageShack, as a standalone entity, would not be a billion dollar company, so we decided to shift gears into something else.
Christophe: I remember using ImageShack a long time ago and, as you say, most social media networks have their own image platform where you can just plug your photos in. This reminds me of Twitter with TwitPic which got big and then Twitter decided to create their own.
For a while, we were Yfrog, which was a competitor to TwitPic. Essentially, we were two big players for image hosting for Twitter.
You saw the imgix interview and you reached out to me to talk about your new company. Why did you contact me about this new product?
Basically, I thought our new product might be something you'd be interested in. I'm really excited about this new company. I think it's going to be really great and become a big part of the stack of the cloud infrastructure.
I have friends (former Google guys) working with imgix and I respect what they have done. Even though each of our products does image processing, we're doing things differently. They have data centers full of Macs. We're going the cloud route. We have gone to AWS and are using EC2 instances.
Our product is called Imagizer Media Engine. The biggest differentiator is that Imagizer is not a service, it's cloud software that you get to run on your own EC2.
You log in to your own AWS account, which is where your own image library is likely going to be anyway. You go to the Amazon marketplace and get Imagizer to start. You don't pay us. There is no sales process... you simply click one button and you're good to go with your own instance of Imagizer. The fee is simply included in the price you pay Amazon per hour.
Think of it as a Photoshop in the cloud, running for your apps, web site, or mobile site. You don't even need to run a CDN on top or do any caching. Once an Imagizer instance receives an image, it can transcode it into different resolutions pretty much in parallel. We can do it in 25 milliseconds or less. It doesn't matter how large the initial image is because, the way we've written our algorithms, we're actually using a lot of interesting Computer Vision stuff to decompress and transcode JPEGs in parallel on the fly.
JPEG images are actually 12-bit images. They're not RGB images, they're YUV images. YUV is not red, green and blue, so the pixels are described in a different way. RGB is described by 3 bytes per pixel. YUV is 1.5 bytes per pixel. You have luminosity, intensity, and greyscale in YUV.
It's better for compression. That's why it's used in JPEG.
We've found a way to do decompression by going natively into the YUV channels, decompressing them in parallel, and transcoding. Most of the libraries out there actually take a JPEG image and upconvert it to a 24-bit image, which doubles the data in your memory and wastes a lot of CPU time. We're forgoing this process completely.
Also, we're actually using native Intel... It's a single-threaded computation: transcoding of visual media is done in parallel by using registers within the Intel architecture in a very unique way. It's written from scratch. It's written in C and Assembly. There's no ImageMagick anywhere; otherwise it obviously just would not be fast.
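The memory arithmetic behind that claim can be sketched as follows. This is only an illustration, assuming 4:2:0 chroma subsampling (a common JPEG layout that works out to 12 bits per pixel) and a hypothetical 4000 x 3000 image; the function names are made up for this example and are not part of Imagizer.

```python
def rgb_bytes(width, height):
    """Bytes for a decoded 24-bit RGB buffer: 3 bytes per pixel."""
    return width * height * 3


def yuv420_bytes(width, height):
    """Bytes for planar YUV 4:2:0 (assumed layout): a full-resolution
    Y plane plus two chroma planes halved in each dimension."""
    chroma_w = (width + 1) // 2   # round up for odd dimensions
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


width, height = 4000, 3000  # hypothetical 12-megapixel image
rgb = rgb_bytes(width, height)
yuv = yuv420_bytes(width, height)
print(f"RGB: {rgb / 1e6:.1f} MB, YUV 4:2:0: {yuv / 1e6:.1f} MB, ratio: {rgb / yuv:.1f}x")
```

Under these assumptions the RGB buffer is twice the size of the YUV one, which matches the "double the data" figure mentioned above.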
What instance type is required to run this?
It starts with M3 medium at about 22 cents per hour. You pay Amazon. We are an Amazon partner. They pay us a cut of the software licensing fees.
So you went to Amazon and optimized this transformation for their CPUs, hardware, and machine types. Is that why you have to start with a minimum of M3?
The reason is that it's the smallest unit to have a full core for quick transcoding.
So that makes a huge difference in speed?
Absolutely. Just to give a comparison, an M3 medium will do 50 to 75 conversions per second... 25 milliseconds or less per image. If you used ImageMagick, it would be like 5 to 10 conversions per second before you would actually run out of CPU capacity. If you go to the largest EC2 instance that Amazon offers, it would cost you several dollars an hour to run it, but it would do about 2.5 thousand conversions per second. That's a lot of data. It wouldn't be limited by the CPU but by the I/O.
In other words, how quickly can the additional images come into the box? How quickly can the networking device send out all the processed images? That would be the limitation... not the CPU. That box is just that fast.
We actually tested them head to head. We would bring out really large EC2 instances. We literally would get like 50 milliseconds per image converted when the box is fully loaded. Usually it's about 2.5 thousand per second.
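As a rough back-of-the-envelope check on those figures, the sketch below turns the quoted M3 medium price (about 22 cents per hour) and throughput (50 to 75 conversions per second) into a cost per million conversions. It assumes the instance is kept fully busy for the whole hour, which is an idealization, and the helper names are hypothetical.

```python
def conversions_per_hour(per_second):
    """Conversions completed in one hour at a sustained per-second rate."""
    return per_second * 3600


def cost_per_million(hourly_price_usd, per_second):
    """USD cost of one million conversions, assuming a fully loaded instance."""
    return hourly_price_usd / conversions_per_hour(per_second) * 1_000_000


M3_MEDIUM_PRICE = 0.22  # USD per hour, as quoted in the interview

for rate in (50, 75):
    per_hour = conversions_per_hour(rate)
    cost = cost_per_million(M3_MEDIUM_PRICE, rate)
    print(f"{rate}/s -> {per_hour:,} conversions/hour, ${cost:.2f} per million")
```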
Did you learn about all this from ImageShack or from Google?
Essentially, ImageShack was dying under the load. It was getting like 2 million uploads a day. For every additional image, we had to produce five different copies. We were quickly running out of space. I spent time researching and found this Japanese paper that described a method of how JPEGs can actually be converted on the fly into pretty much any format.
Gaining insights from that paper and then doing some early prototypes, I was able to create a Command-Line Utility that runs pretty well. Then we realized that Command-Line Utility is great if you want to do batch operations, but we wanted to do it in real time. Besides really fast algorithms, optimization of the server level would help concurrency.
Yes, I come from an image background. I've dealt with billions of images in the past. The product Imagizer came about because a lot of people approached us because of our experience with images. They asked, "What can you do for us in terms of giving us an ability to scale images and optimize them to any format? How can we partner with you?"
That was about three years ago. We weren't at AWS yet, and our solution was to maybe give them some sort of box, a piece of hardware, maybe some service. I realized how incredibly unscalable this business was. It takes three to six months to sign the customer up. They might agree to run it or not. It was just a waste of time.
I decided to do something better...modify the engine we created for Amazon EC2 so anybody can run it on demand. That makes my company have no infrastructure. We just develop software.
We could be in Google Compute as well as other cloud providers. We use those as operating systems. We are using their infrastructure. We are passing on this dynamic flexibility to our customers.
This engine is constantly available at the marketplace, can be used for however long the customer wants, and there are no contracts.
We hope this becomes a part of the cloud stack. You have S3 or an equivalent where your image library is. Then you have this processing layer where the images can be chopped up, processed, manipulated, and optimized for different clients. Then you have your CDN, which could be EdgeCast, Akamai, Level3, or CloudFront, sitting on top and pushing the image out to the client very quickly.
As an example, is the image processed on the fly and saved in a bucket that you specify or is it always processed on the fly?
Here is how it works: Whenever there is a query to Imagizer, it will inspect its cache. If the processed image is not in the cache, it will look at a different layer of cache for the original image. If that's not in cache either, it will pull the image, put it in cache, and apply the transformation you requested. The follow-up request will hit the cache on Imagizer and retrieve the image.
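The lookup order described here can be sketched as a small two-layer cache. This is only an illustration of the flow, not Imagizer's actual code: the class name, the in-memory dictionaries, and the `fetch_origin` and `transform` callables are all assumptions made for the example.

```python
class TwoLayerCache:
    """Sketch of the described flow: check the processed-image cache,
    then the original-image cache, then pull from the origin."""

    def __init__(self, fetch_origin, transform):
        self.processed = {}               # (path, spec) -> transformed bytes
        self.originals = {}               # path -> original bytes
        self.fetch_origin = fetch_origin  # hypothetical: pulls the source image
        self.transform = transform        # hypothetical: applies the requested spec

    def get(self, path, spec):
        key = (path, spec)
        if key in self.processed:         # layer 1: already transcoded
            return self.processed[key]
        if path not in self.originals:    # layer 2 miss: pull and cache the original
            self.originals[path] = self.fetch_origin(path)
        result = self.transform(self.originals[path], spec)
        self.processed[key] = result      # follow-up requests hit layer 1
        return result


# Stand-in functions so the sketch runs without a network or an image library.
fetches = []

def fake_fetch(path):
    fetches.append(path)
    return b"original-bytes"

def fake_transform(data, spec):
    return data + b"@" + spec.encode()

cache = TwoLayerCache(fake_fetch, fake_transform)
cache.get("file.jpg", "200x200")   # miss on both layers: one origin fetch
cache.get("file.jpg", "200x200")   # served from the processed cache
cache.get("file.jpg", "100x100")   # new size, but the original is already cached
print("origin fetches:", len(fetches))
```

In this sketch, three requests for the same file cause a single origin fetch; only the new size triggers a second transformation.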
Cache will always be faster than transcoding. Transcoding actually uses CPU cycles where caching is mostly I/O from memory or SSDs. We're still able to achieve 20 to 25 milliseconds per conversion, so you don't need to store the image anywhere. You can just show it in the browser. You have your browser load up all the images...they'll show up in real time. Why would you need to store them? You can always just query more images later.
Generally there is a CDN sitting on top, so it will retrieve the processed images, load them into its own cache system, and push them around the world so people can actually query those cache boxes.
Are you talking about the machine cache... the same machine that transcoded the image will hold it in memory or on disk? Is that what you're calling the cache?
Yes, but there is also a CDN that you can put on top, like CloudFront for example. I have a customer called TonTon Malaysia (a Malaysian soap opera channel), very similar to Netflix. We're partnering with a company called MaxCDN and we're kind of managing the CDN for them as well. What's interesting is that everything TonTon has when it comes to original images lives somewhere on the East Coast. Looking at their CDN, you can see that their CDN box is in Hong Kong. 95 percent of the traffic is in Malaysia, hitting the Hong Kong box. Even though our EC2 units are launched in Virginia, the CDN retrieves all the processed images, loads them up into cache, and passes them on to the Malaysian population.
People ask us if we are providing CDN solutions. No, we're not in that layer of the stack. Imagine a cheeseburger... kind of like a stack. The bottom bun is your image library. The images are being passed into the meat. There's the secret sauce which is Imagizer. The top bun is the CDN. The whole thing makes it very tasty.
When you run Imagizer:
- You get rid of all the redundant images you don't need because you can process them on the fly. Space requirements go down and you save money.
- Images are compressed to match screen sizes of the clients. Saves bandwidth. We can achieve 60 to 70 percent compression for any image depending on the app.
How do you tell Imagizer what image you need?
Imagizer is a proxy. Let's say you have http://s3.amazonaws.com/file.jpg and you want that file to be 200 x 200 pixels. You go to Amazon AWS and spin up your Imagizer box. It comes out pointing to s3.amazonaws.com. All you need to do is say http://mybox.imagizer.com/200x200/file.jpg. As soon as you run that, the file becomes that thumbnail. It gets transcoded on the fly.
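To illustrate the proxy idea, here is a minimal sketch of how a request URL of that shape could be split into target dimensions and an origin URL. It is not Imagizer's real routing code or full URL syntax: the `parse_request` helper and the regular expression are assumptions, and only the hostnames and the `200x200/file.jpg` pattern come from the example above.

```python
import re
from urllib.parse import urlsplit

ORIGIN = "http://s3.amazonaws.com"          # origin host from the example above
SIZE_SEGMENT = re.compile(r"^(\d+)x(\d+)$")  # assumed "WIDTHxHEIGHT" path segment


def parse_request(url):
    """Split a proxy URL into (width, height, origin_url).

    Hypothetical helper: assumes the first path segment is the size
    and the remainder is the file's path on the origin.
    """
    path = urlsplit(url).path.lstrip("/")
    parts = path.split("/", 1)
    if len(parts) != 2:
        raise ValueError(f"expected /WIDTHxHEIGHT/<file>, got {url!r}")
    match = SIZE_SEGMENT.match(parts[0])
    if match is None:
        raise ValueError(f"bad size segment: {parts[0]!r}")
    width, height = int(match.group(1)), int(match.group(2))
    return width, height, f"{ORIGIN}/{parts[1]}"


print(parse_request("http://mybox.imagizer.com/200x200/file.jpg"))
```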
So you're just storing the largest resolution that you want...so you have, for example, a mobile device or retina display comes in, you can tell Imagizer which image or format and resolution you need and it spits it out to the client?
Absolutely. The coolest thing here is that you can do responsive web pages and mobile sites by simply looking at the screen size, capturing that resolution, and passing it into the URL structure. Then, Imagizer will output the files precisely in the format and screen density that you want.
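On the client side, that amounts to computing the pixel size you need and putting it in the URL. The sketch below assumes the same `WIDTHxHEIGHT` path pattern as the earlier example and a hypothetical `sized_url` helper; the way screen density is handled here (multiplying by a device pixel ratio) is an illustration, not a documented Imagizer parameter.

```python
def sized_url(host, filename, css_width, css_height, device_pixel_ratio=1.0):
    """Build a resize URL for a given layout size and screen density.

    Hypothetical helper: scales the layout size by the device pixel
    ratio so high-density screens request proportionally larger images.
    """
    width = round(css_width * device_pixel_ratio)
    height = round(css_height * device_pixel_ratio)
    return f"http://{host}/{width}x{height}/{filename}"


# Same 200 x 200 slot, standard display vs. a 2x (retina-class) display.
print(sized_url("mybox.imagizer.com", "file.jpg", 200, 200))
print(sized_url("mybox.imagizer.com", "file.jpg", 200, 200, device_pixel_ratio=2.0))
```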
It's pretty simple stuff. What's not simple is how to do it in real time without delay.
I know that now you don't have infrastructure. Will it make sense at one point to have a central location that customers can plug into instead of spinning up their own EC2 machines? Are you more interested in staying a software company without the headache of scaling infrastructure?
We want to be an image processing engine the same way that S3 is your storage cloud. We don't need any add-on on top. We actually don't want any data to go through our infrastructure.
You're not worried that some engineer will press the wrong button and all the images go down. Stuff like that happens.
We're not locking you in through our infrastructure. Just use your own cloud for as long as you want and run as many instances as you want.
A large company can quickly spin up a hundred instances and do a million queries per second. We don't have to worry about running a data center at all. Let's say they run it for an hour, finish, and then shut it off. They just spent about a thousand dollars.
You mentioned using the Google Cloud Compute Engine or Azure, or Digital Ocean or software. Do you plan on adding that on in the future?
We want to be in every cloud. We're treating cloud providers as operating systems essentially... like a cloud operating system where you can bring your own on-demand resources to life.
We want people to have choices ... they can run Imagizer anywhere they want. Even if your images are stored at Google, you can still use Imagizer and Amazon. The data input into Amazon is free. You pay for the data output of course, but by then your images are appropriately sized and would be passed onto the CDN anyway. You would do it the same way with Google except your images are optimized.
How do you measure this kind of performance? What kinds of tools measure the beginning and end of processing?
Testing: Basically, we use three tools:
Siege is an open source utility that can concurrently query another server. We bring up a really powerful instance (32 cores). We'll bring up a tester box in the same zone at AWS. We load up a list of all the URLs we want to test onto Imagizer. We attack this Imagizer instance with this Siege box. They're kind of virtual boxes sitting next to each other and we just slam and measure what's going on at the Imagizer level.
Httperf. There you can actually feed it your access log. That's really useful when you're qualifying potential clients. Usually we would ask, "Let's see if we could actually improve the delivery times for your apps." Give us your access logs... like a million entries. We would play back this access log on Imagizer. We would present measurements in the form of a graph with milliseconds per unit, how much bandwidth you've saved...
ab (ApacheBench). Pretty standard. You can do a lot of things with this.
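The access-log replay idea can be sketched as a small preprocessing step: extract the requested paths from a log and turn them into a URL list that a load tester can consume (Siege, for instance, can read URLs from a file). The log lines, the regular expression, and the `urls_from_access_log` helper below are assumptions for illustration; real logs vary in format.

```python
import re

# Matches the request part of a common/combined-format log line, e.g. "GET /path HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def urls_from_access_log(lines, base_url):
    """Turn access-log lines into full URLs against a test instance.

    Hypothetical helper: keeps only GET/HEAD requests, since those are
    the ones an image proxy would serve.
    """
    base = base_url.rstrip("/")
    urls = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            urls.append(base + match.group(1))
    return urls


sample_log = [  # made-up entries for the sketch
    '203.0.113.7 - - [02/Feb/2016:10:00:00 +0000] "GET /200x200/file.jpg HTTP/1.1" 200 5120',
    '203.0.113.8 - - [02/Feb/2016:10:00:01 +0000] "POST /upload HTTP/1.1" 201 0',
]

for url in urls_from_access_log(sample_log, "http://mybox.imagizer.com/"):
    print(url)
```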
If you have someone that is interested in learning more about images as you have, how would you recommend they get started?
I would recommend looking at OpenCV guides. That library is really well designed. It can really introduce you to vector math and linear algebra so you can really understand what is going on at the pixel level. Once you understand how OpenCV works, you can jump into pixel-by-pixel processing. That is just straight C. You will already know that here's an OpenCV function that does matrix multiplication. Then, you can ask, "How can I do it better just using straight C or maybe even Assembly?"
In OpenCV, play around with things like facial recognition, histogram analysis, and play with color spectrums.
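As a taste of what pixel-by-pixel processing looks like underneath a library call, here is a small sketch of a brightness histogram written without OpenCV. It uses an integer approximation of the standard BT.601 luma weights (0.299, 0.587, 0.114); the tiny hard-coded pixel list and the function names are just for illustration.

```python
def luma(r, g, b):
    """Approximate brightness of an RGB pixel using BT.601 weights, in 0..255."""
    return (299 * r + 587 * g + 114 * b) // 1000


def luma_histogram(pixels, bins=4):
    """Count how many pixels fall into each of `bins` equal brightness ranges."""
    counts = [0] * bins
    for r, g, b in pixels:
        counts[luma(r, g, b) * bins // 256] += 1
    return counts


# Four sample pixels: black, pure red, pure green, white.
pixels = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
print(luma_histogram(pixels))
```

A library like OpenCV provides optimized versions of this kind of loop; writing it by hand once makes it clearer what those functions are doing.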
Christophe: There seem to be a lot of interesting and fun use cases for that kind of operation. Image recognition is super interesting to me.
You were at CES announcing the product, weren't you? Did you see some crazy things?
We saw a quadcopter you can fly in. You get in and use your phone to enter your destination and it will take you there.
That goes back to recognition and autopilots for vehicles, recognizing objects, animals, people, roads. That's huge isn't it?
Yes, in Computer Vision, I have seen flying wings that can dodge trees. Real time object detection and avoidance is really big.
In order to have a quadcopter hover in place, they're using a thing called Optical Flow which is essentially a camera looking down. If you look at the mouse pointer, it's using something similar to Optical Flow. The mouse pointer uses the reflection of the table.
The quadcopter has a camera facing down that is mapping and recognizing whether it's moving at all with respect to the ground. To hover, it's giving its controller inputs on how to stay in place. Optical Flow is one of the Computer Vision things that is really interesting.
Are you setting up Nventify Inc to be ready for these kinds of recognition? You're really intimate with image processing and recognition. Could that potentially be part of your future?
I'm mostly interested in making people's websites and apps faster. There's really a need for that now. Everyone wants to build an app.
We're taking advantage of all the benefits of the cloud infrastructure. It's an untraditional approach. We're a puristic company that's building software specifically for the cloud. We don't have some modified thing that runs somewhere else and that also runs on the cloud.
Where did you get this great interest in performance? Was it the culture at Google or has it progressively grown by being a user and seeing the need for really fast websites or really fast apps?
It's both. At Google, I worked closely with Larry Page, who was adamant about providing a beautiful, fast user experience. That's what brought a lot of customers to Google initially. He was big on quality and infected me with his enthusiasm.
From the early days of my professional life, I was always looking for things to run faster. It was a personal challenge for me to see if it was possible to develop something like image processing at insanely fast speeds.
Time is money: Looking at the stats, if your site is delayed by 1 second, you are going to lose about 7 percent of your conversions. That's huge. If you are making about $100,000 per day as a large company, that's like 2 or 3 million dollars of loss yearly just because your site is slow by 1 second.
We wanted to contribute to people's success. We can give them Imagizer to use on demand so they can focus on the product's key features rather than trying to understand how to resize all those images, building all those pipelines, worrying about storage and hard drives...
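The estimate above works out as follows. This is a simple sketch using the figures quoted in the interview, under the simplifying assumption that revenue drops in direct proportion to lost conversions.

```python
daily_revenue_usd = 100_000   # figure quoted in the interview
loss_percent = 7              # quoted loss for a 1-second delay
days_per_year = 365

# Assumes revenue scales linearly with conversions.
daily_loss = daily_revenue_usd * loss_percent / 100
yearly_loss = daily_loss * days_per_year

print(f"Daily loss: ${daily_loss:,.0f}")
print(f"Yearly loss: ${yearly_loss:,.0f}")
```

That lands at roughly 2.5 million dollars a year, consistent with the "2 or 3 million" range mentioned.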
A good analogy is that if you want to build cars that fly, why would you design a factory to build wheels? You could buy the best tires out there already. You could remain focused on the ion drive that's going to make the car fly.
That's the message to people out there. Everyone wants to build something interesting and new and different. We don't want to build a clone of Instagram that's insanely fast or a Facebook clone... We want to build something amazing, so we feel like we can bring those people closer to their goals. That's really exciting to me.
If you could give advice to anyone that's trying to start a career or take their established career to the next level, what would you say to them?
Interesting question...I would say to them, "Work your ass off." There are a lot of really smart people...
My message to them: "Do you want to be successful? Do you want to understand how the technology world works?" It actually takes a lot of effort. It takes a lot of self motivation to apply yourself to learn things that you otherwise would not know. I would say, "Don't be discouraged by difficult things. Work your ass off and you're going to just be better every month."
When you first started your career, did you see the work at Google and ImageShack as really interesting or did you work on it because you thought it was going to be big?
I think it was the first thing you said. For me, it was basically a challenge. I had no idea Google would be so successful. In fact, I thought I would give it six months and see how it would go because, I figured there were plenty of jobs out there. I figured they only had 15 to 20 people and they really needed help. I thought I could get busy and help them out and maybe learn something new. At that time, Google was just a bunch of Stanford students. For me, it was a personal challenge. "Can I help the company become something amazing?" It did.
When I came to ImageShack, it was like shifting gears. At Google, I had all the help I needed: project managers, all kinds of people out there... eight thousand people... the challenge changed to how to make an impact in a company that large. With thousands of employees, I felt less and less impactful than when I was one of twenty people. So when the ImageShack opportunity came along I was like, "Yeah! This is going to be an interesting project." I thought, "This definitely has some scaling challenges, so I'm just going to jump in and work my ass off, get this thing off the ground, and see what it will become."
Christophe: It obviously worked really well for you and I'm glad. Jack, I really appreciate your time and coming on the show today, and thanks for reaching out to me. I really enjoyed that you had seen that other interview, and that you said you had a new product that could do it in 25 milliseconds. That's what got me! I said to myself, "Hey, that's really fast, how the heck do they do that?" Thank you so much for sharing with us.
If people want to reach out to you, or check out your products under Nventify Inc or Imagizer, how do you recommend they do that?
Nventify.com; my personal email is Jack at nventify dot com. You can also find me on LinkedIn. If you have any interesting projects out there that you want my feedback on, I will provide feedback. If you want to connect to an investor community, by all means, reach out to me, I'll try to hook you guys up.
Christophe: That's right! We didn't talk about that. You are an angel investor.
If you Google Jack Levin, you can easily find that information.
Are you on Twitter?
Yes, @nventify (Nventify Incorporated) is my Twitter account.