So let's say you just bought a new Boxee Box or Apple TV, and you have a lot of videos you want to watch on it. You need to process those videos into a device-friendly format, and you’re thinking about building or buying a dedicated machine to handle the job. What do you buy?
Not surprisingly, the answer depends pretty heavily on what you're trying to optimize for.
Optimize for peak throughput. For example, a TV show airs, you record it to your PC, and you want it ready to stream or transfer to any number of mobile or in-home devices as soon as possible.
Optimize for total throughput. You have a bunch of DVDs or Blu-rays and you want to queue them up and convert the whole batch as quickly as possible.
Optimize for energy consumption. You want a little bit of everything above, but what you don’t want to do is inflate your energy bill or consume watts like crazy over the course of the year.
And let’s just assume you’re always optimizing for cost, within reason. You need to decide roughly how much throughput you want … but going too cheap results not only in low total throughput but also in unacceptably low throughput per dollar. Going expensive will net you higher peak throughput, but higher total throughput might be better served by extra machines, not just a single beefy machine. We’re not building supercomputers here. (Yet!)
Peak throughput – Your best bet here is a single processor system with a lot of cores (probably around 4), clocked as high as possible. A lot of modern encoding systems can use multiple cores, but simply adding cores doesn't scale indefinitely.
The most economical single processor systems for this scenario tend to be Core i7 based. Even though AMD’s processors are generally a good bang for your buck, Core i7 systems actually have a leg up over AMD's processors in this case because they are:
1. Easily overclockable
2. Especially efficient at encoding videos
3. Hyperthreaded (The extra logical processors actually help significantly here … on the order of 10 to 20%)
GPU-based encoding solutions – Video encoding, falling squarely in the parallelizable camp of problem sets, is often held up as a poster child for GPU-accelerated software. In practice, however, these encoders are poorly optimized compared to general purpose encoders. They tend to yield interesting speed increases at the expense of quality per bit and all of the flexibility that traditional software provides. In short, academically interesting, but not particularly practical.
For reference, I took a 1080p reference video and encoded it to a 720p .m4v using the High Profile settings in HandBrake. The system had a 3.06GHz Core i7 with hyperthreading enabled (4 cores, 8 logical processors). The source video was 134 min., and the encode itself took 103 min. Faster than real-time is always good! I don't want to overload this post with stats, but that should give you a feel for the scale that we are talking about.
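To put a number on "faster than real-time", here is a minimal sketch of the ratio using the two durations quoted above. The function name is made up for illustration.

```python
def realtime_factor(source_minutes: float, encode_minutes: float) -> float:
    """Minutes of source video encoded per minute of wall-clock time."""
    return source_minutes / encode_minutes


# Figures from the test above: 134 min of source encoded in 103 min.
factor = realtime_factor(134, 103)
print(f"{factor:.2f}x real-time")  # about 1.30x
```

Anything above 1.0 means the encode finishes before the video would have finished playing.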
If you want to spend more money, you can always buy a 12-core Mac Pro or build the equivalent. But benchmarks show that jumping from four to twelve cores here doesn't make the encodes three times faster … only 1.42 times faster. So you'll have to decide whether the extra money is worth the diminishing returns.
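One way to read that benchmark figure is as a fraction of ideal (linear) scaling. This is only a sketch of the arithmetic, with a hypothetical helper name, not a model of how the encoder behaves internally.

```python
def scaling_efficiency(core_ratio: float, speedup: float) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / core_ratio


# Going from 4 to 12 cores is 3x the cores for a reported 1.42x speedup.
eff = scaling_efficiency(12 / 4, 1.42)
print(f"{eff:.0%} of linear scaling")  # roughly 47%
```

In other words, less than half of the extra silicon shows up as extra speed on a single encode.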
Total throughput – OK, so we established above that, past a certain point, extra cores stop helping as much with the performance of a single encode. HandBrake scales pretty well up to 4 cores, starts to diminish from 4 to 8 cores, and drops off significantly from 8 to 12 cores. In other words, a 48-core system isn’t going to help a single encode much. But what you CAN do with that many cores, assuming you have multiple files to process, is run multiple encodes at once … enough that each encode gets roughly its optimal number of cores (about 4). Sure, the encodes will bump into each other somewhat, but the total throughput/utilization of the system will be much closer to optimal than if you just ran one file at a time.
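A minimal sketch of that idea: pick a job count from the core count, then push the files through a fixed-size pool. Several things here are assumptions rather than anything from the tests above: the 4-cores-per-encode figure is taken from the scaling described in the text, the function names are hypothetical, the "High Profile" preset name depends on the HandBrake version installed, and `HandBrakeCLI` is assumed to be on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CORES_PER_ENCODE = 4  # assumption: where a single encode stops scaling well


def concurrent_jobs(total_cores: int, cores_per_encode: int = CORES_PER_ENCODE) -> int:
    """Number of encodes to run at once so each gets a useful share of cores."""
    return max(1, total_cores // cores_per_encode)


def build_command(src: Path, out_dir: Path) -> list[str]:
    """HandBrakeCLI invocation for one file (preset name is an assumption)."""
    dst = out_dir / (src.stem + ".m4v")
    return ["HandBrakeCLI", "-i", str(src), "-o", str(dst), "--preset", "High Profile"]


def encode_all(sources: list[Path], out_dir: Path, total_cores: int) -> list[int]:
    """Run the encodes through a fixed-size pool and return each exit code."""
    def run_one(src: Path) -> int:
        return subprocess.run(build_command(src, out_dir)).returncode

    # Threads are enough here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=concurrent_jobs(total_cores)) as pool:
        return list(pool.map(run_one, sources))


# Hypothetical usage on a 12-core box (directory names are placeholders):
# encode_all(sorted(Path("rips").glob("*.mkv")), Path("out"), total_cores=12)
```

Nothing here pins an encode to specific cores, so the jobs will still contend with each other; the pool only caps how many run at once.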
Unfortunately, buying more than four cores on a single system right now is just not that cheap. The multiprocessor options for adding extra cores fall into the business segment of the market, where cost increases rapidly. If your objective is not raw speed on a single system, you can get more total processing power per dollar by simply building more single-processor Core i7 systems. The problem then becomes the coordination necessary to distribute the encoding work amongst multiple machines.
Energy consumption – Now this was a topic I found interesting. It turns out that going really green with a full encoding load is kind of difficult. Computers range in green-ness all the way from really green (Mac Mini 2010) to really power hungry (Mac Pro 2010). Most desktop computers aren't designed to be that green. The best example I could come up with … the Mac Mini 2010 … still doesn't compete that favorably with a Core i7 running at 3.6 GHz if you are comparing them at full load.
For example, if you assume, living out here in California, that your cost per kWh is $0.30, then you end up with the following, based on some other tests I ran on my own machines using a 720p source as the reference (so not comparable to the other test above). The energy figures assume each machine encodes at full load around the clock.
| | Mac Mini 2010 | Homemade Core i7 |
|---|---|---|
| Purchase price | $999 ($699 base) | $699 |
| Watts under full load | 30 | 200 |
| Source hours processed per day | 11 | 50 |
| Energy per 1000 source hours (kWh) | 65 | 96 |
| Energy cost per 1000 source hours at $0.30/kWh | $19.50 | $28.80 |
So yep, the Mac Mini does pretty well on the energy efficiency side of things, but surprisingly the high-powered Core i7 holds up pretty well too, just because it's so damned good at encoding video!
Where a machine like the Mac Mini 2010 really wins out is on the idle side of things, though. So if your pipeline sits around unutilized for decent stretches, the 10-watt idle power consumption of a Mac Mini vs a desktop’s 100-150 watts adds up quickly. It's actually for this reason that I use a Mac Mini as my primary desktop and leave the high-powered machines off unless I want to do something requiring serious horsepower … like playing a video game.
It turns out, however, that you really shouldn't expect to win much back over time in terms of the dollars saved in energy costs. Encoding 1000 hours of video is a LOT, and you only make back about 9 dollars over that range, even with the high energy costs here in California. It just doesn't add up to much. And Core i7 desktops are not that expensive, especially compared to Macs! You can bet the Core i7 system is going to have a much longer useful lifetime.
In short, what I found was that, despite the desire to be energy efficient, you'd be hard pressed to make your money back in energy savings over time.
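The arithmetic behind the energy comparison can be sketched as follows. It assumes each machine runs 24 hours a day at the quoted wattage (the duty cycle is an assumption, as are the function names) and uses the $0.30 per kWh rate from above. On those assumptions the 65 and 96 in the table come out as kilowatt-hours per 1000 source hours, with the Mac Mini landing slightly above the rounded 65, and the dollar cost is that figure times the rate.

```python
PRICE_PER_KWH = 0.30  # dollars, the California rate quoted above
HOURS_PER_DAY = 24    # assumption: the machine encodes around the clock


def kwh_per_1000_source_hours(watts: float, source_hours_per_day: float) -> float:
    """Energy needed to process 1000 hours of source video at full load."""
    days = 1000 / source_hours_per_day
    return watts * HOURS_PER_DAY * days / 1000  # Wh -> kWh


def idle_cost_per_year(idle_watts: float) -> float:
    """Dollar cost of leaving a machine idling for a full year."""
    return idle_watts * 24 * 365 / 1000 * PRICE_PER_KWH


for name, watts, rate in [("Mac Mini 2010", 30, 11), ("Homemade Core i7", 200, 50)]:
    kwh = kwh_per_1000_source_hours(watts, rate)
    print(f"{name}: {kwh:.1f} kWh, ${kwh * PRICE_PER_KWH:.2f} per 1000 source hours")

for name, watts in [("Mac Mini idle", 10), ("Desktop idle (low)", 100), ("Desktop idle (high)", 150)]:
    print(f"{name}: ${idle_cost_per_year(watts):.2f} per year")
```

The idle numbers are the worst case of a machine that is never switched off, which is where the gap between 10 watts and 100-150 watts becomes a few hundred dollars a year rather than a few dollars per 1000 source hours.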
What isn't that important
RAM – You only need enough to run the encoding jobs comfortably and no more. 2GB in a pinch, 4GB is plenty. I would go with 4GB at least just so that you can actually repurpose the computer for something else if desired.
Hard drive space – You need just enough space to buffer the input and output files reasonably. Regular hard drives are fine and SSDs are overkill unless you plan on multitasking the machine heavily with other workloads. Even something like a 120GB drive is probably safe … but nowadays, reasonably cheap drives start at around 320GB for notebook drives and almost 1TB for internal desktop drives.
Network throughput – In a more complex encoding scenario involving multiple computers, network throughput becomes a significant factor as you start having to shove bits around to get them encoded on different nodes. But it's not really relevant to the current discussion.
Obviously there are quite a few factors I haven't included here for the sake of brevity, but the upshot is that most of them push you towards the single Core i7 build, whichever way you define encoding performance. What's going to be really interesting in the future is the arrival of massively cored systems. Eating the energy overhead of individual systems is not so great, and having many cores on one system could alleviate that problem as well as push the performance envelope in terms of encoding parallelization.