Friday, January 15, 2010

Avatar vs. All the Cats on the Internet

If you haven't seen Avatar yet, that is OK. No, really, it's fine, you do not have to see it, because this is a free country and we are in a recession and maybe you just aren't into that sort of thing. But lots and lots of other people have seen Avatar, and I think most of them would agree that if you are going to see it, you should see it in 3-D and on IMAX, if possible. I'm not giving away any plot points here, I'm just saying that this movie would probably look pretty weird on a 12-inch black-and-white television. When you let all the computer power that it took to make this film flourish in three stories and three dimensions, it is pretty amazing to look at.


The visual amazingness of Avatar is due in large part to the work of Weta Digital, a visual effects company that is pretty much destroying (in the awesome sense of the word) in the field of movie visuals right now. They were hired for The Lord of the Rings (which I'm sure had no small part in their growing popularity), the King Kong remake, District 9 and a handful of amazing others. A major part of producing visual effects like that is being able to hold onto them while the movie is being made, and that takes some incredible computing power.

The data storage system that held the movie as it was being shot is one of the top two hundred most powerful supercomputers in the world. Like, wow. It consists of 40,000 processors and 104 terabytes of RAM. Most desktop computers have around a gigabyte of RAM (and 104 terabytes is about 106,500 gigabytes). The final cut of the film takes up over 17 gigabytes per minute, whereas a compressed version of a two-hour, highly visual movie is usually less than ten gigabytes for the whole two hours.
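If you want to check my math (please do), here is a quick sketch in Python. It assumes 1 terabyte = 1024 gigabytes, and it uses a two-hour running time only so the comparison with the compressed movie is apples to apples. The variable names are mine, not Weta's.

```python
# Back-of-the-envelope check of the numbers quoted above.
# Assumption: binary units (1 TB = 1024 GB), which is what makes
# 104 TB come out to roughly 106,500 GB.
GB_PER_TB = 1024

ram_tb = 104
ram_gb = ram_tb * GB_PER_TB

final_cut_gb_per_min = 17   # "over 17 gigabytes per minute"
compressed_movie_gb = 10    # "less than ten gigabytes" for two hours
runtime_min = 120           # assumed two-hour running time, for comparison only

# Storage for two hours of footage at the final-cut rate
gb_for_two_hours = final_cut_gb_per_min * runtime_min

# How many compressed two-hour movies would fit in that space
ratio = gb_for_two_hours / compressed_movie_gb

print(f"RAM: {ram_gb:,} GB")
print(f"Two hours at the final-cut rate: {gb_for_two_hours:,} GB")
print(f"Roughly {ratio:.0f} times the size of a compressed movie")
```

Since the quoted figures are "over 17" and "less than ten," the real ratio would be at least this large.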

Weta Digital reports that rendering the film is what took up most of the computer power and time during the last month of production (where rendering means putting all of the data that makes up each individual frame into a computer and turning that data into an image). Those 40,000 processors were running 24/7, pumping in 7 or 8 gigabytes of data per second, which is incredible. Then again, a 75-year-old woman now has a 40 Gbit/s internet connection. Related? I don't know.

Nerdy as it might be to say this (I am writing on a physics blog) I think the growth of data storage is really interesting. NO REALLY! It's pretty amazing how fantastic we humans are at producing data. We are incredible at it. First there are all those books and things we produced when we used to write on paper, which Google is trying its hardest to chronicle - but there is also everything we produce on the internet. Think of everything on the internet! Think of all the cats! ALL THE CATS! THAT'S A LOT OF CATS!

Just look at all of them! In fact, there are so many cats (and other things) on the internet that companies like Google and eBay may soon compete with science experiments for who has the most data. That is to say, it used to be that science experiments (mostly particle physics, but also some biology, like mapping the human genome) produced far and away the most data. However small subatomic particles might be, studying them can produce incredible volumes of data. So much so that particle physicists were sort of sitting around waiting for computer scientists to catch up.

Take the Large Hadron Collider. It is expected to produce roughly 15 petabytes of data per year, collected from the very tiny explosions it will create. Most particle experiments actually toss out between 80 and 90% of the total data produced by their machines. Most of that data is pretty uninteresting, so they aren't really wasting anything, but if they wanted to, the scientists at the LHC could produce hundreds of petabytes of data a year. The capability to handle that much info all at once just doesn't exist yet.
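To get a feel for what 15 petabytes a year means, here is a rough sketch that spreads it evenly over a calendar year. That is a big assumption on my part (the machine does not run nonstop all year), and I am using decimal units here (1 PB = 10^15 bytes), so treat the result as an average, not a real operating rate.

```python
# Rough average data rate for 15 petabytes per year.
# Assumptions (mine, not CERN's): decimal units and data spread
# evenly across a 365-day year.
BYTES_PER_PB = 10**15
BYTES_PER_MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

lhc_pb_per_year = 15

bytes_per_second = lhc_pb_per_year * BYTES_PER_PB / SECONDS_PER_YEAR
mb_per_second = bytes_per_second / BYTES_PER_MB

print(f"Average rate: about {mb_per_second:.0f} MB per second")
```

Under those assumptions it comes out to a bit under half a gigabyte every second, all year long.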

So now internet companies are starting to compile these mountains of data as well. To store the data mountains, they must build databases. You can buy databases from companies like IBM, but only up to a certain size. Petabyte databases are not commercially available yet. They require too many specific guidelines to be built generically. Building an extremely large database (which is actually the "official" name that people in this industry use for petabyte databases: extremely large databases, or XLDBs) isn't as simple as, say, doubling a recipe. It's like having built a one-room house and then being asked to build a skyscraper. While some might view it as just a bunch of one-room houses stacked on top of each other, there's a bit more art to it than that.

Sifting through so much data would be daunting for one computer center, so the LHC has spread the work out over the planet. CERN's computing center has put together what's known as a tiered computing grid. The grid consists of a total of 140 computing centers in 33 countries. Tier 1 centers will filter through the rough, raw data (the portion that looks like it could be interesting and is not thrown out immediately by automated computer programs). Tier 2 will filter through the stuff the tier 1 groups pick out, and tier 3 will follow. After all that, there will hopefully be some nicely combed, ready-to-analyze packets of data. This requires some good group participation by institutions around the world that want to get in on the physics taking place at the LHC. It goes to show that physics is a worldwide collaborative effort, since mostly everyone is after the same thing.
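For the programmers in the audience, the tiered idea can be sketched as a chain of filters, each one working only on what the previous one let through. Everything in this toy version is made up for illustration: the event fields, the thresholds, and the function names are hypothetical and have nothing to do with CERN's actual software or selection criteria.

```python
# A toy model of tiered filtering. All fields, thresholds, and names
# below are hypothetical, chosen only to show the shape of the idea.

def trigger(events):
    """Automated first cut: throw out events that look uninteresting."""
    return [e for e in events if e["energy"] > 10]

def tier1(events):
    """Tier 1: sift through the raw data that survived the first cut."""
    return [e for e in events if e["tracks"] >= 2]

def tier2(events):
    """Tier 2: filter what tier 1 picked out."""
    return [e for e in events if e["energy"] > 50]

def tier3(events):
    """Tier 3: final pass, leaving data ready for analysis."""
    return [e for e in events if e["tracks"] >= 4]

def run_grid(events):
    """Run events through each stage in order and count the survivors."""
    counts = {"raw": len(events)}
    stages = [("trigger", trigger), ("tier1", tier1),
              ("tier2", tier2), ("tier3", tier3)]
    for name, stage in stages:
        events = stage(events)
        counts[name] = len(events)
    return events, counts

sample = [
    {"energy": 5,   "tracks": 1},
    {"energy": 20,  "tracks": 1},
    {"energy": 30,  "tracks": 3},
    {"energy": 80,  "tracks": 2},
    {"energy": 120, "tracks": 5},
]

final_events, counts = run_grid(sample)
print(counts)
print(final_events)
```

The point of the design is that each tier sees less data than the one before it, so no single computing center has to chew through the whole pile.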

I can't think of how to wrap this up so...more cats.






1 comment:

  1. I take it this is inspired by www.rathergood.com/cats. Certainly Physics makes an entrance into the lyrics.
