Thursday, May 29, 2008

Yahoo builds two petabyte PostgreSQL database

James Hamilton writes about Yahoo's "over 2 petabyte repository of user click stream and context data with an update rate for 24 billion events per day".

It is apparently built on a modified version of PostgreSQL and runs on about 1,000 machines. In his post, James speculates on the details of the internals. Very interesting.
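
For a rough sense of scale, here is a back-of-envelope sketch in Python using only the figures quoted above. It assumes decimal petabytes, exactly 1,000 machines, and data and load spread evenly across them. None of that is confirmed by the sources, and the function names are just illustrative.

```python
# Back-of-envelope arithmetic from the figures quoted in the post.
# Assumptions (not from the sources): decimal (SI) petabytes, exactly
# 1,000 machines, and data and events spread evenly across machines.

EVENTS_PER_DAY = 24_000_000_000   # "24 billion events per day"
TOTAL_BYTES = 2 * 10**15          # "over 2 petabyte", taken as a lower bound
MACHINES = 1_000                  # "about 1k machines"
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(events_per_day=EVENTS_PER_DAY):
    """Average event rate across the whole system."""
    return events_per_day / SECONDS_PER_DAY


def bytes_per_machine(total_bytes=TOTAL_BYTES, machines=MACHINES):
    """Average data held per machine, assuming an even spread."""
    return total_bytes / machines


def events_per_machine_per_second(machines=MACHINES):
    """Average event rate seen by one machine, assuming an even spread."""
    return events_per_second() / machines


if __name__ == "__main__":
    print(f"{events_per_second():,.0f} events/s overall")
    print(f"{bytes_per_machine() / 10**12:.1f} TB per machine")
    print(f"{events_per_machine_per_second():.0f} events/s per machine")
```

Under those assumptions, that works out to roughly 278,000 events per second overall, about 2 TB per machine, and a few hundred events per second per machine on average. Peak rates would presumably be higher.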

See also Eric Lai's article in Computerworld, "Size matters: Yahoo claims 2-petabyte database is world's biggest, busiest". Note, however, that the Google Bigtable paper from 2006 says Bigtable handles "petabytes of data", so the Yahoo claim may depend on what you consider a database.
