[Talk] Hadoop

Hadoop (http://hadoop.apache.org/), an open source volunteer project under
the Apache Software Foundation, is a framework for running applications on
large clusters built of commodity hardware. It lets one easily write and run
applications that process vast amounts of data (terabytes to petabytes).

Hadoop implements a computational paradigm named Map-Reduce, in which the
application is divided into many small fragments of work, each of which may
be executed or re-executed on any node in the cluster. In addition, it
provides a distributed file system (HDFS) that stores data on the compute
nodes themselves, giving very high aggregate bandwidth across the cluster.
Yahoo! is one of the main contributors to Hadoop and uses it extensively to
manage large clusters of machines.
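To make the paradigm concrete, here is a minimal single-process sketch of the Map-Reduce model using the classic word-count job. It is not Hadoop's API (real Hadoop jobs are usually written against its Java Mapper/Reducer classes); the function names `map_words`, `reduce_counts` and `run_job` are hypothetical, chosen only to show how work splits into independent map tasks, a shuffle that groups by key, and reduce tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(split: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map task: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Hypothetical reduce task: sum all partial counts for a single key."""
    return word, sum(counts)


def run_job(splits: List[str]) -> Dict[str, int]:
    # Map phase: each split is an independent fragment of work, so a real
    # framework could schedule (or re-execute after a failure) it on any node.
    intermediate = [pair for split in splits for pair in map_words(split)]

    # Shuffle phase: group all intermediate values by their key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)

    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_counts(word, counts) for word, counts in groups.items())


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(splits))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each map call depends only on its own split and each reduce call only on one key's values, a failed task can simply be rerun elsewhere; that independence is what lets the framework spread the work across a large cluster.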

I hope to engage the open-source community with Hadoop and encourage
participation in its development. I will present an overview of Hadoop and
its architecture, with a focus on the Map-Reduce component. I will describe
the engineering challenges and briefly talk about how Hadoop clusters are
used at Yahoo!.

About This Post

This page contains a single entry posted on March 13, 2008 1:44 PM.
