(Intro adapted from the wikipedia:Hadoop page, revision as of 10:42, 16 September 2011)
Apache Hadoop is a wikipedia:software framework that supports data-intensive distributed applications under a free license. "Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework." (Hadoop Overview) It enables applications to work with thousands of nodes and wikipedia:petabytes of data. Hadoop was inspired by wikipedia:Google's wikipedia:MapReduce and Google File System (GFS) papers.
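To illustrate the map/reduce paradigm described above, here is a toy word-count sketch in plain Python. This is not Hadoop's actual API (Hadoop jobs are typically written in Java against `org.apache.hadoop.mapreduce`); the function names `map_phase` and `reduce_phase` are illustrative only. It shows the core idea: input is split into small fragments, each fragment is mapped independently (and so could run, or be re-run after a failure, on any node), and a reduce step merges the intermediate results.

```python
from collections import defaultdict

def map_phase(fragment):
    # Emit (word, 1) pairs for every word in one input fragment.
    # In Hadoop, each fragment would be processed by a map task on some node.
    return [(word, 1) for word in fragment.split()]

def reduce_phase(pairs):
    # Sum the counts per word, as reducers do after the shuffle phase.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each fragment is independent, so a failed map task can simply be
# re-executed elsewhere -- the property the framework handles automatically.
fragments = ["hadoop runs map tasks", "map tasks emit pairs"]
intermediate = [pair for f in fragments for pair in map_phase(f)]
result = reduce_phase(intermediate)
```

Because the map calls share no state, the framework is free to schedule them on any node holding the data, which is what lets Hadoop scale out across large clusters.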