Hadoop
'''Apache Hadoop''' is a [[wikipedia:software framework|software framework]] that supports data-intensive [[wikipedia:distributed computing|distributed applications]] under a [[wikipedia:Free software|free license]]. "Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework." [http://wiki.apache.org/hadoop/ProjectDescription Hadoop Overview] It enables applications to work with thousands of nodes and [[wikipedia:petabytes|petabytes]] of data. Hadoop was inspired by [[wikipedia:Google|Google]]'s [[wikipedia:MapReduce|MapReduce]] and [[wikipedia:GoogleFS|Google File System]] (GFS) papers.
The MapReduce model is documented in this [http://labs.google.com/papers/mapreduce.html paper] from Google Labs.
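The map/reduce paradigm described above can be sketched with the classic word-count example, simulated locally as a Unix pipeline (the input text here is an illustrative assumption; a real Hadoop job would distribute each stage across the cluster):

```shell
#!/bin/sh
# Word count as a map/reduce pipeline, run on a single machine.
# map:     tr splits the input into one <word> record per line
# shuffle: sort brings identical keys together
# reduce:  uniq -c sums the occurrences of each key
printf 'the quick brown fox\nthe lazy dog\n' |
  tr ' ' '\n' |
  sort |
  uniq -c
```

In Hadoop proper, the map and reduce stages are user-supplied functions and the framework handles the shuffle/sort, task scheduling, and re-execution of failed fragments.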
== Requirements ==
Running Hadoop on [[Cheaha]] has two dependencies:
# Download and unpack Hadoop into your home directory.
# Reserve a mini-cluster in which to run your Hadoop cluster, as described in the [[Cheaha_GettingStarted#Hello World (mini-cluster environment)|mini-cluster example]].
Additionally, you may want to set up a VNC session to interact with your Hadoop cluster via a web browser, especially when first exploring Hadoop.
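The first dependency above can be satisfied from a login shell roughly as follows; the release version and mirror URL shown are illustrative assumptions, so substitute a current release from hadoop.apache.org:

```shell
#!/bin/sh
# Sketch of dependency 1: download and unpack Hadoop into the home directory.
# The version (0.20.2) and archive URL are assumptions, not Cheaha policy.
cd "$HOME"
wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
tar -xzf hadoop-0.20.2.tar.gz
# Hadoop is now unpacked under $HOME/hadoop-0.20.2
```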
Revision as of 17:01, 16 September 2011