Channel: Hortonworks » Knowledgebase

Hadoop Distributed File System (HDFS) Defined


The best place for a deep dive into HDFS is the HDFS Architecture page. Here we’ll take an abbreviated view of what HDFS is, and why it matters.

The Hadoop Distributed File System is the backbone of a Hadoop cluster. It provides redundant, highly available, high-throughput storage for Hadoop MapReduce. It works like this: a Hadoop cluster is a collection of ordinary commodity servers with 8-12 disks each, connected by Ethernet. Large files are stored on HDFS in large blocks, typically 64MB or 128MB (the block size is configurable and is sometimes set larger), and by default each block is replicated three times across different machines. When a file is read, one of the three machines storing a given block streams the entire block sequentially from disk to the program reading the data. This results in very high I/O performance: for streaming reads of large files, HDFS can dramatically outperform SAN and NAS systems.
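To make the block and replication arithmetic concrete, here is a minimal sketch in Python. It is not part of Hadoop: the function names, the 128MB block size, and the 1000MB file are assumptions chosen for illustration. It relies on one fact about HDFS, namely that the last block of a file only takes up as much space as the data it holds.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size in bytes of each block a file would be split into.

    Hypothetical helper: every block is full except possibly the last one,
    which only holds the leftover bytes.
    """
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes

def raw_storage(file_size, replication=3):
    """Bytes consumed across the cluster when every block is stored `replication` times."""
    return file_size * replication

blocks = split_into_blocks(1000 * MB)
print(len(blocks))                      # number of blocks for a 1000MB file
print(blocks[-1] // MB)                 # size of the final, partial block in MB
print(raw_storage(1000 * MB) // MB)     # raw disk used with three replicas, in MB
```

With these assumed numbers, a 1000MB file becomes seven full 128MB blocks plus one 104MB block, and triple replication consumes 3000MB of raw disk across the cluster.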

Once files are stored in HDFS, they are tracked by the NameNode. The NameNode(s) are the head of HDFS: they record which blocks make up which files and which machines hold each block. If one of the three machines holding a given block goes down with a hardware failure or disk corruption, the block is automatically copied from one of the remaining two nodes to a new third node. HDFS is therefore self-healing.
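The self-healing behavior can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how the NameNode is implemented: the block map is a plain dictionary, node names such as `n1` are made up, and new targets are chosen alphabetically, whereas real HDFS placement also weighs factors such as rack layout and free space.

```python
def re_replicate(block_locations, live_nodes, replication=3):
    """Return a new block map with replicas restored after node failures.

    block_locations: dict mapping a block id to the set of nodes holding it.
    live_nodes: set of nodes that are still healthy.
    Hypothetical toy model; target selection is alphabetical for determinism.
    """
    healed = {}
    for block, holders in block_locations.items():
        # Replicas on dead nodes no longer count.
        survivors = {node for node in holders if node in live_nodes}
        candidates = sorted(live_nodes - survivors)
        # A copy needs at least one surviving replica to read from.
        while survivors and len(survivors) < replication and candidates:
            survivors.add(candidates.pop(0))
        healed[block] = survivors
    return healed

before = {
    "blk_1": {"n1", "n2", "n3"},
    "blk_2": {"n2", "n3", "n4"},
}
# Node n1 has failed; n5 is a healthy node with spare capacity.
after = re_replicate(before, live_nodes={"n2", "n3", "n4", "n5"})
print(sorted(after["blk_1"]))
print(sorted(after["blk_2"]))
```

In this run `blk_1` loses its replica on `n1` and gains one on another live node, while `blk_2` is untouched because all three of its holders are still alive. A block whose holders have all failed stays empty in this model, since there is nothing left to copy from.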

When data is being read by Hadoop MapReduce, triple replication is again helpful. Using the block locations supplied by the NameNode, Hadoop will try to keep data reads and transfers ‘local’ by running work on, or close to, one of the three nodes where a given block of data is stored. Furthermore, if the task reading one block is taking a long time, a second attempt can be started on another node, possibly against another copy of the data, and the winner is whichever attempt finishes first. This is called speculative execution; it is a MapReduce feature that replication makes practical.
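Both ideas in the paragraph above can be sketched in a few lines of Python. This is a simulation under stated assumptions rather than Hadoop's scheduler: `pick_node`, `read_replica`, and `speculative_read` are hypothetical names, slow and fast nodes are faked with `time.sleep`, and the backup attempt is launched after a fixed patience interval instead of the progress-based heuristics a real scheduler would use.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def pick_node(replica_nodes, free_nodes):
    """Prefer a free node that already stores the block; otherwise use any free node.

    Returns (node, is_local). Ties are broken alphabetically for determinism.
    """
    local = sorted(set(replica_nodes) & set(free_nodes))
    if local:
        return local[0], True
    remote = sorted(free_nodes)
    return (remote[0], False) if remote else (None, False)

def read_replica(node, delay_seconds):
    """Stand-in for reading a block from `node`; the delay is simulated."""
    time.sleep(delay_seconds)
    return node

def speculative_read(primary, backup, patience_seconds):
    """Run the primary attempt; if it is still busy after `patience_seconds`,
    start a backup attempt and return whichever finishes first.

    primary and backup are (node, simulated_delay_seconds) pairs.
    """
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        attempts = [pool.submit(read_replica, *primary)]
        done, _ = wait(attempts, timeout=patience_seconds)
        if not done:
            attempts.append(pool.submit(read_replica, *backup))
            done, _ = wait(attempts, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the losing attempt; its result is simply ignored.
        pool.shutdown(wait=False)

print(pick_node({"n1", "n2", "n3"}, {"n3", "n7"}))
print(speculative_read(("node-a", 0.5), ("node-b", 0.01), patience_seconds=0.05))
```

In the first call the only free node holding a replica is `n3`, so the work is placed there. In the second, `node-a` is simulated as slow, so the backup attempt on `node-b` is started and its result is the one returned.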

These features are what make Hadoop reliable, and what allow MapReduce to work at all!

The post Hadoop Distributed File System (HDFS) Defined appeared first on Hortonworks.

