What is Hadoop?

Hadoop is an open-source Apache platform used to store, process, and analyse very large volumes of data. Hadoop is written in Java and is not an OLAP (online analytical processing) system; it is used for batch or offline processing. Facebook, Yahoo, Google, Twitter, LinkedIn, and many other sites use it. In addition, scaling the cluster only requires adding more nodes.

To learn more about the Hadoop domain, join Hadoop Training in Chennai at FITA Academy.

Modules of Hadoop

  • HDFS: The Hadoop Distributed File System module. HDFS was created after the publication of Google's GFS paper, which describes a distributed design in which files are divided into blocks and stored across nodes.
  • YARN: Yet Another Resource Negotiator, which manages the cluster and schedules jobs.
  • MapReduce: A framework that enables Java programmes to perform key-value-pair-based concurrent computations on data. The Map task transforms input data into a data set that can be computed as key-value pairs. The Reduce task takes the output of the Map task as its input, and the reducer's output produces the required result. A minimal word-count sketch follows this list.
  • Hadoop Common: Java libraries that are used by the other Hadoop modules and to launch Hadoop.
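
For illustration, the classic word-count example below shows one Map task and one Reduce task written against Hadoop's MapReduce Java API. It is only a minimal sketch: the class name, job name, and the input/output paths passed on the command line are placeholders for this example.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: turn each line of input into (word, 1) key-value pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce task: sum the counts produced by the Map task for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input path, e.g. /user/demo/input
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The Map task emits a (word, 1) pair for every token it reads, and the Reduce task sums those pairs per word, which is the key-value flow described in the module list above.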

To learn more about the layers of the Hadoop architecture and to gain information about big data, enroll in the Big Data Online Course.

Hadoop Architecture

The Hadoop architecture includes the file system, the MapReduce engine, and HDFS (the Hadoop Distributed File System). The MapReduce engine comes in two flavours: MR1 and MR2.

A Hadoop cluster is made up of a single master and numerous slave nodes. A slave node runs a DataNode and a Task Tracker, whereas the master node runs the Job Tracker, Task Tracker, NameNode, and DataNode.

Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System (HDFS) is Hadoop's distributed file system. It has a master/slave architecture in which a single NameNode serves as the master and numerous DataNodes serve as slaves.

Both the NameNode and DataNode are capable of running on commodity machines. HDFS is written in Java, so the NameNode and DataNode software can readily run on any machine that supports Java.
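
As a rough sketch of how a client interacts with HDFS through the Java FileSystem API, the snippet below writes a small file. The NameNode address and the file path are assumptions made for this example; in a real cluster the address normally comes from core-site.xml rather than being set in code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode address; usually supplied by core-site.xml (fs.defaultFS).
    conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

    FileSystem fs = FileSystem.get(conf);

    // The client asks the NameNode where to place the blocks, then streams
    // the bytes to the DataNodes that actually store them.
    Path file = new Path("/user/demo/hello.txt");  // hypothetical path
    try (FSDataOutputStream out = fs.create(file)) {
      out.writeUTF("Hello, HDFS!");
    }

    System.out.println("Exists: " + fs.exists(file));
    fs.close();
  }
}
```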

NameNode

  • There is only one master server in the HDFS cluster.
  • Because it is a single node, it can become a single point of failure.
  • It administers the file system namespace by carrying out operations such as opening, renaming, and closing files; a short sketch of these namespace operations follows this list.
  • This keeps the system's architecture simple.
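
The namespace operations mentioned above (renaming, listing, and so on) can be sketched with the same FileSystem API. The paths here are hypothetical, and the example assumes fs.defaultFS already points at the cluster's NameNode.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOpsExample {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS (e.g. in core-site.xml) points at the NameNode.
    FileSystem fs = FileSystem.get(new Configuration());

    Path oldPath = new Path("/user/demo/hello.txt");    // hypothetical paths
    Path newPath = new Path("/user/demo/greeting.txt");

    // Rename is a pure namespace operation: only NameNode metadata changes,
    // no data blocks are moved on the DataNodes.
    boolean renamed = fs.rename(oldPath, newPath);
    System.out.println("Renamed: " + renamed);

    // Listing a directory is also answered from NameNode metadata.
    for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }
  }
}
```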

DataNode

  • There are numerous DataNodes in the HDFS cluster.
  • Multiple data blocks are present in each DataNode.
  • Data is stored in these data blocks.
  • DataNodes handle read and write requests from the file system’s clients.
  • Blocks are created, deleted, and replicated as directed by the NameNode; a sketch of inspecting block placement and replication follows this list.
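
As a sketch of how blocks and replication appear to a client, the snippet below asks the NameNode, via the FileSystem API, where a file's blocks are stored and then requests a different replication factor. The file path and the replication factor of 3 are assumptions for this example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/demo/greeting.txt");  // hypothetical path

    // Ask the NameNode which DataNodes hold each block of the file.
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("Block at offset " + block.getOffset()
          + " stored on " + String.join(", ", block.getHosts()));
    }

    // Request a different replication factor; the NameNode then instructs
    // DataNodes to create or delete block replicas accordingly.
    fs.setReplication(file, (short) 3);
  }
}
```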

Job Tracker

  • The Job Tracker’s role is to accept MapReduce jobs from clients and to process the data with the help of the NameNode.
  • The NameNode responds to the Job Tracker with the requested metadata.

Task Tracker

  • It works as a slave node for the Job Tracker.
  • It receives the task and the code from the Job Tracker and applies the code to the file. This process is also known as a Mapper.

Conclusion

So far, we have covered the advantages of Hadoop, the Hadoop architecture, and its modules. FITA Academy’s Big Data Training in Coimbatore will enhance your technical skills on the Big Data platform.