Apache Hadoop, the free, open-source, Java-based programming framework, is designed to support the processing of large data sets in a distributed computing environment typically built from commodity hardware.
At its core, Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce.
Hadoop works by splitting large files into fixed-size blocks, which are distributed across the nodes of a cluster and processed in parallel.
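As a rough illustration of how a file maps onto blocks: in Hadoop 2.x and later the default HDFS block size is 128 MB (configurable via `dfs.blocksize`). The class and method names below are illustrative, not part of the Hadoop API.

```java
public class BlockSplitDemo {
    // Default HDFS block size in Hadoop 2.x+ (configurable via dfs.blocksize).
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file of the given size occupies (ceiling division);
    // the final block may be smaller than BLOCK_SIZE.
    static long numBlocks(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println(numBlocks(oneGiB));     // a 1 GiB file spans 8 blocks
        System.out.println(numBlocks(oneGiB + 1)); // one extra byte starts a 9th block
    }
}
```

Each block is then replicated (three copies by default) so that the cluster tolerates node failures without losing data.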
The base framework is made up of Hadoop Common, which contains the libraries and utilities shared by the other Hadoop modules; HDFS, a distributed file system that stores data on commodity machines; YARN, a resource-management platform that schedules work across the cluster; and MapReduce, a programming model for large-scale data processing.
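The MapReduce programming model itself can be sketched without any Hadoop dependencies: a map phase emits key-value pairs, and a reduce phase merges the values for each key. This is a minimal plain-Java sketch of the model using the classic word-count example, not the actual Hadoop `Mapper`/`Reducer` API.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for each word in an input line.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce phase: group pairs by key and sum the counts.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(
                Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        return reduce(lines.stream().flatMap(WordCountSketch::map));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("hello world", "hello hadoop"));
        System.out.println(counts); // e.g. {hadoop=1, world=1, hello=2}
    }
}
```

In real Hadoop, the same two phases run as distributed tasks: mappers execute on the nodes holding the input blocks, and the framework shuffles the emitted pairs to reducers by key.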
The HDFS and MapReduce components of Hadoop were originally inspired by Google's papers on the Google File System, published in 2003, and MapReduce, published in 2004.
Most of the Hadoop framework is written in Java, although it also includes some native code in C and command-line utilities written as shell scripts.