priti pawar
Apache Spark


Spark can run on top of the Hadoop Distributed File System (HDFS), but it does not use Hadoop MapReduce. Instead, it has its own framework for parallel data processing. Processing starts by loading data into resilient distributed datasets (RDDs), a distributed memory abstraction that lets computations run across large Spark clusters in a fault-tolerant way. Because data is kept in memory (and on disk if necessary), Apache Spark can be much faster and more flexible than Hadoop MapReduce for certain applications.
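The idea in the paragraph above can be illustrated with a toy model in plain Python. This is only a sketch of the concept, not Spark's API: the class ToyRDD and its methods (parallelize, map, cache, lose_partition, collect) are hypothetical names chosen for illustration. It assumes that a dataset is split into partitions, that each dataset remembers how to rebuild a partition from its parent, and that a cached partition lost from memory can be recomputed from that recipe.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API):
    partitioned data plus the recipe needed to rebuild each partition."""

    def __init__(self, compute_partition: Callable[[int], List[Any]],
                 num_partitions: int) -> None:
        self._compute_partition = compute_partition  # recipe for partition i
        self.num_partitions = num_partitions
        self._cache: List[Optional[List[Any]]] = [None] * num_partitions
        self._cached = False

    @classmethod
    def parallelize(cls, data: Iterable[Any], num_partitions: int) -> "ToyRDD":
        items = list(data)
        # Partition i holds every num_partitions-th element, starting at index i.
        chunks = [items[i::num_partitions] for i in range(num_partitions)]
        return cls(lambda i: list(chunks[i]), num_partitions)

    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        # Lazy: nothing is computed here, only the derivation is recorded.
        return ToyRDD(lambda i: [fn(x) for x in self.partition(i)],
                      self.num_partitions)

    def cache(self) -> "ToyRDD":
        # Ask for computed partitions to be kept in memory.
        self._cached = True
        return self

    def partition(self, i: int) -> List[Any]:
        if not self._cached:
            return self._compute_partition(i)
        if self._cache[i] is None:
            # Missing (never computed, or lost): rebuild it from the recipe.
            self._cache[i] = self._compute_partition(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a failure that wipes one in-memory partition.
        self._cache[i] = None

    def collect(self) -> List[Any]:
        out: List[Any] = []
        for i in range(self.num_partitions):
            out.extend(self.partition(i))
        return out


squares = ToyRDD.parallelize(range(1, 9), num_partitions=4).map(lambda x: x * x).cache()
total_before = sum(squares.collect())   # computes and caches all four partitions
squares.lose_partition(2)               # simulated loss of one cached partition
total_after = sum(squares.collect())    # partition 2 is rebuilt from its recipe
print(total_before, total_after)
```

In this sketch the second collect() returns the same result as the first even though one cached partition was dropped, because the lost partition is recomputed from its parent rather than restored from a replica. Real Spark adds scheduling, data distribution across machines, and spilling to disk, none of which is modeled here.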

 