Spark RDD
Resilient Distributed Datasets
A Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark: an immutable, partitioned collection of records that can be operated on in parallel.
There are two ways to create an RDD: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared file system, HDFS, HBase, or any data source offering a Hadoop InputFormat.
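The two creation paths described above can be sketched in PySpark as follows. This is a minimal illustration rather than a definitive setup: the helper names, the `local[2]` master, the application name, and the temporary `sample.txt` file are all assumptions made for the example. Only `SparkContext.parallelize` and `SparkContext.textFile` are the actual Spark calls being demonstrated.

```python
import os
import tempfile


def rdd_from_collection(sc, values, num_slices=2):
    """Distribute a collection that lives in the driver program."""
    # num_slices is the number of partitions to split the collection into.
    return sc.parallelize(values, num_slices)


def rdd_from_text_file(sc, path):
    """Reference an external dataset; each line of the file becomes one element."""
    return sc.textFile(path)


def main():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark import SparkContext

    # Assumed local setup: two worker threads on this machine.
    sc = SparkContext(master="local[2]", appName="rdd-creation-sketch")
    try:
        # Path 1: parallelize an in-memory collection.
        numbers = rdd_from_collection(sc, [1, 2, 3, 4, 5])
        print(numbers.map(lambda x: x * x).collect())

        # Path 2: reference a file. The file is hypothetical and is written
        # to a temporary directory only to keep the sketch self-contained.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "sample.txt")
            with open(path, "w") as f:
                f.write("first line\nsecond line\n")
            lines = rdd_from_text_file(sc, path)
            print(lines.count())
    finally:
        sc.stop()
```

Calling `main()` runs both paths against a local Spark installation. For a dataset on HDFS or another shared file system, the same `textFile` call would take the corresponding URI in place of the local path.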
Iterative Operations on MapReduce
[Figure: iterative operations on MapReduce]
Interactive Operations on MapReduce
[Figure: interactive operations on MapReduce]
Iterative Operations on Spark RDD
[Figure: iterative operations on Spark RDD]
Interactive Operations on Spark RDD
[Figure: interactive operations on Spark RDD]