Spark RDD

Resilient Distributed Datasets

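A Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark: an immutable, partitioned collection of records that can be processed in parallel across the nodes of a cluster.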

There are two ways to create an RDD: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system such as a shared file system, HDFS, HBase, or any data source offering a Hadoop InputFormat.
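The two creation routes can be sketched with a toy, pure-Python model. This is not Spark's API: `ToyRDD`, `parallelize`, and `text_file` are hypothetical names chosen for illustration, loosely mirroring PySpark's `SparkContext.parallelize` and `SparkContext.textFile`. The sketch also loads data eagerly and keeps everything in one process, whereas Spark evaluates lazily and distributes partitions across a cluster.

```python
import os
import tempfile


class ToyRDD:
    """Hypothetical stand-in for an RDD: an immutable sequence of partitions."""

    def __init__(self, partitions):
        # Store partitions as tuples so they cannot be modified in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    def num_partitions(self):
        return len(self._partitions)

    def map(self, fn):
        # A transformation returns a new ToyRDD; the original is left untouched.
        return ToyRDD([fn(x) for x in part] for part in self._partitions)

    def collect(self):
        # Gather every partition back into a single list, preserving order.
        return [x for part in self._partitions for x in part]


def parallelize(collection, num_slices=2):
    """Route 1: split a collection that already lives in the driver program."""
    items = list(collection)
    # Ceiling division, with a floor of 1 so range() gets a valid step.
    size = max(1, -(-len(items) // num_slices))
    # Produces at most num_slices partitions.
    return ToyRDD(items[i:i + size] for i in range(0, len(items), size))


def text_file(path, num_slices=2):
    """Route 2: reference a dataset in external storage (a local file here)."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle]
    return parallelize(lines, num_slices)


# Route 1: an in-memory collection.
numbers = parallelize(range(10), num_slices=3)
squares = numbers.map(lambda x: x * x)

# Route 2: a file standing in for external storage such as HDFS.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("alpha\nbeta\ngamma\n")
    lines = text_file(path, num_slices=2)
```

Here `map` builds a new `ToyRDD` instead of mutating the existing one, which is the property that makes an RDD safe to share between parallel tasks.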

Iterative Operations on MapReduce

Figure: Iterative operations on MapReduce (https://s0.wailian.download/2019/05/30/iterative_operations_on_mapreduce-min.jpg)

Interactive Operations on MapReduce

Figure: Interactive operations on MapReduce (https://s0.wailian.download/2019/05/30/interactive_operations_on_mapreduce-min.jpg)

Iterative Operations on Spark RDD

Figure: Iterative operations on Spark RDD (https://s0.wailian.download/2019/05/30/iterative_operations_on_spark_rdd-min.jpg)

Interactive Operations on Spark RDD

Figure: Interactive operations on Spark RDD (https://s0.wailian.download/2019/05/30/interactive_operations_on_spark_rdd-min.jpg)