# bigdata-module-7-spark-sourcecode

**Repository Path**: penglanglang/bigdata-module-7-spark-sourcecode

## Basic Information

- **Project Name**: bigdata-module-7-spark-sourcecode
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-10-16
- **Last Updated**: 2021-11-02

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# bigdata-module-7-spark-sourcecode

#### Question

Short-answer question. Given the following code:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object JoinDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName(this.getClass.getCanonicalName.init).setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    val random = scala.util.Random
    val col1 = Range(1, 50).map(idx => (random.nextInt(10), s"user$idx"))
    val col2 = Array((0, "BJ"), (1, "SH"), (2, "GZ"), (3, "SZ"), (4, "TJ"), (5, "CQ"),
      (6, "HZ"), (7, "NJ"), (8, "WH"), (0, "CD"))
    val rdd1: RDD[(Int, String)] = sc.makeRDD(col1)
    val rdd2: RDD[(Int, String)] = sc.makeRDD(col2)

    // join with the default partitioner
    val rdd3: RDD[(Int, (String, String))] = rdd1.join(rdd2)
    println(rdd3.dependencies)

    // join after both inputs are pre-partitioned with the same HashPartitioner
    val rdd4: RDD[(Int, (String, String))] =
      rdd1.partitionBy(new HashPartitioner(3)).join(rdd2.partitionBy(new HashPartitioner(3)))
    println(rdd4.dependencies)

    sc.stop()
  }
}
```

1. What do the two print statements output? Is the corresponding dependency a wide dependency or a narrow dependency, and why?
2. When is a join operation a wide dependency, and when is it a narrow dependency?
3. Answer the questions above with reference to the join-related source code.

#### Answer

The printed results are a little puzzling at first glance: both statements show `OneToOneDependency`, even though `rdd3` is joined using the default partitioning while `rdd4` is joined using an explicitly specified partitioner (`HashPartitioner(3)`).

The reason is that `join` is implemented on top of `cogroup`: it builds a `CoGroupedRDD` and then applies `flatMapValues`, so the RDD returned by `join` is a `MapPartitionsRDD` whose only direct dependency is a `OneToOneDependency` on that `CoGroupedRDD`. The wide/narrow distinction lives one level lower, in `CoGroupedRDD.getDependencies`: for each parent RDD, if the parent's partitioner equals the partitioner used by the join (`rdd.partitioner == Some(part)`), Spark adds a `OneToOneDependency` (narrow); otherwise it adds a `ShuffleDependency` (wide). For `rdd3`, `rdd1` and `rdd2` come straight from `makeRDD` and have no partitioner, so the join is a wide dependency. For `rdd4`, both inputs are already partitioned by the same `HashPartitioner(3)` that the join uses, so the join is a narrow dependency.
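
To check where the wide/narrow distinction actually appears, one can inspect the dependency graph one level below the RDD returned by `join`. The following is a small diagnostic sketch, not part of the original assignment code; it assumes it is pasted into the same `main` method right after `rdd3` and `rdd4` are defined, and it follows the printed `OneToOneDependency` down to the underlying `CoGroupedRDD`.

```scala
// Diagnostic sketch (assumed to run inside the same main method, after rdd3/rdd4).
// The dependency printed by rdd3.dependencies points at the CoGroupedRDD that
// join builds internally; that RDD's own dependencies show whether a shuffle happens.
val cogrouped3 = rdd3.dependencies.head.rdd   // CoGroupedRDD behind the default-partitioner join
println(cogrouped3.dependencies)              // expected: two ShuffleDependency entries, i.e. wide

val cogrouped4 = rdd4.dependencies.head.rdd   // CoGroupedRDD behind the pre-partitioned join
println(cogrouped4.dependencies)              // expected: two OneToOneDependency entries, i.e. narrow
```

Comparing these two outputs with the rule in `CoGroupedRDD.getDependencies` (one-to-one when the parent's partitioner equals the join's partitioner, shuffle otherwise) answers both questions: a join is a narrow dependency only when its inputs are already partitioned by the partitioner the join uses, and a wide dependency otherwise.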