My cluster: 1 master, 11 slaves, each node has 6 GB of memory.

My settings:

spark.executor.memory=4g, -Dspark.akka.frameSize=512

Here is the problem:

First, I read some data (2.19 GB) from HDFS into an RDD:

val imageBundleRDD = sc.newAPIHadoopFile(...)

Second, do something on this RDD:

val res = imageBundleRDD.map(data => {
  val desPoints = threeDReconstruction(data._2, bg)
  (data._1, desPoints)
})

Finally, output to HDFS:

res.saveAsNewAPIHadoopFile(...)

When I run my program, it shows:

.....
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Starting task 1.0:24 as TID 33 on executor 9: Salve7.Hadoop (NODE_LOCAL)
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Serialized task 1.0:24 as 30618515 bytes in 210 ms
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Starting task 1.0:36 as TID 34 on executor 2: Salve11.Hadoop (NODE_LOCAL)
14/01/15 21:42:28 INFO cluster.ClusterTaskSetManager: Serialized task 1.0:36 as 30618515 bytes in 449 ms
14/01/15 21:42:28 INFO cluster.ClusterTaskSetManager: Starting task 1.0:32 as TID 35 on executor 7: Salve4.Hadoop (NODE_LOCAL)
Uncaught error from thread [spark-akka.actor.default-dispatcher-3] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[spark]
java.lang.OutOfMemoryError: Java heap space

Too many tasks?

PS: Everything is OK when the input data is about 225 MB.

How can I solve this problem?


Current Answer

Did you dump your master GC log? I ran into a similar issue and found that SPARK_DRIVER_MEMORY only sets the Xmx heap. The initial heap size stays at 1G, and the heap never grows up to the Xmx limit.

Passing --conf "spark.driver.extraJavaOptions=-Xms20g" resolved my issue.

Run ps aux | grep java and you will see something like the following:

4178294 pts/0 Sl+ 18184 pts/0 Sl+ 18:49 0:33 /usr/java/latest/bin/ opt/spark/ com / /
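
If it helps to double-check from inside the driver, here is a small sketch using plain JVM APIs (not Spark-specific) to confirm that the initial and max heap sizes match what you passed:

// totalMemory is the currently committed heap (starts at -Xms and grows);
// maxMemory is the -Xmx ceiling the heap can never exceed.
val mb = 1024 * 1024
println(s"Committed heap: ${Runtime.getRuntime.totalMemory / mb} MB")
println(s"Max heap (-Xmx): ${Runtime.getRuntime.maxMemory / mb} MB")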

Other Answers

From my understanding of the code provided above, it loads the file, does a map operation on it, and saves it back. There is no operation that requires a shuffle. Also, there is no operation that brings data to the driver, so tuning anything related to shuffle or the driver will have no impact. The driver does have issues when there are too many tasks, but that was only up until Spark version 2.0.2. There can be two things going wrong:

1. There are only one or a few executors. Increase the number of executors so that they can be allocated to different slaves. If you are using YARN, you need to change the num-executors config; if you are using Spark standalone, you need to tune the number of cores per executor and the spark max cores conf. In standalone mode, num executors = max cores / cores per executor.

2. The number of partitions is very low, or maybe there is only one. If this is low, then even with multiple cores and multiple executors it will not help much, since parallelization depends on the number of partitions. So increase the partitions, for example by doing imageBundleRDD.repartition(11) (see the sketch after this list).
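
As a rough sketch of the second point, reusing the names from the question (imageBundleRDD, threeDReconstruction and bg come from the original code; the partition count is only an illustrative guess):

// Sketch only: spread the 2.19 GB input over more partitions before the expensive map.
// A common rule of thumb is 2-3 partitions per available core; adjust for your cluster.
val repartitioned = imageBundleRDD.repartition(11 * 4)
val res = repartitioned.map { data =>
  val desPoints = threeDReconstruction(data._2, bg)
  (data._1, desPoints)
}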

Have a look at the start-up scripts: the Java heap size is set there, and it looks like you aren't setting it before running the Spark worker.

# Set SPARK_MEM if it isn't already set since we also use it for this process
SPARK_MEM=${SPARK_MEM:-512m}
export SPARK_MEM

# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"

You can find the documentation for the deploy scripts here.

To add a use case that is often not discussed, here is a solution for when submitting a Spark application via spark-submit in local mode.

According to the gitbook Mastering Apache Spark by Jacek Laskowski:

You can run Spark in local mode. In this non-distributed single-JVM deployment mode, Spark spawns all the execution components (driver, executor, backend, and master) in the same JVM. This is the only mode where a driver is used for execution.

Thus, if you are experiencing OOM errors on the heap, it suffices to adjust the driver-memory rather than the executor-memory.

Here is an example:

spark-1.6.1/bin/spark-submit \
  --class "MyClass" \
  --driver-memory 12g \
  --master local[*] \
  target/scala-2.10/simple-project_2.10-1.0.jar

Broadly speaking, Spark executor JVM memory can be divided into two parts: Spark memory and user memory. This is controlled by the spark.memory.fraction property, whose value is between 0 and 1. When working with images or doing other memory-intensive processing in your Spark application, consider decreasing spark.memory.fraction. This makes more memory available for your application's own work. Spark can spill, so it will still function with a smaller memory share.
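
A minimal sketch of that idea, assuming you set the property when building the SparkConf (the app name and the value 0.4 are illustrative assumptions, not recommendations; spark.memory.fraction exists since Spark 1.6 with a default of 0.6):

import org.apache.spark.{SparkConf, SparkContext}

// Lower spark.memory.fraction to leave more of the executor heap for user code,
// e.g. image processing inside a map function. Measure before settling on a value.
val conf = new SparkConf()
  .setAppName("imageBundleApp")           // hypothetical application name
  .set("spark.executor.memory", "4g")
  .set("spark.memory.fraction", "0.4")    // example value; the default is 0.6
val sc = new SparkContext(conf)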

The second part of the problem is division of work. If possible, partition your data into smaller chunks; smaller data likely needs less memory. But if that is not possible, you will have to sacrifice compute for memory. Typically a single executor runs multiple cores, and the total memory of the executor must be enough to handle the memory requirements of all its concurrent tasks. If increasing executor memory is not an option, you can decrease the cores per executor so that each task gets more memory to work with. Test with 1-core executors that have the largest memory you can give them, and then keep increasing the cores until you find the best core count.
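
A very rough illustration of that trade-off, with made-up numbers: for a fixed executor heap, each concurrent task effectively shares about executorMemory / coresPerExecutor.

// Back-of-the-envelope only: fewer cores per executor means more heap per concurrent task.
val executorMemoryGb = 4.0                      // e.g. spark.executor.memory=4g
for (cores <- Seq(4, 2, 1)) {
  val perTaskGb = executorMemoryGb / cores
  println(f"$cores%d core(s) per executor -> ~$perTaskGb%.1f GB per concurrent task")
}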