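The reason for this apparent performance discrepancy between categorical and binary cross entropy is what user xtof54 has already reported in his answer, namely: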
the accuracy computed with the Keras method is just plain wrong when using binary_crossentropy with more than 2 labels
I would like to elaborate on this, demonstrate the actual underlying issue, explain it, and offer a remedy.
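This behavior is not a bug; the underlying reason is a rather subtle and undocumented issue in how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you simply include metrics=['accuracy'] in your model compilation. In other words, while your first compilation option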
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
is valid, your second one
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
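will not produce what you expect, but the reason is not the use of binary cross entropy (which, at least in principle, is an absolutely valid loss function).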
Why is that? If you check the metrics source code, you will see that Keras does not define a single accuracy metric but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected binary cross entropy as your loss function and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in binary_accuracy, and this is what it returns, while in fact you are interested in categorical_accuracy.
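A minimal sketch of that selection rule, written as a plain function so it runs without Keras. It reflects my reading of the compile code in Keras 2.0.x: a single output unit or a binary_crossentropy loss selects binary_accuracy, sparse_categorical_crossentropy selects sparse_categorical_accuracy, and everything else falls back to categorical_accuracy. The function name pick_accuracy and its string arguments are hypothetical, not part of the Keras API, and later Keras releases may resolve 'accuracy' differently.

```python
# Hypothetical re-statement of how Keras 2.0.x resolves metrics=['accuracy'].
# pick_accuracy and its arguments are illustrative names, not Keras API.

def pick_accuracy(loss_name, output_dim):
    """Return the name of the accuracy function Keras 2.0.x would pick."""
    if output_dim == 1 or loss_name == 'binary_crossentropy':
        # One output unit OR a binary loss -> element-wise binary accuracy,
        # regardless of how many classes the output layer actually has
        return 'binary_accuracy'
    if loss_name == 'sparse_categorical_crossentropy':
        # Integer targets -> sparse variant of categorical accuracy
        return 'sparse_categorical_accuracy'
    # Anything else (e.g. categorical_crossentropy) -> argmax comparison
    return 'categorical_accuracy'


if __name__ == '__main__':
    # MNIST example: 10 softmax outputs
    print(pick_accuracy('categorical_crossentropy', 10))
    print(pick_accuracy('binary_crossentropy', 10))
```

With 10 output units, only the loss name decides the outcome, which is exactly why switching the loss to binary_crossentropy silently switches the reported metric.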
Let's verify this, using the MNIST CNN example in Keras with the following modification:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # WRONG way
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=2,  # only 2 epochs, for demonstration purposes
          verbose=1,
          validation_data=(x_test, y_test))
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]
# 0.9975801164627075
# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98780000000000001
score[1]==acc
# False
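To see why the reported number is inflated, the sketch below re-implements in NumPy what I understand to be the Keras 2.0.x definitions of the two metrics: binary_accuracy rounds every output probability and compares it entry by entry with the 0/1 target, while categorical_accuracy compares the argmax of each row. The data are a made-up 10-class batch, not MNIST results. A misclassified sample still matches on 8 of its 10 entries, so binary_accuracy stays high even when the class prediction is wrong.

```python
import numpy as np

# NumPy versions of the two metric definitions, averaged over the batch.
def binary_accuracy(y_true, y_pred):
    # Element-wise: round each probability, compare with each 0/1 target entry
    return np.mean(np.equal(y_true, np.round(y_pred)))

def categorical_accuracy(y_true, y_pred):
    # Per sample: does the most probable class equal the true class?
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

# Toy 10-class batch: 3 samples with true classes 0, 1, 2 (one-hot targets)
y_true = np.eye(10)[[0, 1, 2]]
y_pred = np.full((3, 10), 0.01)
y_pred[0, 0] = 0.91   # correct and confident
y_pred[1, 1] = 0.91   # correct and confident
y_pred[2, 3] = 0.60   # wrong: predicts class 3 ...
y_pred[2, 2] = 0.32   # ... while the true class 2 gets less probability

print(categorical_accuracy(y_true, y_pred))  # 2 of 3 samples correct
print(binary_accuracy(y_true, y_pred))       # 28 of 30 entries match
```

Only 2 of the 30 entries disagree after rounding (positions 2 and 3 of the last sample), which is why the binary figure is so much closer to 1 than the categorical one.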
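To remedy this, i.e. to use binary cross entropy as your loss function (as I said, nothing wrong with this, at least in principle) while still getting the categorical accuracy required by the problem at hand, you should explicitly ask for categorical_accuracy in the model compilation, as follows: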
from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[categorical_accuracy])
In the MNIST example, after training, scoring, and predicting the test set as shown above, the two metrics are now identical, as they should be:
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]
# 0.98580000000000001
# Actual accuracy calculated manually:
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98580000000000001
score[1]==acc
# True
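Both manual checks above compute accuracy with a Python loop over the 10000 test samples. A vectorized NumPy equivalent is sketched below; manual_accuracy is a hypothetical helper name of my own, and the small arrays are made-up data used only to confirm it agrees with the loop form.

```python
import numpy as np

def manual_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class (argmax) equals the true class.

    Vectorized form of
    sum([np.argmax(y_true[i]) == np.argmax(y_pred[i]) for i in range(n)]) / n,
    without hard-coding the number of samples n.
    """
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Made-up data: 4 samples, 3 classes, first 3 predicted correctly
y_true = np.eye(3)[[0, 1, 2, 2]]
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1]])

# Loop version, as written in the checks above
loop_acc = sum([np.argmax(y_true[i]) == np.argmax(y_pred[i]) for i in range(4)]) / 4
print(manual_accuracy(y_true, y_pred), loop_acc)
```

In the MNIST check this would read acc = manual_accuracy(y_test, y_pred), assuming y_test is one-hot encoded as in the Keras example.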
System setup:
Python version 3.5.3
Tensorflow version 1.2.1
Keras version 2.0.4
UPDATE: After posting this, I discovered that this issue had already been identified in this answer.