我在哪里调用Keras的BatchNormalization函数?

如果我想在Keras中使用BatchNormalization函数，那么我只需要在开始时调用它一次吗?

我阅读了它的文档:http://keras.io/layers/normalization/

我不知道该怎么称呼它。下面是我试图使用它的代码:

model = Sequential()
keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None)
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose = 2)

我问是因为如果我运行包含批处理规格化的第二行代码，如果我不运行第二行代码，我会得到类似的输出。所以要么我没有在正确的地方调用函数，要么我猜这没有太大的区别。

当前回答

它是另一种类型的层，所以你应该把它作为一个层添加到你的模型的适当位置

model.add(keras.layers.normalization.BatchNormalization())

请看一个例子:https://github.com/fchollet/keras/blob/master/examples/kaggle_otto_nn.py

2016-01-12 18:31:08

其他回答

它是另一种类型的层，所以你应该把它作为一个层添加到你的模型的适当位置

model.add(keras.layers.normalization.BatchNormalization())

请看一个例子:https://github.com/fchollet/keras/blob/master/examples/kaggle_otto_nn.py

2016-01-12 18:31:08

Keras现在支持use_bias=False选项，所以我们可以通过编写这样的代码来节省一些计算

model.add(Dense(64, use_bias=False))
model.add(BatchNormalization(axis=bn_axis))
model.add(Activation('tanh'))

model.add(Convolution2D(64, 3, 3, use_bias=False))
model.add(BatchNormalization(axis=bn_axis))
model.add(Activation('relu'))

2016-12-29 07:42:36

增加了关于批处理归一化应该在非线性激活之前还是之后调用的争论:

除了在激活前使用批量归一化的原始论文外，Bengio的书《深度学习》第8.7.1节给出了一些理由，说明为什么在激活后(或直接在输入到下一层之前)应用批量归一化可能会导致一些问题:

It is natural to wonder whether we should apply batch normalization to the input X, or to the transformed value XW+b. Ioﬀe and Szegedy (2015) recommend the latter. More speciﬁcally, XW+b should be replaced by a normalized version of XW. The bias term should be omitted because it becomes redundant with the β parameter applied by the batch normalization reparameterization. The input to a layer is usually the output of a nonlinear activation function such as the rectiﬁed linear function in a previous layer. The statistics of the input are thus more non-Gaussian and less amenable to standardization by linear operations.

换句话说，如果我们使用relu激活，所有的负值都映射为零。这可能会导致一个已经非常接近于零的平均值，但剩余数据的分布将严重向右倾斜。试图将这些数据正常化为一条漂亮的钟形曲线可能不会得到最好的结果。对于relu家族以外的激活，这可能不是一个大问题。

请记住，有些模型在激活之后使用批处理归一化时会得到更好的结果，而其他模型在激活之前使用批处理归一化时会得到最好的结果。最好使用这两种配置来测试您的模型，如果激活后的批处理规范化显著降低了验证损失，则使用该配置。

2021-01-15 18:08:24

批处理归一化是通过调整激活的平均值和缩放来归一化输入层和隐藏层。由于这种在深度神经网络中附加层的归一化效应，网络可以使用更高的学习率而不会消失或爆炸梯度。此外，批归一化对网络进行了正则化，使其更容易泛化，因此不需要使用dropout来缓解过拟合。

在使用Keras中的Dense()或Conv2D()计算线性函数后，我们使用BatchNormalization()来计算层中的线性函数，然后使用Activation()将非线性添加到层中。

from keras.layers.normalization import BatchNormalization
model = Sequential()
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform'))
model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, 
validation_split=0.2, verbose = 2)

如何应用批处理规范化?

假设我们向层l输入a[l-1]，并且我们有层l的权值W[l]和偏置单位b[l]。添加非线性后)为层l, z[l]为添加非线性前的向量

Using a[l-1] and W[l] we can calculate z[l] for the layer l Usually in feed-forward propagation we will add bias unit to the z[l] at this stage like this z[l]+b[l], but in Batch Normalization this step of addition of b[l] is not required and no b[l] parameter is used. Calculate z[l] means and subtract it from each element Divide (z[l] - mean) using standard deviation. Call it Z_temp[l] Now define new parameters γ and β that will change the scale of the hidden layer as follows: z_norm[l] = γ.Z_temp[l] + β

在这段代码摘录中，Dense()取a[l-1]，使用W[l]并计算z[l]。然后立即的BatchNormalization()将执行上述步骤以得到z_norm[l]。然后立即激活()将计算tanh(z_norm[l])给出一个[l]，即。

a[l] = tanh(z_norm[l])

2019-04-09 08:08:34

关于BN应该应用在当前层的非线性之前还是应用在前一层的激活之前，这个线程有一些相当大的争论。

虽然没有正确答案，但批处理规范化的作者是这么说的它应立即应用于当前层的非线性之前。原因(引自原文)-

"We add the BN transform immediately before the nonlinearity, by normalizing x = Wu+b. We could have also normalized the layer inputs u, but since u is likely the output of another nonlinearity, the shape of its distribution is likely to change during training, and constraining its first and second moments would not eliminate the covariate shift. In contrast, Wu + b is more likely to have a symmetric, non-sparse distribution, that is “more Gaussian” (Hyv¨arinen & Oja, 2000); normalizing it is likely to produce activations with a stable distribution."

2017-08-20 12:54:02

我在哪里调用Keras的BatchNormalization函数?

推荐文章

最新文章

标签