
[Solved]: Batch normalization & L2 regularization

Problem Detail: 

Is performing batch normalization in addition to L2 regularization redundant?

Batch normalization:

Notice how, when performing batch normalization, you forgo using a bias term (and instead normalize your layer output, then scale it with a 'scale' term and shift it with a 'beta' term). I was under the impression that L2 regularization punishes large weights and biases. But if I'm not using biases because of batch normalization, would I just punish the weights with L2 regularization?


Asked By : sir_thursday

Answered By : D.W.

No. They are not equivalent. They are two different techniques that each provide some form of regularization, but using one doesn't necessarily make the other redundant. For instance, in the paper you cite, the authors use both (see Section 4.2.1 for the mention of L2 regularization).

Whether it is more effective to use batch normalization alone or to use both will probably depend on the specific learning task you have in mind. But there is no theory that implies the two are equivalent.

Batch normalization still has a bias term that is added after the normalization step (the $\beta$); it does not eliminate the bias. L2 regularization penalizes large weights and large biases. As far as I can tell, batch normalization doesn't try to do either (at least not explicitly).
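To make the role of $\beta$ concrete, here is a minimal NumPy sketch of the training-time transform, assuming normalization over the batch axis with a small `eps` for numerical stability. The function name `batch_norm`, the shapes, and the random data are illustrative choices of mine, not taken from the paper. It suggests why a separate bias before the normalization is unnecessary: subtracting the batch mean cancels any constant offset, and $\beta$ then sets the mean of the output.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mean) / np.sqrt(var + eps)
    return gamma * z_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))   # batch of 8 examples, 4 input features
W = rng.normal(size=(4, 3))   # layer weights
b = rng.normal(size=3)        # a conventional bias added before normalization
gamma = np.ones(3)            # scale term
beta = np.array([0.5, -1.0, 2.0])  # shift term

out_with_bias = batch_norm(x @ W + b, gamma, beta)
out_without_bias = batch_norm(x @ W, gamma, beta)

# The pre-normalization bias is cancelled by the mean subtraction.
bias_is_redundant = np.allclose(out_with_bias, out_without_bias)

# After normalization, the per-feature mean of the output equals beta.
output_mean = out_without_bias.mean(axis=0)
```

In this sketch `bias_is_redundant` comes out true and `output_mean` matches `beta`, so $\beta$ plays the part the ordinary bias would otherwise play.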

When using L2 regularization with batch normalization, I imagine you could either L2-penalize just the weights, or L2-penalize the weights, the $\gamma$ terms (scaling), and the $\beta$ terms (shift). I don't know which will produce better results. My intuition says the latter might be preferable, but experiments and data beat intuition any day.
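The two options can be sketched as follows. This is only an illustration: the helper `l2_penalty`, the coefficient `lam`, and the parameter values are made up for the example, and the penalty is taken as `lam` times the sum of squared entries (some formulations use `lam / 2` instead).

```python
import numpy as np

def l2_penalty(params, lam):
    """Return lam times the sum of squared entries over all given arrays."""
    return lam * sum(np.sum(p ** 2) for p in params)

# Hypothetical parameters of one batch-normalized layer.
W = np.array([[1.0, -2.0], [0.5, 0.0]])  # weights
gamma = np.array([1.0, 2.0])             # batch-norm scale
beta = np.array([0.5, -0.5])             # batch-norm shift
lam = 0.01                               # regularization strength

# Option 1: penalize the weights only.
penalty_weights_only = l2_penalty([W], lam)

# Option 2: penalize the weights, the scale terms, and the shift terms.
penalty_all = l2_penalty([W, gamma, beta], lam)
```

Either value would be added to the task loss before computing gradients. Which choice trains better is an empirical question, as noted above.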
