NVIDIA interview question

How does batch normalization improve training of neural networks?

Interview Answer

Anonymous

11 Mar 2019

- gradients on any given layer don't dominate other layers and hide their effects - faster convergence because gradients are on similar scale, so weights are also on similar scale