How does batch normalization improve training of neural networks?
Anonymous
- gradients on any given layer don't dominate other layers and hide their effects - faster convergence because gradients are on similar scale, so weights are also on similar scale
Check out your Company Bowl for anonymous work chats.