      Is Open Access

      Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

      journal-article


          Abstract

          Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
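The normalization the abstract describes can be sketched in a few lines. This is a minimal illustrative version for a single layer's mini-batch inputs, not the authors' implementation: the names `batch_norm`, `gamma`, `beta`, and the epsilon value are assumptions made for the example, and training-time details such as running statistics for inference are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x     : array of shape (batch_size, num_features)
    gamma : learnable per-feature scale
    beta  : learnable per-feature shift
    eps   : small constant for numerical stability (illustrative value)
    """
    mean = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # restore representational power

# Example: a mini-batch of 4 samples with 3 features.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(4, 3))
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# With gamma=1 and beta=0, each feature of y has (approximately) zero mean
# and unit variance, regardless of the input distribution.
```

Because `gamma` and `beta` are learned, the layer can recover the identity transform if that is optimal, which is what lets normalization sit inside the architecture rather than act as a fixed preprocessing step.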


          Author and article information

          Journal: arXiv
          Published: February 2015
          History: 11, 12, 13, and 16 February 2015; 2 and 3 March 2015
          DOI: 10.48550/ARXIV.1502.03167

          arXiv.org perpetual, non-exclusive license


          Subjects: Machine Learning (cs.LG); FOS: Computer and information sciences
