{"ID":16211,"CreatedAt":"2026-02-27T13:00:40Z","UpdatedAt":"2026-02-27T13:00:40Z","DeletedAt":null,"paper_url":"https://paperswithcode.com/paper/emergence-of-invariance-and-disentanglement","arxiv_id":"1706.01350","title":"Emergence of Invariance and Disentanglement in Deep Representations","abstract":"Using established principles from Statistics and Information Theory, we show\nthat invariance to nuisance factors in a deep neural network is equivalent to\ninformation minimality of the learned representation, and that stacking layers\nand injecting noise during training naturally bias the network towards learning\ninvariant representations. We then decompose the cross-entropy loss used during\ntraining and highlight the presence of an inherent overfitting term. We propose\nregularizing the loss by bounding such a term in two equivalent ways: One with\na Kullback-Leibler term, which relates to a PAC-Bayes perspective; the other\nusing the information in the weights as a measure of complexity of a learned\nmodel, yielding a novel Information Bottleneck for the weights. Finally, we\nshow that invariance and independence of the components of the representation\nlearned by the network are bounded above and below by the information in the\nweights, and therefore are implicitly optimized during training. The theory\nenables us to quantify and predict sharp phase transitions between underfitting\nand overfitting of random labels when using our regularized loss, which we\nverify in experiments, and sheds light on the relation between the geometry of\nthe loss function, invariance properties of the learned representation, and\ngeneralization error.","url_abs":"http://arxiv.org/abs/1706.01350v3","url_pdf":"http://arxiv.org/pdf/1706.01350v3.pdf","authors":"[\"Alessandro Achille\", \"Stefano Soatto\"]","published":"2017-06-05T00:00:00Z","tasks":"[\"Disentanglement\"]","methods":"[]","has_code":false}
