During deep learning, connections in the network are strengthened or weakened as needed to make the system better at sending signals from input data – the pixels of a photo of a dog, for instance – up through the layers to neurons associated with the right high-level concepts, such as “dog.” After a deep neural network has “learned” from thousands of sample dog photos, it can identify dogs in new photos as accurately as people can.
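As a rough illustration of what "strengthening or weakening connections" means, here is a minimal sketch in Python. It trains a single artificial neuron rather than a deep network, and everything in it is an assumption made for illustration: the two-number "photos," the labels, the learning rate, and the function names `train_neuron` and `predict` are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Return the neuron's confidence (0 to 1) that input x is a dog."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, epochs=2000, lr=0.5, seed=0):
    """Nudge connection weights up or down to reduce labeling errors."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            # A connection that pushed toward the wrong answer is weakened;
            # one that pushed toward the right answer is strengthened.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Invented toy data: each "photo" is just two made-up feature values.
SAMPLES = [(0.9, 0.1), (0.8, 0.3), (0.2, 0.9), (0.1, 0.7)]
LABELS = [1, 1, 0, 0]  # 1 = dog, 0 = not a dog

weights, bias = train_neuron(SAMPLES, LABELS)
print(predict(weights, bias, (0.85, 0.2)))  # unseen input resembling the dogs
print(predict(weights, bias, (0.15, 0.8)))  # unseen input resembling the non-dogs
```

A real deep network stacks many layers of such units and adjusts all of their connections together, but the basic move is the same: each connection is adjusted in proportion to how much it contributed to a wrong answer.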
The magic leap from special cases to general concepts during learning gives deep neural networks their power, just as it underlies human reasoning, creativity and the other faculties collectively termed “intelligence.” Experts wonder what it is about deep learning that enables generalization – and to what extent brains apprehend reality in the same way.
Some researchers remain skeptical that the theory fully accounts for the success of deep learning, but Kyle Cranmer, a particle physicist at New York University who uses machine learning to analyze particle collisions at the Large Hadron Collider, said that as a general principle of learning, it “somehow smells right.”
Tishby and Shwartz-Ziv also made the intriguing discovery that deep learning proceeds in two phases: a short “fitting” phase, during which the network learns to label its training data, and a much longer “compression” phase, during which it becomes good at generalization, as measured by its performance at labeling new test data.
The scientists saw the same convergence of the networks to the information bottleneck theoretical bound; they also observed the two distinct phases of deep learning, separated by an even sharper transition than in the smaller networks.
Brenden Lake, an assistant professor of psychology and data science at New York University who studies similarities and differences in how humans and machines learn, said that Tishby’s findings represent “an important step towards opening the black box of neural networks,” but he stressed that the brain represents a much bigger, blacker black box.
Tishby believes his information bottleneck theory will ultimately prove useful in both disciplines, even if it takes a more general form in human learning than in AI. One immediate insight that can be gleaned from the theory is a better understanding of which kinds of problems can be solved by real and artificial neural networks.