(Medical Xpress)—Visual perception is far more complex and powerful than our experience suggests. Moreover, anyone attempting both to understand vision and to implement it in a computational device must account for the fact that a species' senses developed in concert with the ecological niche in which that species evolved. In our case, that means an evolutionary visual context consisting of natural objects, including mountains, rivers, trees, and other animals. Neural representations of visual inputs are related to the statistical structure of those inputs; natural structures display an inseparable size hierarchy indicative of scale invariance; and scale invariance also occurs near a critical point in a wide range of physical systems, including ferromagnets. Building on these observations, researchers at the Salk Institute for Biological Studies and the University of California-San Diego recently demonstrated what their paper describes as "a unique approach to studying natural images by decomposing images into a hierarchy of layers at different logarithmic intensity scales and mapping them to a quasi-2D magnet."
Prof. Terrence J. Sejnowski describes the research he and Dr. Saeed Saremi conducted, starting with the challenges they faced. "The traditional way images are represented in vision is by an array of pixels with gray levels," Sejnowski tells Medical Xpress. "However, we know that visual perception is based on a log scale of luminance. The challenge was to find a new representation that would make the log levels explicit." Sejnowski adds that it was Dr. Saremi who came up with the idea of using bit planes, an approach later generalized to powers of any integer base.
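The bit-plane representation can be sketched in a few lines. This is a minimal illustration, assuming each pixel is stored as a non-negative integer (the study's images used 15-bit pixels); the function names `digit_planes` and `reconstruct` are hypothetical. Plane k holds the base-b digit that multiplies b^k, so successive planes sit at logarithmically spaced intensity scales, and with b = 2 they are ordinary bit planes.

```python
from typing import List

Image = List[List[int]]  # rows of non-negative integer pixel intensities


def digit_planes(image: Image, base: int = 2, n_planes: int = 15) -> List[Image]:
    """Split every pixel into its base-`base` digits (hypothetical helper).

    Plane k holds the digit multiplying base**k, so with base=2 plane 0 is the
    least significant bit and plane n_planes-1 the most significant one.
    """
    if base < 2:
        raise ValueError("base must be an integer >= 2")
    planes = []
    for k in range(n_planes):
        scale = base ** k
        planes.append([[(pixel // scale) % base for pixel in row] for row in image])
    return planes


def reconstruct(planes: List[Image], base: int = 2) -> Image:
    """Invert digit_planes: pixel = sum over k of digit_k * base**k."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[k][r][c] * base ** k for k in range(len(planes))) for c in range(cols)]
        for r in range(rows)
    ]


img = [[0, 5], [32767, 1024]]
planes = digit_planes(img)
print(planes[14])          # most significant bit plane
print(reconstruct(planes) == img)
```

Summing the planes back with their weights recovers the image exactly, so the decomposition loses nothing; it only makes the log-scale levels explicit.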
"Once we started looking at the bit planes of natural images," Sejnowski continues, "it became apparent that each layer looked like a 2D Ising model at different temperatures – that is, the high-order bits were cold and the low order- bits were hot." An Ising model is a mathematical model of ferromagnetism in statistical mechanics, consisting of discrete variables that represent magnetic dipole moments of atomic spins that can be in one of two states (+1 or −1). Taken together, Sejnowski explains, these bit planes represent a 3D quasimagnet with interesting properties.
Understanding retinal encoding – and possibly gaining further insight into how the neocortex represents scale invariance – requires, in turn, an understanding of the statistical structure in natural image hierarchies. Moreover, the brain is not a passive image receptor; rather, it actively generates sensory models derived from sensory experience. Since the so-called Boltzmann machine (a spin glass with arbitrary connectivity running at a finite temperature, generalizing Hopfield nets, which run at zero temperature) can represent the statistical structure of images, Sejnowski and Saremi developed a unique approach in which certain aspects of the Boltzmann machine's input representations are learned from natural images.
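The finite-temperature distribution a Boltzmann machine represents can be written down exactly for a toy network. This sketch assumes ±1 units, symmetric weights W with zero diagonal, biases b, and temperature T, and enumerates all 2^n states, which is feasible only for a handful of units; `energy` and `boltzmann_distribution` are hypothetical names.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, ...]


def energy(s: Sequence[int], W: List[List[float]], b: Sequence[float]) -> float:
    """E(s) = -sum_{i<j} W_ij s_i s_j - sum_i b_i s_i for spins s_i in {-1, +1}."""
    n = len(s)
    e = -sum(b[i] * s[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= W[i][j] * s[i] * s[j]
    return e


def boltzmann_distribution(W: List[List[float]], b: Sequence[float], T: float) -> Dict[State, float]:
    """Exact P(s) = exp(-E(s)/T) / Z, enumerating all 2**n states (tiny n only)."""
    n = len(b)
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-energy(s, W, b) / T) for s in states]
    Z = sum(weights)  # partition function
    return {s: w / Z for s, w in zip(states, weights)}


# Two units with a positive (ferromagnetic) coupling and no biases.
W = [[0.0, 1.0], [1.0, 0.0]]
b = [0.0, 0.0]
warm = boltzmann_distribution(W, b, T=1.0)
cold = boltzmann_distribution(W, b, T=0.1)
print(warm[(1, 1)] + warm[(-1, -1)], cold[(1, 1)] + cold[(-1, -1)])
```

Lowering T concentrates probability on the lowest-energy states, the zero-temperature limit that the paragraph above associates with Hopfield nets.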
"Geoffrey Hinton and I introduced the Boltzmann machine in the 1980s as a model for multilayer neural networks," notes Sejnowski. "We showed that there is a remarkably simple learning algorithm that finds the connection weights for a network that could represent the probability distribution for an ensemble of inputs." When Sejnowski and Saremi applied Boltzmann machine learning to natural images as inputs, they found positive pairwise connections that fell off with distance on each layer, much like the 2D Ising model for a ferromagnet, and negative pairwise weights between the layers, representing antiferromagnetic interactions.
The theory of second-order phase transitions was key to understanding the significance of what the scientists had found, Sejnowski says. "There were 15 bit planes (corresponding to pixels with 15-bit integers), each corresponding to a different temperature. We were astonished to find that there was a phase transition at bit plane 6 with the same critical exponent as the 2D Ising model."
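The 2D Ising reference model has an exactly known critical temperature, T_c = 2J / (k_B ln(1 + √2)) ≈ 2.269 J/k_B (Onsager). The sketch below is a plain single-spin-flip Metropolis simulation that shows the order-disorder transition on either side of T_c. It is not the authors' method for locating the transition at bit plane 6, and the article does not say which critical exponent they compared; `metropolis_magnetization` is a hypothetical name, units are J = k_B = 1, and the lattice has periodic boundaries.

```python
import math
import random
from typing import List

T_C = 2.0 / math.log(1.0 + math.sqrt(2.0))  # Onsager's exact value, ~2.269 (J = k_B = 1)


def metropolis_magnetization(L: int, T: float, sweeps: int, seed: int = 0) -> float:
    """Run single-spin-flip Metropolis on an L x L periodic 2D Ising lattice
    (J = 1, no field), starting from all spins up; return |mean spin| at the end."""
    rng = random.Random(seed)
    s: List[List[int]] = [[1] * L for _ in range(L)]
    for _ in range(sweeps):
        for _ in range(L * L):               # one sweep = L*L attempted flips
            r, c = rng.randrange(L), rng.randrange(L)
            nb = (s[(r + 1) % L][c] + s[(r - 1) % L][c]
                  + s[r][(c + 1) % L] + s[r][(c - 1) % L])
            dE = 2.0 * s[r][c] * nb          # energy change if s[r][c] flips
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                s[r][c] = -s[r][c]
    return abs(sum(map(sum, s))) / (L * L)


cold = metropolis_magnetization(16, 1.5, 200, seed=1)   # well below T_C
hot = metropolis_magnetization(16, 5.0, 200, seed=1)    # well above T_C
print(round(T_C, 4), cold, hot)
```

Below T_c the lattice stays magnetized; well above it the magnetization collapses toward zero. Lattices this small smear out the transition, so measuring the exponents themselves requires larger systems and finite-size scaling.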
"Scale invariance had been observed in natural images for decades based on the power law drop-off in power as a function of spatial scale," Sejnowski explains. "At a phase transition, the spatial correlation length becomes infinite and there is a critical slowing." This suggests, he adds, that the reason there is structure at every spatial scale in the natural world is because nature is, in some sense, sitting at a phase transition between order and disorder.
In terms of the evolution and neurobiology of perceptual invariants, Sejnowski notes that biological systems that evolved to survive in this world may take advantage of this structure; in particular, the organization of the visual system may reflect these statistics. Moreover, most of the information in natural images is captured in three bit planes, which may be why photoreceptors respond linearly over only a single order of magnitude. Finally, he adds, adaptation mechanisms in the retina shift this linear region over 10 orders of magnitude in luminance.
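The link between three bit planes and a single order of magnitude comes down to arithmetic: each binary plane doubles the intensity range, so three planes span 2^3 = 8, or about 0.9 decades, while covering 10 decades would take about 34 planes. The sketch below illustrates that reading; it is not a calculation from the paper. `decades_spanned` and `planes_needed` are hypothetical names, and `plane_window` is a hypothetical toy stand-in for a saturating linear range whose position (`lowest`) an adaptation mechanism could shift.

```python
import math


def decades_spanned(n_planes: int, base: int = 2) -> float:
    """Orders of magnitude covered by n_planes consecutive base-`base` digit planes."""
    return n_planes * math.log10(base)


def planes_needed(decades: float, base: int = 2) -> int:
    """Smallest number of planes n with base**n >= 10**decades."""
    return math.ceil(decades / math.log10(base))


def plane_window(pixel: int, lowest: int, n_planes: int = 3, base: int = 2) -> int:
    """Toy linear window (hypothetical): keep n_planes digit planes starting at
    plane `lowest`; finer detail is discarded and brighter values saturate."""
    top = base ** (lowest + n_planes)
    clipped = min(pixel, top - 1)            # saturate above the window
    return clipped // base ** lowest         # drop planes below the window


print(decades_spanned(3), planes_needed(10))
print([plane_window(p, lowest=2) for p in (3, 20, 37)])
```

Shifting `lowest` moves the same three-plane window up or down the intensity scale, which is one simple way to picture adaptation covering a far wider range than the window itself.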
Moving forward, Sejnowski says, "We've trained the Boltzmann machine only on the connections between pixels in the 'visible' input layer. The next step is to use this as the input layer in a hierarchy of 'hidden' layers, such as that found in our visual systems, which is around 12 layers deep." He adds that since the 1980s, great advances in computer power and algorithms have made it possible to train Boltzmann machines in deep networks.
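The article does not name the algorithms behind these advances. One widely used example is contrastive divergence (CD-1) for restricted Boltzmann machines (RBMs), which can be trained and stacked layer by layer. The sketch below swaps in that technique purely as an illustration: unlike the study's visible layer, an RBM has no connections within a layer, it uses 0/1 units, and `cd1_step` and its helpers are hypothetical names. It samples the hidden units once and uses mean-field probabilities for the reconstruction, a common practical simplification.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # W[i][j] connects visible unit i to hidden unit j


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v: Sequence[float], W: Matrix, c: Vector) -> Vector:
    """p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]


def visible_probs(h: Sequence[float], W: Matrix, b: Vector) -> Vector:
    """p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(b))]


def cd1_step(v0: Sequence[float], W: Matrix, b: Vector, c: Vector,
             lr: float, rng: random.Random) -> None:
    """One CD-1 update from a single binary visible vector; modifies W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)                           # positive phase
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]   # sample hidden states
    v1 = visible_probs(h0, W, b)                           # mean-field reconstruction
    ph1 = hidden_probs(v1, W, c)                           # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


W = [[0.0, 0.0] for _ in range(3)]  # 3 visible units, 2 hidden units
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0]
cd1_step([1.0, 0.0, 1.0], W, b, c, lr=0.1, rng=random.Random(1))
print(W, b, c)
```

In a stacked model, the hidden probabilities produced by one trained layer serve as the input data for the next layer up.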
In the longer term, Sejnowski concludes, this new input representation may benefit computer vision.
More information: Hierarchical model of natural images and the origin of scale invariance, PNAS February 19, 2013 vol. 110 no. 8 3071-3076, doi:10.1073/pnas.1222618110