Researchers at Purdue University have developed a system that shows what people are seeing in real-world videos, decoded from their fMRI brain scans.
(Human brain decoded)
A convolutional neural network (CNN) driven by image recognition has been shown to explain cortical responses to static pictures in ventral-stream areas. Here, we further showed that such a CNN could reliably predict and decode functional magnetic resonance imaging (fMRI) data from humans watching natural movies, despite its lack of any mechanism to account for temporal dynamics or feedback processing. Using separate data, encoding and decoding models were developed and evaluated for describing the bi-directional relationships between the CNN and the brain. Through the encoding models, the CNN-predicted areas covered not only the ventral stream but also, to a lesser degree, the dorsal stream; single-voxel responses were visualized as the specific pixel patterns that drove them, revealing the distinct representation of individual cortical locations; and cortical activation was synthesized from natural images in a high-throughput manner to map category representation, contrast, and selectivity. Through the decoding models, fMRI signals were directly decoded to estimate the feature representations in both visual and semantic spaces, for direct visual reconstruction and semantic categorization, respectively. These results corroborate, generalize, and extend previous findings, and highlight the value of using deep learning, as an all-in-one model of the visual cortex, to understand and decode natural vision.
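The abstract describes encoding models (from CNN features to voxel responses) and decoding models (from voxel responses back to CNN features), each developed on one dataset and evaluated on separate data. The sketch below illustrates that two-way idea with L2-regularized (ridge) linear regression. The choice of ridge regression, the synthetic data standing in for real CNN features and fMRI responses, and every name in the code (`ridge_fit`, `simulate`, and so on) are assumptions for illustration, not the study's actual pipeline.

```python
import random


def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(x, y, lam=1.0):
    """Fit one L2-regularized linear model per output column by solving (XᵀX + λI)w = Xᵀy."""
    d = len(x[0])
    gram = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    weights = []
    for k in range(len(y[0])):
        xty = [sum(row[i] * t[k] for row, t in zip(x, y)) for i in range(d)]
        weights.append(solve(gram, xty))
    return weights  # weights[k] maps one input row to output k


def predict(x, weights):
    return [[sum(wi * xi for wi, xi in zip(w, row)) for w in weights] for row in x]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


# Hypothetical synthetic data: 4 "CNN features" per movie frame, 8 "voxels".
rng = random.Random(0)
N_FEAT, N_VOX = 4, 8
true_map = [[rng.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(N_VOX)]


def simulate(n):
    feats = [[rng.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(n)]
    vox = [[sum(w * f for w, f in zip(wv, row)) + rng.gauss(0, 0.1) for wv in true_map]
           for row in feats]
    return feats, vox


train_f, train_v = simulate(200)  # fit on one set of frames...
test_f, test_v = simulate(50)     # ...and evaluate on separate, held-out frames

# Encoding: features -> voxels, scored as a per-voxel correlation on held-out data.
enc = ridge_fit(train_f, train_v)
pred_v = predict(test_f, enc)
enc_r = [pearson([p[k] for p in pred_v], [t[k] for t in test_v]) for k in range(N_VOX)]

# Decoding: voxels -> features, the reverse direction.
dec = ridge_fit(train_v, train_f)
pred_f = predict(test_v, dec)
dec_r = [pearson([p[k] for p in pred_f], [t[k] for t in test_f]) for k in range(N_FEAT)]

print("encoding r per voxel:", [round(r, 3) for r in enc_r])
print("decoding r per feature:", [round(r, 3) for r in dec_r])
```

With real data, the features would come from a layer of a trained CNN applied to each movie frame and the voxel responses from preprocessed fMRI; this toy version ignores details such as the hemodynamic delay. The held-out correlation is one plausible way to score how well the CNN predicts a given voxel.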
(Watching in near-real-time what the brain sees)
Visual information generated by a video (a) is processed in a cascade from the retina through the thalamus (LGN area) to several levels of the visual cortex (b), where it is detected and recorded as fMRI activity patterns (c). A powerful deep-learning technique (d) then models this cortical visual processing. This model, called a convolutional neural network (CNN), transforms every video frame into multiple layers of features, ranging from orientations and colors (the first visual layer) to high-level object categories (face, bird, etc.) in semantic (meaning) space (the eighth layer). The trained CNN model can then be used to reverse this process, reconstructing the original videos, even ones that the CNN model had never watched.
(credit: Haiguang Wen et al./Cerebral Cortex)
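The caption describes a CNN turning each frame into successively more abstract layers of features. The toy forward pass below illustrates only the mechanics of that cascade (convolution, ReLU, and max pooling, stacked twice): spatial resolution shrinks while the number of feature maps grows. The filters are random rather than learned, the network has two layers instead of eight, and the averaging readout is a hypothetical stand-in for fully connected layers, so this does not reproduce the trained model used in the study.

```python
import random


def conv2d(image, kernel):
    """Valid (unpadded) 2-D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(w)] for i in range(h)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows/columns that don't fill a window are dropped."""
    h, w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + a][j * size + b] for a in range(size) for b in range(size))
             for j in range(w)] for i in range(h)]


def conv_layer(channels, kernels):
    """kernels[o][c] links input channel c to output map o; each map is conv -> sum -> ReLU -> pool."""
    outputs = []
    for per_input in kernels:
        maps = [conv2d(ch, k) for ch, k in zip(channels, per_input)]
        summed = [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]
        outputs.append(max_pool(relu(summed)))
    return outputs


def random_kernels(n_out, n_in, size, rng):
    # Assumption: random filters stand in for filters a real CNN would learn from labeled images.
    return [[[[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
             for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(0)
frame = [[rng.random() for _ in range(16)] for _ in range(16)]  # one 16x16 grayscale "frame"

layer1 = conv_layer([frame], random_kernels(4, 1, 3, rng))  # 4 maps, 16 -> 14 -> 7 pixels wide
layer2 = conv_layer(layer1, random_kernels(8, 4, 3, rng))   # 8 maps, 7 -> 5 -> 2 pixels wide

# Hypothetical readout: average each map to one number (stand-in for fully connected layers).
features = [sum(map(sum, m)) / (len(m) * len(m[0])) for m in layer2]

for name, layer in [("layer1", layer1), ("layer2", layer2)]:
    print(name, len(layer), "maps of", len(layer[0]), "x", len(layer[0][0]))
print("feature vector length:", len(features))
```

In a full-size network the same stacking continues for more layers, and the activations at each depth are what an encoding or decoding model would relate to fMRI responses.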
In "No, No, Not Rogov!", a 1958 story by Cordwainer Smith, an espionage machine lets its operator see what another person sees by probing that person's brain:
He had then turned away from the reception of pure thought to the reception of visual and auditory images. Where the nerve-ends reached the brain itself, he had managed over the years to distinguish whole packets of microphenomena, and on some of these he had managed to get a fix.
With infinitely delicate tuning he had succeeded one day in picking up the eyesight of their second chauffeur... and had managed to see through the other man's eyes as the other man, all unaware, washed their Zis limousine sixteen hundred meters away...