And then the following:
I'm not sure if anyone has ever considered joining neural networks together in a sort of serial combination. The difference between these networks is that, for example, vision is only allocated the task of recognizing images / shapes / forms / colors and translating them into numbers, while the auditory system processes sounds. If you look at the images closely, you see that there are actually two kinds of networks for each perception method: an associative cortex and a primary cortex.
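The serial combination described above can be sketched in code: a "primary" stage that only translates raw pixels into numbers, feeding an "associative" stage that combines those numbers. This is a minimal illustration, not a real cortical model; the function names, features, and weights are all invented for the example.

```python
# A sketch of two networks in series: a primary stage that turns raw
# input into numbers, and an associative stage that combines them.
# All names, features, and weights are illustrative stand-ins.

def primary_visual(pixels):
    """Primary stage: reduce a 2-D pixel grid to simple numeric features."""
    flat = [p for row in pixels for p in row]
    brightness = sum(flat) / len(flat)                        # overall intensity
    edges = sum(abs(a - b) for a, b in zip(flat, flat[1:]))   # crude edge measure
    return [brightness, edges]

def associative_visual(features, weights):
    """Associative stage: weighted combination of the primary features."""
    return sum(f * w for f, w in zip(features, weights))

# Serial combination: the output of one network is the input of the next.
image = [[0, 1, 1, 0],
         [1, 0, 0, 1],
         [1, 0, 0, 1],
         [0, 1, 1, 0]]   # a crude ring of bright pixels
score = associative_visual(primary_visual(image), weights=[0.5, 0.1])
```

The point is only the wiring: neither stage knows about the other, yet chaining them turns pixels into a single number that a later stage could act on.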
If you look at the production of speech, you see a number of different areas all working together. This gives us clues about how human speech is really put together.
Imagine those networks all working together. As a matter of fact, there are more neurons at work than just those in the brain. The eyes also contain neurons and are already the first stage of processing optical information. Suppose we'd like to make a computer utter the word "circle" without simply recognizing the circle and playing a wave file. We'd have to make it learn to do so:
- Convert the pixels (from a camera?) into a stream of digital information that can be processed by the visual cortex.
- Analyze the shapes at the very center of the image (compare the motor reflexes and voluntary eye movements that scan the environment to gather more information).
- The visual cortex will then produce different signals as part of this exercise, and the reverberating cell assemblies generate new input signals for the more complex processing of that information (memory? context?).
- This is then output to the speech area, where the words to produce are selected (a mapping of signals to concepts).
- The information is then passed to Broca's area, where it is contextualized and put into a grammatical form.
- The instructions from Broca's area (which could have a time-gating function, verifying the spoken words against the words that should be uttered) are sent to the primary motor cortex, which has learned to produce speech through frequent practice.
- The speech organs then move in concert, driven simply by the signals sent to them.
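The steps above can be sketched as a chain of stages, each feeding its output to the next. Every function here is a hypothetical stand-in for a whole network, reduced to a dictionary lookup or string operation just to show the wiring; none of this is real code for the brain area it is named after.

```python
# A sketch of the pipeline above. Each "area" is a placeholder function;
# the only real point is that the stages are wired in series.

def visual_cortex(pixels):          # steps 1-3: raw input -> recognized concept
    return "circle" if pixels == "o-shaped input" else "unknown"

def speech_area(concept):           # step 4: concept -> candidate word
    return {"circle": "circle"}.get(concept, "...")

def broca_area(word):               # step 5: word -> grammatical utterance
    return f'"{word}"' if word != "..." else "(silence)"

def motor_cortex(utterance):        # steps 6-7: utterance -> articulation
    return f"speak {utterance}"

PIPELINE = [visual_cortex, speech_area, broca_area, motor_cortex]

def run(signal):
    # Serial combination: each stage's output is the next stage's input.
    for stage in PIPELINE:
        signal = stage(signal)
    return signal

run("o-shaped input")
```

Replacing each placeholder with a trained network, while keeping this chain structure, is essentially the proposal in the text.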
So this sequence shows that these areas work together and that, in combination, they can produce very interesting emergent behaviours.
I'm not sure if these networks can be built by just trying them out at random. There's also a huge problem with verifying the validity of such a network. We can only check its validity when the output at the far end (hearing the word) makes sense given the input (the visual image of a circle). Everything that happens in between can develop problems anywhere in this entire circuit. Also, a very long learning process would presumably be required to produce the above scenario. Remember that children learn to speak only after about 1.5-2 years, and then only produce words like 'mama' / 'papa' (as if those words were memories embedded in DNA).
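The verification problem can be made concrete with a toy chain. In this sketch (all stages invented for the example), a faulty middle stage is exactly cancelled by the stage after it, so a test that only compares input against final output passes even though the circuit is broken internally.

```python
# A sketch of the end-to-end validation problem: only the input and the
# final output are observable; the intermediate signals are hidden.
# The three stages are hypothetical stand-ins for networks in the chain.

def stage_a(x):
    return x * 2        # imagine: visual processing

def stage_b(x):
    return x - 3        # imagine: a faulty middle stage

def stage_c(x):
    return x + 3        # imagine: speech output, cancelling stage_b's error

def end_to_end(x):
    return stage_c(stage_b(stage_a(x)))

# The chain gives the "right" answer (x * 2) even though stage_b is wrong,
# so checking only the ends cannot locate, or even detect, the internal bug.
result = end_to_end(5)
```

This is why validating only "circle in, 'circle' out" says so little about whether the stages in between learned anything sensible.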