Monday, January 25, 2010


For my work, I've been looking at neural networks in more detail, specifically at how they may be applied to problems with temporal properties. Most artificial neural networks do not retain their values or activations as any kind of remanent activation. Such networks can only work in situations where a single input is complete enough to derive conclusions from. In that case, it doesn't matter whether there is a known formula for the correlation between input and output values or whether a complex, unknown one is being approximated. A direct correlation drawn from one single frame of observation is what the typical neural network gives you today. More recent developments also give you recurrent networks or LSTM cells, but those model patterns or 'data' over time; they establish correlations or relations between sequences of events. For all these networks, you'd generally expect a mathematical explanation or approach to back up the learning methods applied (if they haven't been trained with evolutionary algorithms).
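As a rough sketch of that difference (with made-up weights, not a trained model), a feedforward unit maps one input frame directly to an output and has no memory, while a recurrent unit carries a hidden state across a sequence:

```python
import math

def feedforward(x, w=0.5, b=0.1):
    # One frame in, one value out; nothing is retained between calls.
    return math.tanh(w * x + b)

def recurrent(xs, w_in=0.5, w_rec=0.8, b=0.1):
    # The hidden state h carries a trace of the whole input sequence,
    # so the output depends on the order of the inputs, not just their values.
    h = 0.0
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h + b)
    return h
```

Feeding the recurrent unit the same events in a different order gives a different output, which is exactly the sequence sensitivity the feedforward unit lacks.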

Even in the case of evolutionary algorithms, though, there is significant complexity involved in training the network so that it performs well, unless the correlation between input and classification is very strong, which makes the problem quite easy. Basically, this means that in the same way a formula calculates its output value(s) deterministically (1+1 is always assumed to equal 2 and never changes), a neural network is just a very deterministic approximator. Given a trained network, the same input always yields the same output. This is true unless you use an RNN, whose output also depends on its internal state; but the RNN is designed to work on temporal problems, so given the same sequence, it yields the same output to classify or reproduce that sequence.

The picture above shows a neuron receptor that is involved in a process called chemotaxis. This process allows the worm to navigate in an environment where food is available. The worm has no eyes, can feel heat (thermotaxis), and wriggles its way through its environment. If the worm is wriggling towards a food source, it's climbing an increasing gradient, where the concentration of chemicals picked up by the neuron increases. While climbing such a gradient, the worm is less likely to make sharp turns. On a decreasing gradient, or when it doesn't detect any food source, it is more likely to make sharper turns.
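This run-and-tumble behaviour can be caricatured as a biased random walk. The following is a toy sketch with invented parameters (a radial concentration field peaking at the origin), not a model of the actual worm: the turn stays small while the last step increased the sensed concentration, and the heading is randomized sharply otherwise.

```python
import math
import random

def concentration(x, y):
    # Hypothetical chemical field: highest at the food source at the origin.
    return -math.hypot(x, y)

def chemotaxis(steps=3000, step_len=0.1, seed=0):
    rng = random.Random(seed)
    x, y = 20.0, 0.0                      # start 20 units from the source
    heading = rng.uniform(-math.pi, math.pi)
    last_c = concentration(x, y)
    for _ in range(steps):
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
        c = concentration(x, y)
        if c > last_c:
            heading += rng.uniform(-0.1, 0.1)          # climbing: keep course
        else:
            heading += rng.uniform(-math.pi, math.pi)  # falling: sharp tumble
        last_c = c
    return math.hypot(x, y)               # final distance to the source
```

No explicit direction is ever computed; the bias in turning sharpness alone drags the walker towards the source on average.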

Such a problem sounds very simple to solve, but remember that we have eyes. How do you locate a particular source of smell if you only have a nose? You need to move around the room, being careful not to cause any harm to your body. Depending on the sensitivity of the olfactory sense and the distance to the smelly source, the distance between one sample and the next must be larger or can be smaller. The samples must also be taken within a certain amount of time, otherwise no significant difference can be observed between them.
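A back-of-the-envelope illustration of that trade-off, assuming a hypothetical 1/r odor field and a sensor with a fixed detection threshold: the farther you are from the source, the farther apart two samples must be before the difference between them becomes detectable at all.

```python
def min_sample_spacing(distance, threshold=1e-3):
    # Assumed field c(r) = 1/r: moving d closer changes the concentration
    # by roughly d / r^2, so the change is detectable once d / r^2 >= threshold.
    return threshold * distance ** 2
```

Far from the source the required spacing grows with the square of the distance, so either the steps between sniffs get longer or the sense has to get finer.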

The above shows how some organism, actor or agent must interact with the environment in order to sense it and draw conclusions about the layout or composition of that environment. Without the ability to perform actions in the environment, I don't think there can be any development of intelligence at all.

Such statements are derived from a particular field of knowledge called 'cybernetics'. This sounds like some science-fiction term, I'm sure, but it's not just related to cyborgs, cyberspace, the internetz or what have you.

Cybernetics is a relatively recent term from the 40's and 50's, defined by Norbert Wiener. The word is derived from the Greek "kybernētēs", which means helmsman or steersman, and has the same root as "government". It was initially intended as the study of self-regulatory systems, but quickly expanded to other areas like management (Stafford Beer), biology and computer science. The science is quite broad in scope now and poses questions about how systems make sense of their environment by interacting with it. Finding out what an environment is about isn't exactly the same as just perceiving it.

The difference from Artificial Intelligence is quite strong. Cybernetics is more theoretical, whereas A.I. has basically just gone ahead with a lot of assumptions, wrapping reality into mathematical models and executing them on a computer. My criticism of A.I. up to this point is that it is simply too deterministic for anything 'new' to come out of it. The term A.I. has been a bit bold as well. Sure, we have large-scale systems that analyze emails and other kinds of data and know how to extract information from them to make predictions, classifications or specific groupings of data. But those capabilities were intended by their designers. There is nothing such algorithms or machines can ever do differently; they can only execute those particular implementations.

In that sense, expecting anything truly intelligent to come out of A.I. as it is currently approached (through mathematical formulas) is hopeful at best. A better understanding of knowledge is needed: its roots, how it may be represented over time, and how it interacts with new environments.

That is not to say that A.I. is useless. A.I. certainly has approaches for doing pseudo-cognitive work: helping to massage details out of very large mountains of data, or finding specific correlations that we cannot even imagine. So there's certainly room for both A.I. as we know it now and some new kind of vision of what truly intelligent machines can bring us. Those machines, however, need a different basis than just mathematics and strong correlative behaviour. They need to find out for themselves what is important, and have their own ways of interacting with the environment (which is how they find that out :).
