
Sunday, June 28, 2009

Can computers become conscious?

This is a post to contemplate a paper I wrote, available here. When discussing whether machines can become conscious, reference is usually made to the necessity of localizing memory, cognition and all those other factors; in short, of having a discrete description of the system we call the brain. Should we fail to understand it in full, then all hope is lost.

I'm taking a different view on things. "Emergence", for example, shows how many, many small actions, each discrete in nature and often very simple, interact to eventually form a massively complex system that no single, general, descriptive rule can capture. It is easy to describe the simple behaviour of a single agent, but impossible to grasp the actions and consequences of the system as a whole. I reckon we may not need to understand this entire system: we can start from the bottom by replicating certain behaviours, look at certain clusters at a level of detail that is still fathomable, then attempt to replicate those clusters and move upwards in the chain.

Oh well, to prevent a rant, on to the whitepaper then:
This whitepaper draws a comparison between Restricted Boltzmann Machines and human consciousness through a quantitative analysis of the capacity to integrate information, and discusses the probability that computers can become somewhat conscious of their inputs. Consciousness of computers implies the capacity to interpret data, understand it, manipulate it and possibly produce new data based on previous examples.

The same neurons activated by observation are also activated when dreaming or imagining. Restricted Boltzmann Machines work in a similar way; [...] This makes it plausible to construct computers that have some kind of imagination, [...] the type of consciousness isn't necessarily equal to our own [...]
Happy reading!

Tuesday, May 12, 2009

Why RBMs are so strangely weird

I'm getting quite obsessed with RBMs, for some strange reason. There's a strange simplicity to the RBM: a very elegant method for learning through contrastive divergence, and a surprising ability to model many things. Current research shows that RBMs certainly have limitations, but here we go, trying to expand on that.

An RBM is a strange kind of neural network. Artificial neural networks as we know them generally propagate the signal in a forward direction only, but RBMs work in both a forward and a backward direction. In a sense, that is a little like our minds: when we observe something, we use the details of the input signal to enrich the actual observation, but at the same time we use information from our experience to enrich, or anticipate, what is being observed. I reckon that if we relied only on the observed state, that state would be nowhere near as rich as our mentally induced state, which blends our experience with our observations.

Here's a video that might blow your mind (or not): a presentation by Giulio Tononi, which I found very compelling. In this theory, the question is not the quantity of neurons required to become conscious, nor the localization of consciousness within the brain; rather, it is a theory of the most effective organization of neurons within a network for that network to exhibit consciousness. (text)

Here's where I paid close attention. Apparently, having a network in which all neurons are connected together is total crap. And a network that is very large, with a high number of local connections, can be good at something, but it doesn't have the right properties for consciousness either. The best arrangement is a network of specialized neurons, connected in patches, with the occasional long-range connection to other parts of the mesh. Much of the work there is about quantifying consciousness: if that quantification is in step with actual consciousness, one can keep searching for more effective ways of building neural nets or machines.

The "patchy-ness" property suggests that blindly connecting neurons together isn't the most effective way to build a network. A highly regular network makes the system act like a general on/off machine, losing its specificity of function, while neurons that are not connected enough make it behave like a set of independent classifiers, which isn't good either.

Most NNs and RBMs are built around some number x of neurons or elements per layer, connected evenly to the other layers, with a kind of "weight" calculated from one element to another. Putting more neurons into a layer generally makes the network more effective, but the improvement is asymptotic.

I wonder whether it's possible to develop a theory, complementary to this theory of the quantity of consciousness, that as some derivative allows a neural network to shape its own topology, or whether such theories at least provide better rules for constructing networks. One good guess would be to observe the biological growth and connection-shaping of a brain, or of simpler parts, and then assess the patterns that evolve in the generation of such a network.

Finally, the most interesting words of the hypothesis:

Implications of the hypothesis

The theory entails that consciousness is a fundamental quantity, that it is graded, that it is present in infants and animals, and that it should be possible to build conscious artifacts.

This is a huge implication, and to understand it one should go back to the start of this post. Consciousness == experiencing things. As said before, our observations carry detail, which is processed by itself but also completed by previous experiences. Thereby our actual experience is not just the observations we make, but the total sum of those observations plus memories, evoked emotions, and so on. In a way, what we observe causes us to feel aroused, or to have some kind of feelings, and seeing similar things again at a later point in time may cause us to experience the actual observations plus the previous experiences (memory) at the same time. It is very likely that not all experiences are consciously lived, in the sense that we are aware of every experience we could possibly have; many experiences probably sit just below the surface of consciousness as a kind of potential or stochastic possibility, waiting to be activated by changes in the temporal context.

For example, rapid changes in our direct observations can cause instant changes in behaviour. This implies that next to observing the world like a thought-less camera, consuming light rays and audio waves, we're also experiencing the world as a kind of stochastic possibility. The easiest way to demonstrate this is the idea of movement, of intent, of impact and likely effect.

The phrase "I'm standing at a train station and I see a train coming towards me" contains a huge amount of information: the recognition of the train in the first place, the experience that it's moving towards you because it grows larger, the knowledge that the train runs over the tracks you're standing next to, the knowledge that train stations are places where trains stop, and your intent to get on the train. Merely spelling out how much knowledge we apply to such a simple situation demonstrates how we accept our consciousness as the most normal thing on earth, which it certainly is not.

Well, so why are RBMs so strange in this sense? Because old-school neural networks don't have these properties: RBMs can both recognize things and fantasize them back. There are certainly current limitations. In previous posts I've argued that we shouldn't limit the definition to "consciousness == when humans think or experience". Under a broader definition of consciousness, one can also consider machines or A.I.s that are extremely effective in one very particular area of functioning and might just be conscious in that relevant area without having any kind of consciousness of the things around them. The definition of consciousness here is a dangerous one, however, since it shouldn't be confused with behaviour, which it certainly is not.

Food for thought...

Tuesday, April 21, 2009

Digging RBMs

Okay, so here's a little information about Restricted Boltzmann Machines as applied to the Netflix prize. I haven't got it working perfectly yet, but I'm getting close. The paper may be a bit challenging to start off with, but once you get the objective right, things are essentially pretty easy. I'm referring to the paper "Restricted Boltzmann Machines for Collaborative Filtering". The picture here shows the gist of the technique for producing predictions with this method.

A user is represented by a vector that holds, per column, the rating given to a movie. If a movie was not rated, that column is not used in the calculation. By feeding in all the movies the user did rate, the 'hidden part' of the network is loaded into a certain state. That state can then be used to reproduce values on the visible part of the network where the missing movie ratings are. And yes, calculating a user/movie rating is simply the act of deriving that rating from the hidden state of the network: multiplying the active features in the hidden part by their weights generates the softmax units for the missing movie rating, which should hopefully approximate what the user would really have rated.
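
As an aside, here's a minimal sketch in C++ of that last step: reconstructing the softmax for one unrated movie from the hidden state and turning it into an expected rating. All names and sizes here (F, W, b, h) are my own assumptions for illustration, not the paper's notation:

    #include <cmath>

    const int F = 100;  // number of hidden features (assumed value)

    // W[k][j]: weight between this movie's rating-k softmax unit and hidden
    // feature j; b[k]: visible bias for rating k; h[j]: the hidden state
    // loaded by the user's rated movies.
    float expected_rating(const float W[5][F], const float b[5], const float h[F]) {
        float p[5], sum = 0.0f;
        for (int k = 0; k < 5; ++k) {            // one softmax unit per rating 1..5
            float act = b[k];
            for (int j = 0; j < F; ++j)
                act += W[k][j] * h[j];
            p[k] = std::exp(act);                // unnormalized softmax activation
            sum += p[k];
        }
        float rating = 0.0f;
        for (int k = 0; k < 5; ++k)              // normalize, then weight each
            rating += (p[k] / sum) * (k + 1);    // rating 1..5 by its probability
        return rating;
    }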


So is it a neural network? Not really. This Boltzmann machine has the ability to fantasize about what something should look like based on parts of its observation. So you can use it for various purposes, from pattern recognition to completing parts of a picture, as long as there are recurring patterns (you can't ask it to reproduce something it has never seen before). Well, at least not yet :).

The way this thing is trained can get pretty complicated, but enter "Contrastive Divergence", a method that allows the machine to be trained pretty quickly. Starting from the user vector at the lower part, you multiply each individual softmax (5 softmaxes per movie) by its individual softmax weight and add the bias of each hidden unit. Each rated movie in the user vector will either contribute to or diminish the activation of a hidden unit. This is the positive activation phase. By the end of this phase, the hidden layer has F units, of which x are activated (1.0f) and y are not (0.0f). Yes, that is a binary activation pattern in the simple case (not Gaussian). Sampling in this paper means:

// Bernoulli sampling: the unit's activation (a sigmoid output in [0, 1])
// is treated as the probability that the unit switches on.
if (hidden_unit_value > random_value_between_0_and_1) {
    hidden_unit_value = 1.0f;
} else {
    hidden_unit_value = 0.0f;
}
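
For context, here's a minimal sketch of the positive phase that produces the activation the snippet above then samples. The names and sizes (M movies, F features, a weight cube W[movie][softmax][feature]) are my reading of the paper, assumed for illustration:

    #include <cmath>
    #include <vector>

    const int M = 17770;  // number of movies (Netflix data; assumed here)
    const int F = 100;    // number of hidden features (assumed)

    // For each movie the user rated, exactly one of its five softmax units is
    // on, so only that unit's weight contributes to the activation of hidden
    // unit j.
    float hidden_probability(const std::vector<int>& rated_movies,
                             const std::vector<int>& ratings,   // 0..4 per movie
                             const float W[M][5][F],
                             const float hidden_bias[F], int j) {
        float act = hidden_bias[j];
        for (size_t n = 0; n < rated_movies.size(); ++n)
            act += W[rated_movies[n]][ratings[n]][j];
        return 1.0f / (1.0f + std::exp(-act));  // sigmoid gives P(h_j = 1)
    }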

As soon as the hidden layer is calculated, we reverse the process. What if the hidden layer were already trained? Then, with the hidden layer in its current state, we should be able to reproduce the visible layer. But if the hidden layer has not yet been trained, we'll soon see a certain error occur. And if we then redo the positive phase, we'll also see differences in the hidden layer activation.

Now, here comes the beatdown of the contrastive divergence algorithm:
  1. If a hidden feature and a softmax unit are on together, add 1 in the box for that observation (visible_unit_i_softmax_p_feature_j). This stores the frequency with which both units were on together. You probably need a 3-dimensional matrix to store this information, which makes sense, because you also have a weight per movie per softmax per feature, and those weights are what we want to train. Let's call this matrix CDpos.
  2. Perform the negative phase and then the positive phase again, and repeat the counting, storing the numbers in another structure of the same shape. We now have the frequency with which softmaxes and features were on together in the first positive phase, and the frequency after one negative and one more positive phase (you can actually run this chain n times, as described in the paper, once learning has progressed). The number of epochs for learning is small; the paper mentions 50. Let's call this second matrix CDneg.
  3. The learning phase basically comprises subtracting CDneg from CDpos, then updating the weight through a learning rate. Thus:

    W(i=movie,j=feature,p=softmax) += lrate * (CDpos - CDneg);

  4. You can make this fancier by including momentum and decay, as mentioned in the paper (a sketch of the full update follows this list):

    CDinc = (MOMENTUM*CDinc)+(LRATE*cd)-(DECAY*W);
    W += CDinc;


    UPDATE 30/04 | This should be:

    CDinc = (MOMENTUM*CDinc)+LRATE * (cd - (DECAY*W));

  5. Another trick in the negative phase is deciding whether or not to sample the visible vector reconstruction, or to do so only in the training phase, and more of those decisions. I'm always sampling in the training phase, but sampling only the hidden layer in the prediction phase.
  6. In the prediction phase, I'm normalizing the softmax probabilities, then adding them up, each multiplied by its rating value, then dividing by 5.0f. You could also take the highest probability and guess that number. I chose my method because I think it's likely to be more accurate in the long run: there's probability x of a 1, y of a 2, and so forth. It deals in probabilities, so if there's strong pressure for a 5.0 it'll be a five, and otherwise somewhere between 4 and 5.
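
To tie steps 1 through 4 together, here's a compact sketch under assumed names, sizes and hyper-parameter values (MOMENTUM, LRATE and DECAY are placeholders, not the paper's settings), using the corrected update rule from point 4:

    const int F = 100;  // number of hidden features (assumed)
    const float MOMENTUM = 0.9f, LRATE = 0.01f, DECAY = 0.001f;  // assumed values

    // Parameters and CD statistics for one movie: indexed [softmax p][feature j].
    struct MovieBlock {
        float W[5][F];                               // weights to train
        float CDpos[5][F] = {}, CDneg[5][F] = {};    // on-together counts
        float CDinc[5][F] = {};                      // momentum buffer
    };

    // Steps 1-2: count how often softmax unit p and hidden feature j are on
    // together, in the data phase (v0, h0) and the reconstruction phase (v1, h1).
    void accumulate(MovieBlock& m, const float v0[5], const float h0[F],
                    const float v1[5], const float h1[F]) {
        for (int p = 0; p < 5; ++p)
            for (int j = 0; j < F; ++j) {
                m.CDpos[p][j] += v0[p] * h0[j];
                m.CDneg[p][j] += v1[p] * h1[j];
            }
    }

    // Steps 3-4: update the weights from the accumulated statistics using the
    // corrected momentum/decay rule, then reset the accumulators.
    void update(MovieBlock& m) {
        for (int p = 0; p < 5; ++p)
            for (int j = 0; j < F; ++j) {
                float cd = m.CDpos[p][j] - m.CDneg[p][j];
                m.CDinc[p][j] = MOMENTUM * m.CDinc[p][j]
                              + LRATE * (cd - DECAY * m.W[p][j]);
                m.W[p][j] += m.CDinc[p][j];
                m.CDpos[p][j] = m.CDneg[p][j] = 0.0f;
            }
    }
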
The rest of the paper elaborates further on this model: Gaussian hidden features and conditional RBMs. Conditional RBMs basically allow the machine to also learn from missing ratings (so rather than training just on what you have, you also train on what you don't have. Brilliant!). They also make it possible to use the information in the qualifying and probe sets: the machine knows those movies were rated, but not what the rating was. That is information, and a great thing to add to the model.

Hope that helps!

Wednesday, April 15, 2009

Consciousness and RBMs

I was exploring some thoughts regarding the origins of consciousness. It's a huge subject to read about, and certainly one that allows you to specialize in several different directions. So far I've read books by Steven Pinker, Roger Penrose and John Holland, general articles on the Internet, philosophy of mind and the like.

The one thing that really struck me when watching this video is how important the "fantasizing step" is for consciousness. Fantasizing == imagination == abstract thought: the (abstract) manipulation of things seen before in order to construct something new out of past experiences.

So far, neural networks have mostly been viewed from the perspective of recognition, not so much the reproduction of some action. Also, most neural network activity is one-way: it's true that the learning process requires backwards propagation for weight adjustment, but the general execution phase runs from input -> output, never backwards.

But RBMs have the property that they can be used for recognition as well as production. The production phase is useful for things like prediction. Basically, prediction is all about recognizing patterns or important forces that strongly suggest that reaching a certain state or value has a higher probability than any other, whether through reasoning, calculation or whatever (see from 21:38 in the video mentioned above).

Now, here comes the fun part. One could consider imagination (+innovation) to be:
  • Constructing coarse solutions through reasoning (needs to have A + B not C).
  • Filling in the blanks of the generated coarse solutions.
  • Backtracking over complete blanks, defining the unknown as a subproblem to resolve prior to resolving the bigger picture.
The interesting part of these thoughts is that they provide ways for a machine to actually construct thought itself, as long as the premise holds that inputs into the machine can be represented at a very abstract, symbolic level and the machine actually has some goals to follow. Thus, given some goals, it develops subgoals, interprets the environment, constantly redefines its subgoals, and so forth. There are things missing here of course, but think at a very abstract level of representation.

Think of the mind as a huge network of connections, like an RBM with different stacks in which different types of processing occur. At the neurons near the eyes, light is interpreted, and some pre-processing such as edge detection already takes place there. The next step is to recognize shapes between the edges and blotches of colour. What does it all mean? I strongly suspect we don't store nearly as much detail as we think we do in visual processing. And 100 billion neurons isn't really that much when you think about the amount of information we're really storing, especially when parts of those neurons are dedicated to specific tasks such as speech production, visual recognition, speech recognition, recognition of audible signals, pre-frontal cortex processing (high-level abstract thought), emotional suppression/understanding, and so forth.

Now, with consciousness: what if what we're seeing really is the induction of a high-level abstract thought in the pre-frontal cortex down towards the lower hierarchical layers of this huge network? Then consciousness is more or less like reliving past experiences in sound, vision, emotion and the like. This still raises the question of where the induction starts (the ghost in the machine), but that may also be explained by the inverse operation: the occurrence of a certain emotion, the observation of a visual appearance, the hearing of some sound or the smell of something.

The latter, olfactory memory, is especially interesting. By smelling freshly cut grass or other specific smells, we sometimes immediately relive experiences from our youth. This is not consciously driven; it simply happens. And it is relived not just as a smell: it's a visual, audible and emotional thing as well. The interesting part is that we seem able to steer our thoughts towards (concentrate on) certain parts, whereas steering away from certain thoughts is much more difficult (don't think of a black cat! Oops, I asked you not to think of one! :).

So... here goes: can thought be seen as a loop between abstract ideas and reproductions of those abstract ideas built from previous experiences? Someone without many experiences won't have much imagination in that sense, while others with loads of experience and knowledge may be able to infer more about certain qualities and consider a wider range of possibilities and options.

And the other question: can this be used as a stepping stone to model thought in machines? And if that's possible, could we call it consciousness?