Monday, September 29, 2008

memory network

I've looked into some links about the construction of memory networks. I'm dreaming up a design for modeling an NN within computer memory in a way that is better aligned, or better suited, for fast processing or recollection.

Food for thought when going to work today :).

Friday, September 26, 2008

On the use of memory

The difference between memory in computers and organic memory is a strange thing. The memory in computers is linearly aligned and stores elements very differently than organic memory. Where the storage area is not large enough in computers, we tend to use databases to make the memory "searchable" and malleable.

Computers generally need to trawl through the entire space in order to find something, albeit with clever algorithms rather than an exhaustive search. We model the world around us as a specific definition to work with and then compare new input to the things we know.

Organic memory is very different. Through the input of elements and situations, the recall of other elements in memory seems to be activated automatically, without the need for a search. It's as if very small biological elements recognize features in the stream of information and respond. This may cause other cells to become activated too, thereby linking one set of features to previous experiences. It's as if the recognition drifts up automatically, rather than a general CPU searching for some meaning based on a description or broken-down representation.

The problem with computers, then, is our limitation / lack of imagination in representing memory in a non-linear form. The way computers are modeled requires the programmer to define concepts, elements and situations explicitly, or less explicitly using rules, and then search within a certain space for similar situations and reason further from there. The computer is totally lost if the situation cannot be mapped to anything.

I was thinking lately that it would be a ground-breaking discovery if memory could be modeled in similar ways to organic memory. That is, the access to memory being non-linear and a type of network, rather than developing a linear access algorithm. If you consider a certain memory space where a network may position its recognition elements (neurons?), then the connection algorithm of a current computer should map the linear memory differently into a more diffuse space of an inter-connected network. Basically, I'm not saying anything different than "create a neural network" at this time, but I'm considering other possibilities to use the mechanical properties of memory access in a clever way and as such to reduce the required memory for connection storage and to figure out a possibility to indicate "similarity" by the proximity of memory address locations.

Or, alternatively to that, use a neural network to determine a set of index vectors that will map to a large linear space. The index vectors can be compared to a semantic signature of any element. This signature should be developed in such a way that it is categorizing the element from various perspectives. Basically, considering the ability for semantic indexing of text, the technique is used to find texts that are semantically similar.

The larger the neural network, the finer its ability to recognize features. But our minds do not allocate 100 billion neurons to the (same) ability of pattern recognition. Thus, you could talk of specialized sub-networks of analysis that together define a certain result (binding problem).

But perhaps we're thinking too much again in the terms of input-processing-output as I've indicated before. We like things to be explicitly defined, since it provides a method of understanding. What if the networks don't work together in a hierarchy (input->network1->network2->output->reasoning), but work together in a network themselves?

Then this would mean that such a network could aggregate information from different sub-networks together to form a complicated mesh itself. The activation of certain elements in one part could induce the activation of neurons in another, leading to a new sequence of activation by reasoning over very complicated inputs of other neuronal networks. For example, what if thinking about a bear causes our vision analysis network to fire up / become induced and then produce a picture of such a bear?

Imagine a core of a couple of neural networks that have complicated information available to them from "processing" neural networks before them. If those core networks are interconnected in intricate ways and influence one another, then it's likely that a smell causes the memory of a vision or hearing in another network, albeit slightly weaker than normal.

Leaving this thought alone for now...

Thursday, September 18, 2008

Reusable A.I.

A miscellaneous crazy thought I had today was to look into the potential for reuse of artificial intelligence algorithms or networks. Think of sharing the factors, definitions, neurons... Think of persisted artificial neural networks, which are only possible in computers because there the networks are fully determined and extractable from memory.
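To make "persisted" a bit more tangible, here's a minimal sketch in Python. It assumes a network is nothing more than nested lists of weights with a simple step activation; the function names (serialize_network, deserialize_network, forward) are made up for this illustration and don't come from any library.

```python
import json

def serialize_network(weights):
    """Turn a network (a list of layers, each a list of neuron weight lists) into text."""
    return json.dumps({"layers": weights})

def deserialize_network(text):
    """Rebuild the nested weight lists from the serialized text."""
    return json.loads(text)["layers"]

def forward(weights, inputs):
    """Feed inputs through each layer: weighted sum followed by a step activation."""
    activations = inputs
    for layer in weights:
        activations = [
            1.0 if sum(w * a for w, a in zip(neuron, activations)) > 0 else 0.0
            for neuron in layer
        ]
    return activations

# Hypothetical tiny network: two hidden neurons, one output neuron.
weights = [[[0.5, -1.0], [-1.0, 1.0]], [[1.0, -1.0]]]

stored = serialize_network(weights)      # could be written to disk or sent elsewhere
restored = deserialize_network(stored)   # another program could pick it up here
```

The restored network behaves exactly like the original one, which is the whole point: the "knowledge" is just data that can be shared and plugged in somewhere else.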

The different functions of sight, smell, hearing and so forth are always seen as very specific properties of a system and it's specialized after that. What would happen once the networks are interchangeable, pluggable and can start talking to one another? Even then we'll need to make decisions on which networks to support and which not to, but next to this problem, the underlying platform... the glue between the networks, could serve as a basis for connecting grid / cloud computing of A.I. networks that each serve a particular purpose.

Some very complicated problems require some heavy processing power. I strongly believe that subdividing the problem space into smaller problems, each with its own expertise and domain, can help to significantly decrease the eventual complexity and required hardware.

Bayes theorem

In the artificial intelligence courses, a start has been made on exploring Bayes' theorem. In simpler words, Bayes is about discovering (the strengths of) relationships between cause and effect. You look at an event, develop a hypothesis or observe an occurrence that may have led to that event, and express the chance that the event occurs given that the hypothesis (the potential cause) is true.

This theorem is used in medical analysis, for example (what is the chance that a person has meningitis, given that the person has a headache?). Of course, the theorem can be expanded by combining two hypotheses. When both occur at the same time, *then* what is the chance for the event (meningitis) to occur?
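Written out, the theorem says P(meningitis | headache) = P(headache | meningitis) * P(meningitis) / P(headache). Here's a small sketch in Python of that calculation. The three probabilities are invented purely for illustration and are not medical data.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    return likelihood * prior / evidence

# Hypothetical numbers, chosen only to show the arithmetic.
p_meningitis = 1 / 50000             # P(meningitis): how common the cause is
p_headache = 1 / 20                  # P(headache): how common the effect is
p_headache_given_meningitis = 0.5    # P(headache | meningitis)

p = posterior(p_meningitis, p_headache_given_meningitis, p_headache)
print(p)  # roughly 0.0002, i.e. about 1 in 5000
```

Even though half of the (hypothetical) meningitis patients have a headache, a headache by itself says very little, because headaches are so much more common than meningitis.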

For another direct application, consider credit-card fraud. If you have a large training set in the form of a large database of credit-card fraudsters, you could determine from the data set the probability that a woman, someone in a certain age group or someone from a certain neighborhood/city/area commits credit-card fraud. You could theoretically even come up with a number for a credit-card applicant that states the probability that the person would commit credit-card fraud, and so on. Of course, you have to maintain the view that you're dealing with probability, not certainty.

Bayes can also be used by learning systems to develop associations or relationships. It thus doesn't produce a boolean true|false relationship between elements, but a probability relationship. Theoretically, I think it should even be possible to organize different kinds of relationships (dependencies, associations, classifications) using Bayes just by looking at a large data set. The problem here is that such an engine shouldn't be looking at all possible combinations of cause / effect, but should reason logically within those, and so make deductions about which combinations are possibly sensible.

Then one can question whether we as human beings absolutely exclude some silly statements. If we did employ true|false for each hypothesis with nothing in between, then we would have trouble understanding the world around us too, since it's full of exceptions. Does this suggest that some sort of Bayesian theorem is at the basis of our association determinations in a neural network?

So, it's interesting to read about this...

Wednesday, September 17, 2008

Semantic priming

Psychology and neuroscience have done research on the effects of semantic priming. Priming is the effect where you're prepared to interpret a certain word, vision or thing in a certain way. When you're primed to react to a soon-to-be-fired stimulus and you're paying attention to it and the stimulus actually occurs, the reaction time is very quick. As soon as another stimulus is given, however, the reaction to that is generally very slow.

I think priming is very interesting in the context of semantic processing of text or the world around us, in that a context of a story may also prime us for what comes next. That is, we always build up some kind of expectation of what will happen next and read on happily to see what's really going to occur.

Some time ago, I posted something about latent semantic indexing. It's more or less an indexation of latent semantics. Latent means:

1. present but not visible, apparent, or actualized; existing as potential: latent ability.
2. Pathology. (of an infectious agent or disease) remaining in an inactive or hidden phase; dormant.
3. Psychology. existing in unconscious or dormant form but potentially able to achieve expression: a latent emotion.
4. Botany. (of buds that are not externally manifest) dormant or undeveloped.

So, latent means hidden or dormant. It's about semantic meaning that you can't see inside the text, but use as a key for indexing that meaning or definition. In other posts, I doubted the viability of constructing formal knowledge systems (where knowledge is explicitly documented), due to the huge size and the huge efforts required in defining this knowledge (and the obvious ambiguities and disagreements that go along with it). Other than that, knowledge is also dynamic and changing, not static.

Considering priming thus and binding this with a technique for latent indexing, one could achieve a system where related symbols are primed before they are interpreted. Given different indices for vision, smell, audio and somatosensory information, each specific index could eventually (without saying how) be made to point to the same symbol, thus strengthening the interpretation of the world around a robot or something similar.

Thus, rather than explicitly defining relationships between concepts, consider the possibility of the definition (and growing) of indexed terms which partially trigger related terms (prime the interpreter), as the interpreter moves on to the next data in the stream. This could allow a system to follow a certain context and distinguish relationships of things in different contexts, because the different contexts have different activation profiles of each symbol.
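Here's a rough sketch in Python of what such priming could look like. Everything in it is an assumption made for illustration: the tiny hand-written link table, the function names and the spread factor of 0.5. In the idea above the links would grow out of the index itself rather than being typed in by hand.

```python
def prime(activation, links, symbol, strength=1.0, spread=0.5):
    """Activate `symbol` fully and each of its linked neighbours partially."""
    activation[symbol] = activation.get(symbol, 0.0) + strength
    for neighbour in links.get(symbol, []):
        activation[neighbour] = activation.get(neighbour, 0.0) + strength * spread
    return activation

# Hypothetical links between indexed terms.
links = {
    "bank": ["money", "river"],
    "water": ["river", "boat"],
    "loan": ["money", "interest"],
}

def read_stream(words):
    """Move through the stream word by word, letting each word prime its neighbours."""
    activation = {}
    for word in words:
        prime(activation, links, word)
    return activation

riverside = read_stream(["water", "bank"])   # "river" ends up more active than "money"
finance = read_stream(["loan", "bank"])      # "money" ends up more active than "river"
```

The same word "bank" leaves a different activation profile depending on what came before it, which is the kind of context-following I have in mind.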

Coupling this with probability algorithms, it would be interesting to see what we find. In fact, using probability is the same as the development of a hypothesis or "what-if" scenario. Whereas a certain relationship does not yet exist, we seek ways to prove the relationship exists by collecting evidence for it.

Some other things that we learn are learned subconsciously. Take the action/reaction consequences of throwing an object and having it drop on the floor. If the object is made of metal, it probably won't break. If it's made of glass, it'd probably shatter. Those things are not obvious to children, but can quickly be learnt. Glass is transparent, feels a certain way, and there are a number of standard objects which are generally made of glass. Plastic looks similar, but makes a different sound. We should aim to avoid dropping the glass on a hard floor. This bit of knowledge is actually a host of different relationships between actions, reactions and properties of objects, either visible or audible, and by combining these things we can reason about a certain outcome.

The psychology book also makes an important note about attention. It specifically states that when attention is not given, performance of analysis, reasoning or control drops significantly. This means that we're able to do only one or two things at a time: one consciously, the other less so. It also means that it takes the entire mind, with its control and its audible and visible verification mechanisms, to control the outcome.

The interesting part of this post is that it assumes that symbols as we know them are not named explicitly by natural language, but are somehow coded using an index, which has been organized in such a way that neighboring indexed items become somewhat activated (primed) as well, to allow for the resolution of ambiguities. An ambiguity is basically a fork into two paths of meaning, where the resolution should come by interpreting further input or requesting input from an external source in an attempt to solve it (unless assumptions are made as to what it means).

Another thing that drew my attention is that symbols that were recently and strongly primed may be strongly primed in the future, independent of their context. This mostly relates to audio signals, for example the mentioning of your name. You could be in a pub hearing a buzz, but when your name is called somewhere, you can recognize it immediately within that buzz (thus, the neurons involved in auditory recognition are primed to react to it).

It's probably worthwhile to extend this theory by developing the model further and considering human actions, reasoning, learning and perception within that model (as opposed to building a network and trying out how it performs). Since it's already very difficult to re-create human abilities using exact replicas of biological cells, why not consider simpler acts and verify parts of this reasoning with such a smaller network?

The first elements of such a network require a clever way of indexing signals and representations. In this way, the indexing mechanism itself is actually a clever heuristic, which may re-index already known symbols and place them in a different space. The indexing mechanism doesn't feel static.

Monday, September 15, 2008

Pre-processors of information

The psychology course I'm taking requires reading through a pretty large book (albeit in not too small type and with loads of pictures). The sensory system is explained, so it's sometimes more like a biology book. It's basically stating that rods are for dimly lit places and cones are for brightly lit places. The cones can discern color and have lower sensitivity.

Researchers have determined that right after the light is transduced by the cones and rods, nerve cells already start pre-processing the information. You could compare this pre-processing to running an image filter in Photoshop. It runs some edge detection filters, for example, improves contrast here and there and then sends the result back to the primary visual cortex for further analysis.
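To give an idea of what such an edge detection filter does, here's a small sketch in Python using the well-known Sobel kernels on a grid of brightness values. This only illustrates the image-filter side of the comparison; I'm not claiming the nerve cells compute it this way. The function names and the tiny test image are made up for the example.

```python
# Sobel kernels: one responds to horizontal change, the other to vertical change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood around (row, col)."""
    total = 0
    for dr in range(3):
        for dc in range(3):
            total += kernel[dr][dc] * image[row + dr - 1][col + dc - 1]
    return total

def edge_strength(image):
    """Return |gx| + |gy| for every interior pixel; border pixels stay 0."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            edges[r][c] = abs(gx) + abs(gy)
    return edges

# Hypothetical image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edges = edge_strength(image)
```

The flat dark and flat bright areas come out as zero, and only the columns around the dark-to-bright boundary get a high value. What remains is much less data than the original image, but it's exactly the part that describes the shape.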

I'm running some personal experiments by looking at a scene for 1-2 seconds, then closing my eyes and attempting to reconstruct that scene. Looking at it for longer or more frequently makes the image more perfect, but the actual image with eyes open has a lot more information than that which I can reliably reconstruct. Or rather... I can reconstruct details of A tree or road, but possibly not THE tree or road out there. So my belief system of what I think a tree or a road is starts to interfere. The interesting thing is that the actual image I can reconstruct is mostly based on the edges and swaths of color.

Example detail image

Example "memory" image

It's not very clear (due to time constraints), but the middle part of the picture still has the highest detail, whereas the sides have less due to peripheral vision.

The mind thus deconstructs the scene first by edge detection, finding lines, but at the same time highly depends on the ability to identify complete objects. Very small children for example are already surprised or pay attention when objects that were thought to be together suddenly seem to be actually apart.

It does take some time to identify something that we've never seen before, but pretty quickly we're able to recognize similar things, although we may not know the expert name for it.

By deconstructing the scene, you could say it also becomes a sort of "3D world" that we can personally manipulate and visualize further (mind's eye). So I don't think we're continuously re-rasterizing heavy and complex objects, but have the ability to consider an object whole by its edges/traces, then rotate it, translate it or do with it as we please.

In this sense, the sciences that deal with signal processing and so on should depend heavily on these techniques. It is possible to recognize objects through their pixels, but perhaps by running filters on the image first, the features are detected more easily and the pattern recognition mechanism might just be significantly better. Thus... the way in which signals are presented probably always requires pre-processors before they are sent to some neural network for further processing. In that sense, the entire body thinks, not just the brain.

Thursday, September 04, 2008

A.I. is about search

Well, so far I've enrolled in a couple of courses. One of them is "A.I. Kaleidoscope". It's a great course with very good course material, including exceptional material from the professor.

The book I'm reading, called "Artificial Intelligence", is very well written and highlights a couple of philosophical understandings as well as explains the mathematical underpinnings of A.I. that have been established so far. So it's quite a broad area it is discussing.

One of the statements I came across is that A.I. is about searching problem spaces. Whereas some problems have algorithms, other problems have a state space, onto which states are mapped and where transitions from one state to another are shown as arcs in a graph. That brings the discussion to graph theory and trees, and to breadth-first and depth-first searches, heuristics, and so on.
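As a small illustration of searching a state space, here's a breadth-first search sketched in Python. The toy problem (get from 1 to 10 using only "add 1" and "double") and the cap of 20 are my own invention to keep the example tiny; they're not taken from the book.

```python
from collections import deque

def breadth_first_search(start, is_goal, successors):
    """Explore states level by level; return the first (shortest) path to a goal state."""
    frontier = deque([[start]])   # queue of paths, each ending in an unexplored state
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if is_goal(state):
            return path
        for next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append(path + [next_state])
    return None  # the reachable part of the state space holds no goal state

def successors(n):
    """Arcs of the toy graph: add one or double, capped at 20 to keep the space finite."""
    return [m for m in (n + 1, n * 2) if m <= 20]

path = breadth_first_search(1, lambda n: n == 10, successors)
print(path)  # [1, 2, 4, 5, 10]
```

A depth-first search would use the same structure with a stack instead of a queue, and a heuristic search would order the frontier by some estimate of how close each state is to the goal.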

The idea is thus that A.I. is about mapping knowledge within a certain domain and understanding the phases or steps that an expert goes through in order to come to a reasonable conclusion (reasonable meaning not necessarily optimal, but certainly acceptable).

In previous posts, I sometimes discussed that we human beings aren't necessarily purely rational, but act emotionally as if we're programmed. We think we're exceptionally clever though. Well, another part of the book discusses the fact that we only consider things intelligent that act in ways that we ourselves would and could do. You could argue for example that intelligence shouldn't be subject to such a "narrow?" definition. But philosophically, there is no common agreement on the actual definition of intelligence, so this discussion isn't that useful at this time (within a blog that is).

I'd like for the moment to set aside the general "folk" consensus that intelligence is solely determined by human observation and imitation (dolphins are considered intelligent because they seem able to have intricate conversations in their speech and behave in seemingly human ways in response to our stimuli and interactions). Taking thus a slightly wider interpretation of intelligence, and accepting the statement that "A.I. is about search", you can only conclude that Google built the most intelligent being on the planet. It's capable of searching through 70% of the internet at lightspeed, you always find what you're looking for (unless it's too specific or not specific enough) and so on. Now, perhaps the implementation isn't necessarily intelligent, but the performance of the system surely demonstrates to me, using my browser, that it's a very intelligent system.

One important branch in A.I. is about "emergence", something I blogged about recently. It's when simple individuals, within their own little context and environment, execute actions which within a greater context build up to a very intricate system that no individual could control, but which as a whole displays highly sophisticated attributes of intelligence. An example could be free market mechanisms. You could say that the information that a single individual has to control the logistics of vegetables in a single city would be limited, and most likely a single individual couldn't optimize this task. But all vegetable sellers in a certain city are very likely to be adept at optimizing their local inventory in such a way that it has the least waste and optimal profit. Optimal profit means having just enough for people in their local environment to benefit, but not so much that it gets thrown away.

These "agents", as they are called in A.I., act on their immediate environment. But taken together on a higher level, their individual actions contribute to a higher level of intelligence or optimization than a single instance, computer, individual or thing could possibly achieve, even if it were to understand the entire problem space and optimize in it. The core of the above is that many intricate and complex systems consist of simple agents that behave according to simple rules, but by consistently applying these rules, they can achieve "intelligence" that far exceeds their individual capacity.

So, A.I. does seem to be about search, but it's not about finding the optimal. Maths is about finding optima and truths; it's an algorithm, and thus (must be/needs to be) absolute and consistent. A.I. is about a problem space, possible solutions and trying to find optimal solutions (applying "intelligence") as best as you can, but always taking into account the cost to get there.

Humans don't always find optimal solutions to problems. They deal with problems at hand and are sometimes called "silly" or "stupid" by other humans (agents).

One of the things I liked about the book is that "culture" and "society" are instrumental to intelligence. It clearly suggests that there's a need for interaction for intelligence to occur. In fact, for intelligence to exist. It highly suggests that intelligence is thus cultural, but also infused and created by the culture itself.

If A.I. is about search, and more recent posts are about semantic models, where does this leave neural networks? I think the following:
  1. You can't build a human brain into a computer due to memory, bandwidth, cpu and space constraints. So forget about it.
  2. A.I. shows that you can model certain realities in different ways. There are known ways to do this through graphs, but those graphs have too harsh and clear relationships between them. They should be softer.
  3. Searching a space doesn't exclude the possibility of indexing knowledge.
  4. Relational databases may have tables that have multiple indices. Why not knowledge embedded in A.I. systems with multiple entry points, based on the input sensor?
Thus... what if we imagine an A.I. which behaves unlike the human brain in some ways but like it in others, and which uses multiple "semantic" indices for interpreting certain contents and contexts?

Latent Semantic Indexing is a technique to describe a certain text and then give it some sort of index (rating?). You could then do the same to another piece of text and compare the two. The degree to which the two are alike gives a certain score for the similarity. Thus, LSI could serve as a demonstration of the technique for semantic indexing (and possibly storage) of other receptors as well (sensors/senses).
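The "index two texts, then compare them" step can be sketched in a few lines of Python. Note that this is deliberately not real LSI: proper LSI builds a term-document matrix and reduces it (with a singular value decomposition) before comparing, so that texts can match on meaning without sharing words. The sketch below skips that step and simply compares raw word counts with a cosine score. The function names and example sentences are my own.

```python
import math
from collections import Counter

def signature(text):
    """A crude 'index' of a text: how often each word occurs."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity between two signatures: 1.0 means same direction, 0.0 means nothing shared."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

doc_a = signature("the cat sat on the mat")
doc_b = signature("a cat sat on a mat")
doc_c = signature("stock prices fell sharply")

score_ab = similarity(doc_a, doc_b)   # high: the two sentences share most of their words
score_ac = similarity(doc_a, doc_c)   # 0.0: no words in common
```

The same comparison would work on any signature made of numbers, which is why I think it could carry over to signatures built from smells, images or sounds.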

Imagine that a computer has access to smells (artificial nose), images (camera), audible sounds (microphone) and so on and it has the ability to maintain a certain stream of this information in memory for a certain amount of time. The information together is a certain description of the current environment. Then, we code the current information using an algorithm yet to be constructed such that it can be indexed. And we create a symbol "A" in the table (the meaning) and create indices for smell, vision and hearing to point to A. Any future perception of either the smell, or the vision or the hearing might point to A, but not as strongly as when all indices point to it (confusion).
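A minimal sketch in Python of that table with multiple indices could look as follows. Since the coding algorithm is "yet to be constructed", the sketch simply assumes that each percept has already been coded into a short label; the class name, method names and labels are all hypothetical.

```python
from collections import defaultdict

class SenseIndex:
    """One index per sense, each mapping a coded percept to the symbols it points to."""

    def __init__(self):
        self.indices = defaultdict(lambda: defaultdict(set))

    def learn(self, symbol, percepts):
        """Create index entries for every sense that point to `symbol`."""
        for sense, code in percepts.items():
            self.indices[sense][code].add(symbol)

    def recall(self, percepts):
        """Count, per symbol, how many senses currently point to it."""
        votes = defaultdict(int)
        for sense, code in percepts.items():
            for symbol in self.indices[sense].get(code, ()):
                votes[symbol] += 1
        return dict(votes)

memory = SenseIndex()
memory.learn("A", {"smell": "yeasty", "vision": "round-flat", "hearing": "sizzle"})

weak = memory.recall({"smell": "yeasty"})
strong = memory.recall({"smell": "yeasty", "vision": "round-flat", "hearing": "sizzle"})
```

A single matching sense gives A one vote, all three senses give it three, so the strength of the recall follows from how many indices agree.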

The problem space in this example is more limited to the combination of the senses and what it means and searching for possible explanations within each "sense" area.

The difference with more classic A.I. is that the classic version attempts to define context and define reality IN ORDER to classify it. The above version doesn't care much about the actual meaning (how we experience or classify it with our knowledge after x years of life). It cares about how one situation is similar to another one. In that sense, the definition of meaning is about how similar some situation is to another.

Now... if the indices are constructed correctly, similar situations should be close to one another. Thus, a computer should be able to quickly activate other memories and records of possibly similar situations.

Monday, September 01, 2008

VU start

I visited the university today because of the start of lectures. Picked up most books except a few and then... start studying in the evenings and weekends.

The recent post about frozen realities is a nice one to extend further. The meaning and definition of "intelligence" is also one to think about before one calls a system "intelligent". It's used as a buzzword. As wikipedia states it:

"Intelligence (also called intellect) is an umbrella term used to describe a property of the mind that encompasses many related abilities, such as the capacities to reason, to plan, to solve problems, to think abstractly, to comprehend ideas, to use language, and to learn. There are several ways to define intelligence. In some cases, intelligence may include traits such as creativity, personality, character, knowledge, or wisdom. However, most psychologists prefer not to include these traits in the definition of intelligence." (source: wikipedia).

As with "emergence", we should not consider intelligence to be subject to one entity within a system. We should remain open enough to allow definitions where related entities together, as part of one system, demonstrate intelligent activities. Just as with ants, it's possible with computers that interacting components are smarter than the sum of their individual parts.

The following is a current imagination and I may change my mind on it. Think of the human brain as a set of components that control motor functions and analysis / pattern recognition functions. The ability and function of each component is pre-determined through DNA, but the way they interact with other components is to be learned. A supervisor in the brain checks, through feedback mechanisms of the input sensors, whether a certain desired result was achieved. The desired result is also dynamic and a pattern.

Actually, everything is a pattern of some kind. The patterns are stored in huge containers where each container has patterns of the same type. Pattern recognition and indexing sounds like a very complicated affair. For example, the smell of hot pizza is not just "pizza". There is no such smell. It's the smell of dough, hot cheese, tomato and everything else that's on there. The individual smells make up the whole. Depending on how trained you are, we could still wonder whether someone can guess what's in the oven. The more information we use to construct our environment, the more senses we need to make sense of it.

We tend to automatically direct other senses for the confirmation of certain impulses. Such as looking into the oven, listening intently to some events and so forth.

The idea here is then that it sounds difficult for a single sense to function properly by itself. It's missing a lot of information to "get around". So the conjunction of patterns from different senses can be used to provide a deeper definition of the environment than a single sense can.

A very difficult thing is visual recognition and deconstruction. I don't think we're able to actually store every pixel of everything we see. Rather, I believe we store some kind of gist, a simple three-dimensional deconstruction of the image and some color features for each. This could differ from person to person of course and explains why certain people are artists and others are not :). The ability to see perspective is a very difficult one. We're able to see that, because we know that some things are larger than just what we see, thus we know something is in front of something else. Also, we strongly use shadows and tint differences to further analyze a scene.

In OpenGL, we speak of a pipeline to construct a scene. But when we look at an image, we also need to think of a pipeline for de-rasterization. Possibly, as soon as a computer is able to construct simple wireframes from simple images, we're steps closer to creating a computer that can recognize objects more easily.

Whereas many people consider the definition of things equal to "naming it", this is also wrong I think. A definition of something can also be considered: "a formal recognition and method of communication through symbolic means, such that it calls up the same pattern and recognition with another intelligent being". Thus, we need not restrict ourselves to using words. If we were telepathically endowed and could transfer our thoughts, we'd probably call that "definition" instead.

So, calling something "car" then is just a specific name for a specific generic pattern. Moreover, the specific car that one has in mind is likely different from somebody else's. So the patterns we really evoke are possibly different from person to person, yet we all catch the gist of the message.

If you consider that a pattern can be associated with a term, why not consider the possibility that certain indexes can be given a certain name or list of names?

Then, intelligence becomes the ability to reason with those patterns that are similar to some extent in order to make something out of it that re-defines your reality. A certain kind of juggling and analysis on similarity, possibilities, abilities and relationships (plus defining new ones, aka learning) where certain rules exist that should not be broken. For example, most cars can't drive on water. When your car leaves, so do its tires. A car should not drive with its door open. A car should have wheels to drive.

The problem of reasoning then is how rules are embedded within this reasoning system. The patterns *are* the symbols and the names they have been given. Semi-patterns that are lightly activated form new possible paths. Is it that the path from one pattern to another forms the embodiment of a rule?