Monday, July 16, 2007

The problem: inferring meaning for computers

I browsed Wikipedia on the "meaning of meaning". For a computer to search the web semantically, it must understand meaning, or at least map it to a category, number or element, so that it can infer relationships between words, passages and whole texts (between documents). I reckon this is computationally very intensive. To have any hope of representing meaning for a computer, we first need to understand the concept of meaning better.

Well, reading Wikipedia, which is of course not the best reference on knowledge but acceptable for starters like me, I see that a number of very difficult problems arise when mapping meaning onto a mathematical element.

Meaning is induced by the environment and by the interpretation of elements of a language. One text noted that knowledge is not stored as a linear corpus of text in the mind, but rather as something more like a network of elements that together represent an idea or concept. Rather than recalling the text corpus that describes the idea (after reading it the first time, for example), knowledge is continuously reconstructed from the stored elements that we individually find important and relevant. This seems to mean that memory, and the way things are stored, are very relevant for semantics. It also explains quite well how interpretation (based on experience) allows one person to totally misunderstand another, even though the language may be correct.

The problem with computers is that they are in general stateful (stacks, memory, CPU cache) and process one thing at a time. Consider for example the following paragraph from Wikipedia:

"In these situations "context" serves as the input, but the interpreted utterance also modifies the context, so it is also the output. Thus, the interpretation is necessarily dynamic".

It's easy to understand that when we process a certain corpus of text, its meaning and interpretation change as we scan it. To me this means that a single pass over a text is not equivalent to a continuous, recursive analysis of it, since the text itself is able to modify the context in which it is read. There is a feedback loop in the text that a computer would need to simulate. It seems that the more I read about semantics, the less able I find computers to simulate the mental processes that lead to understanding meaning and communicating ideas. Let alone searching for it in a 400 TB database (the Internet).
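That feedback loop can be sketched in a few lines: each word is interpreted against the context built so far, and then itself becomes part of the context for what follows. The word senses and trigger words below are invented purely for illustration.

```python
# Toy illustration of interpretation as a feedback loop:
# the text modifies the very context in which it is read.
# Senses and trigger words are made up for this example.
SENSES = {"bank": {"money": "financial institution",
                   "river": "edge of a river"}}

def interpret(word, context):
    """Pick a sense for an ambiguous word using the context so far."""
    if word in SENSES:
        for topic, sense in SENSES[word].items():
            if topic in context:
                return sense
        return "ambiguous"
    return word

def read(words):
    context = set()
    interpretations = []
    for word in words:
        interpretations.append(interpret(word, context))
        context.add(word)  # the interpreted text becomes part of the context
    return interpretations

# "bank" resolves differently depending on whether "river" came before it.
print(read(["river", "bank"]))  # ['river', 'edge of a river']
print(read(["bank", "river"]))  # ['ambiguous', 'river']
```

Even this trivial sketch shows why one pass is not enough: reading the same words in a different order yields a different interpretation, because the context is an output as well as an input.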

Besides natural language in text or speech, we make sounds and facial expressions, and we communicate through body language. Together these elements form a larger message that a computer cannot process. The emotional weight of certain texts is also difficult for a computer to simulate.

As I have written before, it does not seem possible at the moment to reliably construct a working mathematical model for semantic search. Only parts of the problem as a whole can be simulated (a better word is approximated).

Still, it would certainly be very interesting to see whether semantics as a whole can be better approximated by applying further matrix operations across matrices with different purposes. For example, we could use LSI and LSA to measure the relevance of one text to another on a very dry level, then multiply this with knowledge of a particular context of reference, also represented as a matrix, in the hope of finding something more meaningful.
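The "dry" LSI part of that idea is easy to sketch. Below, a tiny term-document matrix is factored with a truncated SVD, and documents are compared in the latent space; the toy vocabulary and documents are invented for illustration.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Docs 0 and 1 share no term directly, but both co-occur with "pet";
# doc 2 is about something unrelated.
A = np.array([
    [1, 0, 0],  # cat
    [0, 1, 0],  # feline
    [1, 1, 0],  # pet
    [0, 0, 1],  # engine
    [0, 0, 1],  # fuel
], dtype=float)

# Truncated SVD: keep k latent dimensions (the core of LSI).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim vector per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# LSI places docs 0 and 1 close together despite zero shared terms,
# while doc 2 stays unrelated.
print(cosine(doc_vectors[0], doc_vectors[1]))  # close to 1
print(cosine(doc_vectors[0], doc_vectors[2]))  # close to 0
```

This is exactly the "very dry level" of relevance: pure co-occurrence statistics, with no notion of context yet.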

Matrices seem very useful in the context of deriving knowledge out of something we don't really understand :). A neural network is described by matrices, LSI uses matrices, and it is probably possible to come up with other matrices that represent contextual information, or an approximation of context itself.

Assuming we have a matrix for a concept or context, what happens when we apply that matrix to an LSI document representation? It may be far too early to say. To come up with anything useful it is necessary, from the computer's perspective, to define a processing pipeline for semantic search.
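One naive way to make that question concrete: if each document is a vector in the LSI latent space, a "context" could be a matrix that re-weights that space before documents are compared. Everything below, including the diagonal context matrix and its values, is invented for illustration, not a known technique.

```python
import numpy as np

# Two hypothetical documents as 3-dim LSI vectors (the dimensions
# might loosely correspond to latent "topics").
doc_a = np.array([0.9, 0.1, 0.4])
doc_b = np.array([0.2, 0.8, 0.5])

# A "context" as a diagonal matrix that amplifies the topics the
# current context cares about and damps the rest. Values are arbitrary.
context = np.diag([2.0, 0.1, 1.0])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

plain = cosine(doc_a, doc_b)
in_context = cosine(context @ doc_a, context @ doc_b)

# The same two documents look more or less related depending on the
# context applied before the comparison.
print(plain, in_context)
```

Whether such a re-weighting captures anything resembling real context is exactly the open question; the sketch only shows that the matrix machinery composes.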

These efforts probably also require us to rethink human-computer interaction. A lot of our communicative ability is simply lost when we interact with a computer over the keyboard, unless we assume that our ability to express concepts through language is very precise. As I said before, when we communicate with people who have similar experiences, the level of detail in the communication need not be very large. This is because the knowledge reconstruction at the other end happens in more or less the same way, based on rather crude cues in the communication, which means that many details are simply not present in the text. A computer might then find it very difficult to reconstruct the same meaning, or to apply it to the right context.

A further problem is the representation of knowledge, context and semantics. We invented data structures like lists, arrays and trees to represent elements from quite restricted sets. The choice between these structures is governed by the operations executed upon them, and decisions are driven by memory or processing limitations. These data structures were generally developed on the assumption that the operations on them, and the utility of each element, are known at or before processing time.

Semantic networks (or representation of knowledge and/or context) do not exhibit this requirement, seemingly:
  • A representation of a concept, idea or element is never the root of things, or at least not a root that I can easily identify at the moment. Does a semantic network have a root at all? I imagine it more as an infinitely connected network of relationships, without a specific parent.
  • Representing such a network in a computer data structure is not basic computer science.
  • Traversing the network is very costly.
  • So are the memory requirements for keeping it in computer memory.
  • It is unclear how a computer can derive meaning from traversing the network, let alone apply meaning to the elements it encounters while traversing it.
  • Even if specific meanings can be matched or inferred, the processing cost is likely very high.
  • The stateful computer is not likely to be very helpful in this regard.
The latter is based on my impression that the mind does not maintain much state, but behaves more like a very fast "functional language computer". Rather than retrieving meaning A or meaning B directly from memory based on the factors of a lookup, it reconstructs a meaning from smaller elements.
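That reconstruction idea can at least be mimicked with an ordinary graph: instead of storing a definition per concept, a "meaning" is rebuilt on demand by collecting the relationships around a node. The concepts and relations below are made up for the sketch.

```python
from collections import deque

# A toy semantic network: no root, just labelled relationships
# between concepts. All entries are invented for illustration.
NETWORK = {
    "bird":    [("is-a", "animal"), ("can", "fly"), ("has", "feathers")],
    "penguin": [("is-a", "bird"), ("cannot", "fly")],
    "animal":  [("is-a", "living thing")],
}

def reconstruct(concept, depth=2):
    """Rebuild a 'meaning' by collecting nearby relationships via
    breadth-first traversal, rather than looking up a stored definition."""
    seen, facts = {concept}, []
    frontier = deque([(concept, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for relation, other in NETWORK.get(node, []):
            facts.append((node, relation, other))
            if other not in seen:
                seen.add(other)
                frontier.append((other, d + 1))
    return facts

# The "meaning" of penguin includes facts inherited from bird.
print(reconstruct("penguin"))
```

The sketch also makes the cost objections above tangible: every reconstruction is a traversal, and a realistic network would make both the traversal and the resident memory expensive.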

This goes back to a philosophical discussion on what the smallest elements of meaning are and how they interact together.
