Tuesday, August 26, 2008

The frozen model of reality

Once you've climbed out of the coding trenches as a developer, you'll be confronted with the need to model reality. The basic idea of software development is to take a snapshot of reality that works for the current set of requirements and then automate it into some program or system. The larger the system, the harder the freeze becomes, and the more frequent the change requests and the people trying to push things through.

The obvious problem with freezing reality is that as soon as you defrost it again, you need to find a new equilibrium, a new frozen point in time: the target. Your ability to move from one frozen state to another depends on the flexibility of the architecture, the clarity of the solution to information analysts, and everything and everyone in between who can put a foot in the door or wants to have anything to do with it. Successful implementations tend to attract a lot of success-share wannabes; poor projects tend to attract a lot of attention from people who know how it should have been done and people who refuse to work that way.

Anyway, the problem with freezing reality for development is that you're in a stop-and-go process. You can never move continuously towards some new hybrid form of operation; it's always change a little or a lot, come to a complete stop, wait, then determine your new direction. That is confusing to many, since we tend to differ in opinion on the best direction to take afterwards, or even on what the frozen state looks like, or on what the soon-to-be-defrosted state should look like.

The freeze is required because development and coding are the formalization of rules and processes at a particular moment in time. Software engineering is thus basically freezing reality as little as we need to, but as much as we should, to make us more effective from that point onwards. Luckily we still employ people who use the software and can think up their own ways around it to let the business grow and move forward, otherwise we'd really be in the ... .

Anyway, a very simple conclusion could be that any formalization of reality for a period of time is subject to inflexibility, in the same way that a formal representation of anything is just a particular perspective of that thing in time (and fashion?).

If you look at the problems of software engineering, the actual problems we still encounter nowadays have not changed a single bit, but the technologies have. Every new technology comes with the promise that "modeling" the enterprise with it will make the enterprise more flexible, yet it always starts with the formal chunking of reality so that a program can actually carry out some work. It's true that technologies have made it easier and faster to develop programs, mostly through the 1GL, 2GL, 3GL and 4GL progression, and that methods of specification are bringing us closer to the language of the business, but we're not changing the real method behind it: the formalization of reality at a point in time.

In order to make machines really intelligent, we should exceed our own limitations, since we depend on formalized and static models to comprehend something and from those models we re-build our solutions.

As an example, I imagine an artificially intelligent system that doesn't attempt to formally describe an object once, but reshapes and refines the object as more information becomes available to describe it, and that can probably even split an object into different ones as soon as a major axis of separation (a category) becomes available to make the distinction.

Depending on who you ask, people give you different answers about trees. Some people know only one tree: "the tree". Other people know their pine trees from their oak trees, and yet other people can identify trees by their leaf silhouette. So somewhere and somehow, we tend to categorize items further as soon as we get swamped by too many symbols in the same category. Luckily we're very apt at finding the specific differences between types, and especially what they have in common, so that the categories have valid descriptors.

But... one does not grow up thinking of a tree as a pine or an oak; we think of it as a tree first, and only later identify it as a tree of a specific type. We can use smell and vision to identify a pine tree, even touch. The combination of smell and vision is a very powerful identifying function; vision alone or smell alone might still throw us off.

Now, to make this post a bit specific to Python. Python has a mechanism called "pickling" that is used to persist objects to storage. In artificial intelligence, a neural network often operates in distinct phases: a phase where it learns and adjusts according to feedback, and a phase where it executes and recognizes new input symbols. We're too afraid to let the network run in both modes at once, either because we can't predict the result or because we're not using the right model for it even then. The human brain, though, is constantly stimulated by recognizing something it saw before that was alike, slightly different, or still "exactly the same". I believe we're constantly adjusting our images, smells and other input signal patterns as we experience them.
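
As a minimal sketch of the mechanics being referred to (the object and file names here are purely illustrative), pickling freezes an object exactly as it is at one moment and thaws it back later:

```python
import pickle

# Stand-in for whatever network object was trained (illustrative only).
trained_network = {"weights": [0.2, -0.7, 1.3], "threshold": 0.5}

# Freeze the object to disk...
with open("network.pkl", "wb") as f:
    pickle.dump(trained_network, f)

# ...and thaw it later, exactly as it was at the moment of pickling.
with open("network.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == trained_network)  # True
```

In that sense the pickled file is itself a tiny "frozen model": it captures the object at one point in time, and any further learning has to happen after thawing.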

But without a suitable learning process on the outside, connected to a working learning machine on the inside, such intelligence won't go far. In this view, I'm assuming that the human brain has the innate capacity to learn about symbols, but needs to experience the symbols to eventually make sense of how they interact and relate to one another. It's not uncommon to meet somebody who has entirely different viewpoints, experiences or interpretations of their environment than you do.

Thus, the problem in A.I. as considered here isn't necessarily so much how we define a proper ontology (since that ontology is also based on the current snapshot and perspective we have, our "frozen model"); it's how we define a machine that knows nothing about symbols, but has the capacity to understand how symbols (patterns from different inputs) relate to one another, and perhaps even the ability to reason with those symbols, but that's taking it another step further.

I'd recommend that after this post, you keep track of how much detail you're observing in your everyday world in different situations. You'll be amazed how much you 'abstract away' from the environment: seeing something doesn't mean you notice it. It's also possible that missing things create strange situations in which you notice that something isn't there, even though you expected it to be, or that without it, the object doesn't quite look the way you expected. Should that change in observation change your reality, and does it? Does it change how you observe the world in the future? Is that the real process of learning? Continuous adaptation of reality based on observation?

Tuesday, August 19, 2008

AI ambience or collective web intelligence?

I've been busy with some administrative things. In September classes are starting and I'm commencing with A.I. Registrations are done; just sign up for a couple of courses and go. There's a new direction for the next year: "Human Ambience". I had quite some interest in intelligent systems, but I like ambient intelligence as well. I think it's really done well when you don't notice it at first, but later go: "oh, that was actually pretty cool!".

For the rest, Project Dune is trucking on as usual. I'm preparing a plan for a manager of the company I work for to possibly spend a bit of budget on getting the project a little further, with the help of some colleagues. All open source of course. So that's exciting.

I'm also looking at perhaps providing a Java API to interface with CUDA. The objective is to make it available to Java users. Not sure how to write the "java" program and compile that for CUDA use though :).

Friday, August 01, 2008

Communication, interpretation and software engineering

A majority of problems in software engineering are due to inefficient social activities, well... beyond poor estimation, poor assessment and poor verification of course. :)

Here's a very nice website I found:

http://www.radio-subterranean.com/atelier/creative_whack_pack/pack.html

It's focused on creativity, but it can be applied to innovation, and some of its ideas can be applied to general software engineering, which is simply the development of applications.

There are still quite a lot of apps out there that have not been designed properly. They start out, but are dead in the water from that point on. It's simply no use extending them further. They may work when they're done, but every effort spent on them simply isn't worth it. "Design-dead", it's called. This can be prevented if you look for sounding boards and more experienced peers. Don't be proud; let others contribute and seek some help in what you're doing.

Another issue is assumptions: assumptions from the business that turn into dreamed-up requirements but eventually turn out to be problems. This is especially the case for security.

The latter problem is mostly a lack of root cause analysis. Rather than directly assimilating what a person wants, it's about asking why something is wanted and what the underlying problem is. Many, many times, people come to you directly with a request to do something which they consider the resolution to their problem. They're not telling you the problem they're having. Keep asking, identify why something is wanted, and maybe it's possible to come up with a much easier alternative.

Lack of definition is another. It's difficult for some people to understand that others are not experts in the same domain. Spend a bit more time to explain your own process and activities, then see if you can develop a correct cross-over between the two domains of expertise for an optimal result.

Well, and further... It's mostly about continuous communication and verification. Going off for half a year and then coming back with an end result is bound to give a lot of deviations from ideas. Maybe the ideas were wrong in the first place, maybe the interpretation. It doesn't matter at that point.

Friday, July 25, 2008

Software Architecture "methodology"

As a software architect, buzzwords and the buzz itself come at you in various guises. Sometimes it's the business itself that heard about company X employing methodology Y and wants to do the same. Most of the time it's a hotshot from company Z who wants to change the world and do things radically differently, promising totally different flexibility and capabilities.

I'm a bit tired of all these words. The only things that count are activities, as they produce the output that can really be used, and only a handful of factors revolve around that output. No single methodology will ever succeed in setting up a dynamic company. It's the people themselves, how they interact and communicate, the level of politics, the culture and how quickly the development team picks up new technologies that are really worthwhile.

Architecture methodologies are basically all about the same thing... guidelines about how to execute the development activities and a registration of the foreseen limitations and recommendations for that team.

The latest thing is of course SOA. It's often thought of as a new methodology, but it's actually just a different perspective on your organization. It cannot be called an architecture, though. It does force systems and data to be organized in a certain way, but it does not dictate how. So SOA is a direction that a company decides to take, which might make sense, but it isn't anything more than that.

I see EAI at the core of architecture, influenced by SOA, whereas SOA is the organization of the infrastructure into a set of services. It's as if SOA is basically an aggregation of resources into an artificial, digital interface that other organizations can use. So SOA is more like a vertical, whereas EAI definitions mostly describe the trees of the "what-otherwise-would-be" forest of resources. EAI defines the guidelines for getting the resources set up; SOA defines what to expose. EAI is driven by the output of SOA, and SOA is limited by the level of organization in EAI.

Once you think about it, the only things that change are the level of connectivity and the level of digitization. Organizations in the 60's were already capable of exchanging information; it just mostly traveled on paper. Currently, the velocity and volume of information has grown so much that paper processing is just not an option any more. That is why the complexity of security and interpretation is now borne by digital systems, whereas previously it was interpreted and verified by human beings.

So, although SOA makes it really interesting for companies, it's far from a silver bullet, although pulling off a successful infrastructure of that kind will very likely make you more flexible. The problem is however that in every architecture, a number of assumptions are always baked in. Sometimes, when businesses significantly change, the architecture can't change along with it, thereby requiring large investments to bring about that change. Every business and organization has a "heart" to it where data is kept or basic business processing is done. If the business changes direction, then that's where the most changes need to take effect, impacting all other services that depend on it.

As a methodology for architecture I quite like TOGAF. If you've read the book, you'll understand the role of an architect better and the place such a person takes in the organization. Within each organization, the politics are different though and thereby the expectations. It's always a challenge to find your right place and produce those results that make people happy. Sometimes you need to make some noise in order to wake up the masses.

As discussed in earlier posts, I strongly believe that taking more time for a design prevents a lot of trouble at a later stage. Trouble at a later stage is many times more expensive than resolving it in the correct stage. Remember that if you're a project/program manager or director. You can't take shortcuts in IT. The taxman always rings the doorbell later on, and will ring it more than once across different projects.

The latter statement basically means that a single shortcut taken now to get things done will eventually require interest payments later down the line that far exceed the initial amount; from my experience, this can be several multiples of it. I've worked on a project that should have taken about 250,000 EUR or so to develop. It crossed the million by a good couple of thousand.
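
As a toy illustration of that interest effect (the numbers below are made up for the sake of the argument, not taken from any real project), a modest amount of rework that compounds per release quickly dwarfs the original saving:

```python
# Toy model: a shortcut saves some effort now, but every later release
# pays a growing rework overhead on top of the accumulated mess.
shortcut_saving = 10   # person-days saved by cutting the corner (assumed)
rework = 5             # extra rework in the first release (assumed)
interest_rate = 0.5    # rework grows 50% per release (assumed)

total_paid = 0.0
for release in range(6):
    total_paid += rework
    rework *= 1 + interest_rate

print(f"saved {shortcut_saving} person-days, paid back about {total_paid:.0f}")
# saved 10 person-days, paid back about 104
```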

Further reading:

http://www.cis.gsu.edu/~mmoore/CIS3300/handouts/SciAmSept1994.html

Project Dune was released yesterday, and I intend to re-evaluate software engineering from the ground up, first as a set of activities and practices rather than thinking of it as a process only. It's as if the texts assume that people basically deliver good work, but that not every activity is always followed and not every activity is tied to the next phase. I don't really believe that. I think too many engineers are insufficiently aware of the impact of skipping certain activities or not following a certain best practice. The problem mostly has to do with pride, wanting to be creative and to differentiate, or over-estimating one's own skills (especially against fellow engineers in the same team). On top of that, it's difficult to put together an experienced team of engineers who, beyond knowing their profession, also know how to work together and get things done without introducing new, unknown technologies (for the sake of curiosity and learning something new).

Thursday, July 17, 2008

Maven

I've never had a good chance to look at Maven, but today I generated the "project (code) management" layout through Maven for the first time. I'm very impressed with its capabilities and dependency management, and I intend to use it for any project that follows. It's just too good really, and there's not much sense in continuing to hack Ant files to manage the project.

As an intro to Maven... It's a great tool to start a new project with and to manage the project configuration, build, dependencies, tests, releases and sites. It does a lot of stuff out of the box in a standard way, and you could more or less say that it's a 3rd-4th generation "make" tool. Where with Ant you still had to "program" tasks, with Maven you just configure pre-defined tasks that run in a consistent way (since every project needs them) and be done with it. Having a consistent layout also means that you don't need those ${src} and ${build} variables any longer.

It has the ability to produce manifests for jars and version/build numbers, and it can automatically deploy to locations you register. There is inheritance between configuration files, so you can set up a chain of configuration files for your development "street", so to speak. And thanks to the registration and management of library repositories, your dependencies are automatically tracked and updated whenever the project needs them. No more hunting for lib files and downloading them; it does that for you.

For Project Dune, there's an effort starting where we'll document software engineering in a very straightforward way. Rather than calling it a "methodology" like Scrum, XP, RUP and whatever, the idea is that we just focus on the activities that must happen anyway regardless of the approaches used to fulfill those activities.

Then the idea is that besides the activities, you might need the output of one activity as input to another, or need to trace information back to its source. You could imagine a certain "thread of information" running through a development life-cycle.

Every software project has such a cycle, but not everybody commits to executing all the activities required to guarantee quality. For example, I imagine that RUP adopters execute a lot of activities, but not every activity necessarily adds value. And people working solo on a project may never do active reviews.

The idea is that the process (the administrative effort) doesn't become the holy grail of an organization; the activities that are part of engineering do. I don't have a full view yet of how this is going to work out, but the site on Project Dune will be modeled after that vision. It's not likely to reach the same size as a process-driven description, but a minimal site that makes the activities clear and what they contribute to a project is already a very good start.

Tuesday, July 15, 2008

Work on roadmaps, sw engineering, technical debt

Project Dune is developing a new roadmap for the last half of 2008 and to kick off 2009. Being open-source, you can always read the latest developments on the project wiki.

Part of the roadmap has now become strategy, which will probably be moved out to the start page close to the mission, thereby making the roadmap a practical guide of where something is going and why, without necessarily focusing too much on the actual activity.

Another part of the roadmap of Dune is the initiation of a set of processes, practices and tools / checklists / material to support the process of software engineering.

A particularly interesting term, coined by Ward Cunningham, is technical debt. As a quote from the Wikipedia article:
The analogy to financial debt is that the poorly-written code requires "interest payments" of maintenance effort, which would be smaller or non-existent had the code been developed more carefully and with better long-term planning. Rearchitecting and rewriting the code to be more maintainable is analogous to paying off the debt principal.
Other sites:

http://www.agileadvice.com/archives/2006/12/technical_debt.html


http://forums.construx.com/blogs/stevemcc/archive/2007/11/01/technical-debt-2.aspx

The graphs here are very good examples of the consequences:
http://kanemar.com/2006/07/23/technical-debt-and-the-death-of-design-part-1/

As part of Project Dune, I'll need to think about how to come up with practical guidelines for software quality and explain these terms in more detail, probably with extensive use of links. The idea is to become the main site where people look for quality information. That should include links and understanding about software engineering, development, language choices, etc... Ideally documented in such a way that it goes from a high-level overview to a detailed overview.

The first objective is to come up with a generic process / vision of software engineering itself. This is no easy feat! It's difficult enough to align people on the same vision within a single team, as there are always people who have different opinions or who are firmly convinced that things are done better a different way.

Thus, the idea is not to suggest "Agile Development" or any other more specific method for software engineering. It should go one level above all this and just state the following:
  • Objectives of the activity
  • The activity's function
These two sound exactly the same, but are somewhat different. The objective is basically what you're trying to achieve. The function is what you're trying to prevent/discard during that process or what you should pay specific attention to.

I'm wondering whether project managers in software projects nowadays have sufficient knowledge of project analysis and of the terms used in software engineering to be able to steer back to a winning situation. If you look at the term "technical debt" for example, its actual scope is quite large. Besides technical debt, though, there are other reasons why projects fail, which lie more in the area of communication and social interaction (the requirements need to be correct).

A good initiative for the Dune project would be to try to come up with 2 or 3 main areas and coined terms that contribute to project failure. Then subdivide the areas and identify the causes.

The idea being that identifying the root and main causes of project failure (and supporting cases?) would clarify the need for and use of certain activities in a project. I'm not using the term process, as I'm not too fond of how that term is generally interpreted. Activity means something that a tester/developer carries out as part of his job or task, and which is easily part of a process. The process is thus basically a sequence of activities done with a number of available tools and materials, nothing more. Understanding the process thus doesn't mean knowing how to comply with some corporate or industrial standard. It means that you understand what you're doing and how that fits into the big picture.

Tuesday, June 24, 2008

Artificial Intelligence

I'm starting studies in A.I. in September at the Free University in Amsterdam. The program is a bit closer to Computer Science than what other universities are offering.

It's interesting how I'm now picking up news articles on the topic and how things are changing around the very definition of what A.I. really is. Initially, the idea was that intelligent machines should eventually think like human beings and be just as versatile, flexible, inflexible and emotional and detached as humans can be.

It seems that the general definition of A.I. is changing in such a way that it's more a definition of very intelligently applying science to real-world problems that we can solve intelligently, but perhaps not optimally.

Thus, A.I. is not a simulation or replication of the entire range of processes in the human brain; it's a specific simulation or replication of a specific problem that our brain can resolve.

As I have stated before in one of the blog posts on communication with machines, I think A.I. is about getting closer to the real cognitive processes that are occurring instead of replacing our tactile activities. Thus, A.I. is also a specific kind of Information Technology. It's about finding out how we resolve very difficult problems and what the input to those problem-resolution processes is, applying science to increase the efficiency and accuracy of the resolution, and then deploying it so that it can be used by the user.

There are however some other specific branches. One branch comes closer to knowledge management (which is basically the problem of digging out the right information at the time when it's most needed). Another is about creating user interfaces and human interaction in different ways (more like emotional intelligence and empathy). Others are more about processing very large amounts of information.

Saturday, June 14, 2008

EK & GWT

Holland is playing really well in the European Championship (EK). It's a joy to watch. Six points in the pocket, and all of Europe is commenting on how well-oiled the team is playing this year. I'm expecting Holland to win this EK.

Well, for something totally different, a new version of GWT is out, 1.5. The beta version has good new capabilities and standard themes that prove very useful. I'll be plugging that into my project and keep improving the user interface. Then probably release 1.5.1 before I'm hosting a demo next month at the company I work for, in front of project managers, architects, testers and other people.

Sunday, June 01, 2008

CUDA and GPU's

I've read a lot about the architecture of GPUs and how it's possible to take advantage of them. The design of a neural network running on a GPU is substantially different from one that runs on a single-core CPU. I've worked on a small framework to process neurons on a dual-core, but haven't been very impressed with the results so far.

The GPU however seems very promising. It actually requires a lot of threads and blocks to become efficient. My 8600GT has 128 little processors ready to work. The GPU organizes units of work into blocks, within which those processors can cooperate on small units of work. Each block is allocated threads; a thread is the smallest unit of work within a block, comparable to a single iteration to be executed by that block.

Nvidia's GPU is mostly data-parallel driven. You decide what you want to do and then run a very simple function in a single thread. The activation of a neuron by another neuron is an example of such a very simple function.
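
As a sketch of what such a per-thread function could look like (written with the Numba CUDA bindings for Python, which postdate this post; the array names and the threshold rule are illustrative, not a finished neuron model):

```python
import numpy as np
from numba import cuda

@cuda.jit
def fire_neurons(excitation, threshold, fired):
    # One thread per neuron: a deliberately tiny, data-parallel function.
    i = cuda.grid(1)
    if i < excitation.shape[0]:
        fired[i] = 1 if excitation[i] >= threshold[i] else 0

n = 500_000
excitation = np.random.rand(n).astype(np.float32)
threshold = np.full(n, 0.5, dtype=np.float32)

# Move the data onto the card explicitly; the expensive part is this transfer,
# not the kernel itself.
d_exc = cuda.to_device(excitation)
d_thr = cuda.to_device(threshold)
d_fired = cuda.to_device(np.zeros(n, dtype=np.int8))

threads_per_block = 128
blocks = (n + threads_per_block - 1) // threads_per_block
fire_neurons[blocks, threads_per_block](d_exc, d_thr, d_fired)

fired = d_fired.copy_to_host()
```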

There are a lot of hardware-related optimizations that need to be taken into account. Ideally, a parallel design synchronizes only within blocks and never between blocks, both to prevent deadlock situations and because synchronization is a killer for performance.

The biggest problem in making graphics cards very useful for A.I. is the memory storage capacity *or* the bandwidth between the host memory / disk and the graphics card memory. It's basically 8 GB/s on a standard PC with a PCIe card, whilst internally on the card the bandwidth from its memory to the GPU is an order of magnitude higher, about 60 - 80 GB/s. So staying on the card for calculations is definitely better for performance. The bandwidth to CPU memory is about 6.4 GB/s by the way, so the CPU can write to the graphics card faster than it can read/write its own memory.

The card contains 256 MB of memory. If 4 bytes are used for various state like fatigue, threshold and excitation information, then it can store 67 million neurons on one card. It might be possible to use an extremely clever scheme to store connection information, because that is where most of the memory goes. If you assume 1,000 connections per neuron, that's 4,000 bytes lost per neuron due to pointer size alone. Maybe a clever scheme where the neurons are repositioned after each cycle could help reduce the need for such capacity.

Thus, assuming roughly 4,000 bytes for each neuron on average, without such optimization or clever scheming, the network can only be about 67,000 neurons in size at a maximum (256 MB divided by roughly 4 KB per neuron).

The design is the most interesting part of this challenge. Coding in CUDA isn't very special, although someone showed that an initial 2x increase in processing power can be turned into a 20x increase if you really know how to optimize the CUDA code. So it's worth understanding the architecture of the GPU thoroughly, otherwise you're just hitting walls.

Kudos to CUDA

CUDA is an Nvidia technology that gives programmers access to the processing power of GPUs, virtually turning your computer into a supercomputer by offloading certain complex mathematical processing onto the GPU itself. A GPU is a collection of little processors, generally about 128 of them, which run functions in parallel. So a GPU will probably benefit you most if you're processing a large dataset that fits into memory. For many simple business applications, it's not worth the trouble.

The GPU needs loads of different threads to become efficient. The program thus slices the algorithm into pieces and sets the processors loose on them, which means the design of your 'program' needs to be adjusted accordingly. In comparison, with a single CPU the number of threads quickly saturates the processor with thread-context-switching overhead, undoing the benefits of parallel processing. That's why it's better to have many little processors than one very large one.

The GPU can process suitable algorithms about 250 times faster than a dual-core 2.4GHz CPU. In other words, if you have a problem that can be loaded onto the GPU, you get access to the equivalent of a 250-node CPU cluster by buying a commercial-grade graphics card of 250 euros. That is very good value for money! And there are motherboards available where you can load 4 of those cards into your computer. You'll need a 1500W power supply, but that is far less than 250 × 300-400 W. There are guys at the University of Antwerp who have built such a supercomputer for the price of 4,000 euros.

Here's a link to a tutorial on how to install this on Linux:

http://lifeofaprogrammergeek.blogspot.com/2008/05/cuda-development-in-ubuntu.html

One of the obvious applications why I'm looking at this is the ability to use the GPU for applications in neural network, more or less like what has been done here:

http://www.codeproject.com/KB/graphics/GPUNN.aspx

The memory bandwidth is on the order of 20-100 GB/s, as you can see here:

http://www.nvidia.com/page/geforce8.html

As I have stated in previous posts, one of the constraints on a working artificial network is the processing power that is available and the amount of memory needed for the weights and so on that have to be encoded. If things are hardware-accelerated, it might open up new avenues for processing. One of the larger problems remaining is probably designing the most efficient encoding of the network into simpler data elements. Leaving everything to one processor to figure out limits the frequency that can be achieved. But if there are 128 processors instead, it becomes very interesting indeed.

One of my posts suggested that I don't (yet) believe there's a kind of output wire where the results of the network end up; rather, the state of the network itself, or the sub-states of cell assemblies, somehow makes up consciousness. And the more neurons are available, the more "consciousness" you'll have.

One should never forget that the idea of something having consciousness or awareness depends on our perception. And perception is always influenced by our own ideas and assumptions. So if we look at someone who appears to act intelligently, we also assume they have the level of consciousness that pertains to those tasks, or even more. But that's not necessarily an accurate assumption.

A probable mistake in a previous post is related to the frequency of the network. We probably can't really speak of 'frequency', as there's no clear cycle or sweep in the biological network if we assume that neurons fire whenever they want, and the "sweep" that would collect all currently activated neurons might just come along somewhat randomly. Computers are generally programmed to define proper "work" cycles against a single cycle that collects the results.

The frequency of brain waves has been estimated at between 5 and 40 Hz. So the figure of 1000 Hz may be way off. If we assume 40 Hz, things become a lot easier. And if we work with simpler processors that operate at this frequency on equal units of work, perhaps it brings things closer to the real human brain.

From the perspective of processing, the GPU would enable a calculation speed of 250 times that of a CPU. And if we lower the frequency from 1000 to 40 Hz, that is another multiplication factor of 25. That brings the new number of neurons that can be processed to 2,500,000,000. This is only a factor of 40 lower than the human brain!
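
Spelled out, using the 400,000 neurons per sweep measured earlier on the CPU as the starting point:

$$400{,}000 \times 250 \times 25 = 2.5 \times 10^{9} \text{ neurons}, \qquad \frac{100 \times 10^{9}}{2.5 \times 10^{9}} = 40$$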

Thus, if we put 40 graphics cards into a computer, we'd probably be close to the processing power of the brain itself. That'd be about 8000W versus 12W. Not too bad, because using CPUs it'd be about 2,000,000W. The remaining problem is storing the network information needed to actually use the network. That was set at 2 GB earlier, and with some smarter designs and optimizations or reductions in the number of connections this could be brought down to 500 MB or so, so that a network of 500,000 neurons could run on a single graphics card, but it's not yet enough. A single byte per neuron is possibly sufficient, though, and you wouldn't typically use a single byte for processing on a CPU due to byte-alignment optimizations. On the GPU that shouldn't matter too much.

It's time to delve into the design of the processors and see what the processors of a GPU can deal with efficiently. If they work on bytes rather than words, it makes things a lot easier to fit in memory, thereby increasing the size of the network.

And then it doesn't matter too much. Perhaps SLI can help distribute tasks across networks and assign specific tasks to specific cards, like visual processing, reasoning and so on. Graphics cards generally work with texture maps, and those can be extracted from one card and loaded onto another in an effort to share information.

Monday, May 19, 2008

Quantum consciousness, curling and glials...

Prof. Penrose's book is a very interesting read. For the core A.I. people who believe that "everything is perfectly computational and so is human thought" (not taking into account memory, processing or storage resources), it can be a bit disheartening. Prof. Penrose asserts that there are things about the mind that can be simulated, although not perfectly, but that simulating them does not make a machine aware.

This of course depends on the definition of awareness, how awareness is raised in the human brain and whether awareness is real or whether we're just thinking we're aware and not in the same computed state as something else.

Another interesting thought here is with regards to the ability to steer thought and the ability to visualize things "in the mind's eye" as it's called. The interesting thing here is that besides regular perception that reacts to stimuli, we can also invoke stimuli on our own brain. Mostly for thought experiments or for dreaming. Although the last objective is nicer :).

I refer to the last post on "The plot thickens". I imagine that glial cells may have a more active role than previously perceived. Neuro-scientists have for example asserted that the glial cells are cleaners.

Ever heard of the sport "curling"? It's a sport where a heavy 20 kg stone is "thrown" on ice. One player executes the throw (the neuron) while two sweepers brush the ice in front of the stone. The sweepers can affect the trajectory of the stone, lengthening it, shortening it or moving it to a side, and thereby affect the nature of the game.

One could imagine that the sweepers in curling have a role similar to the glial cells. If that's true, then it certainly makes things more interesting, and the search for the real "consciousness" would only just be starting. Would it be something in the brain? Would it be more like "waves"? Those glials then sort of become the manipulators of neuron cells, like an overseer or teacher. If one maintains Hebb's theory, the plot thickens indeed. Then learning isn't necessarily triggering neurons based on input and making the cells fit; learning would be controlled by a network 10 times the size of the neuronal one, which seems to be driven by some other force.

At this point I'm not sure how to imagine consciousness then, nor do I understand very well the ability of the glial cells to influence the behaviour of neurons (slight modifications of their firing patterns, or perhaps modification of the network itself?). Is this where consciousness is really located?

Some people resort to quantum consciousness.

Friday, May 09, 2008

The plot thickens...

The plot thickens as they say... And this time it's about glue!

Only fairly recently have scientists discovered that the glial cells around the neurons may be a bit more active than just removing waste and feeding the neurons. Read more about glial cells here.

Actually, there are 10-15 times more glial cells than neurons. So if you thought 100 billion was a lot of cells, there are 10-15 times more of them in supporting cells.

The cells are mostly responsible for maintenance. So they regulate the chemicals, clean up waste, regulate blood supply. But they also produce a myelin sheath around the axon (helping it to fire) and can also act as scaffolding for the generation of new neurons.

So, you could say that glials are the implementation of the rules that create and maintain the network at a microscopic level. During their lifetime, they continuously check the environment and can actually reshape the network on a local level. This is interesting, because it means that complicated, active maintenance functions are taking place on the neural network.

Thursday, May 08, 2008

Neural Network Hierarchies

The book I read discussed the possibility of cell assemblies and cell assembly resonance through recursive loops in the neural network. It also stated the possibility of cooperating neural networks that are each allocated a specific function. See the following page:

http://faculty.washington.edu/chudler/functional.html

And then the following:

http://faculty.washington.edu/chudler/lang.html

I'm not sure if anyone has ever considered joining neural networks together in a sort of serial combination. The difference between these networks is that, for example, vision is only allocated the task of recognizing images / shapes / forms / colors and translating them into numbers, while the auditory system processes sounds. If you look at the images closely, you see that there are actually two kinds of networks for each perception method: an associative cortex and a primary cortex.

If you look at the production of speech, it's a number of different areas all working together. This gives us clues about how human speech is really put together.

Imagine those networks all working together. As a matter of fact, there are more neurons at work than just those in the brain. The eyes also contain neurons and are already the first stage of processing optical information. Suppose we'd like to make a computer utter the word "circle" without simply recognizing the circle and playing a wave file. We'd have to make it learn to do so:
  • Convert the pixels (camera?) to a stream of digital information, which can be processed by the visual cortex.
  • Analyze the shapes or image at the very center of the image (see motor reflexes and voluntary movement of the eye to accomplish the scanning of the environment for receiving more information).
  • The visual cortex will then produce different signals as part of this exercise and the reverberating cell assemblies generate new input signals for the more complex processing of such information (memory? context?)
  • This is then output to the speech area, where the "words" are selected to produce (mapping of signals to concepts).
  • The information is then passed to the Broca area, where it is contextualized and put into grammar.
  • The instructions of the Broca area (which could have a time-gating function and verify the spoken words against the words that should be uttered) are sent to the primary motor cortex, which produces speech through frequent practice.
  • The speech organs then move in concert, driven simply by the information emitted towards them.
The above sequence displays a very interesting point. Wernicke's area is involved in understanding heard words, Broca's area is involved in producing words.

So, this sequence shows that these areas work together and that together, the emergent phenomenon can produce very interesting behaviours.

I'm not sure if these networks can be built by just trying them out at random. There's also a huge problem with verifying the validity of such a network. We can only validate it when the output at the other side (hearing the word) makes sense given the input (the visual image of a circle). Everything that happens in between can introduce problems anywhere in this chain. Also, a very large learning curve is presumably required to produce the above scenario. Remember that children learn to speak only after about 1.5-2 years or so, and then only produce words like 'mama' / 'papa' (as if those words are embedded memory in DNA).

Important numbers and statements

The following are statements that are important to remember and re-assess for validity:
  1. The brain consumes 12W of energy. Ideally, artificial simulations of the brain should respect this energy consumption level. But this seems far from possible, because the individual elements used in artificial intelligence consume far more power and there are factors of thousands involved in this calculation.
  2. It should be parallel in nature, similar to neuron firings (thread initiations) that fire along dendrites and synapses. If not, the model should assess scheduling in a single thread of operation.
  3. It should be stack-less and not have function unwinds.
  4. The brain has about 100 billion neurons.
  5. The fan-out (connections) to other neurons is between 1,000 and 10,000 (others report up to 20,000).
  6. It's not so much the working neural network that is interesting, but the DNA or construction rules that generate the working neural network. That is, it's more interesting to come up with rules that determine how a network is to be built than build a network that works and not being able to reconstruct it elsewhere.
  7. How to observe the state of the network in an intelligent way in order to deduce conclusions from the observation by the network?? (does it need to do so?)
  8. It is possible that successfully working networks can only evolve / develop over a certain time period and that the initial results look like nothing interesting at all. This statement can be deepened out by observing the development of infants.
  9. How does the state of a brain transcend into consciousness? (or is thinking the re-excitation of network assemblies by faking nerve input, imagination, so that images and audio seem to be there?)
  10. Zero-point measurement: My computer (a dual intel E6850 with 2GB low-latency memory) can process 500,000,000 (500 million) neuron structures in 0.87 seconds. That is about 1.14 cycles per second on 500,000,000 neurons. That is still a factor of 100 * 1000 = 100,000 slower than the human brain, assuming it re-evaluates all neurons in one sweep.
  11. For a very simple neuron structure covering 50 billion neurons that does not yet contain connection information, but only 3 bytes for threshold, fatigue and excitation information, 140 GB of memory is required to store the network in memory.
  12. In 2 GB of memory, you can fit 715,000,000 neurons without connection information.
  13. 50 billion neurons need 186404 GB of memory to store an average of 1,000 connections at a pointer size of 4 bytes per neuron.
  14. On my CPU (E6850) with a single thread/process, about 400,000 neurons can reasonably be processed in one sweep. That makes it about 1,500 sweeps per second across the entire neuron array.
  15. In 2GB of memory, it's possible to fit 500,000 neurons with connection information.
I'm therefore choosing 500,000 neurons as the basis of the network, which might eventually translate to a frequency of about 1000Hz if the sweeps are designed more carefully (1000Hz is derived from extremely high firing rates in the human brain that are observed to be at 200 pulses per second. Add the absolute refractory period to that, which lasts 3-4 cycles, and 1000Hz emerges).

500,000 seems to be the limit due to memory and due to CPU cycles in order to attain the same frequency. That is a factor 100,000 lower than the human brain and it's more or less maxing out the machine.
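
The memory side of items 11-15 above can be reproduced with a small script (using the assumptions stated there: 3 bytes of state per neuron and 1,000 connections at 4-byte pointers):

```python
GIB = 1024 ** 3

state_bytes = 3                 # threshold, fatigue, excitation: 1 byte each
connections = 1_000
pointer_bytes = 4
per_neuron = state_bytes + connections * pointer_bytes   # 4,003 bytes

# State only, 50 billion neurons (item 11):
print(50e9 * state_bytes / GIB)                 # ~140 GiB

# State only, in 2 GiB of memory (item 12):
print(2 * GIB // state_bytes)                   # ~715 million neurons

# Connection pointers for 50 billion neurons (item 13):
print(50e9 * connections * pointer_bytes / GIB) # ~186,000 GiB

# State plus connections, in 2 GiB of memory (item 15):
print(2 * GIB // per_neuron)                    # ~536,000 neurons
```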

Wednesday, May 07, 2008

The emergence of intelligence

John Holland wrote one of the most interesting books I've read so far, "Emergence". And it's not even the size of the Bible. :).

My previous musings on cognitive science and neural networks and artificial reasoning are greatly influenced by this book.

As I've stated in one of the posts on this blog, I've sketched out an argument that the "output-as-we-know-it" from artificial networks isn't all that useful from a reasoning perspective; the state of the network says a lot more about "meaning" than measuring output at output tendrils does. I'm not sure whether very complicated and very large neural networks would even have a distinct output.

The book "Emergence" provides a potential new view on this topic. It makes clear that feed-forward networks (as used in some A.I. implementations) cannot have indefinite memory. Indefinite memory is basically the ability of a network to start reverberating once it recognizes excitation at the input and further continuous excitation further on. The capabilities of a network without memory are greatly reduced and after reading the text, I dare say that pure feed-forward networks are very unlikely to be at the base of intelligence.

Indefinite memory is caused by feedback loops within the network: a neuron connects back to some neuron in the previous input layer or a previous hidden layer, thereby increasing the likelihood that that neuron will fire again in the next cycle.

A feedback network does however require additional features. One is a fatigue factor: as neurons keep firing, they become fatigued, which gradually decreases the likelihood that they fire in subsequent rounds. This dampens the effect of continuous excitation (and may explain boredom). The other is a refractory period: neurons that have just fired raise their firing threshold significantly for the next couple of rounds (about 3-4), further decreasing the chance that reverberation sweeps across the network in a kind of epileptic state.
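
A minimal sketch of such an update rule (the constants and the random sparse wiring below are purely illustrative, not taken from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Sparse random wiring, including feedback connections.
weights = rng.normal(0, 0.3, (n, n)) * (rng.random((n, n)) < 0.05)

base_threshold = 1.0
refractory_boost = 5.0   # a just-fired neuron is much harder to re-fire for a few cycles
refractory_cycles = 4
fatigue_gain = 0.2       # each firing adds fatigue, raising the effective threshold
fatigue_decay = 0.95

excitation = rng.random(n)
fatigue = np.zeros(n)
refractory = np.zeros(n, dtype=int)

for step in range(50):
    threshold = base_threshold + fatigue + refractory_boost * (refractory > 0)
    fired = excitation >= threshold

    # Feedback: fired neurons excite (or inhibit) the neurons they connect to,
    # which is what can sustain reverberation without letting it run away.
    excitation = weights.T @ fired.astype(float) + 0.5 * excitation

    fatigue = fatigue * fatigue_decay + fatigue_gain * fired
    refractory = np.where(fired, refractory_cycles, np.maximum(refractory - 1, 0))
```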

The end result for such a network is three important features: synchrony, anticipation and hierarchy. Synchrony means that certain neurons or cell assemblies in the network may start to reverberate together (through the loops). That is an important factor in anticipation, where cell assemblies reduce their activation thresholds so that they become more sensitive to certain potential patterns: it's as if the network anticipates something to be there, a memory of where things might lead in some context. Hierarchy, finally, is where cell assemblies excite other assemblies, and those other assemblies may represent a concept slightly higher up the hierarchy (for example a sentence as opposed to a word).

As discussed in the post on the implementation of humor, we can derive that humor is probably induced by the felt changes in the network (electricity and fast-changing reverberations towards other cell assemblies) as changes in context produce sudden changes in excitation across the network.

Thus, humor can be described as a recalibration of part of the network that is close enough to the original reverberation pattern, but not so distant as to become incomprehensible.

The final assumption I'm going to make, then, is that a certain state of the network (the reverberating assemblies) corresponds to a particular meaning. There is indeed a kind of anticipation in this network, and recently reverberated assemblies might reverberate again very quickly in the near future (short-term memory).

Then perhaps memory is not so much concerned with remembering every trait and feature as it is observed, but more with storing and creating paths of execution and cell assemblies throughout the network and making sure they reverberate when they're supposed to. Memory then isn't "putting a byte of memory into neuron A"; it's the reverberation of cell assemblies in different parts of the network. Categorization is then basically recognizing that certain cell assemblies are reverberating, thus detecting similarities. We've already seen that the effect of anticipation reduces the threshold for other assemblies to reverberate, although it doesn't necessarily excite them.

The question then is of course how the brain detects which assemblies are reverberating. For this theory to make any sense, it requires a detector with this knowledge spanning the entire brain, as if it knows where activity is taking place across the network and can attach a kind of meaning to it. The meaning doesn't need to be translated into words yet; it's just knowing that something looks like (or is exactly like) something seen before.

Actually, the interesting thing of memory is also that different paths can lead to the same excitation. So the smell of grass, the vision of grass, the word grass, the sound of grass and other representations may all be somehow connected.

In this thought-model, if we would form sentences by attaching nouns to reverberating assemblies, it may be possible to utter sounds from wave-forms attached to those concepts and perhaps use the path of context modification (how the reverberating assemblies shift to new parts) to choose the correct wording. Or actually, I can imagine that multiple assemblies are active at the same time, also modifying the context.

Multiple active assemblies seem like a more plausible suggestion. It would enable higher levels of classification in different ways, although it does not yet explain the ability of our mind to re-classify items based on new knowledge. Do we reshape our neural network so quickly? I must say that we do seem to repeat previous mistakes for a certain period of time until at some point we unlearn the old behaviour and relearn it properly. Unlearning something has always been known to be more difficult than learning it.

A very interesting thought here is the idea of the referee. If the network is allowed to reverberate in a specific state, how do we learn so effectively? We continuously seem to test our thoughts against reason and against explanations of how things should be. Is there a separate neural network on the side that tests the state of the network against an expected state? That would require two brains inside one, of which one is perfect and correct so it can measure the output of the other, which invalidates that model. Perhaps the validity of the network can at some point be tested against its own tacit knowledge. Does it make sense that certain categories or cell assemblies reverberate in unison? If they have never done that before, then perhaps incorrect conclusions are being drawn, which should cause the network to discard the possibility, reduce the likelihood of reverberation of a certain cell assembly, and keep looking for sensible co-reverberation.

To finalize the topic for now... Emergence requires a network of agents that interoperate together through a set of simple rules. The rules that I found most interesting for now are described in this blog post. But I can't help but wonder about the role of DNA. DNA is said to have its own memory and it's also known to represent a kind of blue-print. Recently, some researchers have stated that DNA isn't necessarily fixed and static, but that parts of DNA can become modified within a person's lifetime. That would be a very interesting discovery.

Anyway, if we take DNA as the building blocks for a person's shape, features and biological composition (besides the shape influences due to bad eating habits and so on), then we have certain body features that are controlled by DNA and probably certain human behaviour that is reflected in our children ( "he takes after him/her" ).

Just the recognition that behaviour can be passed on to children makes a strong case that the building of the human brain is determined by rules prescribed by DNA, a kind of "brain blue-printing", a recipe for how to build a brain through a set of rules.

So, we could create a neural network through random rules and see what happens, but we could also think of the construction of that network as following certain rules determined through evolution. This would make a particular network more effective with each generation. It's a big question. Real connections are formed by neurons that just happen to be close to one another, and I cannot imagine a neuron on one side of the brain managing to connect to a neuron at a significant distance.

Maybe the construction of this network is determined by a lower level of emergence, which is determined by smaller elements like DNA and whatever else is there at an organism level. Perhaps our consciousness starts with those minuscule elements?

Or just maybe the growth of the brain is entirely random. We could then consider the possibility that neurons exist somewhere and grow towards one another. Then, through Hebb's rule, the network might continuously attempt to reverberate and kill off those axons between neurons that never lead to joint reverberation (and thus have no useful interconnection with one another). Especially in the first four years, these connections (axons) grow like wildfire in a continuous manner. It takes four years for a network of 50 billion neurons to start producing some sensible results. We generally kick-start a network and almost expect it to produce something interesting after five minutes.

It would be very interesting research to find out whether this kind of growth/evolution can be jump-started and done in much less time by applying a computer cluster (or whether the brain can run on clusters in the first place :).

Monday, May 05, 2008

On content and process

I read a very interesting post recently regarding the difference between content and process. Process is basically determining action based on the context and is very much done in the here and now. Content has to do with the analysis of concepts and the relationships between them and could be taken as learning experiences. Process can also involve learning, but then as the enhancement of action (reflex) upon the perception of content that has been identified in some way. Content itself is deep-rooted knowledge of how a concept might have gotten somewhere or how it might relate to other concepts (in various different possible ways).

Maybe if you don't appreciate the arts, you're a person that highly prefers process (objectives), getting things done or moving from A to B without caring much about the how and where. People that really dig art and content may not be as efficient in getting their things done, but they understand the relations between concepts better and "enjoy the journey" :).

This is an interesting differentiation of course. If not achieving your objective causes frustration, then this might also explain why some people feel depressed, frustrated or stressed more than other people. Some are just there for the journey and the pleasure; others always want to be somewhere else just as they've gotten somewhere.

The argument of the person writing the article was that long, continuous exposure to video games and films, for example, doesn't train the content-analyzing capabilities sufficiently. It therefore trains people only to get things done, without training them in the pleasure of analyzing the interrelationships and contents of things.

Historically, humans have mostly lived together and generally spent a lot of time interacting with one another, developing and improving interpersonal relationships. Virtual environments, however, are loaded with objectives "just to make it interesting", so the argument that these social environments improve relationships doesn't follow naturally. They might just provide another excuse for achieving objectives.

The article also argued that the development of the individual (the recognition of who you are, your "self-idea") isn't as strong. Stated another way, you're not sufficiently self-aware or "individuated". This puts pressure on the need to make more effort to be recognized as a specific type of person, a projected self-image.

This lack of individuality could then be compensated for by collecting status symbols, generally projected symbols of what one considers success. Those symbols of success are basically material trophies like cars, houses and other things that are thought to add up to one's identity. Sad to say though, one can never gather enough items for individualization; there's always room for more "self-articulation", which explains the inexhaustible search for new items to conquer.

Thursday, May 01, 2008

Cocomo II implementation on Project Dune

As discussed in previous posts, I've been designing an approach for Cocomo II software project estimations. The implementation has already been added to the main branch of Project Dune and will be finalized in the next major version, Project Dune 2.0.

The first step in estimation is to determine the project size:

http://gtoonstra.googlepages.com/sizing.png

The next step in this approach is to determine the effort multipliers. Those multipliers increase the required development effort linearly.

http://gtoonstra.googlepages.com/effortadjustment.png

The scaling factors in the next tab increase effort exponentially, so if those end up really high, the required effort soars as well:

http://gtoonstra.googlepages.com/scalefactors.png

After the factors are supplied, the outcome of Cocomo II is a set of numbers. The most useful numbers at this time are listed here:

http://gtoonstra.googlepages.com/result.png

Notice in the different screenshots how the factor being manipulated is explained in the tooltip. The text in the middle is a reflection of the correct assessment of that factor.

The results show the person-months required to develop the project, the nominal person-months (if all scale factors and multipliers were nominal), and the time to develop together with the number of staff required to implement the project.
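For readers who want to see roughly where these numbers come from, below is a sketch of the post-architecture Cocomo II formulas using the published COCOMO II.2000 calibration constants. It is illustrative only and not necessarily how Project Dune implements it; the parameter names (ksloc, scale_factors, effort_multipliers) are my own.

    from math import prod

    # Rough sketch of the Cocomo II post-architecture model
    # (COCOMO II.2000 calibration constants).
    A, B, C, D = 2.94, 0.91, 3.67, 0.28

    def cocomo2(ksloc, scale_factors, effort_multipliers):
        """ksloc: size in thousands of source lines of code.
        scale_factors: the five scale factor ratings (their sum drives the exponent).
        effort_multipliers: the post-architecture cost driver ratings."""
        E = B + 0.01 * sum(scale_factors)            # exponential scaling
        pm_nominal = A * ksloc ** E                  # nominal person-months
        pm = pm_nominal * prod(effort_multipliers)   # linear effort adjustment
        F = D + 0.2 * (E - B)
        tdev = C * pm ** F                           # time to develop, in months
        staff = pm / tdev                            # average number of staff
        return pm_nominal, pm, tdev, staff

With all multipliers set to 1.0 and nominal scale factors, the adjusted and nominal person-months coincide, which is exactly the distinction the results tab makes.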

Wednesday, April 23, 2008

The art of software estimation

I'm reading up even more on the Cocomo II method and interestingly, I like it. Although it's all maths and you can't run companies through formulas of course, the exercise you're put through is the real value.

Consider the following definitions of estimate:
"to judge size, amount, value etc, especially roughly or without measuring"

"to form an idea or judgement of how good etc something is"

"a calculation (eg of the probable cost etc of something)"
So estimating is about forming ideas, judging and calculating. Cocomo II does just that. No estimation method can replace the judgement part, but it can provide the math part.

There are loads of parameters to consider, but within the context of a formula you get an idea of their impact when you deviate by some amount. For example, some parameters only slightly add cost and time, whereas others carry a strong exponential factor, so an incorrect evaluation of the real value has great consequences.

One of those factors concerns re-use. On regular projects I've worked on, people wouldn't properly go through the practice of estimating the re-usability of a piece of software. Clean interfaces, excellent documentation and little coupling with a used library mean it can be plugged in within a day or so. But once you plug in a library that has good documentation and probable bugs, and that controls your software (like Spring), the effort of integrating and configuring it into your code suddenly becomes a lot larger. Unfamiliarity with the business sector, the library code, other packages and how they work together also adds linearly to that effort.

There's a great opportunity here for a new open-source tool on the web that supports these kinds of estimates, and it has to be written for non-Cocomo-savvy users. Estimating with Cocomo only really makes sense once you are working in teams of 3 or more for 3 or more months; anything smaller is probably better off with a hand-written estimate on a piece of paper, because the size isn't considerable at all. Beyond that point, however, the accuracy of informal estimates wears off very quickly and you need to be put through a real practice of estimation. It's really an art form, but nothing mystical, just common sense. Anyone will produce incorrect estimates; what matters is understanding how far off you could be and supplying that margin along with the estimate.

For Project Dune, I intend to make something quite easy. The idea is to explain the purpose of the parameters and allow the user to tweak them. Then on a side-bar show a real-time graph with the impact of the changes. And everything in a kind of wizard form.

Estimates are based mostly on code size, and size can be expressed either in lines of code or in Function Point Analysis points. You then apply a multiplication factor and the formula, and out come your cost and timeline.

As said in the previous post, I'm not so much interested in the actual estimate produced as in the potential variance. With some more maths behind it, it should be possible to show range graphs that indicate the very worst case, the likely bad case, the real estimate, the likely good case and the best case.
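As a strawman for that idea, something like the following could sit behind such a range graph. The spread factors here are invented for illustration only; in a real tool they would have to be calibrated against observed deviation data.

    # Illustrative only: wrap a point estimate in optimism/pessimism factors
    # (the factors below are assumptions, not Cocomo II values).
    def estimate_range(nominal_pm, spread=(2.0, 1.4, 1.0, 0.8, 0.6)):
        labels = ("worst case", "likely bad", "estimate", "likely good", "best case")
        return {label: round(nominal_pm * factor, 1) for label, factor in zip(labels, spread)}

    print(estimate_range(42.0))
    # {'worst case': 84.0, 'likely bad': 58.8, 'estimate': 42.0,
    #  'likely good': 33.6, 'best case': 25.2}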

The estimate is used as input to project planning. It doesn't yet adjust for serial / parallel work breakdown structures. So the real planning that states "how long" a project takes is still different. You may have people idling or crashing tasks to get things done.

I'm sure there will be more stories on estimation later in this blog.

Sunday, April 20, 2008

Software Cost Estimating with Cocomo II

I'm reading the Cocomo II book. It's an adjusted cost estimation model from, mostly, Barry Boehm. As you can gather from previous posts, I'm sceptical of these models of process improvement and of managing projects through "numbers" and forms; ideally, we know everything at the start of the project and everyone knows how to do things most efficiently.

If you read the book only literally and pay attention just to the math, it won't get you far. The power of the book comes from interpreting the ideas behind those numbers and formulas, and only then using the numbers anyway, since they're the only factual foothold you'll get in a real-life situation. The rest is a fiction of our imagination, or personal recalibrations influenced by our own ideas of project management.

First you should remember what the word "estimate" means: an expectation of cost, time or effort based on the information known at the time the estimate is produced. One of the strongest and most interesting statements is made in Chapter 1, where it is made clear that Cocomo II doesn't give you drop-dead fixed deadlines or cost statistics (as long as your numbers are correct), but rather a guideline or foothold on a potential track for your project that is subject to a potentially large deviation. The size of the deviation is determined by the quality and quantity of information at the outset of the project. So the more you know beforehand, the more accurate your estimate will be; that is a logical given. Maybe that is also why you should consider reproducing estimates at different stages throughout the project.

However, things go further than cost estimates. In estimates we often make assumptions and tend to think along a positive trail of project execution. That is, we generally like to disregard risks and things that will go wrong and just not take them into account. Or we think we can squeeze the effort anyway and do more in the same time than initially envisaged. In order to be correct in the estimate, regardless of a boss who won't like what you're telling him, you'll need to factor the negative issues into the equation as well.

So, there must be a number of factors that negatively impact the project outcome as envisaged in the estimates. Think of these factors for a future project, as these increase the potential deviation significantly:
  • Incomplete definition of scope at the start of the project.
  • Unclear development process or not living up to that process.
  • A development team that doesn't communicate well or otherwise faces challenges in its communication.
  • Scope creep in future stages of the project.
Some people understand that projects above a certain size or cost have no chance of succeeding: the environmental factors keep changing, and with the size, the need for communication and all the other factors grow as well. Simply put, you can't reliably estimate those projects, since there may be a deviation of a factor of 4-8.

If you think ahead to the completion of the project, it would be very easy to compile an estimate at that stage. You go through the project history, estimate how much was lost on each event (without even consulting the real numbers) and chances are you'll arrive at a number that is about 90% accurate. But we can only do that because at the completion of a project we have all the information we need to produce that estimate.

Now think towards the start of the project. What information do we have available and what are risks or issues that we should foresee?

From cost estimation, I think we can learn that reducing project risk and improving the chances of success comes down to increasing the amount of information available for the successful development of the project. Think of methods like software prototyping, iterative deployment cycles, showing things early, etc. It's not yet proven that these produce the correct results, because showing things early may also give your client the feeling that things can change at any stage in the process.

So, from all of this, I conclude that the estimate itself is not the most worthwhile thing produced in the cycle; the accuracy of that estimate is more valuable. How much deviation can we expect overall, and how do we express it? Since we can't reliably come to any estimate at all at the start of a project, what range can we give to project decision makers, our sponsors, so that we can inform them beforehand whether something should be done or not?

Probably, this conclusion should result in a whole new way of software development: something based on measuring the quality of the information, scope and specification available at the outset, and a measure of the risk involved if things proceed on that little information.

Wednesday, April 02, 2008

Requirements of a silicon brain: Project Semantique

I've started work on some implementations (research) to elaborate ideas on the implementation of a symbolic network. A good method for me is trying out some implementation, switching to reading, switching to philosophical meanderings and back again. The entire process should feel like some kind of convergence towards a real working implementation (however far away that may be).

In previous posts, I touched on a couple of requirements that are part of this design:
  1. The energy equation must hold, that is, the time and energy it takes for a biological brain should more or less equal the time and energy of a technical silicon implementation
  2. It should be parallel in nature, similar to neuron firings (thread initiations) that fire along dendrites and synapses
  3. It should be stack-less and not need immense amounts of stack or function unwinds

Some new requirements:

  4. The frequency of introspection is undetermined at this point, or better put: "unknown". I don't know how often to check any kind of result in the network to come to any kind of conclusion. But I reckon that the frequency is tied to the clock cycle of the main CPU, or whatever can be compared to that. Someone noted on my blog that the frequency of the brain seems to be around 40Hz, so the network might need to be inspected 40 times a second (and old entries cleaned up, leaving room for new ones?). The idea is not to push too much onto the heap for analysis, but to clean up the results regularly and continuously work forward, storing the previous results in different rings of memory
  5. There should not be any "output dendrite" or "return object".
  6. The state of the network at any point in time == the output.
  7. Previous results should eventually be stored in different rings of memory, which have lower qualities of prominence the further away from the source of processing. Most possibly, results in more remote rings of memory may require re-processing in the brain to become again highly prominent.
So where does one start to design this?

I'm looking at "stackless python". It's a modified library of python that allows little tasklets to run that do stuff. Basically it's similar to calling a C function that passes in another function address to execute. The calling function unwinds and the CPU can start executing from the new address.

Stackless further hides the tasklets (they run within the same thread) as a kind of "green" thread or micro-thread, since it has its own scheduler (which is not pre-emptive, but cooperative).

Check it out here:

http://members.verizon.net/olsongt/stackless/why_stackless.html#the-real-world-is-concurrent
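To give a flavour of how that looks, here's a minimal tasklet example in the style of Stackless Python. It assumes you're running the Stackless interpreter so that the stackless module is available; the "neuron" naming is just my own illustration of how firings could map onto tasklets.

    import stackless

    # Each "neuron" is a cooperative tasklet: it does a bit of work and yields
    # back to the scheduler instead of unwinding a deep call stack.
    def neuron(name, ticks):
        for t in range(ticks):
            print("%s fires at tick %d" % (name, t))
            stackless.schedule()          # cooperative yield, no pre-emption

    for i in range(3):
        stackless.tasklet(neuron)("n%d" % i, 2)   # bind arguments and schedule

    stackless.run()                       # run until all tasklets have finished

The appeal for this project is exactly the stackless requirement above: thousands of such micro-threads can be scheduled without piling up function unwinds.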

What is the objective?

The objective for now is to load word lists and process text. It's quite a basic process I'm simulating at this point, but that does not matter, I'm mostly interested in seeing if these methods display any kind of emergent intelligent behaviour:
  • Load word lists into memory
  • Enter 'learning mode' for my symbolic network
  • Process 'stories' that I downloaded from the web
  • Establish 'connections' between symbols
  • Verify connection results
  • ... modify algorithm ... modify implementation ... back to the first step
  • Post results
Project name: Semantique

Tuesday, April 01, 2008

On the implementation of humor...

Cool! April Fool's Day. Well, I did not hear a lot of jokes today, luckily, but I guess that others will have been fooled at some point one way or another.

From cognitive science, I am very interested in the analysis of humor... What is humor? I'm not asking how to tell a good joke or what makes a good joke, but on a lower level I'm trying to understand when we find something funny. So how come something is experienced as funny?

I define humor as a deviation from the most logical / expected path along which the context (your expectation) develops, towards something you didn't immediately see coming as the story unfolds. That's what a joke is, anyway; if you did expect it, it wouldn't be all that funny. The best jokes and joke tellers keep you away from the other, equally explainable path long enough until the punch line, where the actual context suddenly becomes clear.

All well and good... Star Trek seems to suggest that Data could not understand humor, as if humor in its essence were pure human emotion. Since machines don't by default have access to emotional responses (and if emotion is the driving force of our life, in the sense that it is at the basis of our decision in the morning to get up and start doing something), Star Trek would assert that Data couldn't laugh at a joke because he didn't have access to an emotional organ or a simulation of one.

I'm not sure about that assertion made in the series. I think humor isn't so much an emotion as a trigger (spike?) in your brain that brings forward an emotional reaction (laughter). That little difference is very large. Scientists have performed experiments on "aha" moments, those quick moments when you realize you have solved a puzzle and can complete it in its entirety. Those "aha" moments were accompanied by huge spikes of brain activity for a very short time, after which the context of the problem becomes entirely clear.

I thus imagine humor (specifically for now) to be an emotional reaction to a relatively simple discovery in the brain: that the context and path we've been led to believe (our expectation) is not the real path we should have taken to develop the context (chain of symbols) of the story. By "shifting" this context the right way as more information becomes available (the punch line), we feel a response to the "aha" moment when the brain solves it. Also, if you concentrate, you can rather easily suppress the urge to laugh to a great extent (does that suggest that laughter and humor are quite conscious processes?).

(Could you say, then, that the joke gets funnier the closer the expectation is to the actually developed context, or vice versa, that the farther away it is while using the same words, the funnier it gets?)

So, anyway, that means that there might be ways to detect humor by software as well, provided there is software that can develop expectations and interpret contexts the same way our brain can.

Therefore, Data probably won't actually laugh in the same way we humans do (since biologically we react to that aha moment), but it is probably possible to detect whether something is humorous by analyzing the contextual difference between two different snapshots of the context and then sending the appropriate signals to react to it. Of course... Data is most likely not culturally apt, as he lacks real biologically induced emotions, so he may very well laugh inappropriately in contexts that are culturally sensitive (imagine!). But that is another story.
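Purely as a thought experiment, a software check for that "context shift" might look something like the sketch below. It assumes contexts can be reduced to sets of activated symbols (a big assumption), and the example symbols and any threshold you'd pick are made up.

    # Thought-experiment only: measure how far the punch-line context has
    # shifted away from the expected context.
    def context_shift(expected, actual):
        expected, actual = set(expected), set(actual)
        return 1.0 - len(expected & actual) / float(len(expected | actual))

    expected_context = {"bar", "horse", "drink", "bartender"}
    punchline_context = {"bar", "horse", "bartender", "long_face"}

    shift = context_shift(expected_context, punchline_context)
    print("context shift:", shift)   # a large, late shift might register as 'funny'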

Monday, March 31, 2008

P=NP?

A new post on Silvio Meira's blog today. Apparently, one of the math professors over at Pernambuco in Brazil is going to post a proof that P=NP.

http://silviomeira.blog.terra.com.br/p_np_sera

Can't wait to see the result. If it holds up, it would have very large consequences for everything, and computers would become much, much more helpful than they are today. Stay tuned for that story.

I'm following it as well to see what happens!

Monday, March 24, 2008

Three Levels of Analysis

An interesting book by Pinker on cognitive science discusses the three levels of analysis; they're also on Wikipedia (since they belong to the field of cognitive science). Anyway, these three levels can be loosely compared to layers in a computing stack: the physical layer, the algorithmic layer and the computational layer. The algorithmic layer would compare to the operating system, and the computational layer to behaviour (the applications making use of the infrastructure).

Of course, when mapping these levels to biological counterparts, they map to neurons and so on.

The interesting statement is that it is not enough to understand the workings of each layer individually (to be able to make sense of something); you also need to understand the interaction between the levels of analysis in order to work out how something works.

There are some potentially large pitfalls there. We assume that a neuron has one particular use and can never have more than one use at the same time. I have no information at this point to make any further statements on that, though.

One of the questions in these books is what intelligence really is, and intelligence is then portrayed as some kind of computational machine. Computation is often associated with algorithms, formulas and maths.

I'm happy with people saying that mathematics and algorithms somehow describe the world (or can be a good means to communicate difficult issues or describe some kind of event), but I don't think it's a good idea to turn this upside down and state that the world is mathematical in the first place. It's way more dynamic than that. It's maths that describes the world (and then somewhat poorly, as some kind of approximation).

Although very complex things can be described with a sequence and combination of many algorithms together, this presents enormous problems. First, it makes the world deterministic (although incredibly complex, with algorithms connected and influencing one another), and second, in order to come to a full implementation you'd need to understand absolutely everything and model it in some kind of algorithm. That sounds pretty much impossible, a dead end. I think AI needs something more dynamic than that.

There was a lot of enthusiasm about neural networks. I've used them too and I like how they operate, but in the end a network with its trained weights is only useful for one single task. It cannot handle another task unless it is retrained. So those are very limited as well, and by comparison I find the speed and method with which the human brain learns immense. Another limitation of NNs is that they require inputs and outputs and have a fixed composition for the problem at hand (a number of layers and a fixed number of neurons and synapses in between).

So, NNs are also deterministic and limited to their purpose. How should something be designed so that it can seem to learn and also start reasoning with its knowledge? Reasoning in this context should be interpreted very broadly; I explicitly did not use the term compute, to mark the difference. A design is often very limited: limited to its area of effectiveness and to the knowledge at hand at the time. The design is limited by our inability to design indeterminate things, items that may produce inconsistent and unexpected results.

When we design something, we do it with a purpose in mind: if this is the input, then the result should be that. But this also limits the design; it cannot do anything more than, or different from, what it was designed to do.

How difficult is it then, with our minds focused on algorithmic constructs (consistent results), if we are just now trying to work out a design for something that may produce inconsistent and indeterminate results? It's the toughest job ever!

Monday, March 17, 2008

Some more about symbolic networks

I've created some images to explain the ideas of the previous post. I think those ideas are very promising, since they indicate they could comply with the requirements/statements of previous posts (I can't academically make this statement until I have scientific proof of this, so it's a hypothesis):
  • The energy equation between a biological network and a mechanical network should more or less hold, within a certain range; some research is needed into whether there should be a scaling factor in this equation.
  • There shouldn't be a function-call unwind for the functions that are called as part of the symbolic network.
  • There should not be an expectation of an output state/measurement (the network *is* the state and is always being modified).
The following is a PET scan that shows brain activity. Think of this as a screenshot at a certain point in time, when the network is processing some kind of thought:

You can see that some areas of the network are totally unused, whilst others display high states of activity. Of course, it is very important to assess the brilliance factor and the brilliance degradation/fall-off (the time it takes for the brilliance to decrease) within the context of this picture.

The brilliance is basically activation of neighboring nodes. So thinking about one concept can also easily trigger other concepts. The "thread" of a certain context would basically guide the correct activation path.

I imagine a kind of network of symbols that are interconnected as in the following picture:

The "kind-of" association is not shown here, because I'm not sure it really matters at this point. The "kind-of" assocation in itself can also be associated with a concept, that is, the "kind-of" can be an ellipse itself. So there is some loss of information in the above diagram, but that loss is not being considered at this time.

You can see that concepts are shared between other concepts to form a very complicated mesh network. It's no longer ordered in layers. If you take the strength of an association (how strongly you associate something with something else) as the line between two concepts, then I could ask you: "What do you think about when I mention exhaust gas?" Your response could be car or bus. The lines thus represent associations between concepts.

Wheels are known by both the car and the bus concepts. Also notice that this network is very simple. As soon as you gain expert knowledge in a topic, this network will eventually split up into sub-topics with expert knowledge about specific kinds of wheels, specific kinds of buses and specific kinds of cars, and how they relate to one another. Generally, we distinguish my car from other cars, which is one example of a topic split. This statement about expert knowledge is derived from my little nephew looking at his book. For him, a motorbike, a bus, a cabriolet, a VW and things that look the same are all cars at this point in time. Later on, he'll recognize the differences and store that in memory (which is an interesting statement to make, as it indicates that this network is both a logical representation and association, but also memory).

The connections in this kind of symbol network can still be compared to dendrites and synapses. The strength of an association of one concept with another is exactly that.

Now, if you consider that you are reading a story and have certain associations, you can also imagine that these concepts "fire" and are added to a list of recently activated symbols. Those symbols together form part of the story, and the strength of their activation (through the synapse strength, their associations with other topics and a host of other factors, basically what the network has learned) will, in different contexts, slightly change the way the gist of that story is remembered.

If you store the gist of this list (produced by a certain paragraph or document), it should be possible to compare it with other gists through some clever mathematical function, so that the gists of one document can be compared with those of another. Gists are also a way of reducing detail and storing it in a much more compressed form.
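One candidate for such a "clever mathematical function", assuming a gist is stored as a map from symbol to activation strength, is plain cosine similarity. That choice is mine for illustration, not something the model prescribes; the example gists are made up.

    from math import sqrt

    # Compare two gists stored as {symbol: activation strength}.
    def gist_similarity(gist_a, gist_b):
        shared = set(gist_a) & set(gist_b)
        dot = sum(gist_a[s] * gist_b[s] for s in shared)
        norm_a = sqrt(sum(v * v for v in gist_a.values()))
        norm_b = sqrt(sum(v * v for v in gist_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    story_1 = {"dad": 0.9, "car": 0.7, "house": 0.6, "couch": 0.4}
    story_2 = {"bus": 0.8, "car": 0.6, "exhaust_gas": 0.5}
    print(gist_similarity(story_1, story_2))   # small overlap -> low similarity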

Consider the final picture in this post:

It shows a simple diagram of, for example, what could be a very short children's story (well, we're discussing this text at that level, basically). Dad goes home in his car and enters the house. He sits on the couch and watches the tele. If you remove the verbs from these statements, you'll end up with a small network of symbols that have some relation to one another. I feel hesitant to jot down the relationships between them in this network of symbols. I'd rather add some layer on top of these symbols that manipulates the path that a certain story or context takes. So, the concepts are always somehow related, but the thread of a story eventually determines how the concepts really relate to one another. The thread therefore manipulates the symbolic network in different ways.

So... what about a design for an implementation? In game design, even back when games were still 2D, designers already started with large lists of events and lists of nodes for path-finding, for example. Between frames, these lists were rebuilt and re-used to update the AI or the action. Those design patterns should be reusable in this context:
  • Start with reducing a text to its nouns only
  • Process the nouns one by one
  • For each noun:
    • Reduce the activation factor of current concepts in the activation list
    • Apply the synapse factor to the current noun
    • Add concept to the activation list
    • Add related concepts connected to the currently processed concept to the list as well, with their activation reduced by the synapse factor
  • Get an inventory of the highest activated concepts in the list
  • Store the gist list to describe the text
Obviously, the thing that is missing from the above is the intention, the thread. A text that describes a guy in India getting off a bus at his house may otherwise end up with the same gist as one about a bus in San Francisco that happened to drive past an Indian restaurant.
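Here is a minimal sketch of that activation pass, assuming the symbolic network is stored as a dictionary of concepts to weighted neighbours. The decay value and the toy network are illustrative only, and the thread/intention problem is deliberately left out, just as in the list above.

    # Toy activation pass over a symbolic network {concept: {neighbour: synapse}}.
    DECAY = 0.8   # how much existing activations fade per processed noun (assumed)

    network = {
        "dad":   {"car": 0.7, "house": 0.6},
        "car":   {"dad": 0.7, "wheels": 0.8},
        "house": {"dad": 0.6, "couch": 0.5},
        "couch": {"house": 0.5, "tele": 0.4},
    }

    def process_nouns(nouns, top_n=3):
        activation = {}
        for noun in nouns:
            # 1. Reduce the activation of concepts already on the list.
            for concept in activation:
                activation[concept] *= DECAY
            # 2. Add the current noun at full strength.
            activation[noun] = activation.get(noun, 0.0) + 1.0
            # 3. Add related concepts, reduced by their synapse strength.
            for neighbour, strength in network.get(noun, {}).items():
                activation[neighbour] = activation.get(neighbour, 0.0) + strength
        # 4. The gist is the inventory of the highest-activated concepts.
        return dict(sorted(activation.items(), key=lambda kv: -kv[1])[:top_n])

    print(process_nouns(["dad", "car", "house", "couch", "tele"]))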

So the motivation and thread of a story are something entirely different from its concepts. Should this be part of the network? In a way, I think it should be possible to think of it as a layered network above the symbolic network: a different kind of representation with links into the other network to describe actions and the objects that are acted upon.

It's the state of the network, stupid!

Back to the discussion on Artificial Intelligence and Neural Networks. This blog has hosted some meanderings and thoughts on AI and how it relates to NNs. I've read books by Steven Pinker and others regarding NNs and I understand how they work from a helicopter view (I even implemented some). I've come to think that the NN perspective on AI is probably horribly wrong, if you compare biological networks against mechanical ones.

Computers were from the very start designed to work on input and then produce output. Output is a very binary-like state: it's either this or that, it's a range of numbers and the output is generally an exact number. There's not really a way for a single answer to represent two (slightly) different states at once.

This morning I woke up and considered that this general approach, taken as part of AI way back, is probably wrong. Even though computers are supposed to produce output that can only be interpreted in one single way, the "output" of a human brain doesn't really exist as output per se. I think of an answer or a thought more as a kind of "state of the network" at some point in time. The frequency of thought is given by the "frequency" of the network, although that seems a very weird term to use for biological networks; it's probably totally independent.

If you look at brain scans though, you'll see something interesting. Not all neurons are active at all points in time (very much unlike mechanical networks, which generally have all their nodes and parts connected, so that turning one part turns and impacts another). And the granularity of our analysis of the human brain is not at the neuron level, but at the level where we see a general number of neurons receiving activity. So if neuron A, next to neuron B, fires, only A would be active, but B would not be assessed separately.

So it's as if regions of interconnected neurons are active in one sweep, not the entire network. And there's no output like a machine's, only a list of active and recently active neurons. Every sweep, the active list is modified and moved back into a ring of memory.
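A toy version of that sweep-and-store idea could look like the sketch below, where each sweep's active list is pushed into the nearest ring and whatever overflows drops into progressively less prominent rings. The ring count and sizes are arbitrary assumptions.

    from collections import deque

    RING_SIZES = (4, 16, 64)               # arbitrary: nearer rings are smaller and fresher
    rings = [deque() for _ in RING_SIZES]

    def end_of_sweep(active_list):
        overflow = list(active_list)       # the freshest sweep result
        for ring, size in zip(rings, RING_SIZES):
            ring.appendleft(overflow)
            if len(ring) <= size:
                return                     # everything still fits in this ring
            overflow = ring.pop()          # the oldest entry loses prominence
        # whatever falls off the outermost ring would need re-processing to
        # become prominent again, as suggested earlier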

If we reconsider neurons as nodes in a network and replace them with symbols instead, we can probably come close to a logical representation of a thought network. A neuron by itself doesn't represent anything, but impacts something physically. A symbol is highly representative of something, but doesn't necessarily impact anything; it is only connected to other symbols.

The symbolic network is then like a neural network, only it works with nouns, with symbols. The symbolic network allows an infinite number of nodes to be added, as long as there exists a process to interconnect symbols as soon as there is a relation to be determined between them.

Now, imagine what happens, and assume this symbolic network is properly connected. When you mention the word car, or the smell of exhaust gas, or a picture of a car, those symbols are activated. The joint activation of car, exhaust gas and the picture should activate a symbol of car (without annotation) as a concept, so that the network understands that car is being discussed.

If you now introduce a text with nouns and verbs, and assuming the computer has grammatical analysis capabilities, you can process this text within the symbolic network, and at the end of some paragraph or passage of text the network has a certain state of activity. Some regions are highlighted and other regions are dark. If you keep a list of the symbols that were activated, you could store that list (region) as a representation of the text.

So, the objective is not to store the text word for word, but to store the associations and context of the paragraph. Mentioning the words in a search term would probably produce the text again and the more aligned with the paragraph it is, the more likely it is to be found.

Memory is also important. There are actually different rings of memory in this model (short-term and long-term are too generic). Reading a passage would store the gist of that passage into a different symbol. The gist is basically those nodes that had the highest activation for a certain paragraph after the cycle is completed. So storing the gist of one paragraph alongside another may build up a highly descriptive representation of a document. It's not necessarily the word that is mentioned that matters, it's the concept and its relation to other symbols; it's possible for a symbol to be highly activated that was not explicitly mentioned in the text.

The symbolic network is the representation of the nouns of our language, but verbs are the activities, the manipulations in our mind of those symbols. It seems then that, within the context of this blog post, the verbs correspond to real intelligence (which is not described in this post yet). The nouns are just perceptions mapped to symbols. Real thought, the kind that creates (is creative), can reproduce and can come to conclusions, is a totally different matter. That sounds like real artificial intelligence.

Wednesday, March 05, 2008

Project Estimation

There are some interesting discussions taking place on project estimation. Project Dune 1.4.2 came out yesterday, where you can register your estimates per activity per scope statement. So that's basically "ballpark" level of detail.

There are some new requirements that are coming up that look really interesting. PM's really seem to (want to) use some kind of math magic to get to certain numbers and then tweak them.

So, what really matters in a project? What are factors that, if they change, significantly affect the timeline of a project?

I've already mentioned that the level of communication is a huge issue. If your team grows, you increase contention within the team, the need for communication (which costs time) and so on, especially at the start, where the PM actually intended to move fastest. For the rest of the trajectory, people are generally told to "get on with it", leading to insufficient communication and lower standards of quality.

Oh well! :)... Expect some more posts on this topic in the future.