Tuesday, June 24, 2008

Artificial Intelligence

I'm starting a degree in A.I. this September at the Free University in Amsterdam. The programme there is a bit closer to Computer Science than what other universities are offering.

It's interesting how I'm now picking up news articles on the topic, and how things are changing around the very definition of what A.I. really is. Initially, the idea was that intelligent machines should eventually think like human beings and be just as versatile, flexible and inflexible, emotional and detached as humans can be.

It seems the general definition of A.I. is shifting: it's becoming more about intelligently applying science to real-world problems that we can solve well, though perhaps not optimally.

Thus, A.I. is not a simulation or replication of the entire range of processes in the human brain; it's a simulation or replication of one specific problem that our brain can solve.

As I stated before in one of my blog posts on communication with machines, I think A.I. is about getting closer to the real cognitive processes that are occurring, instead of replacing our tactile activities. In that sense, A.I. is also a specific kind of Information Technology: finding out how we resolve very difficult problems and what the inputs to those resolution processes are, applying science to increase the efficiency and accuracy of the resolution, and then deploying it so that it can be used.

There are, however, some other branches. One comes closer to knowledge management (basically the problem of digging out the right information at the moment it's most needed). Another is about creating user interfaces and human interaction in different ways (more like emotional intelligence and empathy). Others are more about processing very large amounts of information.

Saturday, June 14, 2008


Holland is playing really well in the European Championship. It's a joy to watch. Six points in the pocket, and all of Europe is commenting on how well-oiled the team looks this year. I'm expecting Holland to win the tournament.

Well, on to something totally different: a new version of GWT is out, 1.5. The beta has good new capabilities and standard themes that prove very useful. I'll plug it into my project and keep improving the user interface, then probably release 1.5.1 before I host a demo next month at the company I work for, in front of project managers, architects, testers and other people.

Sunday, June 01, 2008

CUDA and GPUs

I've read a lot about the architecture of GPUs and how it's possible to take advantage of them. The design of a neural network running on a GPU is substantially different from one that runs on a single-core CPU. I've worked on a small framework to process neurons on a dual-core, but haven't been very impressed with the results so far.

The GPU, however, seems very promising. It actually requires a lot of threads and blocks to become efficient. My 8600GT has 32 little processors ready to work. The GPU organizes units of work into blocks of threads: the threads within a block run on the same set of processors and can cooperate on small units of work. A thread is the smallest unit of work within a block, comparable to a single iteration to be executed.

Nvidia's GPU is mostly data-parallel. You decide what you want to do and then have each thread run a very simple function on its own piece of data. The activation of a neuron by another neuron is an example of such a very simple function.

There are a lot of hardware-related optimizations to take into account. Ideally, a parallel design synchronizes within blocks but never between blocks, to prevent deadlock situations; besides, synchronization is a killer for performance.

The biggest problem in making graphics cards useful for A.I. is the memory storage capacity *or* the bandwidth between host memory/disk and the card's memory. That link is basically 8 GB/s on a standard PC with a PCIe card, whilst on the card itself the bandwidth from memory to GPU is an order of magnitude higher, about 60-80 GB/s. So staying on the card for calculations is definitely better for performance. The bandwidth from the CPU to its own main memory is about 6.4 GB/s, by the way, so the CPU can write to the graphics card faster than it can read or write its own memory.

The card contains 256 MB of memory. If 4 bytes are used per neuron for state like fatigue, threshold and excitation, then it can store 67 million neurons. It might be possible to use an extremely clever scheme to store connection information, because that is where most memory goes: assuming 1,000 connections per neuron and 4-byte pointers, each neuron loses another 4,000 bytes to connections alone. Maybe a clever scheme where the neurons are repositioned after each cycle could reduce the need for that capacity.

Thus, assuming roughly 4,000 bytes per neuron on average, without such optimizations or clever schemes the network can only be about 67,000 neurons in size (256 MB divided by ~4 KB per neuron).

The design is the most interesting part of this challenge. Coding in CUDA isn't very special, although someone showed that an initial 2x increase in processing power can become a 20x increase if you really know how to optimize the CUDA code. So it's worth understanding the architecture of the GPU thoroughly; otherwise you're just hitting walls.

Kudos to CUDA

CUDA is an Nvidia technology that gives programmers access to the processing power of GPUs, virtually turning your computer into a supercomputer by offloading certain complex mathematical processing onto the GPU itself. A GPU is really a collection of little processors, generally around 128 of them, running functions in parallel. So a GPU will probably benefit you most if you're processing a large dataset that fits into its memory. For many simple business applications, it's not worth the trouble.

The GPU needs loads of different threads to become efficient. The program slices the algorithm into many small pieces and lets the processors loose on them, which means the design of your 'program' needs to be adjusted accordingly. Compare that with a single CPU: there, a growing number of threads quickly saturates the processor with thread-context-switching overhead, cancelling out the gains of parallel processing. That's why it's better to have many little processors than one very large one.

The GPU can process suitable algorithms about 250 times faster than a dual-core 2.4 GHz CPU. In other words, if you have a problem that can be loaded onto the GPU, you get access to the equivalent of a 250-node CPU cluster by buying a consumer graphics card for 250 euros. That is very good value for money! There are motherboards available that take 4 of those cards. You'll need a 1,500 W power supply, but that is far less than 250 nodes at 300-400 W each. And some guys at the University of Antwerp have built such a supercomputer for the price of 4,000 euros.

Here's a link to a tutorial on how to install this on Linux:


One of the obvious reasons I'm looking at this is the ability to use the GPU for neural network applications, more or less like what has been done here:


The memory bandwidth is in the order of 20-100 GB/s, as you can see here:


As I have stated in previous posts, two of the constraints on a working artificial network are the processing power available and the amount of memory needed to store the weights and so on. If things are hardware-accelerated, it might open up new avenues for processing. One of the larger remaining problems is probably encoding the network into simpler data elements as efficiently as possible. Leaving everything to one processor limits the frequency that can be achieved. But with 128 processors instead, it's becoming very interesting indeed.

One of my posts suggested that I don't believe (yet) that there's a kind of output wire where the results of the network end up in, but rather that the state of the network itself, or sub-states of cell assemblies somehow form up consciousness. And the more neurons are available, the more of "consciousness" you'll have.

One should never forget that the idea of something having consciousness or awareness depends on our perception. And perception is always influenced by our own ideas and assumptions. So when we look at someone appearing to act intelligently, we also assume they have the level of consciousness that pertains to those tasks, or even more. But that's not necessarily an accurate assumption.

A probable mistake in a previous post is related to the frequency of the network. We probably can't really speak of a 'frequency' at all, since there's no clear cycle or sweep in the biological network: neurons fire whenever they want, and any "sweep" that collects the currently activated neurons would come along somewhat randomly. Computers, by contrast, are generally programmed with proper "work" cycles followed by a single cycle that collects the results.

The frequency of brain waves has been measured at between 5 and 40 Hz. So the figure of 1,000 Hz may be way off. If we take 40 Hz, things become a lot easier. And if we work with simpler processors running at this frequency on equal units of work, perhaps that brings things closer to the real human brain.

From the perspective of processing, the GPU enables a calculation speed 250 times that of a CPU. And if we lower the frequency from 1,000 to 40 Hz, that is another multiplication factor of 25. That brings the number of neurons that can be processed to 2,500,000,000. This is only a factor of 40 lower than the human brain!

Thus, if we put 40 graphics cards into a computer, we'd probably be close to the processing power of the brain itself. That'd be about 8,000 W versus the brain's 12 W. Not too bad, because using CPUs it'd be about 2,000,000 W. The remaining problem is storing the network information. That was set at 2 GB earlier; with smarter designs and optimizations, or reductions in the number of connections, this could be brought down to 500 MB or so, letting a network of 500,000 neurons run on a single graphics card, but it's not yet enough. A single byte per neuron is possibly sufficient, and while you wouldn't typically process single bytes on a CPU due to byte-alignment optimizations, on the GPU that shouldn't matter too much.

It's time to delve into the design of the processors and see what a GPU's processors can deal with efficiently. If they work well on bytes rather than words, it becomes a lot easier to fit things in memory, thereby increasing the size of the network.

Beyond that, it doesn't matter too much. Perhaps SLI can help to distribute tasks across cards and assign specific tasks to specific ones, like visual processing, reasoning and so on. Graphics cards generally work with texture maps, and those can be read back from one card and loaded onto another in an effort to share information.