Monday, March 08, 2010

Inductive vs. imperative programming

For work I'm currently looking at LISP programming. Lisp is one of a family of functional programming languages like Haskell, OCaml, Erlang and Scheme. Lisp itself is the oldest of these and it is based on the manipulation of lists (..of lists...(...of lists)). If you look at Lisp code, you'll quickly get dizzy from all the parentheses on the screen.

One interesting property of Lisp is that it can be a huge stream of code intermixed with data, or data becoming code (messages), or code producing more data (recursion etc.). There have been some libraries that were written for Lisp, which gave it more capabilities including screen widgets and so forth. Standard Lisp however doesn't have unification and backtracking abilities like Prolog does. So, even though Lisp has been used in AI applications, it isn't directly comparable with Prolog for example.

This article is a still relevant consideration about the use of Prolog and why it hasn't taken off so much as it could have. Most programmers work out particular problems procedurally themselves (sometimes giving rise to very costly and sneaky bugs), many times due to incorrect control code statements and so on.

Logic programming is an area where you inductively reach a conclusion. When you state a fact, the interpreter confirms that this fact may be stated (is true). If you state a relation with a variable (a query), the interpreter tries to find all possible valuations that do not contradict with any other logical facts or statements made about these valuations. The result can be either null / nil / zero / no or a number of valuations that the interpreter knows about.

Logic programming is inductive programming. This means that some facts are declared ahead of time and then the program builds up some execution tree and finds out the rest. Imperative programming on the contrary means that the programmer decides the entire flow of what the program does, when it does it and how. This is the most common way of programming (C, PHP, C#, VB, C++, python, Java, etc.). So there is no 'hidden' code in any imperative programming besides the grouping of CPU instructions through assembly language -> 2nd generation languages -> 3rd generation languages and upwards.

A recent area of interest is called constraint programming, which is quite similar to logic programming, except that you don't really derive facts from other facts or prove them right/wrong. There are libraries available for a host of different programming languages, including the imperative programming ones.

Now, without a lot of very intelligent optimizations, these solvers would basically run loops over loops over loops and take a very long time to compute queries. This may have a large impact on the usefulness of inductive programming in real-world scenarios. Also, considering the article above, the success of some language doesn't really depend on its capabilities, but on how easy it is to get started with it and how well/fast someone can verify that some intention is working out correctly as intended. In that light, imperative languages are easier to follow and verify, because the entire execution diagram is right in front of you (except for the funny race conditions, deadlocks, non-reentrant code, library errors, static variables and so on).

Constraints are both problematic to code for correctly, but incredibly helpful for coming up with solutions within your own lifetime :). Some article I read used the example of a number of ships for which a schedule needs to be developed. Any berth can be taken at any particular moment in time. Scheduling problems are the best example for constraint programming, because they can have a large number of constraints associated with them and often have a very large number of substitutes (multiplying the number of loops in total).

The idea is that through constraints, you can decrease the number of items that are under consideration, but this is not the whole story. Sometimes constrains are imposed on the combination of values from domain R with domain S. In such cases, the rule can consistently be firing over some iteration mechanism that dumbly retries other valuations, continuously hitting the constraint. The general thing here is that research is focusing on smarter algorithms to bring down the number of iterations that are required.

Why is this all so relevant? Because planning and reasoning systems with formal declarations still depend on this brute-force mechanism of trying out combinations of different values. For very complex problems, it is very difficult to write imperative software that always computes the correct solutions and results for all cases, given an expanding set of constraints that may be imposed on them. The idea is that, once you have a generic iteration mechanism and a generic method for declaring rules and properties of objects, you only need to extend a so-called knowledge base and the system does the rest.

I think however that it is wise to start thinking about blends of programming languages. Either by expanding the compiler to embed the inductive code into the imperative code or by creating embeddable interpreters that can be loaded with particular objects, exposed inside the interpreter runtime and then fired off with a query to find any objects that apply. But certainly, sometimes by restating the problem or by starting off with an intelligent choice made by a human, one can sometimes just resolve the problem directly with procedural code. For schedules however, I don't see that happening so quickly.

Check out the following libraries and projects:

Python: http://labix.org/python-constraint

Minion: http://minion.sourceforge.net/cases.html

C++: http://research.microsoft.com/apps/pubs/default.aspx?id=64335

Java: http://www.emn.fr/z-info/choco-solver/index.html

2 comments:

Ludwig said...

Quite a useful overview, Gerard! At work, we are trying to solve what is called the resource constrained project scheduling problem (RCPSP) and have looked at Choco as an option. It's early days yet, so we don't quite know if we'll be able to solve it purely using constraint programming.

Another area we are looking into is an inventory management/purchase planning solution, which helps minimize total cost of holding inventory. This is also a well-studied problem in the literature, and any real problem is virtually intractable to imperative programming for precisely the reason you state (too many variables => loops in loops in loops in loops...)

We're still looking at an imperative technique, except using heuristics which will substantially reduce the computational complexity.

Gerard Toonstra said...

Sounds good! These RCSP problems sound really complicated :). Personally, I have doubts whether you'd really want to go down to that very detailed level, because the gain of the extra MW or whatever may not be worth the wait or extra investment in hardware to calculate it (or the system can't respond to the load), but that's someone talking without a clue about the specific domain.

The buggers are probably the allowed totals or the relations between domains that are costly to calculate.

It's incredibly useful if you get the business to commit to some 'sensible' base configuration before any other parts are tried out, or have a means to derive this yourself beforehand. A bit like selecting a CPU first before selecting other compatible computer parts (choice of CPU really brings the dimensionality down a lot).

Anyway, good luck with your project and hope that it all works out in the end!