Tuesday, November 30, 2010

Optimization in scheduling problems

The CSP solver talked about before is an effective way to calculate all possible solutions. Some problems however are either underconstrained or just have a very large number of possible solutions. In those cases, exploring the entire solution space may be very expensive; or certain application areas may be subjective to particular time windows in which the solution must have been developed (dynamic scheduling problems). Even in dynamic scheduling, there are differences with regards to the exact time window that is allowed. A scheduling system for shipping berths for example has a different allowed time window than a scheduling system for elevators.

The image shown here is a 'fitness' landscape. It's a bit of an imaginary landscape that displays how effective or costly a solution is. The peaks are desirable solutions, whereas the valleys are much less desirable. In this case, it's a one-dimensional graph, but for other solution spaces it could well be more of a 2D or even 3D space. The number of dimensions for the fitness landscape is actually determined by the number of cost functions you have.

This is where we get to the optimization part. A cost function is a function that determines the cost of a range of tuples that are (part of) a solution. This cost function is what guides the desirability of a certain solution and that is what basically drives the optimization. Ideally, one would like to find the best solution first. In reality, the best solution doesn't come first without a bit of back-tracking. Even worse, there is no guarantee in practical planners that the best solution will be found, only a solution that is 'good enough' given a certain threshold.

If you've paid attention, then I've talked about the use of a 'cost function', which actually only calculates the cost of a particular solution so far. So how does one actually form very good solutions? Because this sounds like it's more like trying out each possibility in turn until we're happy with the solution so far? This is why the cost function must be paired with a heuristic function. The cost function is used to calculate the actual cost/benefit. The heuristic function is used to calculate the expected cost or benefit for using the valuation in a particular domain variable over a slightly further horizon.

This is not exactly all. The cost functions do not have to be commensurate or agree with another. Sometimes a particular tuple choice is good for cost function A, but very bad for function B. So the balance between the importance cost functions must be determined prior to starting the solution process. Most cost functions are pretty linear, but it's also possible that continually preferring solutions that favour function A eventually raise the costs non-linearly, such that further in the solution space function B becomes cheaper than function A. Those are very difficult to solve.

There may also be other situations in which you don't want the ideal solution, but want to allow for a certain flexibility overall. For example, in a schedule for berthing ships where you expect changes to occur, you might want to favour a solution where a minor change in the schedule doesn't flip the entire schedule around, but where the change is constrained to one or two flips between tupelizations (because the more tuples are flipped due to the change, the more costly the change will have become).

Optimization is rather difficult to do effectively and it's an NP-complete problem. So either you arrange the computation power to go through all solutions, or find better methods to achieve proper solutions. There are two ways to do the optimization. You could try to do optimization at the same time as solving a CSP (basically, define the ordering of values for a variable when tuples are picked), or generate some schedule that is more or less good and then try to improve it through local search.

Local search basically means that you have a complete schedule, but pick two tuples of a set and flip valuations around (whilst preferably not violating the constraints) in the hopes of finding a schedule that is slightly better. Local search however, if you look at the image, is a bit tricky, because you may have to force the solution to go through a valley. And how can you tell whether the solution, when going up another hill, is actually going to be a solution with a higher peak than the one you had before? And if you're just going to try this out, how do you prevent the exploration of the entire solution space, which is what you wanted to prevent in the first place?

Basically, optimization is a bit like navigating a landscape in fog. You get only slight clues that there's a hill or a valley and at some point you must decide whether to carry on downhill to eventually find another monsterpeak, or stop searching altogether and call it your best solution. That is very difficult to do without a map. Constructing the map is equal to calculating the entire solution space.

Sunday, November 21, 2010

Sudoku, CSP and playing with sets

Sudoku is solvable as a constraint satisfaction problem. It is described as a problem where you should populate a grid of 9x9 tiles with numbers ranging from 1-9 in such a way, that there are no duplicate numbers in a row, column or local tile-block. Because an empty grid would allow you to derive any combination imaginable, the puzzle starts with a reasonable amount of number already filled in, usually in the order of thirty something. A naive approach in solving the puzzle uses a brute-force like mechanism: - Start from the first empty tile, determine which valuations can be made, continue to the next, etc... A much better and clever way to solve the puzzle is to find places where only one value is possible, fill this in and see if this modifies the state space, such that another single valuation can be made in a different location. I've taken some time to try to solve this puzzle using the CSP solver that is running on the database. The first naive approach took over an hour and still did not return a single solution. The second approach however worked out to a solution under three seconds. It can solve very easy puzzles as shown here, but also the hardest of them. Harder ones just take slightly longer, because there is less information. Here's how my state space is organized and queries operate:
  1. Create a table to denote the grid. Each record denotes a single tile as a column, row combination and also records to which block that tile belongs.
  2. Create another table that records per record the valuation for that single tile, the tuple.
  3. Inspect the puzzle and create tuple records for each already given number. These constrain the valuations you can make from there.
  4. Then launch this query every time there's still some tile left to evaluate:
select sc.col, sc.row, count(*) as count from sudoku_cell sc
left join tuple t on sc.col = t.col and sc.row = t.row, my_values mv
where mv.some_assignment not in ( select t.assignment from tuple t where t.col = sc.col )
and mv.some_assignment not in ( select t.assignment from tuple t where t.row = sc.row )
and mv.some_assignment not in ( select t.assignment from tuple t where t.block = sc.block )
and t.assignment is null group by sc.col, sc.row order by count asc;
The result is an ordering of tiles, described by the number of valuations that can be made for each specific tile, given the current state of the puzzle. Of course, when you have a tile there where one valuation can be made, that brings you one step closer to the solution.
How does Sudoku help us to understand human reasoning better and how computer algorithms work in comparison to human reasoning?
All this work with SQL got me thinking about the method that computers use in reasoning from the perspective of set theory. Reasoning in computers is highly dependent on set theory, especially when you use SQL like I do. Logic in computer programs are basically running sets through algorithms that determine if a member is part or not part of a set (unions, intersections, etc.). Some sets are made to relate to one another through attributes or keys from either member of each set.

When you want to do anything useful in SQL, you need discriminators that determine the scope of your set and then operations from one set on another. Sometimes the relationship is a parent-child or a direct relationship from one member to another, so you can join the elements together to form a superset. Sometimes you wish to eliminate those values that are apparent in another set without attempting to make a full join. A lot of logic in SQL revolves around the use of "WHERE x NOT IN" or "WHERE x IN" kind of statements. So how does this all work in general? No matter what the query, you typically start out in a form like:
SELECT x,y,z,... FROM TABLE abc;
and very likely, the next improvement to this query is an attached WHERE statement, which reduces the returned set to those values that do not violate the conditions. Another mechanism used for querying is to join tables together on some suggested relation and on the basis of that relationship, further limit the returned results (mostly on attributes in the rows of the related table). This allows you to make rather complicated queries like: "only return those cats that have more than 5 kittens" or "query all orders that have line items higher than threshold X, created between now and three months ago".

The reason for mentioning this is that it may help in the future to construct queries. If one set is empty, it cannot have any effect on another set, because they are fully disjoint. So in order to do anything useful, your query must return rows, at least one.

A tougher example is what I had to do for the Zebra puzzle. There are cases where you have a rule that discriminates members of a set. There may also be no rule. So you can't depend on any useful result returned from the rule query. The idea is then to return all rows unless there's a rule that dictates some should not be returned. The eventual query for solving the puzzle turned out as a very complex beast with multiple IN, NOT IN statements. It's probably helpful to describe them a bit here:

SELECT all colors that can be assigned to the current house being considered, where
  1. the color is not already used in a tuple,
  2. the color is not in any rule that specifically binds the color to a different house,
  3. If there's a rule that dictates the house I'm currently considering has a certain color (a set that is composed of one color), the color must be in that set (return 1). If no such rule exists, return all possible colors (so if no rule exists, allow any color as long as this particular rule is concerned),
  4. the color is not in a set, composed by rules that are active on houses to which I'm not a neighbor. So if a rule dictates that a house must be blue and I am not a neighbor to that house, then exclude blue from the set,
  5. the color is not in a set, composed by all colors that are determined by rules for which the rule dictates the neighbor of the currently considered house should have that color. So, if a rule dictates that my neighbor's house should be blue, then the currently considered house cannot be blue.
You can understand that such reasoning causes programs to explode in complexity. The logic interpreters like Prolog hide a lot of these details (backtracking, generation of possible values), but at some point you may want to start optimizing how sets are generated, which is where you get similar complexity. And you can only be effective when you know how exactly the interpreter is handling things for you.

In Sudoku there's a similar issue that is easier to understand, but not often modeled in solvers. For example, given the above specific sudoku puzzle, the computer finds that column 4, row 1 and row 7 both have one possible valuation. When I solved this puzzle by hand however, I was not immediately drawn to those possibilities, because there are lots of openings around those tiles, so instinctively not a place that could lead to a quick inference. What I did notice very quickly is that tile (9,8) can only contain a 7 because of the two columns 7 and 8 both containing a 7 and the rest of the tile block being populated. The constraints that the computer is using however still do allow a valuation of 1, 2, 3, 6 or 7.

This is the kind of knowledge of experts. Reasoning very quickly over many different sets of data, instantly reasoning over intersections, unions and so on. There are some visual combinations of elements that are very quick to analyze for humans, but which pose challenges for computers. Those quick analyses are very interesting to look at, because it's mostly expert knowledge that is not easily converted into some problem statement or constraint:
For a specific tile block, determine the set of possible valuations in that tile block that are still to be made. This is set X. Then count for each valuation the number of tiles it can be legally assigned to in the tile block. If there is a valuation with only one legal tile allocation, this is the placement to make.
In order to understand this, we need to have an idea about the relationship of one tile with another. This relationship is that one tile is placed in the same block as another tile. Limiting domains is a good way to increase performance if the cost of doing that is less than traversing the possibilities in brute-force. Such complicated relationships are not often modeled in reasoners due to their complexity, but as you can now see they are often and quickly used by human reasoners.

It's difficult to combine these results in one particular query or reasoning mechanism. Another way of looking at this is that they are different strategies for resolving the problem. The problem is only concerned with the tuples, the more there are found the better. Trying out different constraint combinations and understanding how at a higher level these change the solution space is a very important concern for such problem solvers. A downside of this is that the state space and complexity goes up significantly, but at the benefit of finding solutions much faster or much better than a standard CSP can guarantee.

Thursday, November 11, 2010

Constraint programming

I've blogged before about the difference between inductive and imperative programming. I've started to look more into constraint satisfaction problem programming, which actually able to provide you with solutions to a surprisingly larger amount of problems than you'd initially think. Typical examples of CSP solvers include the textbook examples like the n-queens puzzle, the region coloring scheme, the "SENDMOREMONEY" problem, the "Zebra" puzzle, eight number puzzle, Sudoku and so on. Real life examples tend to go down the route of static scheduling problems. I explicitly mention "static" here, because dynamic scheduling problems are those with the potential to vary significantly over a very short amount of time. A famous one for dynamic scheduling problems is the choice of the actions for a set of lifts in a busy skyscraper building. There are expectations from lift users to be served within a certain amount of time (requiring the insight in the required capacity at design time of course), but also the strategy of the elevators per individual and as a group needs to be adjusted.

But I digress, because static scheduling problems are rather similar without explicit time constraints on calculation duration. Typically, the values that you consider for CSP solutions are in the numeric, boolean or finite domain (sets). For example, you could have a domain that dictates whether a truck is used or not (boolean), how many hours it would take to perform a transport (integer), or in which timeslot or time of day a particular transport is initiated (set/finite domain).

There are various approaches to CSP. Many academic research papers mostly worry about the algebra and theory notations, mostly the functional and semantic correctness of the algorithm, which means that they mostly worry about constraint violation or not, not necessarily complexity or how long it would take to run the algorithm.

A naive approach starts from the brute-force approach. Just run through every possible combination in your entire domain set. Then verify for each combination you come up with, verify if any constraint is violated and then backtrack and try something else. This approach will take a very long time.

This is where it is useful to start applying as much knowledge as possible you have about your domain. I see the following concerns for resolving a CSP:
  • The primary concern is that the solution does not violate hard constraints.
  • The secondary concern is that it runs in reasonable time.
  • The tertiary concern is that it provides a good enough solution early on, ideally the first solution.
The concerns should be considered in the design and implementation. Actually combining variables from different domains together can be called tuplization, the action of constructing a list of tuples which form the solution. The domains may have the following properties:
  • Some domains have less elements than others and invoke more constraints. Start with the domain that has little elements, but most severely restricts future possibilities. Backtracking can be reduced by the order in which you process domains.
  • There is usually one domain, usually the first domain that you consider, which you only need to traverse once. So, if you have a domain {A,B,C,D}, then taking A and finding all combinations where A is valid, then B, then C will find you all the solutions. There is no further need to process {B,A,C,D}, {C,A,B,D}, {D,A,B,C}, etc... So traverse this domain once only in the order that is given.
  • Variables in domains may be ordered by (contextual) desirability.
  • Some domains have variables that can be used more than once, sometimes even unlimited, although they usually have some relationship with other variables in other domains that eventually reduces this domain as well.
Can you implement a CSP solver with an imperative programming language? Yes, but there are considerations to make on the approach, most specifically the algorithm you wish to use for exploring the search space. I consider that for CSP solvers that apply lots of knowledge benefit more from a depth-first algorithm instead of a breadth-first algorithm. The reason is that I do not think it is very effective to verify the constraints each and every time you wish to perform a selection, but you should maintain the state in memory as you pick the tuples out (and roll these back afterwards).

For example, consider a scheduling problem for a university. You have classrooms, professors, timeslots (hours) and students. These domains need to be combined in such a way that:
  1. Every student can follow all the registered courses
  2. All courses are taught the correct number of times per week
  3. A professor is teaching a course, but never teaching two courses at once
  4. A classroom is not used for two different courses at once
  5. There may be a preference for the time of day a course is taught
  6. Ideally, there are no hours between courses for any student on any given day (courses taught are consecutive)
  7. Some professors may have registered time off on particular days
  8. Some professors may indicate preferences for time off, but not as a hard constraint
  9. There may be preferences for using classrooms that are close together for consecutive courses
  10. Classrooms are typically used for particular departments of study
  11. Some classrooms are available for exceptions only, but are typically to be avoided (auditorium)
So the CSP here has a number of hard constraints (such things that when combined, would invalidate the overall solution), but also a number of preferences (those things that bear some cost when it needs to be used, but can be born or considered if there is no cheaper alternative).

In the above constraints, it is probably cheaper to maintain a large 'status list' for each room, professor, timeslot and student, such that it can be easily verified whether a tuplization, given the current state of the solution, would violate a constraint or not. Better yet, with a simple query, it is now possible to select only those variables from a domain which are still assignable, given the current choice of variables in another domain.

Many CSP solvers use C++ or Java constructs with huge libraries to maintain this state for you and attempt to expose a usable, although probably not complete, interface for you to manipulate. Any CSP engine should attempt to leave the manipulation of state space (those modifications you think you should perform to maintain a correct view of the situation) externally to the engine, but guide the user in the process of performing the tuplization. Or rather, it is rather easy to develop an engine that lets an application engineer insert a couple of domains or functions that provide these for every tuplization step, which also allows for pluggable points to perform changes when one tuple has been selected and of course steps to undo those changes.

Now connect a CSP engine to an SQL database, then you could use the functionality of the database engine (indexing, transaction management, transaction isolation, tables, type checking, flushing to disk, etc.). The SQL then allows you to query at each tuplization step what the remaining elements are for each domain. You only need to order those elements to improve the order of desirability of the selection you are making.

There are no guarantees for finding optimals here, so the ordering and other heuristics are still guesses overall (that a selection will not have severe negative effects on possible future selections, increasing the cost significantly). But you know that the selection process for variables in the domain only provide those variables that actually can be assigned sensibly, given the current partial assignment of variables in the tuple. This is why the order in which you process domains is important, it has consequences for how much knowledge you can apply in the model and the position where it can be applied.

Monday, November 08, 2010

Neural Dynamics

One of my main interests is understanding the dynamics within and between neurons that allows them to tune in to the behaviour of organisms which is in the interest of that organism at a higher level. In previous posts, I discussed a mechanism for these neurons to be excited by neurotransmitters released by a pre-synaptic neuron using, mostly, regulation of ionic fluxes (currents) through ion channels within the membrane.

The problem with these channels is that they are not linear in nature, but exponentially cause the opening of other channels (or inhibition).

Although it sounds like these are great mechanisms as a basis for intelligence, it's only the basis of operations for transmission of a signal from one end to another. Moreover, the channels do not typically behave the same way, because these are open for a different amount of time. So the voltage being measured is the resulting voltage from the sum of channels opening and generating some current. Each channel by itself can be associated with a particular probability for opening or closing and for a certain amount of time. Only at a macro-level when you assume, say, 1.000 channels, will it look like following some binomial or other probability distribution ( which is what also generates the nonlinear behavior at a macro level ).

So the fundamentals of neuroscience are providing a slightly more complete picture of the entire dynamics at work. Where the ion channels provide a mechanism for propagation, summing and integration of signals, it does not yet explain a useful integration and propagation as such. The synapse is still a very mysterious field of research with mostly speculation. It's still very much a mystery whether the famous Hebbian rule of "fire together, wire together" is a reality or more like some probabilistic phenomenon observed most of the time (but where the real dynamics are slightly more complicated).

Interesting discourses related to the power of integration of neurons can be found in the connections of axons from several pre-synaptics towards the dendritic tree of a post-synaptic one. Most neural models connect the axon more or less directly to the neuron cell body, thereby only allowing for the summation and integration of signals at the level of the soma, where all signals come together. The branches in the dendrite tree however may already perform a couple of integrations prior to the voltage potential reaching (or not) the ion channels near the soma, where presumably the action potential is actually truly activated and propagated.

The result here being that some signals, due to their proximity, can lead to very different results when added together at a local level, rather than at a global level. The underlying reason is that an inhibitory signal can entirely cancel out another signal of different magnitude (so the result is not necessarily mathematically correct). A good example of these effects is by looking at formulas in maths. Integration at the level of the soma would equate to a formula like:

Y = A - B + C + D - E

But integration at branching points in the dendritic tree can cancel out any prior summation of positive signals just by applying a signal of similar magnitude on a place higher up (lower up?) in the branch, so you could get:

( C + D - F - G + H + K + L - M - ....... whatever .... ) - E = 0
Y = A - B

The difference here being that the placement of connections in the dendritic tree is important and that there is only a certain range available for excitations or inhibitions. So besides 'just' integration of signals, this suggests that certain more or less 'logical' operations may also exist, which can be very useful for more complex separations of data and signals. It must be said that this tree is not necessarily typically "logical", as some effects have been noted where activations at the remotest end of the dendritic tree cause increases or decreases of the action potential at about 200-300 um into the axon. So the real effects of how neurons connect to one another's tree is still food for thought.

All this is speculation of course. The most useful thing that a neural algorithm needs is a better learning algorithm. Most of these algorithms in classical neural nets are back-propagating ones. This means you measure at the output, calculate the error, then back-propagate this error within the network to follow some kind of gradient. You need the derivatives from the neuron activation functions for this to happen.

If you want this to work in nature, you'd need a second brain to calculate the error and then have this applied inside the network. Hardly plausible. Somewhere is a mechanism at work which has local knowledge only and which adjusts the synaptic efficacy accordingly. This adjustments taking a certain amount of time and which is highly regulated by Long Term Potentiation (the ability of a neuron to become more excitable over a longer period of time). It seems that all in all, there are certain time windows where a neuron starts to synthesize proteins, which eventually modify the behavior of the neuron, change the structure or otherwise influence the synaptic gaps (by making more synaptic vesicles for example).