But I digress, because static scheduling problems are rather similar without explicit time constraints on calculation duration. Typically, the values that you consider for CSP solutions are in the numeric, boolean or finite domain (sets). For example, you could have a domain that dictates whether a truck is used or not (boolean), how many hours it would take to perform a transport (integer), or in which timeslot or time of day a particular transport is initiated (set/finite domain).
There are various approaches to CSP. Many academic research papers mostly worry about the algebra and theory notations, mostly the functional and semantic correctness of the algorithm, which means that they mostly worry about constraint violation or not, not necessarily complexity or how long it would take to run the algorithm.
A naive approach starts from the brute-force approach. Just run through every possible combination in your entire domain set. Then verify for each combination you come up with, verify if any constraint is violated and then backtrack and try something else. This approach will take a very long time.
This is where it is useful to start applying as much knowledge as possible you have about your domain. I see the following concerns for resolving a CSP:
- The primary concern is that the solution does not violate hard constraints.
- The secondary concern is that it runs in reasonable time.
- The tertiary concern is that it provides a good enough solution early on, ideally the first solution.
- Some domains have less elements than others and invoke more constraints. Start with the domain that has little elements, but most severely restricts future possibilities. Backtracking can be reduced by the order in which you process domains.
- There is usually one domain, usually the first domain that you consider, which you only need to traverse once. So, if you have a domain {A,B,C,D}, then taking A and finding all combinations where A is valid, then B, then C will find you all the solutions. There is no further need to process {B,A,C,D}, {C,A,B,D}, {D,A,B,C}, etc... So traverse this domain once only in the order that is given.
- Variables in domains may be ordered by (contextual) desirability.
- Some domains have variables that can be used more than once, sometimes even unlimited, although they usually have some relationship with other variables in other domains that eventually reduces this domain as well.
For example, consider a scheduling problem for a university. You have classrooms, professors, timeslots (hours) and students. These domains need to be combined in such a way that:
- Every student can follow all the registered courses
- All courses are taught the correct number of times per week
- A professor is teaching a course, but never teaching two courses at once
- A classroom is not used for two different courses at once
- There may be a preference for the time of day a course is taught
- Ideally, there are no hours between courses for any student on any given day (courses taught are consecutive)
- Some professors may have registered time off on particular days
- Some professors may indicate preferences for time off, but not as a hard constraint
- There may be preferences for using classrooms that are close together for consecutive courses
- Classrooms are typically used for particular departments of study
- Some classrooms are available for exceptions only, but are typically to be avoided (auditorium)
In the above constraints, it is probably cheaper to maintain a large 'status list' for each room, professor, timeslot and student, such that it can be easily verified whether a tuplization, given the current state of the solution, would violate a constraint or not. Better yet, with a simple query, it is now possible to select only those variables from a domain which are still assignable, given the current choice of variables in another domain.
Many CSP solvers use C++ or Java constructs with huge libraries to maintain this state for you and attempt to expose a usable, although probably not complete, interface for you to manipulate. Any CSP engine should attempt to leave the manipulation of state space (those modifications you think you should perform to maintain a correct view of the situation) externally to the engine, but guide the user in the process of performing the tuplization. Or rather, it is rather easy to develop an engine that lets an application engineer insert a couple of domains or functions that provide these for every tuplization step, which also allows for pluggable points to perform changes when one tuple has been selected and of course steps to undo those changes.
Now connect a CSP engine to an SQL database, then you could use the functionality of the database engine (indexing, transaction management, transaction isolation, tables, type checking, flushing to disk, etc.). The SQL then allows you to query at each tuplization step what the remaining elements are for each domain. You only need to order those elements to improve the order of desirability of the selection you are making.
There are no guarantees for finding optimals here, so the ordering and other heuristics are still guesses overall (that a selection will not have severe negative effects on possible future selections, increasing the cost significantly). But you know that the selection process for variables in the domain only provide those variables that actually can be assigned sensibly, given the current partial assignment of variables in the tuple. This is why the order in which you process domains is important, it has consequences for how much knowledge you can apply in the model and the position where it can be applied.
No comments:
Post a Comment