Things are swell, because the predictions are on either side of the line. Basically, you have a good chance of finding a very good solution for this, as it looks like a well-posed problem. The (re)estimations are very regular positive and negative, yielding very good values for a linear regression of your found values, such that you can predict new values which you haven't initially trained for (for which you know the actual value).
Well, the netflix prize is facing an ill-posed problem. This basically means that there are a number of different solutions possible, or there are cases where the differences become so small, that it generates numerical instability when calculating that solution. I've basically seen this happen in some of my implementations when doing the blending.
So, what's done is that a regularization factor is introduced, which is giving penalties to larger factors, thereby conditioning the problem better (reducing the number of solutions). See an example what part of a graph might look like:
Okay, so in this case, we may have a prediction function which is pretty complicated and for a large number of values, which are actually connected to other dimensions, it's looking like this for part of the range and domain. If you'd grab this part of the graph and then try to refit this with different approaches into a better estimation overall, then chances are very high you'd have numerical instability here. Notice how many black points are below the line. The red line (the real values) are way above that. Even though the distance between the point and the line aren't necessary disturbing, when trying to do a regression on this sub-space of the graph, you'd end up with some very bad factors. It may be better to just follow the red line along the bottom a bit, because if you'd regress towards different factors, you'd get the same problem elsewhere along the road.
Now, for a little bit of philosophy, read Occam's razor. This basically states: "The simplest explanation for a phenomenon is most likely the correct explanation".
Other methods for inferring evolutionary relationships use parsimony in a more traditional way. Likelihood methods for phylogeny use parsimony as they do for all likelihood tests, with hypotheses requiring few differing parameters (i.e., numbers of different rates of character change or different frequencies of character state transitions) being treated as null hypotheses relative to hypotheses requiring many differing parameters. Thus, complex hypotheses must predict data much better than do simple hypotheses before researchers reject the simple hypotheses. Recent advances employ information theory, a close cousin of likelihood, which uses Occam's Razor in the same way.Thus, it's better to use models with fewer parameters than it is to use models that are using many, many different parameters. Find those models which describe the problem well, then merge them together.
Back to Tikhonov... If you're also a contender in the netflix prize, check out the following source, which is heavily chopped down in complexity, improved in readability and derived from the source of linalg, which was developed by gravity. This source is not likely usable without the use of their linalg source codes, since it uses PseudoInverse.cpp for example:
Link to main.cpp
The regfactor and regfactorbias also exist in linalg and those are key towards bringing down the rmse. Remember that we potentially have numerical instability, which is why we want to introduce regularization. The regularization conditions the problem, after which better solutions are found. Better solutions provide better fits, reducing error. You'd prefer these regularization terms as close to 0.0f as possible, because when this reg term goes towards 0.00f, you'll go towards the unregularized solution of least squares regression. It'd be great to get there, but the problem itself is ill-posed, so we need some regularization at least.
I did rip out the fancy parts like cross-validation (used to find optimal values) and rmse reporting, such that you can follow the code better. In the pseudo-inverse, it's basically executing the linear least square solver.
The returned array is an array of weights, which are used to multiply with a bias of 1.0f, another weight for algo1, another for algo2... If you like, you could easily add other algorithms and this thing will give you reasonable weights in return.