A Radial Mind: June 2007

Wednesday, June 27, 2007

Global Warming (An Inconvenient Truth)

I watched Al Gore's (now) famous film yesterday. It's about global warming and the lack of human response to CO2 emissions and our pollution. I recommend watching the film and forming your own opinion. When checking the topic against other sources, please use credible ones and don't believe every article that run-of-the-mill media throws at you. There is a lot of propaganda out there that is not based on scientific results, or that attempts to derail scientific results that so far are clear indications. Watch out for that!

http://www.climatecrisis.net/

Some companies believe that complying with gas emission limits and other more rigorous standards will put them out of business due to unfair competition. Well, maybe. But that would point out that our social standards, those standards that we are looking for as consumers, need a bit of a push, plus significant consensus-building at a UN level to establish better environmental standards worldwide. Why bicker about trade equality and so on when in a couple of years' time there won't be a planet to bicker on? It's quite naive!

What if all of this is simply untrue, as some magazines try to make us believe? Bruce Schneier wrote a very interesting article on rare risk and overreactions. I'm making this point because the investment in fighting terrorism has been very high over the past few years due to immediate emotional involvement, while global warming is a much more serious problem that is not felt to need an immediate resolution. So, in a way, his article applies to this topic as well:

http://www.schneier.com/blog/archives/2007/05/rare_risk_and_o_1.html

So I highly recommend seeing Al Gore's film. I will change my ways and opinions because of it. While you're at it, watch this video too:

http://www.youtube.com/watch?v=5g8cmWZOX8Q

1992, Rio Earth Summit.... so.... what has actually happened since? Back 15 years ago, we had people that were already seriously concerned... Where are the significant changes and modifications that have taken place since then?

Even though we're adults and we think of ourselves as pretty smart in general, we ourselves still seem to act like children when it concerns the custody of this planet. This role of custodian seems to be necessarily replaced by the role of a full-time janitor. So, when do we start to clean up this mess and make sure it doesn't repeat itself?

Thursday, June 21, 2007

GWT 1.4, Tomcat 6 and Comet tutorial

I've been experimenting with Comet applications a little bit and integrated this with GWT 1.4. The tutorial is sort of working, but still needs a bit of work in order to become more stable.

The tutorial is here:

http://gtoonstra.googlepages.com/cometwithgwtandtomcat

Please send me any comments, bug fixes, etc.

Tuesday, June 19, 2007

Contextual search...

This blog is called "radialmind" for a reason. It is based on my perception that the mind is radial and not linear. The whole concept is rather easy to explain. It is easier to consume a book by going through its hierarchy, that is, the TOC, the individual chapters, the paragraphs and the lines, than it is to read the book from start to finish. This is the same concept that the people behind "mindmapper" use, for example, to document your ideas. It's not a linear documentation, it is radial.

You'll notice that when you start to read the book linearly from the start, each time you hit a header or paragraph header, you need to tell your mind to switch context. That is, put the particular following content into a particular context. If the hierarchy of the book is very poor, it will be very difficult to follow and read. This is because, I believe, you need to start back at the core of your context and take a different path to another part of the context where you will fill in the information that you are going to read.

Searching the web has some similar problems. All major search engines produce linear search results. There have been some search engines that do this differently in a sort of "related-words" kind of way, but these have been very poor because it takes a long time to get to your actual context from the point where you are (or where the web page is, rather).

I think it makes sense for words to provide an initial search context and then connect to other contexts through verbs.

This post ties back to my post about "semantic web search". Rather than focusing on "nouns", entities, we should focus on contextualizing information through verbs and interaction.

So... maybe... as a thought and discussion to develop... nouns provide an initial context to the search that may be dead wrong. But the verbs further contextualize your thoughts into the specific items that you are looking for?

Room for further thought in this particular domain...

Comet and Server push

There is something I'm checking out in my spare time: a technology that has existed for a while called "comet". There are other alternatives in server-push, but I prefer to look first into the one that seems a bit more standardized and thought out. I've read other threads in newsgroups that state comet is overkill in many situations.

Server-push isn't actually "push" in the sense that the server initiates the connection. It's more like a delayed client-pull with features on the server that prevent excessive resource drainage.

In this model, the client connects to the server and waits for information. Browsers have timeouts, and sometimes servers do too, so the client reconnects every x seconds if no data is sent over the link. It also reconnects on error. If the connection errors x times, it stops connecting and displays an error about unavailable services.
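The reconnect loop described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `fetch` stands in for one held-open HTTP request, the names `long_poll`, `scripted`, `STOP` and `ServiceUnavailable` are made up for this example, and I assume the error counter means consecutive errors and resets after a successful round trip.

```python
import time

STOP = object()  # demo-only sentinel: tells the loop to end


class ServiceUnavailable(Exception):
    """Raised when the client gives up after too many failed attempts."""


def long_poll(fetch, handle, max_errors=3, retry_delay=0.0):
    """Delayed client-pull: keep asking the server and wait for data.

    fetch() returns data, returns None when the wait timed out without
    data, or raises ConnectionError when the connection failed.
    """
    errors = 0
    while True:
        try:
            data = fetch()
        except ConnectionError:
            errors += 1
            if errors >= max_errors:
                raise ServiceUnavailable("service unavailable")
            time.sleep(retry_delay)  # back off a little, then reconnect
            continue
        errors = 0  # assumption: only consecutive errors count
        if data is STOP:
            return
        if data is not None:  # None means timeout: simply reconnect
            handle(data)


def scripted(events):
    """Build a fake fetch() that replays a fixed list of outcomes."""
    it = iter(events)

    def fetch():
        event = next(it, STOP)
        if isinstance(event, Exception):
            raise event
        return event

    return fetch


if __name__ == "__main__":
    received = []
    long_poll(scripted([None, "hello", ConnectionError(), "world"]),
              received.append)
    print(received)
```

In a real browser the same loop would live in javascript and `fetch` would be an XMLHttpRequest that the server holds open until it has something to say.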

Comet has in the meantime been implemented in Tomcat as well as Jetty. Jetty has documented in more detail how they implement server-side processing and has some statistics about expected resource usage and server loads.

A problem with a server could be the number of connections, but the first thing that runs out are processing threads.

The comet model basically uses asynchronous I/O processing with thread pools. So it's a way to multiplex your client connections to get time allocation by one of the threads in the thread pool. This of course requires the same session and client information to be available. A thread gets assigned to a connection when there is data to be read or when an internal server event (through the application layer actually) writes data to a client channel.
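To make the multiplexing idea a bit more concrete, here is a rough Python sketch. It is not how Tomcat or Jetty do it internally: a real container uses non-blocking I/O to find out which connection has data, while this example just simulates one event per connection. The names `Connection`, `handle_event` and `run_demo` are invented for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Connection:
    """One open client connection with its session information."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.session = {"messages_seen": 0}  # must be reachable from any worker
        self.outbox = []
        self.lock = threading.Lock()


def handle_event(conn, payload):
    """Runs on whichever pool thread is free when the event arrives."""
    with conn.lock:
        conn.session["messages_seen"] += 1
        conn.outbox.append(f"{conn.client_id}:{payload}")
    return threading.current_thread().name


def run_demo(num_clients=50, pool_size=4):
    """Serve many idle connections with only a handful of threads."""
    connections = [Connection(i) for i in range(num_clients)]
    with ThreadPoolExecutor(max_workers=pool_size,
                            thread_name_prefix="worker") as pool:
        # A thread is only borrowed while an event is being handled.
        futures = [pool.submit(handle_event, c, "ping") for c in connections]
        workers_used = {f.result() for f in futures}
    return connections, workers_used


if __name__ == "__main__":
    conns, workers = run_demo()
    print(len(conns), "connections served by", len(workers), "threads")
```

The point is that an idle connection costs no thread at all, which is why the session data has to live with the connection and not with the thread.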

Using this model, the browser will also have to implement some event switch. It receives the data and does something with it. This could be a chat window or popup and so on. In general, a browser can only have two connections to the server. This may be a little bit limiting for data transfers, since the comet connection continuously consumes one connection. This makes image loads and asynchronous data loads take longer and become serialized. One way around this is to use virtual hosting techniques to assign the comet connection to some kind of HTTP event server and the other connections to other web servers that process data requests. A potential problem here is data security with javascript applications that are not always allowed when connecting to a different server than where the script came from.

Well, the usual applications arise from comet technology: chat and so on. But there should be other possibilities as well if we get the network security right (and issues like NAT and so on!). Consider connections from browser to browser without an intervening server. The immediate services are basically event-driven client applications that react to server events. You might connect to a certain site and whatever is going on in the system will notify you of that occurrence. This is a very interesting feature for many sites, since the user will no longer be 100% responsible for pulling all that information to him.

Moreover, from a database access point of view, perhaps persistence frameworks can finally become more consistent. If the server knows that you are watching some kind of information and possibly editing it, any other user that edits the information before you should cause an event to be sent to your browser to notify you there is new unseen data to consider. The browser might even retrieve the new data and show it alongside. It should not be too difficult to get this done. There are caching mechanisms for example that can help in detecting which objects are being viewed by whom and when objects get refreshed in the cache through an edit.
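As a thought experiment, the server-side bookkeeping for this could look something like the Python sketch below. Everything in it is hypothetical (the `WatchRegistry` class, its methods, the version numbers), and the "push" is just a list per user where a real system would write to that user's comet channel.

```python
from collections import defaultdict


class WatchRegistry:
    """Remembers who is looking at which object and notifies on edits."""

    def __init__(self):
        self._watchers = defaultdict(set)        # object id -> user ids
        self.notifications = defaultdict(list)   # user id -> pending events

    def view(self, user, object_id):
        """Call when a user opens an object in the browser."""
        self._watchers[object_id].add(user)

    def close(self, user, object_id):
        """Call when the user navigates away."""
        self._watchers[object_id].discard(user)

    def edit(self, editor, object_id, version):
        """Record an edit and queue an event for every other viewer."""
        notified = []
        for user in sorted(self._watchers[object_id]):
            if user != editor:
                self.notifications[user].append((object_id, version))
                notified.append(user)
        return notified


if __name__ == "__main__":
    registry = WatchRegistry()
    registry.view("alice", "invoice-17")
    registry.view("bob", "invoice-17")
    print(registry.edit("bob", "invoice-17", version=2))
```

An object cache is a natural place to hook this in, since it already sees both the reads and the refresh caused by an edit.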

Saturday, June 16, 2007

High Precision Event Timer

I've been suffering a bit from a somewhat sluggish machine at times. Sometimes I run VMWare and that is especially annoyingly slow. I already posted the max_cstate thing (powersaving functionality).

Here is another thing I tried. "hpet=disable" on the kernel start line in /boot/grub/menu.lst.

What is HPET?

I am still experimenting with hpet disabled, but so far the Gnome desktop seems more responsive and things slightly faster. I used to get some 'stutters' on the mouse cursor sometimes and more processing times, but things seem to run better actually without the precision timer.

At the same time I noticed that some applications crashed suddenly. Not very frequently, but when load was high.

I'll keep this value for now and see what happens. I can always revert.

FaceBook. The new web?

Web 2.0 and YouTube gave us "user-generated content". It is where we post our videos, audio, photos, text, blogs etc. online for everyone to see. 90% of everything is junk (maybe like this blog :).

The other 10% is funny, interesting, insightful, challenging, or whatever. Some later developments are new ways to play around with that content or host even new things that people didn't think of before. There are a million ways for example that we can interact with one another. Yahoo Pipes is all about processing news and information and delivering it to you through a kind of processing pipe.

FaceBook is slightly different. You can inject content and pictures on a simple level, but you can also host embedded applications integrated with FaceBook. FaceBook is a bit like an existing portal on the web somewhere and then you can request your services to be integrated through this portal and use their API to interact with other services of FaceBook. If you consider "infra-structure", this is what FaceBook provides. You provide immediate business logic that is hopefully new to everyone.

Here are examples of this new kind of thing. The previous link shows the reasoning behind FaceBook, which sounds very interesting.

One of the last lines reads:

the Facebook Platform is primarily for use by either big companies, or venture-backed startups with the funding and capability to handle the slightly insane scale requirements.

Yes. If something is really successful, then with the current efficiency of our social networking capabilities, "novelties" travel through our network at an insane speed. Not necessarily faster than general broadcasting, but without the filtering by a third party that you get with broadcasters. It could be that a third party, through other interests, decides to downplay or diminish a certain event which, when taken as "raw information", might be very important for everyone to know.

These snowball effects can increase load on any server farm in an instant. If you manage to get your company's link on CNN, BBC or Slashdot or any other large site, you'll certainly be sure of a lot of traffic instantly that may last for a day or two. If you consider social networking sites where people might actually return daily, if the services provided there are really good there is an exponential growth pattern and insane growth requirements. Just ordered that big iron? The next day you'll order 10 more. Whoops, your bandwidth is running out. Whoops, the firewall got attacked. One angry user just launched a bot-net attack on your servers.

Infrastructure, infrastructure, infrastructure and lots of investment, instantly. And on the business side you need to keep things interesting, or the network will quickly drain out. What happens when another site comes up that offers similar services and something new that you didn't think of? Is there any sense of "loyalty"? You're not talking to individuals necessarily. It would be interesting to see how individuals behave as part of a social networking site. Do they exhibit more of a "flock" behaviour (they go where "the rest" goes?) or are their actions still based on individual decisions?

If we can recognize "flocking behaviour", this may be good when the business grows... but wow, it can be very bad for business if the flock heads the other way.. there is no stopping it!

Here is another interesting post on one of the facebook blogs:

There is a valuable lesson in all of this. There is a ton of money in developing platforms that make it easier for people to express themselves quickly and easily. Following this thread I can imagine the future value of virtual worlds such as second life where users can pick and choose everything down to their clothing, height, etc with the click of a button. Life is a story. Those applications (software as well as physical devices) that make it easier for people to share their story for others to watch unfold will be the ultimate winners when all is said and done.

Thursday, June 14, 2007

Semantic Search

I'm not an expert at websearch, but here goes... Some rants and ramblings on semantic web search.

I watched a program this week with a well-known philosopher. The program was about technology and media mostly, as well as social networking sites and so on.

One question asked during this program was whether semantic websearch would soon be a possibility and when exactly this is likely to be happening. The response was, from the philosopher, that he didn't think semantic search would ever take off and is basically dead in the water. The argument was that the context and meaning of certain words differs from one person to the next.

This is true, but then maybe semantic search does not really mean searching for things in a general context that is known to be true, but searching in specific contexts that the search engine understands to belong to that person, his perceptions and beliefs (formed by life experiences, human contact, environment, country culture, tradition and so on).

One thing that I suspect is mostly not used in web search is the verb. Most searches strictly use nouns, but the context of that noun can differ enormously if it is not accompanied by a verb. The verb would put things into a more specific context to a great extent, but that is not yet a personalized context.

Steve Yegge blogs about the differences in "verb" and "noun" thinking from the perspective of a programming language. You could say that programming languages are in a way means of communication with a machine, to express ideas and so on.

Anyway, as I said, I have no idea to what extent search engines currently use verbs or contextualize searches to be more specific. They might consider search history as one way of improving hits, but this is not very reliable as our priorities and contexts can change very rapidly.

Regarding implementations of such a search engine... It would be a search engine that exists today with the added difference that user interaction (with user profiling) would add a context indication to particular pages. I don't think it is necessary to actually define all contexts prior to classifications. If you work with neural networks for example, the computer has no idea what it is doing, but the end result of each calculation comes close to what is expected.

It would be a great idea for research. To tie a neural network at both ends for a search engine and see what comes out. The difficulty with this neural network is of course how to heuristically define numbers based on the page... Or rather, how to encode the content of the page in such a way that together with the input of words and the user profile, the end result will be a particular score.

Another approach is to focus more on the verbs and start counting occurrences and take that as a contextual factor.
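A very naive Python sketch of that counting idea, just to have something to poke at. The verb list is a tiny hypothetical stand-in (a real engine would need a part-of-speech tagger and stemming), and `verb_profile` and `context_score` are names I made up for the example.

```python
import re
from collections import Counter

# Hypothetical, hand-picked verb list; no stemming, so "buys" is not "buy".
VERBS = {"buy", "sell", "rent", "repair", "drive"}


def verb_profile(text):
    """Count how often each known verb occurs in a piece of text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in VERBS)


def context_score(query, page_text):
    """Score a page by how often it uses the verbs found in the query."""
    query_verbs = verb_profile(query)
    page_verbs = verb_profile(page_text)
    return sum(page_verbs[verb] * count for verb, count in query_verbs.items())


if __name__ == "__main__":
    pages = {
        "shop": "Buy a car today. We sell and buy used cars.",
        "garage": "We repair any car. Repair costs vary.",
    }
    for query in ("buy car", "repair car"):
        ranked = sorted(pages, key=lambda p: context_score(query, pages[p]),
                        reverse=True)
        print(query, "->", ranked)
```

Both pages are about the noun "car", so the noun alone cannot separate them. It is the verb in the query that pushes the right page to the top.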

Perhaps the most limiting thing in search is that the search itself is badly expressed with words? I have a certain contextual idea of things that I am looking for... What is the best way to tell a machine to go looking for that particular context? We could store user's profiles, focus on verbs and all of that, but what about location or approximate location?

Some search engines provide advanced searches and this may be very helpful in this regard. In order to get anywhere, I guess it makes sense to include psychologists and anthropologists in the discussion to understand thought, expression and context better. There may be ways to convert these things in different ways to gain a more meaningful communication dialogue with a machine.

People mostly consider semantic search to be "teaching the machine". Punishing it when the results are not what you are looking for, rewarding it when it is exactly on the mark. But if the context differs from one person to the next, there is a never-ending cycle of punishment and the machine just gets confused. Some things that are in the same context for everybody will get very high search ranks. But searching should be more effective than that. It should also aim to expose the niches.

Monday, June 04, 2007

Python

Tonight I decided to take a look at Python. I know a bit of Perl and did create a couple of scripts in this "language". More formal and stricter languages like Java, C# and Visual Basic seem easier to learn, but compared to Python generate a whole lot more code.

I was pleasantly surprised by the offered modules from Python and how little code it takes to accomplish something. The documentation is quite up to speed and it offers some quite ingenious unit testing capabilities. You just wrap it into the docs. It then becomes a test case plus an example for somebody else to use.
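The module I mean here is doctest from the standard library. A small example of how that looks (the function itself is just something I made up to have a thing to test):

```python
import doctest


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit.

    The lines starting with >>> are both documentation and test cases:

    >>> celsius_to_fahrenheit(0)
    32.0
    >>> celsius_to_fahrenheit(100)
    212.0
    """
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Runs every >>> example found in this module's docstrings and
    # reports any example whose actual output differs from the docs.
    doctest.testmod()
```

Run the file as a script and it stays silent when all examples pass. Add -v on the command line to see each example being checked.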

Now, I find Python slightly easier to use in comparison to Perl. It's slightly less cryptic and uses more the concept of "function" than "operator". I was very pleasantly surprised to see it has support for SMTP, WWW, XML, UNIX, threads, concurrency, data types, (easy) iterators, functions, modularization, serialization, profiling, C module extensions, classes, embedding and so on. It's in a way sort of comparable with certain features of Java.

My advice: Give the tutorial a spin once... You'll get to know the capabilities and the rest from there is just reference work!

http://www.python.org/doc/

Low-level == innovation, high-level == entrepreneurship

If you work in a place where innovation is stimulated (like C.E.S.A.R), there are certain ideas that you can truly identify as innovation and others that are new or providing services that do not yet exist, but are not necessarily "innovation".

I'm looking at a couple of ideas all around for new technological "break-throughs" and a good number of these ideas, described as innovation, actually are "new systems" that just do not yet exist in the market. There are big problems here for the execution of these ideas:

Large systems are very risky to build and the effort required to actually complete them is always at least two times higher than the estimated effort. (see Vista, see internal "large" projects).
To make money off these "larger" systems is difficult. You start from a couple of clients maybe (if you succeed in selling it), but this is not the point where you start to make money. That is only after x years.
Support, after-care etc. are difficult to arrange. Your "development" doesn't just stop right after development is terminated. You may actually need more people to be able to sell than you did during development. Do you really want to start a company?
Maybe the system does not exist in the market, but is the problem actually more or less resolved in other ways? (is there truly market need?)
By the time the system is finished, the market may be gone.

Thus, you should consider:

If you want to start a business, aim for some niche market, develop a large system and focus on customers and selling product/services. This is not innovation in its entirety. Maybe only one line of code will be.
If you want to innovate, but not start a business as usual, focus on smaller parts at a lower level in the system.

Lower level innovations are for example search algorithms, distribution libraries, audio/video codecs, operating systems, embedded software, graph algorithms or the combination of the above to solve a *very* specific problem in technology. The focus should be on developing *technology* not directly applicable to any market problem. You'll have to focus on better efficiency in most of the cases of innovation.
