Wednesday, December 28, 2005

Patty on SourceForge?

I've worked a bit more on the profiling tool for JDK 1.5.0, which uses JVMTI. Compared with other profilers, its benefits are:
  • You can target the classes you wish to profile for method execution and coverage analysis
  • There is no need to instrument the classes at build time in the Ant script
  • Using JVMTI, any Java process can be analysed; no source code is required.
  • To use the profiling tool on an existing project, you only need to modify the application's startup line to include the profiling agent in the JVM ( see the sketch after this list ). That is all.
  • Because it targets specific classes for analysis, the classes that are not targeted run at full speed ( unless code coverage analysis is turned on ).
  • It currently analyses thread contention, plus method execution times and code coverage of the targeted classes.
  • It will (soon) analyse the heap: how much memory is consumed by, and reachable from, all objects of a particular class. For example, this allows you to analyse how much memory is referenced by an HTTP session or a singleton.
  • It will eventually include a web interface on Tomcat, where some links can be used as a command interface to the profiling agent. This allows you to instrument/deinstrument classes at runtime and request specific information about heap analysis, etc.
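To give an idea of what modifying the startup line means: a JVMTI agent is a native library that the JVM loads at startup through an -agentpath ( or -agentlib ) option. Below is a minimal sketch of the native entry point; the library name and options string are just assumptions for illustration, not Patty's actual interface.

    #include <jvmti.h>

    /* Hypothetical startup line ( the library name and options are
     * assumptions, not Patty's real interface ):
     *   java -agentpath:/path/to/libpatty.so=classes=com.example.* MyMain
     */
    JNIEXPORT jint JNICALL
    Agent_OnLoad(JavaVM *vm, char *options, void *reserved)
    {
        jvmtiEnv *jvmti = NULL;

        /* Obtain the JVMTI environment from the VM. */
        if ((*vm)->GetEnv(vm, (void **)&jvmti, JVMTI_VERSION_1_0) != JNI_OK) {
            return JNI_ERR; /* JVMTI not available in this VM */
        }

        /* Here the agent would parse 'options' ( e.g. the target classes )
         * and register its capabilities and event callbacks. */
        return JNI_OK;
    }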
Because it was coming along so nicely, I've requested a new open-source project on SourceForge, where I hope the project will be hosted. I'll keep you posted on the acceptance and the link.

[edit]

Patty got approved and is released here: patty

Home page, roadmap, documentation, screenshots, etc. will be added soon. Binaries and source have already been posted.

Tuesday, December 27, 2005

Java Virtual Machine Tool Interface

I'm playing around with JVMTI now.

JVMTI allows you to do a couple of useful things:
  • Query all objects of a particular class and iterate over them to see what other objects are reachable from those objects. This also allows you to analyse the amount of heap consumed by certain classes, like singletons or HTTP sessions ( see the sketch after this list ).
  • Receive native callbacks on method entry / exit. A bytecode instrumentation implementation is more efficient for execution profiling, though, because it only affects the classes that are instrumented.
  • Redefine classes at runtime. Useful for instrumenting/deinstrumenting classes on demand. This also allows you to keep a profiler attached in production with virtually no overhead until instrumentation is switched on.
  • Analyse thread contention on synchronized blocks.
  • Single step through the code to analyse code coverage. This should probably only be done during unit testing.
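As a rough illustration of two of these features, here is a minimal C sketch against the JVMTI 1.0 API ( JDK 1.5 ): registering for monitor contention events and summing the heap consumed by all live instances of a class. This is my own sketch with error handling omitted, not code from any particular profiler.

    #include <string.h>
    #include <jvmti.h>

    /* Called once per live instance of the class passed to
     * IterateOverInstancesOfClass; 'size' is the object size in bytes. */
    static jvmtiIterationControl JNICALL
    sum_object(jlong class_tag, jlong size, jlong *tag_ptr, void *user_data)
    {
        *(jlong *)user_data += size; /* accumulate total bytes */
        return JVMTI_ITERATION_CONTINUE;
    }

    /* Sum the heap consumed by all live instances of 'klass'
     * ( requires the can_tag_objects capability ). */
    static jlong bytes_used_by(jvmtiEnv *jvmti, jclass klass)
    {
        jlong total = 0;
        (*jvmti)->IterateOverInstancesOfClass(jvmti, klass,
            JVMTI_HEAP_OBJECT_EITHER, &sum_object, &total);
        return total;
    }

    /* Event callback fired when a thread blocks on a contended monitor. */
    static void JNICALL
    contended_enter(jvmtiEnv *jvmti, JNIEnv *jni, jthread thread, jobject obj)
    {
        /* record the contention event here */
    }

    /* To be called from Agent_OnLoad: request capabilities and enable
     * the contention event. */
    static void setup(jvmtiEnv *jvmti)
    {
        jvmtiCapabilities caps;
        jvmtiEventCallbacks callbacks;

        memset(&caps, 0, sizeof(caps));
        caps.can_tag_objects = 1;             /* for heap iteration */
        caps.can_generate_monitor_events = 1; /* for contention events */
        (*jvmti)->AddCapabilities(jvmti, &caps);

        memset(&callbacks, 0, sizeof(callbacks));
        callbacks.MonitorContendedEnter = &contended_enter;
        (*jvmti)->SetEventCallbacks(jvmti, &callbacks, sizeof(callbacks));
        (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
            JVMTI_EVENT_MONITOR_CONTENDED_ENTER, NULL);
    }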
In my implementation, I'm using two computers. One runs the JVM with the JVMTI agent; the other runs a daemon that receives events and records them in memory.
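The wire protocol between the agent and the daemon is still in flux; as a sketch of the agent side, something like the following could connect to the daemon and push an event record. The host/port handling and the line-based text format are purely my assumptions for illustration.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    /* Connect to the recording daemon on the second machine. */
    static int connect_to_daemon(const char *host, int port)
    {
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host, &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    /* Ship one method-entry event; the text format is hypothetical. */
    static void send_event(int fd, const char *method, long micros)
    {
        char buf[256];
        int len = snprintf(buf, sizeof(buf), "ENTER %s %ld\n", method, micros);
        if (len > 0)
            write(fd, buf, len);
    }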

Ideally, I want to run a GUI on top of the daemon with some command abilities, so that a control socket into the JVM can be used to send commands to the profiling agent. This GUI will likely become a simple Tomcat / Struts application that renders a couple of views in real time on a web server, based on the information received so far. It can also be used to print heap memory usage / changes in real time.

The idea is to do the following:
  • Collect real-time information about code execution, avg/min/max.
  • Collect code coverage statistics per class and method.
  • Instrument classes at runtime to analyse method execution timings.
  • Request the memory usage of objects reachable from a particular class type and analyse this over time while the application is running.
  • Request memory usage of all objects per class type and put this into a list / diagram.
  • Send garbage collection requests at runtime for analysing memory leaks and the impact of long-running sessions ( see the sketch after this list ).
  • Deinstrument classes on demand.
  • Analyse thread contention.
  • ... Some more features in the future.
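Two of these commands map almost directly onto JVMTI calls, so here is a small sketch of what the agent side could look like; again this is an illustration, not Patty's actual code. Requesting a GC is a single call, and deinstrumenting a class means redefining it with the original ( uninstrumented ) bytecode, which the agent must have kept around.

    #include <jvmti.h>

    /* Ask the VM for a garbage collection ( e.g. before sampling the
     * heap, to analyse leaks and long-running sessions ). */
    static void force_gc(jvmtiEnv *jvmti)
    {
        (*jvmti)->ForceGarbageCollection(jvmti);
    }

    /* Deinstrument a class on demand by restoring its original bytecode
     * ( requires the can_redefine_classes capability ). */
    static void deinstrument(jvmtiEnv *jvmti, jclass klass,
                             const unsigned char *orig_bytes, jint orig_len)
    {
        jvmtiClassDefinition def;

        def.klass = klass;
        def.class_byte_count = orig_len;
        def.class_bytes = orig_bytes;
        (*jvmti)->RedefineClasses(jvmti, 1, &def);
    }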
Sun has been researching another technology called JFluid, which has even more features than JVMTI and is now being productized, probably for use in NetBeans. I don't really need that many features though, just some good in-depth information about what goes on inside the JVM.

Monday, December 19, 2005

Source Code Versioning systems

I've been using a couple of versioning systems now. Some I like, some I highly dislike.

I've used MS VSS, cmm, CVS, ClearCase and SVN. Many of them have interesting features, but none comes close to satisfying all of the needs that I have as a developer. Some of them are rich in features, but slow in operation.

This is one of the things I am thinking of: developing a module that uses a web-based front-end on Tomcat to organise sources, backed by a database. A client-side daemon process connects to the server process and retrieves XML-based repository information plus, optionally, different streams of codebases.

CVS almost forces you to look at a single view at a time, unless you manually set it up to do something different. CVS also keeps its repository files inside the source directories, which makes compilation and source management unnecessarily difficult.

Most of these tools were written in C/C++, with the largest overhead in file management. With Java, where files can be managed more easily ( no more string-length checking, etc. ), this can be implemented much more simply and quickly. I am thinking of using Derby as a local database implementation, rather than individual files, to record information, and of using XML for transferring repository information.
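To make the idea a bit more concrete, the repository information sent to the client daemon could look something like the sketch below; every element and attribute name here is invented for illustration, since nothing has been designed yet.

    <repository name="myproject">
      <stream name="main">
        <baseline id="1.2" description="first stable release">
          <file path="src/Main.java" revision="14"/>
          <file path="build.xml" revision="3"/>
        </baseline>
      </stream>
    </repository>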

What I want is a system that:
  • Shows baselines and streams of the repository in an easy manner, like an overview I can choose from.
  • Shows a description of what the baseline means
  • Lets a user download a whole baseline or stream locally without too much fuss ( setup view, create local repo, etc... )
  • Can import a new project/baseline through a GUI system
  • Automatically updates local files when updates happen remotely.
  • Does not create alien files within the source directory
  • Can let users work with multiple streams locally, without having to redownload or do extra stuff for diffs, etc.
  • Should not use a network-mapped drive.
  • Allows easy baseline management.
  • Enables good overviews of extra meta-information on repos ( number of lines, number of changes, what the changes are, searching for changes, etc. ), maybe even up to annotation.
Well, just some good ideas anyway; now on to the design :)